The purpose of this tutorial project is to tidy and wrangle data with methods from the “tidyr” and “dplyr” libraries so that clients can use the data in an easy-to-use form. This project focuses on the following tasks:

• Removing missing data, out-of-bounds values and unexpected values
• Transforming data to calculate new variables
• Concatenating information about replicate measurements
• Sub-sampling and stratifying to produce balanced distributions
• Re-scaling and applying other transformations
• Saving datasets in easy-to-read formats

Where a particular package or library is mentioned, you must provide a solution using only that package or library. Other solutions (even if correct) will not receive marks.

Data

Sugar factories measure sugar cane juice at the start of the factory process to determine factory settings and the economic value of the supplied sugar cane. To enable real-time factory optimisation, a real-time measurement technology called near infrared spectroscopy (NIRS) is used. NIRS analyses the light spectrum that the sugar cane absorbs; the absorbance spectrum is correlated with the chemical composition of the sugar cane. However, the NIRS instruments need to be calibrated to measure specific components of the sugar cane. To do this, traditional laboratory measurements are collected and used to train (calibrate) the NIRS instruments.

The first step in training the NIRS instruments is to prepare the laboratory measurements. Because laboratories use multiple assays (different measurement types), measurement information is typically stored in different files or databases. This information needs to be collated, cleaned and appropriately sub-sampled to make training datasets for the NIRS instruments.
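The collate, clean and sub-sample workflow described above can be sketched with dplyr and tidyr. This is a minimal illustration under stated assumptions, not the project's required solution. The tables `brix_lab` and `pol_lab`, their column names and values, the 0 to 100 Brix bound and the two-band stratification are all hypothetical. The derived `purity` variable assumes the common definition purity = 100 × pol / Brix.

```r
library(dplyr)
library(tidyr)

# Hypothetical assay tables: each assay is stored separately,
# with two replicate measurements per sample.
brix_lab <- tibble(
  sample_id = rep(1:4, each = 2),
  replicate = rep(1:2, times = 4),
  brix      = c(20.1, 20.3, NA, 19.8, 150, 21.0, 18.4, 18.6)
)
pol_lab <- tibble(
  sample_id = rep(1:4, each = 2),
  replicate = rep(1:2, times = 4),
  pol       = c(17.0, 17.2, 16.5, 16.7, 17.9, 18.1, 15.2, 15.0)
)

training <- brix_lab %>%
  # Collate the two assays by sample and replicate.
  inner_join(pol_lab, by = c("sample_id", "replicate")) %>%
  # Remove missing values (tidyr) and out-of-bounds values (assumed 0-100 Brix).
  drop_na(brix) %>%
  filter(between(brix, 0, 100)) %>%
  # Combine the remaining replicates into one row per sample.
  group_by(sample_id) %>%
  summarise(brix = mean(brix), pol = mean(pol), n_reps = n(), .groups = "drop") %>%
  # Calculate a new variable, then rescale it to mean 0 and sd 1.
  mutate(purity   = 100 * pol / brix,
         purity_z = (purity - mean(purity)) / sd(purity))

# Stratified sub-sample: split samples into two Brix bands, draw one per band.
set.seed(1)
balanced <- training %>%
  mutate(stratum = ntile(brix, 2)) %>%
  group_by(stratum) %>%
  slice_sample(n = 1) %>%
  ungroup()

# Save in a plain, easy-to-read CSV file (a temporary path, for illustration).
out_file <- tempfile(fileext = ".csv")
write.csv(training, out_file, row.names = FALSE)
```

Grouping by `stratum` before `slice_sample()` draws the same number of samples from each Brix band, which is one simple way to balance a distribution. With real data, the strata, bounds and sample sizes would come from the project's own requirements.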