Raw data preparation with Excel
This procedure is followed if raw data comes in Excel spreadsheets or formats compatible with Excel.
Preparatory file
- Creation of new spreadsheet
- Create a preparatory file with the following name: [Short title of the project]PREP[initials of the operator]_[date of file creation using the format YYMMDD].[File Extension] using Excel (Microsoft Corporation)
- Save the preparatory file in the directory Raw Data of the dataset.
- Configuration of the new spreadsheet
- Set the decimal separator required in phpMyAdmin.
- Go to File > Options > Advanced.
- Under Editing options, clear the Use system separators check box.
- Under Decimal separator enter: . (dot)
- Creation of worksheet
- Create a worksheet.
- The name of worksheet and the name of the corresponding table in database of the profiling tool to which the data will be imported must be the same.
- Repeat all previous steps of this paragraph for all tables contained in the dataset.
Configuration of worksheet
- Definition of header row
- Prepare a single header row
- Assign a unique name to a column which corresponds to the name of the column used in the profiling tool to which the data will be imported.
- Repeat the previous step for all columns
- Set a data filter on all header fields.
- Definition of data types
- Define the data type of a column in the preparatory file according to the requirements for the import into the profiling tool to which the data will be imported:
- For a column containing alphanumericals use the cell format category Text.
- For a column containing numbers.
- Use the cell format category Number.
- Select the maximum number of decimal places occurring in the raw data.
- Clear the Use 1000 Separator Checkbox.
- For a column containing date use the format YYYY-MM-DD.
- For a column containing date and time use the format YYYY-MM-DD hh:mm:ss.
- Repeat all previous steps of this paragraph for all columns.
- Transfer of data
- Transfer the values of the raw data to the corresponding worksheet in the preparatory file.
- Repeat the previous step for all tables contained in the raw data files.
- Define NULL instances
- Open a column header.
- Clear the (Select all) checkbox.
- Check all checkboxes of values that should be replaced in the profiling tool with NULL.
- Replace the fields of the column in all remaining records with NULL.
- Check the (Select all) checkbox.
- Repeat all previous steps of this paragraph for all columns contained in the worksheet.
Export of raw data in preparatory file
- Export of raw data
- Use the ‘Save as’ function of Excel to export each worksheet of the preparatory file.
- Select the directory Raw Data of the dataset as target folder.
- Use the following name as file name: [Name of the worksheet][Short title of the project]PREP[initials of the operator][date of creation using the format YYMMDD].[File Extension].
- As file type select CSV UTF-8 (comma delimited) (*.csv)
- Save the file in the selected directory.
- Repeat all previous steps of this paragraph for all worksheets contained in the preparatory file.