Raw data preparation with Excel

This procedure is followed if raw data comes in Excel spreadsheets or formats compatible with Excel.

Preparatory file

  1. Creation of new spreadsheet
  • Create a preparatory file with the following name: [Short title of the project]PREP[initials of the operator]_[date of file creation using the format YYMMDD].[File Extension] using Excel (Microsoft Corporation)
  • Save the preparatory file in the directory Raw Data of the dataset.
  1. Configuration of the new spreadsheet
  • Set the decimal separator required in phpMyAdmin.
    • Go to File > Options > Advanced.
    • Under Editing options, clear the Use system separators check box.
    • Under Decimal separator enter: . (dot)
  1. Creation of worksheet
  • Create a worksheet.
    • The name of worksheet and the name of the corresponding table in database of the profiling tool to which the data will be imported must be the same.
  • Repeat all previous steps of this paragraph for all tables contained in the dataset.

Configuration of worksheet

  1. Definition of header row
  • Prepare a single header row
    • Assign a unique name to a column which corresponds to the name of the column used in the profiling tool to which the data will be imported.
    • Repeat the previous step for all columns
  • Set a data filter on all header fields.
  1. Definition of data types
  • Define the data type of a column in the preparatory file according to the requirements for the import into the profiling tool to which the data will be imported:
    • For a column containing alphanumericals use the cell format category Text.
    • For a column containing numbers.
      • Use the cell format category Number.
      • Select the maximum number of decimal places occurring in the raw data.
      • Clear the Use 1000 Separator Checkbox.
    • For a column containing date use the format YYYY-MM-DD.
    • For a column containing date and time use the format YYYY-MM-DD hh:mm:ss.
  • Repeat all previous steps of this paragraph for all columns.
  1. Transfer of data
  • Transfer the values of the raw data to the corresponding worksheet in the preparatory file.
  • Repeat the previous step for all tables contained in the raw data files.
  1. Define NULL instances
  • Open a column header.
  • Clear the (Select all) checkbox.
  • Check all checkboxes of values that should be replaced in the profiling tool with NULL.
  • Replace the fields of the column in all remaining records with NULL.
  • Check the (Select all) checkbox.
  • Repeat all previous steps of this paragraph for all columns contained in the worksheet.

Export of raw data in preparatory file

  1. Export of raw data
  • Use the ‘Save as’ function of Excel to export each worksheet of the preparatory file.
    • Select the directory Raw Data of the dataset as target folder.
    • Use the following name as file name: [Name of the worksheet][Short title of the project]PREP[initials of the operator][date of creation using the format YYMMDD].[File Extension].
    • As file type select CSV UTF-8 (comma delimited) (*.csv)
    • Save the file in the selected directory.
  • Repeat all previous steps of this paragraph for all worksheets contained in the preparatory file.

See also