Data Organisation Guidelines

  1. Use Excel to enter your data
    • if there is only one data set, put it in the first worksheet
    • put your description of the study and data dictionary in Sheet 2
    • if there are two or more data sets, use separate files or separate sheets
    • do not include graphs, charts or summary tables on the same sheet as the data
  2. Use one row of the worksheet for each observation (experimental unit - eg subject, sample, plot)
  3. Give an ID number to each experimental unit.
  4. Use one column for each characteristic measured on each experimental unit (eg sex, height)
  5. Make column names
    • brief and informative
    • with no spaces or other special characters
    • lower case, for ease of typing
    • consistent across different data files and sheets
  6. Use only one row for column names.
  7. Factor levels within a column can be names or numbers. If using names, make them brief and informative. Explain the names or numbers in the data dictionary.
  8. Leave no blank cells in the worksheet by:
    • explicitly coding missing values. By default, R uses NA as its missing value indicator, GenStat uses * and SPSS uses . (a full stop)
    • filling down cell contents where they are the same for successive experimental units, rather than leaving the repeats blank
  9. If your data set includes any calculated variables, also include the variables from which they were calculated.
  10. Screen your data
    • continuous variables - histograms, scatterplots, boxplots
    • discrete variables - frequency tables, bar charts
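
Some of the guidelines above (3, 5 and 8) can be checked automatically once a worksheet has been saved as CSV. The following is a minimal sketch in Python, not a definitive implementation. The function name check_sheet, the ID column name "id", the exact pattern used for acceptable column names, and the requirement that IDs be unique are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumed reading of guideline 5: lower case letters, digits and underscores only.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")


def check_sheet(csv_text, id_column="id"):
    """Return a list of problems found in one worksheet exported as CSV text.

    check_sheet and id_column are hypothetical names chosen for this sketch.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # Guideline 5: no spaces or special characters, lower case.
    for name in header:
        if not NAME_PATTERN.match(name):
            problems.append(f"bad column name: {name!r}")

    # Guideline 8: no blank cells; missing values must be coded explicitly (e.g. NA).
    for row_number, row in enumerate(data, start=2):  # row 1 holds the column names
        if len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} cells, found {len(row)}"
            )
        for name, cell in zip(header, row):
            if cell.strip() == "":
                problems.append(f"row {row_number}: blank cell in column {name!r}")

    # Guideline 3: each experimental unit has an ID (assumed here to be unique).
    if id_column in header:
        position = header.index(id_column)
        ids = [row[position] for row in data if len(row) > position]
        for value in sorted({v for v in ids if ids.count(v) > 1}):
            problems.append(f"duplicate ID: {value}")
    else:
        problems.append(f"no {id_column!r} column")

    return problems


# Made-up example data: one bad column name, one blank cell, one repeated ID.
sample = "id,sex,Height cm\n1,f,162\n2,m,\n2,f,NA\n"
for problem in check_sheet(sample):
    print(problem)
```

For the made-up sample, the sketch reports the column name 'Height cm', the blank cell in row 3 and the repeated ID 2. A sheet that follows the guidelines returns an empty list. Screening the values themselves (guideline 10) still needs plots and tables.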