- Use Excel to enter your data
  - if there is only one data set, put it in the first worksheet
  - put your description of the study and data dictionary in Sheet 2
  - if there are two or more data sets, use separate files or separate sheets
  - do not include graphs, charts or summary tables on the same sheet as the data
- Use one row of the worksheet for each observation (experimental unit, e.g. subject, sample or plot)
- Give an ID number to each experimental unit.
- Use one column for each characteristic measured on each experimental unit (e.g. sex, height)
- Make column names
  - brief and informative
  - with no spaces or other special characters
  - lower case, for ease of typing
  - consistent across different data files and sheets
- Use only one row for column names.
- Factor levels within a column can be names or numbers. If using names, make them brief and informative. Explain the names or numbers in the data dictionary.
- Leave no blank cells in the worksheet by:
  - explicitly coding missing values. By default, R uses NA as its missing value indicator, GenStat uses * and SPSS uses a full stop (.)
  - downfilling cell contents where they are the same for successive experimental units
- If your data set includes any calculated variables, also include the variables from which they were calculated.
- Screen your data
  - continuous variables - histograms, scatterplots, boxplots
  - discrete variables - tabulate, bar charts
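
Some of the checks above can be automated once the sheet has been exported. The following is a minimal sketch in Python using only the standard library, not a prescribed workflow. The helper names (`clean_column_name`, `find_blank_cells`, `tabulate`), the example data, the use of an underscore as a word separator and the choice of NA as the missing value code are all assumptions made for illustration.

```python
import re
from collections import Counter

MISSING = "NA"  # assumed missing value code (R's default indicator)


def clean_column_name(name):
    """Lower-case a name and replace runs of non-alphanumerics with '_'.

    Using '_' as the separator is an assumption of this sketch.
    """
    cleaned = re.sub(r"[^0-9a-z]+", "_", name.strip().lower())
    return cleaned.strip("_")


def find_blank_cells(rows):
    """Return (row_index, column) pairs for empty or whitespace-only cells."""
    blanks = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                blanks.append((i, column))
    return blanks


def tabulate(rows, column):
    """Count how often each level of a discrete variable occurs."""
    return Counter(row[column] for row in rows)


# Hypothetical data: one row per subject, one column per characteristic.
raw_header = ["ID", "Sex", "Height (cm)"]
raw_rows = [
    [1, "f", 162.0],
    [2, "m", ""],  # blank cell that should have been coded as missing
    [3, "f", 170.5],
]

header = [clean_column_name(h) for h in raw_header]
rows = [dict(zip(header, r)) for r in raw_rows]

# Locate blank cells, then code them explicitly as missing.
blanks = find_blank_cells(rows)
for i, column in blanks:
    rows[i][column] = MISSING

# Every experimental unit should have its own ID.
ids = [row["id"] for row in rows]
ids_unique = len(ids) == len(set(ids))

# Screen a discrete variable by tabulating its levels.
sex_counts = tabulate(rows, "sex")
```

Here `header` holds the cleaned column names, `blanks` lists the cells that were left empty, and `sex_counts` gives a quick tabulation that would show up unexpected levels such as a mistyped code.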