Principles of data organization

  • Data should be easily understood by you, collaborators, evaluators (e.g. reviewers), and computers

  • Sometimes, organizational strategies that work well for humans don’t work well for computers

  • Develop good practices that make data legible to both humans and computers

  • Keep your raw data raw! Make copies for cleaning.
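One way to keep raw data raw is to copy the file before any cleaning and make the original read-only. The sketch below is a minimal illustration, not a prescribed workflow; the file name `surveys_raw.csv`, the `raw`/`clean` folder names, and the helper `make_working_copy` are all hypothetical.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def make_working_copy(raw_path: Path, work_dir: Path) -> Path:
    """Copy the raw file into work_dir, then mark the original read-only."""
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / raw_path.name
    shutil.copy2(raw_path, working)  # the copy is made while the file is still writable
    # Remove write permission from the raw file so it is not edited by accident.
    raw_path.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working


# Demo in a temporary directory, using a hypothetical file name and contents.
demo_dir = Path(tempfile.mkdtemp())
raw = demo_dir / "raw" / "surveys_raw.csv"
raw.parent.mkdir()
raw.write_text("plot,species,weight_g\n1,DM,40\n")

working = make_working_copy(raw, demo_dir / "clean")
print("Clean this file:", working)
```

All cleaning then happens on the copy in `clean`, and the file in `raw` stays exactly as it was collected.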

Spreadsheet management

  • Almost everyone will manage data in a spreadsheet format
  • Remember that the spreadsheet is not a lab notebook
    • Data on spreadsheets should have a “rectangular” format: rows and columns only
    • Avoid encoding information by color or in margin text
  • Columns for variables; rows for observations
  • Leave no cell blank – develop an explicit mechanism for NA/blank/unmeasured values
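The points above can be sketched in a few lines: a small rectangular table (one column per variable, one row per observation) in which every blank cell is replaced by an explicit missing-value code. The column names, the example values, and the choice of `"NA"` as the code are assumptions made for illustration.

```python
import csv
import io

NA = "NA"  # assumed missing-value code; choose one that cannot be mistaken for real data


def fill_blanks(rows, na=NA):
    """Return a copy of rows with empty or whitespace-only cells replaced by na."""
    return [[cell.strip() if cell.strip() else na for cell in row] for row in rows]


def is_rectangular(rows):
    """True if every row has the same number of cells as the header row."""
    return all(len(row) == len(rows[0]) for row in rows)


# Hypothetical survey records: the first row is the header, one row per observation.
table = [
    ["plot", "species", "sex", "weight_g"],
    ["1", "DM", "M", "40"],
    ["1", "DO", "", "52"],   # sex not recorded
    ["2", "DS", "F", ""],    # weight not measured
]

filled = fill_blanks(table)
print("rectangular:", is_rectangular(filled))

# Write the result as CSV text, a format both people and programs can read.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(filled)
print(buffer.getvalue())
```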

What is the issue?

Species and sex as separate columns
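As a rough sketch of the fix, a cell that stores two variables at once can be split into one value per column. The combined format `"DM-M"` (species code, hyphen, sex), the example values, and the helper `split_species_sex` are assumptions for illustration; the real sheet may encode the values differently.

```python
def split_species_sex(value, sep="-", na="NA"):
    """Split a combined 'species-sex' cell into a (species, sex) pair.

    If the separator or the sex part is missing, sex is set to the na code.
    """
    species, _, sex = value.partition(sep)
    return species.strip(), (sex.strip() or na)


# Hypothetical combined cells, including one with no sex recorded.
combined = ["DM-M", "DO-F", "DS"]
separated = [split_species_sex(cell) for cell in combined]

for species, sex in separated:
    print(species, sex)
```

With species and sex in their own columns, each can be sorted, filtered, or summarized without parsing text first.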

Exercise

  • In today’s workshop we will be using the Portal Project Teaching Dataset

  • This comes from a long-running study in Arizona of how rodents and ants affect plant communities

    • 40-year study, used in >100 publications to date!

Exercise

Let’s take a look at a “messy” version of a dataset that might be collected for a project like this.

Our dataset has two tabs. Two field assistants conducted the surveys, one in 2013 and one in 2014, and they both kept track of the data in their own way in tabs 2013 and 2014 of the dataset, respectively. Now you’re the person in charge of this project and you want to be able to start analyzing the data.

Your challenge: With a partner, look through this Google sheet and identify what problems you will have to address to create a “flat” sheet ready for analysis.

  • Make a copy of the sheet and start addressing issues

  • Work on this for ~10-15 minutes
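Once each tab has been cleaned to the same columns, one possible way to get a single "flat" table is to stack the tabs and record the survey year in its own column. This is only a sketch under that assumption; the rows, column names, and the helper `stack_tabs` are hypothetical and are not taken from the actual sheet.

```python
def stack_tabs(tabs):
    """Combine per-year tabs into one list of rows, adding a 'year' column.

    tabs maps a year to a list of row dictionaries that share the same keys.
    """
    flat = []
    for year, rows in tabs.items():
        for row in rows:
            flat.append({"year": year, **row})
    return flat


# Hypothetical cleaned tabs, one per field season.
tabs = {
    2013: [{"plot": 1, "species": "DM", "weight_g": 40}],
    2014: [{"plot": 2, "species": "DO", "weight_g": 52}],
}

flat = stack_tabs(tabs)
for row in flat:
    print(row)
```

Storing the year as a column keeps the information that used to live in the tab name.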

Check-in