How R stores data, and how that affects your workflows

Let’s eat our vegetables

Data types

Each piece of information is assigned exactly one class

  • Numeric
  • Integer
  • Logical
  • Character
  • Complex
  • Raw
  • Applications in ecology and evolution will rarely require use of complex or raw variables.
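A quick way to check the class of a value is base R's class() function. The values below are made up for illustration:

```r
# One example value per class (all values are illustrative)
x_numeric   <- 3.14
x_integer   <- 2L           # the L suffix requests an integer
x_logical   <- TRUE
x_character <- "finch"
x_complex   <- 1 + 2i
x_raw       <- as.raw(255)

class(x_numeric)    # "numeric"
class(x_integer)    # "integer"
class(x_logical)    # "logical"
class(x_character)  # "character"
class(x_complex)    # "complex"
class(x_raw)        # "raw"
```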

Data structures

Types of data structures

  • Scalars and vectors
  • Matrices and arrays
  • Data frames
  • Lists
  • Tibbles

Moving from highly-structured to less-structured

Scalars and Vectors

  • Scalars are variables that have length 1
  • Multiple scalars of the same class can be organized into a vector (also called an “atomic” vector)

All elements in a vector are of an identical class

Vectors can be of any class introduced above
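A small sketch of what "identical class" means in practice. The variable names and values are made up; the point is that c() silently converts mixed input to a single class:

```r
# Vectors built with c(); every element shares one class
body_mass <- c(3750, 3800, 3250)    # numeric vector
is_adult  <- c(TRUE, FALSE, TRUE)   # logical vector

length(42)          # 1: a scalar is just a vector of length 1
length(body_mass)   # 3

# Mixing classes does not fail; R coerces everything to one class
mixed <- c(1, "a", TRUE)
class(mixed)        # "character"
mixed               # "1" "a" "TRUE"
```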

Matrices

  • Matrices are made up of vectors that are all of the same class (and of the same length)
    • e.g. a set of numeric vectors; a set of logical vectors, etc.

  • Matrices cannot comprise vectors of different classes

  • We can extract individual vectors (columns or rows) by indexing the matrix
  • matrixName[rowNumber,columnNumber]
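A minimal sketch of matrix indexing, using a small made-up matrix. Leaving the row or column position empty returns the whole column or row:

```r
# matrix() fills column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)

m[1, ]    # first row:     1 3 5
m[, 2]    # second column: 3 4
m[2, 3]   # single element: 6

# Combining vectors of different classes still gives ONE class
mixed_matrix <- cbind(1:2, c("a", "b"))
class(mixed_matrix[1, 1])   # "character": the numbers were coerced
```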

Data frames

  • Data frames are made up of vectors that can be of different classes (but must be of the same length)

  • As with matrices, we can extract individual vectors (columns or rows) by indexing the data frame
  • dataframe[rowNumber,columnNumber]

But we can also use the syntax dataframe$columnName
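The indexing options side by side, on a small made-up data frame (the site, elevation and surveyed columns are illustrative):

```r
# Three columns, three different classes, all of length 3
sites <- data.frame(
  site      = c("A", "B", "C"),
  elevation = c(120, 450, 980),
  surveyed  = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # already the default from R 4.0 onward
)

sites[2, ]        # second row, returned as a one-row data frame
sites[, 2]        # second column, returned as a numeric vector
sites[2, 2]       # 450
sites$elevation   # same column, selected by name

sapply(sites, class)   # one class per column
```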

Lists

  • Lists can be made up of vectors that are of different classes and/or of different lengths

  • As with matrices and data frames, we can extract individual elements by indexing the list (lists have items rather than rows and columns)

  • listName[[itemNumber]] or listName$itemName
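A short sketch with a made-up list, showing that items can differ in both class and length, and how double and single brackets differ:

```r
# Items of different classes AND different lengths
survey <- list(
  site    = "A",
  species = c("finch", "wren", "robin"),
  counts  = c(4L, 1L)
)

survey[[2]]         # the species vector itself
survey$species      # same item, selected by name
survey["species"]   # single brackets return a list of length 1

class(survey[[2]])        # "character"
class(survey["species"])  # "list"
```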

Tibbles

Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating.

  • Similarities to data frames:
    • Can include columns of different classes
    • All columns need to be of the same length (“rectangular” data set)
  • Differences
    • We will explore these as we go

Creating tibbles

  • Creating a tibble can be very similar to creating a data frame
  • Existing data frames can be converted into tibbles using as_tibble()
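Both routes in a minimal sketch. It assumes the tibble package is installed; the site data are made up, and iris is a dataset that ships with R:

```r
library(tibble)

# Build a tibble directly, much like data.frame()
sites_tbl <- tibble(
  site      = c("A", "B", "C"),
  elevation = c(120, 450, 980)
)

# Convert an existing data frame
iris_tbl <- as_tibble(iris)

class(iris)       # "data.frame"
class(iris_tbl)   # "tbl_df" "tbl" "data.frame"
```

Because a tibble still carries the "data.frame" class, most functions written for data frames accept it.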

Important properties of tibbles

Tibbles reject row names

  • Data frames in R can have row names, but tibbles cannot.
  • Some examples with the mtcars dataset (inbuilt in R)
  • You might wonder: but the car names were important!
  • In tibble’s opinion: if it’s important, keep it as a column in your dataset.
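A sketch of what happens to the mtcars row names, assuming the tibble package is installed. The column name "car" is our own choice:

```r
library(tibble)

# mtcars stores the car names as row names
head(rownames(mtcars), 3)   # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"

# Converting drops the row names...
cars_tbl <- as_tibble(mtcars)
has_rownames(cars_tbl)      # FALSE

# ...unless we ask for them to be kept as a regular column
cars_named <- as_tibble(mtcars, rownames = "car")
cars_named$car[1:3]         # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
```

rownames_to_column(mtcars, var = "car") does the same move on the data frame before conversion.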

Viewing tibbles

  • Tibbles print more “cleanly” than do data frames

Example: print the mtcars data frame (built into R)

Example: print mtcars as a tibble
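The two printing examples could look like this (a sketch assuming the tibble package is installed). Run it in the console to compare the output:

```r
library(tibble)

# Data frame: prints all 32 rows
print(mtcars)

# Tibble: prints the dimensions, the column types, and only the
# first 10 rows (plus only the columns that fit the console width)
cars_tbl <- as_tibble(mtcars)
print(cars_tbl)

# Ask for a different number of rows with n
print(cars_tbl, n = 5)
```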

Tibbles reject recycled values

  • Recall that if you try to make a data frame from vectors of different lengths, it works as long as the longer length is a multiple of the shorter one: the shorter vector is silently recycled

Tibbles refuse to do this. The one exception to the rule: values of length one are recycled
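The three cases in a minimal sketch, assuming the tibble package is installed. The error is caught with tryCatch() so the script keeps running:

```r
library(tibble)

# data.frame(): y is silently recycled to 1 2 1 2
df <- data.frame(x = 1:4, y = 1:2)
df$y

# tibble(): the same call is an error
bad <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) e
)
inherits(bad, "error")   # TRUE

# tibble(): a length-one value IS recycled
ok <- tibble(x = 1:4, y = 0)
ok$y                     # 0 0 0 0
```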

Tibbles can have non-vector columns

  • Recall that when we made a data frame, each column was a vector of the same length

What if we wanted one of our columns to have vectors in it?

  • E.g. Column 1 is site ID, and Column 2 is a vector of the species recorded there

  • Tibbles make it easier to have “list-columns”
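A sketch of the site and species example as a list-column, assuming the tibble package is installed. The site IDs and species names are made up:

```r
library(tibble)

# species is a list: one character vector per site, of any length
site_species <- tibble(
  site    = c("A", "B"),
  species = list(
    c("finch", "wren"),
    c("robin", "finch", "heron")
  )
)

site_species$species[[2]]       # "robin" "finch" "heron"
lengths(site_species$species)   # 2 3: species recorded per site
```

The tibble is still rectangular: it has two rows, and each cell of the species column holds a whole vector.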

What can be done with tibbles or data frames?

Lots!

Mapping

Further exploring patterns

We might be interested in generating linear models of body size against bill length

Or, we can use the power of tibbles:
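One way this could look is to fit one model per species and keep the fitted models in a list-column. This is only a sketch: the data are simulated, the names birds, sp1 and sp2 are hypothetical, and it uses base R's split() and lapply() rather than whatever functions the original slides used. It assumes the tibble package is installed.

```r
library(tibble)

# Simulated data (NOT real measurements), two hypothetical species
set.seed(1)
birds <- tibble(
  species     = rep(c("sp1", "sp2"), each = 10),
  bill_length = c(rnorm(10, mean = 40, sd = 2), rnorm(10, mean = 48, sd = 2))
)
birds$body_mass <- 100 * birds$bill_length + rnorm(20, mean = 0, sd = 50)

# One sub-table per species
groups <- split(birds, birds$species)

# One row per species; data and fit are list-columns
models <- tibble(
  species = names(groups),
  data    = unname(groups),
  fit     = unname(lapply(groups, function(d) lm(body_mass ~ bill_length, data = d)))
)

# Pull one number out of each model into an ordinary column
models$slope <- vapply(
  models$fit,
  function(m) coef(m)[["bill_length"]],
  numeric(1)
)

models
```

Keeping the data, the model, and the summary in the same row means they cannot drift out of sync when the table is filtered or sorted.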

Let’s do some exercises

Announcements

  • Please vote in Discord (#crash-course) for 5 topics you’d like to cover in the remainder of the semester
  • Add your Semester Project repository under gklab/rr-2025/semester-project
  • Next week on Thursday, we will start having open work times - come prepared!