Activity 3

Submit this activity by pushing it to a new “Semester Project” git repository before class on 26 October 2025.

Instructions for completion:

  • Download the .qmd source code for this file from the course git repository.
  • Edit the document to add your responses to the questions, then render it to an HTML file.
  • Commit and push both the .qmd source and the rendered HTML file to your semester project git repository.

Overview

The goals for this activity are for you to:

  • finalize your choice of the published paper in your field whose results you aim to reproduce for this course's semester project;
  • thoroughly explore the dataset and understand its “data generation process”; and
  • identify the software packages you will need to complete this reproduction analysis.

Fill out the information in the sections below.

Overview of the original paper

What were the motivations for the study? What were the core questions/hypotheses?

Main results

What are the core results from the original analysis that you wish to reproduce?

Details of the dataset

Provide an overview of every detail of the original dataset that you expect to be relevant to your re-analysis. For example, what “types” of data are relevant to this analysis? What are the columns in the dataset, and is each column categorical or continuous? How many observations are present? How are missing values represented?
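Several of these questions can be answered directly in code. The sketch below is illustrative only: it uses R's built-in `airquality` data frame as a stand-in for your study's dataset and assumes the dplyr package is installed. Note that if the raw file encodes missingness with sentinel values (e.g., -999 or an empty string), those values are only counted as `NA` if you declare them when reading the data, for example through the `na` argument of `readr::read_csv()`.

```r
library(dplyr)

# Stand-in dataset (assumption): replace with your own data,
# e.g. readr::read_csv("path/to/data.csv", na = c("", "NA", "-999"))
df <- airquality

# Number of observations (rows) and variables (columns)
n_obs  <- nrow(df)
n_vars <- ncol(df)

# Storage class of each column: numeric/integer columns are candidates for
# continuous variables, character/factor columns for categorical ones
col_types <- sapply(df, function(x) class(x)[1])

# Number of missing (NA) values in each column
na_counts <- df %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of distinct values per column (NA counts as one value); a small
# number can flag a categorical variable stored as a number (e.g. Month)
n_distinct_vals <- df %>%
  summarise(across(everything(), n_distinct))

print(c(observations = n_obs, variables = n_vars))
print(col_types)
print(na_counts)
print(n_distinct_vals)
```

Checking storage classes and distinct-value counts together matters because a column's class alone does not tell you whether a variable is conceptually categorical or continuous.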

Details of the analytical approach

Provide an overview of the analytical approach used in the original study. Include both conceptual and practical information (e.g., what kinds of analyses were conducted, and with which software packages?).

Path to reproduction

  • Describe what data are available, including links to all repositories.

  • Describe what code is available, including links to all repositories.

Getting familiar with the data

  • Download the dataset and use dplyr to start familiarizing yourself with its structure and patterns. Include your code and its output below. (Try to make at least five plots for exploratory data analysis, and fit one or two simple statistical models.)
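For orientation only, here is a minimal sketch of what such an exploration might look like. It uses R's built-in `airquality` data as a stand-in for your own dataset and assumes the dplyr and ggplot2 packages are installed. It shows a grouped summary, three of the requested plots, and one simple linear model; the variables and the choice of model are illustrative assumptions, not a template your analysis must follow.

```r
library(dplyr)
library(ggplot2)

# Stand-in dataset (assumption); Month is recoded as a labeled factor
df <- airquality %>%
  mutate(Month = factor(Month, levels = 5:9,
                        labels = c("May", "Jun", "Jul", "Aug", "Sep")))

# Grouped summary: days, missing ozone readings, and ozone mean/SD per month
monthly <- df %>%
  group_by(Month) %>%
  summarise(
    n_days     = n(),
    n_missing  = sum(is.na(Ozone)),
    mean_ozone = mean(Ozone, na.rm = TRUE),
    sd_ozone   = sd(Ozone, na.rm = TRUE)
  )
print(monthly)

# Plot 1: distribution of a continuous variable
p1 <- ggplot(df, aes(x = Ozone)) +
  geom_histogram(bins = 20, na.rm = TRUE) +
  labs(title = "Distribution of ozone")

# Plot 2: a continuous variable across levels of a categorical one
p2 <- ggplot(df, aes(x = Month, y = Ozone)) +
  geom_boxplot(na.rm = TRUE) +
  labs(title = "Ozone by month")

# Plot 3: relationship between two continuous variables, with a linear fit
p3 <- ggplot(df, aes(x = Temp, y = Ozone)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) +
  labs(title = "Ozone vs. temperature")

print(p1)
print(p2)
print(p3)

# Simple model: linear regression of ozone on temperature and wind,
# fit only on rows where all three variables are observed
complete <- df %>% filter(!is.na(Ozone), !is.na(Temp), !is.na(Wind))
fit <- lm(Ozone ~ Temp + Wind, data = complete)
print(summary(fit))
```

`lm()` would drop incomplete rows on its own, but filtering them explicitly makes the sample size behind the model visible, which is worth reporting alongside the estimates in a reproduction.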

# Your code here; add more chunks as needed.