Activity 3

Submit this activity by pushing it to a new “Semester Project” git repository before class on 6 March 2025.

Instructions for completion:

  • Download the .qmd source code for this file from the course git repository.
  • Edit the document with your responses to the questions, and render the document into an HTML file.
  • Commit and push both the .qmd source and the rendered HTML file to your semester project git repository.

Instructions for graduate students

The goals for this activity are for you to:

  • finalize the published paper in your field whose results you aim to reproduce as part of the semester project for this course,
  • thoroughly explore the dataset and understand the “data generation process”, and
  • identify the software packages you will need to complete this reproduction analysis.

Fill out the information in the sections below.

Overview of the original paper

Citation

Add an in-text citation to your publication here:

(Recall that to complete this, you will need to ensure that the BibTeX entry for your chosen paper is available in a .bib file in your project repository, and that the path to the bibliography file is specified in the YAML header.)
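As a reminder of what that setup can look like, here is a minimal sketch of a YAML header. The file name `references.bib` is a placeholder, not a required name; use the actual path to your own .bib file.

```yaml
---
title: "Activity 3"
# Placeholder path: point this at the .bib file in your own repository.
bibliography: references.bib
---
```

With a header like this, writing `@smith2020` in the text of the document would produce an in-text citation, where `smith2020` is a hypothetical citation key that must match the key of the entry in your .bib file.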

Main results

What are the core results from the original analysis that you wish to reproduce?

Details of the dataset

Provide an overview of all details regarding the original dataset that you feel will be relevant for your re-analysis. For example, what are the different “types” of data relevant to this analysis? (E.g. what are the columns in the dataset? Is each column categorical or continuous? How many observations are present? How are missing values represented?)
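One possible way to start answering these questions is sketched below. It uses R's built-in `airquality` data frame purely as a stand-in, so the object names (`dat`, `n_obs`, `na_counts`) are illustrative assumptions; swap in your own dataset. Note that it only detects missing values coded as `NA`; other encodings (e.g. -999 or empty strings) need to be checked separately.

```r
library(dplyr)

# Stand-in data (assumption): the built-in `airquality` data frame.
# Replace this with the dataset from your chosen paper.
dat <- as_tibble(airquality)

# Column names and types, plus the first few values of each column.
glimpse(dat)

# Number of observations (rows).
n_obs <- nrow(dat)
n_obs

# Count of missing values (coded as NA) in each column.
na_counts <- dat %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```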

Details of the analytical approach

Provide an overview of the analytical approach used in the original study. Include both conceptual and practical information (e.g. what kind of analyses were conducted, and using which software packages?)

Path to reproduction

  • Describe what data are available, including links to all repositories.

  • Describe what code is available, including links to all repositories.

Getting familiar with the data

  • Download the dataset and use dplyr to start familiarizing yourself with the structures and patterns in the dataset. Please include your code and the output below. (Try making at least 5 plots to conduct exploratory data analysis, and run 1–2 simple statistical models.)

# Your code here; add more chunks as needed.
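If it helps to see the general shape of such a chunk, the sketch below shows one grouped summary, one plot, and one simple model. It is only an illustration: the built-in `airquality` data stand in for your dataset, ggplot2 is an assumed choice of plotting package (the activity does not require it), and the variables and model are arbitrary.

```r
library(dplyr)
library(ggplot2)  # assumption: any plotting package is acceptable

# Stand-in data (assumption): replace with your own dataset.
dat <- as_tibble(airquality)

# Grouped summary: mean ozone and number of non-missing readings per month.
monthly <- dat %>%
  group_by(Month) %>%
  summarise(
    mean_ozone = mean(Ozone, na.rm = TRUE),
    n_ozone = sum(!is.na(Ozone))
  )
monthly

# One exploratory plot: ozone against temperature, dropping rows with missing ozone.
p <- dat %>%
  filter(!is.na(Ozone)) %>%
  ggplot(aes(x = Temp, y = Ozone)) +
  geom_point() +
  labs(x = "Temperature", y = "Ozone")
p

# One simple statistical model: linear regression of ozone on temperature.
# lm() drops rows with missing values by default.
fit <- lm(Ozone ~ Temp, data = dat)
summary(fit)
```

Repeating this pattern with different variables, plot types, and summaries is one route to the five plots and 1–2 models suggested above.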

Instructions for undergraduate students

The goal for this activity is for you to finalize the open science dataset that you plan to explore as part of the semester project for this course, to understand the structure and content of the data, and to identify the software packages you will need to conduct your analysis/visualization.

Overview of the dataset

Description and source of data:

  • Describe how the dataset was generated (i.e. who collected the data, over what time period, what was the purpose, etc.).

Your intended visualization

  • Describe the analysis/visualization you hope to generate from these data. For example, if you plan to visualize the data, what kind of graphs do you plan to make? What will be on the X- and Y-axes?

Getting familiar with the data

  • Download the dataset and use dplyr to start familiarizing yourself with the structures and patterns in the dataset. Please include your code and the output below.

# Your code here; add more chunks as needed.
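As a starting point, a first look at a dataset with dplyr might resemble the sketch below. The built-in `mtcars` data frame is used only as a stand-in, and the object names (`cyl_counts`, `top_mpg`) are illustrative assumptions; adapt the column names to your own data.

```r
library(dplyr)

# Stand-in data (assumption): replace `mtcars` with your own dataset.
dat <- mtcars

# How many rows fall into each category of a categorical column?
cyl_counts <- dat %>%
  count(cyl)
cyl_counts

# Which rows have the largest values of a continuous column?
top_mpg <- dat %>%
  slice_max(mpg, n = 3)
top_mpg

# Range of a continuous column, as a quick sanity check on units and outliers.
dat %>%
  summarise(min_mpg = min(mpg), max_mpg = max(mpg))
```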