Summary so far

AI Acknowledgment: Claude helped me write this recap based on the course content I wrote, and I verified and edited it.

Module 1.1: Introduction to the Course

Key Concepts

  • Course setup and environment preparation
  • Introduction to reproducible research principles

What You Accomplished

  • Installed the {usethis} package for streamlined R workflows
    • Discussed running code from the console vs. from a script
    • Reviewed common error messages related to packages and their functions
  • Configured Git with your identity using usethis::use_git_config()
  • Created a GitHub personal access token for secure authentication
  • Set up credentials using gitcreds::gitcreds_set()
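
A minimal sketch of the setup steps above, meant to be run line by line in the R console. The name and email are placeholders, and usethis::create_github_token() is only one way to reach the token page (the course may have had you create the token on the GitHub website directly). The if (interactive()) wrapper is an addition for this sketch so that nothing runs if the file is sourced non-interactively.

```r
# One-time install, if needed:
# install.packages(c("usethis", "gitcreds"))

if (interactive()) {
  # Tell Git who you are (placeholder values; replace with your own)
  usethis::use_git_config(
    user.name  = "Jane Doe",
    user.email = "jane.doe@example.com"
  )

  # Assumption: opens GitHub in your browser to create a personal access token
  usethis::create_github_token()

  # Prompts you to paste the token and stores it for future Git operations
  gitcreds::gitcreds_set()
}
```

These are one-time configuration steps, so they belong in the console rather than in an analysis script.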

Why This Matters

Proper setup ensures smooth integration between R, Git, and GitHub throughout your research workflow. The {usethis} package automates many common tasks, reducing errors and saving time.


Module 1.2: Introduction to Git and GitHub

Key Concepts

  • Forking: Creating your own copy of someone else’s repository on your GitHub account
  • Cloning: Downloading a repository from GitHub to your local computer
  • Staging: Preparing files for commit (equivalent to git add)
    • Checkbox next to the file in the Git tab in RStudio
  • Committing: Saving changes with a descriptive message
    • Commit (checkmark button); you must write a message and then click “Commit”
  • Pushing: Uploading changes to GitHub
    • Push (up arrow button); this syncs your local changes with the remote repository

What You Accomplished

  • Forked the course repository on GitHub
  • Cloned your fork to your local computer using RStudio
  • Made your first edit (added your name to README.md)
  • Practiced the Git workflow: stage → commit → push
  • Viewed diffs to see exactly what changed
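
The RStudio buttons above correspond to ordinary Git commands. The sketch below runs them from R with system2() in a throwaway repository inside a temporary folder, so it does not touch your course repository. It assumes Git is installed and on your PATH; the run_git() helper, the folder name, and the demo identity are all hypothetical and exist only for this illustration.

```r
# Throwaway repository in a temporary folder (hypothetical name)
repo <- tempfile("git-demo-")
dir.create(repo)

# Helper for this sketch only: run a git command inside `repo`
run_git <- function(...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

run_git("init")

# Make an edit, like adding your name to README.md
writeLines("Edited by: Your Name", file.path(repo, "README.md"))

# Status lists README.md as a new, untracked file
run_git("status", "--short")

# Stage: same as ticking the checkbox in the Git tab
run_git("add", "README.md")

# Commit: same as writing a message and clicking "Commit"
# (the -c options supply a demo identity for this one command only)
run_git(
  "-c", "user.name=Demo",
  "-c", "user.email=demo@example.com",
  "commit", "-m", shQuote("Add my name to README")
)

# Push: same as the up arrow; needs a GitHub remote, so it is not run here
# run_git("push")

# The history now contains the commit just made
log_lines <- run_git("log", "--oneline")
print(log_lines)
```

In day-to-day work the Git tab in RStudio does the same thing with fewer keystrokes; the commands are shown only to make clear what each button does.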

Why This Matters

Version control tracks every change to your code, allowing you to collaborate safely, recover from mistakes, and maintain a complete history of your project’s development.


Module 1.3: R Projects and File Management

Key Concepts

  • R Projects: Self-contained working environments that improve reproducibility
  • File organization: Structured folder hierarchies for different types of files
  • Environment management: Starting fresh each session to avoid hidden dependencies

What You Accomplished

  • Created a proper file structure:

    epi590r-in-class/
    ├─ epi590r-in-class.Rproj
    ├─ README.md
    ├─ R/
    │   └─ clean-data-bad.R
    ├─ data/
    │   ├─ raw/
    │   │   └─ nlsy.csv
    │   └─ clean/

  • Configured RStudio to start with a clean environment
  • Analyzed problematic code patterns in clean-data-bad.R
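
The folder layout above can also be created from code. This is a minimal base-R sketch that builds the same folders inside a temporary directory, so nothing in your real project is touched; the project_root location is an assumption made for the example.

```r
# Hypothetical project root in a temporary folder
project_root <- file.path(tempdir(), "epi590r-in-class")

# recursive = TRUE also creates the parent data/ folder when needed
folders <- c("R", file.path("data", "raw"), file.path("data", "clean"))
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created, relative to the project root
created_dirs <- list.dirs(project_root, full.names = FALSE, recursive = TRUE)
print(created_dirs)
```

For the clean-environment setting, usethis::use_blank_slate() is one way to apply it from code instead of through the RStudio options menu, assuming {usethis} is installed.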

Why This Matters

Organized projects are easier to navigate, share, and reproduce. Starting with a clean environment each time prevents hidden dependencies that could break your code when run on different computers.


Module 1.4: The {here} Package

Key Concepts

  • Relative vs. Absolute Paths: here() builds file paths starting from your project root, so you never hard-code a location on your own computer
  • Cross-platform Compatibility: Paths work on Windows, Mac, and Linux
  • Project Portability: Code runs regardless of where the project folder is located

What You Accomplished

  • Installed and learned to use the {here} package
  • Compared the output of here::here() and getwd()
  • Examined improved code in clean-data-good.R that uses {here}
  • Explored the course dataset (NLSY data)

Code Comparison

Bad (absolute paths):

setwd("/Users/myname/Documents/project")
data <- read.csv("data/raw/nlsy.csv")

Good (paths built with here::here(), which always start from the project root):

data <- read.csv(here::here("data", "raw", "nlsy.csv"))
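
To see the difference between getwd() and here::here() for yourself, a short sketch along these lines can be run inside the project. It assumes the {here} package is installed and that the data file sits at data/raw/nlsy.csv as in the course layout. The requireNamespace() and file.exists() guards are additions for this example so that it runs without errors elsewhere too.

```r
if (requireNamespace("here", quietly = TRUE)) {
  # The current working directory; it can change with setwd() or
  # depending on how R was started
  print(getwd())

  # The project root that {here} detected
  print(here::here())

  # Build the path from its pieces; {here} handles the separators
  raw_path <- here::here("data", "raw", "nlsy.csv")
  print(raw_path)

  # Read the file only if it actually exists at that location
  if (file.exists(raw_path)) {
    nlsy <- read.csv(raw_path)
    str(nlsy)
  }
}
```

Passing each folder as its own argument, rather than typing the slashes yourself, leaves the choice of separator to {here}.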

Why This Matters

Using {here} makes your code portable and prevents the “works on my machine” problem. Your collaborators can run your code without modification.


Module 1.5: Starting From Scratch

Key Concepts

  • Project/Repository Creation Workflow: Local first, then connect to GitHub
  • .gitignore: Preventing sensitive or unnecessary files from being tracked

What You Accomplished

  • Created a new R project with Git initialization
    • This can be for your final project!
  • Created a new GitHub repository that will be linked to this project
  • Connected your local repository to the new GitHub repository by running Git commands in the terminal
  • Created and configured .gitignore to protect sensitive files
  • Set up a proper folder structure for a new project

The Complete Workflow

  1. Create R Project → New Directory → Enable Git
  2. Make Initial Commit → Stage → Commit locally
  3. Create GitHub Repository → New repo on GitHub
  4. Connect Them → Use terminal commands to push
  5. Organize Files → Create folder structure
  6. Protect Secrets → Configure .gitignore
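
As an illustration of step 6, the sketch below writes a small .gitignore using base R. It works in a temporary folder so it cannot overwrite a real file, and the patterns are examples only: which files count as sensitive depends on your project, and the data/raw/ entry in particular is a hypothetical choice.

```r
# Hypothetical project folder in a temporary location
project_root <- file.path(tempdir(), "my-final-project")
dir.create(project_root, recursive = TRUE, showWarnings = FALSE)

# Example patterns, one per line in the .gitignore file
ignore_patterns <- c(
  ".Rproj.user",  # RStudio's per-user project settings
  ".Rhistory",    # console history
  ".RData",       # saved workspace
  "data/raw/"     # hypothetical: raw data that should stay off GitHub
)

# Write the file, then read it back to check the contents
writeLines(ignore_patterns, file.path(project_root, ".gitignore"))
print(readLines(file.path(project_root, ".gitignore")))
```

Inside a real project, usethis::use_git_ignore() can add patterns to an existing .gitignore instead of writing the file by hand. Files that were committed before being added to .gitignore remain in the repository history, so it pays to set this up before the first commit of sensitive material.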

Why This Matters

Starting projects correctly from the beginning saves time and prevents problems later. Proper .gitignore configuration protects sensitive data and keeps repositories clean.


Key Takeaways

Best Practices You’ve Learned

  1. Always use R Projects for self-contained, portable analysis
  2. Use relative paths with {here} instead of setwd()
  3. Commit early and often with descriptive messages
  4. Organize files systematically with clear folder structures
  5. Never commit sensitive data; list it in .gitignore so Git does not track it
  6. Start with a clean environment each session

Common Mistakes to Avoid

  • Using absolute file paths that only work on your computer
  • Saving and restoring the R workspace (.RData) between sessions
  • Forgetting to stage files before committing
  • Not writing descriptive commit messages
  • Storing sensitive data in version control