By Nicholas Potter
- A list of best practices
- rOpenSci’s rrrpkg suggestions
- An official article on using Git with RStudio
- A better Git and RStudio article
Why R projects?
setwd() is evil (source).
- here: an R package to manage the working directory and file locations
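As a rough sketch of what this looks like in practice, the snippet below builds file paths with here instead of calling setwd(). It assumes the here package is installed, and the folder and file names are made up for illustration.

```r
# Assumes the 'here' package is installed: install.packages("here")
library(here)

# Hypothetical file locations, used only for illustration.
# here() builds each path starting from the project root.
raw_data_path <- here("data", "raw", "water_rights.csv")
figure_path <- here("figures", "fig1.jpg")

print(raw_data_path)
print(figure_path)
```

Because the paths start from the project root (typically the folder that holds the .Rproj file) rather than from a hard-coded location, the same script should keep working when the project folder is moved or shared.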
A scenario: A new project on water resources
You are wrapping up a project before you meet with your advisor, and you need to update figure 1 in your paper. You open the project directory to see a list of files like this:
|File|Last modified|
|---|---|
|create_fig1.R|June 1, 2017|
|create_fig1_v2.R|August 23, 2018|
|20181022_create_fig1_v4.R|November 2, 2018|
|fig1.jpg|August 23, 2018|
|fig1_v3.jpg|October 31, 2018|
What are the pitfalls here?
- fig1.jpg looks like it was created by create_fig1.R, but the modified date suggests it was actually created by create_fig1_v2.R. Did you forget to change the filename when you created the second script?
- There’s no version 2 or 4 of fig1.jpg. Did you move those or delete them by accident? On purpose?
- 20181022_create_fig1_v4.R was modified later than the date indicated in the filename, and fig1_v3.jpg was modified just a few days before. Did you create a v4 of the figure and misplace it? Or did you forget to change the name again? And where is the version 3 script?
You pore through your scripts and finally manage to find the code that recreates the latest figure. Making the changes quickly, you rename the new script to create_fig1_v5_FINAL.R, but forget to change the name of the image, so it outputs fig1_v3.jpg again, overwriting your previous image. But you’re out of time, so you add it to your paper and email it right before your meeting, promising yourself you’ll learn about this version control thing you keep hearing about…
At your meeting, you discuss another project, this one about water resources, and your advisor suggests you look into the Colorado water rights database and a paper by Leonard and Libecap (2016). You want an organized project this time, so you decide to research Git and R projects a little. And the rest, as they say, is history…
What are RStudio projects?
- An RStudio project is a directory that includes all the files for a given context
- When you open a project, RStudio does several things: (1) reads .RData, .Rprofile, and .Rhistory; (2) changes the working directory to the project home
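One way to see the second point for yourself is to run something like the following in the console right after opening a project. This is only a sketch in base R: it checks whether the current working directory contains a .Rproj file, which is what you would expect inside an RStudio project.

```r
# Where is R currently working? Inside an RStudio project this should be the project home.
project_home <- getwd()

# RStudio projects are marked by a file ending in .Rproj in the project home.
rproj_files <- list.files(project_home, pattern = "\\.Rproj$")

if (length(rproj_files) > 0) {
  message("Working directory is a project home: ", project_home)
} else {
  message("No .Rproj file found in: ", project_home)
}
```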
What is Git?
Git is really about taking snapshots of your project and being able to go back to any of those snapshots at any time. I can’t explain Git nearly as well as The Git Parable.
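To make the snapshot idea a little more concrete, here is a minimal sketch that drives the git command line from R with system2(). It assumes git is installed and on your PATH. The git() helper, the file contents, and the user name and email are placeholders invented for this example. Everything happens in a temporary directory, so no real project is touched.

```r
# A throwaway repository in a fresh temporary directory.
repo <- tempfile("water-project-")
dir.create(repo)

# Hypothetical helper: run one git command inside the repository and capture its output.
git <- function(...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE)
}

git("init")
# Identity for this demo repository only (placeholder values).
git("config", "user.name", shQuote("Example User"))
git("config", "user.email", "user@example.com")

# Snapshot 1: the first version of the script.
writeLines("plot(1:10)", file.path(repo, "create_fig1.R"))
git("add", "create_fig1.R")
git("commit", "-m", shQuote("Add first version of figure 1 script"))

# Snapshot 2: same file name, new content. No _v2 suffix needed.
writeLines("plot(1:10, col = 'blue')", file.path(repo, "create_fig1.R"))
git("add", "create_fig1.R")
git("commit", "-m", shQuote("Color the points in figure 1"))

# The history lists both snapshots, newest first.
history <- git("log", "--oneline")
print(history)

# Look at the script as it was in the first snapshot, without changing the current file.
first_version <- git("show", "HEAD~1:create_fig1.R")
print(first_version)
```

In day-to-day work you would more likely use a terminal or RStudio for these steps. The point is only that one file name can carry its whole history.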
How do RStudio projects and Git work together?
Within a project, RStudio integrates Git into a dedicated tab that lets you do everyday Git work without leaving RStudio.
- Don’t try to do everything right, right now. Focus on one change to your workflow per month
- Don’t modify raw data
- Use version control to avoid file multiplication and naming hell
- Have a standard project folder organization
- Commit early and often
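As one possible illustration of the folder organization and raw data points, the sketch below sets up a project layout in base R. The folder names, file names, and toy data are assumptions made up for this example, not a prescribed standard. It runs in a temporary directory so it is safe to try anywhere.

```r
# Build the layout in a temporary directory so the example is safe to run anywhere.
project_dir <- file.path(tempdir(), "water-resources")

# Hypothetical folder names; adapt them to your own conventions.
folders <- c("data/raw", "data/processed", "scripts", "figures")
for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# Raw data is written once and then only ever read (toy data for illustration).
raw_file <- file.path(project_dir, "data/raw", "flows.csv")
write.csv(data.frame(year = c(2017, 2018), flow = c(10, NA)), raw_file, row.names = FALSE)

# Cleaning happens on a copy in memory, and the result goes to data/processed.
flows <- read.csv(raw_file)
clean_flows <- flows[!is.na(flows$flow), ]
clean_file <- file.path(project_dir, "data/processed", "flows_clean.csv")
write.csv(clean_flows, clean_file, row.names = FALSE)

# List the folders that now exist under the project directory.
print(list.dirs(project_dir, full.names = FALSE))
```

Keeping the raw file untouched and writing cleaned output to a separate folder means you can always rerun the cleaning step from the original data.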