
Reproducibility in R

The statistical programming language R can make large tasks more manageable and semi-automated, and it lets you create reusable code for repeated tasks. R can be used to process data or to build statistical and machine learning models that help predict outcomes and measure the impact of certain actions on your business goals. One great thing about R is the ability to create reproducible code, so that others can replicate the analyses you run without headaches. Here are a few tips for doing this.

Always Include Your Data with Your R Code

Whether you send the full file structure and set up your working directory accordingly, or store your data on the web and access it with one of the many R packages available (for example, keeping it on Google Drive and reading it with the R package googlesheets), your data should be readily available to whoever is going to run your R code. This is a no-brainer: without the data, your coworkers cannot run your code or confirm that it works.

Setting up your working directory consistently, or using an R package to connect to an external data source, means that no one has to edit the “setwd” statements in your code; it will run on anyone’s computer. This is a highly desirable trait of reproducible code.
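As a rough illustration of the idea, here is a minimal sketch that builds every path from a single project root instead of relying on a hard-coded working directory. The folder name "my_analysis", the file "sales.csv", and the sample numbers are all made up for this example, and a temporary directory stands in for the project folder you would actually share.

```r
# Sketch: build file paths from one project root so the script needs no
# machine-specific working directory. The temporary directory below stands in
# for the shared project folder (an assumption for illustration only).
project_root <- file.path(tempdir(), "my_analysis")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Keep a small data file inside the project so code and data travel together.
sales <- data.frame(month = c("Jan", "Feb", "Mar"), revenue = c(120, 135, 150))
data_path <- file.path(project_root, "data", "sales.csv")
write.csv(sales, data_path, row.names = FALSE)

# Anyone who receives the folder reads the data through the same layout.
sales_in <- read.csv(data_path, stringsAsFactors = FALSE)
mean_revenue <- mean(sales_in$revenue)
print(mean_revenue)
```

Because only the project root changes from one machine to the next, the rest of the script can stay untouched when a coworker runs it.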

Comment and Document Your Code

When you share your R code, it is a really good idea to comment it. Comments let anyone reading the code know how it works and why you wrote it the way you did.

I like to provide documentation on how to run any program or process. This ensures that when I pass off a program or process, whoever is taking it on is well equipped with everything they need to do the task: both the program and the instructions.
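To make this concrete, here is one possible way a commented script might look. The function summarise_revenue() and the sample data are hypothetical and exist only for illustration; the point is that the comments explain both how to run the code and why it is written this way.

```r
# Purpose: summarise revenue so results can be compared from run to run.
# How to run: source this file, then call summarise_revenue() on a data frame
# that has a numeric `revenue` column.

# summarise_revenue() is a hypothetical helper used only for illustration.
summarise_revenue <- function(df) {
  # Fail early with a clear error so the next person knows what went wrong.
  stopifnot(is.data.frame(df), "revenue" %in% names(df), is.numeric(df$revenue))

  # na.rm = TRUE because one missing value should not turn the summary into NA.
  c(
    total = sum(df$revenue, na.rm = TRUE),
    mean  = mean(df$revenue, na.rm = TRUE)
  )
}

example <- data.frame(revenue = c(100, 200, NA))
result <- summarise_revenue(example)
print(result)
```

The header comment doubles as short run instructions, while the inline comments record the reasoning behind choices a reader might otherwise question.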

Using knitr/Rmarkdown to Document Your Process

You can level up your documentation by using knitr or Rmarkdown to create a notebook interface. This brings your documentation and process together with chunks of your R code in an HTML file that is both easy to use and fully reproducible.

Putting all your code and your “how to” documentation in one place makes your work easier for others to reproduce.
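As a small sketch of what this can look like, the R code below writes a minimal R Markdown file that mixes instructions with a code chunk, then renders it to HTML if the rmarkdown package and pandoc happen to be installed. The file name, title, and revenue numbers are assumptions made up for this example.

```r
# Sketch: write a minimal R Markdown file that combines instructions and code.
# The file name and contents are illustrative assumptions.
rmd_lines <- c(
  "---",
  "title: \"Monthly revenue summary\"",
  "output: html_document",
  "---",
  "",
  "## How to run",
  "",
  "Open this file in RStudio and click Knit, or call rmarkdown::render() on it.",
  "",
  "```{r revenue-summary}",
  "revenue <- c(120, 135, 150)",
  "mean(revenue)",
  "```"
)

rmd_path <- file.path(tempdir(), "revenue_summary.Rmd")
writeLines(rmd_lines, rmd_path)

# Render to HTML only when rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(html_path)
}
```

In day-to-day use you would normally edit the .Rmd file directly rather than generate it from a script; it is written out with writeLines() here only so the example stays self-contained.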

The main idea with reproducible code is to create something that anyone running your code in the future will be able to run with no errors and get the same results you did. This ensures that everyone knows what the ‘true’ results are, as your code gives the same result on everyone’s computer. It also makes sharing your R code easier. By setting up your R code and data properly, documenting your work, and using Rmarkdown or knitr, you can achieve reproducible code that anyone at your company using R will be able to run with ease.

Reproducibility matters regardless of what tool or programming language you are using, so keeping these same principles in mind and applying them to the tool at hand can make everyone’s life easier.

Danika Law

Danika is a Consultant with Cardinal Path's Data Science team. She has expertise in R and SAS, and has a strong passion for statistical modelling. She holds a Bachelor's degree in Mathematics and Statistics from the University of Victoria, where she learned time series analysis, multivariate analysis, and other data analysis techniques. In her free time, Danika enjoys playing board games, hiking, and making music in Vancouver, BC.

Published by
Danika Law
