Data Science in 2016: Cardinal Path’s Favorite R Packages

The members of our data science team here at Cardinal Path are frequent users of R and tend to favor it over any other programming language out there today. Since R was designed for statistical analysis, it has a lot of really useful features, such as a vast network of package creators and maintainers and robust integration with Tableau. In this post, the Cardinal Path data science team outlines their own favorite R packages of 2016.

Danika

googlesheets

The R package googlesheets is an easy-to-use way to access and manage Google Sheets through R. I like being able to connect directly to data stored in a Google Sheet because when I pass my R program to someone else, the data source (a URL) stays the same. Also, since Google Sheets can be shared with anyone, results that my programs write to a Google Sheet update automatically, so everyone sees the most up-to-date version without my having to email new files around. I love that it works with the pipe operator (%>%), so a lot of the operations are intuitive if you are familiar with dplyr.

knitr (or rmarkdown)

This is used to create R Markdown documents. An R Markdown document keeps your comments, your R code, and your R code output in one place, and it can be rendered to HTML, PDF, MS Word, and more. It looks great and provides a straightforward way to document your work. In addition, you can start using it with minimal changes to your current workflow. Rather than commenting with #, use #' and you can turn your script into an R Markdown document with one line of code: spin("FILENAME.R"). I’m a sucker for documentation, especially where it requires minimal effort.
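As a rough illustration of the #' convention described above, here is a minimal sketch. The script contents are made up for the example, and it passes the lines through the text argument with knit = FALSE so that spin only returns the generated R Markdown source rather than rendering a report from a file.

```r
library(knitr)

# A hypothetical script: lines starting with #' become prose,
# everything else becomes an R code chunk.
script <- c(
  "#' # Monthly summary",
  "#'",
  "#' Average of a small made-up vector.",
  "x <- c(2, 4, 6)",
  "mean(x)"
)

# knit = FALSE only converts the script to R Markdown source;
# supplying text = returns that source instead of writing a file.
rmd <- spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```

With a real file on disk, spin("FILENAME.R") as mentioned above does the conversion and the knitting in one step.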

Bonus points: In addition to loving data science, I’m also an avid knitter, and knitr has fun function names like spin, knit, hook_plot_html, kable, and stitch.

Charlotte

lubridate

One of my least favorite tasks on any platform is manipulating time and date information. You can end up with a variety of formats depending on where your data has been exported from, and the conversions needed to make the format consistent are never pleasant. The lubridate package describes itself as having “a consistent and memorable syntax that makes working with dates easy and fun”. This might be a slight overstatement, but I have found it a great way to work with time and date information. This is another package that is part of the Hadleyverse.
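To give a flavor of what that looks like in practice, here is a small sketch. The dates are invented for the example; the idea is that the parser name (ymd, mdy, dmy) simply spells out the order of the components in the incoming text.

```r
library(lubridate)

# The same made-up date as three different exports might write it.
iso      <- ymd("2016-03-01")   # year-month-day
american <- mdy("03/01/2016")   # month/day/year
european <- dmy("01.03.2016")   # day.month.year

# All three parse to the same Date value.
same_day <- iso == american && american == european

# Date arithmetic and component extraction follow the same readable style.
due   <- iso + days(30)
parts <- c(year = year(iso), month = month(iso), day = day(iso))
```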

sqldf

One of my greatest revelations when I started using R was the fact that you could not only connect to a SQL database, but also run SQL statements inside of R. Sometimes the way you think about data maps better onto SQL syntax than onto R syntax, and this package makes it possible to use both flexibly to match your own requirements.
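As a quick sketch of the idea, the query below runs plain SQL against mtcars, a data frame that ships with R. The query itself is just an illustration; with its default settings sqldf evaluates it in a temporary SQLite database and hands back an ordinary data frame.

```r
library(sqldf)

# mtcars ships with R; sqldf lets the query refer to the data frame by name.
by_cyl <- sqldf("
  SELECT cyl, COUNT(*) AS n_cars, AVG(mpg) AS avg_mpg
  FROM mtcars
  GROUP BY cyl
  ORDER BY cyl
")
print(by_cyl)
```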

Jas

readxl

It can be tempting to carry out many of your data manipulation tasks in Excel since it is familiar, user friendly, and usually pretty quick. However, you can improve the process by bringing the Excel data into R and using a package such as dplyr to carry out that manipulation. This process will also allow your colleagues to clearly view all of the changes you have made to the data and, of equal importance, make it fully reproducible on their machines. In addition, with some minor tweaks, you should be able to reuse the same script for Excel workbooks that have a similar format (no one likes unnecessary, manual rework!).

There are several R packages that let you bring Excel data in, but my favorite is readxl (also part of the Hadleyverse). You can read in an Excel workbook (both xlsx and xls files are supported), which consists of a list of sheets. These sheets can be cleaned and transformed into data frames (the tabular data structure within R). Of course, it always helps if your Excel data is as tidy as possible before you read it in!
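Here is a minimal sketch of that workflow. It assumes a reasonably recent readxl and uses the sample workbook bundled with the package (located with readxl_example) as a stand-in for your own file; the sheet name "mtcars" belongs to that sample workbook.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with
# the package; swap in the path to your own .xlsx or .xls file.
path <- readxl_example("datasets.xlsx")

# List the sheet names, then read one sheet into a data frame.
sheets       <- excel_sheets(path)
mtcars_sheet <- read_excel(path, sheet = "mtcars")

# Or read every sheet into a named list of data frames.
all_sheets <- lapply(
  setNames(sheets, sheets),
  function(s) read_excel(path, sheet = s)
)
```

From here, each data frame can go straight into dplyr for the cleaning steps you would otherwise do by hand in Excel.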

Jas Sohi

Jas is a Jr. Analyst, Data Science, responsible for helping to develop analytics strategies for Cardinal Path’s clients working in a wide variety of domains. His work includes ad-hoc analysis of raw data, high level model development, research into emerging analytics trends, presentation development, as well as technology research. Prior to joining Cardinal Path, Jas helped establish the new inventory department for a leading e-commerce platform, BuildDirect.com, considered the “Amazon of building supplies”. Jas earned his Bachelor of Business Administration in MIS, Finance, and Entrepreneurship from Simon Fraser University, is a Certified Supply Chain Professional (CSCP), and has recently completed the Johns Hopkins Data Science Specialization.

Published by
Jas Sohi
