Programming and community groups, along with industry heavy hitters, host many great conferences on data science. Since it’s data science month at Cardinal Path, we sat down with our data science team to find out about some of their favorite conference talks of 2016.
jailbreakr: Get out of Excel, free – useR! International R User Conference
Those who know me know that I hate Excel, though I understand it has a purpose. Data is frequently stored in Excel sheets in formats that include formulas, images, multiple headers, hyperlinks, and perhaps even random values way off in the 1,000,000th column. I loved this talk, “jailbreakr: Get out of Excel, free”, because it is about getting your data out of Excel and into R in a clean way. I’m curious to see what kind of ugly data it can handle. Jenny Bryan, who gave the talk, also worked to bring us the R package googlesheets, so she is helping make spreadsheets easier to work with in R in more ways than one!
broom: Converting statistical models to tidy data frames – useR! International R User Conference
Clean data is necessary for most modelling and data visualization use cases, and most of the time we want the data to be ‘tidy’. Here, tidy means 1 row per observation, 1 column per variable, and 1 table per observational unit. However, the output of R models, such as regression with lm(), is pretty messy and not tidy. This talk is about making model output tidy so that it can be used in later analysis and data visualizations. David Robinson developed the R package broom to do exactly that.
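To make the idea concrete, here is a minimal sketch of tidying a regression with broom. It assumes the broom package is installed; the model formula and the built-in mtcars dataset are our own illustrative choices and are not taken from the talk.

```r
# Minimal sketch: tidy the output of lm() with broom.
# Assumption: broom is installed; mtcars ships with base R.
library(broom)

# An illustrative regression (formula chosen only for this example).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# tidy() returns a data frame with one row per model term
# and one column per statistic (estimate, std.error, ...).
coefs <- tidy(fit)

# glance() returns a one-row data frame of model-level statistics.
model_stats <- glance(fit)

# augment() returns one row per observation, adding columns
# such as .fitted and .resid to the modelling data.
per_obs <- augment(fit)

print(coefs)
```

Because each result is an ordinary data frame, it can be passed straight to plotting or further data manipulation without digging through the nested object that summary(fit) returns.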
Machine Learning & Art – Google I/O
This presentation by the Google Cultural Institute comes from this year’s Google I/O developer conference. It shows what fun can be had with machine learning from an artistic angle.
Sunspring – Ars Technica
A great follow-up to the Google video on machine learning and art is this short sci-fi film published by Ars Technica. It’s not a conference talk, but it’s directly related to the theme of machine learning and art: this is what happens when you let artificial intelligence write a movie. Gizmodo did a great write-up on their site of how the neural network behind it was developed.
Size of Datasets for Analytics and Implications for R
A quote that stood out to me from this presentation was “It takes a big man to admit his data is small” (@jcheng). Big data is a buzzword that doesn’t seem to be going away anytime soon, yet the majority of data analysts and scientists deal with datasets of at most several gigabytes. Interestingly, RAM capacity has grown faster year over year than the average size of datasets. Szilard Pafka discusses why it might be wise to stick with R for even relatively large datasets and why using immature big data tools may in fact be counter-productive.
FiveThirtyEight’s data journalism workflow with R
Have you ever visited fivethirtyeight.com (founded by Nate Silver of 2012 US election prediction fame) and noticed that the charts look a lot like ggplot2 charts from R? Well, that’s because they are! The only step outside R is that they hand the ggplots over to their visual journalism team, which makes the charts “sexier” using Illustrator*. In fact, R is used in every other step of the journalism pipeline; Andrew Flowers, Quantitative Editor at 538, elaborates on why this is the case.
*Andrew doesn’t actually go into the details on what is done in Illustrator, but this video shows how you can export ggplots into Illustrator and make some quick, visually appealing changes.
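We don’t reproduce the video’s exact steps here, but as a rough sketch, one common way to get a ggplot into a vector editor is to save it as a PDF. This assumes ggplot2 is installed; the chart, labels, and file name below are our own illustrative choices, not FiveThirtyEight’s workflow.

```r
# Hypothetical sketch: save a ggplot as a vector PDF for later editing.
# Assumption: ggplot2 is installed; chart and file name are illustrative.
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# A temporary file keeps the example tidy; use a real path in practice.
out_file <- file.path(tempdir(), "chart.pdf")

# ggsave() picks the graphics device from the file extension (here, PDF).
ggsave(out_file, plot = p, width = 6, height = 4)
```

A PDF keeps the points, axes, and text as vector objects, so they should remain individually editable once the file is opened in a vector graphics program.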
If you are interested in learning more about all things data science, don’t forget to join us on November 3rd at 10:00 am PT / 1:00 pm ET for our webinar hosted by the American Marketing Association: “Applying Data Science Methods for Marketing Lift”. You can register here.