In our latest webinar, “Applying Data Science Methods for Marketing Lift”, hosted by the American Marketing Association, our Cardinal Path data science experts shared their deep experience gained from implementing sophisticated data science techniques for some of the world’s leading brands.
Judging by the great questions posed by our audience, we found that some were looking for answers to very tactical questions, but many were simply looking for ideas about what kinds of data science methods are out there. It is apparent that many are realizing the need to take advantage of the latest advances in data science and machine learning for marketing optimization, but aren’t quite sure where to start.
Our experts, Charlotte Bourne, Manager, Digital Intelligence, and Danika Law, Staff Consultant, took us through the breadth of options available and the extensive range of applications to which they can be applied. Below, you will find some of the great questions asked during the webinar, as well as those questions that we didn’t have time to answer in the live session (there were a lot!).
Q: Which courses are needed to train a data science team?
A: Danika: Going back to the beginning of the presentation, you’ll recall there was a data science Venn diagram that goes through some of the areas a data scientist may be versed in. The key areas are “hacking” skills, mathematics and statistics knowledge, and domain expertise.
For “hacking” skills, we are talking about knowing how to write code to manipulate data, do operations, and use algorithms. These skills could be obtained through courses in R, Python, BigQuery, SQL, or whatever tools are being used to extract data from databases and get it in the right format to use.
Mathematics and statistics knowledge means using models, like linear regression or random forest models, to extract insights from your data. You need to know which statistical or machine learning model to apply in which situation. Courses in statistics, machine learning, model building, regression, and more could get you this knowledge.
Finally, there is domain expertise. This is where we add ‘substance’ to the insights. How do we translate the outputs of our model into usable recommendations? Doing this requires knowledge about the area to which we are applying the model. This is something that is difficult to learn from courses and usually requires getting substantive experience in a field through exposure.
Q: What is the highest impact way of using DS in an e-commerce operation to drive sales?
A: Charlotte: This is a really good question and the answer will be different depending on the business itself. There’s no one-size-fits-all answer here. The things that I would be thinking about to answer this question would be:
- Level of analytics maturity
- Are you B2B or B2C?
- What’s the current media spend?
- What kind of data do you have access to (CRM vs. web analytics)?
Essentially, you can think about it like this: What is going to best move the needle for you? Is there something that is obviously driving the most profit that you can continue to optimize? Is there something driving a good deal of loss? Where do you think there are the most opportunities?
If I had to completely generalize and assume you are a large e-commerce site, I would expect you have a large media spend. For e-commerce sites, typically we see that a focus on marketing channel and media mix optimization is a good opportunity, just because of the amount of investment already going on there.
Q: What are some of the best practices, tech and tools to be used?
A: Danika: Much like the answer to all of these questions, it depends on the problem you are trying to answer. A general best practice would be to start with the business requirements: clearly define the question your model is going to answer. Define where the results are going to need to go, that is, what tools will it be integrated with? Then check if any of the tools listed have these capabilities. If they do, you can get started on your analysis.
R can do just about every statistical technique you could possibly want to use. Python is perhaps better at integrating into data pipelines and outputting directly to web apps.
Big Data tools like Hadoop and Amazon S3 are only necessary if you are using BIG data or compute-intensive algorithms and methods. For example, with user-level clickstream data, certain models might become unwieldy to run on your local computer. But for any level of aggregated data, for example sales by day for even the past 10 years, your local computer should be able to handle it. It’s when you are working with very large amounts of data that you would want to start assessing tools like Hadoop and Amazon S3 so you can work with this data.
Questions from the audience that were not addressed during the live session:
Q: Can you explain how you come up with the models?
A: Danika: There are different types of statistical models: regression, logistic regression, and more. There are also different types of machine learning models: random forest, neural networks, and more. Each model type has unique traits. Some are better for classification (example: converted or not), some are better for predicting continuous values (example: how much revenue).
An understanding of what each type of model does helps us ‘come up’ with the ones to consider, and this comes from taking courses or reading books on statistics and machine learning models. Then we fit the models and determine which one fits the data best.
Q: How did you make the model comparison?
A: Charlotte: This is a really good question. Usually what you will see in data science is that very different methods have very different error metrics, which makes comparing models quite difficult.
What you have to do is find a measure of accuracy (or error) that isn’t model specific. You would also use a different ‘generic’ error metric depending on the type of problem you’re tackling: is it a regression problem, a classification problem, a retrieval problem, or something else?
Because this question was asked in the context of forecasting, we’re talking about a regression problem. In this case, you’re probably looking at one of a variety of mean error metrics: mean absolute error, root mean squared error, perhaps using a weighting or logarithmic transform. All of these mean error metrics ask the same question: what is the difference between the real values and the predicted values? That’s a way of framing “error” so that models can be compared.
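As a rough illustration of this kind of model-agnostic comparison, here is a minimal Python sketch of mean absolute error and root mean squared error. The sales figures and the two sets of forecasts are made-up numbers, not results from the webinar, and the function names are our own.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Like MAE, but squaring first makes large misses count for more."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical holdout period: real weekly sales and two models' forecasts.
actual = [100.0, 120.0, 90.0, 110.0]
forecasts = {
    "model_a": [105.0, 115.0, 95.0, 105.0],  # always off by a little
    "model_b": [100.0, 120.0, 90.0, 130.0],  # usually exact, one big miss
}

scores = {
    name: (mean_absolute_error(actual, predicted),
           root_mean_squared_error(actual, predicted))
    for name, predicted in forecasts.items()
}

for name, (mae, rmse) in scores.items():
    print(f"{name}: MAE={mae:.1f} RMSE={rmse:.1f}")
```

In this made-up case both models have the same mean absolute error, but the root mean squared error is higher for the model with one large miss. Which metric to rely on depends on how costly large errors are for the business.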
Q: What is the third approach meaning here [referring to a slide from the webinar]: “Cubist?”
A: Danika: A Cubist model is a tree-based machine learning model which weights different decision trees and averages them to provide a final prediction value. One of the advantages of this model is that it can handle data with missing values. One of the canonical papers on Cubist models comes from Max Kuhn (who we always recommend!) and can be found here.
Q: What is the general rule of thumb for which approach is best for a problem?
A: Charlotte: The best model for a problem depends largely on what the data set looks like and what type of problem you’re trying to solve. If I had to make a very broad generalization, I’d say ensemble methods tend to work well for numeric data while neural nets work well on non-numeric / unstructured data.
Other factors to consider would be things like: do you have a large or small dataset? If you have a large dataset but not a whole lot of computing power, you’re going to need to choose a ‘simpler’ model that doesn’t require a lot of computational resources. Conversely, some models handle data challenges like small or noisy data sets better than others. A small data set is probably going to be prone to overfitting (see next paragraph) and be more sensitive to outliers, so you probably want to go with a simpler modelling approach (fewer degrees of freedom).
As you dig deeper into your models, make sure that you understand the bias-variance problem. This essentially asks: does your model overfit or underfit your data? Some types of models are better at avoiding overfitting, while others are better at avoiding underfitting. An excellent deeper dive on bias vs. variance comes from Scott Fortmann-Roe and can be found here.
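One common way to check for overfitting is to compare a model’s error on the data it was trained on with its error on held-out data. The Python sketch below is only an illustration under our own assumptions: the data is made up (a trend of y = 2x with artificial noise added to the training points), and a nearest-neighbour “memoriser” stands in for an overly flexible model. Neither was part of the webinar.

```python
def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def fit_line(xs, ys):
    """Simple model: least-squares straight line through the training data."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_nearest_neighbour(xs, ys):
    """Flexible model: memorises the training data and returns the y of the
    closest training x, so it reproduces the noise as well as the trend."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical data: the trend is y = 2x, with noise of +3 / -3 on training points.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [2 * x + (3 if x % 2 == 0 else -3) for x in train_x]
holdout_x = [0.4, 2.4, 4.4, 6.4]
holdout_y = [2 * x for x in holdout_x]  # noise-free, to keep the sketch simple

errors = {}
for name, fit in (("line", fit_line), ("nearest neighbour", fit_nearest_neighbour)):
    model = fit(train_x, train_y)
    errors[name] = (
        mean_abs_error(train_y, [model(x) for x in train_x]),
        mean_abs_error(holdout_y, [model(x) for x in holdout_x]),
    )
    print(f"{name}: train error={errors[name][0]:.2f}, holdout error={errors[name][1]:.2f}")
```

The memorising model has zero error on the training data but a larger error than the straight line on the holdout points, which is the typical signature of overfitting.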
Q: “Best performance”: define good, better and best. Is it accuracy, i.e. std error of estimates, cost function, marketer usability…?
A: Danika: Best performance is a combination of “is it accurate enough”, and “is it interpretable enough” for a business. There may have to be a tradeoff between these two, so which one to prioritize depends on the cost of having errors in the predictions, and how the forecast is going to be used.
A business might prioritize accuracy if forecast errors are costly: for example, if it predicted $1,000,000 in revenue and hired on that basis, but the revenue actually achieved was $850,000.
A business might prioritize interpretability instead if they want to know how much an increase of $1,000 in TV spend is likely to impact revenue. In this case, the model provides the most value when you can interpret it. The decision of what makes a good model is something to be discussed when defining the problem and the business requirements.
A model is ‘best’ if it meets both the accuracy and interpretability goals of the business well enough to put it to use.
Q: In the area of demographics can you sort by geographic location/region and forecast sales potential?
A: Danika: I think you are asking if you can forecast the sales for a particular geographic location or region. As long as the data is available (sales by geographic location, and any relevant influencers of sales), it is possible to create a forecast like this.
Q: Do you or your clients try to simulate Competitor behaviour…and then, impact on client performance?
A: Charlotte: Competitive activity is a very valid input into a forecasting model to understand how it will impact your own (or a client’s) performance. The challenge is to find a metric which can accurately capture competitor activity. You may not have access to the competitive information that would drive the best results in your model. Potential inputs to try and capture competitor activity could be stock price, competitor sales, website visits (captured from different online competitive intelligence tools), information on pricing, promotions, or distribution. Anything is fair game; you’ll have to play with the different data inputs to see which of the competitive intelligence metrics you have access to are in fact predictive.
CUSTOMER LIFETIME VALUE
Q: How would you calculate Customer Value for a company that has a lot of different vertical brands, but has a customer footprint across some or all verticals? The individual businesses have their own CLV measure, but how to show value of all customers spread across groups?
A: Charlotte: This is a great – and tough! – question. My first question back would be about understanding how the different brands engage customers across brands. Is there the opportunity for cross-brand marketing? Are you doing upsells or cross-sells across brands? Are there co-promotions across the brands? In that case, a cross-brand LTV analysis would be very interesting. But this is a key point: if you can’t drive optimization across brands, then understanding cross-brand LTV is not going to be truly actionable or transformative.
If we’re talking about brands that operate within their own silos – perhaps with very differing technologies in place – that’s likely to be your second challenge. How can you merge customer records together so that all transactional information is in one place? Data quality issues or the lack of an effective common key could cause issues here.
Let’s pretend you can’t join your brand data at an individual customer level. You still have options here: instead of doing LTV for an individual customer, you could do LTV of a customer segment (say, women age 18 – 24) or a first touch marketing channel or a zip code.
Q: Hi, I am not mathematically inclined. Yet, a talented business development strategist who knows how to execute customer success programs for brands. Presenting Data to clients or employers is unavoidable and I am addressing my weak points with data and math through training. Are there CLV / customer analytics math and stat formulas that you suggest I study? Where can I find these formulas? Book suggestions?
A: Danika: Here is a book suggestion: Database Marketing: Analyzing and Managing Customers by Blattberg, Kim, and Neslin has a chapter on customer lifetime value analysis. It starts with a more formula based approach and looks at a statistical technique from survival analysis to model customer retention.
As you start out with formula based customer lifetime value analysis, you will get insights on the ‘average’ customer or on predefined groups (example: split by gender/age). However, as you move into model based customer lifetime value calculations, such as using regression and logistic regression or clustering, you can be more exploratory with your analysis and may find groupings that were not predefined (example: gender, purchase time). So studying regression techniques and clustering can also help take the analysis to the next level.
Q: Is there an ideal approach for creating the formula for this (LTV) calculation? Ie. retention rate for each year and average revenue for a given audience/demographic?
A: Danika: I think this formula approach for LTV calculation refers to calculating one value for LTV: the average revenue per customer per year, times the margin, divided by the yearly churn rate. The customer lifetime value we are talking about here looks more at the individual level: for example, what is Bob’s predicted lifetime value? For this, we don’t use a formula-based approach but a combination of statistical and machine learning models, such as regression, logistic regression, Pareto, and so on.
For an example of moving from formula-based customer lifetime value calculations to model-based ones (using ‘survival analysis’), check out my book suggestion from above: Database Marketing: Analyzing and Managing Customers by Blattberg, Kim, and Neslin, which has a chapter on customer lifetime value analysis.
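To make the single-value formula from this answer concrete, here is a minimal Python sketch. The function name and the input numbers are hypothetical, and this is only the simple ‘average customer’ version, not the individual-level modelling described above.

```python
def simple_ltv(avg_revenue_per_year, margin, yearly_churn_rate):
    """Average yearly revenue per customer, times margin, divided by yearly churn.

    Dividing by the churn rate is the same as multiplying by the expected
    customer lifetime in years (1 / churn rate).
    """
    if not 0 < yearly_churn_rate <= 1:
        raise ValueError("yearly_churn_rate must be greater than 0 and at most 1")
    return avg_revenue_per_year * margin / yearly_churn_rate

# Hypothetical inputs: $200 revenue per customer per year, 25% margin,
# and 25% of customers lost each year (an expected lifetime of 4 years).
ltv = simple_ltv(avg_revenue_per_year=200.0, margin=0.25, yearly_churn_rate=0.25)
print(f"Estimated lifetime value per customer: ${ltv:.2f}")
```

With these made-up inputs, each customer contributes $50 of margin per year over an expected four years. Running the same function per segment (say, per age group) gives the ‘predefined groups’ view of LTV.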
Q: What’s the minimum budget to get started in data science for example, identifying DRIVERS of consumer LifeTime Value?
A: Charlotte: The very bare minimum to get started in data science for a lifetime value problem could look like this:
- Assuming you already have an existing CRM which is collecting customer data
- Assuming you are using an open source (free) platform to run data science exploration, munging and analysis
That leaves you with only one cost: staffing. You need to find someone, ideally with a mix of statistics, computer programming and domain knowledge, to execute the problem.
Q: Do the presenters have experience with the predictive analytics tool RapidMiner?
A: Charlotte: We do not use RapidMiner at Cardinal Path. However, a recent Forrester vendor evaluation for Big Data Predictive Analytics Solutions placed the product as a “strong performer”. The platform is considered to be very strong in the number of methods available to data scientists as well as offering strong integration with cloud platforms.
Q: Any recommendations on combining data from several sources?
A: Danika: When combining data from several sources, you need to be able to group it by some common ‘key field’. In the case of joining demographic, purchase, and behavioural data for customer lifetime value modelling, this would be a “User ID”. For forecasting data, if we are forecasting “Weekly Sales”, we would want all the data at the week level so it can be grouped by week.
The tool you use to combine the data from several sources will depend on how often it needs to be done. If the data is combined often, some sort of automation would be good. If it is done infrequently or semi-regularly to update a model, the tools someone on your team already has experience with, such as R, Python, SQL, or BigQuery, should do the trick.
As always, if you have any other questions or would like help with your data science practice, contact us at firstname.lastname@example.org.
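As a small illustration of the ‘key field’ idea, here is a Python sketch that joins two sources on a user ID and rolls daily sales up to the week level. All of the records, field names, and dates are made up for the example, and in practice a tool like SQL or a data frame library would usually do this work.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sources that share a "user ID" key field.
demographics = {"u1": {"age_group": "18-24"}, "u2": {"age_group": "25-34"}}
purchases = [("u1", 40.0), ("u2", 15.0), ("u1", 10.0), ("u3", 99.0)]

# Aggregate purchases to one row per user, then join on the key.
total_spend = defaultdict(float)
for user_id, amount in purchases:
    total_spend[user_id] += amount

customers = {
    user_id: {**attributes, "total_spend": total_spend[user_id]}
    for user_id, attributes in demographics.items()
    if user_id in total_spend  # keep only users present in both sources
}

# For forecasting, bring daily data to the week level so sources line up by week.
daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 3), 50.0),
    (date(2024, 1, 8), 70.0),
]
weekly_sales = defaultdict(float)
for day, amount in daily_sales:
    iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
    weekly_sales[(iso[0], iso[1])] += amount

print(customers)
print(dict(weekly_sales))
```

Note that user "u3" is dropped because there is no demographic record for that ID. Whether to keep or drop unmatched records is a choice to make deliberately for each analysis.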