Forecasting is everywhere. For years, people have been forecasting weather patterns, economic and political events, sports outcomes, and more. Because we try to predict so many different events, there are a wide variety of ways in which forecasts can be developed. Simple intuition, expert opinion, and the use of past results with traditional statistical and time series techniques are just a few.

Forecasting accuracy is constantly being improved with the continual introduction of newer data science and machine learning techniques. In this post, we will look at machine learning techniques for forecasting and for time series data in particular.

Time Series Forecasting

Businesses use forecasting extensively to make predictions about demand, capacity, budgets and revenue. One type of forecasting that routinely comes up in all of these scenarios is time series forecasting. Time series data is any data set that collects information regularly over a period of time. There are specific techniques for picking apart this type of data, and time series modelling offers a range of options spanning different types of techniques. They include:

• Linear vs. non-linear,
• Parametric vs. non-parametric,
• And univariate vs. multivariate techniques.

Why is Time Series Forecasting Important?

Time series forecasting brings with it a unique set of concerns and challenges. Modelling is driven by understanding what is driving the changes in the data. With time series data, these changes can stem from long-term trends, seasonal effects, or irregular fluctuations. It is the regular patterns of trend and seasonality which are specific to time series forecasting and aren’t always seen in other types of data. These patterns have to be addressed in order to develop a solid forecast for data over time.

Example

Here is an example which shows how trends and seasonality factor into time series data. This data set is from Google, showing the branded search interest for one of our clients over the past few years.

1. The first row, “data”, shows the original data exported from Google.
2. The second row, “seasonal”, shows the seasonal variation in the data. In this case, there’s a spike in demand that happens at the same time every year.
3. The third row, “trend”, shows the trend line of the data set once the seasonal component has been removed.
4. The final row shows what can’t be explained by either the seasonal or trend components. This could be a result of any number of factors or just random noise.

In this example, we’re seeing a steady decrease in branded search interest over time. But if those factors can be identified and added to the forecasting prediction model, it will provide greater accuracy – particularly if you start looking at machine learning techniques.
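To make the decomposition described above concrete, here is a toy sketch of classical additive decomposition (data = trend + seasonal + residual) in plain Python. In practice you would reach for a library routine such as statsmodels’ seasonal_decompose; the function name and the synthetic numbers here are illustrative only, not the client data from the chart.

```python
# Minimal additive decomposition sketch: data = trend + seasonal + residual.
# Illustrative only; real projects would use a library implementation.

def decompose(series, period):
    n = len(series)
    half = period // 2
    # Trend: a simple centered moving average spanning one seasonal cycle.
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(window) / len(window)
    # Seasonal: average detrended value at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    seasonal = [sum(b) / len(b) if b else 0.0 for b in buckets]
    # Residual: whatever the trend and seasonal parts do not explain.
    resid = [series[i] - trend[i] - seasonal[i % period]
             if trend[i] is not None else None
             for i in range(n)]
    return trend, seasonal, resid
```

Feeding this a series with a spike every fourth observation, for example, recovers a seasonal component that peaks at that position, mirroring the “seasonal” row in the chart above.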

What is Machine Learning?

Machine learning is a branch of computer science where algorithms learn from data. Algorithms can include artificial neural networks, deep learning, association rules, decision trees, reinforcement learning and Bayesian networks. The variety of different algorithms provides a range of options for solving problems, and each algorithm will have different requirements and tradeoffs in terms of data input requirements, speed of performance, and accuracy of results. These tradeoffs – along with the accuracy of the final predictions – will be weighed as you decide which algorithm will work best for you.

Thanks to new technologies, the machine learning we see today is not the machine learning of the past. While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, and at faster speeds – is fairly recent. If you are unfamiliar with machine learning, here are a few highly publicized applications which may help you to conceptualize it:

• Self-driving cars
• Fraud and spam detection
• Personalized online ads and offers such as those you would be presented with on Amazon

Machine learning borrows from the field of statistics, but gives new approaches for modelling problems. The fundamental problem for machine learning and time series is the same: to predict new outcomes based on previously known results. In machine learning terms, this is called supervised learning – the modeller is teaching the algorithm how to perform by giving it examples of what good performance looks like.
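In code, that supervised framing is just a sliding window over the series: each target value is paired with the observations that preceded it. A minimal sketch (the function name and the window length of three are arbitrary illustrative choices):

```python
def make_supervised(series, n_lags):
    """Turn a series into (features, target) pairs for supervised learning.

    Each row of X holds the previous n_lags observations; y holds the
    value the model should learn to predict from them.
    """
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])
        y.append(series[i])
    return X, y

# Example: predict each value from the three values before it.
X, y = make_supervised([10, 20, 30, 40, 50], 3)
# X → [[10, 20, 30], [20, 30, 40]]
# y → [40, 50]
```

Once the series is in this shape, any standard supervised learner – a decision tree, a neural network – can be trained on it like an ordinary tabular data set.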

Time Series or Machine Learning?

Can machine learning beat traditional time series techniques? Yes, it can. There is a range of studies comparing machine learning techniques to more classical statistical techniques for time series data. Neural networks are one technique that has been researched quite extensively, and they have often been shown to beat time series approaches. Machine learning techniques also appear in time series-based data mining and data science competitions, where they have performed well, beating pure time series approaches in competitions such as M3 and those hosted on Kaggle.

Machine learning comes with its own specific set of concerns. Feature engineering, or the creation of new predictors from the data set, is an important step for machine learning and can have a huge impact on performance. This engineering can be a necessary way to address the trend and seasonality issues of time series data. In addition, some models encounter issues with how well they fit the data: they can overfit the available data and underperform on new data, or they can underfit and miss the underlying trend.
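As a sketch of what such feature engineering can look like, the toy function below builds lag features (which carry the recent level and trend) plus a position-in-cycle feature (which lets the model pick up seasonality, e.g. month of year for a period of 12). The function name and feature choices are illustrative; real projects often use richer calendar features and library tooling such as pandas’ shift.

```python
def engineer_features(series, n_lags, period):
    """Build lag and seasonal-position features for each time step.

    Each row holds the previous n_lags values plus where the step falls
    in the seasonal cycle; targets hold the value to be predicted.
    """
    rows, targets = [], []
    for i in range(n_lags, len(series)):
        lags = series[i - n_lags:i]
        phase = i % period  # position within the seasonal cycle
        rows.append(lags + [phase])
        targets.append(series[i])
    return rows, targets
```

To guard against the overfitting problem mentioned above, models built on features like these are usually evaluated with a time-ordered split: train on the earlier part of the series, test on the later part, never the reverse.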

Time series and machine learning approaches do not need to exist in isolation from each other. They can be combined to give you the benefits of each approach. Time series analysis does a good job of decomposing data into trend and seasonal elements. This analysis can then be used as an input into a machine learning model, which can incorporate the trend and seasonal information into its algorithm, giving you the best of both worlds.
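A toy illustration of that hybrid idea: estimate a seasonal component the time series way, then fit the deseasonalized remainder with a simple model. Here a least-squares trend line stands in for a fuller machine learning model, and all names and numbers are illustrative.

```python
def hybrid_forecast(series, period, horizon):
    """Sketch of a hybrid: a seasonal decomposition feeds a simple model.

    Seasonal offsets come from per-phase averages (the time series part);
    the deseasonalized remainder is fit with a least-squares line
    (standing in for a machine learning model).
    """
    n = len(series)
    # Seasonal: average value at each cycle position, centered to sum to zero.
    sums, counts = [0.0] * period, [0] * period
    for i, v in enumerate(series):
        sums[i % period] += v
        counts[i % period] += 1
    seasonal = [sums[p] / counts[p] for p in range(period)]
    mean_seasonal = sum(seasonal) / period
    seasonal = [s - mean_seasonal for s in seasonal]
    # Deseasonalize, then fit a straight line by least squares.
    deseason = [series[i] - seasonal[i % period] for i in range(n)]
    xbar = (n - 1) / 2
    ybar = sum(deseason) / n
    slope = (sum((i - xbar) * (deseason[i] - ybar) for i in range(n))
             / sum((i - xbar) ** 2 for i in range(n)))
    intercept = ybar - slope * xbar
    # Forecast: extrapolated trend plus the matching seasonal offset.
    return [intercept + slope * t + seasonal[t % period]
            for t in range(n, n + horizon)]
```

Swapping the straight line for a learned model trained on lag and seasonal features is one common way to get the “best of both worlds” described above.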