Recently, I had the pleasure of listening to a short presentation on time series prediction by Dr. Jeffrey Yau. He is also a professor at UC Berkeley's Master of Information and Data Science program. If you haven't noticed, I graduated from this program in January 2018.
The reunion took place at the Big Data & Machine Learning Leaders Summit Hong Kong 2018, a big data event I attended as a guest speaker. Professor Yau gave a memorable presentation on time series prediction using two different approaches: a statistical model and a neural network model. The topic was too broad for the allotted time of only 30 minutes. What follows is my recap of his presentation, plus additional information to fill in the gaps (in my opinion).
You can find the full notebook at this GitHub repository.
Statistical method: VAR model
There are many different methods for time series prediction. For a complete list, refer to Classical time series methods (+ cheat sheet).
VAR (Vector Auto Regression) is one of the basic models typically used to estimate and evaluate the relationships between multiple variables over time.
1. Visualize dataset
The two time series I will consider are the alcohol price index and the consumer price index. The data come from FRED, the Federal Reserve Bank of St. Louis. To visualize the dataset, I use the following function, presented by Professor Yau.
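The original helper isn't reproduced here, but a minimal sketch of that kind of plotting function might look like the following (it assumes the data sit in a pandas DataFrame with a DatetimeIndex and one column per series):

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_series(df, title=""):
    """Plot each column of a time-indexed DataFrame on its own subplot."""
    fig, axes = plt.subplots(
        len(df.columns), 1,
        figsize=(10, 2.5 * len(df.columns)),
        sharex=True, squeeze=False,
    )
    for ax, col in zip(axes.ravel(), df.columns):
        ax.plot(df.index, df[col])
        ax.set_title(col)
    fig.suptitle(title)
    fig.tight_layout()
    return fig
```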
From the resulting plots, we can observe the following:
Both series have an upward trend.
2. Determine the transformation to make it stationary
There is one important requirement for using a VAR model: stationarity. Stationarity means that the mean, the variance structure, and the correlation structure do not change over time. It does not mean that the values themselves are constant; rather, the mean of the series stays the same and the correlation depends only on the time lag. To get there, we apply a log transformation and difference the series.
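The log-difference transformation is a one-liner in pandas. The data below are made-up numbers standing in for the two price indices; the column names are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the FRED data: two price indices, monthly.
df = pd.DataFrame(
    {"alcohol_cpi": [100.0, 102.0, 105.0, 104.0, 108.0],
     "cpi": [100.0, 101.0, 101.5, 102.2, 103.0]},
    index=pd.date_range("2018-01-01", periods=5, freq="MS"),
)

# Log stabilizes the variance; differencing removes the trend.
# The first row becomes NaN (no previous value), so drop it.
df_stationary = np.log(df).diff().dropna()
```

Each value is now the period-over-period log return rather than the raw index level.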
This is the result after the transformation. It looks like most of the correlation has been removed, though some remains.
3. Train the model
The VAR model is quick and easy to train. After training, the RMSE is 140.352. Let's see if I can beat this with a deep learning model.
Deep learning method: LSTM model
Unlike a feed-forward network, an RNN has a way to store past information in 'memory.' RNNs take input both from the current example and from what they perceived previously in time, preserving past inputs through the hidden state. The problem with a basic RNN is that it has a hard time learning long-range dependencies due to the so-called vanishing gradient problem: gradient contributions from 'far away' time steps shrink to zero at an exponential rate (because of the repeated multiplication by the weights). The LSTM model is a good alternative because it has a gated cell, which controls the flow of information.
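To make the "gated cell" concrete, here is one LSTM time step written out in NumPy (a didactic sketch of the standard equations, not what a framework runs internally):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. The gates (values in (0, 1)) decide what to
    forget, what to write, and what to expose from the cell state."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b   # all four pre-activations stacked together
    i = sigmoid(z[:n])           # input gate: how much of the update to write
    f = sigmoid(z[n:2*n])        # forget gate: how much of the old cell to keep
    o = sigmoid(z[2*n:3*n])      # output gate: how much of the cell to expose
    g = np.tanh(z[3*n:])         # candidate update
    c = f * c_prev + i * g       # additive cell update eases gradient flow
    h = o * np.tanh(c)           # hidden state passed to the next time step
    return h, c
```

The additive update of `c` is the key: gradients can flow back through it without being repeatedly multiplied by a weight matrix, which is what mitigates the vanishing gradient problem.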
1. Formulate the series for a RNN supervised learning regression problem
We want to transform the series so it can fit the RNN structure.
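Concretely, that means turning each window of past observations into an input sample and the next value into its target. A minimal sketch for a single series (the window length here is arbitrary):

```python
import numpy as np

def to_supervised(series, n_lags):
    """Split a 1-D series into (samples, n_lags) inputs and next-step targets."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # the n_lags values before time t
        y.append(series[t])             # the value to predict at time t
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)
X, y = to_supervised(series, n_lags=3)
# X[0] is [0, 1, 2] and y[0] is 3, and so on down the series.
```

For an LSTM, `X` then gets reshaped to `(samples, timesteps, features)`.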
When using VAR above, we had to satisfy the stationarity requirement in order to use the model. With an LSTM, no such requirement exists. Of course, a stationary series is easier for the model to learn, but it is not a requirement.
Instead, we might want to consider scaling the series: you don't want the units of the variables to be too different. A common choice is to normalize each series to zero mean and unit standard deviation.
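In plain NumPy, that standardization looks like this (scikit-learn's `StandardScaler` does the same thing and also remembers the statistics for you):

```python
import numpy as np

def standardize(x):
    """Scale a series to zero mean and unit standard deviation.
    Keep mu and sigma so predictions can be mapped back to original units."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

x = np.array([10.0, 12.0, 14.0, 16.0])
z, mu, sigma = standardize(x)
```

Remember to invert the scaling (`z * sigma + mu`) on the model's predictions before computing RMSE, so the comparison against the VAR result is in the same units.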
2. Fit and train the model