What Makes ARIMA & XTS Objects So Useful for Forecasting
If you’re not using XTS objects for your forecasting in R, you’re likely missing out! The major benefit we’ll explore throughout is that these objects are much easier to work with when it comes to modeling, forecasting, & visualization.
What Are They?
XTS objects are composed of two components: a date index and a traditional data matrix.
Whether you want to predict churn, sales, demand, or whatever else, let’s get to it!
The first thing you’ll need to do is create your date index. We do so using the seq function. Very simply, this function takes your start date, the number of records you have (length), and the time interval (by). For our dataset, the call looks like the following.
days <- seq(as.Date("2014-01-01"), length = 668, by = "day")
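Before moving on, it can be worth a quick sanity check that the index covers the dates you expect. The snippet below is a minimal sketch in base R; it spells the argument out as length.out, which is the full name that the shorter length in the call above is matched to.

```r
# Rebuild the date index from the post and confirm its size and span.
days <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")

length(days)  # number of dates in the index
range(days)   # first and last date in the index
```

With 668 daily records starting on 2014-01-01, the last date in the index is 2015-10-30.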
Now that we have our index, we can use it to create our XTS object. For this we will use the xts function.
Don’t forget to run install.packages('xts') and then load the library!
Once we’ve done this, we’ll make our xts call, passing along our data matrix and then passing our date index to the order.by argument.
sales_xts <- xts(sales, order.by = days)
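The post doesn’t show the sales data itself, so here is a rough, self-contained sketch that uses made-up numbers in its place. The simulated values and the "units" column name are assumptions for illustration only; the point is to show the two components of the resulting object.

```r
library(xts)

# Hypothetical stand-in for the post's `sales` data: one column of made-up daily values.
set.seed(42)
days  <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")
sales <- matrix(round(runif(668, min = 50, max = 150)), ncol = 1,
                dimnames = list(NULL, "units"))

sales_xts <- xts(sales, order.by = days)

head(index(sales_xts))     # component 1: the date index
head(coredata(sales_xts))  # component 2: the underlying data matrix
sales_xts["2014-03"]       # date-string subsetting: every row from March 2014
```

The last line hints at why these objects are pleasant to work with: you can slice by date with a plain string instead of building a logical filter by hand.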
Let’s Create a Forecast with ARIMA
ARIMA stands for autoregressive integrated moving average, a very popular technique for time series forecasting. We could spend hours talking about ARIMA alone, but for this post we’re going to give a high-level explanation and then jump directly into the application.
AR: Auto Regressive
This is where we predict outcomes using lags, meaning values from previous periods. It may be that the outcome for a given period has some dependency on previous values.
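To make the idea of lags concrete, here is a small sketch in base R. It simulates a series where each value depends on the one before it, then checks that the dependency shows up in the data. The coefficient of 0.7 and the series length are arbitrary choices for illustration, not values from the sales data.

```r
# Simulate an AR(1) series: each value is 0.7 times the previous value plus noise.
# The 0.7 coefficient and n = 500 are arbitrary choices for illustration.
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 500)

# Correlation between the series and itself shifted back by one period.
lag1_cor <- cor(y[-1], y[-length(y)])
lag1_cor

# Fitting an AR(1) model should recover a coefficient close to 0.7.
fit <- arima(y, order = c(1, 0, 0))
coef(fit)["ar1"]
```

A strong lag-1 correlation is the kind of signal that the autoregressive part of the model is built to pick up.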
I: Integrated
When it comes to time series forecasting, an implicit assumption is that our model depends on time in some capacity. This seems pretty obvious, as we probably wouldn’t make our model time based otherwise ;). With that assumption out of the way, we need to understand where on the spectrum of dependence time falls in relation to our model. Yes, our model depends on time, but how much? Core to this is the idea of stationarity, which means that the statistical properties of the series, such as its mean and variance, stay the same over time.
Going deeper, the historical average of a stationary dataset tends to be a good predictor of future outcomes, but there are certainly times when that’s not true. Can you think of any situations when the historical mean would not be the best predictor?
- How about predicting sales for December? Seasonal Trends
- How about sales for a hyper-growth SaaS company? Consistent upward trends
This is where the process of Differencing is introduced! Differencing is used to eliminate the effects of trends & seasonality.
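Here is a tiny sketch of differencing using base R’s diff function. Both series are made up for illustration: the first has a steady upward trend, the second repeats the same pattern every four periods.

```r
# A made-up series with a steady upward trend of 5 units per period.
trend <- c(100, 105, 110, 115, 120, 125)

# First difference: each value minus the one before it. The trend disappears.
diff(trend)
# 5 5 5 5 5

# A made-up series whose pattern repeats every 4 periods (a hypothetical quarterly cycle).
seasonal <- c(10, 20, 30, 40, 10, 20, 30, 40)

# Seasonal difference: each value minus the value one full cycle earlier.
diff(seasonal, lag = 4)
# 0 0 0 0
```

In both cases the differenced series is flat, which is exactly the stable behavior the rest of the model wants to work with.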
MA: Moving Average
The moving average component exists to deal with the error of your model: it predicts the current value using the forecast errors from previous periods.
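As a rough illustration, the sketch below simulates a series driven by a moving average process in base R and then fits a model with only an MA term. The coefficient of 0.6 and the series length are arbitrary choices for illustration.

```r
# Simulate an MA(1) series: each value is new noise plus 0.6 times the previous noise term.
# The 0.6 coefficient and n = 500 are arbitrary choices for illustration.
set.seed(2)
e <- arima.sim(model = list(ma = 0.6), n = 500)

# Fit a model with one moving average term and no AR or differencing terms.
fit_ma <- arima(e, order = c(0, 0, 1))
coef(fit_ma)["ma1"]  # should land close to 0.6
```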
Let’s Get Modeling!
First things first, let’s break out our data into a training dataset and then what we’ll call our validation dataset.
What makes this different from other validation approaches, like cross-validation, is that here we split by time: train covers everything up to a given point in time, and validation covers everything thereafter.
train <- sales_xts[index(sales_xts) <= "2015-07-01"]
validation <- sales_xts[index(sales_xts) > "2015-07-01"]
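It’s worth checking how many rows end up on each side of the cutoff. The sketch below uses made-up values in place of the real sales data (an assumption, since the post doesn’t show it) but keeps the same date index, so the row counts match.

```r
library(xts)

# Hypothetical stand-in for `sales_xts`, using the same date index as the post.
set.seed(42)
days      <- seq(as.Date("2014-01-01"), length.out = 668, by = "day")
sales_xts <- xts(runif(668, min = 50, max = 150), order.by = days)

# Same split as above, with the cutoff written as an explicit Date.
cutoff     <- as.Date("2015-07-01")
train      <- sales_xts[index(sales_xts) <= cutoff]
validation <- sales_xts[index(sales_xts) >  cutoff]

nrow(train)               # observations up to and including the cutoff
nrow(validation)          # observations after the cutoff
range(index(validation))  # first and last validation date
```

With this index, train holds 547 days and validation holds 121 days, running from 2015-07-02 to 2015-10-30. That 121 is the forecast horizon we’ll need later.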
Time to Build a Model
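The auto.arima function, from the forecast package, searches for the best-fitting ARIMA model for our data. Be sure to install and load the forecast package before calling it.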
model <- auto.arima(train)
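Once the model is fit, you’ll probably want to see what auto.arima actually picked. Here is a minimal sketch, assuming the forecast package is installed. It fits to a simulated series rather than the real training data, so the selected order here says nothing about the sales dataset.

```r
library(forecast)

# Hypothetical training series: a simulated AR(1) process standing in for `train`.
set.seed(123)
train_sim <- arima.sim(model = list(ar = 0.7), n = 547)

model <- auto.arima(train_sim)

arimaorder(model)  # the (p, d, q) order that was selected
summary(model)     # coefficients, information criteria, and training-set error measures
```

The three numbers in the order map back to the sections above: p is the number of autoregressive lags, d is the number of times the series was differenced, and q is the number of moving average terms.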
Now let’s generate a forecast. The same way we did before, we’ll create a date index and then create an xts object, this time from the forecast values.
From here you will plot the validation data and then throw the forecast on top of the plot.
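# Forecast one value for each of the 121 days in validation
forecast <- forecast(model, h = 121)
# Forecast dates start the day after the training data ends (2015-07-01)
forecast_dates <- seq(as.Date("2015-07-02"), length = 121, by = "day")
forecast_xts <- xts(forecast$mean, order.by = forecast_dates)
plot(validation, main = 'Forecast Comparison')
lines(forecast_xts, col = "blue")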
I hope this was a helpful introduction to ARIMA forecasting. Be sure to let me know what’s helpful and any additional detail you’d like to learn about.
I’ll be adding a more detailed post on the topic of ARIMA forecasting where we will detail evaluation techniques, confidence levels, and more.
Happy Data Science-ing!