Forecasting is a common technique that many companies use to predict future values. There are multiple approaches, such as time series forecasting and multivariate forecasting. Within these approaches there are techniques such as Moving Average (MA), ARMA, ARIMA, ARCH, and GARCH.

Packages and algorithms for forecasting are available in most data mining tools, such as SAS and SPSS. These packages help in accomplishing small scale forecasting; here 'small scale' means one or a few forecasts. One can fit models, check autocorrelations, decide on a model based on accuracy metrics (such as MAPE and RMSE), and then go ahead with forecasting future values. However, some of the questions one needs to contemplate at this stage are:

1. What does large scale forecasting mean?
2. Why is it complicated or cumbersome?
3. What do we do about large scale forecasts?
4. Most importantly, how does one scale up one's code or algorithm to fit such situations?
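As a reference point for the single-series workflow described before these questions, here is a minimal sketch of comparing two candidate models on RMSE and MAPE. It is written in Python purely for illustration (the code accompanying this article is in R). The holdout values and the two sets of model forecasts are made-up numbers, and `rmse` and `mape` are hypothetical helper names, not functions from any package.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(ratios) / len(ratios)

# Made-up holdout data and forecasts from two hypothetical candidate models.
actual = [100, 200, 400]
candidates = {
    "model_a": [110, 190, 400],
    "model_b": [100, 150, 440],
}

# Score every candidate, then keep the one with the lowest MAPE.
scores = {
    name: {"rmse": rmse(actual, f), "mape": mape(actual, f)}
    for name, f in candidates.items()
}
best = min(scores, key=lambda name: scores[name]["mape"])
print(best, scores[best])
```

Doing this by hand is manageable for one series; the questions above are about what happens when the same comparison has to be repeated for hundreds of series.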

In the next few paragraphs I discuss examples of batch forecasting. The GitHub link for the code is given at the end. Please go through the code to see how I scaled it up to solve the batch forecasting problem using R. I have given detailed comments in the code to make it comprehensible.

Have you ever Googled "DMV office near me" on your mobile? Google presents a chart of the traffic at the DMV on an hourly basis, which can help you plan your visit. The values shown may be forecasts, or maybe just the past distribution. The same information helps the DMV estimate the staff required each hour based on demand.

Let's say the DMV has 2 broad types of services, Licenses and Registrations, and wants to forecast the demand on an hourly basis for the next 7 days. The number of forecasts multiplies as follows: 7 days x 2 services x 24 hours = 336 forecasts.
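The multiplication above can be checked by enumerating every (service, day, hour) combination that needs a forecast. This is only an illustrative Python sketch; the variable names are my own, and the day and hour numbering is an arbitrary choice.

```python
from itertools import product

services = ["Licenses", "Registrations"]
days = range(1, 8)   # the next 7 days, numbered 1..7
hours = range(24)    # 24 hourly slots per day, numbered 0..23

# One entry per forecast that has to be produced.
forecast_grid = list(product(services, days, hours))

print(len(forecast_grid))  # 336
print(forecast_grid[0])    # ('Licenses', 1, 0)
```

Each additional dimension (for example, several DMV offices) multiplies the length of this grid again.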

Let's discuss another instance. Suppose a small retailer wants to forecast the sales of the company; s/he can use any of the above-mentioned techniques. However, if one wants to forecast at a granular level, such as product-wise or region-wise, and the retailer carries a large number of products, imagine the number of forecasts one has to make. Even for a small retailer selling 100 SKUs (Stock Keeping Unit, a fancy term for an individual product in the retail industry) and forecasting daily sales for the next week, the number of forecasts is: 100 x 7 = 700 forecasts.
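To make the retail case concrete, the sketch below splits a long table of sales records into one series per SKU, which is the first step before any per-product forecasting. It is an illustrative Python example, not the R code that accompanies this article; the records are invented and the names `rows` and `series_by_sku` are hypothetical.

```python
from collections import defaultdict

# Mock long-format sales records: (sku, day, units_sold). Values are invented.
rows = [
    (f"SKU{n:03d}", day, 10 + (n + day) % 5)
    for n in range(1, 101)      # 100 SKUs
    for day in range(1, 29)     # 28 days of history each
]

# Split the single table into one chronologically ordered series per SKU.
series_by_sku = defaultdict(list)
for sku, day, units in sorted(rows, key=lambda r: (r[0], r[1])):
    series_by_sku[sku].append(units)

horizon = 7  # forecast the next 7 days for every SKU
total_forecasts = len(series_by_sku) * horizon
print(len(series_by_sku), total_forecasts)  # 100 700
```

The number of series, not the length of each one, is what drives the forecast count.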

Now imagine the retailer has thousands of products and wants to forecast hourly or store-wise demand. In such cases the volume of forecasts explodes. The standard procedures available in SAS and SPSS may not be well suited to handling such situations, so separate suites for forecasting were developed, such as SAS Forecast Studio and SPSS forecasting. But what if you are an open source user?

In that case the forecast package in R can be very handy. In fact, forecasting a single time series takes only a few lines of code in R. My goal here is to explain how that gets multiplied in the case of batch forecasting. I have created a mock-up retail data set and shown how the code scales up. Please note that the objective is not to write about time series forecasting itself, but to understand how a bigger problem can be solved by breaking it into several smaller pieces.
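The idea of breaking the big problem into smaller pieces can be sketched as follows: write the forecast for one series as a function, then apply it to every series in a loop and collect the results. This is a Python illustration under my own assumptions, not the R code linked in this article. A naive moving-average forecaster stands in for whatever model one would actually fit, the data are made up, and `moving_average_forecast` and `batch_forecast` are hypothetical names.

```python
def moving_average_forecast(history, horizon, window=3):
    """Forecast `horizon` steps by repeatedly averaging the last `window` values."""
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        # Raises ZeroDivisionError when the history is empty.
        next_value = sum(values[-window:]) / min(window, len(values))
        forecasts.append(next_value)
        values.append(next_value)  # feed the forecast back in for the next step
    return forecasts

def batch_forecast(series_by_key, horizon, forecaster=moving_average_forecast):
    """Run the single-series forecaster on every series; record failures separately."""
    results, failures = {}, {}
    for key, history in series_by_key.items():
        try:
            results[key] = forecaster(history, horizon)
        except Exception as exc:  # one bad series should not stop the whole batch
            failures[key] = repr(exc)
    return results, failures

# Made-up daily sales for three SKUs; the third has no history at all.
mock_sales = {
    "SKU001": [12, 15, 14, 16, 15, 17],
    "SKU002": [3, 3, 4, 2, 3, 3],
    "SKU003": [],
}

results, failures = batch_forecast(mock_sales, horizon=7)
print(len(results), "series forecast;", len(failures), "failed")
```

Keeping the per-series function separate from the loop means the model can be swapped without touching the batching logic, and a failure in one series is logged instead of aborting the run.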

At each stage of the code I attempted one programming challenge. At the end of each stage I posed some questions, which I then solved in the next stage. This is to help you with structured thinking in programming and problem solving through code. Here is the link for the code. Let me know if you have any further suggestions. Thank you.

About Rang Technologies:
Headquartered in New Jersey, Rang Technologies has dedicated over a decade to delivering innovative solutions and the best talent to help businesses get the most out of the latest technologies in their digital transformation journey.