Retail Sales Forecast — Time Series — Basic to Advance

Dec 2, 2021

PROBLEM STATEMENT:

A leading nutrition and supplement retail chain offers a comprehensive range of products for wellness and fitness needs. It follows a multi-channel distribution strategy with 350+ retail stores spread across 100+ cities.

Effective forecasting of store sales gives essential insight into upcoming cash flow, so the retail company can plan cash flow more accurately at store level.

Sales data for 18 months from 365 stores is available, along with the Store Type, Location Type, and Region Code of each store, the discount offered by the store on each day, and the number of orders per day.

OBJECTIVE:

1) To predict the sales of each store for the next two months.

2) To build time series forecasting models based on past sales and several other categorical features.

1. Data Description

The dataset has about 8 variables that need exploration. Multiple features can be generated from them and used later for modelling.

The variable ‘Sales’ is the target (y) column that we need to predict.

2. Exploratory Data Analysis:

Before doing any kind of modelling, we always want to take a look at the data that we have.

We have been provided one file:

Train.csv: We will use this file for training our model. It contains the variables or features that we will input to our model, and the target variable that we want to predict.

Now let's go ahead and check the data we have.

Let's also check the dataset shape.
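A minimal sketch of this first look, assuming pandas. The column names and the three rows below are made up for illustration (the real Train.csv header may differ), and `load_sales` is a hypothetical helper, not something from the original notebook.

```python
import io
import pandas as pd

# Stand-in for Train.csv: a few made-up rows with ASSUMED column names.
SAMPLE_CSV = """Store_id,Store_Type,Location_Type,Region_Code,Date,Holiday,Discount,Orders,Sales
1,S1,L3,R1,2018-01-01,1,Yes,9,7011.84
2,S4,L2,R1,2018-01-01,1,Yes,60,51789.12
3,S3,L2,R2,2018-01-02,0,No,42,36868.20
"""

def load_sales(source):
    """Read the sales file and parse the Date column as datetime."""
    return pd.read_csv(source, parse_dates=["Date"])

# With the real file this would be load_sales("Train.csv").
train = load_sales(io.StringIO(SAMPLE_CSV))

print(train.head())   # first rows of the data
print(train.shape)    # (number of rows, number of columns)
train.info()          # dtypes and non-null counts per column
```

Parsing `Date` at load time keeps the later date-based features simple, since the `.dt` accessor only works on datetime columns.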

Plotting Numerical and Categorical Features:

Univariate Analysis

Graphs communicate patterns more clearly than raw numbers. Let's visualize the data distributions with histograms and countplots.

1. Numerical Features Histplot

2. Categorical Features Countplot

Countplots for categorical variables — Store Type, Location Type, Region Code

Countplots for categorical variables — Holiday , Discount
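The plots above could be reproduced roughly as follows. This is a sketch with plain matplotlib on made-up rows; the column names are assumptions, and the original post most likely used seaborn's `histplot` and `countplot` instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with ASSUMED column names, for illustration only.
df = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S1", "S2", "S1"],
    "Location_Type": ["L1", "L2", "L2", "L1", "L3", "L1"],
    "Region_Code": ["R1", "R1", "R2", "R3", "R4", "R1"],
    "Holiday": [0, 0, 1, 0, 0, 1],
    "Discount": ["Yes", "No", "No", "Yes", "No", "No"],
    "Orders": [9, 60, 42, 23, 31, 18],
    "Sales": [7011.8, 51789.1, 36868.2, 19715.2, 26000.0, 15000.5],
})

numeric_cols = ["Orders", "Sales"]
categorical_cols = ["Store_Type", "Location_Type", "Region_Code",
                    "Holiday", "Discount"]

# Histograms for the numerical features.
fig_num, axes_num = plt.subplots(1, len(numeric_cols), figsize=(10, 4))
for ax, col in zip(axes_num, numeric_cols):
    ax.hist(df[col], bins=10)
    ax.set_title(f"Distribution of {col}")

# Count plots: one bar per category, height = number of rows.
counts = {col: df[col].value_counts().sort_index() for col in categorical_cols}
fig_cat, axes_cat = plt.subplots(1, len(categorical_cols), figsize=(18, 4))
for ax, col in zip(axes_cat, categorical_cols):
    ax.bar(counts[col].index.astype(str), counts[col].values)
    ax.set_title(f"Count of {col}")
fig_cat.tight_layout()
```

Keeping the `value_counts()` results in `counts` also gives the exact numbers behind each bar, which is handy when two categories look almost equal in the chart.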

HYPOTHESIS GENERATION

Simply put, a hypothesis is a possible view or assertion an analyst holds about the problem being worked on. It may or may not be true.

Will Store_id play a major role in predicting sales for the next 2 months?
Will Holiday play a major role in the prediction?
Does the number of orders help in forecasting sales for the next 2 months?
Do Store Type, Location Type, and Region Code contribute to the target sales prediction?
Does the discount affect the sales prediction?

Bi-variate Analysis — Sales v/s other features

Let's perform bi-variate analysis of the target, Sales, against these variables one by one.

We can infer that:

Region Codes R1 and R3 have slightly more sales than R2 and R4
Location Types L2 and L1 have more sales than the other location types
Store Types S4 and S3 have more sales than S1 and S2
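Comparisons like these can be checked numerically with a group-by. The sketch below uses made-up rows and assumed column names; `mean_sales_by` is a hypothetical helper, and the mean is only one possible choice of statistic (the median or total would work the same way).

```python
import pandas as pd

# Made-up rows with ASSUMED column names, for illustration only.
df = pd.DataFrame({
    "Store_Type": ["S1", "S1", "S4", "S3"],
    "Location_Type": ["L3", "L1", "L2", "L2"],
    "Region_Code": ["R2", "R4", "R1", "R3"],
    "Sales": [100.0, 200.0, 500.0, 400.0],
})

def mean_sales_by(frame, column):
    """Average Sales per category of `column`, highest first."""
    return frame.groupby(column)["Sales"].mean().sort_values(ascending=False)

for col in ["Store_Type", "Location_Type", "Region_Code"]:
    print(mean_sales_by(df, col), end="\n\n")
```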

Now let's add more features from the date variable and see how sales vary with certain dates.

From the date variable we can derive many more features, such as day of week, month, and a weekend/weekday flag, which may help in forecasting sales.

Once we have added these new features, let's analyse them as well to see how strongly they relate to Sales.

Weekday, weekend, monthly, and daily sales plots
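One way to derive such features with the pandas `.dt` accessor is sketched below. The `Date` column name, the new feature names, and the `add_date_features` helper are assumptions for illustration.

```python
import pandas as pd

def add_date_features(frame, date_col="Date"):
    """Return a copy of `frame` with calendar features derived from `date_col`."""
    out = frame.copy()
    dates = pd.to_datetime(out[date_col])
    out["day_of_week"] = dates.dt.dayofweek            # Monday=0 ... Sunday=6
    out["day_of_month"] = dates.dt.day
    out["month"] = dates.dt.month
    out["year"] = dates.dt.year
    out["is_weekend"] = (dates.dt.dayofweek >= 5).astype(int)  # Sat/Sun -> 1
    return out

# Made-up rows: a Friday, a Saturday, and a Tuesday in December.
sample = pd.DataFrame({
    "Date": ["2018-01-05", "2018-01-06", "2018-12-25"],
    "Sales": [30000.0, 45000.0, 52000.0],
})
featured = add_date_features(sample)
print(featured)
```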

We can infer that:

More sales happen during weekends.
By month, sales are higher during the holiday season, for example in December.
Within a month, the first 5 days have more sales than the other days.
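The numbers behind plots like these are simple group-by averages over the date features. A small sketch on made-up rows, with assumed column names:

```python
import pandas as pd

# Made-up daily sales: Fri, Sat, Sun in January and a Monday in December.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-05", "2018-01-06",
                            "2018-01-07", "2018-12-03"]),
    "Sales": [100.0, 300.0, 500.0, 200.0],
})

# Calendar features (names are assumptions).
df["is_weekend"] = (df["Date"].dt.dayofweek >= 5).astype(int)
df["month"] = df["Date"].dt.month
df["day_of_month"] = df["Date"].dt.day

# Average sales per group; each Series can be plotted with .plot(kind="bar").
weekend_vs_weekday = df.groupby("is_weekend")["Sales"].mean()
monthly = df.groupby("month")["Sales"].mean()
by_day_of_month = df.groupby("day_of_month")["Sales"].mean()

print(weekend_vs_weekday, monthly, by_day_of_month, sep="\n\n")
```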

TARGET DISTRIBUTION

Let's plot the target, Sales, and look at its distribution.

We see that the distribution is right-skewed, so let's look at its percentiles to learn more.

From the percentiles above, the 90th percentile of Sales is 66282 and the 99th percentile is 102159, while the 100th percentile (the maximum) is about 247K, far above the rest of the distribution.

So we could treat anything beyond the 99th percentile as an outlier and keep only the data between the 1st and 99th percentiles, which should help the model.

Now let's plot the sales distribution of this trimmed data and see whether it helps.
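A percentile table like this can be computed with `Series.quantile`. The sketch below uses synthetic right-skewed data (a lognormal sample), so its numbers will not match the ones quoted above.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "Sales" values, NOT the article's data.
rng = np.random.default_rng(0)
sales = pd.Series(rng.lognormal(mean=10.5, sigma=0.5, size=10_000), name="Sales")

percentiles = [0.01, 0.25, 0.50, 0.75, 0.90, 0.99, 1.00]
summary = sales.quantile(percentiles)

print(summary.round(0))
print("skewness:", round(sales.skew(), 2))  # > 0 means a longer right tail
```

A large gap between the 99th and 100th percentiles, compared with the gap between the 90th and 99th, is the usual sign of a few extreme values in the right tail.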
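A minimal sketch of that trimming step, assuming a `Sales` column. `trim_to_percentiles` is a hypothetical helper, and the 1st/99th cut-offs are simply the ones discussed above.

```python
import pandas as pd

def trim_to_percentiles(frame, column="Sales", lower=0.01, upper=0.99):
    """Keep only rows whose `column` lies within [lower, upper] percentiles."""
    low_value, high_value = frame[column].quantile([lower, upper])
    mask = frame[column].between(low_value, high_value)  # inclusive bounds
    return frame.loc[mask].copy()

# Made-up data: Sales values 1, 2, ..., 1000.
df = pd.DataFrame({"Sales": range(1, 1001)})
trimmed = trim_to_percentiles(df)

print(len(df), "rows before,", len(trimmed), "rows after")
print(trimmed["Sales"].min(), trimmed["Sales"].max())
```

Dropping rows is one option; capping values at the percentile bounds with `Series.clip` is another that keeps every date in the series, which can matter for time series models that expect a continuous index.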