Laksh Nath


Forecasting Future Sales Using ARIMA and SARIMAX

Models Status

Model Overview

ARIMA and Seasonal ARIMA

Autoregressive Integrated Moving Averages

The general process for ARIMA models is the following:

Visualize the Time Series Data

Make the time series data stationary

Plot the Autocorrelation and Partial Autocorrelation Charts

Construct the ARIMA Model or Seasonal ARIMA based on the data

Use the model to make predictions

Let's go through these steps!

Let us import the required libraries.

import numpy as np

import pandas as pd



import matplotlib.pyplot as plt

%matplotlib inline

Let us now read the data and store it in df.

df=pd.read_csv('perrin-freres-monthly-champagne-.csv')

Now let us check the first five rows of the dataset by using the head function.

df.head()

In the next step, we are going to check the last five rows by using the tail function.

df.tail()

## Cleaning up the data

df.columns=["Month","Sales"]

df.head()

Let us now drop the last two rows, one at a time, by using the drop function, starting with row 106.

## Drop the last row (index 106)

df.drop(106,axis=0,inplace=True)

Now let us use the tail function to check the last five rows and confirm that the row was dropped.

df.tail()

Let's use the drop function again to remove the new last row (index 105).

df.drop(105,axis=0,inplace=True)

Let's check the dataset now.

df.tail()
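The two drop calls above depend on knowing the index labels 105 and 106. As a minimal sketch, assuming the unwanted trailing rows have no sales value, the same cleanup can be done without hard-coded labels. The small frame below is a made-up stand-in for the real file, and its values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw file: three data rows followed by two
# trailing rows without a sales value (values are made up for illustration).
raw = pd.DataFrame({
    "Month": ["2000-01", "2000-02", "2000-03", np.nan, "footer text"],
    "Sales": [100.0, 120.0, 90.0, np.nan, np.nan],
})

# Drop every row whose Sales value is missing, whatever its index label.
clean = raw.dropna(subset=["Sales"])

print(clean)
```

This variant keeps working if the file gains or loses rows, but it would also remove any genuine month with a missing sales figure, so it is worth checking what was dropped.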

# Convert Month into Datetime

df['Month']=pd.to_datetime(df['Month'])
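To see what the conversion does, here is a small sketch on a made-up series. It assumes the month strings look like "1964-01" (year and month only), which is an assumption about the file rather than something shown above. The explicit format argument is optional but makes the intent clear.

```python
import pandas as pd

# Hypothetical month strings in year-month form.
months = pd.Series(["1964-01", "1964-02", "1964-03"])

# Each string becomes a timestamp on the first day of that month.
parsed = pd.to_datetime(months, format="%Y-%m")

print(parsed)
print(parsed.dtype)
```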

Now check the dataset by using the head function.

df.head()

df.set_index('Month',inplace=True)

df.head()

df.describe()

Step 2: Visualize the Data

df.plot()

### Testing For Stationarity



from statsmodels.tsa.stattools import adfuller

test_result=adfuller(df['Sales'])

#H0: the series is non-stationary (it has a unit root)

#H1: the series is stationary



def adfuller_test(sales):

    result=adfuller(sales)

    labels = ['ADF Test Statistic','p-value','#Lags Used','Number of Observations Used']

    for value,label in zip(result,labels):

        print(label+' : '+str(value) )

    if result[1] <= 0.05:

        print("strong evidence against the null hypothesis(Ho), reject the null hypothesis. Data has no unit root and is stationary")

    else:

        print("weak evidence against the null hypothesis, fail to reject it. The time series may have a unit root, indicating it is non-stationary")

Differencing

df['Sales First Difference'] = df['Sales'] - df['Sales'].shift(1)

df['Sales'].shift(1)

df['Seasonal First Difference']=df['Sales']-df['Sales'].shift(12)

df.head(14)
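The subtraction of a shifted copy is what pandas' diff method does, so the two differencing columns above could also be written with diff. The sketch below checks this on a made-up monthly series. The series and the variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up monthly series, three years long.
sales = pd.Series(
    np.arange(1, 37, dtype=float) ** 2,
    index=pd.date_range("2000-01-01", periods=36, freq="MS"),
)

# Differencing written as in the text above.
first_diff = sales - sales.shift(1)
seasonal_diff = sales - sales.shift(12)

# The same thing with diff(); equals() treats matching NaNs as equal.
same_first = first_diff.equals(sales.diff())
same_seasonal = seasonal_diff.equals(sales.diff(12))

# A lag-12 difference leaves the first 12 rows undefined.
missing_seasonal = int(seasonal_diff.isna().sum())

print(same_first, same_seasonal, missing_seasonal)
```

The 12 missing values at the start of the seasonal difference are the reason dropna is needed before the series is passed to the Dickey-Fuller test.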

Now let us drop the NA values by using the dropna function

## Run the Dickey-Fuller test again

adfuller_test(df['Seasonal First Difference'].dropna())

df['Seasonal First Difference'].plot()

Auto Regressive Model

from pandas.plotting import autocorrelation_plot

autocorrelation_plot(df['Sales'])

plt.show()

Final Thoughts on Autocorrelation and Partial Autocorrelation

Identification of an AR model is often best done with the PACF.

For an AR model, the theoretical PACF “shuts off” past the order of the model. The phrase “shuts off” means that in theory the partial autocorrelations are equal to 0 beyond that point. Put another way, the number of non-zero partial autocorrelations gives the order of the AR model. By the “order of the model” we mean the most extreme lag of x that is used as a predictor.

Identification of an MA model is often best done with the ACF rather than the PACF.

For an MA model, the theoretical PACF does not shut off, but instead tapers toward 0 in some manner. A clearer pattern for an MA model is in the ACF. The ACF will have non-zero autocorrelations only at lags involved in the model.

The order of an ARIMA model is written (p, d, q), where p is the number of AR lags, d is the degree of differencing, and q is the number of MA lags.
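The "shuts off" behaviour can be checked numerically on simulated data. The following is a minimal sketch using only NumPy: it simulates an AR(1) and an MA(1) process with a coefficient of 0.7 (an arbitrary choice), computes sample autocorrelations with a small helper, and derives the lag-2 partial autocorrelation from the first two autocorrelations. The helper sample_acf and all variable names are hypothetical and do not come from any library.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a 1-D array."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(42)
n = 20000
eps = rng.normal(size=n)

# AR(1): x_t = 0.7 * x_{t-1} + eps_t
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + eps[t]

# MA(1): y_t = eps_t + 0.7 * eps_{t-1}
ma1 = eps.copy()
ma1[1:] += 0.7 * eps[:-1]

r_ar = sample_acf(ar1, 3)
r_ma = sample_acf(ma1, 3)

# Lag-2 partial autocorrelation from the first two autocorrelations.
pacf2_ar = (r_ar[2] - r_ar[1] ** 2) / (1 - r_ar[1] ** 2)

print("AR(1) ACF lags 1-3:", r_ar[1:])
print("AR(1) PACF lag 2  :", pacf2_ar)
print("MA(1) ACF lags 1-3:", r_ma[1:])
```

In this simulation the AR(1) autocorrelations decay gradually while its lag-2 partial autocorrelation is close to 0, and the MA(1) autocorrelations are close to 0 beyond lag 1, which matches the identification rules described above.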

from statsmodels.graphics.tsaplots import plot_acf,plot_pacf

fig = plt.figure(figsize=(12,8))

ax1 = fig.add_subplot(211)

fig = plot_acf(df['Seasonal First Difference'].iloc[13:],lags=40,ax=ax1)

ax2 = fig.add_subplot(212)

fig = plot_pacf(df['Seasonal First Difference'].iloc[13:],lags=40,ax=ax2)

# For non-seasonal data

#p=1, d=1, q=0 or 1

from statsmodels.tsa.arima.model import ARIMA

model=ARIMA(df['Sales'],order=(1,1,1))

model_fit=model.fit()

model_fit.summary()

df['forecast']=model_fit.predict(start=90,end=103,dynamic=True)

df[['Sales','forecast']].plot(figsize=(12,8))

What is SARIMA?

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.

It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

Now let us import statsmodels.api, which gives access to the SARIMAX model.

import statsmodels.api as sm

model=sm.tsa.statespace.SARIMAX(df['Sales'],order=(1, 1, 1),seasonal_order=(1,1,1,12))

results=model.fit()

df['forecast']=results.predict(start=90,end=103,dynamic=True)

df[['Sales','forecast']].plot(figsize=(12,8))

from pandas.tseries.offsets import DateOffset

future_dates=[df.index[-1]+DateOffset(months=x) for x in range(0,24)]

future_dates_df=pd.DataFrame(index=future_dates[1:],columns=df.columns)

future_dates_df.tail()

future_df=pd.concat([df,future_dates_df])
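The list of future months can also be produced with pd.date_range. The sketch below compares both approaches for a hypothetical last observed month; the date and the variable names are assumptions chosen for illustration, not values taken from the dataset.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Hypothetical last observed month.
last_observed = pd.Timestamp("2000-12-01")

# As in the text: month offsets 1..23 added to the last observed month.
from_offsets = [last_observed + DateOffset(months=x) for x in range(1, 24)]

# Alternative: a month-start date range beginning one month later.
from_range = pd.date_range(
    start=last_observed + DateOffset(months=1), periods=23, freq="MS"
)

matches = list(from_range) == from_offsets
print(matches)
print(from_range[0], from_range[-1])
```

Both versions give 23 future month-start dates. The date_range form returns a DatetimeIndex that carries a monthly frequency, which can be convenient when the index is passed on to other pandas or statsmodels functions.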

future_df['forecast'] = results.predict(start = 104, end = 120, dynamic= True)  

future_df[['Sales', 'forecast']].plot(figsize=(12, 8))
