Interpreting ACF and PACF plots for SARIMA model

  • I'm new to time series and used the monthly ozone concentration data from Rob Hyndman's website to do some forecasting.

    After doing a log transformation and differencing by lags 1 and 12 to get rid of the trend and seasonality respectively, I plotted the ACF and PACF shown [in this image][2]. Am I on the right track and how would I interpret this as a SARIMA?

    There seems to be a pattern every 11 lags in the PACF plot, which makes me think I should do more differencing (at 11 lags), but doing so gives me a worse plot. I'd really appreciate any of your help!

    EDIT: I got rid of the differencing at lag 1 and just used lag 12 instead, and this is what I got for the ACF and PACF.

    From there, I deduced that: SARIMA(1,0,1)x(1,1,1) (AIC: 520.098) or SARIMA(1,0,1)x(2,1,1) (AIC: 521.250) would be a good fit, but auto.arima gave me (3,1,1)x(2,0,0) (AIC: 560.7) normally and (1,1,1)x(2,0,0) (AIC: 558.09) without stepwise and approximation.

    I am confused about which model to use. Based on the lowest AIC, would SARIMA(1,0,1)x(1,1,1) be the best? What also concerns me is that none of the models pass the Ljung-Box test. Is there any way I can fix this?
    r plot time-series
      June 11, 2019 4:25 PM IST
  • It is quite difficult to manually select a model order that will perform well at forecasting a dataset. This is why Rob has built the 'auto.arima' function in his R forecast package, to figure out the model that may perform best based on certain metrics.

    When you see a PACF plot with significantly negative spikes, that usually means you have over-differenced your data. Try removing the first-order difference and keeping the lag-12 seasonal difference. Then carry on making your best guess.
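
    A minimal sketch of why over-differencing shows up as negative spikes, assuming simulated white noise rather than the ozone data: differencing a series that is already stationary pushes its lag-1 autocorrelation toward -0.5 (at lag 1 the ACF and PACF coincide). The helper names `difference` and `autocorrelation` are made up for this illustration.

```python
import random


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


random.seed(0)  # fixed seed so the sketch is reproducible
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]  # already stationary
over_differenced = difference(noise, lag=1)  # a difference it did not need

r1_noise = autocorrelation(noise, 1)
r1_over = autocorrelation(over_differenced, 1)
print(f"lag-1 autocorrelation, original series:  {r1_noise:.2f}")
print(f"lag-1 autocorrelation, over-differenced: {r1_over:.2f}")
```

    The same `difference` helper with `lag=12` gives the seasonal difference, so the two options in the question can be compared by differencing once (lag 12 only) or twice (lag 1, then lag 12) and checking the lag-1 value each time.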

    I'd recommend trying his auto.arima function and passing it a time series object with frequency = 12. He has a good writeup of seasonal arima models here:

    https://www.otexts.org/fpp/8/9

    If you would like more insight into manually selecting a SARIMA model order, this is a good read:

    https://onlinecourses.science.psu.edu/stat510/node/67

    In response to your edit: I think it would be beneficial to this post if you clarified your objective. Which of the following are you trying to achieve?

    1. Find a model whose residuals satisfy the Ljung-Box test.
    2. Produce the most accurate out-of-sample forecast.
    3. Manually select lag orders such that the ACF and PACF plots show no significant lags remaining.

    In my opinion, #2 is the most sought-after objective, so I'll assume that is your goal. In my experience, #3 produces poor results out of sample. Regarding #1, I am usually not concerned about correlations remaining in the residuals. We know we do not have the true model for this time series, so I see no reason to expect that an approximate model that performs well out of sample has left nothing behind in the residuals, perhaps something more complex or nonlinear.

    To provide you another SARIMA result, I ran this data through some code I've developed and found the following equation produced the minimal error on a cross-validation period.

    Final model is:
    SARIMA(0,1,1)(1,1,1)[12] with a constant, fitted to the log-transformed time series.

    The errors in the cross validation period are:
    MAPE = 16%
    MAE = 0.46
    RSQR = 74%


    Here is the Partial Autocorrelation plot of the residuals for your information.


    To my understanding, this is roughly similar in methodology to selecting an equation based on AICc, but it is ultimately a different approach. Regardless, if your objective is out-of-sample accuracy, I'd recommend evaluating equations on their out-of-sample accuracy rather than on in-sample fit, tests, or plots.
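
    As a rough sketch of what such an out-of-sample comparison involves, the snippet below holds out the last few observations and scores a forecast with MAE, MAPE, and R-squared. The numbers are hypothetical, the helper names are made up for this illustration, and the R-squared definition (1 - SS_res / SS_tot) is an assumption; the code used for the results above is not shown in this thread and may differ.

```python
def holdout_split(series, n_test):
    """Keep time order: the last n_test points form the test set."""
    return series[:-n_test], series[-n_test:]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, measured on the held-out points."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values on the original scale, not the ozone data.
series = [3.0, 2.5, 4.0, 5.5, 2.0, 4.0, 5.0, 4.0]
train, test = holdout_split(series, n_test=4)

# Stand-in for forecasts from a model fitted on `train` only.
forecast = [2.5, 3.5, 5.0, 4.5]

print(f"MAE  = {mae(test, forecast):.3f}")
print(f"MAPE = {mape(test, forecast):.1f}%")
print(f"R^2  = {r_squared(test, forecast):.3f}")
```

    If the model is fitted to the log of the series, the forecasts would normally be transformed back before computing these errors, so that candidate models are compared on the same scale.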
      June 11, 2019 4:27 PM IST
  • Configuring a SARIMA requires selecting hyperparameters for both the trend and seasonal elements of the series.

    Trend Elements

    There are three trend elements that require configuration.

    They are the same as the ARIMA model; specifically:

    • p: Trend autoregression order.
    • d: Trend difference order.
    • q: Trend moving average order.

    Seasonal Elements

    There are four seasonal elements that are not part of ARIMA that must be configured; they are:

    • P: Seasonal autoregressive order.
    • D: Seasonal difference order.
    • Q: Seasonal moving average order.
    • m: The number of time steps for a single seasonal period.

    Together, the notation for a SARIMA model is specified as:

    SARIMA(p,d,q)(P,D,Q)m

    Where the specifically chosen hyperparameters for a model are specified; for example, the model reported earlier in this thread:

    SARIMA(0,1,1)(1,1,1)12

    Importantly, the m parameter influences the P, D, and Q parameters. For example, an m of 12 for monthly data suggests a yearly seasonal cycle.
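
    One common way to write the full model is in backshift notation, sketched below. The symbols B (backshift operator, B y_t = y_{t-1}), c (constant), and \varepsilon_t (error term) are not defined in this post, and the sign convention for the moving average polynomials is an assumption, since it varies between textbooks and software.

```latex
\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\,y_t
  = c + \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t

\phi_p(B)   = 1 - \phi_1 B - \dots - \phi_p B^p
\Phi_P(B^m) = 1 - \Phi_1 B^{m} - \dots - \Phi_P B^{Pm}
\theta_q(B)   = 1 + \theta_1 B + \dots + \theta_q B^q
\Theta_Q(B^m) = 1 + \Theta_1 B^{m} + \dots + \Theta_Q B^{Qm}
```

    The seasonal polynomials and the seasonal difference only involve powers of B^m, which is the sense in which m determines the lags that P, D, and Q act on.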

    P=1 would make use of the first seasonally offset observation in the model, e.g. t-(m*1) or t-12. P=2 would use the last two seasonally offset observations, t-(m*1) and t-(m*2).

    Similarly, D=1 would calculate a first-order seasonal difference, and Q=1 would use the first seasonally offset error in the model (e.g. a seasonal moving average term).
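
    The lag arithmetic above can be sketched in a few lines. The function names `seasonal_lags` and `seasonal_difference` and the toy series are hypothetical, chosen only to illustrate the offsets; they are not part of any library.

```python
def seasonal_lags(order, m):
    """Lags used by a seasonal AR or MA term of the given order: m*1, ..., m*order."""
    return [m * k for k in range(1, order + 1)]


def seasonal_difference(series, m, D=1):
    """Apply the seasonal difference y[t] - y[t - m] a total of D times."""
    for _ in range(D):
        series = [series[t] - series[t - m] for t in range(m, len(series))]
    return series


print(seasonal_lags(1, 12))  # lags for P=1 (or Q=1) with monthly data
print(seasonal_lags(2, 12))  # lags for P=2 with monthly data

# Toy series with period m=4 that rises by 1 each cycle.
toy = [10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42]
print(seasonal_difference(toy, m=4, D=1))
```

    Each pass of seasonal differencing drops the first m observations, so with D=1 and m=12 a monthly series loses its first year.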

      January 11, 2022 3:36 PM IST