Introduction to Correlogram Analysis
If you know how to model time series data using ARIMA, you must have come across the term "Correlogram Analysis". If you don't know what it is, let me start with a basic definition.
In the analysis of data, a correlogram is an image of correlation statistics. In time series analysis, a correlogram, also known as an autocorrelation and partial autocorrelation plot, is a plot of the sample autocorrelations (ACF) and partial autocorrelations (PACF). It is used to find the type and the order of our mean model, i.e. the order of AR(p), MA(q) or ARMA(p,q), whichever the mean model is. Interestingly, in this article I will not use any such plots of the ACF and PACF to find the order, but will instead show you the underlying statistics behind these plots. The table given below is a summary of the correlogram analysis.
AR(p): ACF diminishing; PACF zero from lag (p+1)
MA(q): ACF zero from lag (q+1); PACF diminishing
ARMA(p,q): ACF diminishing; PACF diminishing
In the above table, what do the terms "diminishing" and "Zero from lag (p+1) or (q+1)" mean? Does "Zero" mean that at some lag (say 5) the ACF or PACF will have the value 0, and does "diminishing" mean that it will continue to decrease? For the theoretical ACF and PACF, yes; for the values estimated from a sample, no, because an estimate will almost never be exactly zero. Don't worry: what it means is that the ACF or PACF must be statistically insignificant, i.e. not significantly different from zero. The actual ACF or PACF values might be somewhat positive or negative, but statistically speaking they must not differ from zero. Now, how do we find out whether they are significantly different from zero or not? The answer is simple: "Hypothesis Testing".
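To make "not different from zero" concrete, here is a minimal sketch using only the Python standard library. It assumes simulated white noise (not the data used in this article), for which the true autocorrelation is zero at every lag, and the usual approximate 95% band of 1.96/sqrt(n). The function name `sample_acf` is my own choice for illustration.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

random.seed(0)  # fixed seed so the run is repeatable
n = 500
# White noise: the true autocorrelation is exactly zero at every lag
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

acf = sample_acf(noise, 10)
bound = 1.96 / math.sqrt(n)  # approximate 95% band under "no autocorrelation"

for lag, r in enumerate(acf, start=1):
    verdict = "differs from zero" if abs(r) > bound else "not different from zero"
    print(f"lag {lag:2d}: r = {r:+.3f} ({verdict}, band = +/-{bound:.3f})")
```

None of the printed values will be exactly 0, even though the underlying process has no autocorrelation at all. That is the gap between "zero" in theory and "statistically insignificant" in practice.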
ACF and PACF
The tests for the ACF and PACF are given below. Don't panic about the mathematics involved in these tests; my idea is to show that the test statistics for the ACF and PACF follow a certain distribution (chi-square with d.f. = k for the ACF, and standard normal for the PACF). This is all you need to know to find the order of the mean model. We have various tools and packages (R, Python or EViews) which will do the calculations for us and give us the test statistic values. All we need to do is compare them with our tabulated values.
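For reference, here is one common way to write tests with exactly these distributions. This is an assumption on my part about which statistics are meant: the Ljung-Box Q statistic for the ACF and the large-sample z statistic for the PACF. The symbols r_j (sample autocorrelation at lag j) and \hat{\phi}_{kk} (sample partial autocorrelation at lag k) are notation introduced here.

```latex
% ACF test up to lag k (Ljung-Box form)
H_0:\ \rho_1 = \rho_2 = \dots = \rho_k = 0
\qquad
Q(k) = n(n+2) \sum_{j=1}^{k} \frac{r_j^{2}}{n-j} \ \sim\ \chi^2_{k}
\quad \text{(approximately, under } H_0 \text{)}

% PACF test at lag k
H_0:\ \phi_{kk} = 0
\qquad
z_k = \sqrt{n}\,\hat{\phi}_{kk} \ \sim\ N(0,1)
\quad \text{(approximately, under } H_0 \text{)}
```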
In both tests, "n" represents the total number of observations. Now that we know how to find the test statistic, let's compare it with the tabulated values. If the test statistic (in absolute value for the PACF test) is greater than the tabulated value, we reject the null hypothesis; otherwise we can't reject it.
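The decision rule can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is a simulated AR(1) series rather than the data used in this article, the ACF statistic is taken to be the Ljung-Box Q, the PACF is computed with the Durbin-Levinson recursion, and the helper names (`sample_acf`, `pacf_from_acf`, `ljung_box_q`) are hypothetical. The critical values are the usual 5% entries from chi-square and standard normal tables.

```python
import math
import random

# 5% critical values copied from standard tables
CHI2_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}  # chi-square, d.f. = k
Z_5PCT = 1.96  # standard normal, two-sided

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations from r = [rho_1, rho_2, ...] (Durbin-Levinson)."""
    pacf, phi = [], []
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def ljung_box_q(x, k):
    """Ljung-Box Q statistic for the first k autocorrelations of x."""
    n = len(x)
    r = sample_acf(x, k)
    return n * (n + 2) * sum(r[j - 1] ** 2 / (n - j) for j in range(1, k + 1))

# Illustrative data only: a simulated AR(1) series with coefficient 0.6
random.seed(1)
n = 300
y = [0.0]
for _ in range(n + 100):
    y.append(0.6 * y[-1] + random.gauss(0.0, 1.0))
y = y[-n:]  # drop the burn-in values

max_lag = 5
pacf = pacf_from_acf(sample_acf(y, max_lag))
for k in range(1, max_lag + 1):
    q = ljung_box_q(y, k)
    z = math.sqrt(n) * pacf[k - 1]
    acf_verdict = "reject H0" if q > CHI2_5PCT[k] else "cannot reject H0"
    pacf_verdict = "reject H0" if abs(z) > Z_5PCT else "cannot reject H0"
    print(f"lag {k}: Q = {q:8.2f} ({acf_verdict}), z = {z:+6.2f} ({pacf_verdict})")
```

In practice you would not hand-roll these. In Python, for example, statsmodels provides `acf` and `pacf` in `statsmodels.tsa.stattools` and `acorr_ljungbox` in `statsmodels.stats.diagnostic`.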
Let me give an example of how to find the order of an AR(p) model using the above hypothesis tests. But first, let's look at the plot for an AR(2) process comparing the theoretical values with the actual ones, so that you can get some better insight.
You can see that although theoretically the PACF should be zero from lag 3 onwards (lag p+1 with p = 2), it still has some non-zero values. The sample ACF likewise deviates from its theoretical values.
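The same comparison can be reproduced numerically. The sketch below assumes an AR(2) process with illustrative coefficients 0.5 and 0.3 (my choice, not taken from the plot). It computes the theoretical ACF from the Yule-Walker equations and derives the PACF from an ACF with the Durbin-Levinson recursion, for both the theoretical and the simulated series. The helper names are hypothetical.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def pacf_from_acf(r):
    """Partial autocorrelations from r = [rho_1, rho_2, ...] (Durbin-Levinson)."""
    pacf, phi = [], []
    for k in range(1, len(r) + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

phi1, phi2 = 0.5, 0.3  # illustrative AR(2) coefficients (stationary)
max_lag = 6

# Theoretical ACF: rho_1 = phi1 / (1 - phi2), then rho_k = phi1*rho_{k-1} + phi2*rho_{k-2}
theo_acf = [phi1 / (1.0 - phi2)]
theo_acf.append(phi1 * theo_acf[0] + phi2)  # rho_2, using rho_0 = 1
for _ in range(max_lag - 2):
    theo_acf.append(phi1 * theo_acf[-1] + phi2 * theo_acf[-2])
theo_pacf = pacf_from_acf(theo_acf)

# One simulated realisation of the same process
random.seed(2)
n = 400
y = [0.0, 0.0]
for _ in range(n + 100):
    y.append(phi1 * y[-1] + phi2 * y[-2] + random.gauss(0.0, 1.0))
y = y[-n:]  # drop the burn-in values

samp_acf = sample_acf(y, max_lag)
samp_pacf = pacf_from_acf(samp_acf)

print("lag  theo ACF  sample ACF  theo PACF  sample PACF")
for k in range(max_lag):
    print(f"{k + 1:3d}  {theo_acf[k]:8.3f}  {samp_acf[k]:10.3f}"
          f"  {theo_pacf[k]:9.3f}  {samp_pacf[k]:11.3f}")
```

The theoretical PACF column is zero (up to rounding) from lag 3 onwards, while the sample PACF column keeps small non-zero values at those lags.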
Finding the order of AR(p)
As stated above, I will not use any plots of the ACF and PACF to find the order, but will instead use hypothesis tests. The table given below shows the test statistic values and tabulated values for the ACF test up to lag 24. You can see that at the first few lags some of the values are not significantly different from zero, but at the later lags they are found to be significantly different from zero. From a statistical point of view, such irregularities are called "sampling fluctuations". The correlogram table suggests that our model can be either AR(p) or ARMA(p,q).
Let’s look at the PACF test statistic values.
The above table is a bit confusing: there are four different lags (5, 7, 12 and 15) at which the PACF values are statistically different from zero. In between, some of the values are zero only because of sampling fluctuations. Deciding which is which is a critical task, but not to worry, we have a test for this too. Theory suggests that for an AR(p) model the PACF values are zero from lag (p+1) onward and not before that. In that case our model should be AR(15), and all the zeroes at the earlier lags are due to sampling fluctuations.
The 'εt' in the above model represents the residuals, and analyzing these residuals will take us a step closer to finding the order of our model. Residual analysis tells us whether our final model is AR(5), AR(7), AR(12) or AR(15). Theoretically it should be AR(15), but let's see what the test indicates. In residual analysis we fit the model for each candidate order (5, 7, 12 and 15) and collect the residuals of each mean model. Next we perform the ACF test on each set of residuals to see which one shows zero autocorrelation, i.e. all the ACF values of these residuals should be statistically insignificant. Again, you can fit these models using any statistical software; it will give you the coefficients and also the residual values.
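The residual check can be sketched as follows. This is a toy illustration under several assumptions: the series is a simulated AR(2) (not the unemployment data), the candidate orders are 1, 2 and 3, each AR(p) is fitted with the Yule-Walker equations, and the residual ACF test is the Ljung-Box Q at lag 10 compared with the tabulated 5% chi-square value for d.f. = 10. Some references use d.f. = 10 - p for residuals of a fitted AR(p); this sketch keeps d.f. = k to mirror the test described earlier. All function names are hypothetical.

```python
import random

CHI2_5PCT_DF10 = 18.307  # tabulated 5% chi-square critical value, d.f. = 10

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def yule_walker(r, p):
    """AR(p) coefficients from r = [rho_1, ..., rho_p] (Durbin-Levinson)."""
    phi = []
    for k in range(1, p + 1):
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
    return phi

def ar_residuals(x, p):
    """Residuals of a Yule-Walker AR(p) fit to the demeaned series."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    phi = yule_walker(sample_acf(x, p), p)
    return [d[t] - sum(phi[j] * d[t - 1 - j] for j in range(p))
            for t in range(p, len(d))]

def ljung_box_q(x, k):
    """Ljung-Box Q statistic for the first k autocorrelations of x."""
    n = len(x)
    r = sample_acf(x, k)
    return n * (n + 2) * sum(r[j - 1] ** 2 / (n - j) for j in range(1, k + 1))

# Illustrative data only: a simulated AR(2) series
random.seed(3)
n = 1000
y = [0.0, 0.0]
for _ in range(n + 100):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + random.gauss(0.0, 1.0))
y = y[-n:]  # drop the burn-in values

q_stats = {}
for p in (1, 2, 3):  # candidate orders
    q_stats[p] = ljung_box_q(ar_residuals(y, p), 10)
    verdict = ("residuals still autocorrelated" if q_stats[p] > CHI2_5PCT_DF10
               else "residuals look like white noise")
    print(f"AR({p}): Q(10) = {q_stats[p]:7.2f} -> {verdict}")
```

An order that is too small leaves autocorrelation in the residuals and produces a large Q. Once the order is large enough, Q drops sharply.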
For AR(5) we see that the residuals are not white noise, as there are some ACF values which are significantly different from zero. So this model is rejected. Similarly, let's check the other models.
Only for AR(15) are all the ACF values insignificant. Thus our model is AR(15). Isn't this what theory suggested our model would be? Many complications can arise while finding the order, and it depends entirely on how you tackle the problem; this is one way to do so. Hope you enjoyed the article!
*Note: I have used the unemployment percentage data of U.S. citizens (01-01-1992 to 01-12-2016) in the above analysis.