A time series is a set of data collected over time. Some examples of time series are: (i) the prices of stocks and shares taken at regular intervals of time; (ii) temperature readings taken at your house at hourly intervals; (iii) the number of cases of influenza in town recorded at daily intervals. There are, of course, millions of potential examples where data are recorded through time.
The key point about time series data is that the ordering of the time points matters. For many sets of data (for example, the heights of a set of school children) it does not really matter in which order the data are obtained or listed. One order is as good as another. For time series data, the ordering is crucial. The ordering imposes a certain structure on the data. For example, later we will be looking at questions about how observations at one time point might influence observations in the future, which relies on the time series being ordered.
Example: Financial Time Series. The picture below shows values of the FTSE 100 share index plotted over time. (Actually, we have chosen to plot what are called log returns of the series, rather than the values themselves. Returns are commonly used in financial time series: they describe the proportional gain or loss on a given day relative to the previous day.)
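In symbols, if p_t is the value on day t, the log return on that day is log(p_t / p_{t-1}). A minimal sketch of the computation (the prices below are made up for illustration, not real FTSE values):

```python
import numpy as np

def log_returns(prices):
    """Log return on day t: log(p_t / p_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

# Illustrative made-up prices, not real FTSE data.
prices = [100.0, 101.0, 100.5, 102.0]
r = log_returns(prices)  # one fewer value than there are prices
```

Note that differencing the logged series in this way is also a common first step towards producing a series that oscillates around a fixed level, as in the plot above.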
This series oscillates, but it oscillates around a given level; it does not drift too far above or below that level. In fact, if one calculates the mean of all of the observations one obtains 0.000935, a number very close to zero. Indeed, for this kind of data the stochastic process often thought to underlie the series is assumed to have mean exactly zero. (We have drawn a horizontal blue dashed line at the zero level to illustrate this.)
It is vital at this point to note that the process thought to underlie the series is a model statement. For these data we do not know, and nobody knows, what process, if any, actually underlies and generates the FTSE data. However, it is a common assumption that such a model exists and that its mean is zero. One could, if one wished, choose another model, e.g. one where the mean is precisely 0.000935. That model might fit this data set better, but analysis of many other similar series might show very small means, some slightly positive and some slightly negative, so that, overall, a mean-zero model is typical. The aim of statistics, including time series analysis, is to choose (simple) models that fit the data well.
Example: Measles Data. The picture below shows the weekly number of cases of measles for the cities indicated (from 17th January 1948 to 11th December 1987). Note: this data set, and similar ones, appear in many places on the Internet.
The measles time series data are very different in character from the FTSE financial series. Each of the seven series shows a similar pattern. Before the mid-1960s each series shows large oscillations, from near zero to several hundreds or thousands of cases (for bigger cities such as Birmingham and London). These oscillations are waves of measles epidemics, which occurred on an annual or biennial basis. However, after the mid-1960s the waves of epidemics are much reduced; this dramatic reduction was due to the impact of measles vaccination.
The nature of all the measles time series changes dramatically between the periods before and after the mid-to-late 1960s. For example, the mean level changes dramatically. For London, the mean level for the first half of the series is 583 cases and it is 257 cases for the second half (to the nearest case). In fact, for most of the series, the post-vaccination mean level is not only reduced but continues to fall over time. The variation of the series in the two halves is also very different: it is smaller in the second half of each series.
Loosely speaking, a stationary time series is one whose statistical properties are constant over time. What do we mean by statistical properties? We mean things like the mean value (or average level) of the series, the variance (the variation of the time series around the mean) and the autocorrelation. One might consider the financial time series (above) to be appropriately modeled by a stationary model, but almost certainly not the measles series, as they experience a big change halfway through.
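In this spirit, a crude numerical screen for nonstationarity is to split a series into consecutive segments and compare their summary statistics, much as we compared the two halves of the measles series above. A sketch with synthetic data (the function name is ours, and this is only an informal check, not a formal test of stationarity):

```python
import numpy as np

def segment_stats(x, k=2):
    """Split a series into k consecutive segments; return (mean, variance) of each.
    Large differences between segments hint at nonstationarity."""
    return [(s.mean(), s.var()) for s in np.array_split(np.asarray(x, dtype=float), k)]

rng = np.random.default_rng(0)
# Synthetic series with a level shift halfway through, loosely mimicking
# the pre-/post-vaccination change seen in the measles data.
x = np.concatenate([rng.normal(5.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
stats_first, stats_second = segment_stats(x)
```

For this synthetic series the first segment's mean is markedly higher than the second's, flagging the level shift.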
Autocorrelation is a measure of the internal correlation within a time series: a way of measuring and explaining internal association between observations. For example, for the measles time series you might ask: “take an arbitrary point in time; on average, what does the time series look like in four weeks' time, compared to now?” In other words, you are asking how strong the internal association within the series is over a period of four weeks. That association could be strong and positive (the series in four weeks is similar to now), strong and negative (the series in four weeks is very dissimilar to now), or there could be a weak or no relationship (no identifiable association). You might also be interested in time periods other than four weeks. For example, you might be interested in periods of one week (over one week a measles series does not change very much, so you might expect the association to be strong) or periods of 52 weeks, a year (again, you might expect the association to be strong, since measles epidemics have an annual component).
Autocorrelation quantifies this internal association on a scale from -1 to +1: a value near +1 indicates strong positive association, a value near -1 strong negative association, and 0 no association. (Note: autocorrelation measures linear association, in the same way that ordinary correlation between two variables does.)
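The sample (empirical) autocorrelation at a given lag can be computed directly from a series; a minimal sketch:

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation at the given lag: the lag-k autocovariance
    divided by the lag-0 autocovariance (the variance)."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    n = len(x)
    c0 = np.dot(xm, xm) / n                    # lag-0 autocovariance
    ck = np.dot(xm[: n - lag], xm[lag:]) / n   # lag-k autocovariance
    return ck / c0

# A slowly varying series has high autocorrelation at small lags.
trend = np.arange(100.0)
```

Here `acf(trend, 1)` is close to one, reflecting the fact that consecutive values of a slowly varying series are very similar.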
Example: Measles Autocorrelation. The left plot in the figure below is the first half of the London measles time series from above (the black time series in the plot of the seven cities' measles cases). The strongly periodic behaviour can be clearly seen. The right-hand plot shows the empirical autocorrelation (the correlogram).
The autocorrelation at lag zero is always one, because a series is perfectly correlated with itself. At lag one the autocorrelation is 0.963, which is close to one: measles cases in a given week are very similar to those in the next week (either before or after). The autocorrelation at lag two is 0.932, so values of the series two weeks apart are also very similar. Indeed, all autocorrelations up to lag six are greater than 0.7, indicating high linear association up to six weeks apart. A different picture emerges when one looks at two instants of time separated by a larger number of weeks. The autocorrelation at 26 weeks apart is -0.166, indicating that the numbers of measles cases six months apart are dissimilar. This kind of autocorrelation shows that the disease ebbs and flows on an approximately six-monthly cycle. Extending the autocorrelation to higher lags, one sees that a 104-week lag shows a high autocorrelation of 0.495, indicating a strong biennial period for these data.
Example: FTSE autocorrelation. The figure below shows the autocorrelations associated with the FTSE time series plotted above. Apart from the lag zero value, which we know is always one, all of the other autocorrelation values are small.
In the previous plot we saw that the FTSE autocorrelations were small. The plotted autocorrelations are empirical, computed from a data set of 512 observations. One can postulate a stationary statistical model underlying the FTSE data set, which permits one to conceive of a `true' or underlying autocorrelation function that the empirical values estimate. Using such a model, mathematical theory shows that the two parallel dotted blue lines in the autocorrelation plot above are approximate 95% confidence intervals. For the FTSE autocorrelation plot at least four of the autocorrelations appear outside the confidence intervals, suggesting that some of the `true' autocorrelations may not be zero.
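Under the white-noise null model (all true autocorrelations zero), the standard approximate 95% band for a sample autocorrelation is plus or minus 1.96 divided by the square root of the number of observations. A sketch for n = 512:

```python
import math

n = 512                      # number of observations in the FTSE series
bound = 1.96 / math.sqrt(n)  # approximate 95% band under a white-noise null
# Empirical autocorrelations outside (-bound, +bound) are "large" at this level,
# i.e. evidence against the corresponding true autocorrelation being zero.
```

For n = 512 this gives bands at roughly plus or minus 0.087, which matches the kind of dotted lines typically drawn on a correlogram.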
The autocorrelation plot is not the only tool that we can use to spot periodicities in a time series. The spectrum plot gives information about how power (or variance) in a series is distributed according to frequency.
Example: London Measles Spectrum. The figure below shows the spectrum plot obtained from the first half of the London measles time series. The largest peak occurs at a frequency of 0.5 cycles/year, which corresponds to one cycle every two years: a biennial oscillation. There is also a large peak corresponding to an annual oscillation, and a slightly smaller one at three cycles per two years.
Indeed, the spectrum plot can be produced for the other cities. Below we have produced it for Bristol, Liverpool, Manchester and Newcastle, and interesting differences emerge. Bristol appears fairly similar to London, whereas the biennial peak seems to be smaller relative to the annual peak for Liverpool and Newcastle than for London, and the biennial peak for Manchester seems more pronounced.
Again, it is possible to produce confidence intervals for the spectra, but for clarity we have chosen not to show them here.
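A raw estimate of the spectrum, the periodogram, can be computed with the fast Fourier transform. A minimal sketch using a synthetic series with a biennial (104-week) cycle, not the measles data:

```python
import numpy as np

def periodogram(x):
    """Raw periodogram: squared FFT magnitude at each Fourier frequency."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per observation (here, per week)
    return freqs, power

rng = np.random.default_rng(0)
t = np.arange(1040)
# One cycle every 104 weeks, plus a little noise.
x = np.sin(2 * np.pi * t / 104.0) + 0.1 * rng.normal(size=len(t))
freqs, power = periodogram(x)
peak = freqs[np.argmax(power)]  # should sit at 1/104 cycles per week
```

The dominant peak lands at a frequency of 1/104 cycles per week, i.e. the biennial oscillation built into the synthetic series. (In practice the raw periodogram is usually smoothed before interpretation.)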
The spectrum and the autocorrelation are complementary plots. The spectrum gives information about the frequency content of a time series and the correlogram about internal time-linking. However, one can see that they should be related. For example, a large negative autocorrelation at lag one means that consecutive values typically disagree with each other, which implies a very fast oscillation, and this will turn up in the spectrum as power at high frequencies. In mathematical terms the autocorrelation and spectrum are linked: it is possible to write one in terms of the other and vice versa (they are a Fourier transform pair).
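The discrete, circular analogue of this Fourier-pair relationship can be checked numerically: the FFT of the circular autocovariance sequence equals the periodogram. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=256)
x = x - x.mean()
n = len(x)

# Circular autocovariance at each lag k: (1/n) * sum_t x[t] * x[(t+k) mod n].
acov = np.array([np.dot(x, np.roll(x, -k)) for k in range(n)]) / n

# Fourier transforming the autocovariance recovers the periodogram.
spec_from_acov = np.fft.fft(acov).real
pgram = np.abs(np.fft.fft(x)) ** 2 / n
```

Here `spec_from_acov` and `pgram` agree to floating-point precision, the finite-sample echo of the theoretical statement above.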
In this short article we do not have the time and space to give a full treatment of statistical models. However, we briefly describe one model here in layman's terms. A time series model explains how time series values evolve from one time step to the next.
For example, in words we might say:
The value now = Roughly the value previously + some random disturbance
When the value now is exactly the previous value plus noise, this is the well-known random walk model. More generally, we can weight the previous value by a constant a.
Mathematically, the model can be written as:
x_t = a x_{t-1} + z_t
This should be read as: “the value of the series at time t is equal to a multiplied by the previous value of the series (at time t-1), plus some noise”. Here z_t is independently distributed random noise. This model is called an autoregressive model of order 1, abbreviated to AR(1). (You might imagine that higher orders would involve terms such as x_{t-p} for p > 1, and you'd be right.)
The behaviour of an AR(1) process depends very much on the value of the parameter a. In the left plot below is a realization of an AR(1) process with parameter a=0.9. Here, consecutive values of the time series are reasonably similar so that the process does not oscillate quickly. The right plot shows a realization of an AR(1) process with parameter a=-0.9. Here, consecutive values are dissimilar and the process oscillates wildly around the mean value (which is zero in both cases).
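Realizations like the two just described can be simulated directly from the recursion; a sketch, taking the noise to be standard normal:

```python
import numpy as np

def simulate_ar1(a, n, seed=0):
    """Simulate x_t = a * x_{t-1} + z_t with independent standard normal z_t."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + z[t]
    return x

smooth = simulate_ar1(0.9, 500)    # consecutive values similar: slow oscillation
jagged = simulate_ar1(-0.9, 500)   # consecutive values dissimilar: rapid oscillation
```

Plotting `smooth` and `jagged` reproduces the qualitative contrast described above: a slowly wandering series for a = 0.9 and a wildly alternating one for a = -0.9.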
In the AR(1) model the value of a is constant for all time. The statistical quantities underlying this model are constant essentially because a is constant (for the process to be stationary we also require |a| < 1).
The autocorrelation function for an AR(1) process with parameter a can be shown mathematically to be equal to a^k at (integer) lag k. The empirical autocorrelation functions for our two AR(1) processes, with parameters 0.9 and -0.9, can be seen in the left and right-hand plots of the figure below. In the left-hand plot one can see that there is a high degree of linear association at small lags, whereas in the right-hand plot alternate lags result in successive negative and positive autocorrelation values.
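The theoretical form a^k can be checked against a long simulated realization; a sketch for a = 0.9 (with a long series the sample autocorrelations should track a^k closely):

```python
import numpy as np

a, n = 0.9, 20000
rng = np.random.default_rng(42)
z = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = a * x[t - 1] + z[t]

xm = x - x.mean()

def sample_acf(k):
    """Sample autocorrelation of the simulated series at lag k."""
    return np.dot(xm[: n - k], xm[k:]) / np.dot(xm, xm)

estimated = [sample_acf(k) for k in range(1, 6)]
theoretical = [a ** k for k in range(1, 6)]  # 0.9, 0.81, 0.729, ...
```

With n = 20000 observations the estimated values sit close to the theoretical ones; with short series the sampling variability is much larger.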
The descriptions above only scratch the surface of stationary time series. There is a wide range of stationary time series models, methods for estimating the autocorrelation and the spectrum, as well as methods for multivariate stationary series, series with heavy-tailed distributions, and forecasting future values. The following references cover much of this for stationary series (and some mention nonstationary concepts too).
A good all-round reference is Chatfield, C. (2003) The Analysis of Time Series: An Introduction, Chapman and Hall/CRC Press, sixth edition.
An excellent reference that makes use of the R programming language to explain and expand is Shumway, R.H. and Stoffer, D.S. (2010) Time Series Analysis and Its Applications: With R Examples, Springer Texts in Statistics.
For time series in economics and related fields the books by Hamilton, J. (1994) Time Series Analysis, Princeton University Press and Tsay, R.S. (2010) Analysis of Financial Time Series, Wiley are excellent.
Other books on my bookshelf are (in alphabetical order):
Brillinger, D.R. (2001) Time Series: Data Analysis and Theory, SIAM.
Brockwell, P.J. and Davis, R.A. (2009) Time Series: Theory and Methods, Springer Series in Statistics.
Hannan, E.J. (1960) Time Series Analysis, Methuen, Wiley.
Percival, D.B. and Walden, A.T. (1993) Spectral Analysis for Physical Applications, Cambridge University Press.
Priestley, M. (1982) Spectral Analysis and Time Series, Academic Press.