
1.2 Models, methods et al. 

Before we move to the nitty-gritty details of the models, it is important to agree on what we are talking about. So, here are a couple of definitions:

  • Statistical model (or ‘stochastic model,’ or just ‘model’ in this textbook) is a ‘mathematical representation of a real phenomenon with a complete specification of distribution and parameters’ (Svetunkov and Boylan, 2019). Very roughly, a statistical model is something that contains a structure (defined by its parameters) and noise that follows some distribution.
  • True model is the idealised statistical model that is correctly specified (has all the necessary components in the correct form) and is applied to the data in the population. By this definition, the true model is never reachable in reality, but it is achievable in theory if, for some reason, we know which components and variables should be in the model and in what form, and we have all the data in the world. The notion itself is important when discussing how far the model that we use is from the true one.
  • Estimated model (aka ‘applied model’ or ‘used model’) is the statistical model that was constructed and estimated on the available sample of data. It typically differs from the true model, because the latter is not known. Even if the specification of the true model is known for some reason, the parameters of the estimated model will differ from the true ones due to sampling randomness, but they will hopefully converge to the true values as the sample size increases.
  • Data generating process (DGP) is an artificial statistical model showing how the data could be generated in theory. The notion is utopian, but it can be used in simulation experiments in order to check how a selected model with a specific estimator behaves in a specific setting. In real life, the data is not generated from any process, but usually arises from complex interactions between different agents in a dynamic environment. Note that I make a distinction between the DGP and the true model, because I do not think that the idea of the data being generated using a mathematical formula is helpful. Many statisticians will not agree with me on this distinction.
  • Forecasting method is a mathematical procedure that generates point and/or interval forecasts, with or without a statistical model (Svetunkov and Boylan, 2019). Very roughly, a forecasting method is just a way of producing forecasts that does not explain how the components of a time series interact with each other. It might be needed in order to filter out the noise and extrapolate the structure.

Mathematically, in the simplest case, the true model can be presented in the form: \[\begin{equation} y_t = \mu_{y,t} + \epsilon_t, \tag{1.1} \end{equation}\] where \(y_t\) is the actual observation, \(\mu_{y,t}\) is the structure in the data, \(\epsilon_t\) is the zero-mean noise, an unpredictable element that arises from the effect of many small factors, and \(t\) is the time index. An example would be the daily sales of beer in a pub, which have some seasonality (we see growth in sales every weekend), some other elements of structure, plus the white noise (I might go to a different pub, reducing the sales of beer by one pint). So, what we typically want to do in forecasting is capture the structure and represent the noise with a distribution with some parameters.
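To make equation (1.1) more concrete, here is a minimal simulation sketch in Python. All the numbers (the level, the weekly pattern, the noise variance) are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Additive model (1.1): y_t = mu_{y,t} + epsilon_t.
# The structure is a level plus a weekly seasonal pattern (higher
# "beer sales" towards the weekend); epsilon_t is zero-mean noise.
T = 7 * 8                                       # eight weeks of daily data
level = 50.0
seasonal = np.array([0, 0, 0, 5, 10, 20, 15], dtype=float)  # Mon..Sun
day = np.arange(T) % 7                          # day-of-week index
mu = level + seasonal[day]                      # the structure, mu_{y,t}
epsilon = rng.normal(0.0, 3.0, size=T)          # the noise, epsilon_t
y = mu + epsilon                                # the actual observations, y_t
```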

When it comes to applying the chosen model to the data, it can be presented as: \[\begin{equation} y_t = \hat{\mu}_{y,t} + e_t, \tag{1.2} \end{equation}\] where \(\hat{\mu}_{y,t}\) is the estimate of the structure and \(e_t\) is the estimate of the white noise (also known as the “residuals”). As you can see, even if the structure is correctly captured, the main difference between (1.1) and (1.2) is that the latter is estimated on a sample, so we can only approximate the true structure with some degree of precision.
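Continuing the sketch above, the estimated model (1.2) can be illustrated by recovering the structure from the sample. Estimating the weekly pattern by day-of-week averages is just one simple choice made for this illustration:

```python
# Estimate the structure hat{mu}_{y,t} by day-of-week averages and
# extract the residuals e_t -- the in-sample estimate of the white noise.
day_means = np.array([y[day == d].mean() for d in range(7)])
mu_hat = day_means[day]                         # estimated structure
e = y - mu_hat                                  # residuals, e_t
# e_t only approximates epsilon_t, because it comes from a finite sample.
```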

If we generate the data from the model (1.1), then we can talk about the DGP, keeping in mind that we are talking about an artificial experiment, for which we know the true model and the parameters. This can be useful if we want to see how different models and estimators behave in different conditions.
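Since in this artificial setup we know the true model and its parameters, we can run a toy simulation experiment of exactly this kind (again continuing the snippet above; the number of replications is arbitrary):

```python
# Generate many samples from the known DGP and check how the
# day-of-week-averages estimator behaves: its average error should
# be close to zero for every day if the estimator is unbiased.
n_sim = 1000
errors = np.zeros(7)
for _ in range(n_sim):
    y_sim = mu + rng.normal(0.0, 3.0, size=T)
    est = np.array([y_sim[day == d].mean() for d in range(7)])
    errors += est - (level + seasonal)
print(errors / n_sim)                           # close to zero for each day
```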

The simplest forecasting method can be represented with the equation: \[\begin{equation} \hat{y}_t = \hat{\mu}_{y,t}, \tag{1.3} \end{equation}\] where \(\hat{y}_t\) is the point forecast. This equation does not explain where the structure and the noise come from; it just shows a way of producing point forecasts.
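Continuing the running example, the method (1.3) amounts to extrapolating the estimated structure, here the day-of-week averages, into the future:

```python
# The forecasting method (1.3): hat{y}_t = hat{mu}_{y,t}, i.e. reuse
# the estimated weekly pattern for the next h days.
h = 7
future_day = np.arange(T, T + h) % 7
y_forecast = day_means[future_day]              # point forecasts, hat{y}_t
```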

In addition, in this textbook we will discuss two types of models:

  1. Additive, where (most) components are added to one another;
  2. Multiplicative, where the components are multiplied.

Equation (1.1) is an example of an additive error model. A general example of a multiplicative error model is: \[\begin{equation} y_t = \mu_{y,t} \varepsilon_t, \tag{1.4} \end{equation}\] where \(\varepsilon_t\) is again some noise, which in reasonable cases should take only positive values and have a mean of one. We will discuss this type of model later in the textbook. We will also see several examples of statistical models, forecasting methods, DGPs, and other notions, and discuss how they relate to each other.
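For comparison with the additive sketch above, here is one way the data could be generated from the multiplicative model (1.4). The log-normal distribution is used only as one convenient choice of positive noise; setting the mean of the underlying normal to \(-\sigma^2/2\) makes the multiplicative error average to one:

```python
# Multiplicative model (1.4): y_t = mu_{y,t} * epsilon_t, with positive
# noise whose mean is one. With a log-normal error, setting the mean of
# the underlying normal to -sigma^2 / 2 gives E[epsilon_t] = 1.
sigma = 0.1
epsilon_mult = rng.lognormal(mean=-sigma**2 / 2, sigma=sigma, size=T)
y_mult = mu * epsilon_mult                      # strictly positive when mu > 0
print(epsilon_mult.mean())                      # approximately one
```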

References

• Svetunkov, I., Boylan, J.E., 2019. Multiplicative state-space models for intermittent time series. https://doi.org/10.13140/RG.2.2.35897.06242