# Chapter 1 Introduction

I started writing this book in 2020, during the COVID-19 pandemic, having realised that more than 10 years had passed since the publication of the fundamental textbook by Hyndman et al. (2008), which discusses the ETS (Error-Trend-Seasonality) framework in the Single Source of Error (SSOE) form, and that the topic had not been updated substantially since then. If you are interested in learning more about exponential smoothing, that textbook is a must-read. However, there has been some progress in the area since 2008, and I have developed some models and functions based on SSOE, making the framework a bit more flexible and general. Given that publishing all the aspects of these models in peer-reviewed journals is very time-consuming, I have decided to summarise all the progress in this book, showing what happens inside the models and how to use the functions in different cases, so that there is a source to refer to.

Before we move to the nitty-gritty details of the models, it is important to agree on what we are talking about. So, here are a few definitions:

- **Statistical model** (or 'stochastic model', or just 'model' in this textbook) is a 'mathematical representation of a real phenomenon with a complete specification of distribution and parameters' (Svetunkov and Boylan 2019). Very roughly, a statistical model is something that contains a structure (defined by its parameters) and a noise that follows some distribution.
- **True model** is the idealistic statistical model that is correctly specified (has all the necessary components in the correct form) and is applied to the data in the population. By this definition, the true model is never reachable in reality, but it is achievable in theory if for some reason we know which components and variables should be in the model and in what form, and have all the data in the world. The notion itself is important when discussing how far the model that we use is from the true one.
- **Estimated model** (aka 'applied model' or 'used model') is the statistical model that was constructed and estimated on the available sample of data. It typically differs from the true model, because the latter is not known. Even if the specification of the true model is known for some reason, the parameters of the estimated model will differ from the true parameters due to sampling randomness, but will hopefully converge to the true ones as the sample size increases.
- **Data generating process** (DGP) is an artificial statistical model, showing how the data could be generated in theory. This notion is utopian and can be used in simulation experiments to check how the selected model with a specific estimator behaves in a specific setting. In real life, the data is not generated from any process, but is usually based on complex interactions between different agents in a dynamic environment. Note that I make a distinction between the DGP and the true model, because I do not think that the idea of something being generated using a mathematical formula is helpful. Many statisticians will not agree with me on this distinction.
- **Forecasting method** is a mathematical procedure that generates point and / or interval forecasts, with or without a statistical model (Svetunkov and Boylan 2019). Very roughly, a forecasting method is just a way of producing forecasts that does not explain how the components of a time series interact with each other. It might be needed in order to filter out the noise and extrapolate the structure.

To make these definitions more concrete, consider a simple model: \[\begin{equation} y_t = \mu_t + \epsilon_t, \tag{1.1} \end{equation}\]where \(y_t\) is the actual observation, \(\mu_t\) is the structure in the data and \(\epsilon_t\) is the noise, an unpredictable element with zero mean, which arises because of the effect of many small factors. An example would be the daily sales of beer in a pub, which have some seasonality (we see growth in sales every weekend), some other elements of structure, plus the white noise (I might go to a different pub, reducing the sales of beer by one pint). So what we typically want to do in forecasting is to capture the structure and also represent the noise with a distribution with some parameters.

When it comes to applying the chosen model to the data, it can be presented as: \[\begin{equation} y_t = \hat{\mu}_t + e_t, \tag{1.2} \end{equation}\]where \(\hat{\mu}_t\) is the estimate of the structure and \(e_t\) is the estimate of the white noise (also known as "**residuals**"). Even if the structure is correctly captured, the main difference between (1.1) and (1.2) is that the latter is estimated on a sample, so we can only approximate the true structure with some degree of precision.

If we generate the data from the model (1.1), then we can talk about the DGP, keeping in mind that we are talking about an artificial experiment, for which we know the true model and the parameters. This can be useful if we want to see how different models and estimators behave in different conditions.
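Such an artificial experiment can be sketched in a few lines of Python. This is only an illustration under invented assumptions: the weekly pattern, the noise level and all variable names below are hypothetical and are not taken from any real data or from the functions discussed later in the textbook. The sketch generates data from (1.1) with a known structure, then estimates the structure on the sample as in (1.2) and extracts the residuals.

```python
import random
import statistics

random.seed(42)  # fixed seed so that the experiment is reproducible

# Hypothetical DGP: a weekly pattern of pints sold plus Normal noise.
# All the numbers are invented for illustration purposes only.
true_structure = [100, 90, 95, 110, 150, 200, 180]  # mu_t for Monday..Sunday
sigma = 10      # standard deviation of the noise epsilon_t
n_weeks = 52

# Generate y_t = mu_t + epsilon_t, as in equation (1.1)
y = [true_structure[t % 7] + random.gauss(0, sigma)
     for t in range(n_weeks * 7)]

# Estimated model: the structure for each day of week is
# approximated by the sample mean of that day
mu_hat = [statistics.fmean(y[day::7]) for day in range(7)]

# Residuals e_t = y_t - mu_hat_t, as in equation (1.2)
residuals = [y[t] - mu_hat[t % 7] for t in range(len(y))]

for day, (mu, estimate) in enumerate(zip(true_structure, mu_hat)):
    print(f"day {day}: true mu = {mu}, estimated mu = {estimate:.2f}")
print(f"mean of residuals: {statistics.fmean(residuals):.6f}")
```

The estimates in `mu_hat` should be close to the values in `true_structure`, but not identical to them, which is exactly the difference between the true and the estimated model. Increasing `n_weeks` should bring the two closer together.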

Finally, the simplest forecasting method can be represented with the equation: \[\begin{equation} \hat{y}_t = \hat{\mu}_t, \tag{1.3} \end{equation}\]where \(\hat{y}_t\) is the point forecast. This equation does not explain where the structure and the noise come from; it just shows the way of producing point forecasts.

In addition, we will discuss two types of models in this textbook:

- Additive, where the components are added to some other components;
- Multiplicative, where the components are multiplied.

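Model (1.1) is an example of an additive error model. A general example of a multiplicative error model is: \[\begin{equation} y_t = \mu_t \varepsilon_t, \tag{1.4} \end{equation}\]where \(\varepsilon_t\) is some noise again, which in reasonable cases should take only positive values and have a mean of one. We will discuss this type of model later in the textbook. We will also see several examples of statistical models, forecasting methods, DGPs and other notions and discuss how they relate to each other.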
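A noise term that is strictly positive and has a mean of one can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the noise follows a log-normal distribution. This is only one possible choice, and the structure and all parameter values are hypothetical. For the log-normal distribution the mean equals \(\exp(m + \sigma^2/2)\), so setting the location parameter to \(m = -\sigma^2/2\) gives a mean of one.

```python
import random
import statistics

random.seed(1)  # fixed seed for reproducibility

sigma = 0.3
# Location parameter chosen so that E[eps] = exp(m + sigma^2 / 2) = 1
m = -sigma ** 2 / 2

# Hypothetical structure: the same weekly pattern repeated many times
weekly_pattern = [100, 90, 95, 110, 150, 200, 180]
n_obs = 7000
mu = [weekly_pattern[t % 7] for t in range(n_obs)]

# Multiplicative noise: strictly positive, with a mean of one
eps = [random.lognormvariate(m, sigma) for _ in range(n_obs)]

# Multiplicative error model: the structure is multiplied by the noise
y = [mu_t * eps_t for mu_t, eps_t in zip(mu, eps)]

print(f"smallest noise value: {min(eps):.4f}")
print(f"sample mean of noise: {statistics.fmean(eps):.4f}")
print(f"smallest observation: {min(y):.2f}")
```

Because the noise is positive, the generated observations stay positive as long as the structure is positive, which an additive Normal noise would not guarantee.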

Finally, we will be talking about forecasting in this textbook, so it is also important to introduce the notation \(\hat{y}_{t+h}\), which corresponds to the \(h\) steps ahead **point forecast** produced from observation \(t\). Typically, this corresponds to the conditional expectation of the model \(\mu_{t+h|t}\), given all the information available at observation \(t\), although this does not always hold for the classical ETS models. We will also discuss the **prediction interval**, a term that means specific bounds that should include \((1-\alpha)\times100\)% of observations in the holdout sample (for example, 95% of observations). Finally, we will sometimes use the term **confidence interval**, which usually refers to similar bounds constructed either for the parameters of models or for specific statistics (e.g. conditional mean or variance).
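The idea of a prediction interval can be checked on simulated data. The sketch below is a deliberately simplified illustration: it assumes a constant level with Normal noise (all values are invented), uses the sample mean as the point forecast for every horizon, and ignores the uncertainty of the estimated parameters when constructing the bounds. It then counts the share of holdout observations that fall inside the 95% interval.

```python
import random
import statistics

random.seed(7)  # fixed seed for reproducibility

# Hypothetical data: constant level plus Normal noise
level, sigma = 100, 10
train = [level + random.gauss(0, sigma) for _ in range(200)]
holdout = [level + random.gauss(0, sigma) for _ in range(1000)]

# Point forecast: the estimated level, the same for every horizon h
y_hat = statistics.fmean(train)
# Estimate of the standard deviation of the noise
s = statistics.stdev(train)

# Bounds of the (1 - alpha) * 100% prediction interval under Normality
alpha = 0.05
z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = y_hat - z * s, y_hat + z * s

# Share of holdout observations that lie inside the interval
coverage = sum(lower <= obs <= upper for obs in holdout) / len(holdout)

print(f"point forecast: {y_hat:.2f}")
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
print(f"coverage in the holdout: {coverage:.3f}")
```

If the model is adequate, the coverage should be close to, although rarely exactly equal to, the nominal 95%.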

Note that this textbook assumes that the reader is familiar with introductory statistics and knows basic forecasting principles. Hanck et al. (2020) is a good textbook on econometrics and statistics. Hyndman and Athanasopoulos (2018) can be a good start if you do not know forecasting principles. We will also use elements of linear algebra to explain some modelling parts, but this will not be the main focus of the textbook and you will be able to skip the more challenging parts without jeopardising the main understanding of the topic.

### References

Hanck, Christoph, Martin Arnold, Alexander Gerber, and Martin Schmelzer. 2020. “Introduction to Econometrics with R.” Bookdown. https://www.econometrics-with-r.org/index.html.

Hyndman, Rob J., and George Athanasopoulos. 2018. *Forecasting: Principles and Practice*. 2nd ed. OTexts: Melbourne, Australia. https://OTexts.com/fpp2. Accessed on 01.04.2020.

Hyndman, Rob J., Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder. 2008. *Forecasting with Exponential Smoothing*. Springer Berlin Heidelberg.

Svetunkov, Ivan, and John E. Boylan. 2019. “Multiplicative State-Space Models for Intermittent Time Series.” Department of Management Science, Lancaster University. doi:10.13140/RG.2.2.35897.06242.