
Chapter 5 Dealing with uncertainty in the data

As mentioned in the previous section, we always work with samples, and because of that we inevitably deal with randomness, even when there are no other sources of uncertainty in the data. For example, if we want to estimate the mean of a variable based on the observed data, the value we get will differ from one sample to another. This should have become apparent from the examples we discussed earlier. If the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) hold, then we know that the estimate of our parameter will have its own distribution and will converge to the population value as the sample size increases. This is the basis for the construction of confidence and prediction intervals, discussed in this section. Depending on our needs, we can focus on the uncertainty of either the estimate of a parameter or the random variable \(y\) itself. In the former case, we typically work with the confidence interval, which is the interval constructed for the estimate of a parameter. In the latter case, we are interested in the prediction interval, which is the interval constructed for the random variable \(y\).

In order to simplify further discussion in this section, we will take the population mean and its in-sample estimate as an example. In this case we have:

A random variable \(y\), which is assumed to follow some distribution with finite mean \(\mu\) and variance \(\sigma^2\);
A sample of size \(T\) from the population of \(y\);
Estimates of the mean \(\hat{\mu}=\bar{y}\) and the variance \(\hat{\sigma}^2 = s^2\), obtained from that sample of size \(T\).
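To make this setup more tangible, here is a minimal Python sketch of it. It is only an illustration under assumptions that are not part of the text above: \(y\) is taken to be normally distributed with \(\mu=100\) and \(\sigma=10\), the helper name `sample_estimates` is hypothetical, and \(s^2\) is computed with the \(T-1\) divisor. The sketch draws several samples of the same size to show that \(\bar{y}\) and \(s^2\) differ from one sample to another, and then increases \(T\) to show the estimates settling closer to the population values.

```python
import random
import statistics


def sample_estimates(T, mu=100.0, sigma=10.0, seed=None):
    """Draw a sample of size T and return (y_bar, s2).

    The normal distribution and the default mu and sigma are
    illustrative assumptions, not values taken from the text.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(T)]
    y_bar = statistics.mean(sample)    # estimate of the population mean
    s2 = statistics.variance(sample)   # sample variance with the T-1 divisor
    return y_bar, s2


# Same sample size, different samples: the estimates vary.
for seed in range(3):
    y_bar, s2 = sample_estimates(T=50, seed=seed)
    print(f"sample {seed}: mean = {y_bar:.2f}, variance = {s2:.2f}")

# Growing sample size: the estimates tend to approach mu and sigma^2.
for T in (10, 100, 1000, 10000):
    y_bar, s2 = sample_estimates(T=T, seed=42)
    print(f"T = {T:>5}: mean = {y_bar:.2f}, variance = {s2:.2f}")
```

The spread of the estimates across the first three samples is exactly the uncertainty that the confidence interval is meant to capture.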