
7.2 Errors of types 0, I and II

When conducting a conventional statistical test, we can end up in one of four situations, depending on what holds in reality and what result we obtain. These are summarised in Table 7.1.

Table 7.1: Four outcomes in hypothesis testing.

| | Reality: \(\mathrm{H}_0\) is true | Reality: \(\mathrm{H}_0\) is wrong |
|---|---|---|
| The data tells us to fail to reject \(\mathrm{H}_0\) | Correct decision, probability is \(1-\alpha\) | Type II error, probability is \(\beta\) |
| The data tells us to reject \(\mathrm{H}_0\) | Type I error, probability is \(\alpha\) | Correct decision, probability is \(1-\beta\) |

Table 7.1 shows the two hypothetical states of reality (we never know which one we are in) and the two possible outcomes of hypothesis testing. This gives us a \(2\times 2\) matrix, where \(\alpha\) is the significance level and \(1-\beta\) is the so-called “Power of the Test” (discussed in detail in Subsection 7.3).

Type I error (aka “false positive”, i.e. we find a positive effect when we should not have) happens when the null hypothesis is actually true, but we reject it. The probability of this event equals \(\alpha\). This is one of the interpretations of the significance level \(\alpha\): the proportion of cases in which we are prepared to make a mistake when the null hypothesis is true.
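This property can be checked empirically: if we repeatedly draw samples from a population where \(\mathrm{H}_0\) is true and test it at significance level \(\alpha\), we should reject \(\mathrm{H}_0\) in roughly \(\alpha\) proportion of cases. The sketch below does this with a one-sample t-test; the specific parameter values (sample size, number of simulations) are illustrative choices, not something fixed by the theory.

```python
# Sketch: empirical check that the Type I error rate matches alpha.
# Sample size, number of simulations, and alpha are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, n_sims = 0.05, 30, 10000

rejections = 0
for _ in range(n_sims):
    sample = rng.normal(loc=0, scale=1, size=n)   # H0: mu = 0 is true
    _, p_value = stats.ttest_1samp(sample, popmean=0)
    if p_value < alpha:
        rejections += 1                           # Type I error: rejecting a true H0

print(rejections / n_sims)                        # should be close to alpha = 0.05
```

Running this gives a rejection rate close to 0.05, illustrating that \(\alpha\) directly controls how often we make a Type I error.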

Type II error (aka “false negative”, i.e. we do not find an effect when we should have) happens when the null hypothesis is wrong, but we fail to reject it. The probability of this event equals \(\beta\), which can be calculated based on the assumed distribution and the critical and calculated values of the test.
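To make this concrete, here is a sketch of how \(\beta\) can be calculated for a two-sided z-test with known standard deviation. The true mean \(\mu_1\), the sample size, and the other numbers are illustrative assumptions, not values from the text:

```python
# Sketch: analytical beta for a two-sided z-test with known sigma.
# mu1 (the assumed "true" mean), n, and sigma are illustrative.
import numpy as np
from scipy import stats

alpha, mu0, mu1, sigma, n = 0.05, 0, 0.5, 1, 30
z_crit = stats.norm.ppf(1 - alpha / 2)            # critical value of the test
shift = (mu1 - mu0) * np.sqrt(n) / sigma          # standardised true effect

# Probability of the test statistic landing inside the acceptance
# region when the true mean is mu1, i.e. of failing to reject H0
beta = stats.norm.cdf(z_crit - shift) - stats.norm.cdf(-z_crit - shift)
print(beta)                                       # roughly 0.22 with these numbers
```

The power of the test in this scenario is then simply \(1-\beta\).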

In order to remember what Type I and Type II errors stand for, there is a good mnemonic: the story of the boy who cried “Wolf!”.

Example 7.1 Just as a reminder: in a village, there lived a boy who one day decided to play a practical joke on his fellow villagers. He ran around the main street crying “Wolf!”. We should acknowledge that there was no wolf at that stage, so in our terms we would say that \(\mathrm{H}_0\): \(\mu=0\) was true. But the villagers who heard the boy came out on the street to help. They rejected the correct null hypothesis in order to help the boy, and they were surprised to find that there were no wolves on the street. Thus the villagers made a Type I error.

Next week, the boy encountered a wolf on the main street and started crying “Wolf!”, calling for help. Alas, this time nobody believed the boy and nobody came out to help: the villagers failed to reject the wrong null hypothesis \(\mathrm{H}_0\): \(\mu=0\), while the correct one was \(\mathrm{H}_1\): \(\mu\neq 0\). By doing so they made a Type II error. If the villagers knew statistics, they would understand that failing to reject \(\mathrm{H}_0\) once does not mean that it is true.

While we can regulate the probability of Type I error by changing \(\alpha\), the probability of Type II error cannot be controlled directly. Ideally, we want it to be as low as possible. In general, the more information about the “true” parameters and model you can provide to the test, the lower the Type II error will be. For example, if we want to conduct a test comparing a mean with a value and the CLT holds (see Section 6.2), then we might choose between the t-test and the z-test. The latter assumes that the population standard deviation is known (and we can provide it), and as a result it has a lower probability of Type II error than the former. We will discuss specific tests in Section 8.
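This difference between the two tests can be demonstrated with a small simulation: we draw samples from a population where \(\mathrm{H}_0\) is false, give the z-test the true standard deviation, and count how often each test fails to reject. The sample size, effect size, and other values below are illustrative assumptions:

```python
# Sketch: Type II error rates of z-test (known sigma) vs t-test
# (estimated sigma) when H0: mu = 0 is actually false.
# Sample size, true mean, and number of simulations are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha, n, n_sims, true_mean, sigma = 0.05, 10, 10000, 0.5, 1.0
z_crit = stats.norm.ppf(1 - alpha / 2)

z_fail = t_fail = 0
for _ in range(n_sims):
    sample = rng.normal(true_mean, sigma, size=n)      # H0 is false here
    z_stat = sample.mean() / (sigma / np.sqrt(n))      # uses the known sigma
    if abs(z_stat) < z_crit:
        z_fail += 1                                    # Type II error of z-test
    _, p = stats.ttest_1samp(sample, popmean=0)        # sigma is estimated
    if p >= alpha:
        t_fail += 1                                    # Type II error of t-test

print(z_fail / n_sims, t_fail / n_sims)                # z-test fails less often
```

With a small sample, the z-test has a noticeably lower Type II error rate, because knowing \(\sigma\) lets it use a tighter critical value than the t-distribution provides.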

All four situations in Table 7.1 rely on the idea that reality is somehow known. But in real life, we never know whether the null hypothesis is true or not. The table is still useful, however, because it gives an understanding of what to expect from a statistical test and which test to select in each specific situation.

Finally, sometimes analysts refer to the Type 0 error (it is sometimes called a “Type III” error, but it is more fundamental than Type I or Type II, so I prefer “Type 0”). This is the error that arises when an answer is obtained to the wrong question. There is no mathematics behind it, but it is important in general: we need to understand what questions to ask and how to formulate them correctly before conducting any test.