
3.6 Negative Binomial distribution

One of the tasks in manufacturing is to understand how many days it might take for a machine to break. In this case, every day on which the machine works can be encoded as a “failure” (it failed to break), and the day on which it breaks as a “success”. Assuming that many small unpredictable factors affect whether the machine breaks, it is reasonable to assume that the probability of a break is the same on every day. Can we model this process to make informed decisions about operating the machine?

Yes. The Negative Binomial distribution is one of the appropriate tools here. It models the number of failures in an experiment (e.g. days without a break) before some predefined number of successes occurs (e.g. a break). In contrast with the Binomial distribution, in the Negative Binomial (NegBin) distribution the total number of trials is not known in advance. In the example above, a success is the machine stopping working.
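
To make the "failures before a fixed number of successes" idea concrete, the following minimal sketch simulates the process directly. The function name `simulate_failures`, the break probability of 0.2, the seed, and the number of simulated runs are all illustrative assumptions, not values taken from a real machine.

```python
import random


def simulate_failures(p, k, rng):
    """Count failures (e.g. working days) observed before the k-th success (e.g. a break)."""
    failures = 0
    successes = 0
    while successes < k:
        if rng.random() < p:
            successes += 1  # the event of interest happened on this trial
        else:
            failures += 1   # one more trial without the event
    return failures


rng = random.Random(42)   # fixed seed so the run is reproducible
p_break = 0.2             # hypothetical daily probability of a break
draws = [simulate_failures(p_break, 1, rng) for _ in range(20000)]

# Share of runs where the break happened on the very first day (zero failures)
share_zero = sum(d == 0 for d in draws) / len(draws)
print(f"Share of runs with no working days before the break: {share_zero:.3f}")
```

With one required success, the share of runs with zero failures should be close to the assumed probability of success, because this is simply the chance that the very first trial is a success.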

Visually, the PMF of the NegBin can be represented in the following way (Figure 3.11), with \(p\) being the probability of success, \(k\) being the number of successes at which the experiment stops, and \(m\) being the number of failures.


Figure 3.11: PMF of the Negative Binomial distribution with different probabilities and number of successes \(k\).

Figure 3.11 shows that, for example, if \(p=0.2\) and we want to have one success (\(k=1\)), the probability that we succeed without any failures is 0.2. To make this easier to understand, if we have a 1d5 (a five-sided die) and define success as rolling 5, the probability of rolling it on the first trial is 0.2 (one out of five possible outcomes), which corresponds to the first bar in the first plot in Figure 3.11. The probability that we roll 5 for the first time on the second trial (one failure followed by a success) is \(0.2 \times (1-0.2) = 0.16\), which corresponds to the second bar in the first plot in Figure 3.11, etc. With \(k=2\), we model a situation in which we need to roll 5 twice in an experiment (not necessarily consecutively).
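
The arithmetic of the die example can be reproduced with a few lines of code. This is a small sketch for the case \(k=1\) only; the helper name `prob_first_success_after` is a hypothetical label introduced here for illustration.

```python
def prob_first_success_after(m, p):
    """Probability of exactly m failures followed by the first success (case k = 1)."""
    return (1 - p) ** m * p


p = 0.2  # probability of rolling 5 on a five-sided die

# First bar in the first plot: success on the first roll, no failures
print(round(prob_first_success_after(0, p), 4))  # 0.2

# Second bar in the first plot: one failure, then success on the second roll
print(round(prob_first_success_after(1, p), 4))  # 0.16
```

Each additional failure multiplies the probability by \(1-p\), which is why the bars in the first plot decline geometrically.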

The PMF of this distribution is written as: \[\begin{equation} f(m, k, p) = \begin{pmatrix} k+m-1 \\ m \end{pmatrix} p^k (1-p)^m , \tag{3.17} \end{equation}\] where \(k\) is the number of successes, \(m\) is the number of failures, and \(p\) is the probability of success. This PMF is very similar to that of the Binomial distribution, but it is parametrised differently, which makes it useful in some contexts (like the one mentioned at the beginning of this section).
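
Formula (3.17) can be evaluated directly. The sketch below uses only the Python standard library; the function name `negbin_pmf` is an assumption made for this illustration, and the infinite sum is truncated at 500 terms as a numerical sanity check rather than a proof.

```python
from math import comb


def negbin_pmf(m, k, p):
    """Evaluate formula (3.17): probability of m failures before the k-th success."""
    return comb(k + m - 1, m) * p ** k * (1 - p) ** m


# k = 1, p = 0.2: reproduces the first two bars of the die example
print(round(negbin_pmf(0, 1, 0.2), 4))  # 0.2
print(round(negbin_pmf(1, 1, 0.2), 4))  # 0.16

# k = 2, p = 0.2: exactly one failure before the second success
print(round(negbin_pmf(1, 2, 0.2), 4))  # 0.064

# Sanity check: the probabilities over m = 0, 1, 2, ... should sum to one
total = sum(negbin_pmf(m, 2, 0.2) for m in range(500))
print(round(total, 6))  # 1.0
```

For \(k=2\) and \(m=1\), the binomial coefficient equals 2 because the single failure can come either before the first success or between the two successes; the final trial is always a success.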