
Chapter 5 Data handling and analysis

Take any of the examples discussed in Chapters 3 and 4. How do we know the true parameters of a distribution? For example, how can we get the arrival rate of patients for the Poisson distribution in the example from Section 3.5? There is usually only one way to obtain the value: collect the data and estimate it.

Remark. While one can also use judgment to set specific values, such estimates are in general biased and should only be used if no data are available for the analysis.
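As a minimal sketch of the estimation step, assume that patient arrivals were recorded as the number of patients arriving in each hour. For a Poisson distribution, the maximum likelihood estimate of the rate is the sample mean of these counts. The counts below are made up purely for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical hourly patient arrival counts (illustrative only, not real data)
arrivals = [3, 5, 4, 2, 6, 4, 3, 5]


def estimate_poisson_rate(counts):
    """Maximum likelihood estimate of the Poisson rate: the sample mean of the counts."""
    if not counts:
        raise ValueError("at least one observation is needed")
    return sum(counts) / len(counts)


def poisson_pmf(k, rate):
    """Probability of observing exactly k arrivals in one hour, given the rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


rate = estimate_poisson_rate(arrivals)
print(f"Estimated arrival rate: {rate:.3f} patients per hour")
print(f"Estimated P(no arrivals in an hour): {poisson_pmf(0, rate):.4f}")
```

Once the rate has been estimated from the data, it can be plugged into the probability mass function from Chapter 3 in place of an assumed value.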

Furthermore, in some cases we need an idea of how the values are distributed before any further analysis, as well as a feeling for the potential relations between the variables.
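A simple way to get a first look at both is a frequency table of the observed values and a correlation coefficient between two variables. The sketch below assumes discrete observations for the frequency table and uses Pearson's correlation, which only captures linear relations; the data and function names are hypothetical.

```python
import math
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs, sorted by value."""
    return dict(sorted(Counter(values).items()))


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (numerator of the covariance)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (numerators of the variances)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data (illustrative only): items bought per visit and visit length
items = [1, 2, 2, 3, 1, 4, 2, 3]
minutes = [5, 9, 8, 12, 4, 15, 10, 11]

print(frequency_table(items))
print(f"Correlation: {pearson_correlation(items, minutes):.3f}")
```

A value close to 1 or -1 suggests a strong linear relation, while a value near 0 only rules out a linear one; a plot is still needed to see other kinds of relations.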

How can we do all of that?

In this chapter, we discuss what the potential issues with data can be, how to clean the data, and what basic preliminary analysis we can do to obtain estimates of parameters, which can then be used in a more advanced analysis. The preliminary analysis can be done using either numerical or graphical methods. Numerical methods are useful when you need summary information about the data, while graphical methods are useful when you can spend more time investigating relations and issues in the data. In many cases, the two complement each other.
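To illustrate the workflow described above, here is a minimal sketch that cleans a sample by dropping missing entries, computes a numerical summary and draws a crude text histogram as a stand-in for a proper plot. It assumes that missing values are coded as `None` or NaN; the data and function names are hypothetical, and only the Python standard library is used.

```python
import math
import statistics


def clean(values):
    """Drop missing entries (None or NaN) before analysis."""
    return [v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]


def summarise(values):
    """Numerical summary of the cleaned data (needs at least two values)."""
    data = clean(values)
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "min": min(data),
        "max": max(data),
    }


def histogram_counts(values, bins=5):
    """Split the range of the cleaned data into equal-width bins and count values."""
    data = clean(values)
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in data:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max goes to last bin
    return [(lo + i * width, c) for i, c in enumerate(counts)]


# Hypothetical raw measurements with missing entries (illustrative only)
raw = [12.0, 15.5, None, 14.2, float("nan"), 13.8, 16.1, 12.9]

for name, value in summarise(raw).items():
    print(f"{name:>6}: {value:.2f}")
for left_edge, count in histogram_counts(raw):
    print(f"{left_edge:6.2f} | {'*' * count}")
```

The summary answers quick questions about location and spread, while the histogram hints at the shape of the distribution; in practice, a plotting library would replace the text histogram.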

Remark. We do not recommend ending your work with a preliminary data analysis alone. Usually, this is just the first step towards a better understanding of the dataset, which can then be followed by model construction (discussed in Chapters 10 and 11).