3.8 Calculating number of parameters in models
When performing model selection and calculating different statistics, it is important to know how many parameters were estimated in the model. While this might seem trivial, there are a number of edge cases and wrinkles that are seldom discussed in detail.
When it comes to inference based on regression models, the general idea is to calculate the number of all the independent estimated parameters \(k\). This typically includes all initial components and all coefficients of the model together with the scale, shape and shift parameters of the assumed distribution (e.g. variance in the Normal distribution).
If likelihood is not used, then the number of parameters might be different. For example, if we estimate the model via the minimisation of MSE (similar to OLS), then the number of estimated parameters no longer includes the variance: it is obtained as a by-product of the estimation. This is because the likelihood needs all the parameters of the distribution in order to be maximised, whereas with MSE we just minimise the mean of squared errors, and the variance of the distribution is obtained automatically. While the values of the parameters might be the same, the logic is slightly different.
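The difference between the two counts can be illustrated with a minimal sketch. It assumes a simple linear regression with Normal errors fitted to simulated data; the variable names (`k_likelihood`, `k_mse`) and the use of AIC as the example statistic are choices made for this illustration only.

```python
import math
import random

# Simulated data for illustration only: y = 1 + 2x + Normal noise
random.seed(1)
T = 40
x = [random.gauss(0, 1) for _ in range(T)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5) for xi in x]

# Closed-form least squares estimates of the intercept and the slope
x_bar = sum(x) / T
y_bar = sum(y) / T
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

n_coefficients = 2                   # intercept and slope
k_likelihood = n_coefficients + 1    # the variance is estimated as well
k_mse = n_coefficients               # the variance is only a by-product

# Normal log-likelihood evaluated at the maximum likelihood variance
sigma2_ml = sum(e ** 2 for e in residuals) / T
log_lik = -0.5 * T * (math.log(2 * math.pi * sigma2_ml) + 1)

# AIC uses the likelihood-based count of parameters
aic = 2 * k_likelihood - 2 * log_lik
print(k_likelihood, k_mse, round(aic, 3))
```

The estimates of the coefficients are the same in both cases here; only the bookkeeping for \(k\) differs.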
In addition, restrictions on the parameters can reduce the number of estimated parameters when the parameters reach their boundary values.
In general, if the value of a parameter is provided to the model rather than estimated, then it does not count towards the number of estimated parameters. So, setting \(b_1=1\) has the same effect: \(b_1\) is not counted.
Finally, if a parameter is just a function of another one, then it does not count towards \(k\) either.
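These counting rules can be written down as a small bookkeeping sketch. The `Parameter` class, the status labels and the example specification below are hypothetical and serve only to illustrate the logic; they do not correspond to any particular package.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    name: str
    # One of: "estimated", "provided", "derived", "at_boundary" (hypothetical labels)
    status: str


def count_estimated(parameters):
    """Count only the parameters that were freely estimated."""
    return sum(1 for p in parameters if p.status == "estimated")


# Hypothetical model specification
spec = [
    Parameter("b0", "estimated"),
    Parameter("b1", "provided"),        # fixed in advance, e.g. b1 = 1
    Parameter("b2", "estimated"),
    Parameter("b3", "derived"),         # a function of b2
    Parameter("b4", "at_boundary"),     # restricted and sitting at its bound
    Parameter("sigma2", "estimated"),   # counted when likelihood is used
]

k = count_estimated(spec)
print(k)
```

In this specification only `b0`, `b2` and `sigma2` contribute to \(k\).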
We will come back to the number of parameters later in this textbook, when we discuss specific models.
A final note: typically, the standard maximum likelihood estimators for the scale, shape and shift parameters are biased in small samples and do not coincide with the OLS estimators. For example, in the case of the Normal distribution, the OLS estimate of variance has \(T-k\) in the denominator, while the likelihood one has just \(T\). This needs to be taken into account when the variance is used in forecasting.
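The size of this difference can be seen in a minimal sketch. It assumes a simple linear regression on a small simulated sample, with \(k\) taken as the number of estimated coefficients (intercept and slope); the names `sigma2_ml` and `sigma2_ols` are used here for illustration only.

```python
import random

# Small simulated sample for illustration only
random.seed(7)
T = 20
x = [random.gauss(0, 1) for _ in range(T)]
y = [0.5 + 1.5 * xi + random.gauss(0, 1) for xi in x]

# Closed-form least squares fit of y = b0 + b1 * x
x_bar = sum(x) / T
y_bar = sum(y) / T
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

k = 2                          # number of estimated coefficients (assumption)
sigma2_ml = sse / T            # likelihood estimate: T in the denominator
sigma2_ols = sse / (T - k)     # OLS estimate: T - k in the denominator

print(sigma2_ml, sigma2_ols, sigma2_ols / sigma2_ml)
```

The ratio of the two estimates equals \(T/(T-k)\), so the gap matters most in small samples and vanishes as \(T\) grows.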