16.3 Calculating the number of parameters in models
When performing model selection and calculating different statistics, it is important to know how many parameters were estimated in the model. While this might seem trivial, there are a number of edge cases and wrinkles that are seldom discussed in detail.
When it comes to inference based on regression models, the general idea is to calculate the number of all independent estimated parameters \(k\). This typically includes all initial components and all coefficients of the model, together with the scale, shape and shift parameters of the assumed distribution (e.g. the variance in the Normal distribution).
Example 16.1 In a simple regression model, \(y_j = \beta_0 + \beta_1 x_j + \epsilon_j\), assuming a Normal distribution for \(\epsilon_j\), using the MLE will result in the estimation of \(k=3\) parameters: the two parameters of the model (\(\beta_0\) and \(\beta_1\)) and the variance of the error term \(\sigma^2\).
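To make this more tangible, here is a minimal sketch in Python (not from the textbook; the data, starting values and use of the scipy optimiser are assumptions for illustration) that fits this model via maximum likelihood, optimising all three parameters jointly:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data for illustration only
rng = np.random.default_rng(41)
x = rng.uniform(0, 10, size=100)
y = 2 + 0.5 * x + rng.normal(0, 1.5, size=100)

def neg_log_lik(params, x, y):
    beta0, beta1, sigma = params
    # Normal log-likelihood with mean beta0 + beta1 * x and standard deviation
    # sigma (its square is the variance sigma^2 from the text)
    return -np.sum(norm.logpdf(y, loc=beta0 + beta1 * x, scale=sigma))

# All three parameters are optimised jointly, so k = 3
result = minimize(neg_log_lik, x0=[0.0, 0.0, 1.0], args=(x, y),
                  bounds=[(None, None), (None, None), (1e-6, None)])
print(result.x)       # estimates of beta0, beta1 and sigma
print(len(result.x))  # k = 3
```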
If likelihood is not used, then the number of parameters might be different. For example, if we estimate the model via the minimisation of MSE (similar to OLS), then the number of estimated parameters does not include the variance anymore - it is obtained as a by-product of the estimation. This is because the likelihood needs all the parameters of the distribution in order to be maximised, while with MSE we just minimise the mean of squared errors, and the variance of the distribution is obtained automatically. While the values of the parameters might be the same, the logic is slightly different.
Example 16.2 This means that for the same simple linear regression, estimated using OLS, the number of parameters is equal to 2: estimates of \(\beta_0\) and \(\beta_1\).
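A corresponding sketch for the MSE/OLS route (same hypothetical data-generating setup as in the sketch above): only the two coefficients enter the estimation, and the variance is computed afterwards from the residuals.

```python
import numpy as np

# Hypothetical data for illustration only
rng = np.random.default_rng(41)
x = rng.uniform(0, 10, size=100)
y = 2 + 0.5 * x + rng.normal(0, 1.5, size=100)

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # only beta0 and beta1 are estimated, k = 2
residuals = y - X @ beta
sigma2 = np.mean(residuals ** 2)             # the variance appears as a by-product
print(beta, sigma2)
```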
Remark. For the calculation of information criteria, the number of parameters in the example above should still be considered as 3 (coefficients and scale). See the explanation in Section 16.4.
In addition, restrictions on the parameters can reduce the number of estimated parameters when the parameters reach their boundary values.
Example 16.3 If we know that the parameter \(\beta_1\) lies between 0 and 1, and during the estimation it reaches the value of 1 (due to how the optimiser works), this can be considered as a restriction \(\beta_1=1\). So, when the model is estimated via the minimisation of MSE with this restriction, this would imply that \(k=1\).
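A sketch of this boundary situation (hypothetical data, deliberately generated with a true slope above the admissible bound, so that the optimiser is pushed towards \(\beta_1 = 1\)):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data with a true slope of 1.5, i.e. above the [0, 1] bound
rng = np.random.default_rng(41)
x = rng.uniform(0, 10, size=100)
y = 2 + 1.5 * x + rng.normal(0, 1, size=100)

def mse(params, x, y):
    beta0, beta1 = params
    return np.mean((y - beta0 - beta1 * x) ** 2)

result = minimize(mse, x0=[0.0, 0.5], args=(x, y),
                  bounds=[(None, None), (0, 1)])
# beta1 ends up on the boundary, acting as the restriction beta1 = 1,
# so effectively only beta0 is estimated: k = 1
print(result.x)
```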
In general, if the value of a parameter is provided to the model in advance, then it does not count towards the number of estimated parameters. So, setting \(\beta_1=1\) manually acts in the same fashion.
Finally, if a parameter is just a function of another one, then it does not count towards \(k\) either.
Example 16.4 If we know that in the same simple linear regression \(\beta_1 = \frac{\beta_0}{\sigma^2}\), then the number of parameters estimated via maximum likelihood is 2: \(\beta_0\) and \(\sigma^2\), with \(\beta_1\) derived from them.
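A sketch of this restricted model (hypothetical data, generated so that the restriction holds): the slope is computed from the two free parameters inside the likelihood, so only \(\beta_0\) and \(\sigma\) enter the optimisation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data with beta0 = 2 and sigma = 2, so that beta1 = beta0 / sigma^2 = 0.5
rng = np.random.default_rng(41)
x = rng.uniform(0, 10, size=100)
y = 2 + 0.5 * x + rng.normal(0, 2, size=100)

def neg_log_lik(params, x, y):
    beta0, sigma = params
    beta1 = beta0 / sigma ** 2   # restriction: beta1 is not a free parameter
    return -np.sum(norm.logpdf(y, loc=beta0 + beta1 * x, scale=sigma))

# Only beta0 and sigma (and thus sigma^2) are estimated, so k = 2
result = minimize(neg_log_lik, x0=[1.0, 1.0], args=(x, y),
                  bounds=[(None, None), (1e-6, None)])
print(result.x)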
We will come back to the number of parameters later in this textbook, when we discuss specific models.
A final note: typically, the standard maximum likelihood estimators of the scale, shape and shift parameters are biased in small samples and do not coincide with the OLS estimators. For example, in the case of the Normal distribution, the OLS estimate of the variance has \(n-k\) in the denominator, while the likelihood-based one has just \(n\). This needs to be taken into account when the variance is used in forecasting.
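A short numerical illustration of that difference (hypothetical data; for the sake of the example, \(k\) is taken here as the two regression coefficients of the simple model):

```python
import numpy as np

# Hypothetical small sample to make the difference between denominators visible
rng = np.random.default_rng(41)
n = 30
x = rng.uniform(0, 10, size=n)
y = 2 + 0.5 * x + rng.normal(0, 1.5, size=n)

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ beta) ** 2)

sigma2_likelihood = sse / n           # divides by n, biased in small samples
sigma2_ols = sse / (n - X.shape[1])   # divides by n - k (k = 2 coefficients here)
print(sigma2_likelihood, sigma2_ols)
```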