Summary Points for Chapter 1

• Often, data have a clustered (panel or tabular) structure. Classical statistics assumes that observations are independent and identically distributed (iid); applied to clustered data, this assumption may lead to false results. In contrast, the mixed effects model treats clustered data adequately and assumes two sources of variation: within clusters and between clusters. Two types of coefficients are distinguished in the mixed model: population-averaged and cluster- (or subject-) specific. The former have the same meaning as in classical statistics, but the latter are random and are estimated as posterior means.
• The linear mixed effects (LME) model may be viewed as a generalization of the variance component (VARCOMP) and regression analysis models. When the number of clusters is small and the number of observations per cluster is large, we treat the cluster-specific coefficients as fixed, and ordinary regression analysis with dummy variables applies, as in the ANOVA model. Such a model is called a fixed effects model. Conversely, when the number of clusters is large but the number of observations per cluster is relatively small, a random effects model is more adequate; then the cluster-specific coefficients are treated as random.
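The two sources of variation described above can be made concrete with a small simulation. The sketch below (numpy only; all parameter values are illustrative, not from the text) generates clustered data under a random-intercept model and recovers the between-cluster and within-cluster variance components by the classical one-way ANOVA method of moments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate clustered data: y_ij = mu + b_i + e_ij, where
# b_i ~ N(0, sigma_b^2) is the between-cluster (random) effect and
# e_ij ~ N(0, sigma_e^2) is the within-cluster error.
n_clusters, n_per = 200, 10
mu, sigma_b, sigma_e = 5.0, 2.0, 1.0        # illustrative values
b = rng.normal(0, sigma_b, n_clusters)
y = mu + b[:, None] + rng.normal(0, sigma_e, (n_clusters, n_per))

# One-way ANOVA (method-of-moments) estimates of the two variance components.
cluster_means = y.mean(axis=1)
grand_mean = y.mean()
ms_within = ((y - cluster_means[:, None]) ** 2).sum() / (n_clusters * (n_per - 1))
ms_between = n_per * ((cluster_means - grand_mean) ** 2).sum() / (n_clusters - 1)

sigma_e2_hat = ms_within                         # estimates sigma_e^2 = 1
sigma_b2_hat = (ms_between - ms_within) / n_per  # estimates sigma_b^2 = 4

print(sigma_e2_hat, sigma_b2_hat)
```

With many clusters and few observations per cluster, as in the random effects setting above, both components are recovered well; with few clusters one would instead fit cluster dummies as fixed effects.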
• The mixed model technique is a child of the marriage between the frequentist and Bayesian approaches. As in the Bayesian approach, the mixed model is specified hierarchically, with parameters assumed random. Unlike the Bayesian approach, however, the hyperparameters are estimated from the data, as in the frequentist approach. One still has to choose a prior distribution, but that distribution may contain unknown parameters that are estimated from the data.
• Penalized likelihood is frequently used to cope with parameter multidimensionality. We show that the penalized likelihood may be derived from a mixed model as an approximation to the marginal likelihood after applying the Laplace approximation. Moreover, the penalty coefficient, often derived from a heuristic procedure, is estimated by maximum likelihood as an ordinary parameter.
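The point that the penalty coefficient need not be chosen heuristically can be illustrated in the simplest linear-Gaussian case, where the marginal likelihood is available in closed form and no Laplace approximation is needed. In the sketch below (numpy only; all sizes and parameter values are illustrative), ridge regression is written as a model with random coefficients, and the variance ratio that plays the role of the penalty coefficient is estimated by maximizing the marginal likelihood over a grid.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ridge regression as a model with random coefficients:
#   y = X beta + e,  e ~ N(0, sigma2 I),  beta ~ N(0, tau2 I).
# The ridge penalty coefficient is lambda = sigma2 / tau2.  Marginally,
# y ~ N(0, tau2 X X' + sigma2 I), so tau2 can be estimated by maximum
# likelihood instead of being picked heuristically.  sigma2 is taken as
# known here to keep the sketch short.
n, p, sigma2, tau2_true = 200, 40, 1.0, 0.25
X = rng.normal(size=(n, p))
beta = rng.normal(0, np.sqrt(tau2_true), p)
y = X @ beta + rng.normal(0, np.sqrt(sigma2), n)

def marginal_loglik(tau2):
    cov = tau2 * X @ X.T + sigma2 * np.eye(n)
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + y @ np.linalg.solve(cov, y))

grid = np.logspace(-3, 1, 200)
tau2_hat = grid[np.argmax([marginal_loglik(t) for t in grid])]
lambda_hat = sigma2 / tau2_hat   # data-driven penalty coefficient
print(tau2_hat, lambda_hat)
```

In the general (non-Gaussian) case the marginal likelihood has no closed form, and the Laplace approximation mentioned above yields the penalized likelihood with the same data-driven penalty coefficient.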
• The Akaike information criterion (AIC) is used to compare statistical models and to choose the most informative one. The AIC has the form of a penalized log-likelihood with the penalty equal to the dimension of the parameter vector. A drawback of the AIC is that it does not penalize ill-posed statistical problems, as in the case of multicollinearity among explanatory variables in linear regression. We develop a healthy AIC (HAIC) that copes with ill-posedness as well, because its penalty term involves the average length of the parameter vector. Consequently, among models with the same log-likelihood value and number of parameters, the HAIC chooses the model with the shortest parameter vector.
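The failure of the AIC under multicollinearity is easy to demonstrate. In the sketch below (numpy only), two identical columns make infinitely many coefficient vectors give exactly the same fit, so the AIC cannot distinguish them; a length-penalized criterion does. The criterion shown is only a schematic stand-in for the book's HAIC (whose exact form is given in the text), and the weight `kappa` is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(2)

# Perfectly collinear design: two identical columns, an ill-posed problem.
n = 50
x = rng.normal(size=n)
X = np.column_stack([x, x])
y = x + rng.normal(0, 0.5, n)

def aic(beta):
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * len(beta)

beta_short = np.array([1.0, 0.0])
beta_long = np.array([500.0, -499.0])      # same fitted values as beta_short
assert np.allclose(X @ beta_short, X @ beta_long)

def length_penalized(beta, kappa=0.01):    # kappa chosen for illustration only
    return aic(beta) + kappa * beta @ beta

print(aic(beta_short), aic(beta_long))                       # (nearly) equal
print(length_penalized(beta_short), length_penalized(beta_long))
```

Both candidates have the same log-likelihood and the same number of parameters, so the AIC is silent; the length penalty singles out the vector of shortest length, which is the behavior the HAIC formalizes.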
• Since the mixed model leads naturally to penalized likelihood, it can be applied to penalized smoothing and polynomial fitting. Importantly, the difficult problem of penalty coefficient selection is solved by the mixed model technique, which estimates this coefficient from the data. In penalized smoothing, we restrain the parameters through the bending energy; in polynomial fitting, through the second derivative.
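A minimal version of penalized smoothing uses a discrete second-difference penalty as the bending energy. The sketch below (numpy only; the signal, noise level, and the value of `lam` are illustrative, and in the mixed-model approach `lam` would itself be estimated from the data) solves the penalized least-squares problem in closed form.

```python
import numpy as np

rng = np.random.default_rng(3)

# Penalized smoothing: minimize ||y - f||^2 + lam * ||D f||^2, where D is
# the second-difference matrix, so ||D f||^2 discretizes the bending energy
# (integrated squared second derivative).  Solution: (I + lam D'D) f = y.
n = 100
t = np.linspace(0, 1, n)
truth = np.sin(2 * np.pi * t)
y = truth + rng.normal(0, 0.3, n)

D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
lam = 50.0                            # illustrative; mixed model estimates it
f_hat = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# The smoothed curve is closer to the true signal than the raw data.
print(np.mean((y - truth) ** 2), np.mean((f_hat - truth) ** 2))
```

The smoother damps high-frequency noise while leaving the slowly varying signal nearly untouched, which is exactly the trade-off the penalty coefficient controls.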
• The mixed model copes with parameter multidimensionality. For example, if a statistical model contains a large number of parameters, one may assume that a priori the parameters have zero mean and unknown variance. Estimating this variance from the data and applying the Laplace approximation, we arrive at the penalized log-likelihood. We illustrate this approach with a dietary problem in conjunction with logistic regression, where the number of food items consumed may be large.
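For logistic regression, the zero-mean prior with common variance leads (after the Laplace approximation) to maximizing loglik(beta) - 0.5*lambda*||beta||^2. The sketch below (numpy only; the data are simulated, not the book's dietary data, and `lam` is fixed rather than estimated as the mixed-model approach would do) maximizes this penalized log-likelihood by Newton-Raphson.

```python
import numpy as np

rng = np.random.default_rng(4)

# Many covariates (think: many food items), moderate sample size.
n, p, lam = 300, 20, 1.0
X = rng.normal(size=(n, p))
beta_true = rng.normal(0, 0.5, p)
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# Newton-Raphson for the penalized log-likelihood
#   l(beta) - 0.5 * lam * ||beta||^2.
beta = np.zeros(p)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))          # fitted probabilities
    grad = X.T @ (y - mu) - lam * beta        # penalized score
    W = mu * (1 - mu)
    hess = X.T @ (X * W[:, None]) + lam * np.eye(p)
    beta = beta + np.linalg.solve(hess, grad)

print(np.corrcoef(beta, beta_true)[0, 1])     # shrunken but informative estimate
```

The penalty keeps the Hessian well conditioned even when p is large relative to n, which is the practical payoff of the mixed-model formulation.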
• Tikhonov regularization aims to replace an ill-posed problem with a well-posed problem by adding a quadratic penalty term. However, selection of the penalty coefficient is a problem. Although Tikhonov regularization receives a nice statistical interpretation in the Bayesian framework, the problem of the penalty coefficient remains. A nonlinear mixed model estimates the penalty coefficient from the data along with the parameter of interest.
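The effect of the quadratic penalty is easiest to see on a nearly collinear system. The sketch below (numpy only; the design, noise level, and the value of `lam` are illustrative, and it is precisely this `lam` that the mixed-model approach would estimate from the data) contrasts the exploding least-squares solution with the stable Tikhonov-regularized one.

```python
import numpy as np

rng = np.random.default_rng(5)

# Ill-conditioned design: the last column nearly duplicates the first.
n, p = 50, 10
X = rng.normal(size=(n, p))
X[:, -1] = X[:, 0] + 1e-6 * rng.normal(size=n)
beta_true = np.ones(p)
y = X @ beta_true + rng.normal(0, 0.1, n)

# Ordinary least squares: (X'X)^{-1} X'y -- unstable here.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Tikhonov regularization: (X'X + lam I)^{-1} X'y -- stable.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

The least-squares coefficients of the two collinear columns blow up in opposite directions, while the penalized solution remains of moderate length near the truth.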
• Computerized tomography (CT) reconstructs an image from projections and belongs to the family of linear image reconstruction methods. Since the number of image pixels is close to the number of observations, CT leads to an ill-posed problem. To obtain a well-posed problem, a priori assumptions on the reconstructed image should be taken into account. We show that a mixed model may accommodate various prior assumptions without complete specification of the prior distribution.
• Positron emission tomography (PET) uses the Poisson regression model for image reconstruction and the EM algorithm for likelihood maximization. Little statistical hypothesis testing has been reported, perhaps because the EM algorithm does not produce the covariance matrix of the image. Fisher scoring or unit-step algorithms are much faster and allow computation of the covariance matrix needed for hypothesis testing, such as whether two images in the region of interest are the same. To cope with ill-posedness, Bayesian methods and penalized likelihood methods have been widely applied. The generalized linear mixed model (GLMM), studied extensively in Chapter 7, also follows the line of the Bayesian approach, but enables estimation of the regularization parameter from the PET data. A multilevel GLMM can combine repeated PET measurements and process them simultaneously, increasing statistical power substantially.
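The EM algorithm referred to above is, for the Poisson model, the classical MLEM multiplicative update. The sketch below (numpy only; the system matrix, image size, and intensities are all toy choices, not a realistic scanner geometry) runs it on a tiny simulated problem.

```python
import numpy as np

rng = np.random.default_rng(6)

# PET sketch: counts y_i ~ Poisson((A lam)_i), with A the system (projection)
# matrix and lam the pixel intensities.  The MLEM update is
#   lam_j <- (lam_j / s_j) * sum_i A_ij * y_i / (A lam)_i,  s_j = sum_i A_ij.
m, p = 1000, 25                      # many projections, a small 5x5 "image"
A = rng.uniform(0, 1, size=(m, p))   # toy system matrix
lam_true = rng.uniform(5, 20, p)
y = rng.poisson(A @ lam_true)

s = A.sum(axis=0)
lam = np.ones(p)                     # positive start; MLEM preserves positivity
for _ in range(200):
    ratio = y / np.maximum(A @ lam, 1e-12)
    lam = lam / s * (A.T @ ratio)

print(np.corrcoef(lam, lam_true)[0, 1])
```

Each iteration multiplies the current image by a back-projected data/model ratio; this is why negativity never arises, but also why EM alone yields no covariance matrix, in contrast to Fisher scoring.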
• The mixed model is well suited for the analysis of biological data when, on the one hand, observations belong to the same biological category (e.g., the maple leaf), but on the other hand, individuals differ. Consequently, there are two sources of variation: variation between individuals (intersubject variance) and variation within an individual (intrasubject variance). The common biological type corresponds to population-averaged parameters, and individuality corresponds to subject-specific parameters. Shape is the simplest biological characteristic. Its analysis is complicated by the fact that shapes may be rotated and translated arbitrarily. Several mixed models for shape analysis are discussed in Chapter 11.
• Image science enables us to derive large data sets of repeated structure; thus application of a repeated measurements model, such as a mixed model, seems natural. Until now, image comparison in medicine has been subjective, based on "eyeball" evaluation of a few images (often, just a couple). Statistical thinking in image analysis is generally poor. For example, a proper DNA Western blot image evaluation should be based on several tissue samples analyzed by a multilevel mixed model.
• Mixed models can be applied to statistical image analysis, particularly to analyze an ensemble of images (see Chapter 12). As with shape analysis, two sources of variation are considered: within-image and between-image variation. Since an image may be described as a large matrix, we may treat each element as a nonlinear function of its index and apply the nonlinear mixed effects model of Chapter 6. The mixed model can also be applied to study the motion of fuzzy objects such as clouds.