Multiple regression estimates average relationships between a
response (e.g. educational attainment) and predictor variables (e.g. socio-economic status,
gender, previous (baseline) ability).
The above graph illustrates a typical linear regression
relationship, in this case between outcome attainment and prior ability among a sample of
students. The red line shows that on average an increase in prior ability is associated
with an increase in outcome attainment.
A fundamental assumption of this regression model is that the
residuals (the vertical distances of the data points from the red regression line) are independent.
However, data often have a multilevel structure which violates this assumption.
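For concreteness, the single-level regression described here can be written as below. The notation is an assumption introduced for illustration and is not taken from the original text: $y_i$ is the outcome attainment of student $i$, $x_i$ is that student's prior ability, $\beta_0$ and $\beta_1$ are the intercept and slope of the red line, and $e_i$ is the residual.

```latex
y_i = \beta_0 + \beta_1 x_i + e_i, \qquad
\operatorname{E}(e_i) = 0, \qquad
\operatorname{var}(e_i) = \sigma_e^2, \qquad
\operatorname{cov}(e_i, e_{i'}) = 0 \ \text{ for } i \neq i'
```

The final condition is the independence assumption: knowing one student's residual should tell us nothing about another's. Grouping of students within schools is one way this condition can fail.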
In this example students are grouped within schools. If we
believe that the process of student selection by schools or the education given by schools
may influence outcome attainment, then two students within a particular school will tend
to be more similar than two students from different schools.
The pupils at two schools are highlighted in the above graph to
illustrate this point. If we ignore the nesting of pupils within schools - that is,
we analyse the data as though all pupils were independent - then we will tend to underestimate
the standard errors of the regression coefficients.
This problem, called "misestimated precision", means that we will tend to find too many
relationships to be statistically significant.
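The effect on standard errors can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of the data in the graph: the number of schools, pupils per school, variances and true slope are all hypothetical values chosen for illustration, and both prior ability and the residual are assumed to contain a shared school-level component. The sketch fits ordinary least squares to many simulated datasets and compares the average standard error that the regression reports with the actual spread of the slope estimates.

```python
import numpy as np

def simulate_once(rng, n_schools=30, n_pupils=20, slope=0.5):
    """Simulate one clustered dataset and fit OLS that ignores schools.

    All parameter values are hypothetical and chosen only for illustration.
    Returns the estimated slope and its naive (independence-based) standard error.
    """
    school = np.repeat(np.arange(n_schools), n_pupils)
    # Prior ability: a school-level component plus a pupil-level component.
    x = rng.normal(size=n_schools)[school] + rng.normal(size=school.size)
    # Residual: a shared school effect plus independent pupil-level noise.
    u = rng.normal(size=n_schools)[school]
    e = rng.normal(size=school.size)
    y = slope * x + u + e

    # Ordinary least squares with an intercept, treating pupils as independent.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
results = np.array([simulate_once(rng) for _ in range(500)])

# Actual variability of the slope estimate across simulated datasets.
empirical_sd = results[:, 0].std(ddof=1)
# Average standard error reported by the naive regression.
mean_naive_se = results[:, 1].mean()

print(f"empirical SD of slope estimates: {empirical_sd:.3f}")
print(f"mean naive standard error:       {mean_naive_se:.3f}")
```

Under these assumptions the reported standard error is noticeably smaller than the true variability of the estimate, which is why tests based on it reject too often.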
Generally we are interested not only in the average relationship (the red line) but in
how this relationship varies from school to school.
Multilevel modelling provides a powerful framework for exploring how average
relationships vary across hierarchical structures.
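One common way to express this idea formally is a two-level model in which both the intercept and the slope are allowed to differ between schools. The notation below is an assumption introduced for illustration, as are the normality assumptions: $y_{ij}$ and $x_{ij}$ are the outcome attainment and prior ability of pupil $i$ in school $j$, $\beta_0$ and $\beta_1$ describe the average line, $u_{0j}$ and $u_{1j}$ are school $j$'s departures from the average intercept and slope, and $e_{ij}$ is the pupil-level residual.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + e_{ij}, \\
(u_{0j}, u_{1j})^\top &\sim N(\mathbf{0}, \Omega_u), \qquad e_{ij} \sim N(0, \sigma_e^2)
\end{aligned}
```

In this sketch the covariance matrix $\Omega_u$ summarises how much the school lines vary around the average line, and the shared terms $u_{0j}$ and $u_{1j}$ are what make two pupils in the same school more alike than two pupils from different schools.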