Longitudinal data analysis: Introduction



ITEM

The longitudinal data set

ITEM The response variable is now binary, indicating for each calendar year whether or not there was a migration move. Temporary moves of a few months' duration do not imply commitment to an area, so they are not counted as migration. Migration events are therefore recorded on an annual basis, with at most one move per year, and annual counts of moves are not used.

ITEM We can now use time-varying explanatory variables. The variables age, calendar year, duration of stay and the presence of children of secondary age in the family are recorded each year, while marital status, employment status and occupational status are recorded at the beginning and end of each year. Other explanatory variables are derived from the raw data; some indicate a change in the status variables during the year, others have been created by collapsing categories of certain variables.

ITEM We look at the marital, employment and occupational status variables both at the beginning and at the end of each year, as it may be either the original status, the destination status or a change in status during the year which influences individual migration.

ITEM It is important to distinguish between two types of explanatory variable: an endogenous explanatory variable, which is in some way a function of an earlier outcome of the process under study, and an exogenous explanatory variable, for which there is no such relationship.

ITEM In this data set duration of stay is an endogenous explanatory variable, because the number of years of residence since the last migration move is related to the timing of that move.
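The structure described in this list can be sketched in code. The following is a minimal illustration, not the procedure used to prepare the actual data set: the function name, the record fields and the example history are all hypothetical, and duration of stay is assumed to be counted in completed years at the start of each calendar year, starting from zero in the first observed year.

```python
def person_year_records(first_year, last_year, move_years, status_by_year):
    """Build one record per calendar year for a single individual.

    move_years: set of calendar years in which a migration move occurred
    status_by_year: {year: (status at start of year, status at end of year)}
    All names and the record layout are hypothetical.
    """
    records = []
    duration = 0  # assumption: completed years of residence at the start of the year
    for year in range(first_year, last_year + 1):
        start, end = status_by_year[year]
        migrated = int(year in move_years)  # binary response: at most one move per year
        records.append({
            "year": year,
            "migrated": migrated,
            "duration": duration,
            "status_start": start,
            "status_end": end,
            "status_changed": int(start != end),  # derived change-of-status indicator
        })
        # Duration of stay is reset by a move, so it depends on earlier responses.
        duration = 0 if migrated else duration + 1
    return records


# Made-up employment history for one individual.
history = person_year_records(
    first_year=1970,
    last_year=1974,
    move_years={1972},
    status_by_year={
        1970: ("employed", "employed"),
        1971: ("employed", "unemployed"),
        1972: ("unemployed", "employed"),
        1973: ("employed", "employed"),
        1974: ("employed", "employed"),
    },
)
for record in history:
    print(record)
```

The last line of the loop is the point of the sketch: the value of `duration` in any year is computed from the earlier values of `migrated`, which is what makes duration of stay endogenous.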


ITEM

Residual heterogeneity

Longitudinal data consist of repeated observations on each individual. The observations are independent between individuals, but correlated within individuals. The differences between individuals are measured by a range of explanatory variables which may differ over time. In practice not all the variables that characterize individuals are observable, and the omitted variables give rise to a residual heterogeneity.

In the cross-sectional analysis, as all explanatory variables were exogenous, the parameter estimates were consistent even though the standard Poisson model was misspecified. This consistency is lost, in cross-sectional and longitudinal analyses alike, if there are endogenous explanatory variables.

In the presence of endogenous explanatory variables, such as duration of stay, inference about temporal variation requires an explicit representation of residual heterogeneity; otherwise parameter estimates will be biased. Such a representation is only possible with longitudinal data; the problems posed by endogenous variables cannot be overcome using cross-sectional data.
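A small calculation may help to show why omitted heterogeneity distorts inference about duration of stay. The sketch below is an illustration under invented assumptions, not a result from the migration data: the population consists of two unobserved groups, each with a constant yearly migration probability, so no individual's propensity to move changes with duration. The group shares and probabilities are made-up numbers, and the function name is hypothetical.

```python
def pooled_migration_rate(duration, groups):
    """Migration rate among individuals who have stayed `duration` years.

    groups: list of (population share, constant yearly migration probability).
    Within each group the probability does not depend on duration.
    """
    # Share of each group that has not yet moved after `duration` years.
    stayers = [share * (1.0 - p) ** duration for share, p in groups]
    # Of those, the part that moves in the following year.
    movers = [remaining * p for remaining, (_share, p) in zip(stayers, groups)]
    return sum(movers) / sum(stayers)


# Hypothetical population: half with a high propensity to move, half with a low one.
groups = [(0.5, 0.30), (0.5, 0.05)]
rates = [pooled_migration_rate(d, groups) for d in range(6)]
for d, rate in enumerate(rates):
    print(d, round(rate, 4))
```

The pooled rate falls as duration increases, because those who remain are increasingly drawn from the low-propensity group. A model that ignores the grouping would attribute this decline to duration of stay itself.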


ITEM

The model

The response variable $y_{it}$ is binary, defined as 1 if individual i migrates in year t, and 0 otherwise. It has a Bernoulli probability distribution with

$$\Pr(y_{it}) = p_{it}^{y_{it}} (1 - p_{it})^{1 - y_{it}}$$

where $p_{it}$ is the probability of a migration move by individual i in year t. The relation between $p_{it}$ and the explanatory variables is made through a suitable linear predictor and the logistic link function. The inverse of this link transforms the linear predictor of explanatory variables, which may have any value between plus and minus infinity, to a probability, which necessarily lies between zero and one.
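The Bernoulli probability above can be written as a short function. This is a minimal sketch with hypothetical function names. The log-likelihood helper treats the yearly responses as independent given their probabilities, which is an assumption of the sketch and takes no account of residual heterogeneity.

```python
import math


def bernoulli_probability(y, p):
    """Probability of response y (0 or 1) when the migration probability is p."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return p ** y * (1.0 - p) ** (1 - y)


def log_likelihood(responses, probabilities):
    """Sum of log Bernoulli probabilities over a sequence of yearly responses.

    Assumption of this sketch: responses are independent given the probabilities.
    """
    return sum(
        math.log(bernoulli_probability(y, p))
        for y, p in zip(responses, probabilities)
    )


# Made-up example: one move in three years, each year with probability 0.2.
print(bernoulli_probability(1, 0.2))
print(bernoulli_probability(0, 0.2))
print(log_likelihood([1, 0, 0], [0.2, 0.2, 0.2]))
```

Setting y to 1 picks out p and setting y to 0 picks out 1 - p, which is all the exponent notation in the formula expresses.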

Using the logistic link function $\log[p_{it}/(1-p_{it})]$, the simple logistic regression model is:

$$\log\left[\frac{p_{it}}{1-p_{it}}\right] = b' x_{it}$$

where $b' x_{it} = b_0 + b_1 x_{it1} + b_2 x_{it2} + b_3 x_{it3} + b_4 x_{it4} + \dots$ is a shorthand (vector) way of denoting the linear predictor, which may contain a large number of explanatory variables.

This can be rewritten as

$$p_{it} = \frac{\exp(b' x_{it})}{1 + \exp(b' x_{it})}$$

and the model including residual heterogeneity as

$$p_{it} = \frac{\exp(b' x_{it} + e_i)}{1 + \exp(b' x_{it} + e_i)}$$

where $x_{it}$ is a vector of explanatory variables, $b$ is a vector of unknown parameters and $e_i$ is an individual-specific term summarizing the effect of the omitted variables.
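The two forms of the model can be checked numerically. The sketch below is illustrative only: the function names, the coefficient values and the values of the individual-specific term are hypothetical, and the intercept is handled by putting a leading 1.0 in the vector of explanatory variables. The code uses 1/(1 + exp(-eta)), which is algebraically the same as exp(eta)/(1 + exp(eta)).

```python
import math


def linear_predictor(b, x):
    """Sum of coefficient times explanatory variable; x[0] = 1.0 carries the intercept."""
    return sum(b_k * x_k for b_k, x_k in zip(b, x))


def migration_probability(b, x, e=0.0):
    """Probability of a move given the linear predictor plus an individual term e."""
    eta = linear_predictor(b, x) + e
    return 1.0 / (1.0 + math.exp(-eta))  # equals exp(eta) / (1 + exp(eta))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


b = [-2.0, 0.5]  # hypothetical intercept and one coefficient
x = [1.0, 1.0]   # leading 1.0 multiplies the intercept

p_plain = migration_probability(b, x)           # no heterogeneity term
p_mover = migration_probability(b, x, e=1.0)    # hypothetical high-propensity individual
p_stayer = migration_probability(b, x, e=-1.0)  # hypothetical low-propensity individual

print(p_stayer, p_plain, p_mover)
print(logit(p_plain))  # recovers the linear predictor, up to rounding
```

Two individuals with identical explanatory variables have different migration probabilities once the individual-specific term differs, and the term shifts the log odds by exactly its own value.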


The large number of possible explanatory variables in the longitudinal data set requires a pragmatic approach to model building. We first model the temporal variation.
