
| The longitudinal data set |
The response variable is now binary, indicating for each calendar year
whether or not there was a migration move. As temporary moves of a few
months' duration do not imply commitment to an area, they are not
considered as migration. Therefore migration events are recorded
on an annual basis, with at most one move per year.
We do not use annual count data.
We can now use time-varying explanatory variables.
The variables age, calendar year, duration of stay
and the presence of children of secondary age in the family
are recorded each year, while marital status, employment status and
occupational status are recorded at the beginning and end of each year.
Other explanatory variables are derived from the raw
data; some indicate a change in the status variables during the year,
others have been created by collapsing categories of certain variables.
We look at the marital, employment and occupational status variables both
at the beginning and at the end of each year, as it may be either the
original status, the destination status or a change in status during the
year which influences individual migration.
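As a minimal sketch of how such derived variables might be built, the Python fragment below takes one person-year record holding a status at the beginning and at the end of the year, and adds a change indicator together with collapsed categories. The field names (`marital_start`, `marital_end`) and the category coding are hypothetical and are not those of the data set.

```python
# Hypothetical collapsing of marital status into two categories.
COLLAPSED = {
    "single": "not married",
    "widowed": "not married",
    "divorced": "not married",
    "married": "married",
}


def derive_marital_variables(record):
    """Return a copy of a person-year record with derived marital variables."""
    derived = dict(record)
    # 1 if the status at the end of the year differs from the start, else 0.
    derived["marital_change"] = int(
        record["marital_start"] != record["marital_end"]
    )
    # Collapsed versions of the original and the destination status.
    derived["marital_start_2cat"] = COLLAPSED[record["marital_start"]]
    derived["marital_end_2cat"] = COLLAPSED[record["marital_end"]]
    return derived


# Illustrative record (invented values).
example = derive_marital_variables(
    {"person": 1, "year": 1975,
     "marital_start": "single", "marital_end": "married"}
)
```

Keeping the original status, the destination status and the change indicator side by side lets each be tried in the model in turn.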
It is important to distinguish between
two types of explanatory variable: an
endogenous explanatory variable, which is in some way a function
of an earlier outcome of the process under study, and an exogenous
explanatory variable, for which there is no such relationship.
In this data set duration of stay is an endogenous explanatory
variable, because the number of years of residence since the last
migration move is related to the timing of that move.
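The dependence of duration of stay on earlier outcomes can be made concrete with a short sketch. The function below is an illustration only: it assumes that duration is measured in completed years at the start of each year and that a move resets it to zero, which may differ from the coding used in the data set.

```python
def duration_of_stay(moves, initial_duration=0):
    """Years of residence at the start of each year.

    `moves` is a sequence of yearly 0/1 migration indicators for one
    individual; `initial_duration` is the (assumed known) duration at
    the start of the first year.
    """
    durations = []
    current = initial_duration
    for moved in moves:
        durations.append(current)
        # A move resets the clock; otherwise one more year accumulates.
        current = 0 if moved else current + 1
    return durations


# The move in the third year determines the durations that follow it.
durations = duration_of_stay([0, 0, 1, 0, 0])  # [0, 1, 2, 0, 1]
```

Every value of the explanatory variable after the first year is computed from earlier values of the response, which is what makes it endogenous.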
| Residual heterogeneity |
In the cross-sectional analysis, as all explanatory variables were exogenous, the parameter estimates were consistent even though the standard Poisson model was misspecified. This is not the case for cross-sectional or longitudinal analyses if there are endogenous explanatory variables.
In the presence of endogenous explanatory variables, such as duration of stay, inference about temporal variation requires an explicit representation of residual heterogeneity; otherwise parameter estimates will be biased. Such a representation is only possible with longitudinal data; the problems posed by endogenous variables cannot be overcome using cross-sectional data.
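A small calculation illustrates the kind of bias involved. The sketch below assumes a hypothetical population made up of two types of individual, each with a migration probability that is constant over time, so that duration of stay has no true effect for anyone. The probabilities and the equal mixing weights are invented for illustration. In the long run an individual with constant move probability p has a duration of stay of d years with probability p(1-p)^d, and this is used to weight the two types at each duration.

```python
def observed_move_rate(duration, probs, weights):
    """Migration rate among person-years with the given duration of stay.

    Each type k has a constant move probability probs[k] and makes up
    a share weights[k] of the population.
    """
    # Share of person-years at this duration contributed by each type.
    share = [w * p * (1 - p) ** duration for p, w in zip(probs, weights)]
    # Expected moves among those person-years.
    moves = [s * p for s, p in zip(share, probs)]
    return sum(moves) / sum(share)


probs = [0.05, 0.40]    # hypothetical "settled" and "mobile" types
weights = [0.5, 0.5]    # hypothetical population shares
rates = [observed_move_rate(d, probs, weights) for d in range(6)]
```

The values in `rates` fall steadily as duration increases, because long durations are dominated by the less mobile type. A model that ignores the heterogeneity would attribute this decline to duration of stay itself.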
| The model |
| $\Pr(y_{it}) = p_{it}^{y_{it}} (1-p_{it})^{1-y_{it}}$ |
where $y_{it}$ equals 1 if individual $i$ makes a migration move in year $t$ and 0 otherwise, and $p_{it}$ is the probability of such a move. The relation between $p_{it}$ and the explanatory variables is made through a suitable linear predictor and the logistic link function. The inverse of the link transforms the linear predictor, which may take any value between minus and plus infinity, into a probability, which necessarily lies between zero and one.
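The binary probability above can be evaluated directly. The following sketch computes it for a single person-year and sums the logarithms over one individual's years. Treating the years as independent given the probabilities is an assumption of this sketch, and the responses and probabilities are invented for illustration.

```python
import math


def bernoulli_probability(y, p):
    """Probability of the binary outcome y (0 or 1) when Pr(y = 1) = p."""
    return p ** y * (1 - p) ** (1 - y)


def log_likelihood(ys, ps):
    """Sum of log-probabilities over person-years.

    Assumes the yearly outcomes are independent given the probabilities.
    """
    return sum(
        math.log(bernoulli_probability(y, p)) for y, p in zip(ys, ps)
    )


ys = [0, 0, 1, 0]            # hypothetical yearly migration indicators
ps = [0.1, 0.1, 0.3, 0.2]    # hypothetical yearly move probabilities
ll = log_likelihood(ys, ps)
```

A year with a move contributes log p and a year without a move contributes log(1-p), so the exponents in the formula simply select the relevant term.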
Using the logistic link function $\log[p_{it}/(1-p_{it})]$, the simple logistic regression model is:
| $\log[p_{it}/(1-p_{it})] = b' x_{it}$ |
where
| $b' x_{it} = b_0 + b_1 x_{it1} + b_2 x_{it2} + b_3 x_{it3} + b_4 x_{it4} + \dots$ |
Here $b' x_{it}$ is a shorthand (vector) way of denoting the linear predictor,
which may contain a large number of explanatory variables.
This can be rewritten as
| $p_{it} = \exp(b' x_{it}) / [1+\exp(b' x_{it})]$ |
and the model including residual heterogeneity as
| $p_{it} = \exp(b' x_{it}+e_i) / [1+\exp(b' x_{it}+e_i)]$ |
where $x_{it}$ is a vector of explanatory variables,
$b$ is a vector of unknown parameters (with $b'$ its transpose)
and $e_i$ is an individual-specific term summarizing the effect of
the omitted variables.
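The two forms of the model can be compared numerically. The sketch below evaluates the migration probability from a linear predictor, with an optional individual-specific term. The coefficient values, the single explanatory variable and the values chosen for the individual-specific term are all hypothetical.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to a probability: exp(eta) / (1 + exp(eta))."""
    # Algebraically equivalent form; adequate for moderate values of eta.
    return 1.0 / (1.0 + math.exp(-eta))


def migration_probability(b, x, e=0.0):
    """Probability of a move given coefficients b, covariates x and term e.

    With e = 0 this is the simple logistic regression model; a non-zero
    e shifts the linear predictor for one individual in every year.
    """
    eta = sum(bj * xj for bj, xj in zip(b, x)) + e
    return inverse_logit(eta)


b = [-2.0, 0.5]    # hypothetical intercept and one coefficient
x = [1.0, 1.0]     # leading 1.0 multiplies the intercept

p_average = migration_probability(b, x)           # no individual term
p_mobile = migration_probability(b, x, e=1.0)     # more migration-prone
p_settled = migration_probability(b, x, e=-1.0)   # less migration-prone
```

Two individuals with identical explanatory variables can therefore have different migration probabilities in every year, which is the residual heterogeneity that the individual-specific term is meant to capture.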