Longitudinal data analysis: Temporal variation


As a first step we model the temporal variation, fitting models both with and without residual heterogeneity and comparing them.


Temporal variation

The dynamic characteristics of the data are described by the three temporal explanatory variables: age, year, and duration of stay. Cohort effects are subsumed in the year and age components. Alternatively, it would be possible to reparameterise the model so that age and cohort rather than age and year effects are estimated. This would not affect the goodness of fit of the model.

- Year effects are caused by external economic and social changes generating fluctuations in aggregate migration over time.

- The variation of migration propensity with age is related to life cycle factors, such as marriage and children, and to career progression.

- Duration of stay is a proxy variable for the many social, community and economic ties which strengthen with length of residence. It is a measure of cumulative inertia, which may compound the variation of migration propensity with age. (See McGinnis, 1968; Huff and Clark, 1978.)
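
The remark that cohort effects are subsumed in the year and age components can be illustrated with a short identity. This is only a sketch for linear terms: the coefficient symbols \beta_a and \beta_y are illustrative and do not appear in the original, and cohort is assumed to mean year of birth, so that cohort = year - age.

```latex
\begin{aligned}
\text{cohort} &= \text{year} - \text{age},\\
\beta_a\,\text{age} + \beta_y\,\text{year}
  &= (\beta_a + \beta_y)\,\text{age} + \beta_y\,(\text{year} - \text{age})\\
  &= (\beta_a + \beta_y)\,\text{age} + \beta_y\,\text{cohort}.
\end{aligned}
```

Both forms give the same linear predictor for every observation, which is why the goodness of fit is unchanged by the reparameterisation.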


What functions of these explanatory variables are appropriate to use in the model?
We first explore the data to find a suitable starting point for model building.


The age effect

As a first step, it is helpful to examine how the empirical mean migration rate varies with age. The mean migration rate is calculated by dividing the total number of moves by the total number of years of migration opportunity for each distinct age.

The graph shows a clear peak around age 20, some evidence of another peak at about age 30, and at least two peaks close to each other just under age 50. The latter peaks could be the result of random fluctuations, because the data are sparser at these ages.

It must be noted that there are no controls for other temporal variables in this graph. Nevertheless, there is evidence that the variation with age is multimodal (i.e. has several peaks). This suggests using a polynomial representation of age in the models.
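
The empirical mean migration rate described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes that each record is one person-year of migration opportunity with a 0/1 move indicator, and the function name and the toy records are hypothetical (they are not taken from rochmig.dat).

```python
from collections import defaultdict


def mean_migration_rate_by_age(records):
    """Return {age: moves / years at risk} for each distinct age.

    ASSUMPTION: `records` is an iterable of (age, move) pairs, one per
    person-year of migration opportunity, with move equal to 0 or 1.
    """
    moves = defaultdict(int)   # total number of moves at each age
    years = defaultdict(int)   # total person-years observed at each age
    for age, move in records:
        moves[age] += move
        years[age] += 1
    return {age: moves[age] / years[age] for age in sorted(years)}


# Made-up person-year records for illustration only.
toy_records = [(20, 1), (20, 0), (20, 1), (21, 0), (21, 0), (21, 1), (21, 0)]
rates = mean_migration_rate_by_age(toy_records)
for age, rate in rates.items():
    print(age, round(rate, 3))
```

Plotting these rates against age gives the kind of graph discussed above; ages with few person-years will give noisy rates.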




Modelling age, year and duration of stay as categorical variables

To explore how the migration rate varies with the three temporal variables, we split each variable into distinct categories, in such a way that we have a reasonable number of data points within each category. Thus the categories usually span five years, but are longer where the data are sparse near the edge of the data window. We fit the logistic model using these categories as levels of factors.

For age we choose cut-off points 20, 25, 30, 35, 40 and 45 years, so that the lowest category represents an age of less than 20 and the highest an age greater than 45. The cut-off points for year will be 55, 60, 65, 70, 75 and 80, and for duration of stay 5, 10, 15, 20, 25 and 30 years.
The model may be fitted using SABRE software as follows:

SABRE SESSION: INPUT AND OUTPUT
	  
data case move age year dur ed ch1 ch2 ch3 ch4 msb mse esb ese &    
osb ose mbu mrm mfm msb1 epm eoj esb1 ops osb1 msb2 esb2 osb2 osb3     
read rochmig.dat
                           
       6349 observations in dataset
                                    
yvar move                 
C convert variables to factors using the following    
C cut-off points                 
factor age agegp 20 25 30 35 40 45        
factor dur durgp 5 10 15 20 25 30       
factor year yeargp 55 60 65 70 75 80    
lfit int agegp  yeargp durgp     

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2968.3684        5833.    
        3           2335.8507        632.5    
        4           2208.6718        127.2    
        5           2187.8156        20.86    
        6           2185.1153        2.700    
        7           2184.8380       0.2772    
        8           2184.8279       0.1014E-01
        9           2184.8278       0.2246E-04
 
dis est               

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                    -2.1704          0.23184    
    agegp ( 1)                  0.          ALIASED [I]
    agegp ( 2)              1.1042          0.16933    
    agegp ( 3)             0.73531          0.21522    
    agegp ( 4)              1.2723          0.23824    
    agegp ( 5)              1.0235          0.32081    
    agegp ( 6)              1.0312          0.42478    
    agegp ( 7)              1.5378          0.51473    
    yeargp( 1)                  0.          ALIASED [I]
    yeargp( 2)            -0.37839E-01      0.27795    
    yeargp( 3)            -0.50404          0.28618    
    yeargp( 4)            -0.74076          0.28944    
    yeargp( 5)            -0.47078          0.27593    
    yeargp( 6)            -0.86073          0.28758    
    yeargp( 7)             -1.1719          0.28593    
    durgp ( 1)                  0.          ALIASED [I]
    durgp ( 2)             -1.4236          0.15918    
    durgp ( 3)             -1.9089          0.25098    
    durgp ( 4)             -2.6716          0.38781    
    durgp ( 5)             -4.1664           1.0210    
    durgp ( 6)             -2.9408          0.77358    
    durgp ( 7)             -3.0448           1.1063    
 
stop                                           
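
The grouping performed by the factor commands in the session above can be mimicked outside SABRE as follows. This is a hedged sketch: the function name to_level is hypothetical, and the treatment of values exactly equal to a cut-off (placed here in the upper category) is an assumption, since the text does not state which side of the boundary such values fall on.

```python
from bisect import bisect_right

# Cut-off points quoted in the text.
AGE_CUTS = [20, 25, 30, 35, 40, 45]
YEAR_CUTS = [55, 60, 65, 70, 75, 80]
DUR_CUTS = [5, 10, 15, 20, 25, 30]


def to_level(value, cuts):
    """Map a value to a factor level 1..len(cuts)+1.

    ASSUMPTION: a value equal to a cut-off goes into the upper category;
    SABRE's own boundary convention may differ.
    """
    return bisect_right(cuts, value) + 1


# Hypothetical observation: age 27, year 62, duration of stay 3 years.
print(to_level(27, AGE_CUTS), to_level(62, YEAR_CUTS), to_level(3, DUR_CUTS))
```

Six cut-off points give seven levels per variable, matching the seven agegp, yeargp and durgp estimates displayed by SABRE.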
	




Results and conclusion

1 The parameter estimate of the intercept term refers to the lowest category of each categorical variable; the estimates for the higher levels give the contrasts between those categories and this reference level. The estimates for level 1 of each variable are therefore set to zero (and are said to be aliased).
2 Examination of the parameter estimates gives an indication of how the migration rate varies from category to category, when all three temporal variables are controlled for. For clarity the results are displayed on graphs.
3 The parameter estimates for age go up and down, rising three times as we go from category 1 to category 7 (Figure 1). This suggests including age in the model as a sixth-order polynomial. We note that the age effect is likely to be better estimated at the lower ages than at the higher ages, because the data are sparse for the older age groups.
4 For year there is a downward trend in parameter estimates, but with a small increase at category five (Figure 2). This may be a consequence of sparsity of data or it may show a real trend for these years. To allow for this rise and fall, we shall include year as a third-order polynomial.
5 As duration of stay increases, there is a general downward trend in parameter estimates; however, the trend is not quite linear (Figure 3). The fluctuations at durations above 25 years may be due to sparsity of data. Plotting the parameter estimates against log duration (Figure 4) gives a more linear plot. This suggests trying this variable as either a linear or a logarithmic function.
6 From the parameter estimates we can calculate how the probability of migration varies with each of the explanatory variables for fixed values of the other two variables. Figure 5 illustrates the variation of the probability of migration with age in 1985 with duration of stay set to 10 years. Similar graphs may be plotted for the other variables.
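
The calculation in point 6 can be sketched as follows, using the estimates displayed in the SABRE session. The function name is hypothetical, and the choice of year category 7 and duration category 3 in the usage lines is an assumption made for illustration; it is not claimed to reproduce Figure 5 exactly.

```python
import math

# Estimates copied from the SABRE output; level 1 of each factor is the
# aliased reference level with estimate 0.
INTERCEPT = -2.1704
AGEGP = [0.0, 1.1042, 0.73531, 1.2723, 1.0235, 1.0312, 1.5378]
YEARGP = [0.0, -0.037839, -0.50404, -0.74076, -0.47078, -0.86073, -1.1719]
DURGP = [0.0, -1.4236, -1.9089, -2.6716, -4.1664, -2.9408, -3.0448]


def migration_probability(age_level, year_level, dur_level):
    """Fitted probability of a move for one combination of factor levels (1..7)."""
    eta = (INTERCEPT
           + AGEGP[age_level - 1]
           + YEARGP[year_level - 1]
           + DURGP[dur_level - 1])
    return 1.0 / (1.0 + math.exp(-eta))  # inverse of the logit link


# Vary the age category while holding the other two fixed.
# ASSUMPTION: year category 7 and duration category 3 are illustrative choices.
probs = [migration_probability(a, 7, 3) for a in range(1, 8)]
for level, p in enumerate(probs, start=1):
    print(level, round(p, 4))
```

With all three variables at their reference level, the probability depends on the intercept alone; the other estimates shift the log-odds relative to that baseline.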


Therefore the starting point for model building will be the following model:

age + age^2 + age^3 + age^4 + age^5 + age^6 + year + year^2 + year^3 + dur [or alternatively + log(dur)].
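
Written out as a linear predictor for the logistic model, the starting model takes the following form. The coefficient symbols (\beta_0, \alpha_k, \gamma_k, \delta) and the subscripts i for individual and t for year of observation are notation assumed here for illustration; they do not appear in the original.

```latex
\operatorname{logit}(p_{it}) = \log\frac{p_{it}}{1 - p_{it}}
  = \beta_0
  + \sum_{k=1}^{6} \alpha_k\,\mathrm{age}_{it}^{\,k}
  + \sum_{k=1}^{3} \gamma_k\,\mathrm{year}_{it}^{\,k}
  + \delta\,\mathrm{dur}_{it}
```

Here p_{it} is the probability that individual i moves in year t. In the alternative specification, the last term is replaced by \delta \log(\mathrm{dur}_{it}).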

