Longitudinal data analysis: A parsimonious main effects model for temporal data

Model building strategy

In the first instance, we aim to find a parsimonious main effects model for the temporal variables. Using the results of our initial exploratory analysis we start by fitting the simple logistic model and comparing the fits of the following linear predictors:

age + age^2 + age^3 + age^4 + age^5 + age^6 + year + year^2 + year^3 + dur

and

age + age^2 + age^3 + age^4 + age^5 + age^6 + year + year^2 + year^3 + log(dur)

We choose the better fitting model, and then fit a series of simple logistic models using a backward elimination technique. At each step we test whether removing the least significant explanatory variable (the one with the smallest absolute t-ratio) gives a significant deterioration in the model fit. If removing an explanatory variable increases the deviance by less than 3.84, i.e. the 5% critical value of χ²(1), we exclude it from the model; otherwise it is retained.
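The 3.84 threshold can be checked directly. The following is a minimal sketch in Python (not the software used in this analysis; the function names `deviance_p_value` and `keep_term` are hypothetical). It relies on the identity P(χ²(1) > x) = erfc(√(x/2)), which holds only for one degree of freedom, so it applies to dropping a single term at a time.

```python
import math

def deviance_p_value(deviance_increase):
    """P-value for an increase in deviance on 1 degree of freedom.

    Uses P(chi2(1) > x) = erfc(sqrt(x / 2)); valid for 1 df only.
    """
    if deviance_increase < 0:
        raise ValueError("deviance increase must be non-negative")
    return math.erfc(math.sqrt(deviance_increase / 2.0))

def keep_term(deviance_increase, critical_value=3.84):
    """Backward-elimination rule: retain the term if dropping it
    raises the deviance by at least the critical value."""
    return deviance_increase >= critical_value

# The p-value at the threshold should be close to 0.05.
print(round(deviance_p_value(3.84), 3))
```

An increase in deviance below 3.84 therefore corresponds to a p-value above roughly 0.05, and the term is dropped.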


Sabre analysis

SABRE SESSION:INPUT AND OUTPUT
  
data case move age year dur ed ch1 ch2 ch3 ch4 msb mse esb ese &
osb ose mbu mrm mfm msb1 epm eoj esb1 ops osb1 msb2 esb2 osb2 osb3             
read rochmig.dat
                                                            
       6349 observations in dataset
                                                         
yvar move   
transform age2 age * age                     
transform age3 age2 * age   
transform age4 age3 * age   
transform age5 age4 * age             
transform age6 age5 * age         
transform ldur log dur         
transform year2 year * year        
transform year3 year2 * year      
lfit int dur year year2 year3 age age2 age3 age4 age5 age6     

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2993.0684        5809.    
        3           2373.6995        619.4    
        4           2231.7859        141.9    
        5           2195.4927        36.29    
        6           2190.2053        5.287    
        7           2190.0373       0.1680    
        8           2190.0367       0.6502E-03
        9           2190.0367       0.1007E-06
 
dis est                    

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                    -62.752           32.990    
    dur                   -0.20904          0.17902E-01
    year                  -0.53834          0.94139    
    year2                  0.71197E-02      0.14008E-01
    year3                 -0.33336E-04      0.68744E-04
    age                     11.740           4.8338    
    age2                  -0.70751          0.33134    
    age3                   0.20615E-01      0.10996E-01
    age4                  -0.29015E-03      0.17681E-03
    age5                   0.15811E-05      0.11036E-05
    age6                        0.          ALIASED [E]

C Extrinsic aliasing has occurred for age6.
C Fitting high order polynomials can often cause numerical problems.
C An option is to lower the tolerance for aliasing from the default value.
C As the parameter estimates for the higher order terms are very small
C We choose to transform 'age' to 'trage'=(age-30)/10, roughly 
C standardising this variable. 
C This is done in two stages.
transform tempage age - 30                 
transform trage tempage / 10      
transform trage2 trage * trage    
transform trage3 trage2 * trage       
transform trage4 trage3 * trage 
transform trage5 trage4 * trage    
transform trage6 trage5 * trage   
lfit int dur year year2 year3 trage trage2 trage3 trage4 trage5 trage6    

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2992.7095        5809.    
        3           2373.0803        619.6    
        4           2230.9609        142.1    
        5           2193.9097        37.05    
        6           2187.7970        6.113    
        7           2187.2527       0.5443    
        8           2187.2013       0.5138E-01
        9           2187.2004       0.8804E-03
       10           2187.2004       0.3062E-06
 
                                       
dis est                       

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                     12.878           20.980    
    dur                   -0.20936          0.17929E-01
    year                  -0.53826          0.94677    
    year2                  0.70876E-02      0.14085E-01
    year3                 -0.33068E-04      0.69111E-04
    trage                  0.36390          0.32000    
    trage2                -0.31495E-02      0.58966    
    trage3                -0.56019          0.51877    
    trage4                 0.28100          0.54056    
    trage5                 0.43264          0.20575    
    trage6                -0.22748          0.14640    

C now try log(duration) instead of duration 
lfit int ldur year year2 year3 trage trage2 trage3 trage4 trage5 trage6

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2959.3492        5842.    
        3           2315.9106        643.4    
        4           2186.2580        129.7    
        5           2169.6448        16.61    
        6           2168.1606        1.484    
        7           2167.8240       0.3366    
        8           2167.7919       0.3208E-01
        9           2167.7916       0.3470E-03
       10           2167.7916       0.4665E-07
 
dis est               

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                     12.117           21.298    
    ldur                   -1.0483          0.72564E-01
    year                  -0.49783          0.96044    
    year2                  0.65403E-02      0.14278E-01
    year3                 -0.30640E-04      0.70011E-04
    trage                  0.23216          0.32332    
    trage2                -0.11755          0.59711    
    trage3                -0.80204          0.52563    
    trage4                 0.38544          0.55272    
    trage5                 0.58007          0.20935    
    trage6                -0.29310          0.15118  
                                               
C the model fits better with ldur
C start backward elimination using this model
C remove the highest polynomial term for year  
lfit -year3                          

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2959.3891        5842.    
        3           2315.9678        643.4    
        4           2186.3817        129.6    
        5           2169.8304        16.55    
        6           2168.3512        1.479    
        7           2168.0149       0.3363    
        8           2167.9828       0.3205E-01
        9           2167.9825       0.3473E-03
       10           2167.9825       0.4688E-07
 
dis est                          

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                     2.8950           3.3215    
    ldur                   -1.0489          0.72558E-01
    year                  -0.79215E-01      0.96845E-01
    year2                  0.29616E-03      0.70291E-03
    trage                  0.24580          0.32189    
    trage2                -0.12526          0.59701    
    trage3                -0.80970          0.52543    
    trage4                 0.38969          0.55254    
    trage5                 0.57874          0.20935    
    trage6                -0.29289          0.15113    
lfit -year2                               

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2960.7613        5841.    
        3           2317.1450        643.6    
        4           2186.5548        130.6    
        5           2170.0008        16.55    
        6           2168.5289        1.472    
        7           2168.1916       0.3373    
        8           2168.1594       0.3224E-01
        9           2168.1590       0.3511E-03
       10           2168.1590       0.4787E-07
C the increase in deviance on removing year2 and year3
C is not significant at the 5% level 
dis est                                  

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                     1.5139          0.53900    
    ldur                   -1.0488          0.72558E-01
    year                  -0.38518E-01      0.70233E-02
    trage                  0.24860          0.32199    
    trage2                -0.10853          0.59570    
    trage3                -0.81168          0.52582    
    trage4                 0.38768          0.55271    
    trage5                 0.57919          0.20955    
    trage6                -0.29282          0.15125    

C remove the highest polynomial term for age
lfit -trage6                  

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2961.4528        5840.    
        3           2318.8479        642.6    
        4           2189.1451        129.7    
        5           2173.5159        15.63    
        6           2172.9519       0.5640    
        7           2172.9473       0.4616E-02
        8           2172.9473       0.7230E-05
 
dis est                     

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                     1.2943          0.53047    
    ldur                   -1.0417          0.72482E-01
    year                  -0.37779E-01      0.70270E-02
    trage                 -0.46674E-01      0.26454    
    trage2                 0.89932          0.31357    
    trage3                 0.23829E-01      0.30000    
    trage4                -0.64032          0.15238    
    trage5                 0.19928          0.90486E-01

C removing trage6 has produced an increase in deviance significant at
C the 5% level. Therefore keep all terms of sixth order polynomial


lfit +trage6                            

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2960.7613        5841.    
        3           2317.1450        643.6    
        4           2186.5548        130.6    
        5           2170.0008        16.55    
        6           2168.5289        1.472    
        7           2168.1916       0.3373    
        8           2168.1594       0.3224E-01
        9           2168.1590       0.3511E-03
       10           2168.1590       0.4787E-07

C test year 
lfit -year           

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2971.7962        5830.    
        3           2340.6810        631.1    
        4           2216.2755        124.4    
        5           2200.6027        15.67    
        6           2199.2849        1.318    
        7           2199.0021       0.2828    
        8           2198.9772       0.2493E-01
        9           2198.9770       0.2284E-03
       10           2198.9770       0.2177E-07
C significant change in deviance 

lfit +year                         
    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2960.7613        5841.    
        3           2317.1450        643.6    
        4           2186.5548        130.6    
        5           2170.0008        16.55    
        6           2168.5289        1.472    
        7           2168.1916       0.3373    
        8           2168.1594       0.3224E-01
        9           2168.1590       0.3511E-03
       10           2168.1590       0.4787E-07

C test log(duration) 
lfit -ldur                                    

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           3024.8900        5777.    
        3           2455.8074        569.1    
        4           2369.9867        85.82    
        5           2362.7628        7.224    
        6           2362.0790       0.6839    
        7           2361.9175       0.1615    
        8           2361.9060       0.1150E-01
        9           2361.9059       0.6667E-04
C significant change in deviance 
lfit +ldur                                   

    Iteration        Deviance        Reduction
    __________________________________________
        1           8801.5829    
        2           2960.7613        5841.    
        3           2317.1450        643.6    
        4           2186.5548        130.6    
        5           2170.0008        16.55    
        6           2168.5289        1.472    
        7           2168.1916       0.3373    
        8           2168.1594       0.3224E-01
        9           2168.1590       0.3511E-03
       10           2168.1590       0.4787E-07
 
C final model                             
dis est                               

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                     1.5139          0.53900    
    trage                  0.24860          0.32199    
    trage2                -0.10853          0.59570    
    trage3                -0.81168          0.52582    
    trage4                 0.38768          0.55271    
    trage5                 0.57919          0.20955    
    trage6                -0.29282          0.15125    
    year                  -0.38518E-01      0.70233E-02
    ldur                   -1.0488          0.72558E-01
stop                 
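
The rescaling step in the session (trage = (age - 30)/10) can be illustrated with a short Python sketch. It does not reproduce Sabre's aliasing check; it only shows how much smaller the sixth power becomes after rescaling, which is the usual reason such a transformation eases numerical problems. The ages listed are illustrative values, not taken from the data set, and the function name `trage` simply mirrors the Sabre variable.

```python
def trage(age):
    """Rescaled age as defined in the session: (age - 30) / 10."""
    return (age - 30) / 10

# Illustrative ages only; the range of ages in rochmig.dat is not stated here.
ages = [18, 30, 40, 50]

for age in ages:
    raw_sixth = age ** 6              # value of the age6 column
    scaled_sixth = trage(age) ** 6    # value of the trage6 column
    print(f"age={age:2d}  age^6={raw_sixth:>14d}  trage^6={scaled_sixth:10.4f}")
```

On the raw scale the sixth power runs into the billions, so its coefficient has to be tiny; on the rescaled variable all six powers stay within a few orders of magnitude of one another.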


Results and conclusions

The first two models fitted compare the effects of duration and log(duration) in the full model. The model with log(duration) fits much better, reducing the deviance by almost 20 (from 2187.20 to 2167.79); this function of duration is therefore kept in the model.

During the process of backward elimination the second and third order terms of year are removed from the model. The sixth order term of age is statistically significant at the 5% level; this term and all the lower order terms of age are therefore retained in this hierarchical model. Both year and log(duration) are highly significant and are retained.
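
The elimination steps can be tabulated from the deviances printed in the Sabre session. This Python sketch is only a bookkeeping aid, not part of the original analysis; the variable names are hypothetical, and the deviances are copied from the session output.

```python
CRITICAL_VALUE = 3.84  # 5% point of chi-squared on 1 degree of freedom

# (term dropped, deviance without the term, deviance with the term),
# copied from the Sabre session output.
steps = [
    ("year3", 2167.9825, 2167.7916),
    ("year2", 2168.1590, 2167.9825),
    ("trage6", 2172.9473, 2168.1590),
    ("year", 2198.9770, 2168.1590),
    ("ldur", 2361.9059, 2168.1590),
]

decisions = {}
for term, reduced, full in steps:
    increase = reduced - full  # deterioration in fit when the term is dropped
    decisions[term] = "retain" if increase >= CRITICAL_VALUE else "drop"
    print(f"{term:8s} increase in deviance = {increase:9.4f} -> {decisions[term]}")
```

Only year3 and year2 fall below the threshold; dropping trage6, year or ldur increases the deviance by more than 3.84.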

The parameter estimates for this parsimonious model are as follows:

    Variable       Estimate         Standard Error
    ______________________________________________
    constant        1.5139          0.53900
    ldur           -1.0488          0.72558E-01
    year           -0.38518E-01     0.70233E-02
    trage           0.24860         0.32199
    trage**2       -0.10853         0.59570
    trage**3       -0.81168         0.52582
    trage**4        0.38768         0.55271
    trage**5        0.57919         0.20955
    trage**6       -0.29282         0.15125

It is noted that the χ² test used to compare the deviance of nested models is not very powerful with highly correlated explanatory variables, such as powers of age. It may be possible to improve on the above parsimonious model with more powerful tests for individual effects, but that is beyond the scope of the present analysis.

The negative coefficient estimate for ldur indicates that the probability of migration decreases with duration of stay. This may reflect cumulative inertia, as community ties strengthen with increasing length of residence. Alternatively, it may be due to residual heterogeneity: with increasing duration, the individuals most likely to migrate become more and more underrepresented.
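
The size of the duration effect can be read off on the odds scale. As a rough sketch in Python (assuming that ldur is the natural logarithm of duration, which the session output does not state explicitly; the function name `odds_multiplier` is hypothetical), multiplying the duration by a factor k multiplies the odds of a move by k raised to the power of the ldur coefficient:

```python
import math

BETA_LDUR = -1.0488  # estimate for ldur from the final model

def odds_multiplier(duration_factor, beta=BETA_LDUR):
    """Factor by which the odds of a move change when duration is
    multiplied by duration_factor (assumes ldur = natural log of dur)."""
    return math.exp(beta * math.log(duration_factor))

for k in (2, 5, 10):
    print(f"duration x{k:<2d} -> odds multiplied by {odds_multiplier(k):.3f}")
```

Under that assumption, doubling the duration of stay roughly halves the odds of migration, with all other variables held fixed.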

The probability of migration predicted by this parsimonious model may be plotted on graphs. In each plot the explanatory variables not shown on the horizontal axis are held fixed: the year at 1985, the age at 40 and the duration of residence at 10 years, as appropriate. Fixed values are needed because the predicted probability at any value of one explanatory variable depends on the values of the others. As there are no interaction terms in the model, the patterns shown on the graphs are generally valid.
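
A minimal sketch of how such predicted probabilities could be computed from the final estimates is given below, in Python. It rests on assumptions that the source does not confirm: that year is coded as years since 1900 (so 1985 is entered as 85), that duration is measured in years, and that ldur is the natural logarithm of duration. The function name `migration_probability` is hypothetical.

```python
import math

# Estimates from the final model in the session output.
INTERCEPT = 1.5139
TRAGE_COEFS = [0.24860, -0.10853, -0.81168, 0.38768, 0.57919, -0.29282]
YEAR_COEF = -0.38518e-01
LDUR_COEF = -1.0488

def migration_probability(age, year, dur):
    """Fitted probability of a move under the main effects model.

    ASSUMPTIONS (not confirmed by the source): year is coded as years
    since 1900 (85 for 1985), dur is in years and greater than 0, and
    ldur is the natural log of dur.
    """
    t = (age - 30) / 10  # trage, as defined in the session
    eta = INTERCEPT
    eta += sum(c * t ** (k + 1) for k, c in enumerate(TRAGE_COEFS))
    eta += YEAR_COEF * year + LDUR_COEF * math.log(dur)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit

# Reference individual from the text: aged 40, year 1985, 10 years' residence.
baseline = migration_probability(age=40, year=85, dur=10)

# Vary duration while holding age and year at their reference values.
for d in (1, 5, 10, 20):
    print(d, round(migration_probability(40, 85, d), 4))
```

Because the model has main effects only, changing the reference values shifts the whole curve on the logit scale but does not change where it rises or falls.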

The probability of migration plotted against age shows peaks just above age 20, around 35 and the largest near age 50. As the data are sparse for the older age group, the size and location of the third peak must be interpreted with caution.
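
The peaks can be located approximately from the fitted age polynomial alone, since the other terms only shift the logit by a constant and the logistic function is monotone. The Python sketch below evaluates the polynomial on a grid; the age range of 15 to 55 and the half-year step are assumptions made for illustration, and the peak locations it reports are only as precise as that grid.

```python
# Age coefficients (trage to trage**6) from the final model.
TRAGE_COEFS = [0.24860, -0.10853, -0.81168, 0.38768, 0.57919, -0.29282]

def age_effect(age):
    """Contribution of age to the logit, using trage = (age - 30) / 10."""
    t = (age - 30) / 10
    return sum(c * t ** (k + 1) for k, c in enumerate(TRAGE_COEFS))

# Hypothetical grid of ages: 15.0, 15.5, ..., 55.0.
ages = [15 + 0.5 * i for i in range(81)]
values = [age_effect(a) for a in ages]

# A grid point is a local peak if it exceeds both of its neighbours.
peaks = [
    ages[i]
    for i in range(1, len(ages) - 1)
    if values[i - 1] < values[i] > values[i + 1]
]
largest = max(peaks, key=age_effect)

print("local peaks at ages:", peaks)
print("largest peak at age:", largest)
```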

The plot against duration of stay shows the expected decrease in the probability of migration with increasing length of residence. The plot against year also shows a decreasing probability of migration with time over the years 1965 to 1985.

