
| Model building strategy |
| age+age2+age3+age4+age5+age6 +year+year2+year3+dur |
and
| age+age2+age3+age4+age5+age6 +year+year2+year3+log(dur) |
We choose the better-fitting model and then fit a series of simple logistic models using backward elimination. At each step we test whether removing the least significant explanatory variable (the one with the smallest absolute t-ratio) causes a significant deterioration in the model fit. If removing an explanatory variable increases the deviance by less than 3.84, i.e. the upper 5% point of χ²(1), we exclude it from the model; otherwise it is retained.
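The 3.84 threshold is the upper 5% point of the chi-squared distribution on one degree of freedom. As a minimal sketch (an illustration only, not part of the Sabre session; the function names are hypothetical), the test for dropping a single term can be written in Python with the standard library alone, using the identity P(X > x) = erfc(sqrt(x/2)) that holds for one degree of freedom:

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-squared(1) variable at x >= 0."""
    # For one degree of freedom X = Z**2 with Z standard normal,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def can_drop_term(deviance_with, deviance_without, alpha=0.05):
    """Return (increase, p_value, drop) for removing a single term."""
    increase = deviance_without - deviance_with
    p_value = chi2_1df_pvalue(max(increase, 0.0))
    return increase, p_value, p_value > alpha


# Converged deviances from the Sabre log below:
# the model including trage6 and the same model without it.
increase, p_value, drop = can_drop_term(2168.1590, 2172.9473)
print(f"increase = {increase:.4f}, p = {p_value:.4f}, drop term: {drop}")
```

Comparing the p-value with 0.05 is equivalent to comparing the increase in deviance with 3.84, the rule used throughout the analysis.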
| Sabre analysis |
data case move age year dur ed ch1 ch2 ch3 ch4 msb mse esb ese &
osb ose mbu mrm mfm msb1 epm eoj esb1 ops osb1 msb2 esb2 osb2 osb3
read rochmig.dat
6349 observations in dataset
yvar move
transform age2 age * age
transform age3 age2 * age
transform age4 age3 * age
transform age5 age4 * age
transform age6 age5 * age
transform ldur log dur
transform year2 year * year
transform year3 year2 * year
lfit int dur year year2 year3 age age2 age3 age4 age5 age6
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2993.0684 5809.
3 2373.6995 619.4
4 2231.7859 141.9
5 2195.4927 36.29
6 2190.2053 5.287
7 2190.0373 0.1680
8 2190.0367 0.6502E-03
9 2190.0367 0.1007E-06
dis est
Parameter Estimate S. Error
___________________________________________________
int -62.752 32.990
dur -0.20904 0.17902E-01
year -0.53834 0.94139
year2 0.71197E-02 0.14008E-01
year3 -0.33336E-04 0.68744E-04
age 11.740 4.8338
age2 -0.70751 0.33134
age3 0.20615E-01 0.10996E-01
age4 -0.29015E-03 0.17681E-03
age5 0.15811E-05 0.11036E-05
age6 0. ALIASED [E]
C Extrinsic aliasing has occurred for age6.
C Fitting high order polynomials can often cause numerical problems.
C An option is to lower the tolerance for aliasing from the default value.
C As the parameter estimates for the higher order terms are very small
C We choose to transform 'age' to 'trage'=(age-30)/10, roughly
C standardising this variable.
C This is done in two stages.
transform tempage age - 30
transform trage tempage / 10
transform trage2 trage * trage
transform trage3 trage2 * trage
transform trage4 trage3 * trage
transform trage5 trage4 * trage
transform trage6 trage5 * trage
lfit int dur year year2 year3 trage trage2 trage3 trage4 trage5 trage6
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2992.7095 5809.
3 2373.0803 619.6
4 2230.9609 142.1
5 2193.9097 37.05
6 2187.7970 6.113
7 2187.2527 0.5443
8 2187.2013 0.5138E-01
9 2187.2004 0.8804E-03
10 2187.2004 0.3062E-06
dis est
Parameter Estimate S. Error
___________________________________________________
int 12.878 20.980
dur -0.20936 0.17929E-01
year -0.53826 0.94677
year2 0.70876E-02 0.14085E-01
year3 -0.33068E-04 0.69111E-04
trage 0.36390 0.32000
trage2 -0.31495E-02 0.58966
trage3 -0.56019 0.51877
trage4 0.28100 0.54056
trage5 0.43264 0.20575
trage6 -0.22748 0.14640
C now try log(duration) instead of duration
lfit int ldur year year2 year3 trage trage2 trage3 trage4 trage5 trage6
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2959.3492 5842.
3 2315.9106 643.4
4 2186.2580 129.7
5 2169.6448 16.61
6 2168.1606 1.484
7 2167.8240 0.3366
8 2167.7919 0.3208E-01
9 2167.7916 0.3470E-03
10 2167.7916 0.4665E-07
dis est
Parameter Estimate S. Error
___________________________________________________
int 12.117 21.298
ldur -1.0483 0.72564E-01
year -0.49783 0.96044
year2 0.65403E-02 0.14278E-01
year3 -0.30640E-04 0.70011E-04
trage 0.23216 0.32332
trage2 -0.11755 0.59711
trage3 -0.80204 0.52563
trage4 0.38544 0.55272
trage5 0.58007 0.20935
trage6 -0.29310 0.15118
C the model fits better with ldur
C start backward elimination using this model
C remove the highest polynomial term for year
lfit -year3
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2959.3891 5842.
3 2315.9678 643.4
4 2186.3817 129.6
5 2169.8304 16.55
6 2168.3512 1.479
7 2168.0149 0.3363
8 2167.9828 0.3205E-01
9 2167.9825 0.3473E-03
10 2167.9825 0.4688E-07
dis est
Parameter Estimate S. Error
___________________________________________________
int 2.8950 3.3215
ldur -1.0489 0.72558E-01
year -0.79215E-01 0.96845E-01
year2 0.29616E-03 0.70291E-03
trage 0.24580 0.32189
trage2 -0.12526 0.59701
trage3 -0.80970 0.52543
trage4 0.38969 0.55254
trage5 0.57874 0.20935
trage6 -0.29289 0.15113
lfit -year2
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2960.7613 5841.
3 2317.1450 643.6
4 2186.5548 130.6
5 2170.0008 16.55
6 2168.5289 1.472
7 2168.1916 0.3373
8 2168.1594 0.3224E-01
9 2168.1590 0.3511E-03
10 2168.1590 0.4787E-07
C the increase in deviance on removing year2 and year3
C is not significant at the 5% level
dis est
Parameter Estimate S. Error
___________________________________________________
int 1.5139 0.53900
ldur -1.0488 0.72558E-01
year -0.38518E-01 0.70233E-02
trage 0.24860 0.32199
trage2 -0.10853 0.59570
trage3 -0.81168 0.52582
trage4 0.38768 0.55271
trage5 0.57919 0.20955
trage6 -0.29282 0.15125
C remove the highest polynomial term for age
lfit -trage6
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2961.4528 5840.
3 2318.8479 642.6
4 2189.1451 129.7
5 2173.5159 15.63
6 2172.9519 0.5640
7 2172.9473 0.4616E-02
8 2172.9473 0.7230E-05
dis est
Parameter Estimate S. Error
___________________________________________________
int 1.2943 0.53047
ldur -1.0417 0.72482E-01
year -0.37779E-01 0.70270E-02
trage -0.46674E-01 0.26454
trage2 0.89932 0.31357
trage3 0.23829E-01 0.30000
trage4 -0.64032 0.15238
trage5 0.19928 0.90486E-01
C removing trage6 has produced an increase in deviance significant at
C the 5% level. Therefore keep all terms of sixth order polynomial
lfit +trage6
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2960.7613 5841.
3 2317.1450 643.6
4 2186.5548 130.6
5 2170.0008 16.55
6 2168.5289 1.472
7 2168.1916 0.3373
8 2168.1594 0.3224E-01
9 2168.1590 0.3511E-03
10 2168.1590 0.4787E-07
C test year
lfit -year
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2971.7962 5830.
3 2340.6810 631.1
4 2216.2755 124.4
5 2200.6027 15.67
6 2199.2849 1.318
7 2199.0021 0.2828
8 2198.9772 0.2493E-01
9 2198.9770 0.2284E-03
10 2198.9770 0.2177E-07
C significant change in deviance
lfit +year
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2960.7613 5841.
3 2317.1450 643.6
4 2186.5548 130.6
5 2170.0008 16.55
6 2168.5289 1.472
7 2168.1916 0.3373
8 2168.1594 0.3224E-01
9 2168.1590 0.3511E-03
10 2168.1590 0.4787E-07
C test log(duration)
lfit -ldur
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 3024.8900 5777.
3 2455.8074 569.1
4 2369.9867 85.82
5 2362.7628 7.224
6 2362.0790 0.6839
7 2361.9175 0.1615
8 2361.9060 0.1150E-01
9 2361.9059 0.6667E-04
C significant change in deviance
lfit +ldur
Iteration Deviance Reduction
__________________________________________
1 8801.5829
2 2960.7613 5841.
3 2317.1450 643.6
4 2186.5548 130.6
5 2170.0008 16.55
6 2168.5289 1.472
7 2168.1916 0.3373
8 2168.1594 0.3224E-01
9 2168.1590 0.3511E-03
10 2168.1590 0.4787E-07
C final model
dis est
Parameter Estimate S. Error
___________________________________________________
int 1.5139 0.53900
trage 0.24860 0.32199
trage2 -0.10853 0.59570
trage3 -0.81168 0.52582
trage4 0.38768 0.55271
trage5 0.57919 0.20955
trage6 -0.29282 0.15125
year -0.38518E-01 0.70233E-02
ldur -1.0488 0.72558E-01
stop
| Results and conclusions |
The first two models fitted compare the effects of duration and
log(duration) in the full model. The model with log(duration) gives a
much better fit with a reduction of deviance of almost 20; this function of
duration is kept in the model.
During the process of backward elimination the second and third
order terms of year have been removed from the model. The sixth
order term of age is statistically significant at the 5% level;
therefore this and all the lower order terms are retained in this
hierarchical model. Both year and log(duration) are highly significant
and are retained.
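The elimination sequence can be checked directly from the converged deviances in the Sabre log. The following Python sketch is an illustration added for that purpose, not part of the original session; it recomputes each increase in deviance and applies the 3.84 rule. The variable names are hypothetical.

```python
CRITICAL_5PCT_1DF = 3.84  # chi-squared(1) upper 5% point, as used in the text

# (term removed, deviance before removal, deviance after removal),
# copied from the final iteration of each fit in the Sabre log.
steps = [
    ("year3", 2167.7916, 2167.9825),
    ("year2", 2167.9825, 2168.1590),
    ("trage6", 2168.1590, 2172.9473),
    ("year", 2168.1590, 2198.9770),
    ("ldur", 2168.1590, 2361.9059),
]

decisions = {}
for term, before, after in steps:
    increase = after - before
    decisions[term] = "remove" if increase < CRITICAL_5PCT_1DF else "retain"
    print(f"{term:7s} increase in deviance = {increase:9.4f} -> {decisions[term]}")
```

The comparison of dur with log(dur) is not included because those two models are not nested; they have the same number of parameters, so their deviances (2187.2004 and 2167.7916) are compared directly.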
The parameters for this parsimonious model are as follows:
| Variable | Estimate | Standard Error |
|---|---|---|
| constant | 1.5139 | 0.53900 |
| ldur | -1.0488 | 0.72558E-01 |
| year | -0.38518E-01 | 0.70233E-02 |
| trage | 0.24860 | 0.32199 |
| trage**2 | -0.10853 | 0.59570 |
| trage**3 | -0.81168 | 0.52582 |
| trage**4 | 0.38768 | 0.55271 |
| trage**5 | 0.57919 | 0.20955 |
| trage**6 | -0.29282 | 0.15125 |
Note that the χ² test used to compare the deviances of nested models is not
very powerful when the explanatory variables are highly correlated, as the
powers of age are. It may be possible to improve on the above parsimonious
model with more powerful tests for individual effects, but that is beyond the
scope of the present analysis.
The negative coefficient estimate for ldur indicates that the
probability of migration decreases with duration of stay. This may
reflect cumulative inertia: community ties strengthen with increasing
length of residence. Alternatively, it may be due to residual
heterogeneity: with increasing duration, the individuals most likely
to migrate have already left and so become more and more
underrepresented.
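To give the ldur estimate a concrete scale: because duration enters the model through its logarithm, multiplying the duration by a constant factor multiplies the odds of migration by a constant amount. The sketch below is illustrative only and assumes that Sabre's `log` transform is the natural logarithm, which the source does not state; the function name is hypothetical.

```python
import math

BETA_LDUR = -1.0488  # estimate for ldur in the final model


def odds_multiplier(beta, factor):
    """Multiplicative change in the odds when duration is scaled by `factor`.

    Assumes ldur is the natural log of dur (an assumption, see lead-in).
    """
    # The linear predictor contains beta * log(dur), so scaling dur by
    # `factor` adds beta * log(factor) to the log-odds.
    return math.exp(beta * math.log(factor))


print(f"doubling duration multiplies the odds by {odds_multiplier(BETA_LDUR, 2):.3f}")
```

Under that assumption, each doubling of the duration of stay roughly halves the odds of a move, whatever the values of age and year.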
The probability of migration predicted by this parsimonious model may be
plotted on graphs.
In each plot, the explanatory variables that are not being varied are held
fixed: the year at 1985, the age at 40 and the duration of residence at
10 years. This is necessary because the precise relationship between an
explanatory variable and the response variable depends on the values of the
other explanatory variables.
As there are no interaction terms in the model, the patterns shown on the
graphs are generally valid.
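The curves can be reproduced from the estimates of the final model. The Python sketch below is a hedged illustration rather than the code used to draw the original figures. It assumes that `year` is coded as a two-digit year (85 for 1985), that `dur` is measured in years, and that ldur is the natural logarithm of dur; none of these codings is confirmed by the source. The function and variable names are hypothetical.

```python
import math

# Estimates of the final model, copied from the Sabre output.
COEF = {
    "int": 1.5139,
    "ldur": -1.0488,
    "year": -0.38518e-01,
    # Coefficients of trage, trage**2, ..., trage**6
    "trage": [0.24860, -0.10853, -0.81168, 0.38768, 0.57919, -0.29282],
}


def migration_probability(age, year, dur):
    """Predicted probability of a move under the final logistic model.

    Assumptions (not confirmed by the source): `year` is two-digit,
    `dur` is in years, and ldur = natural log of dur.
    """
    trage = (age - 30) / 10  # same transformation as in the Sabre session
    eta = COEF["int"] + COEF["ldur"] * math.log(dur) + COEF["year"] * year
    eta += sum(b * trage ** k for k, b in enumerate(COEF["trage"], start=1))
    return 1.0 / (1.0 + math.exp(-eta))


# Vary one variable at a time, holding the others at the reference values.
by_age = {age: migration_probability(age, 85, 10) for age in range(20, 51, 5)}
by_dur = {dur: migration_probability(40, 85, dur) for dur in (1, 2, 5, 10, 20)}
by_year = {year: migration_probability(40, year, 10) for year in range(65, 86, 5)}

for age, p in by_age.items():
    print(f"age {age}: p = {p:.4f}")
```

The three dictionaries hold the points for the plots against age, duration and year respectively. Because the published estimates are rounded, the sixth-order age polynomial should be evaluated with some caution outside the central age range.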
The probability of migration plotted against age shows peaks just above age 20,
around age 35 and, the largest, near age 50. As the data are sparse for the
older age group, the size and location of the third peak must be interpreted
with caution.
The plot against duration of stay shows the expected decrease in the
probability of migration with increasing length of residence. The plot
against year also shows a decreasing probability of migration with time
over the years 1965 to 1985.