Cross-sectional analysis: Poisson model for aggregate count data




The Poisson model

If complete randomness in migration behaviour is assumed, then a Poisson model may be used to represent the aggregate count data. Strictly, we should use a Binomial model, as each individual is only allowed one migration per year, so that the total number of migrations has an upper limit. However, for a large sample and a low migration rate the Poisson model provides a good approximation.

For a homogeneous population, the probability of obtaining ni outcomes in time ti may be written as

Pr(ni) = (mi)^ni exp(-mi) / ni!

where mi is the mean (or expected) number of migrations in time ti.
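
The formula and the approximation claim can be checked numerically. The sketch below is illustrative only: the exposure of 20 years is an assumed value for a single individual, and the function names are our own. It compares the Binomial probabilities (at most one move per year) with the Poisson probabilities that have the same mean.

```python
import math


def poisson_pmf(n, m):
    """Pr(n) = m^n exp(-m) / n! for a Poisson count with mean m."""
    return m ** n * math.exp(-m) / math.factorial(n)


def binomial_pmf(n, years, p):
    """Probability of n moves in `years` yearly opportunities, each taken with probability p."""
    return math.comb(years, n) * p ** n * (1 - p) ** (years - n)


# Assumed illustration: one individual observed for 20 years (hypothetical),
# with an annual migration rate of 0.049.
years, rate = 20, 0.049
m = rate * years  # Poisson mean, m = r * t

for n in range(5):
    print(n, round(binomial_pmf(n, years, rate), 4), round(poisson_pmf(n, m), 4))
```

The two columns printed for each n should be close; the agreement improves as the migration rate gets smaller.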

For a constant annual migration rate r,

mi=r*ti

or

log(mi)=log(r)+log(ti)

This model is an example of a generalised linear model. We will see how to fit such models a little later. When this model is fitted (using log(ti) as an OFFSET in SABRE, so that log(r) is the only parameter estimated), the average annual migration rate comes out as 0.049 moves per individual per year.

For the time being, we note that this figure can also be calculated by simply dividing the total number of moves in the data set by the total time exposure to migration opportunities for the sample. Thus, there are 312 moves and 6349 annual observations, giving an average of 0.049 moves per individual per year.
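
A short sketch of why the simple ratio works: for a constant rate r, the Poisson log-likelihood is maximised when r equals total moves divided by total exposure. The five (moves, years) pairs below are hypothetical values used only to check this numerically; the totals 312 and 6349 are the ones quoted above.

```python
import math


def poisson_loglik(rate, counts, exposures):
    """Log-likelihood of a constant rate, omitting the log(n!) terms that do not involve it."""
    return sum(n * math.log(rate * t) - rate * t for n, t in zip(counts, exposures))


# Hypothetical small sample: number of moves and years of exposure per individual.
counts = [0, 1, 0, 3, 2]
exposures = [10, 25, 5, 30, 20]

rate_hat = sum(counts) / sum(exposures)  # total moves / total exposure
best = poisson_loglik(rate_hat, counts, exposures)
for factor in (0.8, 0.9, 1.1, 1.2):
    # Any other rate gives a lower log-likelihood.
    print(factor, poisson_loglik(factor * rate_hat, counts, exposures) < best)

# The same ratio applied to the totals reported for the data set.
annual_rate = 312 / 6349
print(round(annual_rate, 3))
```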

This implies that each year a proportion of 0.049 of the population (or 4.9%) migrates, and that a proportion of 0.951 (or 95.1%) remains.

Using this model, the projected proportion moving at least once over a period of T years is equal to [1-(0.951)^T].
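
The projection is simple to tabulate. In the sketch below the function name and the chosen values of T are our own; the annual staying proportion 0.951 is the figure given above.

```python
def proportion_moved(years, stay_prob=0.951):
    """Projected proportion moving at least once in `years` years, assuming independent years."""
    return 1 - stay_prob ** years


for years in (1, 2, 5, 10, 20):
    print(years, round(proportion_moved(years), 3))
```

The projected proportion rises steadily with T because the model treats every year as an independent draw with the same probability for everyone.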

GRAPH The projected proportion migrating over different time periods is shown by the line on the graph. It is considerably higher than the observed proportion calculated from the data, which is indicated by circles.

It is evident that this model substantially and systematically overpredicts the proportion moving, and therefore underestimates population stability. This is a consequence of assuming that migration behaviour over one time period can be used to predict migration behaviour over a longer time period, and is an example of a general problem, which Coleman (1973) calls the "deficient diagonal" effect.

The assumption that all individuals have the same propensity to migrate, which is not subject to change over time, does not seem compatible with the migration processes generating the data.




Allowing the migration rate to vary with time

The migration rate can be allowed to vary systematically with time in this simple model by replacing (ti) in the above equation by (ti)^b1. The migration rate then decreases through migration history if b1 is less than 1 and increases if b1 is greater than 1. One reason to expect b1 to be less than 1 is inertia: people may become increasingly less likely to move the longer they have lived in a specific locality.

It is convenient to write

r=exp(b0)

where b0 is an unknown constant, and the exponentiation ensures that r is always positive.

The mean number of migrations may now be written as:

mi=exp(b0)*(ti)^b1 = exp(b0+b1*log(ti))

or

log(mi)=b0+ b1*log(ti)

This model is typical of a generalised linear model, which contains:
  1. a linear regression function or linear predictor in the explanatory variables, [b0+b1*log(ti)],
  2. a transformation (logarithmic), which relates the linear predictor to the mean mi,
  3. a response variable ni, which has a Poisson distribution with mean mi.

The model may be fitted using SABRE software as follows. To run the example interactively, you will need to download the SABRE software and data sets.

SABRE SESSION: INPUT AND OUTPUT
                                                   
C read in variables from data file
data case n t ed       
read rochmigx.dat  
                                                  
        348 observations in dataset
                                  
C declare response variable
yvar n          
C declare model                    
poisson yes      
C calculate log(time)      
transform ltime log t
C fit Poisson model with intercept
C and log(time) as explanatory variable 
lfit int ltime      

    Iteration        Deviance        Reduction
    __________________________________________
        1           1299.5140    
        2           754.34418        545.2    
        3           658.72919        95.61    
        4           648.79228        9.937    
        5           648.49783       0.2945    
        6           648.49747       0.3547E-03
        7           648.49747       0.5484E-09

C display parameter estimates 
dis est                    

    Parameter              Estimate         S. Error
    ___________________________________________________
    int                    -3.2884          0.35114    
    ltime                   1.0887          0.11119    

C display model fitted 
dis m                   

    X-vars      Y-var
    _________________
    int         n     
    ltime 

    Model type: standard Poisson log-linear 

    Number of observations             =    348

    X-vars df=  2

    Deviance = 648.49747 on 346 residual degrees of freedom
 
stop                            
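
For readers without SABRE, a fit of this kind can be sketched in plain Python. The code below uses iteratively reweighted least squares, a standard way of fitting generalised linear models; we do not know whether SABRE uses the same algorithm internally. The data are simulated (the file rochmigx.dat is not read), so the estimates will not reproduce the figures above, and all function names are our own.

```python
import math
import random


def fit_poisson_loglinear(counts, times, iterations=25):
    """Fit log(m) = b0 + b1*log(t) by iteratively reweighted least squares."""
    x = [math.log(t) for t in times]
    b0, b1 = math.log(sum(counts) / len(counts)), 0.0  # start from a constant mean
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for n, xi in zip(counts, x):
            eta = b0 + b1 * xi        # linear predictor
            mu = math.exp(eta)        # fitted mean (inverse of the log link)
            z = eta + (n - mu) / mu   # working response
            s_w += mu                 # the working weight equals mu
            s_wx += mu * xi
            s_wxx += mu * xi * xi
            s_wz += mu * z
            s_wxz += mu * xi * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = s_w * s_wxx - s_wx ** 2
        b0, b1 = ((s_wxx * s_wz - s_wx * s_wxz) / det,
                  (s_w * s_wxz - s_wx * s_wz) / det)
    return b0, b1


def deviance(counts, times, b0, b1):
    """Poisson deviance, taking n*log(n/m) as 0 when n is 0."""
    total = 0.0
    for n, t in zip(counts, times):
        mu = math.exp(b0 + b1 * math.log(t))
        total += (n * math.log(n / mu) if n > 0 else 0.0) - (n - mu)
    return 2.0 * total


def draw_poisson(mean, rng):
    """Draw one Poisson count by Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Simulated stand-in data: 348 individuals, constant rate 0.05 (true b1 = 1).
rng = random.Random(1)
times = [rng.randint(1, 40) for _ in range(348)]
counts = [draw_poisson(0.05 * t, rng) for t in times]

b0_hat, b1_hat = fit_poisson_loglinear(counts, times)
print("int  ", round(b0_hat, 4))
print("ltime", round(b1_hat, 4))
print("deviance", round(deviance(counts, times, b0_hat, b1_hat), 2))
```

Because the simulated counts really are Poisson with a common rate, the deviance here should be much closer to its residual degrees of freedom than it is for the migration data.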
	




Results and conclusion

1 The estimated coefficient b1 of ltime is 1.0887, with a standard error of 0.1112. The estimate lies less than one standard error from 1 ((1.0887-1)/0.1112 is about 0.8) and is therefore not significantly different from 1. The migration rate does not appear to decline or increase through migration history, and is consistent with a constant rate.

Table 2: Observed and expected frequencies

Number of moves           0      1      2      3      4      5    >=6
Observed frequency      228     34     42     17      9      8     10
Expected frequency    164.3  101.6   50.4   21.1    7.5    2.3   0.80

2 The observed migration frequencies are compared in Table 2 with the values predicted by the Poisson model. The model does not seem to fit the data, with the number of individuals making no moves or making four or more moves substantially underpredicted. There appears to be a systematic variation in migration frequency over and above the variation attributable to chance.
3 The fit of the model may be assessed by comparing the value of the sum of [(Expected frequency-Observed frequency)^2/Expected frequency] with the chi-squared distribution on 5 degrees of freedom (7 cells - 2 estimated coefficients). The critical value at the 5% significance level is 11.07. The calculated value is in fact 192.5, an order of magnitude higher.
4 The degree of model misspecification may be measured by the dispersion parameter, which is the ratio of the scaled deviance to the residual degrees of freedom (648.5/346 = 1.87). If the model were well specified, this ratio would be approximately 1.
5 One explanation for the poor fit of the model is that the assumption of a homogeneous population is not valid. Individuals may vary in their likelihood of migration; the assumption of a migration rate which depends only on time may be incorrect. Thus, it may be possible to improve the model specification by including explanatory variables which distinguish between individuals.
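
The figures quoted in points 1, 3 and 4 can be recomputed from the values printed in Table 2 and in the SABRE output. This is a minimal check, not part of the SABRE session; the variable names are our own. Because the expected frequencies in Table 2 are rounded, the recomputed goodness-of-fit statistic comes out close to, but not exactly at, the quoted 192.5.

```python
observed = [228, 34, 42, 17, 9, 8, 10]
expected = [164.3, 101.6, 50.4, 21.1, 7.5, 2.3, 0.80]  # rounded values from Table 2

# Point 1: distance of the ltime coefficient from 1, in standard errors.
z_b1 = (1.0887 - 1.0) / 0.11119

# Point 3: sum over cells of (expected - observed)^2 / expected.
chi_square = sum((e - o) ** 2 / e for o, e in zip(observed, expected))

# Point 4: deviance divided by residual degrees of freedom.
dispersion = 648.49747 / 346

print("z for b1 = 1:", round(z_b1, 2))
print("goodness-of-fit statistic:", round(chi_square, 1), "(5% critical value 11.07)")
print("dispersion:", round(dispersion, 2))
```

A value of z well inside the usual two-sided 5% limits of plus or minus 1.96 supports point 1, while the goodness-of-fit statistic and the dispersion both point to the lack of fit described in points 2 to 4.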
