WPS5550
Policy Research Working Paper 5550
Using Repeated Cross-Sections
to Explore Movements in and out of Poverty
Hai-Anh Dang
Peter Lanjouw
Jill Luoto
David McKenzie
The World Bank
Development Research Group
Poverty and Inequality Team
and Finance and Private Sector Development Team
January 2011
Abstract
Movements in and out of poverty are of core interest to both policymakers and economists. Yet
the panel data needed to analyze such movements are rare. In this paper, the authors build on the
methodology used to construct poverty maps to show how repeated cross-sections of household
survey data can allow inferences to be made about movements in and out of poverty. They
illustrate that the method permits the estimation of bounds on mobility, and provide
non-parametric and parametric approaches to obtaining these bounds. They test how well the
method works on data sets for Vietnam and Indonesia where we are able to compare our method to
true panel estimates. The results are sufficiently encouraging to offer the prospect of some
limited, basic, insights into mobility and poverty duration in settings where historically it was
judged that the data necessary for such analysis were unavailable.
This paper is a product of the Poverty and Inequality Team, and the Finance and Private Sector Development Team;
Development Research Group. It is part of a larger effort by the World Bank to provide open access to its research and
make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted
on the Web at http://econ.worldbank.org. The authors may be contacted at planjouw@worldbank.org and
dmckenzie@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
Using Repeated Cross-Sections to Explore Movements into and out of
Poverty*
Hai-Anh Dang, World Bank
Peter Lanjouw, World Bank
Jill Luoto, RAND Corporation
David McKenzie, World Bank, BREAD, CEPR and IZA
Keywords: Transitory and Chronic poverty; Synthetic panels; Mobility.
JEL Codes: O15, I32.
* We are grateful to the editor, three anonymous referees, Chris Elbers, Roy van der Weide, and seminar participants
at Cornell, Georgetown, Minnesota, and the World Bank for useful comments. This paper represents the views of
the authors only and should not be taken to reflect those of the World Bank or any affiliated organization.
"But the whole picture of poverty is not contained in a snapshot income-distribution decile
graph. It says nothing about the vital concept of mobility: the potential for people to get out of a
lower decile – and the speed at which they can do so."
UK Prime Minister David Cameron, October 2010¹
1. Introduction
Income mobility is currently at the forefront of policy debates around the world. The
prolonged global recession has thrust renewed attention on the problem of chronic poverty, while
discussion of widening inequality (particularly driven by high incomes of the top 1%) has led to
debate about the extent to which opportunities to succeed are open to all.2 Policies to address
poverty will likely differ depending on whether poverty is transitory (in which case safety net
policies will likely be the focus) or chronic (in which case more activist policies designed to
remove poverty traps may be designed). However, despite the importance of mobility for policy,
in many countries, especially developing countries, there is a paucity of evidence on the duration
of poverty and on income mobility due to a lack of panel data.
To overcome the non-availability of panel data, there have been a number of studies,
starting with Deaton (1985), that develop pseudo-panels out of multiple rounds of cross-sectional
data. Compared to analysis using cross sections, pseudo-panels constructed on the basis of age
cohorts followed across multiple surveys have permitted rich investigations into the dynamics of
income and consumption over time (e.g., Deaton and Paxson, 1994; Banks, Blundell, and
Brugiavini, 2001; and Pencavel, 2007) and of cohort-level mobility (Antman and McKenzie,
2007). However, some of these methods rely on having many rounds of repeated cross-sections
(Bourguignon et al., 2004), and the use of cohort-means precludes the examination of income
mobility at a level more disaggregated than that of the cohort. As a result, such methods may be
of limited appeal to policy makers interested in the mobility of certain (disadvantaged)
population groups, or to economists concerned with mobility due to idiosyncratic shocks to
income or consumption.
1 Taken from a commentary "What you receive should depend on how you behave" in The Independent, October 10,
2010, http://www.independent.co.uk/opinion/commentators/david-cameron-what-you-receive-should-depend-on-how-you-behave-2102576.html
2 In the U.S., for example, Alan Krueger's January 2012 address to the Center for American Progress focused
heavily on income mobility and was followed by substantial discussion in both national media and in economics
blogs. See http://www.whitehouse.gov/sites/default/files/krueger_cap_speech_final_remarks.pdf for the speech.
The purpose of this paper is to introduce and explore an alternative statistical
methodology for analyzing movements in and out of poverty based on two or more rounds of
cross-sectional data. The method is less data-demanding than many traditional pseudo-panel
studies, and importantly allows for investigation of income mobility within as well as between
cohorts.3 The approach builds on an "out-of-sample" imputation methodology described in
Elbers et al. (2003) for small-area estimation of poverty (the development of "poverty maps"). A
model of consumption (or income) is estimated in the first round of cross-section data, using a
specification which includes only time-invariant covariates. Parameter estimates from this
model are then applied to the same time-invariant regressors in the second survey round to
provide an estimate of the (unobserved) first period's consumption or income for the individuals
surveyed in that second round. Analysis of mobility can then be based on the actual
consumption observed in the second round along with this estimate from the first round.
Although exact point estimates of poverty transitions and income mobility require
knowledge of the underlying autocorrelation structure of the income or consumption generating
process, we show that, under mild assumptions, one can derive upper and lower bounds on entry
into and exit from poverty. We provide two approaches to estimating these bounds. The first is a
non-parametric approach, which imposes no structure on the underlying error distribution. We
show that the width of the bounds provided by this approach depends on the extent to which
time-invariant and deterministic characteristics explain cross-sectional income or consumption.
However, in many cases, while the exact autocorrelation is unknown, evidence from other data
sources might be available, suggesting that the true autocorrelation lies within a much narrower
(and known) range than the extreme values of zero and one underpinning the non-parametric
bounds. We provide a parametric bounding approach that can be used in such cases, which
imposes more assumptions but permits a narrowing of the bounds relative to the non-parametric
case.
3 Güell and Hu (2006) provide a GMM estimator for the probability of exiting unemployment that also permits
disaggregation to the individual level using multiple cross-sections. However, Güell and Hu's method is most
appropriate for duration analysis and can only be applied to two rounds of cross sections given two additional
conditions: i) availability of data on the duration of unemployment spells, and ii) the two cross sections must have
the same population mean and be independent of each other. In this paper our focus is on poverty mobility, and we
require simpler data and much less restrictive assumptions to derive lower and upper bounds on poverty mobility.
See also Gibson (2001) for a somewhat related literature on how panel data on a subset of individuals can be used to
infer chronic poverty for a larger sample, and Foster (2009) and Hojman and Kast (2009) for recent studies that
investigate poverty mobility using actual panel data.
To illustrate our methods and examine their performance in practice, we implement both
the non-parametric and the parametric bounding methods in two empirical settings: Vietnam and
Indonesia. Genuine panel data are available in these settings, and this allows us to validate our
method by sampling repeated cross-sections from the panel, constructing mobility estimates
using these cross-sections, and then comparing the results to those obtained using the actual
panel data. We find that the "true" estimate of the extent of mobility (as revealed by the actual
panel data) is generally sandwiched between our upper-bound and lower-bound assessments of
mobility. Our analysis reveals further that the width between the upper- and lower-bound
estimates of mobility is narrowed as the prediction models are more richly specified, as well as
with the addition of the parametric assumption. We thus believe our method may be readily
employed to study mobility for a wide variety of situations where only repeated cross sections
are available.
The remainder of the paper is structured as follows: Section 2 provides a theoretical
framework for obtaining upper and lower bounds on movements into and out of poverty.
Sections 3 and 4 describe our non-parametric and parametric estimation methods respectively.
Section 5 examines robustness to the choice of poverty line and provides an application to
mobility profiling. Section 6 concludes.
2. Theoretical Bounds for Movements In and Out of Poverty with Repeated Cross-Sections
For ease of exposition we consider the case of two rounds of cross-sectional surveys,
denoted round 1 and round 2. We assume that both survey rounds are random samples of the
underlying population of interest, and that they consist of N1 and N2 households,
respectively.
Let xi1 be a vector of characteristics of household i in survey round 1 which are observed
(for different households) in both the round 1 and round 2 surveys. This will include such time-
invariant characteristics as language, religion, and ethnicity, and if the identity of the household
head remains constant across rounds, will also include time-invariant characteristics of the
household head such as sex, education, place of birth, and parental education as well as
deterministic characteristics such as age. Importantly, xi1 can also include time-varying
characteristics of the household that can be easily recalled for round 1 in round 2. Thus variables
such as whether or not the household head is employed in round 1, and his or her occupation, as
well as their place of residence in round 1 could be included in xi1 if asked in round 2.4
Then for the population as a whole, the linear projection of round 1 consumption or
income, yi1, onto xi1 is given by:
$$y_{i1} = \beta_1' x_{i1} + \varepsilon_{i1} \qquad (1)$$
And similarly, letting xi2 denote the set of household characteristics in round 2 that are observed
in both the round 1 and round 2 surveys, the linear projection of round 2 consumption or income,
yi2 onto xi2 is given by:
$$y_{i2} = \beta_2' x_{i2} + \varepsilon_{i2} \qquad (2)$$
Let z1 and z2 denote the poverty line in period 1 and period 2 respectively. Then to
estimate the degree of mobility in and out of poverty we are interested in knowing, for example,
what fraction of households in the population is above the poverty line in round 2 after being
below the poverty line in round 1. That is, we are interested in estimating:
$$P(y_{i1} < z_1 \text{ and } y_{i2} > z_2) \qquad (3)$$
which represents the degree of movement out of poverty for households over the two periods.
However, the prime difficulty facing us with repeated cross-sections is that we do not know
$y_{i1}$ and $y_{i2}$ for the same households. Without imposing a lot of structure on the data generating
processes, one cannot point-identify the probability in (3). But it is possible to obtain bounds. To
derive these bounds, note that we can rewrite this probability as:
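$$P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1} \text{ and } \varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \qquad (4)$$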
We see that this probability depends on the joint distribution of the two error terms
$\varepsilon_{i1}$ and $\varepsilon_{i2}$, capturing the correlation of those parts of household consumption in the two periods
which are unexplained by the household characteristics xi1 and xi2. Intuitively, mobility will be
greater the less correlated are $\varepsilon_{i1}$ and $\varepsilon_{i2}$; household consumption in one period will then be less
associated with that in the other period. One extreme case thus occurs when the two error terms
are completely independent of each other. Another extreme case occurs when these two error
terms are perfectly correlated.
4 Moreover, if surveys ask about when individuals developed chronic illnesses, or became unemployed, or suffered
other such shocks which are correlated with poverty status, then these variables could also be included in x.
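As a rough illustration of these two extreme cases (not part of the original analysis), the following Python sketch simulates the probability in (3) for different error correlations. The scalar regressor, the unit coefficients, the normal errors, and the poverty line z = 0 are all assumptions made purely for the example.

```python
import numpy as np


def exit_rate(rho, n=200_000, z=0.0, seed=0):
    """Simulated P(y1 < z and y2 > z) when the two errors have correlation rho.

    Illustrative assumptions: one standard normal regressor x,
    beta_1 = beta_2 = 1, and standard normal errors in both rounds.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)        # time-invariant characteristic
    e1 = rng.normal(size=n)       # round 1 error
    noise = rng.normal(size=n)
    # Round 2 error built so that corr(e1, e2) = rho.
    e2 = rho * e1 + np.sqrt(1.0 - rho**2) * noise
    y1 = x + e1
    y2 = x + e2
    return np.mean((y1 < z) & (y2 > z))


# The same seed is used for every rho, so only the correlation changes.
rates = {rho: exit_rate(rho) for rho in (0.0, 0.5, 1.0)}
for rho, rate in rates.items():
    print(f"rho = {rho:.1f}: share exiting poverty = {rate:.3f}")
```

In this toy setting the simulated exit rate falls as the correlation rises, and it is zero when the errors are identical, which is the intuition behind using independence and perfect correlation as the two bounding cases.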
To further operationalize the probability in (4), we make the following two assumptions.5
Assumption 1: The underlying population sampled is the same in survey round 1 and survey
round 2.
In the absence of actual panel data on household consumption, this assumption ensures
that we can use time-invariant household characteristics that are observed in both survey rounds
to obtain predicted household consumption. Given that the underlying population being sampled
in survey rounds 1 and 2 are the same, the time-invariant household characteristics in one survey
round would be the same as in the other round, thus providing the crucial linkage between
household consumption between the two periods. In other words, households in period 2 that
have similar characteristics to those of households in period 1 would have achieved the same
consumption levels in period 1 or vice versa.
Assumption 1 will not be satisfied if the underlying population changes through births,
deaths, or migration out of sample, which could happen if the two survey periods are particularly
far apart in time or as a result of major events, such as natural disasters or a sudden economic
crisis, affecting the whole economy between the survey rounds. Assumption 1 may also not be
satisfied due to survey-related technical issues such as changes in sampling methodology from
one round to the next.6
5 In addition to these two assumptions, we also make the standard assumption that household consumption
aggregates are consistently constructed and comparable over the two periods.
6 In practice one can carry out a number of checks to test whether this assumption appears to hold with the
cross-sectional data at hand by examining whether the observable time-invariant characteristics of a cohort change
significantly from one survey round to the next. McKenzie (2001) provides an illustration of this approach for
pseudo-panel analysis of Taiwanese households.
Assumption 2: The correlation of $\varepsilon_{i1}$ and $\varepsilon_{i2}$ is non-negative.
This assumption is to be expected in most applications using household survey data for at
least three reasons. First, if the error term contains a household fixed effect, then households
which have consumption higher than we would predict based on their x variables in round 1 will
also have consumption higher than we would predict based on their x variables in round 2.
Second, if shocks to consumption or income (for example, finding or losing a job) have some
persistence, and consumption reacts to these income shocks, then consumption errors will also
exhibit positive autocorrelation.
And finally, while for particular households we might see some negative correlation in
incomes over time, the kind of factors leading to such a correlation are unlikely to apply to an
entire population at the same time. For example, a household which lacks access to credit may
cut expenditure in round 1 in order to pay for a wedding in round 2. For such a household we
would see a lower consumption than their x variables would predict in round 1, and higher
consumption than would be predicted for round 2. But this is unlikely to occur for the majority
of households at the same time. Indeed, we will show this using panel data from several
countries used in our analysis.
As in standard pseudo panel analysis these two assumptions will be best satisfied by
restricting attention to households headed by people aged, say, 25 to 55. Analysis of mobility
among households headed by those younger than 25 or older than 55 or 60 is more difficult since
at those ages households are often beginning to form, or starting to dissolve. If income can be
measured at the individual level, this may be less of a concern for individual income mobility
than for household consumption mobility.
Given these two assumptions, we propose the following two theorems that provide the lower
and upper bound estimates for poverty mobility. Since poverty immobility (i.e. households have
the same poverty status in both survey rounds) is the opposite of poverty mobility, two closely
related corollaries based on these two theorems provide the lower bound and upper bound of
poverty immobility.
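Theorem 1
The upper bound estimates of poverty mobility are given by the probability in expression (4)
when the two error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are completely independent of each other, which implies
$\mathrm{corr}(\varepsilon_{i1}, \varepsilon_{i2}) = 0$. Specifically, the upper bound estimates of poverty mobility are given by
$$P(\hat{y}^{2U}_{i1} < z_1 \text{ and } y_{i2} > z_2) \qquad (5)$$
for movements out of poverty, and
$$P(\hat{y}^{2U}_{i1} > z_1 \text{ and } y_{i2} < z_2) \qquad (6)$$
for movements into poverty; where $\hat{y}^{2U}_{i1} = \hat{\beta}_1' x_{i2} + \tilde{\hat{\varepsilon}}_{i1}$ and for $\hat{y}^{2U}_{i1}$ the superscript 2 stands for
estimated round 1 consumption for households sampled in round 2, and U stands for the upper
bound estimates of poverty mobility.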
Corollary 1.1
The biases for the upper bound estimates of poverty mobility in equations (5) and (6) above are
respectively given by
(7)
(8)
Corollary 1.2
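The lower bound estimates of poverty immobility are given by
$$P(\hat{y}^{2U}_{i1} > z_1 \text{ and } y_{i2} > z_2) \qquad (9)$$
for households staying out of poverty in both rounds, and
$$P(\hat{y}^{2U}_{i1} < z_1 \text{ and } y_{i2} < z_2) \qquad (10)$$
for households staying in poverty in both rounds.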
Proof
See Appendix 1.
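Theorem 2
The lower bound estimates of poverty mobility are given by the probability in expression (4)
when the two error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are identical (equal to each other), which implies
$\mathrm{corr}(\varepsilon_{i1}, \varepsilon_{i2}) = 1$. Specifically, the lower bound estimates of poverty mobility are given by
$$P(\hat{y}^{2L}_{i1} < z_1 \text{ and } y_{i2} > z_2) \qquad (11)$$
for movements out of poverty, and
$$P(\hat{y}^{2L}_{i1} > z_1 \text{ and } y_{i2} < z_2) \qquad (12)$$
for movements into poverty; where $\hat{y}^{2L}_{i1} = \hat{\beta}_1' x_{i2} + \hat{\varepsilon}_{i2}$ and for $\hat{y}^{2L}_{i1}$ the superscript 2 stands for
estimated round 1 consumption for households sampled in round 2, and L stands for the lower
bound estimates of poverty mobility.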
Corollary 2.1
The biases for the lower bound estimates of poverty mobility in equations (11) and (12) above
are respectively given by
(13)
(14)
Corollary 2.2
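The upper bound estimates of poverty immobility are given by
$$P(\hat{y}^{2L}_{i1} > z_1 \text{ and } y_{i2} > z_2) \qquad (15)$$
for households staying out of poverty in both rounds, and
$$P(\hat{y}^{2L}_{i1} < z_1 \text{ and } y_{i2} < z_2) \qquad (16)$$
for households staying in poverty in both rounds.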
Proof
See Appendix 1.
The methods developed here aim to estimate the same level of movements into and out of
poverty that one would observe in the genuine panel. Of course some of the mobility in the
genuine panel data is spurious, arising from measurement error. There are several approaches in
the existing literature to correcting mobility measures for such measurement error (e.g.
Glewwe, 2010; Antman and McKenzie, 2007; Fields et al. 2007). The basic idea underlying all
of these approaches is to study the mobility of some underlying variable (such as health, cohort
characteristics, or assets), which is analogous to studying only the mobility which comes from
the term $\beta' x$ and ignoring mobility which comes from $\varepsilon$.
While such an approach could be pursued here as well, it is not the purpose of our current
exercise, which is to determine whether one can use repeated cross-sections to estimate the same
level of mobility one sees in a panel, and whether the method is useful for showing which
characteristics are associated with more movements into and out of poverty. Note however that
our estimates will still remain valid bounds for the true degree of mobility even under many
types of measurement error, as stated in the theorem below.
Theorem 3
The lower bound and upper bound estimates of poverty mobility provided in Theorems 1 and 2
and Corollaries 1.2 and 2.2 are robust to classical measurement errors. The lower bound is also
robust to general forms of non-classical measurement error, while the upper bound will still
continue to be an upper bound in the presence of non-classical measurement error provided that
this non-classical error does not cause assumption 2 to be violated.
Proof
See Appendix 1.
3. Non-parametric Bounds
The theorems and corollaries in the previous section provide the theoretical framework for us to
consider concrete procedures to estimate the lower and upper bounds of poverty mobility and
immobility. This framework also shows that assumptions about the joint distribution for the two
error terms are crucial for our estimates of poverty mobility, and there can be different
approaches depending on different assumptions about this distribution. We consider two
approaches to estimate the bounds on mobility: a non-parametric approach where we make no
assumption about this joint distribution and then, in the next section, a parametric approach
where we assume this joint distribution is bivariate normal. We start first with the non-
parametric approach.7
7 If we consider together the estimation method (OLS) and the distribution of the error term, perhaps it is more
accurate to refer to this as a semi-parametric approach. However, we are using the terms "non-parametric" and
"parametric" to highlight our assumptions about the distribution for the error terms. Also note that the phrases
"upper bound" and "lower bound" pertain to their bounds on mobility, not to their bounds on levels of poverty.
3.1 Non-parametric Bounds
Upper-bound estimates for poverty mobility (and lower-bound estimates for poverty
immobility)
We propose the following steps to obtain the quantities in (5), (6), (9) and (10):
Step 1: Using the data in survey round 1, estimate equation (1) and obtain the predicted
coefficients $\hat{\beta}_1'$ and predicted residuals $\hat{\varepsilon}_{i1}$.
Step 2: For each household in round 2, take a random draw with replacement from the empirical
distribution of the predicted residuals $\hat{\varepsilon}_{i1}$ obtained in Step 1 and denote it by $\tilde{\hat{\varepsilon}}_{i1}$. Then using the
data in survey round 2, the predicted coefficients $\hat{\beta}_1'$, and the residual $\tilde{\hat{\varepsilon}}_{i1}$, estimate, for each
household in round 2, its consumption level in round 1, as follows
$$\hat{y}^{2U}_{i1} = \hat{\beta}_1' x_{i2} + \tilde{\hat{\varepsilon}}_{i1} \qquad (17)$$
Step 3: Estimate the quantities in (5), (6), (9) and (10), using $\hat{y}^{2U}_{i1}$ obtained from Step 2 above.
Step 4: Repeat Steps 2 to 3 R times, and take the average of each quantity in (5), (6), (9) and (10)
over the R replications to obtain the upper bound estimates of poverty mobility (or immobility).
We use R = 500 in our simulations below.
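The four steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions rather than the code used for the results in this paper: the function name and argument layout are hypothetical, the regressor arrays are assumed to already include a constant column, and survey weights are ignored.

```python
import numpy as np


def upper_bound_mobility(x1, y1, x2, y2, z1, z2, R=500, seed=0):
    """Sketch of Steps 1-4: upper bounds on mobility, lower bounds on immobility.

    x1, x2 : (N1, k) and (N2, k) arrays of time-invariant regressors, each
             assumed to already contain a constant column.
    y1, y2 : consumption (or income) observed in rounds 1 and 2.
    Survey weights are ignored for brevity (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)

    # Step 1: OLS on round 1 data; keep coefficients and residuals.
    beta1, *_ = np.linalg.lstsq(x1, y1, rcond=None)
    resid1 = y1 - x1 @ beta1

    fitted2 = x2 @ beta1      # beta_1' x_i2 for each round 2 household
    poor2 = y2 < z2           # observed round 2 status (ties count as non-poor)
    totals = np.zeros(4)

    for _ in range(R):
        # Step 2: draw one round 1 residual per household, with replacement,
        # and impute round 1 consumption as in equation (17).
        draw = rng.choice(resid1, size=len(y2), replace=True)
        poor1 = (fitted2 + draw) < z1

        # Step 3: sample shares corresponding to (5), (6), (9) and (10).
        totals += np.array([
            np.mean(poor1 & ~poor2),     # (5)  exit from poverty
            np.mean(~poor1 & poor2),     # (6)  entry into poverty
            np.mean(~poor1 & ~poor2),    # (9)  non-poor in both rounds
            np.mean(poor1 & poor2),      # (10) poor in both rounds
        ])

    # Step 4: average over the R replications.
    names = ("exit", "entry", "stay_nonpoor", "stay_poor")
    return dict(zip(names, totals / R))
```

Because the four categories partition the round 2 sample in every replication, the four averaged shares sum to one, which offers a quick sanity check on any implementation.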
Lower-bound estimates for poverty mobility (and upper-bound estimates for poverty
immobility)
To obtain the lower bound estimates of the movement into and out of poverty for (3), we take
the following steps:
Step 1: Using the data in survey round 1, estimate equation (1) and obtain the predicted
coefficients $\hat{\beta}_1'$. Then using the data in survey round 2, estimate equation (2) and obtain the
residuals $\hat{\varepsilon}_{i2}$.
Step 2: Then using the data in survey round 2, the predicted coefficients $\hat{\beta}_1'$, and the residual $\hat{\varepsilon}_{i2}$,
estimate the consumption level in round 1 for each household in round 2 as follows
$$\hat{y}^{2L}_{i1} = \hat{\beta}_1' x_{i2} + \hat{\varepsilon}_{i2} \qquad (18)$$
Step 3: Estimate the quantities in (11), (12), (15) and (16) using $\hat{y}^{2L}_{i1}$ obtained from Step 2 above.
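The lower-bound steps can be sketched in the same way. As before, this is only an illustrative sketch: the function name is hypothetical, the regressor arrays are assumed to include a constant column, and survey weights are ignored.

```python
import numpy as np


def lower_bound_mobility(x1, y1, x2, y2, z1, z2):
    """Sketch of the lower bounds on mobility (upper bounds on immobility).

    x1, x2 : (N1, k) and (N2, k) regressor arrays with a constant column.
    y1, y2 : consumption (or income) observed in rounds 1 and 2.
    Survey weights are ignored for brevity (an assumption of this sketch).
    """
    # Step 1: round 1 coefficients from equation (1), and round 2 residuals
    # from estimating equation (2) on the round 2 data.
    beta1, *_ = np.linalg.lstsq(x1, y1, rcond=None)
    beta2, *_ = np.linalg.lstsq(x2, y2, rcond=None)
    resid2 = y2 - x2 @ beta2

    # Step 2: impute round 1 consumption with each household's own round 2
    # residual, as in equation (18).
    y1_hat = x2 @ beta1 + resid2

    # Step 3: sample shares corresponding to (11), (12), (15) and (16).
    poor1 = y1_hat < z1
    poor2 = y2 < z2
    return {
        "exit": np.mean(poor1 & ~poor2),          # (11)
        "entry": np.mean(~poor1 & poor2),         # (12)
        "stay_nonpoor": np.mean(~poor1 & ~poor2), # (15)
        "stay_poor": np.mean(poor1 & poor2),      # (16)
    }
```

No resampling loop is needed here because each household's own residual is reused. As a simple check, passing the same data and the same poverty line for both rounds makes the imputed round 1 values coincide with the round 2 values (up to floating-point rounding), so the estimated lower bound on mobility is zero.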
A few remarks are in order about the above procedures. First, the bootstrapping of the
error terms for the upper bound estimates is based on the condition of independence for the two
error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ as stated in Theorem 1. Second, unlike the upper bound estimates, the
procedure for obtaining the lower bound estimates does not require repeating Steps 2 to 3 R times
since we are using each household's own predicted errors. And finally, we do not have to restrict
estimation of predicted household consumption to the data in the second survey round (Step 2
above) but can also use the data in the first survey round since the following identity always
holds: $P(y_{i1} < z_1 \text{ and } y_{i2} > z_2) \equiv P(y_{i2} > z_2 \text{ and } y_{i1} < z_1)$.8
3.2. Sharpening the Non-parametric Bounds
From Corollary 1.1, we see that the bias for our upper bound estimate of the probability a
household is poor in the first period but non-poor in the second period is given by
expression (7). Other things being equal, this
quantity will be smaller the greater is the variation in y that can be explained by the set of
variables in the vector x, and the lower the variation left to be represented by the error terms
$\varepsilon_{i1}$ and $\varepsilon_{i2}$. In particular, a weaker correlation between these error terms will tend to decrease the
second term in this bias. Similarly, Corollary 2.1 also indicates that a weaker correlation between
the error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ will also tend to increase the second terms in (13) and (14) and thus
decrease the overall biases.
This is equivalent to obtaining a high R2 in the regression of y on x. We can increase this R2
and narrow the bounds by including a host of time-invariant (or deterministic) household
characteristics. In addition, one can control for detailed geographic variables or region fixed
effects. Taken together, a combination of household and regional characteristics may control for
shocks which occur in particular regions or for people of particular characteristics, and may
allow one to span household fixed effects. We shall see how well this strategy works in our
empirical application in the next section.
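To make the link between explanatory power and the width of the bounds concrete, the following Python sketch computes the gap between the upper-bound and lower-bound estimates of exit from poverty in a simulated setting. Everything in it is an assumption made for illustration: a single standard normal regressor, a total variance of consumption fixed at one, a poverty line of zero, and a population R2 set by the hypothetical argument `r2`.

```python
import numpy as np


def exit_bound_width(r2, n=20_000, R=50, z=0.0, seed=0):
    """Upper-bound minus lower-bound estimate of exit from poverty.

    Toy setting: y = sqrt(r2) * x + sqrt(1 - r2) * e, with x and e standard
    normal, so that r2 is the share of the variance of y explained by x.
    """
    rng = np.random.default_rng(seed)
    b, s = np.sqrt(r2), np.sqrt(1.0 - r2)

    # Two independent cross-sections drawn from the same population.
    xa = rng.normal(size=n)
    xb = rng.normal(size=n)
    y1 = b * xa + s * rng.normal(size=n)   # round 1 consumption, sample A
    y2 = b * xb + s * rng.normal(size=n)   # round 2 consumption, sample B

    X1 = np.column_stack([np.ones(n), xa])
    X2 = np.column_stack([np.ones(n), xb])
    beta1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
    beta2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
    resid1 = y1 - X1 @ beta1
    resid2 = y2 - X2 @ beta2

    fitted = X2 @ beta1
    nonpoor2 = y2 > z

    # Upper bound: residuals resampled from round 1, averaged over R draws.
    upper = np.mean([
        np.mean((fitted + rng.choice(resid1, size=n) < z) & nonpoor2)
        for _ in range(R)
    ])
    # Lower bound: each household keeps its own round 2 residual.
    lower = np.mean((fitted + resid2 < z) & nonpoor2)
    return upper - lower


widths = {r2: exit_bound_width(r2) for r2 in (0.2, 0.5, 0.8)}
for r2, width in widths.items():
    print(f"R2 = {r2:.1f}: width of bounds on exit = {width:.3f}")
```

In this simulated setting the gap between the two bounds shrinks as the share of variance explained by x rises, in line with the argument above; how much narrowing is achievable with real survey data is an empirical question.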
8 If one wants to get standard errors for these bounds, then a bootstrap approach can be used. This would involve
bootstrap resampling from the original cross-sections (taking account of survey weights) and then running the
method described above within each bootstrap sample.
3.3. Datasets
To examine how well our method performs in practice we implement our procedure
using genuine panel data from Vietnam and Indonesia. Our two main data sets are the Vietnam
Household Living Standards Surveys (VHLSSs) and the Indonesian Family Life Surveys
(IFLSs). We use the VHLSSs in 2006 and 2008, which are nationally representative surveys
implemented by Vietnam's General Statistical Office (GSO) with technical assistance from the
World Bank. The VHLSSs are similar to the LSMS-type (Living Standards Measurement
Survey) surveys supported by the World Bank in a number of developing countries and provide
detailed information on schooling, health, employment, migration, and housing, as well as
household consumption and ownership of a variety of household durables for 9,189 households
across the country in each round. These surveys are widely used in poverty assessment by the
government and the donor community in Vietnam. One particular feature of these surveys is a
rotating panel module, which collects panel data for one half of each survey round between two
adjacent years. This combination of both cross-sectional data and panel data in one survey
provides a perfect setting for us to validate our method.
Our data for Indonesia come from the Indonesian Family Life Surveys that were fielded
by the RAND Corporation as part of their Labor and Population Program in collaboration with
UCLA and the University of Indonesia. We use the IFLS2 and IFLS3 rounds, corresponding to
1997 and 2000, respectively. The IFLS2 interviewed 7,500 households and the IFLS3 survey
interviewed 10,400. The IFLS surveys are remarkable in the extent to which efforts were made
to follow households over time. The IFLS2 and IFLS3 managed to resurvey 94.4 and 95.3%,
respectively, of the original 7,224 households interviewed in 1993 for the IFLS1 round. As is the
case for the VHLSS, the IFLS surveys are multipurpose surveys that collect detailed information
on a range of different topics, thereby permitting analysis of interrelated issues that
single-purpose surveys do not. Information on economic outcomes like income and labor market
outcomes can be combined with information on health outcomes, education and a whole host of
additional socioeconomic indicators. Finally, in 1997, the IFLS fielded, alongside the IFLS2
household survey, a community survey about respondents' communities and public and private
facilities. The analysis below draws on both household and community level information.
Since the IFLSs are panel surveys, we split the IFLS panels into two randomly drawn
sub-samples (each representing half of the total sample), and we do the same for the VHLSS
panel component.9 Call these sub-samples A and B respectively. We can then use sub-sample A
in the first round and sub-sample B in the second round as two repeated cross-sections to which
we apply our method. We can then compare the mobility results obtained from using
sub-sample A to impute round 1 values for sub-sample B to the results we would get using the
genuine panel for sub-sample B. For the genuine panels, we use only households whose head is
the same in both rounds.
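The validation design can be sketched as follows. The helper names are hypothetical and the household identifiers are placeholders; the sketch only shows the random split into sub-samples A and B and the computation of the benchmark transition shares from genuine panel data.

```python
import numpy as np


def split_panel(household_ids, seed=0):
    """Randomly split panel households into two halves, A and B."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(household_ids))
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def panel_transitions(y1, y2, z1, z2):
    """Transition shares computed directly from genuine panel data.

    y1 and y2 are the round 1 and round 2 values for the SAME households.
    """
    poor1 = np.asarray(y1) < z1
    poor2 = np.asarray(y2) < z2
    return {
        "exit": np.mean(poor1 & ~poor2),
        "entry": np.mean(~poor1 & poor2),
        "stay_nonpoor": np.mean(~poor1 & ~poor2),
        "stay_poor": np.mean(poor1 & poor2),
    }


# Placeholder identifiers for ten panel households.
ids = np.arange(10)
sample_a, sample_b = split_panel(ids)
print("A:", sorted(sample_a), "B:", sorted(sample_b))
```

In use, the round 1 records of sub-sample A would serve as the first cross-section and the round 2 records of sub-sample B as the second, while `panel_transitions` applied to both rounds of sub-sample B gives the benchmark against which the estimated bounds are compared.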
For our basic analysis we use the national poverty line in Vietnam provided with the
VHLSSs (corresponding to D 2,559,850, and D 3,358,118 respectively for 2006 and 2008
(Glewwe, 2009)), and the Tornquist poverty line in the IFLS dataset (corresponding to Rp
86,128.1 in 2000 prices).10 We show later in the paper that our results are robust to the choice of
poverty line used.
3.4. Variable Choice
Our approach is built on a linear projection of consumption in round 1 onto individual,
household and community-level characteristics that are also present in the data for round 2. As
described in Elbers, Lanjouw and Leite (2009) in regard to poverty-mapping procedures, there is
no obvious theory to guide the specification of what is essentially a forecasting model.
However, certain diagnostics can be looked to for guidance. In general one would want to look
well beyond explanatory power (a higher R2 would tend to reduce the variance of the prediction
error) to consider also statistical significance of the parameter estimates (in order to reduce
model error and the resultant overstatement of mobility) and to pay attention as well to concerns
about over fitting. In the literature on poverty mapping, regressors have typically been drawn
from several broad classes of variables including demographic variables (household size, gender
and age profiles of households, etc.); human capital variables; labor market variables
(occupational profiles), access to basic services and infrastructure (electricity access, connection
to a piped water network, etc.); housing quality variables; ownership of durables; and community
and locality-level variables.
9 We only use the VHLSS panel component for non-parametric estimates to illustrate our method. For the
parametric estimation in the next section, we construct our estimates using the VHLSS cross section component and
then compare to the VHLSS panel component.
10 We thank Kathleen Beegle and Kristin Himelein for help with the IFLS data.
Central to the present application of this approach is the additional requirement that
regressors in these models be time-invariant. Obvious candidates are the ethnic, religious, or
social-group membership of the household head. Other time-invariant variables can be readily
constructed from the data, such as whether the household head was aged 15 or higher and
educated at the primary school level by a particular moment in time. When retrospective data
are collected, the range of time-invariant variables can be greatly expanded. For example, if both
the 1997 and 1992 surveys collect information on whether the household had a fridge in 1992,
this time-invariant variable can be used in the prediction models. Some retrospective variables,
such as place of residence at the time of the last survey, are reasonably common in cross-
sectional surveys, while other variables, such as sector of work, education level, and occupation
at the time of the past survey, could easily be collected retrospectively. Context will also
determine the choice of variables to use. If the main interest is on mobility in rural farming areas,
one could presumably ask retrospective questions about land and major livestock holdings, and
also condition on time-varying environmental variables like rainfall.
In our empirical applications below, we thus consider a hierarchy of six classes of
prediction models which progressively employ more and more data that is sometimes, but not
always, collected retrospectively. Since we have the actual panel data to work with, we can
"force" regressors in round 2 to be time-invariant by using the round 1 values of selected
variables. Clearly in a real-world application we would be dependent only on those variables
collected during the second round, and would be concerned about possible recall error. But for
the purpose of illustration here, we select variables we believe are likely to be recalled fairly
accurately, and which could be asked retrospectively.11
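The "forcing" step described above can be sketched as follows. This is a minimal illustration, assuming household records are held as dictionaries keyed by variable name; the function names (`force_time_invariant`, `age_in_round1`), the variable names, and the values are all hypothetical and do not come from the VHLSS or IFLS data.

```python
def age_in_round1(age_in_round2, years_between_rounds):
    """Express the head's age as of the round 1 survey year."""
    return age_in_round2 - years_between_rounds


def force_time_invariant(round2_record, round1_record, variables):
    """Return a copy of a panel household's round 2 record in which the
    selected variables are overwritten with their round 1 values."""
    forced = dict(round2_record)  # copy, so the original record is untouched
    for name in variables:
        forced[name] = round1_record[name]
    return forced


# Hypothetical panel household observed in both rounds, two years apart.
round1 = {"hh_size": 4, "owns_tv": 0, "sector": "agriculture", "age_head": 41}
round2 = {"hh_size": 5, "owns_tv": 1, "sector": "services", "age_head": 43}

forced = force_time_invariant(round2, round1, ["hh_size", "owns_tv", "sector"])
forced["age_head"] = age_in_round1(round2["age_head"], years_between_rounds=2)
print(forced)
```

In a real cross-section the round 1 values would not be observed, so the same variables would have to come from retrospective questions in round 2.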
The six models are built up progressively as follows:
1. (Basic Model) We begin with a sparse model, including only variables that can be readily
judged as time-invariant. For example, we can include such regressors as the gender of
the head, age of the household head (defined in round 1 year), birthplace of the head
(rural/urban), whether the head ever attended primary school (or the head's completed
years of schooling), the education level of the head's parents, and the head's religion and
ethnicity.
11
In section 4 below, where we analyze the parametric variant of our approach, we wish to explore the scope for
narrowing bounds via the imposition of additional structure and assumptions. In doing so we confine our attention
to a basic model specification that can be readily estimated with currently available cross-section data.
2. We then introduce locational dummies such as urban/rural, or regional, dummies to
measure where the household was living at the time of the first round survey. Most
multipurpose surveys with a migration module would collect the information needed to
allow these variables to be constructed, and even without a specific migration module, it
is common to ask where households were living five years ago.12
3. Next, "community" variables are added, which can be obtained from community modules
in most household surveys or perhaps population censuses. Once the retrospective
location is identified (as per model 2), the use of such variables depends only on the
availability of such auxiliary data, and not on further recall per se. In the case of
Indonesia, these come from the community-level survey from 1997 and are inserted into
both the IFLS2 and IFLS3 household surveys. For Vietnam, unfortunately the community
module only collects data on rural communes, which can reduce the estimation sample
size significantly. Thus we will use instead a household-level variable which indicates
household poverty status as classified by the government in the first survey round.
4. We then add variables describing a household head's sector of work. At this point we
clearly start to lean more heavily on our ability to explicitly insert round 1 values of these
variables into the round 2 data. However, information on these variables could probably
be easily collected on a retrospective basis. Indeed retrospective work histories have been
collected in a number of labor surveys.
5. Further demographic variables that we force to be time-invariant are then added - such as
household size and the number of children aged under 5. These would possibly be more
difficult to collect retrospectively if household composition is very fluid, especially if the
time interval between survey rounds increased. Nonetheless, it is not uncommon for
surveys with a migration focus to ask about all individuals who have lived in the
household in the past five years, and our impression is that households in many societies
are able to recall such information relatively accurately.
12
For example, Smith and Thomas (2003) find that Malaysian households can accurately recall migration histories,
particularly for moves which are not very local or very short in duration.
6. (Full model) Finally, we include a number of variables describing a household's assets
and housing quality at the time of round 1 - such as ownership of specific consumer
durables like a TV and motorcycle, and the type of roofing and flooring material the
household had. Including these variables increases the predictive power of the
consumption models significantly. Such variables are not commonly collected in
retrospective fashion in large multipurpose surveys, but they have been collected in some
specific survey contexts.13
We estimate these models for log consumption per capita. We only use levels of the variables
indicated above, but one could additionally enrich the models by including interactions (e.g.
allowing the predictive impact of education for consumption to vary with region, sex of
household head, etc.). The precise regression results used for the upper and lower bound
estimates for model 1 (the "basic model") and model 6 (the "full model") for household
consumption in the first period are presented in Tables 2.1a and 2.1b in Appendix 2.
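The way explanatory power builds up across nested specifications can be illustrated with a small simulation. The sketch below is not the authors' specification: the regressors (`educ`, `urban`, `owns_tv`), the coefficients, and the sample are invented, and the OLS fit is a plain normal-equations solve written only to keep the example self-contained.

```python
import random


def ols_r_squared(rows, y):
    """In-sample R-squared from an OLS regression of y on the columns in `rows`.
    Each row must already include a leading 1.0 for the constant."""
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical households: (head's schooling, urban dummy, TV dummy, log consumption).
rng = random.Random(0)
households = []
for _ in range(500):
    educ = rng.randint(0, 12)
    urban = 1.0 if rng.random() < 0.3 else 0.0
    owns_tv = 1.0 if rng.random() < 0.2 + 0.4 * urban else 0.0
    log_cons = 7.0 + 0.05 * educ + 0.3 * urban + 0.4 * owns_tv + rng.gauss(0.0, 0.5)
    households.append((educ, urban, owns_tv, log_cons))

y = [h[3] for h in households]
specs = {"basic": [0], "+ location": [0, 1], "+ assets": [0, 1, 2]}
r2_by_model = {
    name: ols_r_squared([[1.0] + [h[c] for c in cols] for h in households], y)
    for name, cols in specs.items()
}
for name, r2 in r2_by_model.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Because the specifications are nested, the in-sample $R^2$ cannot fall as regressors are added; how much it rises depends on the data.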
3.5. Estimation Results
We turn now to one of the central questions in our study, namely whether analysis of the
duration of poverty, and of mobility in and out of poverty, based on our synthetic panel data can
deliver results approximating the findings one would obtain with genuine panel data.14 Table 1
presents our results. As we expected, the lower bound estimates underestimate mobility
(understating movements into and out of poverty and overstating the extent to which people
remain poor or remain non-poor) and the upper bound estimates overestimate mobility. The
"truth" (true rate) tends to lie about midway between these bounds. We thus find that our
approach does indeed provide bounds within which the "truth" can be observed.15
13
For example, de Mel, McKenzie and Woodruff (2009) ask Sri Lankan business owners and wage workers
questions on whether their family owned a bicycle, radio, telephone, or vehicle when they were aged 12, and on the
floor type their household had then. Individuals were able to recall such information relatively easily, although
further work is needed to test how accurate such recall is. Berney and Blane (1997) offer some encouraging findings
from a small sample in the U.K., showing high accuracy recall of toilet facilities, water facilities, and number of
children in the household over a 50-year recall period.
14
We refer to "synthetic panels" in our approach in an effort to distinguish our household-level analysis from the
broader literature that works with cohort means.
15
Estimation is very similar when we obtain predicted household consumption on data from the first survey round
instead of the second survey round. Thus for both the non-parametric and parametric estimates (in the next section),
we only show results obtained on data from the second survey round.
What is particularly encouraging is that the width of these bounds is fairly reasonable.
For example, using the full model, our bounds would suggest that between 3 and 10 percent of
households in Indonesia, and between 3 and 7 percent of households in Vietnam moved out of
poverty between the two rounds. Analysis based on the genuine panel data suggests that the true
rates are well captured in these ranges, even after allowing for one to two standard errors around
these rates.
The results also illustrate the importance of being able to fit more detailed models to
predict consumption, with generally narrower bounds for the models with richer specifications
than the basic model, which is to be expected given our discussion in the previous section. For
example, the bounds for the proportion of the population falling into poverty in Vietnam between
2006 and 2008 are (0.5-8.6) using the basic model, (2.8-8.5) using model 2, (3.0-7.8) using
model 3, (2.3-7.2) using model 5, and (2.1-6.8) using the full model. Corresponding to these
narrower bounds is a steady increase in $R^2$ (0.33, 0.49, 0.55, 0.60, and 0.71 respectively) and a
similarly steady decrease in the correlation coefficient (which is always positive and consistent
with our Assumption 2).
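The narrowing can be read off directly by computing the interval widths from the figures quoted above. The short sketch below only rearranges those reported numbers; the variable names are ours.

```python
# Bounds (in percent) and R^2 values as reported in the text for Vietnam, 2006-2008.
reported = [
    ("basic model", 0.5, 8.6, 0.33),
    ("model 2", 2.8, 8.5, 0.49),
    ("model 3", 3.0, 7.8, 0.55),
    ("model 5", 2.3, 7.2, 0.60),
    ("full model", 2.1, 6.8, 0.71),
]

# Width of each interval in percentage points, rounded to one decimal place.
widths = [round(upper - lower, 1) for _, lower, upper, _ in reported]

for (name, lower, upper, r2), width in zip(reported, widths):
    print(f"{name:12s} R^2 = {r2:.2f}  bounds = ({lower}, {upper})  width = {width}")
```

The width falls from 8.1 percentage points for the basic model to 4.7 for the full model, with most of the gain coming at the first step.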
In both countries it is the inclusion of locational variables to get to model 2, retrospective
demographic variables to get to model 5, and especially the retrospective
household asset variables to get to the full model that most increases the share of variation
explained by the regressors and most reduces the size of the bounds. Efforts to collect
retrospective data so as to be able to enrich the model specification thus do appear to be
important.16 The basic model has less predictive power, leading to wider intervals.
4. Sharpening the Bounds Further through a Parametric Method
The non-parametric method introduced and explored above has the advantage of requiring
few assumptions to obtain bounds on the degree of mobility and producing fairly encouraging
results. However, while the rich sets of regressors used in the estimates in Table 1 may offer
some directions for future survey designs (as well as a good illustration of what is feasible with
our method), these may not currently be available for most countries. Without such a full set of
variables, the bounds provided by the basic models may be too wide to be of use for practical
purposes.
16
This accords well with experience of applying the Elbers et al. (2003) method for small-area estimation purposes
to poverty mapping. In those applications the methodology pursued most closely resembles the "upper bound",
"full" approach here, and it is generally found that predicted poverty rates (calculated in the population census)
closely track survey estimates at the broad-stratum level (see Demombynes et al. 2004).
We thus move from this "ideal" setting to the rather more prosaic real-world one where only
a subset of the above-considered regressors exists. We explore a parametric variant of our basic
approach and impose some structure on the error terms in order to sharpen our bounds on
mobility. We work only with the basic model specification (i.e., Model 1) introduced
above, adding one dummy variable indicating urban or rural area of residence (and
also show the non-parametric estimates for this specification). We now also estimate our models
using only the cross-sectional components of the survey data, and compare our estimates of
mobility against the "true" estimates calculated from the panel components.
This model thus puts modest demands on the data and would likely be applicable in most
household surveys. We show that by introducing a distributional assumption on the error terms,
and additional information on the likely plausible range of autocorrelation in these error terms,
we can produce narrower bounds on mobility. We start with the following additional assumption.
Assumption 3: $\varepsilon_{i1}$ and $\varepsilon_{i2}$ have a bivariate normal distribution with correlation coefficient $\rho$
and standard deviations $\sigma_{\varepsilon_1}$ and $\sigma_{\varepsilon_2}$ respectively.
Log-normality is a reasonable and often used approximation for the distribution of income or
consumption, so this condition may hold approximately in practice and can be checked, as will
be illustrated in our empirical section.
4.1. Parametric Estimation Framework
Given Assumptions 1 and 3, it is straightforward to see that the percentage of households that
are poor in the first period but nonpoor in the second period, $P(y_{i1} < z_1 \text{ and } y_{i2} > z_2)$, can be
estimated by

\[
\begin{aligned}
P^{E}(y_{i1} < z_1 \text{ and } y_{i2} > z_2) &= P(\beta_1' x_{i2} + \varepsilon_{i1} < z_1 \text{ and } \beta_2' x_{i2} + \varepsilon_{i2} > z_2) \\
&= \Phi_2\left( \frac{z_1 - \beta_1' x_{i2}}{\sigma_{\varepsilon_1}},\; -\frac{z_2 - \beta_2' x_{i2}}{\sigma_{\varepsilon_2}},\; -\rho \right)
\end{aligned}
\tag{19}
\]

where $\Phi_2(\cdot)$ stands for the bivariate normal cumulative distribution function (cdf) (and $\phi_2(\cdot)$
stands for the bivariate normal probability density function (pdf)).
Since we know that for any $x$, $y$, and $\rho$, $\partial \Phi_2(x, y, \rho) / \partial \rho = \phi_2(x, y, \rho) > 0$ (Sungur, 1990), equation
(19) indicates that the key difference between a household's true consumption level and its lower
bound and upper bound estimates of mobility lies with the correlation term $\rho$. Since $\rho$ is bounded by
the interval [0, 1] (Assumption 2), and the correlation term in equation (19) above has a negative
sign ($-\rho$), a lower value of $\rho$ means a higher probability of entering/exiting poverty (i.e., a
higher degree of mobility or lower degree of immobility) in the second period and vice versa.
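To make the role of the correlation term in equation (19) concrete, it may help to write out the two limiting cases. The shorthand $a$ and $b$ is introduced here purely for illustration and is not part of the paper's notation; the two identities are standard properties of the bivariate normal cdf at correlations 0 and $-1$.

```latex
\[
a = \frac{z_1 - \beta_1' x_{i2}}{\sigma_{\varepsilon_1}}, \qquad
b = \frac{z_2 - \beta_2' x_{i2}}{\sigma_{\varepsilon_2}}
\]
\[
\rho = 0: \quad \Phi_2(a, -b, 0) = \Phi(a)\,[1 - \Phi(b)]
\]
\[
\rho = 1: \quad \Phi_2(a, -b, -1) = \max\{\Phi(a) - \Phi(b),\, 0\}
\]
```

Since $\Phi(a) - \Phi(b) \le \Phi(a)[1 - \Phi(b)]$, the exit probability is largest at $\rho = 0$ and smallest at $\rho = 1$, and any intermediate value of $\rho$ gives a probability between the two.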
In fact, the non-parametric lower bound and upper bound estimates of poverty mobility
correspond to assuming that $\rho$ equals its maximum value (1) and minimum value (0)
respectively.17 However, as was noted in our discussion of Table 1, the true value of $\rho$ in all
likelihood lies somewhere in between these two values of 0 and 1. If we can obtain a better
estimate of $\rho$, we can narrow the gap between these lower bound and upper bound estimates of
poverty mobility. Thus we can tighten Assumption 2 as follows.
Assumption 2': $\rho_S \le \rho \le \rho_H$, where $\rho_S$ is the smallest hypothesized value of $\rho$ and $\rho_H$ the highest
hypothesized value, with $0 \le \rho_S \le \rho_H \le 1$.
In searching for the range of appropriate values for $\rho$, there seem to be two options
available: i) we can look at actual panel data in previous time periods from the same country (or
for sub-samples of the data) or, ii) we can consider actual panel data in (say, economically or
geographically) similar settings elsewhere. We will pursue this second option below and
calculate a range of different values for $\rho$ from a similar model specification estimated in a
number of different countries for which panel data exist.
4.2. Parametric Estimation Procedures
17
In particular, when $\rho = 0$ or $\rho = 1$, the parametric analogues of the upper and lower bound estimates of poverty
mobility in (5), (6), (11) and (12) are obtained by replacing the general probability notation "P(.)" with the normal
cdf $\Phi(\cdot)$.
Upper-bound estimates for poverty mobility (and lower-bound estimates for poverty
immobility)
We propose the following steps to obtain the quantities in (5), (6), (9) and (10).
Step 1: Using the data in survey round 1, estimate equation (1) and obtain the predicted
coefficients $\hat{\beta}_1'$ and the predicted standard error $\hat{\sigma}_{\varepsilon_1}$ for the error term $\varepsilon_{i1}$. Using the data in
survey round 2, estimate equation (2) and obtain similar parameters $\hat{\beta}_2'$ and $\hat{\sigma}_{\varepsilon_2}$.
Step 2: For each household in round 2, calculate the quantities in (5), (6), (9) and (10) as follows
using the smallest hypothesized value of $\rho$, denoted $\rho_S$:

\[
\hat{P}^{2U}(y_{i1} < z_1 \text{ and } y_{i2} < z_2) = \Phi_2\left( \frac{z_1 - \hat{\beta}_1' x_{i2}}{\hat{\sigma}_{\varepsilon_1}},\; \frac{z_2 - \hat{\beta}_2' x_{i2}}{\hat{\sigma}_{\varepsilon_2}},\; \rho_S \right) \tag{20}
\]
\[
\hat{P}^{2U}(y_{i1} < z_1 \text{ and } y_{i2} > z_2) = \Phi_2\left( \frac{z_1 - \hat{\beta}_1' x_{i2}}{\hat{\sigma}_{\varepsilon_1}},\; -\frac{z_2 - \hat{\beta}_2' x_{i2}}{\hat{\sigma}_{\varepsilon_2}},\; -\rho_S \right) \tag{21}
\]
\[
\hat{P}^{2U}(y_{i1} > z_1 \text{ and } y_{i2} < z_2) = \Phi_2\left( -\frac{z_1 - \hat{\beta}_1' x_{i2}}{\hat{\sigma}_{\varepsilon_1}},\; \frac{z_2 - \hat{\beta}_2' x_{i2}}{\hat{\sigma}_{\varepsilon_2}},\; -\rho_S \right) \tag{22}
\]
\[
\hat{P}^{2U}(y_{i1} > z_1 \text{ and } y_{i2} > z_2) = \Phi_2\left( -\frac{z_1 - \hat{\beta}_1' x_{i2}}{\hat{\sigma}_{\varepsilon_1}},\; -\frac{z_2 - \hat{\beta}_2' x_{i2}}{\hat{\sigma}_{\varepsilon_2}},\; \rho_S \right) \tag{23}
\]
Lower-bound estimates for poverty mobility (and upper-bound estimates for poverty
immobility)
Lower-bound estimates of poverty mobility (and upper-bound estimates for poverty
immobility) can likewise be obtained by using the same steps with the highest hypothesized
value $\rho_H$ in place of $\rho_S$.
Note that in the special case that the true value of $\rho$ is somehow known, the bounds collapse
to a point estimate. It is not unreasonable to think of possible scenarios where, say to save
costs, small but representative panel surveys were fielded and $\rho$ estimated from such surveys
could be combined with cross sectional surveys to estimate poverty transitions in the larger
datasets.
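The two steps can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the authors' implementation: the fitted values, log poverty lines, residual standard deviations, and the pair of correlation values (0.2, 0.8) are hypothetical inputs, and the bivariate normal cdf is computed by numerically integrating the identity $\partial \Phi_2 / \partial \rho = \phi_2$ with Simpson's rule, a convenience chosen only to avoid external libraries.

```python
import math


def norm_cdf(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_pdf(h, k, r):
    """Standard bivariate normal pdf with correlation r, |r| < 1."""
    s = 1.0 - r * r
    return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * s)) / (2.0 * math.pi * math.sqrt(s))


def bvn_cdf(h, k, rho, intervals=400):
    """P(X <= h, Y <= k) for a standard bivariate normal with |rho| < 1.

    Starts from the independent case Phi(h) * Phi(k) and adds the integral of
    phi2 over the correlation from 0 to rho (Simpson's rule, even interval count)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("this sketch only handles |rho| < 1")
    step = rho / intervals
    total = bvn_pdf(h, k, 0.0) + bvn_pdf(h, k, rho)
    for j in range(1, intervals):
        total += (4.0 if j % 2 else 2.0) * bvn_pdf(h, k, j * step)
    return norm_cdf(h) * norm_cdf(k) + total * step / 3.0


def transition_probs(xb1, xb2, z1, z2, sigma1, sigma2, rho):
    """Four transition probabilities for one household, following (20)-(23)."""
    a = (z1 - xb1) / sigma1
    b = (z2 - xb2) / sigma2
    return {
        "poor, poor": bvn_cdf(a, b, rho),           # equation (20)
        "poor, nonpoor": bvn_cdf(a, -b, -rho),      # equation (21)
        "nonpoor, poor": bvn_cdf(-a, b, -rho),      # equation (22)
        "nonpoor, nonpoor": bvn_cdf(-a, -b, rho),   # equation (23)
    }


def average_transitions(predictions, z1, z2, sigma1, sigma2, rho):
    """Average the household-level probabilities over the round 2 sample."""
    cells = [transition_probs(xb1, xb2, z1, z2, sigma1, sigma2, rho)
             for xb1, xb2 in predictions]
    return {key: sum(c[key] for c in cells) / len(cells) for key in cells[0]}


# Hypothetical inputs: fitted values (xb1, xb2) for three round 2 households,
# log poverty lines, and residual standard deviations from the two regressions.
predictions = [(7.9, 8.1), (8.4, 8.5), (8.9, 9.2)]
z1, z2, sigma1, sigma2 = 8.2, 8.3, 0.45, 0.50

upper = average_transitions(predictions, z1, z2, sigma1, sigma2, rho=0.2)  # smallest rho
lower = average_transitions(predictions, z1, z2, sigma1, sigma2, rho=0.8)  # highest rho
print(upper)
print(lower)
```

The four cells sum to one for any admissible correlation, and the two mobility cells are larger under the smaller correlation, which is why the smallest hypothesized value yields the upper bound on mobility.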
As with the non-parametric case, it should be noted that we obtain the predicted parameters
from both survey rounds and then calculate the poverty dynamics on data from the second
survey round ($x_{i2}$), but we can also first obtain the predicted parameters from both survey
rounds and then calculate the poverty dynamics on data from the first survey round ($x_{i1}$). The
two approaches should give us the same results,18 since the same identity holds as for the
non-parametric estimation.
4.3. Parametric Estimation Results
Normality Assumptions and determining $\rho$
Since the key assumption required for our parametric approach is normality of the error terms in
the regressions of household consumption on household (time-invariant) characteristics, we start
off by plotting for each country and year the distribution of the estimated error terms ($\varepsilon_i$)
against the normal distribution. A casual visual inspection indicates that the former (dotted line)
closely resembles the latter (solid line) in each year (Appendix 2, Figure 2.1), although the
graphs for Vietnam look somewhat better than those for Indonesia. However, formal multivariate
normality tests (Doornik and Hansen, 2008) reject the assumption of a normal distribution
(univariate or bivariate) for the error terms in both countries. Despite this rejection we will
maintain the assumption below, and thereby illustrate the performance of our parametric
bounding methods in a typical practical situation where the underlying distributional assumption
may not hold precisely.
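A simple moment-based check of residual normality can be sketched as follows. Note that this is the univariate Jarque-Bera statistic, swapped in here because it is easy to compute by hand; it is not the Doornik-Hansen test cited above, and the residuals are simulated rather than taken from the survey regressions.

```python
import math
import random


def jarque_bera(residuals):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 d.f.)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    m2 = sum(c ** 2 for c in centered) / n
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = math.exp(-statistic / 2.0)  # chi-square survival function with 2 d.f.
    return statistic, p_value


# Hypothetical residuals: a normal sample and a right-skewed (exponential) sample.
rng = random.Random(1)
normal_resid = [rng.gauss(0.0, 0.5) for _ in range(2000)]
skewed_resid = [rng.expovariate(1.0) for _ in range(2000)]

jb_normal = jarque_bera(normal_resid)
jb_skewed = jarque_bera(skewed_resid)
print("normal sample:", jb_normal)
print("skewed sample:", jb_skewed)
```

With survey-sized samples, even modest departures from normality tend to be rejected by such tests, which is one reason a visual comparison can look acceptable while a formal test does not.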
18
However, this variant approach results in changes to the bivariate probability formulas to calculate the poverty
dynamics probabilities in equations (20)-(23), which are given below

\[
\hat{P}^{2U}(y_{i2} < z_2 \text{ and } y_{i1} < z_1) = \Phi_2\left( \frac{z_1 - \hat{\beta}_1' x_{i1}}{\hat{\sigma}_{\varepsilon_1}},\; \frac{z_2 - \hat{\beta}_2' x_{i1}}{\hat{\sigma}_{\varepsilon_2}},\; \rho \right) \tag{20'}
\]
\[
\hat{P}^{2U}(y_{i2} > z_2 \text{ and } y_{i1} < z_1) = \Phi_2\left( \frac{z_1 - \hat{\beta}_1' x_{i1}}{\hat{\sigma}_{\varepsilon_1}},\; -\frac{z_2 - \hat{\beta}_2' x_{i1}}{\hat{\sigma}_{\varepsilon_2}},\; -\rho \right) \tag{21'}
\]
\[
\hat{P}^{2U}(y_{i2} < z_2 \text{ and } y_{i1} > z_1) = \Phi_2\left( -\frac{z_1 - \hat{\beta}_1' x_{i1}}{\hat{\sigma}_{\varepsilon_1}},\; \frac{z_2 - \hat{\beta}_2' x_{i1}}{\hat{\sigma}_{\varepsilon_2}},\; -\rho \right) \tag{22'}
\]
\[
\hat{P}^{2U}(y_{i2} > z_2 \text{ and } y_{i1} > z_1) = \Phi_2\left( -\frac{z_1 - \hat{\beta}_1' x_{i1}}{\hat{\sigma}_{\varepsilon_1}},\; -\frac{z_2 - \hat{\beta}_2' x_{i1}}{\hat{\sigma}_{\varepsilon_2}},\; \rho \right) \tag{23'}
\]

where $\rho$ is set to equal its smallest hypothesized value $\rho_S$ and its highest hypothesized value $\rho_H$ respectively for the upper bound and lower bound estimates for poverty mobility.
We calculate different values for $\rho$ using true panel data from several developing countries:
Bosnia-Herzegovina, Indonesia, Lao PDR, Nepal, Peru, and Vietnam. Table 2 provides our
estimates.19 Clearly, this list is far from exhaustive, and we expect future
research will build on it, but this sample of countries spans different regions and income
levels at different points in time over the past decade. For these estimates, we use model
specifications which are as similar as permissible by the data available to the basic model
employed above for the non-parametric estimates plus a dummy variable indicating area of
residence (urban/ rural). These are also the same model specifications we use for predictions
using the cross sectional data.
The estimates in Table 2 show that $\rho$ ranges from 0.39 (for Nepal during 1995-2004) to 0.66
(for Vietnam during 2004-2006), which is arguably a rather tight range compared to its
theoretical range of [0, 1].20 However, to be on the safe side, we will widen this range a bit more
and use the two pairs of values (0.2, 0.8) and (0.3, 0.7) for our subsequent bound estimates.
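Where genuine panel data are available, the correlation of the error terms across rounds can be estimated from the residuals of the two round-specific regressions. The sketch below illustrates the idea on a simulated panel with a single time-invariant regressor; the data-generating values, function names, and sample size are all hypothetical and chosen only so that the true correlation (0.5) is known.

```python
import math
import random


def ols_residuals(x, y):
    """Residuals from a one-regressor OLS of y on x with a constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - intercept - slope * a for a, b in zip(x, y)]


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def simulate_panel(n, rho, seed):
    """Hypothetical panel: a time-invariant regressor x, log consumption in two
    rounds, and error terms whose correlation across rounds is rho."""
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.gauss(6.0, 3.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x.append(xi)
        y1.append(7.0 + 0.05 * xi + 0.5 * e1)
        y2.append(7.2 + 0.06 * xi + 0.5 * e2)
    return x, y1, y2


x, y1, y2 = simulate_panel(n=5000, rho=0.5, seed=42)
rho_hat = correlation(ols_residuals(x, y1), ols_residuals(x, y2))
print(f"estimated rho: {rho_hat:.3f}")
```

In practice the regressions would include the full set of time-invariant characteristics of the chosen specification, and the estimate would carry its own sampling error, which is one motivation for widening the range before using it.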
Lower and Upper Bound Estimates
The lower bounds and upper bounds of poverty mobility for Vietnam and Indonesia are
further examined in Table 3. Our bound estimates are considered in three model specifications:
Specification 1 provides the most conservative bounds, where $\rho$ is set to 1 and 0 respectively,
and Specifications 2 and 3 provide less conservative bounds, where $\rho$ is assumed to
equal [0.8, 0.2] and [0.7, 0.3] respectively. Clearly, the estimates from Specification 1 are the
parametric equivalent of our previous non-parametric estimates (which are also shown for
comparison under the column "Non-parametric bound"), but we will focus here on the
parametric estimates for interpretation. The bound estimates are expected to be sequentially
tighter for Specifications 1, 2 and 3; however, this naturally comes with a trade-off since the
tighter the bounds, the higher the chance that these bounds do not encompass the true rates.
19
The data are from Bosnia-Herzegovina during 2001-2004 (Demirguc-Kunt, Klapper and Panos, 2009), Lao PDR
during 2002-2007 (Lao Department of Statistics, 2009), Nepal during 1995-2004 (Nepal's Central Bureau of
Statistics, 2004), and Peru during 2004-2006 (Peruvian Statistics Bureau, INEI). These countries' household
surveys are similar to the LSMSs and thus can provide a relevant and comparable range of values for this correlation
coefficient. In addition we also employ the 2004 VHLSS.
20
These positive values for $\rho$ confirm again the validity of our Assumptions 2 and 2'.
Table 3 shows that the true poverty dynamic rates obtained from the panel data are well
within the lower and upper bounds respectively provided by Specification 1, which are very
similar to those obtained by the non-parametric method. Notably, except for those remaining
non-poor in both periods, these true poverty rates are also bounded by the less conservative
estimates from Specification 2, which shrink the intervals between the lower and upper bound in
Specification 1 by around half for both countries. For example, the proportion of households who
were poor in 2006 but nonpoor in 2008 for Vietnam is 5.7 percent, which lies between the less
conservative lower and upper bound estimates of [4.3, 8.5] under Specification 2. This interval
width of 4.2 percentage points is roughly half that of the most conservative bounds under
Specification 1, which has interval [0.4, 9.4].
As expected, estimates under Specification 3 provide an even tighter range, but these bounds
now fail to contain the true rates not only for those remaining nonpoor in both periods, but also
for those falling into poverty in the second period for Vietnam and those remaining poor in both
periods for Indonesia. The silver lining, however, is that the differences between the imprecise
bounds and the true rates range from 0.3 to 0.9 percentage points (which are roughly 5 to 20
percent in relative terms), except for the estimates for those who remained non-poor in both
periods. Even in these worst cases, the miscalculation only amounts to
around 1 percent of the true rate for Vietnam (e.g., (82.3 - 81.1)/82.3 = 0.014) and 4 percent of
the true rate for Indonesia. Moreover, the width of the intervals obtained is now typically less
than one third of the corresponding intervals offered by Specification 1.21
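The comparisons made in the last two paragraphs amount to a coverage check, a width ratio, and a relative miss. The sketch below reproduces that arithmetic from the rounded figures quoted in the text; the helper name `contains` and the variable names are ours, and we read 82.3 as the true rate and 81.1 as the nearest bound because the text divides by the true rate.

```python
def contains(bounds, value):
    """True if value lies inside the closed interval given by bounds."""
    lower, upper = bounds
    return lower <= value <= upper


# Vietnam, poor in 2006 and nonpoor in 2008 (percent), as quoted in the text.
true_rate = 5.7
spec1 = (0.4, 9.4)   # most conservative bounds
spec2 = (4.3, 8.5)   # less conservative bounds

covered_by_spec2 = contains(spec2, true_rate)
width_ratio = (spec2[1] - spec2[0]) / (spec1[1] - spec1[0])

# Households remaining nonpoor in Vietnam under Specification 3.
true_nonpoor, nearest_bound = 82.3, 81.1
relative_miss = (true_nonpoor - nearest_bound) / true_nonpoor

print(covered_by_spec2, round(width_ratio, 2), round(relative_miss, 3))
```

Because the inputs are rounded to one decimal place, the last digit of these ratios should be read as approximate.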
5. Alternative Poverty Lines and Mobility Profiles
We examine in this section robustness to the choice of poverty line, and an extension of our
analysis to subpopulation groups.
5.1. Robustness to Choice of Poverty Line
The preceding analysis has all been based on one particular poverty line. The question
then arises as to whether the approach described here is also successful in bounding true mobility
when alternative poverty lines are considered. From the proofs offered in Appendix 1, there is no
particular reason this should not be the case. However, as an empirical robustness check on the
estimation, we consider different poverty lines. A related question is whether the tightness with
which our bounds "sandwich" the truth is constant for different values of the poverty line. We
investigate these questions by calculating upper and lower bounds on mobility, as well as the
truth, for the set of poverty lines spanning the range of possible base year poverty rates from 0 to
100 percent using the non-parametric method. Figure 1 illustrates our results in terms of the
fraction of the population who escape poverty for Indonesia.22
21
The estimates in Table 3 are obtained by applying the predicted coefficients and error terms from both survey
rounds to data in the second survey round. Results are similar when we replicate these results using data in the first
survey round. Results available on request.
The IFLS "true" panel data indicate that the share of the population able to escape
poverty is low when the base year poverty line (and hence aggregate poverty) is sufficiently
low (Figure 1). As the poverty line increases in value, a larger share of the base year population
is considered poor and the percent that escapes poverty also rises. As the poverty line continues
to rise an increasing fraction of the base year population is counted as poor and eventually the
share of that underlying population that manages to escape poverty starts to decline. When the
line is sufficiently high the whole population is poor and remains poor. Figure 1 shows that the
inverted U-curve pattern traced out by the IFLS panel data is tracked fairly closely by our lower
and upper bound synthetic panel estimates of mobility out of poverty. Allowing for some
overlap and crossing attributable to statistical uncertainty, the bounds do "sandwich" the truth
over the full range of possible poverty lines. Figure 1 also illustrates that the gap between the
upper and lower bound estimates is at its widest when around half of the base-year population is
considered poor, and also the largest share of the population is able to escape poverty. At more
extreme poverty lines, the bounds are much closer together, pointing also to much lower rates of
mobility out of poverty.
Other figures considering poverty immobility (not shown) also provide similar results. In
sum, our approach is found to work well for the full possible range of poverty lines that might be
specified, and we find that our bounds are, indeed, upper and lower bounds to the "truth"
irrespective of where the poverty line is drawn.
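The inverted-U pattern in the true escape rate can be reproduced on simulated data by sweeping the poverty line across base-year headcount rates. The sketch below is only an illustration of that sweep: the panel is simulated with a cross-round correlation of 0.5, a single line is applied to both rounds for simplicity, and the bounds themselves are not computed.

```python
import math
import random


def escape_share(y1, y2, z):
    """Share of all households that are poor in round 1 (y1 < z) and
    non-poor in round 2 (y2 >= z)."""
    return sum(1 for a, b in zip(y1, y2) if a < z and b >= z) / len(y1)


def line_for_headcount(sorted_y1, rate):
    """Poverty line at which a share `rate` of round 1 households is poor."""
    n_poor = round(rate * len(sorted_y1))
    if n_poor >= len(sorted_y1):
        return math.inf  # everyone is poor in round 1
    return sorted_y1[n_poor]


# Hypothetical panel of (standardized) log consumption in two rounds.
rng = random.Random(7)
y1, y2 = [], []
for _ in range(2000):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y1.append(g1)
    y2.append(0.5 * g1 + math.sqrt(0.75) * g2)

ranked = sorted(y1)
rates = [i / 20 for i in range(21)]  # base-year headcounts 0%, 5%, ..., 100%
shares = [escape_share(y1, y2, line_for_headcount(ranked, r)) for r in rates]

for rate, share in zip(rates, shares):
    print(f"base-year poverty {rate:4.0%}: escaped {share:.3f}")
```

At both extremes nobody can escape poverty, either because nobody is poor to begin with or because the line is too high to cross, so the share must peak somewhere in between.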
5.2. Poverty Transitions Among Population Sub-Groups
22
Similar results for Vietnam are available upon request.
While our proposed bounds appear to work well for the whole population, it is of interest to
investigate whether the same is true for smaller population groups for several reasons. First, in
designing effective social safety nets, policy makers often focus on smaller but more
disadvantaged groups, rather than the whole population. This is especially the case in developing
countries where, due to resource constraints, allocations must be prioritized. Second, due to cost
and logistical considerations sample sizes of true panel data are often fairly small, and this limits
their applicability to the assessment of mobility across small population groups. In cases where
the sample sizes of panel data are too small, these data may offer either imprecise or even
unreliable estimates due to large standard errors or the non-representativeness of the data
themselves. One of the advantages of the approach considered here is that our synthetic panels
are based on cross-sectional data which often comprise far larger samples; if the samples of our
synthetic panels are large enough, estimates based on these synthetic panels may better represent
the target population.23
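The sample-size argument can be illustrated with the textbook standard error of a proportion. The sketch assumes simple random sampling and ignores both survey design effects and the model error that synthetic panel estimates carry, so it speaks only to sampling variability; the 20 percent rate and the two sample sizes are hypothetical.

```python
import math


def proportion_se(p, n):
    """Standard error of an estimated proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical sub-group in which 20 percent of households escape poverty.
se_small_panel = proportion_se(0.20, 100)    # sub-group cell in a small panel
se_large_cross = proportion_se(0.20, 2500)   # same sub-group in a large cross-section

print(f"panel sub-sample:         +/- {1.96 * se_small_panel:.3f}")
print(f"cross-section sub-sample: +/- {1.96 * se_large_cross:.3f}")
```

A sample 25 times larger cuts the standard error by a factor of five, which is the sense in which larger cross-sections can support sub-group estimates that a small panel cannot.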
We estimate and plot the proposed parametric bounds (using Specifications 1 and 2, Table 3)
against the true poverty dynamic rates for sub-groups of the population in Vietnam categorized
by ethnicity (i.e., ethnic minority groups), female-headed households, education achievement
(i.e., primary education or higher, lower secondary education or higher), and residence areas
(i.e., urban households or the region the household lives in) in Figures 2 to 5. Clearly, these
categorizations can overlap but they can provide a first cut at profiling poverty mobility for
different groups. Except for a few cases (e.g., households living in the North Central region in
Figure 2 and Figure 3, and in the Mekong Delta, North Central or Southeast regions in Figure 5),
the true rates lie within the less conservative bounds. Even in these exceptional cases where the
bounds miss, the differences do not appear to be large.
These graphs also indicate that ethnic minority groups are the group most vulnerable to
chronic poverty (Figure 2) and have very high mobility both into and out of poverty (Figures 3
and 4).24 The Northwest region shows patterns similar to those of ethnic minority groups since the
majority of the population in this region (76%) belongs to ethnic minority groups.25 On the other
hand, households living in urban areas or households whose heads have a lower
secondary education or higher appear to be better off than most other groups in the country.
23
It is a well-known fact that while panel data may be representative of the whole population, they may not be
representative of all sub-population groups. For an (extreme) example, most panel data can perhaps provide good
estimates of income dynamics for the population that is literate, but may not be able to provide reliable estimates for
the population that has a Ph.D. degree.
24
See Dang (forthcoming) for a more detailed discussion of the welfare for ethnic groups in recent years in
Vietnam.
Again, these evaluations of our bounds are predicated on the assumption that these
small but true panel data are representative of the target population; otherwise, we may simply
use estimates from the synthetic panels because of their larger sample sizes and supposedly
better representativeness.
6. Conclusions and Future Directions
Genuine panel data are still rare in the developing world, and when they are available, the
samples are often relatively small, cover limited or infrequent periods, and in some cases suffer
significant attrition. This has limited the feasibility of constructing even the simplest
descriptions of movements in and out of poverty for most countries. Yet policymakers and
researchers do care about such movements, and most countries do field repeated cross-sectional
surveys of income or consumption on a reasonably regular basis. In this paper we have
developed a method for using existing cross-section data to provide some bounds on the extent
of movements into and out of poverty, and results from both Indonesia and Vietnam suggest
these bounds can be made narrow enough in practice to make the estimates useful.26
The success of the method depends on either how well one can predict the dependent
variable of interest (for the non-parametric approach) or how well we can capture the range of
autocorrelation for the error terms (for the parametric approach). For the former in the case of
consumption or income dynamics, we have found that our accuracy in doing this, and the
resulting width of the bounds for mobility, is significantly better when we are able to use
retrospective information on the demographic composition of the household, the ownership of
consumer durables and basic housing materials. Such variables are typically collected only
concurrently, and not retrospectively, in most household surveys. It could also be promising to
ask questions on when certain shocks such as development of chronic illness or death of a spouse
occur, since such variables might also help predict poverty status. Since it is certainly much less
costly to collect this information than it is to field panel surveys, our results suggest it might be
worth experimenting with the inclusion of such questions in some upcoming nationally
representative surveys in order to be able to provide basic estimates of poverty transitions.
25
Authors' calculation from the 2008 VHLSS.
26
Preliminary evidence to support this can be seen in new efforts underway to use the methodology developed in
this paper to systematically examine poverty dynamics in a number of Latin American countries. This work is being
carried out by the World Bank's Latin America and the Caribbean office, not the authors of this study.
While better predicted household consumption would clearly improve the parametric estimates
as well, for the latter we note that the empirically relevant ranges for the correlation term ρ
would likely vary across welfare outcomes (those for, say, household consumption can
clearly differ from those for employment). Future research could thus focus on extending the list
of empirically estimated correlation terms by looking at panel data from different countries, as
well as creating a similar list for other welfare outcomes. These typologies of the range of
autocorrelation for the error terms could then be used to provide estimates for countries with
similar settings. Another promising direction is to collect data on a smaller subpanel (i.e., for
cost savings) and combine the estimated correlation terms from this subpanel with the larger
sample-sized cross sections to estimate poverty mobility.
References
Antman, Francisca and David McKenzie. (2007). "Earnings Mobility and Measurement Error: A
Synthetic Panel Approach". Economic Development and Cultural Change, 56(1): 125-162.
Banks, James, Richard Blundell, and Agar Brugiavini. (2001). "Risk Pooling, Precautionary Saving
and Consumption Growth". Review of Economic Studies, 68(4): 757-779.
Berney, L.R. and D.B. Blane. (1997). "Collecting Retrospective Data: Accuracy of recall after 50
years judged against historical records". Social Science and Medicine, 45(10): 1519-25.
Casella, George and Roger L. Berger. (2002). Statistical Inference, 2nd Edition. California:
Duxbury Press.
Dang, Hai-Anh. (forthcoming). "Vietnam: A Widening Poverty Gap for Ethnic Minorities", in
Gillette Hall and Harry Patrinos (eds.) Indigenous Peoples, Poverty and Development.
Cambridge University Press.
Deaton, Angus. (1985). "Panel Data from Time Series of Cross-Sections". Journal of Econometrics,
30: 109-126.
Deaton, Angus and Christina Paxson. (1994). "Intertemporal Choice and Inequality". Journal of
Political Economy, 102(3): 437-467.
De Mel, Suresh, David McKenzie, and Christopher Woodruff. (2010). "Who are the microenterprise
owners? Evidence from Sri Lanka on Tokman v. de Soto", pp. 63-87 in Joshua Lerner and
Antoinette Schoar (eds.) International Differences in Entrepreneurship. NBER, Cambridge,
MA.
Demirguc-Kunt, Asli, Leora F. Klapper, and Georgios A. Panos. (2009). "Entrepreneurship in
Post-Conflict Transition: The Role of Informality and Access to Finance". Policy Research Working
Paper 4935, DECRG, The World Bank.
Demombynes, G., Elbers, C., Lanjouw, J., Lanjouw, P., Mistiaen, J. and Ozler, B. (2004).
"Producing a Better Geographic Profile of Poverty: Methodology and Evidence from Three
Developing Countries", in Shorrocks, A. and van der Hoeven, R. (eds.) Growth, Inequality and
Poverty. Oxford University Press.
Elbers, C., Lanjouw, J.O., and Lanjouw, P. (2002). "Micro-Level Estimation of Welfare". Policy
Research Working Paper 2911, DECRG, The World Bank.
Elbers, C., Lanjouw, J.O., and Lanjouw, P. (2003). "Micro-level Estimation of Poverty and Inequality".
Econometrica, 71(1): 355-364.
Elbers, C., Lanjouw, P., and Leite, P. (2010). "Brazil Within Brazil: Testing the Poverty Map
Methodology in Minas Gerais". Mimeo, DECRG, The World Bank.
Fields, Gary, Robert Duval-Hernández, Samuel Freije Rodríguez, and María Laura Sánchez Puerta.
(2007). "Earnings Mobility in Argentina, Mexico, and Venezuela: Testing the Divergence of
Earnings and the Symmetry of Mobility Hypotheses". Mimeo. School of Industrial and Labor
Relations, Cornell University.
Foster, James E. (2009). "A Class of Chronic Poverty Measures", pp. 59-76 in Tony Addison, David
Hulme, and Ravi Kanbur (eds.) Poverty Dynamics: Interdisciplinary Perspectives. Oxford
University Press: New York.
Gibson, John. (2001). "Measuring Chronic Poverty Without a Panel". Journal of Development
Economics, 65(2): 243-66.
Glewwe, Paul. (2009). "Mission Report for Trip to Vietnam June 5-16, 2009". Report submitted
to the World Bank.
___. (2010). "How Much of Observed Mobility is Measurement Error? IV Methods to Reduce
Measurement Error Bias, with an Application to Vietnam". Mimeo. University of Minnesota.
Güell, Maia and Luojia Hu. (2006). "Estimating the Probability of Leaving Unemployment Using
Uncompleted Spells from Repeated Cross-Section Data". Journal of Econometrics, 133: 307-341.
Hojman, Daniel and Felipe Kast. (2009). "On the Measurement of Poverty Dynamics". Working
Paper Series RWP09-035, John F. Kennedy School of Government, Harvard University.
McKenzie, David. (2001). "Consumption Growth in a Booming Economy: Taiwan 1976-96". Yale
University Economic Growth Center Discussion Paper no. 823.
Pencavel, John. (2007). "A Life Cycle Perspective on Changes in Earnings Inequality among
Married Men and Women". Review of Economics and Statistics, 88(2): 232-242.
Smith, James P. and Duncan Thomas. (2003). "Remembrance of Things Past: Test-retest reliability
of retrospective migration histories". Journal of the Royal Statistical Society Series A, 166(1):
23-49.
Sungur, Engin A. (1990). "Dependence Information in Parameterized Copulas". Communications in
Statistics - Simulation and Computation, 19(4): 1339-1360.
Verbeek, Marno. (2008). "Synthetic panels and repeated cross-sections", pp. 369-383 in L. Matyas
and P. Sevestre (eds.) The Econometrics of Panel Data. Springer-Verlag: Berlin.
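Table 1: Poverty Dynamics from Synthetic Panel Data and Actual Panel Data for Indonesia and Vietnam

| Country | Poverty status | Non-parametric lower bound, Model 1 | Non-parametric lower bound, Model 2 | Non-parametric lower bound, Model 3 | Non-parametric lower bound, Model 4 | Non-parametric lower bound, Model 5 | Non-parametric lower bound, Model 6 | Truth | Non-parametric upper bound, Model 6 | Non-parametric upper bound, Model 5 | Non-parametric upper bound, Model 4 | Non-parametric upper bound, Model 3 | Non-parametric upper bound, Model 2 | Non-parametric upper bound, Model 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Indonesia 1997-2000 | Poor, Poor | 12.8 | 12.1 | 11.9 | 11.1 | 11.8 | 11.7 | 5.9 | 4.2 | 3.6 | 3.0 | 3.0 | 2.9 | 2.9 |
| | | | | | | | | (0.4) | | | | | | |
| Indonesia 1997-2000 | Poor, Nonpoor | 1.2 | 1.4 | 1.4 | 2.0 | 2.6 | 3.2 | 8.1 | 10.3 | 10.2 | 10.8 | 10.9 | 10.8 | 11.1 |
| | | | | | | | | (0.5) | | | | | | |
| Indonesia 1997-2000 | Nonpoor, Poor | 1.7 | 2.4 | 2.5 | 3.4 | 2.7 | 2.8 | 7.9 | 10.3 | 10.9 | 11.5 | 11.5 | 11.6 | 11.6 |
| | | | | | | | | (0.5) | | | | | | |
| Indonesia 1997-2000 | Nonpoor, Nonpoor | 84.3 | 84.1 | 84.1 | 83.5 | 82.9 | 82.3 | 78.1 | 75.2 | 75.3 | 74.8 | 74.6 | 74.7 | 74.4 |
| | | | | | | | | (0.7) | | | | | | |
| Indonesia 1997-2000 | ρ | 0.54 | 0.529 | 0.521 | 0.521 | 0.475 | 0.421 | | | | | | | |
| Indonesia 1997-2000 | Adjusted R2 | 0.193 | 0.21 | 0.215 | 0.231 | 0.329 | 0.421 | | | | | | | |
| Indonesia 1997-2000 | N | 1638 | 1638 | 1638 | 1638 | 1638 | 1638 | 3517 | 1638 | 1638 | 1638 | 1638 | 1638 | 1638 |
| Vietnam 2006-2008 | Poor, Poor | 12.5 | 10.2 | 10.1 | 10.1 | 10.8 | 11 | 7.6 | 6.3 | 5.9 | 5.2 | 5.2 | 4.6 | 4.5 |
| | | | | | | | | (0.5) | | | | | | |
| Vietnam 2006-2008 | Poor, Nonpoor | 0.4 | 2.6 | 2.6 | 2.7 | 3.3 | 3.3 | 5.7 | 6.6 | 7.3 | 7.3 | 7.4 | 8.5 | 9.4 |
| | | | | | | | | (0.4) | | | | | | |
| Vietnam 2006-2008 | Nonpoor, Poor | 0.5 | 2.8 | 3.0 | 3.0 | 2.3 | 2.1 | 4.4 | 6.8 | 7.2 | 7.9 | 7.8 | 8.5 | 8.6 |
| | | | | | | | | (0.4) | | | | | | |
| Vietnam 2006-2008 | Nonpoor, Nonpoor | 86.5 | 84.3 | 84.3 | 84.2 | 83.6 | 83.6 | 82.3 | 80.3 | 79.6 | 79.6 | 79.5 | 78.4 | 77.6 |
| | | | | | | | | (0.7) | | | | | | |
| Vietnam 2006-2008 | ρ | 0.654 | 0.584 | 0.554 | 0.547 | 0.516 | 0.394 | | | | | | | |
| Vietnam 2006-2008 | Adjusted R2 | 0.334 | 0.494 | 0.548 | 0.559 | 0.60 | 0.71 | | | | | | | |
| Vietnam 2006-2008 | N | 1335 | 1335 | 1335 | 1335 | 1335 | 1335 | 2728 | 1335 | 1335 | 1335 | 1335 | 1335 | 1335 |

Note: 1. Poverty rates in percent are calculated using halves from the IFLS panel and the VHLSS panel component, and predictions obtained using data in the second survey rounds.
Full regression results are provided in Tables 2.1a and 2.1b in Appendix 2.
2. All numbers are weighted using population weights for each survey round. Standard errors in parentheses.
3. Number of replications for the estimates is 500.
4. Household heads' ages are restricted to between 25 and 55 in the first survey round.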
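
The mechanics behind the non-parametric bounds in Table 1 can be illustrated with a minimal Python sketch. It assumes the procedure implied by the proofs in Appendix 1: round-1 consumption for a household observed in round 2 is predicted as a round-1 fitted value plus either the household's own round-2 residual (perfect autocorrelation, giving the lower bound on mobility) or a residual drawn at random from the round-1 residuals (no autocorrelation, giving the upper bound on mobility), averaged over replications. All function and variable names are hypothetical, the inputs are toy numbers rather than survey data, and survey weights are ignored.

```python
import random

STATES = ("poor_poor", "poor_nonpoor", "nonpoor_poor", "nonpoor_nonpoor")

def transition_shares(y1_pred, y2, z1, z2):
    """Percent of households in each (round 1, round 2) poverty state."""
    counts = dict.fromkeys(STATES, 0)
    for y1_hat, y2_obs in zip(y1_pred, y2):
        first = "poor" if y1_hat < z1 else "nonpoor"
        second = "poor" if y2_obs < z2 else "nonpoor"
        counts[first + "_" + second] += 1
    n = len(y2)
    return {state: 100.0 * count / n for state, count in counts.items()}

def lower_bound(fitted1, resid2, y2, z1, z2):
    """Lower bound on mobility: reuse each household's own round-2 residual."""
    y1_pred = [f + e for f, e in zip(fitted1, resid2)]
    return transition_shares(y1_pred, y2, z1, z2)

def upper_bound(fitted1, resid1_pool, y2, z1, z2, reps=500, seed=0):
    """Upper bound on mobility: draw round-1 residuals at random, average over reps."""
    rng = random.Random(seed)
    totals = dict.fromkeys(STATES, 0.0)
    for _ in range(reps):
        y1_pred = [f + rng.choice(resid1_pool) for f in fitted1]
        shares = transition_shares(y1_pred, y2, z1, z2)
        for state in STATES:
            totals[state] += shares[state]
    return {state: total / reps for state, total in totals.items()}

# Toy inputs in log consumption; none of these numbers come from the paper.
fitted1 = [7.9, 8.1, 8.4, 8.8, 9.0, 9.3]        # round-1 coefficients applied to round-2 households
resid2 = [-0.3, 0.2, -0.1, 0.4, -0.2, 0.1]      # round-2 residuals of the same households
y2 = [8.0, 8.5, 8.4, 9.4, 9.0, 9.6]             # observed round-2 consumption
resid1_pool = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4]  # residuals from the round-1 cross-section
z1, z2 = 8.2, 8.45                              # poverty lines in rounds 1 and 2

lb = lower_bound(fitted1, resid2, y2, z1, z2)
ub = upper_bound(fitted1, resid1_pool, y2, z1, z2)
for state in STATES:
    print(f"{state}: lower bound {lb[state]:.1f}, upper bound {ub[state]:.1f}")
```

Because round-2 poverty status is observed directly, both calls return the same round-2 poverty rate; only the split between households that stay in a state and households that move differs.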
Table 2: Estimated ρ from Actual Panel Data for Different Countries

| Country | Survey years | ρ |
|---|---|---|
| Bosnia-Herzegovina | 2001, 2004 | 0.43 |
| Indonesia | 1997, 2000 | 0.47 |
| Lao PDR | 2002-03, 2007-08 | 0.40 |
| Nepal | 1995-96, 2003-04 | 0.39 |
| Peru | 2004, 2006 | 0.58 |
| Vietnam | 2004, 2006 | 0.66 |
| Vietnam | 2004, 2008 | 0.35 |
| Vietnam | 2006, 2008 | 0.62 |

Note: 1. Each cell represents results from one regression, except for the cells under "ρ".
2. Household heads' ages are restricted to between 25 and 55 in the first survey round.
3. ρ is the correlation coefficient between the error terms for the panel data.
Table 3: Poverty Dynamics from Synthetic Panel Data and Actual Panel Data for Indonesia and Vietnam

| Country | Poverty status | Non-parametric lower bound | Parametric lower bound, Spec. 1 | Parametric lower bound, Spec. 2 | Parametric lower bound, Spec. 3 | Truth | Parametric upper bound, Spec. 3 | Parametric upper bound, Spec. 2 | Parametric upper bound, Spec. 1 | Non-parametric upper bound |
|---|---|---|---|---|---|---|---|---|---|---|
| Indonesia 1997-2000 | Poor, Poor | 13.3 | 15.9 | 11.1 | 9.8 | 5.9 | 6.1 | 5.4 | 4.0 | 3.3 |
| | | | | | | (0.4) | | | | |
| Indonesia 1997-2000 | Poor, Nonpoor | 1.6 | 1.7 | 6.5 | 7.8 | 8.1 | 11.5 | 12.2 | 13.5 | 12.3 |
| | | | | | | (0.5) | | | | |
| Indonesia 1997-2000 | Nonpoor, Poor | 0.9 | 0.9 | 5.7 | 7.0 | 7.9 | 10.7 | 11.5 | 12.8 | 11.7 |
| | | | | | | (0.5) | | | | |
| Indonesia 1997-2000 | Nonpoor, Nonpoor | 84.3 | 81.5 | 76.7 | 75.4 | 78.1 | 71.7 | 71.0 | 69.6 | 72.7 |
| | | | | | | (0.7) | | | | |
| Indonesia 1997-2000 | N | 1710 | 1710 | 1710 | 1710 | 3517 | 1710 | 1710 | 1710 | 1710 |
| Vietnam 2006-2008 | Poor, Poor | 11.8 | 13.1 | 9.2 | 8.3 | 7.6 | 5.6 | 5.1 | 4.1 | 3.9 |
| | | | | | | (0.5) | | | | |
| Vietnam 2006-2008 | Poor, Nonpoor | 0.6 | 0.4 | 4.3 | 5.3 | 5.7 | 8.0 | 8.5 | 9.4 | 9.2 |
| | | | | | | (0.4) | | | | |
| Vietnam 2006-2008 | Nonpoor, Poor | 0.4 | 0.5 | 4.4 | 5.3 | 4.4 | 8.0 | 8.5 | 9.5 | 8.4 |
| | | | | | | (0.4) | | | | |
| Vietnam 2006-2008 | Nonpoor, Nonpoor | 87.2 | 86.0 | 82.1 | 81.1 | 82.3 | 78.4 | 77.9 | 77.0 | 78.6 |
| | | | | | | (0.7) | | | | |
| Vietnam 2006-2008 | N | 3701 | 3701 | 3701 | 3701 | 2728 | 3701 | 3701 | 3701 | 3701 |

Note: 1. Poverty rates in percent are calculated using halves from the IFLS panel and the VHLSS cross section component, and predictions obtained
using data in the second survey rounds.
2. All numbers are weighted using population weights for each survey round. Standard errors in parentheses.
3. Specification 1 assumes ρ = 1 and ρ = 0 for the lower bounds and upper bounds respectively and is the parametric equivalent of the
nonparametric bounds. Specification 2 approximates ρ with 0.8 and 0.2, and Specification 3 approximates ρ with 0.7 and 0.3 for the
lower bounds and upper bounds respectively. Number of replications for non-parametric estimates is 500.
4. Household heads' ages are restricted to between 25 and 55 for the first survey round and between
27 and 57 for the second survey round.
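
Note 3 maps each specification to an assumed value of ρ. The following minimal sketch shows how such a value translates into a probability of being poor in round 1 and non-poor in round 2 for a single household. It assumes the two error terms are bivariate normal with correlation ρ and works with standardized thresholds; the function names, the midpoint integration rule, and the illustrative thresholds are assumptions made for this example and are not taken from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_poor_then_nonpoor(a, b, rho, steps=4000):
    """P(e1 < a and e2 > b) for standard bivariate normal errors with correlation rho.

    a and b stand for the standardized distances (z1 - beta1'x) / sigma1 and
    (z2 - beta2'x) / sigma2 between the poverty lines and predicted consumption.
    """
    if rho >= 1.0:
        # Perfect correlation: e1 and e2 are the same draw.
        return max(norm_cdf(a) - norm_cdf(b), 0.0)
    lower_limit = -8.0  # effectively minus infinity for a standard normal
    if a <= lower_limit:
        return 0.0
    width = (a - lower_limit) / steps
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for k in range(steps):
        e1 = lower_limit + (k + 0.5) * width  # midpoint rule
        # Conditional on e1, e2 is normal with mean rho * e1 and s.d. scale.
        total += norm_pdf(e1) * (1.0 - norm_cdf((b - rho * e1) / scale))
    return total * width

# Illustrative household (assumed standardized thresholds).
a, b = -0.5, -0.8
specs = {"Spec. 1": (1.0, 0.0), "Spec. 2": (0.8, 0.2), "Spec. 3": (0.7, 0.3)}
for name, (rho_lower, rho_upper) in specs.items():
    lower = prob_poor_then_nonpoor(a, b, rho_lower)
    upper = prob_poor_then_nonpoor(a, b, rho_upper)
    print(f"{name}: lower bound {lower:.3f}, upper bound {upper:.3f}")
```

Under this normality assumption the probability falls as ρ rises, which is why moving from Specification 1 to Specification 3 narrows the interval from both sides.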
Figure 1: Estimates of Mobility Out of Poverty for Alternative Poverty Lines, Indonesia
Figure 2: Profiles for Those Who Remained Poor in Both Periods, Vietnam 2006-2008
Figure 3: Profiles for Those Who Were Poor in the First Period but Non-poor in the Second Period, Vietnam 2006-2008
Figure 4: Profiles for Those Who Were Non-poor in the First Period but Poor in the Second Period, Vietnam 2006-2008
Figure 5: Profiles for Those Who Were Non-poor in Both Periods, Vietnam 2006-2008
APPENDICES FOR ONLINE PUBLICATION ONLY
Appendix 1
Proof of Theorem 1 and Corollaries 1.1 and 1.2
The probability that a household is poor in the first period but non-poor in the second period can be written as

$$
\begin{aligned}
P(y_{i1} < z_1 \cap y_{i2} > z_2) &= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1} \cap \varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2} \cap \varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \, P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2} \mid \varepsilon_{i1} < z_1 - \beta_1' x_{i2})
\end{aligned}
\tag{A1.1a}
$$

where the second line follows from replacing $x_{i1}$ with $x_{i2}$ by Assumption 1 (see footnote 27), and the third line follows
from the multiplication rule for conditional probabilities.28 Since the probability

$$
P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \, P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2} \mid \varepsilon_{i1} \ge z_1 - \beta_1' x_{i2}) \tag{*}
$$

is non-negative by definition, we then have

$$
\begin{aligned}
P(y_{i1} < z_1 \cap y_{i2} > z_2) &\le P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \, P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2} \mid \varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \\
&\quad + P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \, P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2} \mid \varepsilon_{i1} \ge z_1 - \beta_1' x_{i2}) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \, P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2})
\end{aligned}
\tag{A1.2}
$$

where the second line follows from the partition rule.29
Our upper bound estimate of mobility can be written as

$$
P(y_{i1}^{2U} < z_1 \cap y_{i2} > z_2) = P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) \, P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \tag{A1.3}
$$

where the right-hand side results when the two error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$ are completely independent of each
other.
Thus combining (A1.2) and (A1.3) it follows that

$$
P(y_{i1}^{2U} < z_1 \cap y_{i2} > z_2) \ge P(y_{i1} < z_1 \cap y_{i2} > z_2) \tag{A1.4}
$$

which establishes the upper bound estimate of mobility. Incidentally, the probability (*) is the bias for the
upper bound estimate of mobility, which establishes Corollary 1.1.
Then subtracting each of the terms in (A1.4) from $P(y_{i2} > z_2)$, we would have

$$
P(y_{i2} > z_2) - P(y_{i1}^{2U} < z_1 \cap y_{i2} > z_2) \le P(y_{i2} > z_2) - P(y_{i1} < z_1 \cap y_{i2} > z_2)
$$

or equivalently, using the partition rule again,

$$
P(y_{i1}^{2U} \ge z_1 \cap y_{i2} > z_2) \le P(y_{i1} \ge z_1 \cap y_{i2} > z_2) \tag{A1.5}
$$

which establishes Corollary 1.2. The remaining cases follow straightforwardly.
27
Note that we can directly replace $x_{i1}$ with $x_{i2}$ if x contains only time-invariant variables. If x also contains
deterministic variables, then we would replace $x_{i1}$ with the period 1 values determined by knowing $x_{i2}$. We abstract
from this case to simplify notation, since the key idea remains the same.
28
Strictly speaking, we need $P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) > 0$ to derive the third line, which is satisfied as long as the poverty
rate is not zero for period 1. Also note that the equality signs "=" in all the greater-than-or-equal-to "≥" signs inside
parentheses for the following probabilities are optional since household consumptions (and their error terms) are
continuous variables.
29
See, for example, Theorem 1.2.11 in Casella and Berger (2002).
Proof of Theorem 2 and Corollaries 2.1 and 2.2
The probability that a household is poor in the first period but non-poor in the second period in (A1.1a) can
also be rewritten as

$$
\begin{aligned}
P(y_{i1} < z_1 \cap y_{i2} > z_2) &= P(\beta_1' x_{i1} + \varepsilon_{i1} < z_1 \cap \beta_2' x_{i2} + \varepsilon_{i2} > z_2) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1}) + P(\varepsilon_{i2} > z_2 - \beta_2' x_{i2}) - P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1} \cup \varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1}) + [1 - P(\varepsilon_{i2} \le z_2 - \beta_2' x_{i2})] - P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1} \cup \varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1}) - P(\varepsilon_{i2} \le z_2 - \beta_2' x_{i2}) + [1 - P(\varepsilon_{i1} < z_1 - \beta_1' x_{i1} \cup \varepsilon_{i2} > z_2 - \beta_2' x_{i2})] \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) - P(\varepsilon_{i2} \le z_2 - \beta_2' x_{i2}) + [1 - P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2} \cup \varepsilon_{i2} > z_2 - \beta_2' x_{i2})]
\end{aligned}
\tag{A1.1b}
$$

where the second and third lines follow from the basic properties of probability,30 the fourth line follows
from rearranging expressions, and the fifth line follows from replacing $x_{i1}$ with $x_{i2}$ using Assumption 1.
Our lower bound estimate of mobility is

$$
\begin{aligned}
P(y_{i1}^{2L} < z_1 \cap y_{i2} > z_2) &= P(\varepsilon_{i2} < z_1 - \beta_1' x_{i2} \cap \varepsilon_{i2} > z_2 - \beta_2' x_{i2}) \\
&= P(\varepsilon_{i2} < z_1 - \beta_1' x_{i2}) - P(\varepsilon_{i2} \le z_2 - \beta_2' x_{i2}) \\
&= P(\varepsilon_{i1} < z_1 - \beta_1' x_{i2}) - P(\varepsilon_{i2} \le z_2 - \beta_2' x_{i2})
\end{aligned}
\tag{A1.6}
$$

where the last line follows when $\varepsilon_{i1}$ has perfect correlation with $\varepsilon_{i2}$. Since the third term on the right-hand
side in the last line in equation (A1.1b) is non-negative by definition, combining (A1.1b) and (A1.6) it
follows that

$$
P(y_{i1}^{2L} < z_1 \cap y_{i2} > z_2) \le P(y_{i1} < z_1 \cap y_{i2} > z_2) \tag{A1.7}
$$

which establishes our conservative lower bound of mobility. Incidentally, the third term on the right-hand
side in the last line in equation (A1.1b) is the bias for the lower bound estimate of mobility, which
establishes Corollary 2.1.
Then subtracting each of the terms in (A1.7) from $P(y_{i2} > z_2)$, we would have

$$
P(y_{i2} > z_2) - P(y_{i1}^{2L} < z_1 \cap y_{i2} > z_2) \ge P(y_{i2} > z_2) - P(y_{i1} < z_1 \cap y_{i2} > z_2)
$$

or equivalently

$$
P(y_{i1}^{2L} \ge z_1 \cap y_{i2} > z_2) \ge P(y_{i1} \ge z_1 \cap y_{i2} > z_2) \tag{A1.8}
$$

which establishes Corollary 2.2. The remaining cases follow straightforwardly.
30
See, for example, Theorem 1.2.9 in Casella and Berger (2002).
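
The two probability rules used in these proofs, the inclusion-exclusion step rearranged in (A1.1b) and the partition rule behind (A1.5) and (A1.8), hold for any joint distribution of the two errors, including an empirical one. The following short sketch checks them exactly by counting simulated draws. The thresholds, the sample size, and the way the correlated errors are generated are illustrative assumptions, not values from the paper.

```python
import random

rng = random.Random(42)
n = 10_000
a, b = -0.5, -0.8  # illustrative thresholds for the period-1 and period-2 errors

draws = []
for _ in range(n):
    common = rng.gauss(0.0, 1.0)        # shared component makes the errors positively correlated
    e1 = common + rng.gauss(0.0, 1.0)
    e2 = common + rng.gauss(0.0, 1.0)
    draws.append((e1, e2))

n_A = sum(e1 < a for e1, _ in draws)                         # poor in period 1
n_B = sum(e2 > b for _, e2 in draws)                         # non-poor in period 2
n_A_and_B = sum(e1 < a and e2 > b for e1, e2 in draws)       # poor, then non-poor
n_A_or_B = sum(e1 < a or e2 > b for e1, e2 in draws)
n_notA_and_B = sum(e1 >= a and e2 > b for e1, e2 in draws)   # non-poor in both periods
n_notB = n - n_B                                             # poor in period 2

# Inclusion-exclusion as rearranged in (A1.1b): the counts match exactly.
assert n_A_and_B == n_A - n_notB + (n - n_A_or_B)
# Partition rule: B splits into (A and B) and (not A and B).
assert n_B == n_A_and_B + n_notA_and_B
# Dropping the non-negative bracketed term gives the lower bound in (A1.7).
assert n_A - n_notB <= n_A_and_B

print("poor then non-poor:", n_A_and_B / n)
print("lower bound       :", (n_A - n_notB) / n)
```

The gap between the two printed shares is the bracketed term in (A1.1b), that is, the share of households that are non-poor in period 1 and poor in period 2.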
Proof of Theorem 3
When at least one independent variable is measured with error, the vector of household i's true variables
$x_{ij}^*$, for j = 1, 2, is not observed; instead we observe $x_{ij}$, which is measured with error. Similarly, if
there are measurement errors in household consumption, true household consumption $y_{ij}^*$ is not measured,
and we only observe $y_{ij}$. The linear projection of true household consumption on true household
characteristics in period j in equations (1) and (2) then becomes
(A1.9)
The true and observed variables are postulated to have the following relationship

$$
x_{ij} = x_{ij}^* + \tau_{ij} \tag{A1.10}
$$

$$
y_{ij} = y_{ij}^* + \upsilon_{ij} \tag{A1.11}
$$

where $\tau_{ij}$ and $\upsilon_{ij}$ are the measurement errors. In the classical measurement error model, $\tau_{ij}$ and $\upsilon_{ij}$ are
assumed to be uncorrelated with the true variables $x_{ij}^*$ and $y_{ij}^*$ respectively, and both are uncorrelated
with the model error. In the non-classical error model, there is less restriction on the correlation
between these measurement errors and the true variables, and $\tau_{ij}$ and $\upsilon_{ij}$ can be assumed to be correlated
with $x_{ij}^*$ and $y_{ij}^*$.
However, regardless of the correlation between the measurement errors and the true variables, using
equations (A1.10) and (A1.11), we can rewrite (A1.9) as
$\ldots\ \upsilon_{ij}$ (A1.12a)
or conveniently in a more general format
(A1.12b)
Equation (A1.12b) is identical to our original equations (1) and (2), which shows that measurement errors
do not affect our results in the proofs for Theorems 1 and 2. Indeed, equations (1) and (2) only provide the
linear projection of observed household consumption on observed household characteristics, where we
make no assumption about the correlation between the measurement errors and the true variables, except
that they do not cause the autocorrelation of the error terms to become negative. Thus, the lower bound (which is
based only on assuming the autocorrelation is less than or equal to one) will continue to be a lower bound,
while the upper bound will still be an upper bound with classical measurement error (since this will not
change the autocorrelation of the error term), and will be an upper bound with non-classical measurement
error provided this non-classical error does not induce negative autocorrelation. This could be violated if
the measurement error in consumption is negatively autocorrelated strongly enough to offset the positive
autocorrelation in the genuine consumption residual, which does not seem likely in practice, as
evidenced by the positive overall autocorrelations of the error terms seen in our empirical applications.
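
The substitution step in this argument can be sketched as follows, under assumed notation. Suppose the projection on the true variables is written as $y_{ij}^* = \beta_j^{*\prime} x_{ij}^* + \varepsilon_{ij}^*$, where the starred coefficient vector and error term are labels introduced here purely for illustration. Substituting $x_{ij}^* = x_{ij} - \tau_{ij}$ and $y_{ij}^* = y_{ij} - \upsilon_{ij}$ from (A1.10) and (A1.11) then gives

$$
\begin{aligned}
y_{ij} - \upsilon_{ij} &= \beta_j^{*\prime} (x_{ij} - \tau_{ij}) + \varepsilon_{ij}^* \\
y_{ij} &= \beta_j^{*\prime} x_{ij} + \left( \varepsilon_{ij}^* - \beta_j^{*\prime} \tau_{ij} + \upsilon_{ij} \right)
\end{aligned}
$$

so observed consumption is again linear in observed characteristics plus a composite error that absorbs both measurement-error terms. Projecting $y_{ij}$ on $x_{ij}$ therefore yields an equation of the same form as equations (1) and (2), and the bounds depend only on the autocorrelation of that composite error across the two periods.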
Appendix 2
Figure 2.1: Distribution Graphs for the Residuals, Indonesia and Vietnam
[Kernel density plots of the residuals (horizontal axis: residuals; vertical axis: density), in four panels: Residuals, Indonesia 1997 (kernel = epanechnikov, bandwidth = 0.1180); Residuals, Indonesia 2000 (kernel = epanechnikov, bandwidth = 0.1194); Residuals, Vietnam 2006 (kernel = epanechnikov, bandwidth = 0.0821); Residuals, Vietnam 2008 (kernel = epanechnikov, bandwidth = 0.0828).]
Table 2.1a: Estimated Parameters of Household Consumption, Vietnam 2006

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Head's age | 0.012*** | 0.010*** | 0.009*** | 0.009*** | 0.010*** | 0.009*** |
| | (0.002) | (0.002) | (0.002) | (0.002) | (0.002) | (0.002) |
| Head is female | 0.118*** | 0.009 | 0.030 | 0.023 | -0.071** | -0.029 |
| | (0.037) | (0.036) | (0.035) | (0.035) | (0.034) | (0.028) |
| Head's years of schooling | 0.064*** | 0.057*** | 0.047*** | 0.046*** | 0.042*** | 0.021*** |
| | (0.004) | (0.004) | (0.004) | (0.004) | (0.004) | (0.004) |
| Ethnic majority groups | 0.437*** | 0.333*** | 0.272*** | 0.254*** | 0.224*** | 0.194*** |
| | (0.038) | (0.047) | (0.042) | (0.042) | (0.039) | (0.035) |
| Urban in 2006 | | 0.297*** | 0.285*** | 0.215*** | 0.201*** | 0.088** |
| | | (0.041) | (0.039) | (0.040) | (0.040) | (0.036) |
| Poor as classified by government in 2006 | | | -0.435*** | -0.434*** | -0.417*** | -0.238*** |
| | | | (0.034) | (0.034) | (0.031) | (0.030) |
| Head works in agriculture only | | | | 0.070** | 0.056** | 0.038* |
| | | | | (0.027) | (0.026) | (0.022) |
| Head works in wage only | | | | 0.197*** | 0.191*** | 0.099*** |
| | | | | (0.042) | (0.040) | (0.033) |
| Head works in service only | | | | 0.187*** | 0.192*** | 0.049 |
| | | | | (0.042) | (0.040) | (0.035) |
| Household size | | | | | -0.080*** | -0.102*** |
| | | | | | (0.009) | (0.008) |
| Number of children age 0 to 5 | | | | | -0.068*** | -0.062*** |
| | | | | | (0.021) | (0.017) |
| Household owns a television | | | | | | 0.153*** |
| | | | | | | (0.032) |
| Household owns a motorcycle | | | | | | 0.283*** |
| | | | | | | (0.023) |
| Household owns a refrigerator | | | | | | 0.229*** |
| | | | | | | (0.032) |
| Household owns a washing machine | | | | | | 0.172*** |
| | | | | | | (0.055) |
| Household owns an air conditioner | | | | | | 0.417*** |
| | | | | | | (0.109) |
| Household owns toilet | | | | | | 0.152*** |
| | | | | | | (0.043) |
| Drinking water from own running water or bottled water | | | | | | 0.034 |
| | | | | | | (0.039) |
| Constant | 7.057*** | 7.601*** | 7.849*** | 7.791*** | 8.178*** | 7.926*** |
| | (0.090) | (0.147) | (0.135) | (0.130) | (0.134) | (0.112) |
| Adjusted R2 | 0.334 | 0.494 | 0.548 | 0.559 | 0.600 | 0.710 |
| σ | 0.500 | 0.436 | 0.412 | 0.407 | 0.387 | 0.330 |
| N | 1334 | 1334 | 1334 | 1334 | 1334 | 1334 |

Note: 1. *p<0.1, **p<0.05, ***p<0.01; robust standard errors in parentheses account for
clustering at the primary sampling unit level.
2. Models 2 to 6 control for province dummy variables.
3. All estimates are obtained using cross sectional data.
Table 2.1b: Estimated Parameters of Household Consumption, Indonesia 1997

| | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| Head's age | 0.007*** | 0.007*** | 0.007*** | 0.006*** | 0.007*** | 0.004** |
| | (0.002) | (0.002) | (0.002) | (0.002) | (0.002) | (0.002) |
| Head is female | 0.152*** | 0.142** | 0.154*** | 0.209*** | -0.013 | -0.003 |
| | (0.058) | (0.056) | (0.057) | (0.062) | (0.057) | (0.053) |
| Head's years of schooling | 0.052*** | 0.053*** | 0.052*** | 0.045*** | 0.046*** | 0.026*** |
| | (0.005) | (0.005) | (0.005) | (0.005) | (0.005) | (0.005) |
| Head's birth place is small town | 0.093** | 0.087* | 0.069 | 0.062 | 0.046 | 0.015 |
| | (0.046) | (0.050) | (0.050) | (0.050) | (0.048) | (0.042) |
| Head's birth place is big city | 0.092 | 0.045 | 0.038 | 0.042 | 0.054 | 0.015 |
| | (0.082) | (0.086) | (0.087) | (0.084) | (0.079) | (0.073) |
| Head's birth place is other | -0.076 | -0.091 | -0.114 | -0.072 | -0.392 | -0.460 |
| | (0.424) | (0.432) | (0.433) | (0.449) | (0.397) | (0.422) |
| Urban | | 0.015 | -0.006 | -0.026 | 0.014 | -0.094* |
| | | (0.045) | (0.051) | (0.054) | (0.052) | (0.051) |
| Community rate of electrification | | | 0.002** | 0.002** | 0.003*** | 0.002** |
| | | | (0.001) | (0.001) | (0.001) | (0.001) |
| Community has a primary school | | | 0.077 | 0.058 | 0.093 | 0.099 |
| | | | (0.088) | (0.084) | (0.081) | (0.075) |
| Head is self-employed | | | | 0.312*** | 0.269*** | 0.251*** |
| | | | | (0.084) | (0.073) | (0.063) |
| Head works for the government | | | | 0.475*** | 0.411*** | 0.289*** |
| | | | | (0.103) | (0.095) | (0.084) |
| Head works in the private sector | | | | 0.199** | 0.146* | 0.154** |
| | | | | (0.088) | (0.078) | (0.069) |
| Head is unpaid family worker | | | | 0.476* | 0.450* | 0.382* |
| | | | | (0.280) | (0.263) | (0.218) |
| Household farms | | | | -0.102** | -0.067 | -0.023 |
| | | | | (0.050) | (0.046) | (0.042) |
| Household size | | | | | -0.311*** | -0.345*** |
| | | | | | (0.040) | (0.039) |
| Household size squared | | | | | 0.019*** | 0.021*** |
| | | | | | (0.003) | (0.003) |
| Number of children age 0 to 5 | | | | | -0.101*** | -0.084*** |
| | | | | | (0.025) | (0.023) |
| Log of housing floor space (m2) | | | | | | 0.117*** |
| | | | | | | (0.026) |
| Main drinking water from pipe | | | | | | 0.100** |
| | | | | | | (0.040) |
| Household owns a television | | | | | | 0.188*** |
| | | | | | | (0.031) |
| Constant | 11.642*** | 11.383*** | 11.184*** | 10.960*** | 11.999*** | 11.782*** |
| | (0.123) | (0.154) | (0.178) | (0.208) | (0.208) | (0.312) |
| Adjusted R2 | 0.193 | 0.210 | 0.215 | 0.231 | 0.329 | 0.421 |
| σ | 0.678 | 0.670 | 0.668 | 0.662 | 0.618 | 0.574 |
| N | 1659 | 1659 | 1659 | 1659 | 1659 | 1659 |

Note: 1. *p<0.1, **p<0.05, ***p<0.01; robust standard errors in parentheses account for
clustering at the primary sampling unit level.
2. Models 2 to 6 include dummy variables for provinces, languages spoken at home, religions, and education levels
of head's father. Models 3 to 6 include dummy variables for community road types.
Model 6 includes dummy variables for types of cooking fuel and primary roof materials.
3. All estimates are obtained using cross sectional data.
Table 2.2: Estimated Parameters of Household Consumption Using Actual Panel Data for Different Countries

| | Vietnam 2006-08 | Bosnia-Herzegovina 2001-04 | Lao PDR 2002/03-2007/08 | Nepal 1995/96-2003/04 | Peru 2004-06 |
|---|---|---|---|---|---|
| Age | 0.020*** | 0.010*** | 0.030*** | 0.030*** | 0.012*** |
| | (0.001) | (0.002) | (0.001) | (0.003) | (0.001) |
| Female | 0.042* | 0.233*** | 0.037 | 0.310*** | 0.184*** |
| | (0.022) | (0.035) | (0.065) | (0.065) | (0.026) |
| Years of schooling | 0.048*** | 0.037*** | 0.042*** | 0.065*** | 0.057*** |
| | (0.002) | (0.004) | (0.003) | (0.007) | (0.003) |
| Ethnic majority groups/upper caste | 0.379*** | | 0.145*** | -0.104** | 0.150*** |
| | (0.023) | | (0.025) | (0.049) | (0.023) |
| Bosniac | | -0.123*** | | | |
| | | (0.041) | | | |
| Serb | | -0.088** | | | |
| | | (0.041) | | | |
| Urban | 0.362*** | -0.084*** | 0.131*** | 0.341*** | 0.440*** |
| | (0.022) | (0.026) | (0.027) | (0.078) | (0.023) |
| Constant | 6.939*** | 7.213*** | 10.470*** | 7.586*** | 4.062*** |
| | (0.050) | (0.103) | (0.060) | (0.127) | (0.059) |
| σu | 0.37 | 0.35 | 0.34 | 0.35 | 0.41 |
| σv | 0.29 | 0.40 | 0.42 | 0.43 | 0.35 |
| ρ | 0.62 | 0.43 | 0.40 | 0.39 | 0.58 |
| R2 | 0.37 | 0.07 | 0.15 | 0.27 | 0.40 |
| Number of households | 2728 | 1341 | 2000 | 419 | 2665 |
| Total no of observations | 5456 | 2682 | 3877 | 838 | 4095 |

Note: 1. *p<0.1, **p<0.05, ***p<0.01; robust standard errors in parentheses account for clustering at the individual level.
2. Household heads' ages are restricted to between 25 and 55 in the first round.
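
The ρ row of Table 2.2 appears consistent with the standard random-effects relation ρ = σu² / (σu² + σv²), where σu is the standard deviation of the time-invariant household component and σv that of the idiosyncratic component. This reading is an inference from the reported numbers rather than something the table notes state. A short check in Python, with the values typed in from the table and a hypothetical helper name:

```python
def rho_from_components(sigma_u, sigma_v):
    """Share of the error variance due to the time-invariant household component."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_v ** 2)

# (sigma_u, sigma_v, reported rho) as listed in Table 2.2
table_2_2 = {
    "Vietnam 2006-08": (0.37, 0.29, 0.62),
    "Bosnia-Herzegovina 2001-04": (0.35, 0.40, 0.43),
    "Lao PDR 2002/03-2007/08": (0.34, 0.42, 0.40),
    "Nepal 1995/96-2003/04": (0.35, 0.43, 0.39),
    "Peru 2004-06": (0.41, 0.35, 0.58),
}

for country, (sigma_u, sigma_v, reported) in table_2_2.items():
    implied = rho_from_components(sigma_u, sigma_v)
    print(f"{country}: implied rho = {implied:.3f}, reported rho = {reported:.2f}")
```

Because σu and σv are themselves rounded to two decimals, the implied value can differ from the reported one in the second decimal place, as it does for Nepal.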