Policy Research Working Paper 5066
Do Value-Added Estimates Add Value?
Accounting for Learning Dynamics
Tahir Andrabi
Jishnu Das
Asim Ijaz Khwaja
Tristan Zajonc
The World Bank
Development Research Group
Human Development and Public Services Team
September 2009
Abstract
Evaluations of educational programs commonly assume that what children learn persists over time.
The authors compare learning in Pakistani public and private schools using dynamic panel methods
that account for three key empirical challenges to widely used value-added models: imperfect
persistence, unobserved student heterogeneity, and measurement error. Their estimates suggest that
only a fifth to a half of learning persists between grades and that private schools increase
average achievement by 0.25 standard deviations each year. In contrast, estimates from commonly
used value-added models significantly understate the impact of private schools on student
achievement and/or overstate persistence. These results have implications for program evaluation
and value-added accountability system design.
This paper--a product of the Human Development and Public Services Team, Development Research Group--is part of a
larger effort in the department to expand our knowledge of child learning and test scores as a broad measure of educational
outcomes. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be
contacted at jdas1@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
1 Introduction
Models of learning often assume that children's achievement persists between grades--what
a child learns today largely stays with her tomorrow. Yet recent research highlights that
treatment effects measured by test scores fade rapidly, both in randomized interventions and
observational studies. Jacob, Lefgren and Sims (2008), Kane and Staiger (2008), and Rothstein
(2008) find that teacher effects dissipate by between 50 and 80 percent over one year. The
same pattern holds in several studies of supplemental education programs in developed and
developing countries. Currie and Thomas (1995) document the rapid fade out of Head Start's
impact in the United States, and Glewwe, Ilias and Kremer (2003) and Banerjee et al. (2007)
report on education experiments in Kenya and India where over 70 percent of the one-year
treatment effect is lost after an additional year. Low persistence may in fact be the norm
rather than the exception, and a central feature of learning.
Low persistence has critical implications for commonly used program evaluation strategies
that rest heavily on assumptions about or estimation of persistence. Using primary data on
public and private schools in Pakistan, this paper addresses the challenges to value-added
evaluation strategies posed by 1) imperfect persistence of achievement, 2) heterogeneity in
learning, and 3) measurement error in test scores. We find that ignoring any of these learning
dynamics biases estimates of persistence and can dramatically affect estimates of the value-added
of private schools.
An earlier version of this paper also circulated under the title "Here Today, Gone Tomorrow? Examining
the Extent and Implications of Low Persistence in Child Learning".
tandrabi@pomona.edu, Pomona College. jdas1@worldbank.org, World Bank, Washington DC and Center
for Policy Research, New Delhi; akhwaja@ksg.harvard.edu, Kennedy School of Government, Harvard University,
BREAD, NBER; tzajonc@fas.harvard.edu, Kennedy School of Government, Harvard University. We are grateful
to Alberto Abadie, Chris Avery, David Deming, Pascaline Dupas, Brian Jacob, Dale Jorgenson, Elizabeth King,
Karthik Muralidharan, David McKenzie, Rohini Pande, Lant Pritchett, Jesse Rothstein, Douglas Staiger, Tara
Vishwanath, and seminar participants at Harvard, NEUDC and BREAD for helpful comments on drafts of this
paper. This research was funded by grants from the Poverty and Social Impact Analysis and Knowledge for
Change Program Trust Funds and the South Asia region of the World Bank. The findings, interpretations, and
conclusions expressed here are those of the authors and do not necessarily represent the views of the World
Bank, its Executive Directors, or the governments they represent.
To fix concepts, consider a simple model of learning evolution,
$y_{it} = \alpha T_{it} + \beta y_{i,t-1} + \eta_i + \upsilon_{it}$,
where $y_{it}$ is child achievement measured by test scores in period $t$, $T_{it}$ is the treatment or
program in period $t$ (with effect $\alpha$), and $\eta_i$ is unobserved student ability that speeds
learning each period. We refer to $\beta$, the parameter that links test scores across periods, as persistence.1
The canonical restricted value-added or gain-score model assumes that $\beta = 1$ (for examples,
see Hanushek (2003)). When $\beta < 1$, test scores exhibit mean reversion. Estimates of the
treatment or program effect, $\alpha$, that assume $\beta = 1$ will be biased if the baseline achievement
of the treatment and control groups differs and persistence is imperfect. This has led many
researchers to advocate leaving lagged achievement on the right-hand side. However, doing so is
not entirely straightforward: if estimated by OLS, omitted heterogeneity that speeds learning,
$\eta_i$, will generally bias $\beta$ upward and any measurement error in test scores $y_{i,t-1}$ will
bias $\beta$ downward. Both the estimate of persistence and the treatment effect may remain biased
when estimated by standard methods.
To address these concerns, we introduce techniques from the dynamic panel literature (Arellano
and Honore, 2001; Arellano, 2003) that require three years of data. There are several findings.
First, we find that learning persistence is low: only a fifth to a half of achievement persists
between grades. That is, $\beta$ is between 0.2 and 0.5 rather than closer to 1. These estimates
are remarkably similar to those obtained in the United States (Jacob, Lefgren and Sims, 2008;
Kane and Staiger, 2008; Rothstein, 2008). Second, OLS estimates of $\beta$ are contaminated both
by measurement error in test scores and unobserved student-level heterogeneity in learning.
Ignoring both biases leads to higher persistence estimates between 0.5 and 0.6; correcting only
for measurement error results in estimates between 0.7 and 0.8. For persistence, the upward
bias from omitted heterogeneity outweighs measurement error attenuation.
Third, our estimate of the private schooling effect is highly sensitive to the persistence
parameter. Since private schooling is a school input that is continually applied and leads
to a large baseline gap in achievement, this is expected. Indeed, we find that incorrectly
assuming $\beta = 1$ significantly understates and occasionally yields the wrong sign for private
schools' impact on achievement--providing a compelling example of Lord's paradox (Lord,
1967). The restricted value-added model suggests that private schools contribute no more
than public schools; in contrast, our dynamic panel estimates suggest large and significant
contributions ranging from 0.19 to 0.32 standard deviations a year.2 Notably, the lagged value-
added model estimated by OLS gives similar results for the private school effect as our more
data intensive dynamic panel methods. This is due to the countervailing heterogeneity and
measurement error biases on $\beta$ and because lagged achievement can act as a partial proxy for
omitted heterogeneity in learning.3
1
There are several different uses of the term persistence in the education literature. We refer to persistence as
the fraction of knowledge that persists from one period to the next. The education literature, however, also uses
the term "persistence" to indicate the probability of continuation from grade to grade (as opposed to dropping
out), or to indicate a child's motivation or propensity to complete tasks in the face of adversity.
Finally, towards an economic interpretation of low persistence, we use question-level exam
responses as well as household expenditure and time-use data to explore whether psychome-
tric testing issues, behavioral responses, or forgetting contribute to low persistence--causes
that have different welfare implications. This investigation suggests that measurement error,
mechanical psychometric testing, and behavioral response based explanations are insufficient.
Understanding the behavioral or technological reasons for low persistence remains a critical
issue in the literature.
The value-added of our contribution is several fold. To begin with, we show that the
restricted value-added estimates based on longitudinal data may be worse than the naive cross-
sectional OLS estimates. Second, we demonstrate how unobserved heterogeneity in learning
and measurement error in test scores can bias estimates of persistence. The low persistence we
find implies that long-run extrapolations from short-run impacts are fraught with danger. In
the model above, the long-run impact of continued treatment is $\alpha/(1 - \beta)$; with estimates of
$\beta$ around 0.2 to 0.5, these gains may be much smaller than those obtained by assuming that $\beta$ is
close to 1.4 Third, we find that students in Pakistan's private schools learn significantly more
each year than their public school counterparts but that the popular gain-score or restricted
value-added model would have detected no effect of private schools on learning.5 From a public
finance point of view, these different estimates matter particularly since per pupil expenditures
are lower in private relative to public schools.6 Our results are consistent with growing evidence
that relatively inexpensive, mainstream, private schools hold potential in the developing country
context (Jimenez, Lockheed and Paqueo, 1991; Alderman, Orazem and Paterno, 2001; Angrist
et al., 2002; Alderman, Kim and Orazem, 2003; Tooley and Dixon, 2003; Muralidharan and
Kremer, forthcoming; Andrabi, Das and Khwaja, 2008). Overall, our general results support a
movement towards long-run, experimental evaluations of educational interventions.
A final contribution of our work is that it applies a wider set of econometric tools from the
dynamic panel data literature than have been typically used in the education literature. In
the use of dynamic panel methods, our estimators bear greatest resemblance to those discussed
by Schwartz and Zabel (2005) and Sass (2006). Both use simple dynamic panel estimators,
in the first case using school-level data and in the second using the Arellano and Bond (1991)
differences GMM approach. Santibanez (2006) also uses the Arellano and Bond (1991) estimator
to analyze the impact of teacher quality. Our efforts extend to include system GMM approaches
(Arellano and Bover, 1995) and to address measurement error in test scores and alternative
assumptions regarding omitted heterogeneity. We find both are important.
The rest of the paper is organized as follows: Section 2 presents the basic education pro-
duction function analogy and discusses the specification and estimation of the value-added
approximations to it. Section 3 summarizes our data. Section 4 reports our main results,
several robustness checks, and a preliminary exploration of the economic interpretation of per-
sistence. Section 5 concludes by discussing implications for experimental and non-experimental
program evaluation.
2
Harris and Sass (2006) find that the persistence parameter makes little difference to estimates of teacher
effects, while we find it starkly affects the estimates of school type. This can be explained by the relative gaps
in baseline achievement. It is likely that a child does not continue with the same teacher, or an equally good
teacher, over time. Hence, even if we don't observe children's educational history, two children who currently
have different teachers may have been exposed to similar quality teachers in the past. As such, children
with different teachers often do not differ substantially in their baseline learning levels. In contrast, given that
there is little switching across school types, children currently in different schools differ substantially in baseline
learning levels.
3
This result suggests that correcting for measurement error alone may do more harm than good. For
example, Ladd and Walsh (2002) correct for measurement error in the lagged value-added model of school
effects by instrumenting using double-lagged test scores but don't address potential omitted heterogeneity.
They show this correction significantly changes school rankings and benefits poorly performing districts. Given
that we find unobserved heterogeneity in learning rates, rankings that correct for measurement error may be
poorer than those that do not.
4
For example, Krueger and Whitmore (2001), Angrist et al. (2002), Krueger (2003), and Gordon, Kane and
Staiger (2006) calculate the economic return of various educational interventions by citing research linking test
scores to earnings of young adults (e.g. Murnane, Willett and Levy, 1995; Neal and Johnson, 1996). Although
effects on learning as measured by test-scores may fade, non-cognitive skills that are rewarded in the labor
market could persist. For instance, Currie and Thomas (1995), Deming (2008) and Schweinhart et al. (2005)
provide evidence of long run effects of Head Start and the Perry Preschool Project, even though cognitive gains
largely fade after children enroll in regular classes.
5
An alternative identification strategy to value-added models that we do not pursue here is instrumental
variables. We are exploring such strategies using plausibly exogenous geographical variation in a companion
paper focused on the returns to private school education in Pakistan and competition between public and private
schools. Our emphasis here is on the challenges faced by popular value-added strategies.
6
For details on the costs of private schooling in Pakistan see Andrabi, Das and Khwaja (2008).
7
Researchers generally assume that the model is additively separable across time and that input interactions
can be captured by separable linear interactions. Cunha, Heckman and Schennach (2006) and Cunha and
Heckman (2007) are two exceptions to this pattern, where dynamic complementarity between early and late
investments and between cognitive and non-cognitive skills are permitted.
2 Empirical Learning Framework
The "education production function" approach to learning relates current achievement to all
previous inputs. Boardman and Murnane (1979) and Todd and Wolpin (2003) provide two
accounts of this approach and the assumptions it requires; the following is a brief summary.7
Using notation consistent with the dynamic panel literature, we aggregate all inputs into a
single vector $x_{it}$ and exclude interactions between past and present inputs. Achievement for
child $i$ at time (grade) $t$ is therefore
$$y^*_{it} = \alpha_1' x_{it} + \alpha_2' x_{i,t-1} + \dots + \alpha_t' x_{i1} + \sum_{s=1}^{t} \theta_{t+1-s}\,\mu_{is}, \qquad (1)$$
where $y^*_{it}$ is true achievement, measured without error, and the summed $\mu_{is}$ are cumulative
productivity shocks.8 Estimating (1) is generally impossible because researchers do not observe
the full set of inputs, past and present. The value-added strategy makes estimation feasible by
rewriting (1) to avoid the need for past inputs. Adding and subtracting $\beta y^*_{i,t-1}$, normalizing
$\theta_1$ to unity, and assuming that coefficients decline geometrically ($\alpha_j = \beta\alpha_{j-1}$ and
$\theta_j = \beta\theta_{j-1}$ for all $j$) yields the lagged value-added model
$$y^*_{it} = \alpha_1' x_{it} + \beta y^*_{i,t-1} + \mu_{it}. \qquad (2)$$
The basic idea behind this specification is that lagged achievement will capture the contribution
of all previous inputs and any past unobservable endowments or shocks. As before, we refer to
$\alpha$ as the input coefficient and $\beta$ as the persistence coefficient. Finally, imposing the restriction
that $\beta = 1$ yields the gain-score or restricted value-added model that is often used in the
education literature:
$$y^*_{it} - y^*_{i,t-1} = \alpha_1' x_{it} + \mu_{it}.$$
This model asserts that past achievement contains no information about future gains, or equivalently,
that an input's effect on any subsequent level of achievement does not depend on how
long ago it was applied. As we will see from our results, the assumption that $\beta = 1$ is clearly
violated in the data and, it increasingly appears, in the literature as well. As a result, we will
focus primarily on estimating (2).
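The geometric-decay assumption behind (2) can be made concrete with a small numerical sketch. It is illustrative only and not part of the paper's analysis: the function names and the values $\alpha = 0.25$ and $\beta \in \{0.2, 0.5, 1\}$ are assumptions chosen for the example. Under (2), an input applied $k$ periods ago contributes $\alpha\beta^k$ to current achievement, so continued treatment converges to the long-run impact $\alpha/(1-\beta)$ mentioned in the introduction, while $\beta = 1$ reproduces the gain-score case in which nothing fades.

```python
def one_time_effect(alpha: float, beta: float, periods_since: int) -> float:
    """Remaining effect of a single input applied `periods_since` periods ago."""
    return alpha * beta ** periods_since


def cumulative_effect(alpha: float, beta: float, periods: int) -> float:
    """Effect on current achievement of an input applied in each of `periods` periods."""
    return sum(one_time_effect(alpha, beta, k) for k in range(periods))


def long_run_effect(alpha: float, beta: float) -> float:
    """Limit of cumulative_effect as periods grows; finite only when |beta| < 1."""
    if abs(beta) >= 1:
        raise ValueError("long-run effect is unbounded when |beta| >= 1")
    return alpha / (1 - beta)


if __name__ == "__main__":
    # Hypothetical yearly effect of 0.25 s.d. under three persistence values.
    for beta in (0.2, 0.5, 1.0):
        path = [round(cumulative_effect(0.25, beta, t), 3) for t in (1, 2, 5)]
        print(f"beta={beta}: cumulative effect after 1, 2, 5 years = {path}")
```

With imperfect persistence the cumulative effect flattens quickly, which is why extrapolating a one-year gain linearly over several years can overstate the long-run impact.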
There are two potential problems with estimating (2). First, the error term $\mu_{it}$ could include
individual (child-level) heterogeneity in learning (e.g., $\mu_{it} \equiv \eta_i + \upsilon_{it}$). Lagged achievement only
captures individual heterogeneity if it enters through a one-time process or endowment, but
talented children may also learn faster. Since this unobserved heterogeneity enters in each
period, $\mathrm{Cov}(y^*_{i,t-1}, \mu_{it}) > 0$ and $\beta$ will be biased upwards.
The second likely problem is that test scores are inherently a noisy measure of latent achievement.
Letting $y_{it} = y^*_{it} + \varepsilon_{it}$ denote observed achievement, we can rewrite the latent lagged
value-added model (2) in terms of observables. The full error term now includes measurement
error, $\mu_{it} + \varepsilon_{it} - \beta\varepsilon_{i,t-1}$.
8
This starting point is more restrictive than the more general starting framework presented by Todd and
Wolpin (2003). In particular, it assumes an input applied in first grade has the same effect on first grade scores
as an input applied in second grade has on second grade scores.
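Dropping all the inputs to focus solely on the persistence coefficient, the expected bias due
to both of these sources is
$$\mathrm{plim}\, \hat\beta_{OLS} = \beta + \frac{\mathrm{Cov}(\eta_i, y^*_{i,t-1})}{\sigma^2_{y^*} + \sigma^2_{\varepsilon}} - \frac{\sigma^2_{\varepsilon}}{\sigma^2_{y^*} + \sigma^2_{\varepsilon}}\,\beta. \qquad (3)$$
The coefficient is biased upward by learning heterogeneity and downward by measurement error.
These effects only cancel exactly when $\mathrm{Cov}(\eta_i, y^*_{i,t-1}) = \beta\sigma^2_{\varepsilon}$ (Arellano, 2003).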
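Expression (3) can be checked with a short simulation. The sketch below is illustrative and not the paper's code: the data-generating process, the function name, and all parameter values (persistence 0.4, standard deviations 0.3, 0.5 and 0.3 for heterogeneity, shocks and measurement error) are assumptions. Latent achievement is started at its stationary distribution so that the covariance between $\eta_i$ and lagged achievement is well defined.

```python
import numpy as np


def ols_persistence_bias(beta=0.4, sd_eta=0.3, sd_ups=0.5, sd_eps=0.3,
                         n=200_000, seed=0):
    """Return (OLS slope of y_t on y_{t-1}, value predicted by expression (3))."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sd_eta, n)  # child-level learning heterogeneity
    # Latent lagged achievement drawn from its stationary distribution given eta.
    y_star_lag = eta / (1 - beta) + rng.normal(0.0, sd_ups / np.sqrt(1 - beta**2), n)
    y_star = beta * y_star_lag + eta + rng.normal(0.0, sd_ups, n)
    # Observed scores add independent measurement error in each period.
    eps_lag = rng.normal(0.0, sd_eps, n)
    y_lag = y_star_lag + eps_lag
    y = y_star + rng.normal(0.0, sd_eps, n)

    var_obs = np.var(y_lag, ddof=1)  # sigma^2_{y*} + sigma^2_eps in the sample
    ols = np.cov(y, y_lag)[0, 1] / var_obs
    predicted = (beta
                 + np.cov(eta, y_star_lag)[0, 1] / var_obs
                 - beta * np.var(eps_lag, ddof=1) / var_obs)
    return ols, predicted


if __name__ == "__main__":
    ols, predicted = ols_persistence_bias()
    print(f"OLS estimate: {ols:.3f}, expression (3): {predicted:.3f}, true beta: 0.4")
```

Under these assumed values the heterogeneity term dominates the attenuation term, so OLS overstates persistence; shrinking `sd_eta` or raising `sd_eps` reverses the sign of the net bias.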
Furthermore, bias in the persistence coefficient leads to bias in the input coefficients, $\alpha$. To
see this, consider imposing a biased $\hat\beta$ and estimating the resulting model
$$y_{it} - \hat\beta y_{i,t-1} = \alpha' x_{it} + [(\beta - \hat\beta) y_{i,t-1} + \mu_{it} + \varepsilon_{it} - \beta\varepsilon_{i,t-1}].$$
The error term now includes $(\beta - \hat\beta) y_{i,t-1}$. Since inputs and lagged achievement are generally
positively correlated, the input coefficient will, in general, be biased downward if $\hat\beta > \beta$. The
precise bias, however, depends on the degree of serial correlation of inputs and on the potential
correlation between inputs and learning heterogeneity that remains in $\mu_{it}$.
This is more clearly illustrated in the case of the restricted value-added model (assuming
that $\beta = 1$) where:
$$\mathrm{plim}\, \hat\alpha_{OLS} = \alpha - (1 - \beta)\frac{\mathrm{Cov}(x_{it}, y_{i,t-1})}{\mathrm{Var}(x_{it})} + \frac{\mathrm{Cov}(x_{it}, \eta_i)}{\mathrm{Var}(x_{it})}. \qquad (4)$$
Therefore, if indeed there is perfect persistence as assumed and if inputs are uncorrelated with
$\eta_i$, OLS yields consistent estimates of the parameters $\alpha$. However, if $\beta < 1$, OLS estimation
of $\alpha$ now results in two competing biases. By assuming an incorrect persistence coefficient we
leave a portion of past achievement in the error term. This misspecification biases the input
coefficient downward by the first term in (4). The second term captures possible correlation
between current inputs and omitted learning heterogeneity. If there is none, then the second
term is zero, and the bias will be unambiguously negative.
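A stylized simulation shows how (4) can drive a gain-score estimate toward zero. The sketch is illustrative rather than a reproduction of the paper's estimates: the binary input, the baseline gap of 0.4 standard deviations, the true effect of 0.25, and the function and parameter names are all assumptions. For a binary input, $\mathrm{Cov}(x_{it}, y_{i,t-1})/\mathrm{Var}(x_{it})$ is simply the baseline gap between the two groups.

```python
import numpy as np


def gain_score_bias(alpha=0.25, beta=0.4, gap=0.4, eta_shift=0.0,
                    n=400_000, seed=0):
    """Return (gain-score OLS estimate of alpha, value predicted by expression (4))."""
    rng = np.random.default_rng(seed)
    x = (rng.random(n) < 0.5).astype(float)        # 1 = treated, e.g. private school
    eta = eta_shift * x + rng.normal(0.0, 0.3, n)  # heterogeneity, optionally tied to x
    y_lag = gap * x + rng.normal(0.0, 1.0, n)      # treated children start ahead by `gap`
    y = alpha * x + beta * y_lag + eta + rng.normal(0.0, 0.5, n)

    def slope(dep, reg):
        """OLS slope from regressing `dep` on `reg` with an intercept."""
        return np.cov(dep, reg)[0, 1] / np.var(reg, ddof=1)

    estimate = slope(y - y_lag, x)                 # restricted model imposes beta = 1
    predicted = alpha - (1 - beta) * slope(y_lag, x) + slope(eta, x)
    return estimate, predicted


if __name__ == "__main__":
    estimate, predicted = gain_score_bias()
    print(f"gain-score estimate: {estimate:.3f}, expression (4): {predicted:.3f}, true alpha: 0.25")
```

With these assumed values the first bias term is roughly $-(1 - 0.4) \times 0.4 = -0.24$, which nearly cancels the true effect; setting `eta_shift` above zero adds the offsetting second term.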
2.1 Addressing Child-Level Heterogeneity: Dynamic Panel Approaches
to the Education Production Function
Dynamic panel approaches can address omitted child-level heterogeneity in value-added
approximations of the education production function. We interpret the value-added model (2) as
an autoregressive dynamic panel model with unobserved student-level effects:
$$y^*_{it} = \alpha' x_{it} + \beta y^*_{i,t-1} + \mu_{it}, \qquad (5)$$
$$\mu_{it} \equiv \eta_i + \upsilon_{it}. \qquad (6)$$
Identification of $\alpha$ and $\beta$ is achieved by imposing appropriate moment conditions. Following
Arellano and Bond (1991) and Arellano and Bover (1995), we focus on linear moment conditions
and split our analysis into three groups: "differences" GMM, "differences and levels" GMM,
and "levels only" GMM, which respectively refer to whether the estimates are based on a
differenced equation (see equation (7) below), both the differenced and levels equations, or the
undifferenced "levels" equation (5).
The section below provides a brief overview of the estimators we explore. Table 1 summarizes
the estimators, including the standard static value-added estimators (M1-M4) and dynamic
panel estimators (M5-M10). For more complete descriptions, Arellano and Honore (2001) and
Arellano (2003) provide excellent reviews of these and other panel models.
2.1.1 Differences GMM: Switching estimators
As noted previously, the value-added model differences out omitted endowments that might
be correlated with the inputs. It does not, however, difference out heterogeneity that speeds
learning. To accomplish this, the basic intuition behind the Arellano and Bond (1991) difference
GMM estimator is to difference again. Differencing the dynamic panel specification of the lagged
value-added model (5) yields
$$y^*_{it} - y^*_{i,t-1} = \alpha'(x_{it} - x_{i,t-1}) + \beta(y^*_{i,t-1} - y^*_{i,t-2}) + [\upsilon_{it} - \upsilon_{i,t-1}]. \qquad (7)$$
Here, the differenced model eliminates the unobserved fixed effect $\eta_i$. However, (7) cannot be
estimated by OLS because $y^*_{i,t-1}$ is correlated by construction with $\upsilon_{i,t-1}$ in the error term.
Arellano and Bond (1991) propose instrumenting for $y^*_{i,t-1} - y^*_{i,t-2}$ using lags two periods and
beyond, such as $y^*_{i,t-2}$, or certain inputs, depending on the exogeneity conditions. These lags
are uncorrelated with the error term but are correlated with the change in lagged achievement,
provided $\beta < 1$. The input coefficient, in our case the added contribution of private schools, is
primarily identified from the set of children who switch schools in the observation period.
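The logic of instrumenting the differenced equation can be sketched with three waves of simulated data. This is a deliberately simplified illustration, not the estimator used in the paper: it drops inputs and measurement error and uses a single instrument, which makes it a just-identified Anderson-Hsiao-style IV estimator (the simplest member of the difference GMM family) rather than the full Arellano and Bond (1991) GMM. The data-generating process and all names and parameter values are assumptions.

```python
import numpy as np


def simulate_panel(beta=0.4, sd_eta=0.3, sd_ups=0.5, n=400_000, seed=0):
    """Three waves of latent scores with child-level learning heterogeneity eta."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sd_eta, n)
    y1 = eta / (1 - beta) + rng.normal(0.0, sd_ups / np.sqrt(1 - beta**2), n)
    y2 = beta * y1 + eta + rng.normal(0.0, sd_ups, n)
    y3 = beta * y2 + eta + rng.normal(0.0, sd_ups, n)
    return y1, y2, y3


def cov(a, b):
    """Sample covariance between two arrays."""
    return np.cov(a, b)[0, 1]


def ols_levels(y1, y2, y3):
    """Lagged value-added model by OLS: y3 on y2, ignoring eta."""
    return cov(y3, y2) / cov(y2, y2)


def iv_differences(y1, y2, y3):
    """Differenced equation (y3 - y2) on (y2 - y1), instrumented by the level y1."""
    return cov(y3 - y2, y1) / cov(y2 - y1, y1)


if __name__ == "__main__":
    waves = simulate_panel()
    print(f"OLS in levels:     {ols_levels(*waves):.3f}")
    print(f"IV in differences: {iv_differences(*waves):.3f}  (true beta = 0.4)")
```

Differencing removes `eta`, and the twice-lagged level `y1` is uncorrelated with the differenced shock, so the IV estimate should sit close to the assumed persistence while OLS in levels is pulled upward.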
The implementation of the difference GMM approach depends on the precise assumptions
about inputs. We consider two candidate assumptions: strictly exogenous inputs (M5) and
predetermined inputs (M6). Strict exogeneity assumes past disturbances do not affect current
and future inputs, ruling out feedback effects. In the educational context, this is a strong
assumption. A child who experiences a positive or negative shock may adjust inputs in response.
In our case, a shock may cause a child to switch schools.
To account for this possibility, we also consider the weaker case where inputs are prede-
termined but not strictly exogenous. Specifically, the predetermined inputs case assumes that
inputs are uncorrelated with present and future disturbances but are potentially correlated with
past disturbances. This case also assumes lagged achievement is uncorrelated with present and
future disturbances. Compared to strict exogeneity, this approach uses only lagged inputs as
instruments. Switching schools is instrumented by the original school type, allowing switches
to depend on previous shocks. This estimator remains consistent if a child switches school at
the same time as an achievement shock but still rules out parents anticipating and adjusting
to future expected shocks.
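The two input assumptions can be summarized by the instruments each makes available for the differenced equation (7). The display below is a sketch in the standard Arellano and Bond (1991) form rather than a transcription from the paper; the index $s$ and the three-wave illustration that follows are notation introduced here.

```latex
% Lagged achievement (both cases): levels dated t-2 or earlier
E\left[\, y^*_{is}\,(\upsilon_{it} - \upsilon_{i,t-1}) \,\right] = 0, \qquad s \le t-2 .

% Strictly exogenous inputs (M5): inputs from every period are valid instruments
E\left[\, x_{is}\,(\upsilon_{it} - \upsilon_{i,t-1}) \,\right] = 0, \qquad s = 1, \dots, T .

% Predetermined inputs (M6): only lagged inputs are valid instruments
E\left[\, x_{is}\,(\upsilon_{it} - \upsilon_{i,t-1}) \,\right] = 0, \qquad s \le t-1 .
```

With three waves ($T = 3$), only the $t = 3$ differenced equation has usable instruments. Under predetermined inputs these would be $y_{i1}$, $x_{i1}$ and $x_{i2}$, so a school switch is instrumented by earlier school type; strict exogeneity adds $x_{i3}$.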
2.1.2 Levels and differences GMM: Uncorrelated or constantly correlated effects
One difficulty with the differences GMM approach (M5 and M6) is that time-invariant inputs
drop out of the estimated equation and their effects are therefore not identified. In our case, this
means that the identification of the private school effect is based on the five percent of children
who switch between public and private schools. We address the limited time-series variation
using the levels and differences GMM framework proposed by Arellano and Bover (1995) and
extended by Blundell and Bond (1998). Levels and differences GMM estimates a system of
equations, one for the undifferenced levels equation (5) and another for the differenced equation
(7). Further assumptions regarding the correlation between inputs and heterogeneity (though
not necessarily between heterogeneity and lagged achievement) yield additional instruments.
We first consider predetermined inputs that have a constant correlation with the individual
effects (M7). While inputs may be correlated with the omitted effects, constant correlation
implies switching is not. The constant correlation assumption implies that $\Delta x_{it}$ are available
as instruments in the levels equation (Arellano and Bover, 1995). In the context of estimating
school type, this estimator can be viewed as a levels and differences switching estimator since
it relies on children switching school types in both the levels and differences equations. In
practice, we often must assume that any time-invariant inputs are uncorrelated with the fixed
effect or the levels equation, which includes the time-invariant inputs, is not fully identified.
A second possibility is that inputs are predetermined but are also uncorrelated with the
omitted effects (M8). This allows using inputs $x_i^t$ as instruments in the levels model (5). The
required assumption is fairly strong; it is natural to believe that inputs are correlated with
the omitted effect. Certainly, the decision to attend private school may be correlated with
the child's ability to learn. At the same time, the assumption is weaker than OLS estimation
of the lagged value-added model since the model (M8) allows for the omitted fixed effect to be
correlated with lagged achievement.
2.1.3 Levels GMM: Conditional mean stationarity
In some instances, it may be reasonable to assume that, while learning heterogeneity exists,
it does not affect achievement gains. A talented child may be so far ahead that imperfect
persistence cancels the benefit of faster learning. That is, individual heterogeneity may be
uncorrelated with gains, $y^*_{it} - y^*_{i,t-1}$, but not necessarily with learning, $y^*_{it} - \beta y^*_{i,t-1}$.
This situation arises when the initial conditions have reached a convergent level with respect to
the fixed effect such that
$$y^*_{i1} = \frac{\eta_i}{1 - \beta} + d_i, \qquad (8)$$
where $t = 1$ is the first observed period and not the first period in the learning life-cycle.
Blundell and Bond (1998) discuss this type of conditional mean stationarity restriction in considerable
depth. As they point out, the key assumption is that initial deviations, $d_i$, are uncorrelated
with the level of $\eta_i/(1 - \beta)$. It does not imply that the achievement path,
$\{y^*_{i1}, y^*_{i2}, \dots, y^*_{iT}\}$, is stationary; inputs, including time dummies, continue to spur
achievement and can be nonstationary. The assumption only requires that, conditional on the full
set of controls and common time dummies, the individual effect does not influence achievement gains.
While this assumption seems too strong in the context of education, we discuss it because
the dynamic panel literature has documented large downward biases of other estimators when
the instruments are weak (e.g. Blundell and Bond, 1998). This occurs when persistence is
perfect ($\beta = 1$) since the lagged value-added model then exhibits a unit root and lagged test
scores become weak instruments in the differenced model. The conditional mean stationarity
assumption provides an additional $T - 2$ non-redundant moment conditions that can augment
the system GMM estimators. While a fully efficient approach uses these additional moments
along with typical moments in the differenced equation, the conditional mean stationarity
assumption ensures strong instruments in the levels equation to identify $\beta$. Thus, if we prefer
simplicity over efficiency, we can estimate the model using levels GMM or 2SLS and avoid the
need to use a system estimator. In this simpler approach, we instrument the undifferenced
value-added model (5) using lagged changes in achievement, $\Delta y_i^{t-1}$, and either changes in
inputs, $\Delta x_i^t$, or inputs directly, $x_i^t$, depending on whether inputs are constantly correlated (M9)
or are uncorrelated with the individual effect (M10).
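The role of the initial condition (8) in the levels approach can be illustrated with a simulation. The sketch is illustrative and not the paper's implementation: it omits inputs and measurement error, uses a single lagged change as the instrument, and the `start_shift` parameter is a hypothetical device for breaking the convergent initial condition by making $d_i$ depend on $\eta_i$. All parameter values are assumptions.

```python
import numpy as np


def simulate_panel(beta=0.4, sd_eta=0.3, sd_ups=0.5, start_shift=0.0,
                   n=400_000, seed=0):
    """Three waves of latent scores; start_shift=0 imposes initial condition (8)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sd_eta, n)
    # d_i = start_shift * eta / (1 - beta) + noise, so d_i is uncorrelated with
    # eta only when start_shift is zero.
    y1 = ((1 + start_shift) * eta / (1 - beta)
          + rng.normal(0.0, sd_ups / np.sqrt(1 - beta**2), n))
    y2 = beta * y1 + eta + rng.normal(0.0, sd_ups, n)
    y3 = beta * y2 + eta + rng.normal(0.0, sd_ups, n)
    return y1, y2, y3


def cov(a, b):
    """Sample covariance between two arrays."""
    return np.cov(a, b)[0, 1]


def iv_levels(y1, y2, y3):
    """Levels equation y3 on y2, instrumented by the lagged change (y2 - y1)."""
    return cov(y3, y2 - y1) / cov(y2, y2 - y1)


if __name__ == "__main__":
    print(f"stationary start:     {iv_levels(*simulate_panel()):.3f}  (true beta = 0.4)")
    print(f"non-stationary start: {iv_levels(*simulate_panel(start_shift=-1.0)):.3f}")
```

When the initial condition holds, lagged gains are uncorrelated with `eta` and the levels IV estimate is close to the assumed persistence; when initial achievement does not yet reflect `eta` (`start_shift=-1.0`), lagged gains pick up the heterogeneity and the estimate is biased upward.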
2.2 Addressing Measurement Error in Test Scores
Measurement error in test scores is a central feature of educational program evaluation. Ladd
and Walsh (2002), Kane and Staiger (2002), and Chay, McEwan and Urquiola (2005) all docu-
ment how test-score measurement error can pose difficulties for program evaluation and value-
added accountability systems. In the context of value-added estimation, measurement error
attenuates the coefficient on lagged achievement and can bias the input coefficient in the process.
Dynamic panel estimators do not address measurement error on their own. For instance,
if we replace true achievement with observed achievement in the standard Arellano and Bond
(1991) setup, (7) becomes
$$\Delta y_{it} = \alpha' \Delta x_{it} + \beta \Delta y_{i,t-1} + [\Delta\upsilon_{it} + \Delta\varepsilon_{i,t} - \beta\Delta\varepsilon_{i,t-1}]. \qquad (9)$$
The standard potential instrument, $y_{i,t-2}$, is uncorrelated with $\Delta\upsilon_{it}$ but is correlated with
$\Delta\varepsilon_{i,t-1} = \varepsilon_{i,t-1} - \varepsilon_{i,t-2}$ by construction.
The easiest solution is to use either three-period lagged test scores or alternate subjects as
instruments. In the dynamic panel models discussed above, correcting for measurement error
using additional lags requires four years of data for each child--a difficult requirement in most
longitudinal datasets, including ours. We therefore use alternate subjects, although doing so
does not address the possibility of correlated measurement error across subjects.
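The alternate-subject strategy can be sketched in a simulation of (9) without inputs. This is an illustration under stated assumptions, not the paper's procedure: two hypothetical subjects (labelled math and Urdu here) share the same learning heterogeneity, their latent shocks have correlation `rho`, and measurement errors are independent across subjects and periods. As the text notes, the approach would fail if measurement error were correlated across subjects. All names and parameter values are assumptions.

```python
import numpy as np


def diff_iv_with_measurement_error(beta=0.4, rho=0.8, sd_eta=0.3, sd_ups=0.5,
                                   sd_eps=0.3, n=400_000, seed=0):
    """Return (IV using own twice-lagged score, IV using other subject's score)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sd_eta, n)
    s = sd_ups / np.sqrt(1 - beta**2)  # stationary s.d. of the shock component
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    # Wave-1 latent scores in two subjects, correlated through eta and shocks.
    m1_star = eta / (1 - beta) + s * z1
    u1_star = eta / (1 - beta) + s * (rho * z1 + np.sqrt(1 - rho**2) * z2)
    m2_star = beta * m1_star + eta + rng.normal(0.0, sd_ups, n)
    m3_star = beta * m2_star + eta + rng.normal(0.0, sd_ups, n)
    # Observed scores: latent achievement plus independent measurement error.
    m1 = m1_star + rng.normal(0.0, sd_eps, n)
    m2 = m2_star + rng.normal(0.0, sd_eps, n)
    m3 = m3_star + rng.normal(0.0, sd_eps, n)
    u1 = u1_star + rng.normal(0.0, sd_eps, n)

    def iv(instrument):
        """Differenced equation (m3 - m2) on (m2 - m1) with one instrument."""
        return (np.cov(m3 - m2, instrument)[0, 1]
                / np.cov(m2 - m1, instrument)[0, 1])

    return iv(m1), iv(u1)


if __name__ == "__main__":
    own, alternate = diff_iv_with_measurement_error()
    print(f"own-subject instrument:       {own:.3f}")
    print(f"alternate-subject instrument: {alternate:.3f}  (true beta = 0.4)")
```

The own-subject instrument shares the wave-1 measurement error with the differenced regressor and is attenuated, whereas the other subject's wave-1 score is correlated with latent achievement but not with that error.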
An alternative to instrumental variables strategies is to correct for measurement error an-
alytically using the standard error of each test score, available from Item Response Theory.9
Because the standard error is heteroscedastic--tests discriminate poorly between children at
the tails of the ability distribution--one can gain efficiency by using the heteroscedastic errors-
in-variables (HEIV) procedure outlined in Sullivan (2001) and followed by Jacob and Lefgren
(2005), among others. Appendix A provides a detailed explanation of this analytical correction.
While this correction is easy to apply in an OLS model, it becomes considerably more compli-
cated in the dynamic panel context, and we therefore use an instrumental variable strategy for
most of our estimators.
9
Item Response Theory provides the standard error for each score from the inverse Fisher information matrix
after ML estimation of the IRT model. This standard error is reported in many educational datasets.
3 Data
To demonstrate these issues, we use data collected by the authors as part of the Learning and
Educational Achievement in Punjab Schools (LEAPS) project, an ongoing survey of learning
in Pakistan. The sample comprises 112 villages in 3 districts of Punjab: Attock, Faisalabad,
and Rahim Yar Khan. Because the project was envisioned in part to study the dramatic rise of
private schools in Pakistan, the 112 villages in these districts were chosen randomly from the
list of all villages with an existing private school. As would be expected given the presence of
a private school, the sample villages are generally larger, wealthier, and more educated than
the average rural village. Nevertheless, at the time of the survey, more than 50 percent of the
province's population resided in such villages (Andrabi, Das and Khwaja, 2006).
The survey covers all schools within the sample village boundaries and within a short walk
of any village household. Including schools that opened and closed over the three rounds, 858
schools were surveyed, while three refused to cooperate. Sample schools account for over 90
percent of enrollment in the sample villages.
The first panel of children consists of 13,735 third-graders, 12,110 of which were tested
in Urdu, English, and mathematics. These children were subsequently followed for two years
and retested in each period. Every effort was made to track children across rounds, even
when they were not promoted. In total, 12 percent and 13 percent of children dropped out
or were lost between rounds one and two, and two and three, respectively.10 In addition to
being tested, 6,379 children--up to ten in each school--were randomly administered a survey
including anthropometrics (height and weight) and detailed family characteristics such as parental
education and wealth, as measured by principal components analysis of 20 assets.
When exploring the economic interpretation of persistence, we also use a small subsample of
approximately 650 children that can be matched to a detailed household survey that includes,
among other things, child and parental time use and educational spending.
For our analysis, we use two subsamples of the data: all children who were tested in all
three years (N=8120) and children who were tested and given a detailed child survey in all
three years (N=4031). Table 2 presents the characteristics of these children split by whether
they attend public or private schools. The patterns across each subsample are relatively stable.
Children attending private schools are slightly younger, have fewer elder siblings, and come
from wealthier and more educated households. Years of schooling, which largely captures grade
retention, is lower in private schools. Children in private schools are also less likely to have
a father living at home, perhaps due to a migration or remittance effect on private school
attendance.
The measures of achievement are based on exams in English, Urdu (the vernacular), and
mathematics. The tests were relatively long (over 40 questions per subject) and were designed
to maximize the precision over a range of abilities in each grade. While a fraction of questions
10
Attrition in private schools is 2 percentage points higher than in public schools. Children who drop out
between rounds one and two have scores roughly 0.2 s.d. lower than children that don't. Controlling for school
type and drop out status, drop outs in private schools are slightly better (0.05 sd) than children in public
schools, although the difference is only statistically significant for math. Given the small relative differences
in attrition between public and private schools, additional corrections for attrition are unlikely to significantly
affect our results.
changed over the years, the content covered remained consistent, and a significant portion
of questions appeared across all years. To avoid the possibility of cheating, the tests were
administered directly by our project staff and not by classroom teachers. The tests were scored
and equated across years by the authors using Item Response Theory so that the scale has
cardinal meaning. Preserving cardinality is important for longitudinal analysis since many other
transformations, such as the percent correct score or percentile rank, are bounded artificially
by the transformations that describe them. By comparison, IRT scores attempt to ensure that
change in one part of the distribution is equal to a change in another, in terms of the latent
trait captured by the test. Children were tested in third, fourth, and fifth grades during the
winter at roughly one year intervals. Because the school year ends in the early spring, the test
score gains from third to fourth grade are largely attributable to the fourth grade school.
4 Results
4.1 Cross-sectional and Graphical Results
Before presenting our estimates of learning persistence and the implied private school effect,
we provide some rough evidence for a significant private school effect using cross-sectional and
graphical evidence. These results do not take advantage of the more sophisticated specifications
above but nevertheless provide initial evidence that the value-added of private schools is large
and significant.
4.1.1 Baseline estimates from cross-section data
Table 3 presents results for a cross-section regression of third grade achievement on child,
household, and school characteristics. These regressions provide some initial evidence that the
public-private gap is more than omitted variables and selection. Adding a comprehensive set
of child and family controls reduces the estimated coefficient on private schools only slightly.
Adding village fixed effects also does not change the coefficient, even though the $R^2$ increases
substantially. Across all baseline specifications, the gap remains large: over 0.9 standard devia-
tions in English, 0.5 standard deviations in Urdu, and 0.4 standard deviations in mathematics.
Besides the coefficient on school type, few controls are strongly associated with achievement.
By far the largest other effect is for females, who outperform their male peers in English and
Urdu. However, even for Urdu, where the female effect is largest, the private school effect is still
nearly three times as large. Height, assets, and whether the father (and for Column 3, mother)
is educated past elementary school also enter the regression as positive and significant. More
elder brothers and more years of schooling (i.e. being previously retained) correlate with lower
achievement. Children with a mother living at home perform worse although this result is driven
by an abnormal subpopulation of two percent of children with absent mothers. Overall, these
results confirm mild positive selection into private schools but also suggest that controlling
for a host of other observables typically not available in other datasets (such as child height and
household assets) does not significantly alter the size of the private schooling coefficient.
4.1.2 Graphical evidence
Figure 1 plots learning levels in the tested subjects (English, mathematics, and the vernacular,
Urdu) over three years. While levels are always higher for children in private schools, there
is little difference in learning gains (the gradient) between public and private schools. This
illustrates why a specification that uses learning gains (i.e., assumes perfect persistence) would
conclude that private schools add no greater value to learning than their public counterparts.
Many of the dynamic panel estimators that we explore identify the private school effect
using children who switch schools. Figure 2 illustrates the patterns of achievement for these
children. For each subject we plot two panels: the first containing children who start in public
school and the second containing those who start in private school. We then graph achievement
patterns for children who never switch, switch after third grade, and switch after fourth grade.
For simplicity, we exclude children who switch back and forth between school types.
As the table at the bottom of the figure shows, very few children change schools. Only
48 children move from public to private schools in fourth grade, while 40 move in fifth grade.
Consistent with the role of private schools serving primarily younger children, 167 children
switch to public schools in fourth grade, and 160 switch in fifth grade. These numbers are
roughly double the number of children available for our estimates that include controls, since
only a random subset of children were surveyed regarding their family characteristics.
Even given the small number of children switching school types, Figure 2 provides prelimi-
nary evidence that the private school effect is not simply a cross-sectional phenomenon. In all
three subjects, children who switch to private schools between third and fourth grade experi-
ence large achievement gains. Children switching from private schools to public schools exhibit
similar achievement patterns, except reversed. Moving to a public school is associated with
slower learning or even learning losses. Most gains or losses occur immediately after moving;
once achievement converges to the new level, children experience parallel growth in public and
private schools.
4.2 OLS and Dynamic Panel Value-Added Estimates
Tables 4 (English), 5 (Urdu), and 6 (mathematics) summarize our main value-added results.
All estimates include the full set of controls in the child survey sample, the survey date, round
(grade) dummies, and village fixed effects. For brevity, we only report the persistence and
private school coefficients.11 We group the discussion of our results in three domains: estimates
of the persistence coefficient, estimates of the private schooling coefficient, and regression diag-
nostics.
4.2.1 The persistence parameter
We immediately reject the hypothesis of perfect persistence ($\beta = 1$). Across all specifications
and all subjects (except M1, which imposes $\beta = 1$), the estimated persistence coefficient is
significantly lower than one, even in the specifications that correct for measurement error only
and should be biased upward (M3 and M4). The typical lagged value-added model (M2),
which assumes no omitted student heterogeneity and no measurement error, returns estimates
between 0.52 and 0.58 for the persistence coefficient. Correcting only for measurement error
by instrumenting using the two alternate subjects (M3), or using the analytical correction
described in the appendix (M4), increases the persistence coefficient to between 0.70 and 0.79,
consistent with significant measurement error attenuation. This estimate, however, remains
biased upward by omitted heterogeneity.
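
The attenuation logic behind M2 and M3 can be illustrated with a small simulation. The sketch below is not the estimation code used for Tables 4 to 6: it relies on synthetic scores, a hypothetical helper `two_stage_least_squares`, and assumed parameter values (true persistence of 0.4, measurement error standard deviation of 0.6). It isolates the measurement error channel only. The simulated data contain no omitted heterogeneity, so the instrumented estimate here recovers the assumed persistence instead of remaining biased upward.

```python
import numpy as np

def two_stage_least_squares(y, endog, instruments):
    """2SLS slope on one endogenous regressor (a constant is added to both stages)."""
    const = np.ones(len(y))
    Z = np.column_stack([const, instruments])
    first_stage, *_ = np.linalg.lstsq(Z, endog, rcond=None)
    endog_hat = Z @ first_stage                      # first-stage fitted values
    X_hat = np.column_stack([const, endog_hat])
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(2)
n, beta, noise_sd = 20_000, 0.4, 0.6                 # assumed values, not estimates

# True lagged achievement in three subjects shares a common skill component.
skill = rng.normal(size=n)
true_lag = {s: skill + 0.5 * rng.normal(size=n) for s in ("english", "urdu", "math")}
# Observed scores add independent measurement error in each subject.
obs_lag = {s: v + noise_sd * rng.normal(size=n) for s, v in true_lag.items()}
english_now = (beta * true_lag["english"] + 0.5 * rng.normal(size=n)
               + noise_sd * rng.normal(size=n))

# M2-style OLS on the noisy lagged score: attenuated toward zero.
X = np.column_stack([np.ones(n), obs_lag["english"]])
ols_beta = np.linalg.lstsq(X, english_now, rcond=None)[0][1]

# M3-style IV: instrument lagged English with lagged Urdu and mathematics.
instruments = np.column_stack([obs_lag["urdu"], obs_lag["math"]])
iv_beta = two_stage_least_squares(english_now, obs_lag["english"], instruments)

print(f"OLS: {ols_beta:.2f}  IV: {iv_beta:.2f}  assumed: {beta}")
```

In this setup the OLS coefficient falls below the assumed persistence, while the instrumented coefficient lands close to it, because the measurement errors in the alternate subjects are independent of the error in the lagged English score.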
Moving to our dynamic panel estimators, Panel B of each table gives the Arellano and Bond
(1991) difference GMM estimates under the assumption that inputs are strictly exogenous (M5)
or predetermined (M6). In English and Urdu, the persistence parameter falls to between 0.19
and 0.35. The estimates are (statistically) different from models that correct for measurement
error only. In other words, omitted heterogeneity in learning exists, and biases the static esti-
mates upward. For mathematics, the estimated persistence coefficient is indistinguishable from
zero, considerably below all the other estimates. These estimates are higher and somewhat
more stable in the systems GMM approach summarized in Panel C (M7, M8).
With the addition of a conditional mean stationarity assumption (Panel D), we can more
precisely estimate the persistence coefficient. In this model, we only use moments in levels to
illustrate a dynamic panel estimator that improves over the lagged value-added model estimated
by OLS but doesn't require estimating a system of equations. The persistence coefficient rises
substantially to between 0.39 and 0.56. This upward movement is consistent with a violation
of the stationarity assumption (the fixed-effect still contributes to achievement growth) but an
overall reduction in the omitted heterogeneity bias. Across the various dynamic panel models
and subjects, estimates of the persistence parameter vary from 0.2 to 0.55. However, the highest
dynamic panel estimates come from assuming conditional mean stationarity, which is likely too
strong in the context of education.
11
As discussed, time-invariant controls drop out of the differenced models. For the system and levels estimators
we also assume, by necessity, that time-invariant controls are uncorrelated with the fixed effect or act as proxy
variables.
4.2.2 The contribution of private schools
Assuming perfect persistence biases the private school coefficient downward. For English, the
estimated private school effect in the restricted model that incorrectly assumes $\beta = 1$ (Panel A,
Table 4) is negative and significant. For Urdu and mathematics, the private school coefficient
is small and insignificant or marginally significant (Panel A, Tables 5 and 6). By comparison,
all the dynamic panel estimates are positive and statistically significant, with the exception of
one of the difference GMM estimates, which is too weak to identify the private school effect
with any precision.
Panel C (levels and differences GMM) illustrates the benefit of a systems approach. Adding a
levels equation (Panel C, Tables 4-6), using the assumption that inputs are constantly correlated
or uncorrelated with the omitted effects, reduces the standard errors for the private school
coefficient while maintaining the assumption that inputs are predetermined but not strictly
exogenous. Under the scenario that private school enrollment is constantly correlated with
the omitted effect (M7), the private school coefficient is large: 0.19 to 0.32 standard deviations
depending on the subject and statistically significant. This estimate allows for past achievement
shocks to affect enrollment decisions but assumes that switching school type is uncorrelated with
unobserved student heterogeneity. This is our preferred estimate.
An overarching theme in this analysis is that the persistence parameter influences the esti-
mated private school effect but that it is rarely possible to get enough precision to distinguish
estimates based on different exogeneity conditions. This is largely due to the small number of
children switching between public and private schools in our sample. In Figure 3, we graph
the relationship between both coefficients explicitly. Rather than estimating the persistence
coefficient, we assume a specific rate and then estimate the value-added model. That is, we
use $y_{it} - \beta y_{i,t-1}$ as the dependent variable. This provides a robustness check for any estimated
effects, requires only two years of data, and eliminates the need for complicated measurement
error corrections. (It assumes, however, that inputs are uncorrelated with the omitted learning
heterogeneity.) As expected given the large baseline differences, the estimated private school
effect strongly depends on the assumed persistence rate. Moving from the restricted value-
added model ($\beta = 1$) to the pooled cross-section model ($\beta = 0$) increases the estimated effect
from negative or insignificant to large and significant. For most of the range of the persistence
parameter, the private school effect is positive and significant, but pinning down the precise
yearly contribution of private schooling depends on our assumptions about how children learn.
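
The exercise behind Figure 3 can be sketched in a few lines. The example below is an illustration on synthetic data, not the LEAPS sample: the function name `private_effect_given_beta`, the data-generating values (persistence 0.4, yearly private school effect 0.3, baseline gap 0.5), and the absence of controls are all assumptions made for the sketch.

```python
import numpy as np

def private_effect_given_beta(y_now, y_lag, private, beta):
    """OLS coefficient on `private` when the outcome is y_now - beta * y_lag."""
    outcome = y_now - beta * y_lag
    X = np.column_stack([np.ones_like(outcome), private])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]

# Synthetic data under assumed parameters (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 20_000
true_beta, true_effect, baseline_gap = 0.4, 0.3, 0.5
private = rng.integers(0, 2, size=n).astype(float)
y_lag = baseline_gap * private + rng.normal(size=n)
y_now = true_beta * y_lag + true_effect * private + rng.normal(scale=0.5, size=n)

# Re-estimate the private school effect for several assumed persistence rates.
effects = {b: private_effect_given_beta(y_now, y_lag, private, b)
           for b in (0.0, 0.4, 1.0)}
for b, effect in effects.items():
    print(f"assumed beta = {b:.1f}: private school effect = {effect:.2f}")
```

With these assumed values the recovered effect is close to 0.5 when persistence is set to zero, close to 0.3 at the assumed true rate, and close to zero under the restricted model, which mirrors the dependence on the persistence rate described above.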
A couple of natural questions are how these estimates compare to the private-public dif-
ferences reported in the cross-section and why the trajectories in Figure 1 are parallel even
though the private school effect is positive. Controlling for observables suggests that after three
years, children in private schools are 0.9 (English), 0.5 (Urdu), and 0.45 (mathematics) stan-
dard deviations ahead of their public school counterparts. If persistence is 0.4 and the yearly
private school effect is 0.3, children's trajectories will become parallel when that achievement
gap reaches 0.5 (= 0.3/(1 - 0.4)). This is roughly the gap we find in Urdu and mathematics.
Any small disagreement, including the larger gap in English, may be attributable to baseline
selection effects. Thus our results can consistently explain the large baseline gap in achieve-
ment, the parallel achievement trajectories in public and private schools, and the significant
and ongoing positive private school effect.
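
The steady-state arithmetic above can be traced year by year. The sketch below assumes the gap follows the simple recursion gap(t) = persistence × gap(t-1) + yearly effect, starting from zero; the function name `achievement_gap_path` and the ten-year horizon are illustrative choices, not part of the estimation.

```python
def achievement_gap_path(beta, yearly_effect, years, initial_gap=0.0):
    """Public-private gap when a share `beta` of last year's gap persists."""
    gaps = [initial_gap]
    for _ in range(years):
        gaps.append(beta * gaps[-1] + yearly_effect)
    return gaps

beta, yearly_effect = 0.4, 0.3           # values used in the text
steady_state_gap = yearly_effect / (1 - beta)
path = achievement_gap_path(beta, yearly_effect, years=10)

print(f"steady-state gap: {steady_state_gap:.3f}")
print("gap after years 1-3:", [round(g, 3) for g in path[1:4]])
```

Under this recursion the gap is 0.3, 0.42, and 0.468 after one, two, and three years, so it is already near its limit of 0.5 by the third year. Once the gap stops growing, both groups gain the same amount each year even though the yearly private school effect remains 0.3.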
4.2.3 Regression diagnostics
For many of the GMM estimates, Hansen's J test rejects the overidentifying restrictions implied
by the model. This is troubling but not entirely unexpected. Different instruments may be
identifying different local average treatment effects in the education context. For example, the
portion of third grade achievement that remains correlated with fourth grade achievement may
decay at a different rate than what was learned most recently. This is particularly true in
an optimizing model of skill formation where parents smooth away shocks to achievement. In
such a model, unexpected shocks to achievement, beyond measurement error, would fade more
quickly than expected gains. Instrumenting using contemporaneous alternate subject scores
will therefore more likely identify different parameters than instrumenting using previous year
scores. Likewise, instrumenting using alternate lags and differenced achievement and inputs
may also identify different effects. This type of heterogeneity is important and suggests that
a richer model than a constant coefficient lagged value-added may be warranted.12 Given the
rejection of the overidentifying restrictions in some cases, the next section provides a series of
robustness exercises around the estimation of the persistence parameter.
4.3 Robustness Checks
If our estimates are interpreted as forgetting, children lose over half of their achievement in a
single year. For some subjects, such as mathematics, this fraction may be even larger. While
the estimates reported may appear to be implausibly high, they match recent work on fade-out
in value-added models, as well as the rapid fade-out observed in most educational interventions.
Table 7 summarizes six randomized (or quasi-randomized) interventions that followed chil-
dren after the program ended. This follow-up enables estimation of both immediate and ex-
tended treatment effects. For the interventions summarized, the extended treatment effect
represents test scores roughly one year after the particular program ended. For a number of
12
Another common strategy to address potentially invalid instruments is to slowly reduce the instrument
set, testing each subset, until the overidentification test is accepted or the model becomes just identified. We
explored this approach but no clear story emerged. One result of note is that dropping the overidentifying
inputs typically raises the persistence coefficient slightly, to roughly 0.25 for math.
the interventions, the persistence coefficient is less than 0.10. In two interventions--learning
incentives and grade retention--the coefficient is between 0.6 and 0.7. However, this higher
level of persistence may in part be explained by the specific nature of these interventions.13
Although the link between fade-out in experimental studies and the persistence parameter is
not always exact, all the evidence suggests that current learning does not carry over to future
learning without loss and, in fact, these losses may be substantial.
Exploring the magnitude of the potential bias in a basic lagged value-added model can also
give us some sense for whether our estimates are reasonable. Consider, for example, the bias in
the regression $y_{it} = \alpha + \beta y_{i,t-1} + \eta_{it}$, where we have omitted all potential inputs and corrected
only for measurement error bias. Our estimates of this model suggest that the persistence
coefficient is at most 0.8 to 0.9--far higher than our highest dynamic panel estimates of around
0.5. Is this discrepancy reasonable?
Aggregating all the omitted contemporaneous inputs into one variable $\eta_{it}$ implies the upward
bias of the persistence coefficient is $\mathrm{Cov}(\eta_{it}, y_{i,t-1})/\mathrm{Var}(y_{i,t-1})$. If the correlation between inputs
in any two periods is a constant $\rho_X$, and all children in grade zero start from the same place,
the persistence coefficient in a lagged value-added model for fourth grade will be biased upward
by
\begin{equation}
\frac{\mathrm{Cov}(\eta_{i4}, y_{i3})}{\mathrm{Var}(y_{i3})} = \frac{\rho_X}{2\beta\rho_X - \beta + \beta^2 + 1}. \tag{10}
\end{equation}
Figure 4 gives a graphical representation of this bias calculation. To read the graph, choose
a true persistence coefficient, $\beta$ (the dotted lines), and a degree of correlation of inputs over
time, $\rho_X$ (the horizontal axis). Given these choices, the y-axis reveals the persistence coefficient
that a lagged value-added specification estimated by OLS would yield. Working with our
estimates, if the true persistence effect, $\beta$, is 0.4 and inputs are correlated only 0.6 over time,
the (incorrectly) estimated $\beta$ will be 0.9. Given that the vast majority of inputs are fixed, this
seems quite reasonable, and perhaps even too low.14
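
Equation (10) can be checked numerically. The sketch below assumes one specific setup that reproduces the closed form: aggregated inputs with unit variance and a unit coefficient, a common pairwise correlation `rho` between inputs in any two periods, and achievement that starts at zero and accumulates over three grades. These assumptions and the function names are illustrative and are not taken from the derivation in the text.

```python
import numpy as np

def lagged_va_bias(beta, rho):
    """Closed-form upward bias in the persistence coefficient, as in equation (10)."""
    return rho / (2 * beta * rho - beta + beta**2 + 1)

def lagged_va_bias_direct(beta, rho, periods=3):
    """Same bias computed from covariances under the assumed setup.

    Grade-`periods` achievement is sum_k beta**(periods - k) * input_k, and the
    omitted input in the following grade has correlation rho with each past input.
    """
    weights = beta ** np.arange(periods - 1, -1, -1)   # weights on inputs 1..periods
    corr = np.full((periods, periods), rho)
    np.fill_diagonal(corr, 1.0)
    var_lagged_score = weights @ corr @ weights
    cov_with_next_input = rho * weights.sum()
    return cov_with_next_input / var_lagged_score

beta, rho = 0.4, 0.6                                    # values used in the text
bias = lagged_va_bias(beta, rho)
print(f"bias = {bias:.3f}, estimated persistence = {beta + bias:.3f}")
```

For a true persistence of 0.4 and an input correlation of 0.6, the bias is about 0.48, so the OLS estimate is about 0.88, which rounds to the 0.9 quoted above. The direct covariance calculation agrees with the closed form under the stated assumptions.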
13
In the case of grade retention, there is no real "post treatment" period since children always remain one
grade behind after being retained. If one views grade retention as an ongoing multi-period treatment, then
lasting effects can be consistent with low persistence. In the case of learning incentives, Kremer, Miguel and
Thornton (2003) argue that student incentives increased effort (not just achievement) even after the program
ended, leading to ongoing learning.
14
Another way to get at the reasonableness of rapid fade out is motivated by Altonji, Elder and Taber's
(2005) assumption of equal selection on observed and unobserved variables. Absent controls and correcting for
measurement error the persistence coefficient is 0.91, while the $R^2$ of the regression is 0.52. Adding controls
raises the $R^2$ only modestly to 0.56 but at the same time reduces the estimated persistence coefficient to 0.74.
Thus, just by explaining an additional four percent of the total variation, we reduced the persistence coefficient
substantially. Assuming equal selection on observed and unobserved variables would lead to a persistence
estimate below our dynamic panel estimates.
4.4 Why Is Persistence So Low?
The low estimates of persistence are worrying, not only for program evaluation but for the more
substantive issue of how to improve cognitive achievement. One major concern is that imperfect
persistence is a psychometric testing issue, and therefore not a "true" feature of the learning
dynamic. For instance, later test forms may capture fundamentally different latent traits than
earlier test forms. To address this concern, we replicated our results using IRT scores based
solely on a common set of items that appeared on every test form--our tests had a significant
number of overlapping items in each year. Our results were similar using these scores, with the
difference GMM persistence estimates in fact dropping slightly. The score equating methods
used to create a single cardinal measure of learning therefore do not appear to be the driving
force behind the low observed persistence.
Several other possible mechanical explanations for low persistence are also unlikely. First,
artificial ceiling effects can appear like low persistence in models that use bounded scores. To
address this concern, we exclusively use unbounded IRT scale scores and our exam is designed
to maximize the variation over the entire range of observed abilities. Second, cheating, often
driven by high stakes testing, can create artificially low estimates of persistence. Jacob and
Levitt (2003), for instance, detect teacher cheating in part by looking for poor subsequent
performance of students who made rapid gains. In our data, cheating is unlikely both because
our test is relatively low stakes and because our project staff administered the exam directly to
avoid this possibility. Third, critics of high stakes testing often argue that shallow "teaching to
the test" leads to low persistence. This is also an unlikely explanation in our context; our exam
is relatively low stakes, is not part of the standard educational infrastructure, and covers only
subject matter that all students should know and that Pakistani parents generally demand.15
We looked at a couple of other candidate explanations, but there are no "smoking guns"
that could explain low persistence. Tables 8 and 9, for instance, present the results of a
preliminary exercise that assesses whether low persistence arises from household and school re-
sponses to unexpected achievement shocks--an explanation that has different implications for
cost-benefit analysis than simple forgetting.16 Parents' perceptions of their child's performance
react strongly to unexpected gains in achievement, but there is only weak evidence of substi-
15
On average, the children tested at the end of Grade 3 could complete two-digit addition, but not subtraction
or multiplication (in mathematics); recognize simple words (but not sentences) in the vernacular (Urdu); and
recognize alphabets and match simple three-letter words to pictures in English.
16
We examine whether inputs adjust to unexpected achievement shocks for roughly 650 children for whom we
have detailed information from a survey collected at households. As a measure of the unexpected shock, we first
compute the residual from a regression of fourth grade scores on third grade scores and a host of known controls.
We then test whether this residual predicts changes between fourth and fifth grade in parents' perceptions of the
child's performance, expenditure on school, and time spent helping children on homework, being tutored, doing
homework, and playing. We instrument for the subject specific residuals using the alternate subject residuals to
lessen measurement error attenuation.
tution effects. School expenditures do drop slightly as do the hours spent helping the child on
his or her homework. Minutes spent playing increases, but tuition also increases. While some
of these responses are in the direction of substitution, they are generally not statistically signif-
icant. Given the detailed household data we obtained, it suggests that household substitution
is unlikely to be a main driver behind low persistence. This may not be particularly surprising
given that very low achievement suggests that children may be below parents' desired learning
levels.
Table 9 explores the possibility that fade out captures teachers targeting poorly performing
students. If teachers target poorly performing children in each classroom, persistence should
be lower within schools than between schools. To test this hypothesis, we estimate a basic
lagged value-added model with no controls and instrument for lagged achievement using lagged
differences in alternate subjects. We estimate this model using average school scores (between
school specification) and child scores measured in deviations from the school average (within
school specification). If anything, the persistence coefficient is lower for the between school
regressions, suggesting that within school targeting is not the primary reason for low persistence.
Finally, Table 10 looks at whether heterogeneity in persistence can provide some hints about
its origin (Semb and Ellis, 1994).17 To obtain the most power possible, we estimate the value-
added model for specific sub-populations using the "predetermined inputs, uncorrelated effects,
and conditionally stationary" based estimator (M10 of Tables 4, 5, and 6). Unfortunately,
large standard errors make it difficult to find statistically different decay rates between groups.
Learning in private schools seems to decay faster than learning in government schools, but the
difference is not statistically significant. A similar pattern holds for richer families and children
with educated parents. These results hint that learning decays faster for faster learners.
These candidate explanations do not yield a compelling story thus far; it could just be that
children forget what they learned. Psychology and neuroscience provide some compelling evi-
dence for this using laboratory experiments. Psychological research on the "curve of forgetting"
dates back to Ebbinghaus's (1885) seminal study on memorization. Rubin and Wenzel (1996)
review the laboratory research spawned by this contribution. Semb and Ellis (1994) review
classroom studies that test how much students remember after taking a course. Both litera-
tures document the fragility of human memory. Cooper et al. (1996) study the learning losses
that children experience between spring and fall achievement tests. These losses are generally
not as rapid as the effects we find, but the experiment is different: we estimate the depreciation
with no inputs, whereas summer activities provide some stimulus, particularly for privileged
17
To give one example, MacKenzie and White (1982) report that fade out for geographical knowledge was
much higher for in-class exercises compared to field excursions (or, passive versus active learning.) Similarly,
Rothstein (2008) finds heterogeneity in the long-run effects of teachers who produce equal short-run gains;
Jacob, Lefgren and Sims (2008) estimate that teacher effects are only a third as persistent as achievement in
general.
children.
5 Conclusion
In the absence of randomized studies, the value-added approach to estimating education pro-
duction functions has gained momentum as a valid methodology for removing unobserved indi-
vidual heterogeneity in assessing the contribution of specific programs or in understanding the
contribution of school-level factors for learning (e.g. Boardman and Murnane, 1979; Hanushek,
1979; Todd and Wolpin, 2003; Hanushek, 2003; Doran and Izumi, 2004; McCaffrey, 2004; Gor-
don, Kane and Staiger, 2006). In such models, assumptions about learning persistence and
unobserved heterogeneity play central roles. Our results reject both the assumption of perfect
persistence required for the restricted value-added model and of no learning heterogeneity re-
quired for the lagged value-added model. Our results for Pakistan should illustrate the danger of
incorrectly modeling or estimating education production functions: the restricted value-added
model is fundamentally misspecified and can even yield wrong-signed estimates of a program's
impact. Underscoring the potential of affordable, mainstream, private schools in developing
countries, we find that Pakistan's private schools contribute roughly 0.25 standard deviations
more to achievement each year than government schools, an effect greater than the average
yearly gain between third and fourth grade.
Our estimate of persistence is consistent with recent work on teacher effects, with analytical
and empirical estimates of the expected bias under OLS, and with experimental evidence of
program fade out in developing and developed countries. But the economic interpretation still
remains an open area of enquiry. Our context and test largely rule out mechanical explanations
of low persistence. We also find little evidence that low persistence results from substitution
by parents and teachers; the behavioral adjustments we are able to measure are unlikely to
represent the primary reason achievement gains fade out. Simple forgetting, consistent with a
large body of memory research in psychology, appears to be a likely explanation and hence a
core component of education production functions, although more research is needed to provide
direct evidence for it.
Our results also highlight that short evaluations, even when experimental, may yield little
information about the cost-effectiveness of a program. Using the one or two year increase
from a program gives an upper-bound on the longer term achievement gains. As our estimates
suggest, and Table 7 confirms, we should expect program impacts to fade quickly. Calculating
the internal rate of return by citing research linking test scores to earnings of young adults is
therefore a doubtful proposition. The techniques described here, with three periods of data,
can theoretically obtain a lower bound on cost-effectiveness by assuming exponential fade out.
At the same time, the causes of fade out are equally important: if parents no longer need to hire
tutors or buy textbooks (the substitution interpretation of imperfect persistence), a program
may be cost-effective even if test scores fade out.
Moving forward, empirical estimates of education production functions may benefit from
further unpacking persistence. Overall, the agenda pleads for a richer model of education and
for empirical techniques for modelling the broader learning process, not simply to add nuance
to our understanding of learning, but to get the most basic parameters right.
A Analytical Corrections for Measurement Error
Consider the lagged value-added model
\begin{equation}
y^*_{it} = \alpha' x_{it} + \beta y^*_{i,t-1} + v_{it}, \tag{11}
\end{equation}
where $y^*_{it}$ and $y^*_{i,t-1}$ are true achievement, $v_{it}$ is the error term, and we have put aside the
possibility of omitted heterogeneity. Since achievement is a latent variable, we can only estimate
it with error. Thus, we actually estimate
\begin{equation}
y_{it} = \alpha' x_{it} + \beta y_{i,t-1} + [v_{it} + \varepsilon_{it} - \beta\varepsilon_{i,t-1}] \tag{12}
\end{equation}
and OLS is inconsistent because $y_{i,t-1}$ is correlated with $\varepsilon_{i,t-1}$.
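The analytic correction we apply replaces $y_{i,t-1}$ with the best linear predictor
\begin{equation}
\tilde{y}_{i,t-1} \equiv E[y^*_{i,t-1} \mid y_{i,t-1}, x_{it}] = \delta' x_{it} + r_{i,t-1} y_{i,t-1}, \tag{13}
\end{equation}
where $\delta$ and $r_{i,t-1}$ are parameters. To see why this works, add and subtract $\beta\tilde{y}_{i,t-1}$ in (12)
to get
\begin{align}
y_{it} &= \alpha' x_{it} + \beta\tilde{y}_{i,t-1} + [\beta(y_{i,t-1} - \tilde{y}_{i,t-1}) + v_{it} + \varepsilon_{it} - \beta\varepsilon_{i,t-1}] \tag{14} \\
&= \alpha' x_{it} + \beta\tilde{y}_{i,t-1} + [\beta(y^*_{i,t-1} - \tilde{y}_{i,t-1}) + v_{it} + \varepsilon_{it}], \tag{15}
\end{align}
where the second line follows from $y^*_{i,t-1} = y_{i,t-1} - \varepsilon_{i,t-1}$. Assuming exogeneity with respect to
$v_{it} + \varepsilon_{it}$, OLS is consistent if
\begin{align}
E[x_{it}(y^*_{i,t-1} - \tilde{y}_{i,t-1})] &= 0, \tag{16} \\
E[\tilde{y}_{i,t-1}(y^*_{i,t-1} - \tilde{y}_{i,t-1})] &= 0. \tag{17}
\end{align}
These conditions are automatically satisfied since the fitted value $\tilde{y}_{i,t-1}$ and independent vari-
ables $x_{it}$ are orthogonal to the residual $y^*_{i,t-1} - \tilde{y}_{i,t-1}$ by the definition of the projection (13).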
The only difficulty is estimating the projection parameters $\delta$ and $r_{i,t-1}$ since the dependent
variable $y^*_{i,t-1}$ is unobserved. But it turns out that we do not need to observe the true score.
The orthogonality conditions that define the projection (13) are
\begin{align}
E[x_{it}(y^*_{i,t-1} - \delta' x_{it} - r_{i,t-1} y_{i,t-1})] &= 0, \tag{18} \\
E[y_{i,t-1}(y^*_{i,t-1} - \delta' x_{it} - r_{i,t-1} y_{i,t-1})] &= 0. \tag{19}
\end{align}
Solving first for $\delta$, we have
$$\delta = E[x_{it} x_{it}']^{-1} E[x_{it}(y^*_{i,t-1} - r_{i,t-1} y_{i,t-1})]. \tag{20}$$
Plugging (20) into (19) and solving for $r_{i,t-1}$ yields
\begin{align}
r_{i,t-1} &= E[y_{i,t-1} m_x y_{i,t-1}]^{-1} E[y_{i,t-1} m_x y^*_{i,t-1}] \tag{21} \\
          &= E[e^2_{i,t-1}]^{-1} \left( E[e^2_{i,t-1}] - E[\varepsilon^2_{i,t-1}] \right) \tag{22} \\
          &= \frac{\sigma^2_{e_{i,t-1}} - \sigma^2_{\varepsilon_{i,t-1}}}{\sigma^2_{e_{i,t-1}}}, \tag{23}
\end{align}
where $m_x \equiv 1 - x_{it}(x_{it}' x_{it})^{-1} x_{it}'$ is an annihilator vector and $e_{i,t-1}$ is the residual from a
regression of $y_{i,t-1}$ on $x_{it}$. We can estimate $r_{i,t-1}$ by computing $\sigma^2_{e_{i,t-1}}$ from the regression of $y_{i,t-1}$
on $x_{it}$ and taking $\sigma^2_{\varepsilon_{i,t-1}}$ from IRT, i.e., from the inverse Fisher information matrix of $\hat{\theta}$.
Intuitively, $r_{i,t-1}$ is the heteroscedastic reliability ratio of the score minus the variation explained
by the independent variables. That is, the reliability ratio of $y_{i,t-1} - E^*[y_{i,t-1} \mid x_{it}]$.
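We compute $\delta$ by plugging $r_{i,t-1}$ into (20) to get
\begin{align}
\delta &= E[x_{it} x_{it}']^{-1} E[x_{it}(y^*_{i,t-1} - r_{i,t-1} y_{i,t-1})] \tag{24} \\
       &= E[x_{it} x_{it}']^{-1} E[x_{it} y_{i,t-1}](1 - r_{i,t-1}), \tag{25}
\end{align}
where the second line uses the fact that the measurement error $\varepsilon_{i,t-1}$ is uncorrelated with $x_{it}$.
The best predictor is
$$\tilde{y}_{i,t-1} = E^*[y_{i,t-1} \mid x_{it}](1 - r_{i,t-1}) + r_{i,t-1} y_{i,t-1}. \tag{26}$$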
This takes the familiar form of an empirical Bayes estimate that shrinks the observed score to
the predicted mean. The shrinkage performs the same function as blowing up the coefficient
using the reliability ratio after estimation. Here, however, our shrunken estimate provides a
more efficient correction by using the full heteroscedastic error structure (Sullivan, 2001).
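The shrinkage correction in (13) through (26) can be illustrated on simulated data. The following is a minimal sketch, not the estimation code behind Table A1. The function name `heiv_corrected_ols`, the simulated parameter values, and the choice to build each child's residual variance as an estimated true-score variance plus that child's squared score standard error are all assumptions made for illustration.

```python
import numpy as np


def heiv_corrected_ols(y, y_lag, x, se_lag):
    """Shrink the noisy lagged score toward its predicted mean, then run OLS.

    y      : current observed score
    y_lag  : lagged observed score (true score plus measurement error)
    x      : covariate(s), 1-D or 2-D array
    se_lag : child-specific standard error of the lagged score (e.g. from IRT)
    Returns the coefficient on the shrunken lagged score and the weights r_i.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    # Regress the observed lagged score on x; keep fitted values and residuals e.
    coef, *_ = np.linalg.lstsq(X, y_lag, rcond=None)
    fitted = X @ coef
    e = y_lag - fitted
    # Assumption: true-score residual variance = mean(e^2) - mean(error variance).
    sigma_u2 = np.mean(e**2) - np.mean(se_lag**2)
    # Child-specific reliability ratio, as in (23) with sigma_e^2 = sigma_u2 + se^2.
    r = sigma_u2 / (sigma_u2 + se_lag**2)
    # Best linear predictor of the true lagged score, as in (26).
    y_tilde = (1.0 - r) * fitted + r * y_lag
    Z = np.column_stack([X, y_tilde])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b[-1], r


# Simulated data with hypothetical parameter values (alpha = 0.3, beta = 0.4).
rng = np.random.default_rng(0)
n = 50_000
alpha, beta = 0.3, 0.4
x = rng.normal(size=n)
y_star_lag = 0.5 * x + rng.normal(size=n)            # true lagged achievement
se_lag = rng.uniform(0.3, 0.9, size=n)               # heteroscedastic error s.d.
y_lag = y_star_lag + se_lag * rng.normal(size=n)     # observed lagged score
y_star = alpha * x + beta * y_star_lag + 0.5 * rng.normal(size=n)
y = y_star + 0.5 * rng.normal(size=n)                # observed current score

# Uncorrected lagged value-added regression for comparison.
X_naive = np.column_stack([np.ones(n), x, y_lag])
beta_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][-1]
beta_corrected, r = heiv_corrected_ols(y, y_lag, x, se_lag)

print(f"naive OLS: {beta_naive:.3f}  corrected: {beta_corrected:.3f}  true: {beta}")
```

In this simulation the uncorrected coefficient on the lagged score is attenuated toward zero, while the regression on the shrunken score returns a value close to the true persistence parameter, which is the pattern the derivation predicts.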
Table A1 reports persistence coefficients corrected only for measurement error using the
instrumental variable (using alternate subjects) and the analytical correction approach. Each
cell contains the estimated coefficient on lagged achievement from a regression with no controls
and the associated standard error. Where applicable, we also report the p-value for Hansen's
overidentification test statistic. This is possible for the instrumental variables estimators since
we have three subject tests and three years of data.
Absent any correction (OLS), the estimated persistence coefficient ranges between 0.65 and
0.70. Instrumenting using alternate subjects raises the estimated coefficient significantly to 0.85
for English, 0.89 for Urdu, and 0.97 for mathematics. However, the overidentifying restriction
is rejected at the one percent level in all three subjects. This suggests that measurement
errors may be correlated across subjects at the same sitting and that this correlation may
differ depending on the subject. By comparison, when we instrument for lagged achievement
using double lagged scores we cannot reject the overidentifying restrictions. Unfortunately, in
the context of dynamic panel methods, additional lags to address measurement error require
T = 4. The final line of Table A1 shows that estimates based on our analytical correction are around
0.9. Of course, all of these estimates remain biased upward by learning heterogeneity.
References
Alderman, H., J. Kim and P.F. Orazem. 2003. "Design, Evaluation, and Sustainability of
Private Schools for the Poor: The Pakistan Urban and Rural Fellowship School Experiments."
Economics of Education Review 22(3):265-274.
Alderman, H., P.F. Orazem and E.M. Paterno. 2001. "School Quality, School Cost, and the
Public/Private School Choices of Low-Income Households in Pakistan." The Journal of Human
Resources 36(2):304-326.
Altonji, J.G., T.E. Elder and C.R. Taber. 2005. "Selection on Observed and Unobserved
Variables: Assessing the Effectiveness of Catholic Schools." Journal of Political Economy
113(1):151-184.
Andrabi, T., J. Das and A.I. Khwaja. 2008. "A dime a day: The possibilities and limits of
private schooling in Pakistan." Comparative Education Review 52(3):329-355.
Andrabi, Tahir, Jishnu Das and Asim Ijaz Khwaja. 2006. "A dime a day: the possibilities and
limits of private schooling in Pakistan." World Bank Policy Research Working Paper 4066.
Angrist, J., E. Bettinger, E. Bloom, E. King and M. Kremer. 2002. "Vouchers for Private
Schooling in Colombia: Evidence from a Randomized Natural Experiment." The American
Economic Review 92(5):1535-1558.
Arellano, M. 2003. Panel Data Econometrics. Oxford University Press.
Arellano, M. and O. Bover. 1995. "Another look at the instrumental variable estimation of
error-components models." Journal of Econometrics 68(1):29-51.
Arellano, M. and S. Bond. 1991. "Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations." The Review of Economic Studies
58(2):277-297.
Arellano, Manuel and Bo Honore. 2001. Panel data models: some recent developments. In
Handbook of Econometrics, ed. J.J. Heckman and E.E. Leamer. Vol. 5 of Handbook of
Econometrics. Elsevier, chapter 53, pp. 3229-3296.
Banerjee, Abhijit, Shawn Cole, Esther Duflo and Leigh Linden. 2007. "Remedying Education:
Evidence from Two Randomized Experiments in India." Quarterly Journal of Economics
122(3).
Blundell, R. and S. Bond. 1998. "Initial conditions and Moment Conditions in Dynamic Panel
Data Models." Journal of Econometrics 87(1):115-43.
Boardman, A.E. and R.J. Murnane. 1979. "Using Panel Data to Improve Estimates of the
Determinants of Educational Achievement." Sociology of Education 52(2):113-121.
Chay, K.Y., P.J. McEwan and M. Urquiola. 2005. "The Central Role of Noise in Evaluating
Interventions That Use Test Scores to Rank Schools." The American Economic Review
95(4):1237-1258.
Cooper, H., B. Nye, K. Charlton, J. Lindsay and S. Greathouse. 1996. "The Effects of Summer
Vacation on Achievement Test Scores: A Narrative and Meta-Analytic Review." Review of
Educational Research 66(3):227-68.
Cunha, F. and J.J. Heckman. 2007. "Formulating, Identifying and Estimating the Technology
of Cognitive and Noncognitive Skill Formation." Journal of Human Resources.
Cunha, F., J.J. Heckman and S.M. Schennach. 2006. "Estimating the Elasticity of Substitution
Between Early and Late Investments in the Technology of Cognitive and Noncognitive Skill
Formation." Unpublished, University of Chicago, Department of Economics.
Currie, J. and D. Thomas. 1995. "Does Head Start Make a Difference?" The American
Economic Review 85(3):341-364.
Deming, David. 2008. "Early Childhood Intervention and Life-Cycle Skill Development:
Evidence from Head Start." Harvard University. Processed.
Doran, H. and L.T. Izumi. 2004. "Putting Education to the Test: A Value-Added Model for
California." San Francisco: Pacific Research Institute.
Ebbinghaus, H. 1885. Memory: A contribution to experimental psychology. New York: Teachers
College, Columbia University.
Glewwe, P., N. Ilias and M. Kremer. 2003. "Teacher Incentives." NBER Working Paper.
Gordon, Robert, Thomas J. Kane and Douglas O. Staiger. 2006. "Identifying Effective Teachers
Using Performance on the Job." Hamilton Project Discussion Paper.
Hanushek, E.A. 1979. "Conceptual and Empirical Issues in the Estimation of Educational
Production Functions." The Journal of Human Resources 14(3):351-388.
Hanushek, E.A. 2003. "The Failure of Input-Based Schooling Policies." Economic Journal
113(485):64-98.
Harris, D. and T.R. Sass. 2006. "Value-Added Models and the Measurement of Teacher Quality."
Unpublished manuscript.
Jacob, B.A. and L. Lefgren. 2005. "What Do Parents Value in Education: An Empirical
Investigation of Parents' Revealed Preferences for Teachers." NBER Working Paper 11494.
Jacob, Brian, Lars John Lefgren and David Sims. 2008. "The Persistence of Teacher-Induced
Learning Gains." NBER Working Paper.
Jacob, Brian and S.D. Levitt. 2003. "Rotten Apples: An Investigation of the Prevalence and
Predictors of Teacher Cheating." The Quarterly Journal of Economics 118(3):843-877.
Jimenez, E., M.E. Lockheed and V. Paqueo. 1991. "The relative efficiency of private and public
schools in developing countries." The World Bank Research Observer 6(2):205-218.
Kane, T.J. and D.O. Staiger. 2002. "The Promise and Pitfalls of Using Imprecise School
Accountability Measures." The Journal of Economic Perspectives 16(4):91-114.
Kane, T.J. and D.O. Staiger. 2008. "Estimating Teacher Impacts on Student Achievement: An
Experimental Evaluation." Unpublished. Cambridge, MA: Harvard University.
Kremer, M., E. Miguel and R. Thornton. 2003. "Incentives to Learn." NBER Working Paper.
Krueger, A.B. 2003. "Economic Considerations and Class Size." Economic Journal.
Krueger, A.B. and D.M. Whitmore. 2001. "The Effect of Attending a Small Class in the Early
Grades on College-test Taking and Middle School Test Results: Evidence from Project Star."
The Economic Journal 111(468):1-28.
Ladd, H.F. and R.P. Walsh. 2002. "Implementing value-added measures of school effectiveness:
getting the incentives right." Economics of Education Review 21(1):1-17.
Lord, F.M. 1967. "A paradox in the interpretation of group comparisons." Psychological Bulletin
68(5):304-305.
MacKenzie, A.A. and R.T. White. 1982. "Fieldwork in Geography and Long-Term Memory
Structures." American Educational Research Journal 19(4):623-632.
McCaffrey, D.F. 2004. Evaluating Value-added Models for Teacher Accountability. Rand
Corporation.
Muralidharan, Karthik and Michael Kremer. forthcoming. Public and Private Schools in Rural
India. In School Choice International, ed. Paul Peterson and Rajashri Chakrabarti.
Murnane, R.J., J.B. Willett and F. Levy. 1995. "The Growing Importance of Cognitive Skills
in Wage Determination." The Review of Economics and Statistics 77(2):251-266.
Neal, D. and W. Johnson. 1996. "The Role of Premarket Factors in Black-White Wage
Differentials." Journal of Political Economy 104(5):869-895.
Rothstein, Jesse. 2008. "Teacher Quality in Educational Production: Tracking, Decay, and
Student Achievement." Working Paper.
Rubin, D.C. and A.E. Wenzel. 1996. "One Hundred Years Of Forgetting: A Quantitative
Description Of Retention." Psychological Review 103(4):734-760.
Santibanez, Lucrecia. 2006. "Why we should care if teachers get A's: Teacher test scores and
student achievement in Mexico." Economics Of Education Review 25(5):510-520.
Sass, T.R. 2006. "Charter Schools and Student Achievement in Florida." Education Finance
and Policy 1(1):91-122.
Schwartz, A.E. and J. Zabel. 2005. The Good, the Bad, and the Ugly: Measuring School
Efficiency Using School Production Functions. In Measuring School Performance and Efficiency:
Implications for Practice and Research, ed. L. Stiefel, A.E. Schwartz, R. Rubenstein and J.
Zabel. NY: Eye on Education, Inc. pp. 37-66.
Schweinhart, L.J., J. Montie, Z. Xiang, W.S. Barnett, C.R. Belfield and M. Nores. 2005.
Lifetime effects: The High/Scope Perry Preschool study through age 40. Ypsilanti, MI:
High/Scope Press.
Semb, G.B. and J.A. Ellis. 1994. "Knowledge taught in school: What is remembered." Review
of Educational Research 64(2):253-286.
Sullivan, D.G. 2001. "A Note on the Estimation of Linear Regression Models with
Heteroskedastic Measurement Errors." Federal Reserve Bank of Chicago.
Todd, P.E. and K.I. Wolpin. 2003. "On the Specification and Estimation of the Production
Function for Cognitive Achievement." Economic Journal 113(485):3-33.
Tooley, J. and P. Dixon. 2003. Private schools for the poor: A case study from India. Reading,
United Kingdom: Centre for British Teachers.
TABLE 1. DYNAMIC PANEL SPECIFICATION

Estimator Assumptions                   "Difference"        "Levels"            Notes
                                        Instruments         Instruments

Panel A: Static Estimates
M1. No depreciation β=1 (OLS)           n/a                 n/a                 Assumes perfect persistence and no or
                                                                                uncorrelated heterogeneity.
M2. No effects, no measurement          n/a                 n/a                 Assumes no measurement error and no effects.
    error (OLS)
M3. No effects (2SLS/IV correction)     n/a                 Alternate subjects  Assumes no effects and uncorrelated
                                                                                measurement errors across subjects.
M4. No effects (HEIV/Analytical         n/a                 n/a                 Assumes no effects and analytical correction
    correction)                                                                 is valid.

Panel B: Difference GMM
M5. Strictly exogenous inputs           Inputs: 1...T       n/a                 Assumes no feedback effects.
                                        Score: 1...t-2
M6. Predetermined inputs                Inputs: 1...t-1     n/a                 None (beyond those that apply to all
                                        Score: 1...t-2                          estimators).

Panel C: Levels and Difference SGMM
M7. Predetermined inputs, constantly    Inputs: 1...t-1     Inputs: 1..t        Assumes effects have constant correlation
    correlated effects                  Score: 1...t-2                          with inputs.
M8. Predetermined inputs,               Inputs: 1...t-1     Inputs: 1..t        Assumes effects are uncorrelated with inputs
    uncorrelated effects                Score: 1...t-2                          (random effects).

Panel D: Levels GMM (Proxy Style)
M9. Predetermined inputs, constantly    Inputs: 1...t-1     Inputs: 1..t        Assumes effects have constant correlation
    correlated effects, conditional     Score: 1...t-2      Score: t-1          with inputs and scores are conditionally
    stationarity                        (not used)                              mean stationary.
M10. Predetermined inputs,              Inputs: 1...t-1     Inputs: 1..t        Assumes effects are uncorrelated with inputs
    uncorrelated effects, conditional   Score: 1...t-2      Score: t-1          (random effects) and scores are
    stationarity                        (not used)                              conditionally mean stationary.

Notes: The notes columns do not include knife-edge cases such as perfectly offsetting biases. None of the
dynamic panel estimators allow for serial correlation, as written. Redundant instruments in levels and
differences are dropped. Panel D lists the valid difference instruments, but our application does not use them
in order to demonstrate a simple, single equation estimator.
TABLE 2. BASELINE CHARACTERISTICS OF CHILDREN IN PUBLIC AND PRIVATE SCHOOLS
Variable                                Private School    Public School     Difference

Panel A: Full Sample
Age                                     9.58              9.63              -0.04
                                        [1.49]            [1.35]            (0.08)
Female                                  0.45              0.47              -0.02
                                                                            (0.03)
English score (third grade)             0.74              -0.23             0.97***
                                        [0.61]            [0.94]            (0.05)
Urdu score (third grade)                0.52              -0.12             0.63***
                                        [0.78]            [0.98]            (0.05)
Math score (third grade)                0.39              -0.07             0.46***
                                        [0.81]            [1.00]            (0.05)
N                                       2337              5783

Panel B: Surveyed Child Sample
Age                                     9.63              9.72              -0.09
                                        [1.49]            [1.34]            (0.08)
Female                                  0.47              0.48              -0.02
                                                                            (0.03)
Years of schooling                      3.39              3.75              -0.35***
                                        [1.57]            [1.10]            (0.08)
Weight z-score (normalized to U.S.)     -0.75             -0.64             -0.10
                                        [4.21]            [1.71]            (0.13)
Height z-score (normalized to U.S.)     -0.42             -0.22             -0.20
                                        [3.32]            [2.39]            (0.13)
Number of elder brothers                0.98              1.34              -0.36***
                                        [1.23]            [1.36]            (0.05)
Number of elder sisters                 1.08              1.27              -0.19***
                                        [1.27]            [1.30]            (0.05)
Father lives at home                    0.88              0.91              -0.04***
                                                                            (0.01)
Mother lives at home                    0.98              0.98              0.00
                                                                            (0.01)
Father educated past elementary         0.64              0.46              0.18***
                                                                            (0.02)
Mother educated past elementary         0.36              0.18              0.18***
                                                                            (0.02)
Asset index (PCA)                       0.78              -0.30             1.08***
                                        [1.50]            [1.68]            (0.07)
English score (third grade)             0.74              -0.24             0.99***
                                        [0.62]            [0.95]            (0.05)
Urdu score (third grade)                0.53              -0.14             0.67***
                                        [0.78]            [0.98]            (0.05)
Math score (third grade)                0.42              -0.09             0.51***
                                        [0.80]            [1.02]            (0.05)
N                                       1374              2657
* Significant at the 10%; ** significant at the 5%; *** significant at 1%.
Notes: Cells contain means, brackets contain standard deviations, and parentheses contain standard errors.
Standard errors for the private-public difference are clustered at the school level. Sample includes only those
children tested (A) and surveyed (B) in all three years.
TABLE 3. THIRD GRADE ACHIEVEMENT AND CHILD, HOUSEHOLD AND SCHOOL CHARACTERISTICS
                                      (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)         (9)
Dependent variable (third grade):     English     English     English     Urdu        Urdu        Urdu        Math        Math        Math

Private School                        0.985       0.907       0.916       0.670       0.595       0.575       0.512       0.446       0.451
                                      (0.047)***  (0.048)***  (0.048)***  (0.049)***  (0.050)***  (0.047)***  (0.051)***  (0.053)***  (0.052)***
Age                                               0.004       0.015                   0.013       0.013                   0.033       0.048
                                                  (0.013)     (0.012)                 (0.013)     (0.012)                 (0.014)**   (0.013)***
Female                                            0.125       0.133                   0.209       0.205                   -0.040      -0.057
                                                  (0.047)***  (0.041)***              (0.046)***  (0.040)***              (0.051)     (0.043)
Years of schooling                                -0.029      -0.019                  -0.039      -0.028                  -0.038      -0.025
                                                  (0.013)**   (0.012)                 (0.014)***  (0.014)**               (0.015)**   (0.014)*
Number of elder brothers                          -0.030      -0.035                  -0.020      -0.025                  -0.020      -0.023
                                                  (0.011)***  (0.010)***              (0.012)*    (0.011)**               (0.012)*    (0.011)**
Number of elder sisters                           0.008       0.013                   0.001       -0.001                  -0.002      -0.006
                                                  (0.011)     (0.010)                 (0.012)     (0.012)                 (0.013)     (0.012)
Height z-score (normalized to U.S.)               0.027       0.016                   0.017       0.012                   0.034       0.024
                                                  (0.007)***  (0.006)***              (0.006)***  (0.006)**               (0.008)***  (0.007)***
Weight z-score (normalized to U.S.)               -0.005      -0.001                  -0.004      0.001                   -0.009      -0.002
                                                  (0.008)     (0.006)                 (0.005)     (0.005)                 (0.007)     (0.006)
Asset index                                       0.041       0.050                   0.043       0.045                   0.030       0.034
                                                  (0.012)***  (0.009)***              (0.011)***  (0.010)***              (0.011)***  (0.010)***
Mother educated past elementary                   0.048       0.062                   0.014       0.011                   0.023       -0.006
                                                  (0.036)     (0.031)**               (0.040)     (0.035)                 (0.040)     (0.037)
Father educated past elementary                   0.061       0.066                   0.062       0.049                   0.069       0.053
                                                  (0.033)*    (0.028)**               (0.034)*    (0.031)                 (0.035)**   (0.032)*
Mother lives at home                              -0.131      -0.025                  -0.174      -0.108                  -0.210      -0.091
                                                  (0.095)     (0.081)                 (0.102)*    (0.092)                 (0.097)**   (0.090)
Father lives at home                              0.006       -0.038                  0.019       0.005                   -0.009      -0.026
                                                  (0.049)     (0.044)                 (0.053)     (0.048)                 (0.057)     (0.051)
Survey Date                                       0.003       0.000                   0.001       0.004                   0.003       0.003
                                                  (0.002)     (0.004)                 (0.002)     (0.003)                 (0.002)     (0.003)
Constant                              -0.243      -49.721     -3.690      -0.137      -23.750     -59.528     -0.095      -56.196     -51.248
                                      (0.038)***  (38.467)    (62.432)    (0.035)***  (31.915)    (45.357)    (0.038)**   (35.415)    (50.310)
Village Fixed Effects                 No          No          Yes         No          No          Yes         No          No          Yes
Observations                          4031        4031        4031        4031        4031        4031        4031        4031        4031
R-squared                             0.23        0.25        0.37        0.11        0.13        0.25        0.06        0.08        0.21
* significant at 10%; ** significant at 5%; *** significant at 1%
Notes: Standard errors clustered at the school level. Sample includes only those children tested and
surveyed in all three years.
FIGURE 1. EVOLUTION OF TEST SCORES IN PUBLIC AND PRIVATE SCHOOLS
[Figure: three panels (English, Urdu, Math). Each plots the mean test score (y-axis, -0.5 to 1.5) against grade (x-axis, 3 to 5), with separate lines for public and private schools.]
Notes: Vertical bars represent 95% confidence intervals around the group means, allowing for arbitrary
clustering within schools. Test scores are IRT-based scale scores normalized to have mean zero and standard
deviation one for the full sample of children in third grade. Children who were tested in third grade were
subsequently followed and counted as being in fourth or fifth grade regardless of whether they were actually
promoted. The graph's sample is limited to children who were tested in all three periods (Table 2, Panel A:
Full Sample).
FIGURE 2. ACHIEVEMENT OVER TIME FOR CHILDREN WHO SWITCHED SCHOOL TYPES
[Figure: six panels, one pair for each subject (English, Urdu, Math). Within each pair, the "Starts in Public" and "Starts in Private" panels plot the mean score (y-axis) against grade (x-axis, 3 to 5) for each of the enrollment patterns listed below.]

Enrollment pattern              N
Public 3, 4 & 5                 5688
Public 3 & 4 - Private 5        40
Public 3 - Private 4 & 5        48
Private 3, 4 & 5                2007
Private 3 & 4 - Public 5        160
Private 3 - Public 4 & 5        167
Notes: Lines connect group means for children who were enrolled in all three periods and have a particular
private/public enrollment pattern. Children were tested in the second half of the school year; most of the
gains from a child in a third grade government school and fourth grade private school should be attributed
to the private school.
TABLE 4. VALUE-ADDED MODEL ESTIMATES OF PERSISTENCE AND PRIVATE SCHOOL COEFFICIENT (ENGLISH)
                                                  Persistence   Private School  Hansen's J χ² df
Estimator's Key Assumption                        Coefficient   Coefficient     (p-value)

Panel A: Static Estimates
M1. No depreciation β=1 (OLS)                     1.00          -0.08
                                                                (0.02)
M2. No effects, no measurement error (OLS)        0.52          0.31
                                                  (0.02)        (0.02)
M3. No effects (2SLS/IV Correction)               0.70          0.16            4.69          1
                                                  (0.02)        (0.02)          (0.03)
M4. No effects (HEIV/Analytical Correction)       0.74          0.21
                                                  (0.02)        (0.02)

Panel B: Difference GMM
M5. Strictly exogenous inputs                     0.19          0.25            25.44         13
                                                  (0.10)        (0.07)          (0.02)
M6. Predetermined inputs                          0.19          1.15            16.82         7
                                                  (0.10)        (0.39)          (0.02)

Panel C: Levels and Difference SGMM
M7. Predetermined inputs, constantly              0.36          0.21            45.50         23
    correlated effects                            (0.07)        (0.06)          (0.00)
M8. Predetermined inputs, uncorrelated            0.53          0.32            79.08         29
    effects                                       (0.05)        (0.04)          (0.00)

Panel D: Levels Only GMM
M9. Predetermined inputs, constantly              0.40          0.29            24.74         12
    correlated effects, conditional stationarity  (0.05)        (0.07)          (0.02)
M10. Predetermined inputs, uncorrelated           0.39          0.24            23.43         11
    effects, conditional stationarity             (0.05)        (0.04)          (0.02)
Notes: Dots represent the estimated coefficients, thicker dark lines are 90 percent confidence intervals, and thin gray lines are 95 percent confidence intervals.
All intervals and standard errors are clustered by school. See text and Table 1 for details on instruments and assumptions.
TABLE 5. VALUE-ADDED MODEL ESTIMATES OF PERSISTENCE AND PRIVATE SCHOOL COEFFICIENT (URDU)
                                                  Persistence   Private School  Hansen's J χ² df
Estimator's Key Assumption                        Coefficient   Coefficient     (p-value)

Panel A: Static Estimates
M1. No depreciation β=1 (OLS)                     1.00          0.01
                                                                (0.02)
M2. No effects, no measurement error (OLS)        0.58          0.26
                                                  (0.01)        (0.02)
M3. No effects (2SLS/IV Correction)               0.73          0.17            3.67          1
                                                  (0.02)        (0.02)          (0.06)
M4. No effects (HEIV/Analytical Correction)       0.79          0.20
                                                  (0.02)        (0.02)

Panel B: Difference GMM
M5. Strictly exogenous inputs                     0.21          0.29            49.50         13
                                                  (0.09)        (0.07)          (0.00)
M6. Predetermined inputs                          0.35          0.90            18.90         7
                                                  (0.11)        (0.48)          (0.01)

Panel C: Levels and Difference SGMM
M7. Predetermined inputs, constantly              0.26          0.22            66.58         23
    correlated effects                            (0.08)        (0.06)          (0.00)
M8. Predetermined inputs, uncorrelated            0.51          0.30            81.89         29
    effects                                       (0.06)        (0.04)          (0.00)

Panel D: Levels Only GMM
M9. Predetermined inputs, constantly              0.55          0.31            13.49         12
    correlated effects, conditional stationarity  (0.05)        (0.07)          (0.33)
M10. Predetermined inputs, uncorrelated           0.56          0.27            13.30         11
    effects, conditional stationarity             (0.05)        (0.03)          (0.27)
Notes: Dots represent the estimated coefficients, thicker dark lines are 90 percent confidence intervals, and thin gray lines are 95 percent confidence intervals.
All intervals and standard errors are clustered by school. See text and Table 1 for details on instruments and assumptions.
TABLE 6. VALUE-ADDED MODEL ESTIMATES OF PERSISTENCE AND PRIVATE SCHOOL COEFFICIENT (MATH)
                                                  Persistence   Private School  Hansen's J χ² df
Estimator's Key Assumption                        Coefficient   Coefficient     (p-value)

Panel A: Static Estimates
M1. No depreciation β=1 (OLS)                     1.00          0.05
                                                                (0.02)
M2. No effects, no measurement error (OLS)        0.57          0.27
                                                  (0.02)        (0.03)
M3. No effects (2SLS/IV Correction)               0.76          0.17            0.02          1
                                                  (0.02)        (0.03)          (0.89)
M4. No effects (HEIV/Analytical Correction)       0.75          0.23
                                                  (0.02)        (0.03)

Panel B: Difference GMM
M5. Strictly exogenous inputs                     -0.00         0.26            33.97         13
                                                  (0.09)        (0.08)          (0.00)
M6. Predetermined inputs                          0.12          0.46            12.06         7
                                                  (0.12)        (0.50)          (0.10)

Panel C: Levels and Difference SGMM
M7. Predetermined inputs, constantly              0.12          0.19            57.63         23
    correlated effects                            (0.10)        (0.08)          (0.00)
M8. Predetermined inputs, uncorrelated            0.51          0.30            82.19         29
    effects                                       (0.08)        (0.05)          (0.00)

Panel D: Levels Only GMM
M9. Predetermined inputs, constantly              0.51          0.30            29.45         12
    correlated effects, conditional stationarity  (0.06)        (0.07)          (0.00)
M10. Predetermined inputs, uncorrelated           0.53          0.27            28.36         11
    effects, conditional stationarity             (0.06)        (0.04)          (0.00)
Notes: Dots represent the estimated coefficients, thicker dark lines are 90 percent confidence intervals, and thin gray lines are 95 percent confidence intervals.
All intervals and standard errors are clustered by school. See text and Table 1 for details on instruments and assumptions.
FIGURE 3. PRIVATE SCHOOL VALUE-ADDED ASSUMING VARIOUS PERSISTENCE RATES
[Figure: three panels (English, Urdu, Math). Each plots the private school coefficient, i.e., the value-added estimate with its 95% confidence interval (y-axis, -0.5 to 1), against the assumed persistence coefficient (x-axis, from 0 for full fade out to 1 for perfect persistence).]
Notes: These graphs show the estimated value-added effect of private schools depending on the assumed
persistence coefficient of lagged achievement. The restricted value-added model, for example, assumes the
persistence coefficient equals one--no fade-out. The estimated value-added pooled for third to fourth and
fourth to fifth grades is estimated by OLS controlling for age, gender, years of schooling, weight z-score,
height z-score, number of elder brothers, number of elder sisters, whether father lives at home, whether
mother lives at home, whether father educated past elementary, whether mother educated past elementary,
an asset index, survey date, and round and village fixed effects. The confidence intervals are based on
standard errors clustered at the school level.
FIGURE 4. TRUE AND ESTIMATED PERSISTENCE IN A LAGGED VALUE-ADDED MODEL WITH
SERIALLY CORRELATED OMITTED INPUTS
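[Figure: the lagged value-added estimate of persistence (y-axis, 0 to 1) plotted against corr(X,X*) (x-axis, 0 to 1), with one curve per true persistence coefficient. Annotation on the curve for a true persistence coefficient of 0.4: "If the true coefficient is 0.4 and inputs are correlated 0.6, then the estimated effect will be 0.9."]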
Notes: In the lagged value-added model, the persistence coefficient is biased upward by the correlation
between omitted contemporaneous inputs and past inputs that are captured in the lagged test score.
Assuming constant correlation between any two years of inputs X and X*, the bias can be calculated
analytically (see text). The graph above gives the implied bias for fourth grade. The dashed lines represent
the true persistence coefficient, indicated by the associated β when corr(X,X*)=0. The y-axis gives the
biased estimate that results from estimating a lagged value-added model. This estimate depends on the true
persistence rate (dashed lines) and on the assumed correlation of inputs over time (x-axis). For example, a
(biased) estimated persistence coefficient of 0.9 may result from a true persistence coefficient of 0.4 and
correlation between inputs around 0.6. These calculations assume that achievement is measured perfectly
and that all inputs are omitted (i.e. unobserved) in the regression.
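The mechanics behind this figure can be sketched numerically. The analytical formula itself is given in the text and is not reproduced here, so the following rests on assumptions made only for illustration: achievement is the persistence-weighted sum of past inputs, every year's input has unit variance, any two years of inputs have the same correlation, and the lagged score has accumulated three years of inputs (the hypothetical parameter `n_lag_inputs`). The function name `lagged_va_plim` is likewise hypothetical.

```python
import numpy as np


def lagged_va_plim(beta, rho, n_lag_inputs=3):
    """Probability limit of the OLS coefficient on the lagged score when all
    inputs are omitted and achievement is measured without error.

    beta         : true persistence coefficient
    rho          : correlation between any two years of inputs
    n_lag_inputs : years of inputs accumulated in the lagged score (assumed)
    """
    # Weight on each past input inside the lagged score: 1, beta, beta^2, ...
    w = beta ** np.arange(n_lag_inputs)
    # Input covariance matrix: unit variances, constant correlation rho.
    S = np.full((n_lag_inputs, n_lag_inputs), float(rho))
    np.fill_diagonal(S, 1.0)
    var_lag = w @ S @ w              # variance of the lagged score
    cov_new_input = rho * w.sum()    # covariance of the omitted current input with it
    return beta + cov_new_input / var_lag


estimate = lagged_va_plim(beta=0.4, rho=0.6)
print(round(estimate, 2))
```

Under these assumptions a true coefficient of 0.4 with input correlation 0.6 gives an estimate of about 0.88, in the neighborhood of the 0.9 quoted in the figure, and the estimate collapses to the true coefficient when the correlation is zero. The exact value depends on how many years of inputs are assumed to sit in the lagged score.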
TABLE 7. EXPERIMENTAL ESTIMATES OF PROGRAM FADE OUT
Program                             Subject               Immediate               Extended                Implied Persistence   Source
                                                          Treatment Effect        Treatment Effect        Coefficient
Balsakhi Program                    Math                  0.348                   0.030                   0.086                 Banerjee et al (2007)
                                    Verbal                0.227                   0.014                   0.062
CAL Program                         Math                  0.366                   0.097                   0.265                 Banerjee et al (2007)
                                    Verbal                0.014                   -0.078                  ≈0.0
Learning Incentives                 Multi-subject         0.23                    0.16                    0.70                  Kremer et al (2003)
Teacher Incentives                  Multi-subject         0.139                   -0.008                  ≈0.0                  Glewwe et al (2003)
STAR Class Size Experiment          Stanford-9 and CTBS   ≈5 percentile points    ≈2 percentile points    ≈0.25 to 0.5          Krueger and Whitmore (2001)
Summer School and Grade Retention   Math                  0.136                   0.095                   0.70                  Jacob and Lefgren (2004)
                                    Reading               0.104                   0.062                   0.60
Notes: Extended treatment effect is achievement approximately one year after the treatment ended. Unless
otherwise noted, effects are expressed in standard deviations. Results for Kremer et al. (2003) are averaged
across boys and girls. Estimated effects for Jacob and Lefgren (2004) are taken for the third grade sample.
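The implied persistence coefficients in this table are consistent with a simple ratio of the extended to the immediate treatment effect. The sketch below reproduces that arithmetic for the rows reported in standard deviations. The function name `implied_persistence` and the choice to floor negative ratios at zero (matching the entries shown as approximately 0.0) are assumptions for illustration.

```python
def implied_persistence(immediate, extended):
    """Share of the immediate effect still present about a year after the
    program ended, floored at zero when the extended effect is negative."""
    return max(extended / immediate, 0.0)


# (immediate effect, extended effect) in standard deviations, from Table 7.
effects = {
    ("Balsakhi Program", "Math"): (0.348, 0.030),
    ("Balsakhi Program", "Verbal"): (0.227, 0.014),
    ("CAL Program", "Math"): (0.366, 0.097),
    ("CAL Program", "Verbal"): (0.014, -0.078),
    ("Learning Incentives", "Multi-subject"): (0.23, 0.16),
    ("Teacher Incentives", "Multi-subject"): (0.139, -0.008),
    ("Summer School and Grade Retention", "Math"): (0.136, 0.095),
    ("Summer School and Grade Retention", "Reading"): (0.104, 0.062),
}

for (program, subject), (immediate, extended) in effects.items():
    ratio = implied_persistence(immediate, extended)
    print(f"{program:36s}{subject:16s}{ratio:.3f}")
```

The STAR row is omitted because its effects are reported only as approximate percentile points.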
TABLE 8. HOUSEHOLD RESPONSES TO PERFORMANCE SHOCKS
                                        Test Score Residual (Grade 4)
Household Changes (Grade 4 to 5)        English     Urdu        Math        N
Perception of child performance         0.25***     0.17**      0.20***     652
                                        (0.09)      (0.07)      (0.08)
Log expenditure on school               -0.05       -0.07*      -0.03       643
                                        (0.05)      (0.04)      (0.04)
Hours helping child                     -0.66       -0.34       -0.44       645
                                        (0.46)      (0.36)      (0.38)
Log minutes spent on homework           -0.14       0.10        0.04        617
                                        (0.25)      (0.20)      (0.21)
Log minutes for tuition (tutoring)      0.21        0.10        0.02        620
                                        (0.20)      (0.16)      (0.17)
Log minutes spent playing               0.54*       0.27        0.29        619
                                        (0.30)      (0.23)      (0.25)
* significant at 10%; ** significant at 5%; *** significant at 1%
Notes: The grade 4 test score residual is computed from a lagged value-added model OLS regression that
controls for third grade scores and a comprehensive set of household controls (age, gender, health status,
household size, elder brothers, elder sisters, father education, mother education, adult education index,
minutes spent helping child, asset index, log monthly expenditure, and wealth relative to village). The
coefficients and standard errors reported are for separate 2SLS regressions of the grade 4 to 5 household
behavior change on the subject residual, instrumented using the alternate subject residuals. Roughly half
of the households received the score results as part of a randomized evaluation of school and child report
cards. Logged variables are computed as ln(1+x).
TABLE 9. PERSISTENCE COEFFICIENT USING WITHIN AND BETWEEN SCHOOL
VARIATION ONLY
Subject     Variation     Persistence Coefficient
English     Within        0.57
                          (0.02)
            Between       0.45
                          (0.09)
Urdu        Within        0.64
                          (0.02)
            Between       0.31
                          (0.10)
Math        Within        0.64
                          (0.03)
            Between       0.13
                          (0.14)
Notes: Persistence coefficients are calculated using a 2SLS regression of test scores on lagged test
scores, instrumented using lagged differences in alternate subjects (i.e. basic moments from
conditional stationarity). Within regressions use school demeaned child scores whereas between
regressions use mean school scores. The sample is from Table 2, Panel A with no covariates.
Within N = 8620, between N = 761.
TABLE 10. PERSISTENCE COEFFICIENT HETEROGENEITY ACROSS SCHOOL, CHILD AND HOUSEHOLD CHARACTERISTICS
                                        Persistence Coefficient
Within category:                        English     Urdu        Math
Private School                          0.35        0.45        0.31
                                        (0.09)      (0.09)      (0.11)
Public School                           0.41        0.56        0.55
                                        (0.05)      (0.06)      (0.07)
Female                                  0.48        0.59        0.53
                                        (0.08)      (0.08)      (0.08)
Male                                    0.37        0.51        0.50
                                        (0.06)      (0.06)      (0.08)
Richer Family                           0.34        0.57        0.33
                                        (0.09)      (0.10)      (0.11)
Poorer Family                           0.40        0.52        0.55
                                        (0.09)      (0.08)      (0.09)
Mother educated past primary            0.43        0.49        0.44
                                        (0.12)      (0.14)      (0.11)
Mother not educated past primary        0.40        0.57        0.52
                                        (0.05)      (0.05)      (0.07)
Father educated past primary            0.49        0.56        0.48
                                        (0.07)      (0.08)      (0.08)
Father not educated beyond primary      0.34        0.54        0.54
                                        (0.07)      (0.06)      (0.08)
Notes: This table reports estimates for specific sub-populations; each coefficient is from a separate regression. The coefficients are estimated using 2SLS (levels only) under the
assumption of predetermined inputs, constantly correlated effects and conditional stationarity. Standard errors are clustered at the school level.
TABLE A1. CORRECTING FOR MEASUREMENT ERROR BIAS
Strategy                                        English     Urdu        Math
No Correction (OLS)                             0.65        0.66        0.69
                                                (0.015)     (0.013)     (0.013)
Alternate Subject Scores (2SLS)                 0.85        0.89        0.97
                                                (0.018)     (0.015)     (0.019)
                                                [0.000]     [0.000]     [0.000]
Lagged Scores (2SLS)                            0.88        0.86        0.93
                                                (0.019)     (0.019)     (0.020)
                                                [0.140]     [0.637]     [0.262]
Alternate Subjects and Lagged Scores (2SLS)     0.81        0.80        0.85
                                                (0.016)     (0.014)     (0.015)
                                                [0.000]     [0.000]     [0.000]
Analytical Correction (HEIV OLS)                0.90        0.87        0.88
                                                (0.020)     (0.016)     (0.017)
Notes: Cells contain coefficients from a regression of round 3 test scores on round 2 test scores, i.e., the lagged
value-added model with no covariates. Parentheses contain standard errors clustered at the school level.
Brackets contain the p-value for Hansen's J statistic testing the overidentifying restrictions. The 2SLS
estimates use alternate subjects or lagged scores as instruments, or both. These estimators have 1, 2 and 3
overidentifying restrictions, respectively. The analytical correction uses the score standard errors from IRT
to blow-up the estimate appropriately (see Appendix A). All regressions use the same set of children.