WPS6787
Policy Research Working Paper 6787
Academic Peer Effects with Different Group
Assignment Policies
Residential Tracking versus Random Assignment
Robert Garlick
The World Bank
Development Research Group
Human Development and Public Services Team
February 2014
Abstract
This paper studies the relative academic performance of students tracked or randomly assigned to South African university dormitories. Tracked or streamed assignment creates dormitories where all students obtained similar scores on high school graduation examinations. Random assignment creates dormitories that are approximately representative of the population of students. Tracking lowers students’ mean grades in their first year of university and increases the variance or inequality of grades. This result is driven by a large negative effect of tracking on low-scoring students’ grades and a near-zero effect on high-scoring students’ grades. Low-scoring students are more sensitive to changes in their peer group composition and their grades suffer if they live only with low-scoring peers. In this setting, residential tracking has undesirable efficiency (lower mean) and equity (higher variance) effects. The result isolates a pure peer effect of tracking, whereas classroom tracking studies identify a combination of peer effects and differences in teacher behavior across tracked and untracked classrooms. The negative pure peer effect of residential tracking suggests that classroom tracking may also have negative effects unless teachers are more effective in homogeneous classrooms.
Random variation in peer group composition under random dormitory assignment also generates peer effects. Living with higher-scoring peers increases students’ grades and the effect is larger for low-scoring students. This is consistent with the aggregate effects of tracking relative to random assignment. However, using peer effects estimated in randomly assigned groups to predict outcomes in tracked groups yields unreliable predictions. This illustrates a more general risk that peer effects estimated under one peer group assignment policy provide limited information about how peer effects might work with a different peer group assignment policy.
This paper is a product of the Human Development and Public Services Team, Development Research Group. It is part
of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy
discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org.
The author may be contacted at rgarlick@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
Academic Peer Eﬀects with Diﬀerent Group Assignment
Policies: Residential Tracking versus Random
Assignment∗
Robert Garlick†
February 25, 2014
Keywords: education; inequality; peer eﬀects; South Africa; tracking
JEL classification: I25; O15
∗ This paper is a revised version of the first chapter of my dissertation. I am grateful to my advisors
David Lam, Jeﬀ Smith, Manuela Angelucci, John DiNardo, and Brian Jacob for their extensive guidance
and support. I thank Raj Arunachalam, Emily Beam, John Bound, Tanya Byker, Scott Carrell, Julian
Cristia, Susan Godlonton, Andrew Goodman-Bacon, Italo Gutierrez, Brad Hershbein, Claudia Martinez,
David Slusky, Rebecca Thornton, Adam Wagstaﬀ, and Dean Yang for helpful comments on earlier drafts of
the paper, as well as conference and seminar participants at ASSA 2014, Chicago Harris School, Columbia,
Columbia Teachers College, CSAE 2012, Cornell, Duke, EconCon 2012, ESSA 2011, Harvard Business School,
LSE, Michigan, MIEDC 2012, Michigan State, NEUDC 2012, Northeastern, Notre Dame, PacDev 2011,
SALDRU, Stanford SIEPR, SOLE 2012, UC Davis, the World Bank, and Yale School of Management. I
received invaluable assistance with student data and institutional information from Jane Hendry, Josiah
Mavundla, and Charmaine January at the University of Cape Town. I acknowledge ﬁnancial support from
the Gerald R. Ford School of Public Policy and Horace H. Rackham School of Graduate Studies at the
University of Michigan. The ﬁndings, interpretations and conclusions are entirely those of the author. They
do not necessarily represent the views of the World Bank, its Executive Directors, or the countries they
represent.
† Postdoctoral Researcher in the World Bank Development Research Group and Assistant Professor in the
Duke University Department of Economics; rob.garlick@gmail.com
1 Introduction
Group structures are ubiquitous in education and group composition may have important
eﬀects on education outcomes. Students in diﬀerent classrooms, living environments, schools,
and social groups are exposed to diﬀerent peer groups, receive diﬀerent education inputs,
and face diﬀerent institutional environments. A growing literature shows that students’ peer
groups inﬂuence their education outcomes even without resource and institutional diﬀerences
across groups.1 Peer eﬀects play a role in empirical and theoretical research on diﬀerent ways
of organizing students into classrooms and schools.2 Most studies focus on the eﬀect of
assignment or selection into diﬀerent peer groups for a given group assignment or selection
process.3
This paper advances the literature by asking a subtly diﬀerent question: What are the
relative eﬀects of two group assignment policies – randomization and tracking or streaming
based on academic performance – on the distribution of student outcomes? This contributes
to a small but growing empirical literature on optimal group design. Comparison of diﬀerent
group assignment policies corresponds to a clear social planning problem: How should stu-
dents be assigned to groups to maximize some target outcome, subject to a given distribution
of student characteristics? Diﬀerent group assignment policies leave the marginal distribution
of education inputs unchanged. This raises the possibility of improving academic outcomes
with few pecuniary costs. Such low cost education interventions are particularly attractive
for resource-constrained education systems.
Studying peer eﬀects under one group assignment policy provides limited information
about the eﬀect of changing the group assignment policy. Consider the comparison between
1. Manski (1993) lays out the identification challenge in studying peer effects: do correlated outcomes within
peer groups reﬂect correlated unobserved pre-determined characteristics, common institutional factors, or
peer eﬀects – causal relationships between students’ outcomes and their peers’ characteristics? Many papers
address this challenge using randomized or controlled variation in peer group composition; peer effects have
been documented on standardized test scores (Hoxby, 2000), college GPAs (Sacerdote, 2001), college entrance
examination scores (Ding and Lehrer, 2007), cheating (Carrell, Malmstrom, and West, 2008), job search
(Marmaros and Sacerdote, 2002), and major choices (De Giorgi, Pellizzari, and Redaelli, 2010). Estimated
peer eﬀects may be sensitive to the deﬁnition of peer groups (Foster, 2006) and the measurement of peer
characteristics (Stinebrickner and Stinebrickner, 2006).
2. Examples include Arnott (1987) and Duflo, Dupas, and Kremer (2011) on classroom tracking, Benabou
(1996) and Kling, Liebman, and Katz (2007) on neighborhood segregation, Epple and Romano (1998) and
Hsieh and Urquiola (2006) on school choice and vouchers, and Angrist and Lang (2004) on school integration.
3. See Sacerdote (2011) for a recent review that reaches a similar conclusion.
random group assignment and academic tracking, in which students are assigned to academ-
ically homogeneous groups. First, tracking generates groups consisting of only high- or only
low-performing students, which are unlikely to be observed under random assignment. Strong
assumptions are required to extrapolate from the small cross-group differences in mean scores
observed under random assignment to the large cross-group differences that will be generated
under tracking.4 Second, student outcomes may depend on multiple dimensions of their
peer group characteristics. Econometric models estimated under one assignment policy may
omit characteristics that would be important under another assignment policy. For exam-
ple, within-group variance in peer characteristics may appear unimportant in homogeneous
groups under tracking but matter in heterogeneous groups under random assignment. Third,
peer eﬀects will not be policy-invariant if students’ interaction patterns change with group
assignment policies. If, for example, students prefer homogeneous social groups, then the
intensity of within-group interaction will be higher under tracking than random assignment.
Peer eﬀects estimated in “low-intensity” randomly assigned groups will then understate the
strength of peer eﬀects in “high-intensity” tracked groups.
I study peer eﬀects under two diﬀerent group assignment policies at the University of Cape
Town in South Africa. First year students at the university were tracked into dormitories up
to 2005 and randomly assigned from 2006 onward. This generated residential peer groups
that were respectively homogeneous and heterogeneous in baseline academic performance. I
contrast the distribution of ﬁrst year students’ academic outcomes under the two policies. I
use non-dormitory students as a control group in a diﬀerence-in-diﬀerences design to remove
time trends and cohort eﬀects.
I show that tracking leads to lower and more unequally distributed grade point aver-
ages (GPAs) than random assignment. Mean GPA is 0.13 standard deviations lower under
tracking. Low-scoring students perform substantially worse under tracking than random
assignment, while high-scoring students’ GPAs are approximately equal under the two poli-
cies. I adapt results from the econometric theory literature to estimate the eﬀect of tracking
on academic inequality. Standard measures of inequality are substantially higher under
4. Random assignment may generate all possible types of groups if the groups are sufficiently small and
group composition can be captured by a small number of summary statistics. I thank Todd Stinebrickner
for this observation.
tracking than random assignment. I explore a variety of alternative explanations for these
results: time-varying student selection into dormitory or non-dormitory status, diﬀerential
time trends in student performance between dormitory and non-dormitory students, limita-
tions of GPA as an outcome measure, and direct eﬀects of dormitory assignment on GPAs.
I conclude that the results are not explained by these factors.
The mean eﬀect size of 0.13 standard deviations is substantial for an education interven-
tion. McEwan (2013) conducts a meta-study of experimental primary school interventions in
developing countries. He ﬁnds average eﬀects across studies of 0.12 for class size and compo-
sition interventions and 0.06 for school management or supervision interventions. Replacing
tracking with random assignment thus generates gains that compare favorably to many other
education interventions, albeit in diﬀerent settings. The direct pecuniary cost is almost zero,
yielding a particularly high beneﬁt to cost ratio.
I then use randomly assigned dormitory-level peer groups to estimate directly the eﬀect
of living with higher- or lower-scoring peers. I ﬁnd that students’ GPAs are increasing in
the mean high school test scores of their peers. Low-scoring students beneﬁt more than
high-scoring students from living with high-scoring peers. Equivalently, own and peer aca-
demic performance are substitutes, rather than complements, in GPA production. This is
qualitatively consistent with the effects of tracking. Peer effects estimated under random
assignment can, in principle, quantitatively predict features of the GPA distribution under
tracking. However, the predictions are sensitive to model specification choices over which economic theory
and statistical model selection criteria provide little guidance. This prediction challenge re-
inforces the value of cross-policy evidence on peer eﬀects. I go on to explore the mechanisms
driving these peer eﬀects. I ﬁnd that peer eﬀects operate largely within race groups. This
suggests that peer eﬀects only arise when residential peers are also socially proximate and
likely to interact directly. However, peer eﬀects do not appear to operate through direct aca-
demic collaboration. They may operate through spillovers on time use or through transfers
of soft skills.
This paper makes four contributions. First, I contribute to the literature on optimal
group design in the presence of peer eﬀects. Models by Arnott (1987) and Benabou (1996)
show that the effect of peers’ characteristics on agents’ outcomes influences optimal classroom
or neighborhood assignment policies.5 Empirical evidence on this topic is very limited.
My paper most closely relates to Carrell, Sacerdote, and West (2013), who use peer eﬀects
estimated under random group assignment to derive an “optimal” assignment policy. Mean
outcomes are, however, worse under this policy than under random assignment. They as-
cribe this result to changes in the structure of within-group student interaction induced by
the policy change. Bhattacharya (2009) and Graham, Imbens, and Ridder (2013) establish
assumptions under which peer eﬀects based on random group assignment can predict out-
comes under a new group assignment policy. The assumptions are strong: that peer eﬀects
are policy-invariant, that no out-of-sample extrapolation is required, and that relevant peer
characteristics have low dimension. These results emphasize the diﬃculty of using peer eﬀects
estimated under one group assignment policy to predict the eﬀects of changing the policy.
Second, I contribute to the literature on peer eﬀects in education.6 I show that stu-
dent outcomes are aﬀected by residential peers’ characteristics and by changes in the peer
group assignment policy. Both analyses show that low-scoring students are more sensitive
to changes in peer group composition, implying that own and peer academic performance
are substitutes in GPA production. This is the ﬁrst ﬁnding of substitutability in the peer
eﬀects literature of which I am aware.7 I ﬁnd that peer eﬀects operate almost entirely within
race groups, suggesting that spatial proximity generates peer eﬀects only between socially
proximate students.8 I also ﬁnd that dormitory peer eﬀects are not stronger within than
across classes. An economics student, for example, is no more strongly aﬀected by other
economics students in her dormitory than by non-economics students in her dormitory. This
suggests that peer eﬀects do not operate through direct academic collaboration but may op-
erate through channels such as time use or transfer of soft skills, consistent with Stinebrickner
5. A closely related literature studies the efficiency implications of private schools and vouchers in the
presence of peer eﬀects (Epple and Romano, 1998; Nechyba, 2000).
6. This paper most closely relates to the empirical literature studying randomized or controlled group
assignments. Other related work studies the theoretical foundations of peer eﬀects models and identiﬁcation
conditions for peer eﬀects with endogenously formed groups (Blume, Brock, Durlauf, and Ioannides, 2011).
7. Hoxby and Weingarth (2006) provide a general taxonomy of peer effects other than the linear-in-means
model studied by Manski (1993). Burke and Sass (2013), Cooley (2013), Hoxby and Weingarth (2006),
Imberman, Kugler, and Sacerdote (2012) and Lavy, Silva, and Weinhardt (2012) ﬁnd evidence of nonlinear
peer eﬀects.
8. Hanushek, Kain, and Rivkin (2009) and Hoxby (2000) document stronger within- than across-race classroom peer effects.
and Stinebrickner (2006).
Third, I contribute to the literature on academic tracking by isolating a peer eﬀects
mechanism. Most existing papers estimate the eﬀect of school or classroom tracking relative
to another assignment policy or of assignment to diﬀerent tracks.9 However, tracked and
untracked groups may diﬀer on multiple dimensions: peer group composition, instructor
behavior, and school resources (Betts, 2011; Figlio and Page, 2002). Isolating the causal
eﬀect of tracking on student outcomes via peer group composition, net of these other factors,
requires strong assumptions in standard research designs. I study a setting where instruction
does not diﬀer across tracked and untracked students or across students in diﬀerent tracks.
Students living in diﬀerent dormitories take classes together from the same instructors. While
variation in dormitory-level characteristics might in principle aﬀect student outcomes, my
results are entirely robust to conditioning on these characteristics. I thus ascribe the eﬀect of
tracking to peer eﬀects. Studying dormitories as assignment units limits the generalizability
of my results but allows me to focus on one mechanism at work in school or classroom
tracking. My ﬁndings are consistent with the results from Duﬂo, Dupas, and Kremer (2011).
They ﬁnd that tracked Kenyan students in ﬁrst grade classrooms obtain higher average test
scores than untracked students. They ascribe this to a combination of targeted instruction
(positive eﬀect for all students) and peer eﬀects (positive and negative eﬀects for high- and
low-track students respectively).
Fourth, I make a methodological contribution to the study of peer eﬀects and of academic
tracking. These literatures strongly emphasize inequality considerations but generally do not
measure the eﬀect of diﬀerent group assignment policies on inequality (Betts, 2011; Epple
and Romano, 2011). I note that an inequality treatment eﬀect of tracking can be obtained
by comparing inequality measures for the observed distribution of outcomes under tracking
and the counterfactual distribution of outcomes that would have been obtained in the ab-
sence of tracking. This counterfactual distribution can be estimated using standard methods
for quantile treatment eﬀects (Firpo, 2007; Heckman, Smith, and Clements, 1997). Firpo
9. Betts (2011) reviews the tracking literature, including cross-country (Hanushek and Woessmann, 2006),
cross-cohort (Meghir and Palme, 2005), and cross-school (Slavin, 1987, 1990) comparisons. A smaller liter-
ature studies the eﬀect of assignment to diﬀerent tracks in an academic tracking system (Abdulkadiroglu,
Angrist, and Pathak, 2011; Ding and Lehrer, 2007; Pop-Eleches and Urquiola, 2013).
(2010) and, in a diﬀerent context, Rothe (2010) establish formal identiﬁcation, estimation,
and inference results for inequality treatment eﬀects. I use a diﬀerence-in-diﬀerences design
to calculate the treatment eﬀects of tracking net of time trends and cohort eﬀects. I there-
fore combine a nonlinear diﬀerence-in-diﬀerences model (Athey and Imbens, 2006) with an
inequality treatment eﬀects framework (Firpo, 2010). I also propose a conditional nonlinear
diﬀerence-in-diﬀerences model in the online appendix that extends the original Athey-Imbens
model. This extension accounts flexibly for time trends or cohort effects using inverse probability
weighting (DiNardo, Fortin, and Lemieux, 1996; Hirano, Imbens, and Ridder, 2003).
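The inequality treatment effect described above can be sketched with a changes-in-changes counterfactual on simulated data. Everything here is illustrative: the cell means and variances are invented, and the paper's actual estimator additionally conditions on covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated GPA draws for the four design cells; means and variances
# are invented for illustration, not taken from the paper.
nd_track  = rng.normal(0.00, 1.00, 5000)   # non-dorm, tracking period
nd_random = rng.normal(0.10, 1.00, 5000)   # non-dorm, random period
d_track   = rng.normal(-0.10, 1.20, 5000)  # dorm, tracking period
d_random  = rng.normal(0.15, 1.00, 5000)   # dorm, random period

def cic_counterfactual(y_treated_base, y_control_base, y_control_treat):
    # Changes-in-changes: rank each treated-group base-period outcome in
    # the control base-period distribution, then map that rank through
    # the control treatment-period quantile function.
    ranks = np.searchsorted(np.sort(y_control_base), y_treated_base) / y_control_base.size
    ranks = np.clip(ranks, 1e-6, 1 - 1e-6)
    return np.quantile(y_control_treat, ranks)

# Counterfactual tracking-period GPA distribution for dormitory
# students had they not been tracked.
cf = cic_counterfactual(d_random, nd_random, nd_track)

# Inequality treatment effects: dispersion observed under tracking
# minus dispersion of the counterfactual distribution.
var_effect = d_track.var() - cf.var()
gap_effect = ((np.percentile(d_track, 90) - np.percentile(d_track, 10))
              - (np.percentile(cf, 90) - np.percentile(cf, 10)))
print(f"variance effect: {var_effect:.2f}, 90-10 gap effect: {gap_effect:.2f}")
```

Because the simulated tracked cell is more dispersed than its counterfactual, both dispersion measures come out positive, mirroring the sign of the paper's inequality results.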
I outline the setting, research design, and data in section 2. I present the average eﬀects
of tracking in section 3, for the entire sample and for students with diﬀerent high school
graduation test scores. In section 4, I discuss the eﬀects of tracking on the entire GPA
distribution. I show the resultant eﬀects on academic inequality in section 5. I then discuss
the eﬀects of random assignment to live with higher- or lower-scoring peers in section 6. I
present a framework to reconcile the cross-policy and cross-dormitory results in section 7. In
section 8, I report a variety of robustness checks to verify the validity of the research design
used to identify the eﬀects of tracking. I conclude in section 9 and outline the conditional
nonlinear diﬀerence-in-diﬀerences model in appendix A.
2 Research Design
I study a natural experiment at the University of Cape Town in South Africa, where ﬁrst-year
students are allocated to dormitories using either random assignment or academic tracking.
This is a selective research university. During the time period I study, admissions decisions
employed aﬃrmative action favoring low-income students. The student population is thus
relatively heterogeneous but not representative of South Africa.
Approximately half of the 3500-4000 ﬁrst-year students live in university dormitories.10
The dormitories provide accommodation, meals, and some organized social activities. Classes
and instructors are shared across students from diﬀerent dormitories and students who do
10. The mean dormitory size is 123 students and the interdecile range is 50–216. There are 16 dormitories
in total, one of which closes in 2006 and one of which opens in 2007. I exclude seven very small dormitories
that each hold fewer than 10 ﬁrst-year students.
not live in dormitories. Dormitory assignment therefore determines the set of residentially
proximate peers but not the set of classroom peers. Students are normally allowed to live
in dormitories for at most two years. They can move out of their dormitory after one year
but cannot change to another dormitory. Dormitory assignment thus determines students’
residential peer groups in their ﬁrst year of university; the second year peer group depends
on students’ location choices. Most students live in two-person rooms and the roommate
assignment process varies across dormitories. I do not observe roommate assignments. The
other half of the incoming ﬁrst year students live in private accommodation, typically with
family in the Cape Town region.
Incoming students were tracked into dormitories up until the 2005 academic year. Track-
ing was based on a set of national, content-based high school graduation tests taken by all
South African grade 12 students.11 Students with high scores on this examination were as-
signed to diﬀerent dormitories than students with low scores. The resultant assignments
do not partition the distribution of test scores for three reasons. First, assignment incor-
porated loose racial quotas, so the threshold score for assignment to the top dormitory was
higher for white than black students. Second, most dormitories were single-sex, creating
pairs of female and male dormitories at each track. Third, late applicants for admission were
waitlisted and assigned to the ﬁrst available dormitory slot created by an admitted student
withdrawing. A small number of high-scoring students thus appear in low-track dormitories
and vice versa. These factors generate substantial overlap across dormitories’ test scores.12
However, the mean peer test score for a student in the top quartile of the high school test
score distribution was still 0.93 standard deviations higher than for a student in the bottom
quartile.
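As a stylized illustration, the score-based assignment rule can be sketched on simulated scores. The racial quotas, single-sex dormitory pairs, and waitlist assignments described above are deliberately omitted, and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stylized tracking rule: sort students by test score and fill
# dormitories in order of score.
n_students, n_dorms = 1200, 8
scores = rng.normal(0, 1, n_students)
order = np.argsort(-scores)                       # highest scores first
dorm_of = np.empty(n_students, dtype=int)
dorm_of[order] = np.arange(n_students) // (n_students // n_dorms)

# Tracking creates large gaps in mean peer scores across dormitories;
# random assignment would make these gaps near zero.
means = [scores[dorm_of == d].mean() for d in range(n_dorms)]
print(f"top dorm mean: {means[0]:.2f}, bottom dorm mean: {means[-1]:.2f}")
```

The pure sorting rule partitions the score distribution exactly; the overlap documented in the text comes from the quotas, sex-specific dormitories, and waitlists that this sketch leaves out.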
From 2006 onward, incoming students were randomly assigned to dormitories. The policy
change reﬂected concern by university administrators that tracking was inegalitarian and
11. These tests are developed and moderated by a statutory body reporting to the Minister of Education.
The tests are nominally criterion-referenced. Students select six subjects in grade 10 in which they will
be tested in grade 12. The university converts their subject-speciﬁc letter grades into a single score for
admissions decisions. A time-invariant conversion scale is used to convert international students’ A-level or
International Baccalaureate scores into a comparable metric.
12. The overlap is such that it is not feasible to use a regression discontinuity design to study the effect of
assignment to higher- or lower-track dormitories. The ﬁrst stage of such a design does not pass standard
instrument strength tests.
contributed to social segregation by income.13 Assignment used a random number generator
with ex post changes to ensure racial balance.14 One small dormitory (≈ 1.5% of the sample)
was excluded from the randomization. This dormitory charged lower fees but did not provide
meals. Students could request to live in this dormitory, resulting in a disproportionate
number of low-scoring students under both tracking and randomization. Results are robust
to excluding this dormitory.
The policy change induced a large change in students’ peer groups. Figure 1 shows how
the relationship between students’ own high school graduation test scores and their peers’
test scores changed. For example, students in the top decile lived with peers who scored
approximately 0.4 standard deviations higher under tracking than random assignment; stu-
dents in the bottom decile lived with peers who scored approximately 0.4 standard deviations
lower. This is the identifying variation I use to study the eﬀect of tracking.
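The stratified percentile bootstrap described in the notes to figure 1 can be illustrated with a simplified sketch. The data are simulated, and the statistic here is a simple mean difference rather than the local linear regression difference evaluated at each percentile in the figure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated per-student statistics for the two policies (illustrative).
tracked_vals = rng.normal(0.4, 1.0, 300)
random_vals  = rng.normal(0.0, 1.0, 300)

def percentile_bootstrap_ci(a, b, reps=1000, alpha=0.05):
    # Resample each policy group separately (stratifying by assignment
    # policy), recompute the difference, and take percentiles of the
    # bootstrap distribution.
    stats = np.empty(reps)
    for r in range(reps):
        stats[r] = rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

lo, hi = percentile_bootstrap_ci(tracked_vals, random_vals)
print(f"95% CI for the difference: [{lo:.2f}, {hi:.2f}]")
```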
My research design compares students’ first year GPAs between the tracking period
(2004 and 2005) and the random assignment period (2007 and 2008). I deﬁne tracking as
the “treatment” even though it is the earlier policy.15 I omit 2006 because ﬁrst year students
were randomly assigned to dormitories while second year students continued to live in the
dormitories into which they had been tracked. GPA diﬀerences between the two periods
may reﬂect cohort eﬀects as well as peer eﬀects. In particular, benchmarking tests show a
downward trend in the academic performance of incoming ﬁrst year students at South African
universities over this time period (Higher Education South Africa, 2009). I therefore use a
diﬀerence-in-diﬀerences design that compares the time change in dormitory students’ GPAs
with the time change in non-dormitory students’ GPAs over the same period:
GPA_id = β0 + β1 Dorm_id + β2 Track_id + β3 (Dorm_id × Track_id) + f(X_id) + µ_d + ε_id    (1)
where i and d index students and dormitories, Dorm and Track are indicator variables
13. This discussion draws on personal interviews with the university’s Director of Admissions and Director
of Student Housing.
14. There is no official record of how often changes were made. In a 2009 interview, the staff member
responsible for assignment recalled making only occasional changes.
15. Defining random assignment as the treatment necessarily yields point estimates with identical magnitude
and opposite sign.
Figure 1: Effect of Tracking on Peer Group Composition
[Figure omitted: the curve plots the mean change in dormmates’ high school graduation test scores (y-axis, roughly -0.8 to 0.6 standard deviations) against percentiles of students’ own high school graduation test scores (x-axis).]
Notes: The curve is constructed in three steps. First, I estimate a student-level local linear regression of mean dormitory high
school test scores on students’ own test scores, separately for tracked and randomly assigned dormitory students. Second, I
evaluate the difference at each percentile of the test score distribution. Third, I use a percentile bootstrap with 1000 replications
to construct the 95% confidence interval, stratifying by assignment policy.
equal to 1, respectively, for students living in dormitories and for students enrolled in the tracking period,
f (Xid ) is a function of students’ demographic characteristics and high school graduation test
scores,16 and µd is a vector of dormitory ﬁxed eﬀects. β3 equals the average treatment eﬀect
of tracking on the tracked students under an “equal trends” assumption: that dormitory
and non-dormitory students would have experienced the same mean time change in GPAs if
the assignment policy had remained constant. The diﬀerence-in-diﬀerences model identiﬁes
only a “treatment on the treated” eﬀect; caution should be exercised in extrapolating this
16. I use a quadratic specification. The results are similar with linear or cubic f(·).
to non-dormitory students. Model 1 requires only that the equal trends assumption holds
conditional on student covariates and dormitory ﬁxed eﬀects. I also estimate model 1 with
inverse probability weights that reweight each group of students to have the same distribution
of covariates as the tracked dormitory students.17
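A minimal sketch of estimating model 1 by least squares on simulated data follows. The variable names and the data-generating process are illustrative; the -0.13 interaction effect is built into the simulation to echo the paper's headline estimate, and dormitory fixed effects are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8000

# Simulated student-level data (not the paper's records).
dorm  = rng.integers(0, 2, n).astype(float)   # 1 = dormitory student
track = rng.integers(0, 2, n).astype(float)   # 1 = tracking-period cohort
hs    = rng.normal(0, 1, n)                   # standardized HS graduation score
gpa   = (0.3 * hs + 0.05 * dorm - 0.02 * track
         - 0.13 * dorm * track + rng.normal(0, 1, n))

# Equation (1): GPA on Dorm, Track, their interaction, and a quadratic
# in the baseline test score (the f(X) term).
X = np.column_stack([np.ones(n), dorm, track, dorm * track, hs, hs**2])
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)

# beta[3] estimates the average effect of tracking on tracked dormitory
# students under the equal-trends assumption.
print("estimated tracking effect:", round(beta[3], 3))
```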
β3 does not equal the average treatment eﬀect of tracking on the tracked students if
dormitory and non-dormitory students have diﬀerent counterfactual GPA time trends. If
the assignment policy change aﬀects students through channels other than peer eﬀects, β3
recovers the correct treatment eﬀect but its interpretation changes. I discuss these concerns
in detail in section 8.
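The inverse probability weighting estimator mentioned above reweights comparison students so that their covariate distribution matches that of the tracked dormitory students. A self-contained sketch on simulated data, with a logit propensity score fit by Newton-Raphson to avoid external dependencies (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Simulated covariate and treatment (tracked-dorm) indicator.
x = rng.normal(0, 1, n)                       # e.g. HS graduation score
treated = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))

# Fit a logit propensity score by Newton-Raphson; 25 iterations is
# ample for this well-behaved problem.
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    W = p * (1 - p)
    b += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (treated - p))
p_hat = 1 / (1 + np.exp(-X @ b))

# Treatment-on-the-treated weights: treated students get weight 1,
# comparison students get p/(1-p), so the reweighted comparison group
# matches the treated covariate distribution.
w = np.where(treated == 1, 1.0, p_hat / (1 - p_hat))
m_treated = x[treated == 1].mean()
m_rw = np.average(x[treated == 0], weights=w[treated == 0])
print(f"treated mean: {m_treated:.2f}, reweighted comparison mean: {m_rw:.2f}")
```

The check at the end is the essential diagnostic: after reweighting, the comparison-group covariate mean should be close to the treated-group mean.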
The data on students’ demographic characteristics and high school test scores (reported
in table 1) are broadly consistent with the assumption of equal time trends. Dormitory stu-
dents have on average slightly higher and more dispersed scores than non-dormitory students
on high school graduation tests (panel A).18 They are more likely to be black, less likely to
speak English as a home language, and more likely to be international students (panel B).
However, the time changes between the tracking and random assignment periods are small
and not signiﬁcantly diﬀerent between dormitory and non-dormitory students. The notable
exception is that the proportion of English-speaking students moves in diﬀerent directions.
The proportion of students who graduated from high school early enough to enroll in univer-
sity during the tracking period (2004 or earlier) but did not enroll until random assignment
was introduced (2006 or later) is very small and not signiﬁcantly diﬀerent between dormitory
and non-dormitory students (panel C). I interpret this as evidence that students did not
strategically delay their entrance to university in order to avoid the tracking policy. Finally,
17. Unlike the regression-adjusted model 1, reweighting estimators permit the treatment effect of tracking
to vary across student covariates. This is potentially important in this study, where tracking is likely to have
heterogeneous eﬀects. However, the regression-adjusted and reweighted results in section 3 are very similar.
DiNardo, Fortin, and Lemieux (1996) and Hirano, Imbens, and Ridder (2003) discuss reweighting estimators
with binary treatments. Reweighted diﬀerence-in-diﬀerences models are discussed in Abadie (2005) and Cat-
taneo (2010), who also derive appropriate weights for treatment-on-the-treated parameters. The reweighted
and regression-adjusted model is robust to misspeciﬁcation of either the regression or the propensity score
model.
18. I construct students’ high school graduation test scores from subject-specific letter grades, following the
university’s admissions algorithm. I observe grades for all six tested subjects for 85% of the sample, for ﬁve
subjects for 6% of the sample, and for four or fewer subjects for 9% of the sample. I treat the third group
of students as having missing scores. I assign the second group of students the average of their ﬁve observed
grades but omit them from analyses that sub-divide students by their grades.
Table 1: Summary Statistics and Balance Tests
(1) (2) (3) (4) (5) (6)
Entire Track Random Track Random Balance
sample dorm dorm non-dorm non-dorm test p
Panel A: High school graduation test scores
Mean score (standardized) 0.088 0.169 0.198 0.000 0.000 0.426
A on graduation test 0.278 0.320 0.325 0.222 0.253 0.108
≤C on graduation test 0.233 0.224 0.201 0.254 0.250 0.198
Panel B: Demographic characteristics
Female 0.513 0.499 0.517 0.523 0.514 0.103
Black 0.319 0.503 0.524 0.116 0.118 0.181
White 0.423 0.354 0.332 0.520 0.495 0.851
Other race 0.257 0.143 0.144 0.364 0.387 0.124
English-speaking 0.714 0.593 0.560 0.851 0.863 0.001
International 0.144 0.225 0.180 0.106 0.061 0.913
Panel C: Graduated high school in 2004 or earlier, necessary to enroll under tracking
Eligible for tracking 0.516 1.000 0.027 1.000 0.033 0.124
Eligible | A student 0.475 1.000 0.002 1.000 0.010 0.037
Eligible | ≤C student 0.527 1.000 0.039 1.000 0.050 0.330
Panel D: High school located in Cape Town, proxy for dormitory eligibility
Cape Town high school 0.411 0.088 0.083 0.765 0.754 0.657
Cape Town | A student 0.414 0.101 0.065 0.848 0.811 0.976
Cape Town | ≤C student 0.523 0.146 0.186 0.798 0.800 0.224
Notes: Table 1 reports summary statistics of student characteristics at the time of enrollment, for the entire sample (column
1), tracked dormitory students (column 2), randomly assigned dormitory students (column 3), tracked non-dormitory students
(column 4), and randomly assigned non-dormitory students (column 5). The p-values reported in column 6 are from testing
whether the mean change in each variable between the tracking and random assignment periods is equal for dormitory and
non-dormitory students.
there is a high and time-invariant correlation between living in a dormitory and graduat-
ing from a high school outside Cape Town. This relationship reﬂects the university’s policy
of restricting the number of students who live in Cape Town who may be admitted to the
dormitory system.19 The fact that this relationship does not change through time provides
some reassurance that students are not strategically choosing whether or not to live in dor-
mitories in response to the dormitory assignment policy change. This pattern may in part
reﬂect prospective students’ limited information about the dormitory assignment policy: the
Footnote 19: I do not observe students' home addresses, which are used for the university's dormitory admissions.
Instead, I match records on students' high schools to a public database of high school GIS codes. I then
determine whether students attended high schools in or outside the Cape Town metropolitan area. This is
an imperfect proxy of their home address for three reasons: long commutes and boarding schools are fairly
common, the university allows students from very low-income neighborhoods on the outskirts of Cape Town
to live in dormitories, and a small number of Cape Town students with medical conditions or exceptional
academic records are permitted to live in the dormitories.
change was not announced in the university’s admissions materials or in internal, local, or
national media. On balance, these descriptive statistics support the identifying assumption
that dormitory and non-dormitory students’ mean GPAs would have experienced similar time
changes if the assignment policy had remained constant.20
The primary outcome variable is ﬁrst-year students’ GPAs. The university did not at
this time report students’ GPAs or any other measure of average grades. I instead observe
students’ complete transcripts, which report percentage scores from 0 to 100 for each course.
I construct a credit-weighted average score and then transform this to have mean zero and
standard deviation one in the control group of non-dormitory students, separately by year.
The eﬀects of tracking discussed below should therefore be interpreted in standard devia-
tions of GPA. The numerical scores are intended to be time-invariant measures of student
performance and are not typically “curved.”21 The nominal ceiling score of 100 does not
bind: the highest score any student obtains averaged across her courses is 97 and the 99th
percentile of student scores is 84. These features provide some reassurance that my results
are not driven by time-varying grading standards or by ceiling eﬀects on the grades of top
students. I return to these potential concerns in section 8.
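The outcome construction described above can be sketched as follows. The toy numbers are illustrative, and I assume (as the text states) that standardization uses the non-dormitory control group within each year.

```python
import numpy as np

def standardized_gpa(course_scores, course_credits, year, is_control):
    """Credit-weighted average of 0-100 course scores per student, then
    standardized to mean 0 / sd 1 within the non-dormitory control group,
    separately by year."""
    raw = np.array([np.average(s, weights=c)
                    for s, c in zip(course_scores, course_credits)])
    year, is_control = np.asarray(year), np.asarray(is_control)
    out = np.empty_like(raw)
    for y in np.unique(year):
        ctrl = raw[(year == y) & is_control]
        out[year == y] = (raw[year == y] - ctrl.mean()) / ctrl.std()
    return out

# Toy example: two control and two dormitory students in one year.
scores = [[80, 60], [40, 60], [90, 70], [55, 45]]
credits = [[1, 1], [1, 1], [1, 1], [1, 1]]
gpa = standardized_gpa(scores, credits,
                       year=[2006] * 4, is_control=[True, True, False, False])
```

By construction the control group has mean zero and standard deviation one each year, so treatment effects read off in standard deviations of GPA.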
3 Effects of Tracking on Mean Outcomes
Tracked dormitory students obtain GPAs 0.13 standard deviations lower than randomly
assigned dormitory students (table 2 column 1). The 95% conﬁdence interval is [-0.27, 0.01].
Controlling for dormitory ﬁxed eﬀects, student demographics, and high school graduation test
scores yields a slightly smaller treatment eﬀect of -0.11 standard deviations with a narrower
95% conﬁdence interval of [-0.17, -0.04] (column 2).22 The average eﬀect of tracking is thus
Footnote 20: I also test the joint null hypothesis that the mean time changes in all the covariates are equal for dormitory
and non-dormitory students. The bootstrap p-value is 0.911.
Footnote 21: For example, mean percentage scores on Economics 1 and Mathematics 1 change by respectively six and
nine points from year to year, roughly half of a standard deviation.
Footnote 22: The bootstrapped standard errors reported in table 2 allow clustering at the dormitory-year level. Non-dormitory
students are treated as individual clusters, yielding 60 large clusters and approximately 7000
singleton clusters. As a robustness check, I also use a wild cluster bootstrap (Cameron, Gelbach, and Miller,
2008). The p-values are 0.090 for the basic regression model (column 1) and < 0.001 for the model with
dormitory fixed effects and student covariates (column 3). I also account for the possibility of persistent
dormitory-level shocks with a wild bootstrap clustered at the dormitory level. The p-values are 0.104 and
0.002 for the models in columns 1 and 3.
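The wild cluster bootstrap used in this footnote can be sketched as follows. This is a minimal illustration on synthetic data with Rademacher weights and the unstudentized coefficient as the test statistic; the paper's implementation details are not reported, so treat these specifics as assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic clustered data: 60 clusters (as many as dormitory-year cells),
# with a cluster-level error component and a true coefficient of zero.
G, m = 60, 40
cid = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m)
y = rng.normal(size=G)[cid] + rng.normal(size=G * m)  # H0 (beta = 0) is true

def slope(y, x):
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_obs = slope(y, x)

# Impose the null (regress y on a constant only), then flip the restricted
# residuals with a single Rademacher draw per cluster on each replication.
r = y - y.mean()
draws = []
for _ in range(499):
    flips = rng.choice([-1.0, 1.0], size=G)[cid]
    draws.append(slope(y.mean() + flips * r, x))
p_value = float(np.mean(np.abs(draws) >= np.abs(b_obs)))
```

Because the sign flips are drawn at the cluster level, the bootstrap preserves arbitrary within-cluster error correlation while approximating the null distribution of the coefficient.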
Table 2: Average Treatment Eﬀect of Tracking on Tracked Students
(1) (2) (3) (4) (5)
Tracking × Dormitory -0.129 -0.107 -0.130 -0.144 -0.141
(0.073) (0.040) (0.042) (0.073) (0.069)
Tracking 0.000 0.002 -0.013 0.042 -0.009
(0.023) (0.021) (0.020) (0.057) (0.049)
Dormitory 0.172 0.138 0.173 0.221 0.245
(0.035) (0.071) (0.072) (0.061) (0.064)
Dormitory ﬁxed eﬀects × × × ×
Student covariates × × × ×
Missing data indicators × ×
Reweighting × ×
Adjusted R2 0.006 0.255 0.230 0.260 0.275
# dormitory-year clusters 60 60 60 60 60
# dormitory students 7480 6600 7480 6600 7480
# non-dormitory students 7188 6685 7188 6685 7188
Notes: Table 2 reports results from regressing GPA on indicators for living in a dormitory, the tracking period and their inter-
action. Columns 2-5 report results controlling for dormitory ﬁxed eﬀects and student covariates: gender, language, nationality,
race, a quadratic in high school graduation test scores, and all pairwise interactions. Columns 2 and 4 report results excluding
students with missing test scores from the sample. Columns 3 and 5 report results including all students, with missing test
scores replaced with zeros and controlling for a missing test score indicator. Columns 4 and 5 report results from propensity
score-weighted regressions that reweight all groups to have the same distribution of observed student covariates as tracked
dormitory students. Standard errors in parentheses are from 1000 bootstrap replications clustering at the dormitory-year level,
stratifying by dormitory status and assignment policy, and re-estimating the weights on each iteration.
negative and robust to accounting for dormitory ﬁxed eﬀects and student covariates.23 This
pattern holds for all results reported in the paper: accounting for student and dormitory
characteristics yields narrower conﬁdence intervals and unchanged treatment eﬀect estimates.
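The basic difference-in-differences regression and its cluster bootstrap can be sketched as follows, on synthetic data loosely calibrated to table 2 column 1. The cluster counts, effect sizes, and data-generating process are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: the Tracking x Dormitory interaction is the DiD estimate.
n = 1000
dorm = rng.binomial(1, 0.5, n)
track = rng.binomial(1, 0.5, n)
gpa = 0.17 * dorm - 0.13 * dorm * track + rng.normal(size=n)

# Dormitory students cluster at the dormitory-year level; non-dormitory
# students are treated as singleton clusters (as in footnote 22).
cluster = np.where(dorm == 1, rng.integers(0, 30, n), 1000 + np.arange(n))

def did_coef(y, d, t):
    X = np.column_stack([np.ones_like(y), d, t, d * t])
    return np.linalg.lstsq(X, y, rcond=None)[0][3]  # interaction coefficient

beta = did_coef(gpa, dorm, track)

# Nonparametric cluster bootstrap: resample whole clusters with replacement.
ids = np.unique(cluster)
members = {c: np.flatnonzero(cluster == c) for c in ids}
draws = []
for _ in range(200):
    idx = np.concatenate([members[c]
                          for c in rng.choice(ids, size=len(ids), replace=True)])
    draws.append(did_coef(gpa[idx], dorm[idx], track[idx]))
se = float(np.std(draws))
```

Resampling entire clusters keeps each dormitory-year's correlated outcomes together, so the bootstrap standard error reflects within-dormitory dependence.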
How large is a treatment eﬀect of 0.11 to 0.13 standard deviations? This is substantially
smaller than the black-white GPA gap at this university (0.46 standard deviations) but larger
than the female-male GPA gap (0.09). The effect size is marginally larger than the effect of
strategic squadron assignment at the US Air Force Academy (Carrell, Sacerdote, and West,
2013) and marginally smaller than the effect of tracking Kenyan primary school students
into classrooms (Duflo, Dupas, and Kremer, 2011). These results provide a consistent picture
about the plausible average short-run eﬀects of alternative group assignment policies. These
eﬀects are not “game-changers” but they are substantial relative to many other education
Footnote 23: The regression-adjusted results in column 2 exclude approximately 9% of students with missing high
school graduation test scores. I also estimate the treatment effect for the entire sample with missing data
indicators and find a very similar result (column 3). Effects estimated with both regression adjustment and
inverse probability weighting are marginally larger (columns 4 and 5). Trimming propensity score outliers
following Crump, Hotz, Imbens, and Mitnik (2009) yields similar but less precise point estimates. This
verifies that the results are not driven by lack of common support on the four groups' observed characteristics.
However, the trimming rule is optimal for the average treatment effect with a two-group research design;
this robustness check is not conclusive for the average treatment effect on the treated with a difference-in-differences
design.
interventions.
Tracking changes peer groups in diﬀerent ways: high-scoring students live with higher-
scoring peers and low-scoring students live with lower-scoring peers. The eﬀects of tracking
are thus likely to vary systematically with students’ high school test scores. I explore this
heterogeneity in two ways. I ﬁrst estimate conditional average treatment eﬀects for diﬀerent
subgroups of students. In section 4, I estimate quantile treatment eﬀects of tracking, which
show how tracking changes the full distribution of GPAs.
I begin by estimating equation 1 fully interacted with an indicator for students who score
above the sample median on their high school graduation test. Above- and below-median
students’ GPAs fall respectively 0.01 and 0.24 standard deviations under tracking (cluster
bootstrap standard errors 0.06 and 0.07; p-value of diﬀerence 0.014). These very diﬀerent
eﬀects arise even though above- and below-median students experience “treatments” of sim-
ilar magnitude. Above- and below-median scoring students have residential peers who score
on average 0.20 standard deviations higher and 0.27 standard deviations lower under track-
ing. This is not consistent with a linear response to changes in mean peer quality.24 Either
low-scoring students are more sensitive to changes in their mean peer group composition or
GPA depends on some measure of peer quality other than mean test scores.
The near-zero treatment eﬀect on above-median students is perhaps surprising. Splitting
the sample in two may be too coarse to discern positive eﬀects on very high-scoring students.
I therefore estimate treatment eﬀects throughout the distribution of high school test scores.
Figure 2 shows that tracking reduces GPA through more than half of the distribution. The
negative eﬀects in the left tail are considerably larger than the positive eﬀects in the right
tail, though they are not statistically diﬀerent. I reject equality of the treatment eﬀects and
changes in mean peer high school test scores in the right but not the left tail. These results
reinforce the ﬁnding that low-scoring students are substantially more sensitive to changes
in peer group composition than high-scoring students. Tracking may have a small positive
eﬀect on students in the top quartile but this eﬀect is imprecisely estimated.25
Footnote 24: I test whether the ratio of the treatment effect to the change in mean peer test scores is equal for above-
and below-median students. The cluster bootstrap p-value is 0.070.
Footnote 25: A linear difference-in-differences model interacted with quartile or quintile indicators has positive but
insignificant point estimates in the top quartile or quintile.
Figure 2: Effects of Tracking on GPA by High School Test Scores
[Figure: the average treatment effect of tracking (solid line, with dotted 95% confidence bands) and the mean change in peers' HS test scores (dashed line), both in standard deviation units of university GPA / dormmates' mean test scores on the vertical axis, plotted against percentiles of high school graduation test scores on the horizontal axis.]
Notes: Figure 2 is constructed by estimating a student-level local linear regression of GPA against high school graduation test
scores. I estimate the regression separately for each of the four groups (tracking/randomization policy and dormitory/non-dormitory
status). I evaluate the second difference at each percentile of the high school test score distribution. The dotted lines
show a 95% confidence interval constructed from a nonparametric percentile bootstrap clustering at the dormitory-year level
and stratifying by assignment policy and dormitory status. The dashed line shows the effect of tracking on mean peer group
composition, discussed in figure 1.
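The figure's estimator can be sketched as follows: fit a local linear regression of GPA on HS-score percentile separately in each of the four policy-by-dormitory cells, then take the second difference at each evaluation point. All data here are synthetic, with a built-in negative effect in the left tail; the bandwidth and Gaussian kernel are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def local_linear(x, y, grid, h=0.1):
    """Gaussian-kernel local linear regression of y on x at each grid point."""
    out = np.empty(len(grid))
    for j, g in enumerate(grid):
        w = np.exp(-0.5 * ((x - g) / h) ** 2)
        X = np.column_stack([np.ones_like(x), x - g])
        WX = w[:, None] * X
        out[j] = np.linalg.solve(X.T @ WX, WX.T @ y)[0]  # intercept = fit at g
    return out

def cell(effect):
    x = rng.uniform(0.0, 1.0, 2000)      # HS score percentile, rescaled to [0, 1]
    return x, x + effect(x) + 0.5 * rng.normal(size=2000)

te = lambda x: -0.4 * (x < 0.5)          # tracking hurts the left tail
flat = lambda x: 0.0 * x
data = {
    ("track", "dorm"): cell(lambda x: 0.2 + te(x)),  # dorm premium + effect
    ("track", "non"): cell(flat),
    ("rand", "dorm"): cell(lambda x: 0.2 + flat(x)),
    ("rand", "non"): cell(flat),
}
grid = np.linspace(0.05, 0.95, 19)
f = {k: local_linear(x, y, grid) for k, (x, y) in data.items()}
second_diff = (f[("track", "dorm")] - f[("track", "non")]) \
    - (f[("rand", "dorm")] - f[("rand", "non")])
```

The second-difference curve recovers the built-in effect: strongly negative at low percentiles and near zero at high percentiles, the shape figure 2 reports for the actual data.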
There is stronger evidence of heterogeneity across high school test scores than demo-
graphic subgroups. Treatment eﬀects are larger on black than white students: -0.20 versus
-0.11 standard deviations. However, this diﬀerence is not signiﬁcant (cluster bootstrap p-
value 0.488) and is almost zero after conditioning on high school test scores. I also estimate a
quadruple-diﬀerences model allowing the eﬀect of tracking to diﬀer across four race/academic
subgroups (black/white × above/below median). The point estimates show that tracking af-
fects below-median students more than above-median students within each race group and
aﬀects black students more than white within each test score group. However, neither pattern
is signiﬁcant at any conventional level. I thus lack the power to detect any heterogeneity by
race conditional on test scores. There is no evidence of gender heterogeneity: tracking lowers
female and male GPAs by 0.14 and 0.12 standard deviations respectively (cluster bootstrap
p-value 0.897). I conclude that high school test scores are the primary dimension of treatment
eﬀect heterogeneity.
4 Effects of Tracking on the Distribution of Outcomes
I also estimate quantile treatment eﬀects of tracking on the treated students, which show
how tracking changes the full GPA distribution. I ﬁrst construct the counterfactual GPA
distribution that the tracked dormitory students would have obtained in the absence of track-
ing (ﬁgure 3, ﬁrst panel). The horizontal distance between the observed and counterfactual
GPA distributions at each quantile equals the quantile treatment eﬀect of tracking on the
treated students (ﬁgure 3, second panel). This provides substantially more information than
the average treatment eﬀect but requires stronger identifying assumptions. Speciﬁcally, the
average eﬀect is identiﬁed under the assumption that any time changes in the mean value of
unobserved GPA determinants are common across dormitory and non-dormitory students.
The quantile eﬀects are identiﬁed under the assumption that there are no time changes in
the distribution of unobserved student-level GPA determinants for either dormitory or non-
dormitory students. GPA may experience time trends or cohort-level shocks provided these
are common across all students. I discuss the implementation of this model, developed by
Athey and Imbens (2006), in appendix A. I propose an extension to account ﬂexibly for time
trends in observed student characteristics.
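The core of the Athey-Imbens counterfactual construction (before the reweighting extension) can be sketched as follows: each treated-group outcome keeps its rank in the comparison group's distribution and is read off the comparison distribution one period later, y -> F_c1^{-1}(F_c0(y)). Group labels, sample sizes, and distributions are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic outcomes for three of the four cells of the nonlinear
# difference-in-differences design (Athey and Imbens, 2006).
y_c0 = rng.normal(0.0, 1.0, 3000)   # comparison group, period 0
y_c1 = rng.normal(0.2, 1.1, 3000)   # comparison group, period 1
y_t0 = rng.normal(-0.3, 0.9, 3000)  # treated group, period 0

# Counterfactual outcomes for the treated group in period 1: estimate
# F_c0 at each treated outcome, then invert F_c1 at the same rank.
ranks = np.searchsorted(np.sort(y_c0), y_t0) / len(y_c0)   # F_c0(y_t0)
y_cf = np.quantile(y_c1, np.clip(ranks, 0.0, 1.0))         # F_c1^{-1}(ranks)

# Counterfactual quantiles at each half-percentile, as in the paper's figures.
q = np.arange(0.005, 1.0, 0.005)
cf_quantiles = np.quantile(y_cf, q)
```

Quantile treatment effects are then the observed treated-group quantiles minus `cf_quantiles` at each half-percentile; with these normal inputs the counterfactual median lands near 0.2 + 1.1 x (-0.3) = -0.13.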
Figure 3 shows that tracking aﬀects mainly the left tail. The point estimates are large and
negative in the first quintile (0.1 to 1.1 standard deviations), small and negative in the second
to fourth quintiles (≤ 0.2 standard deviations), and small and positive in the top quintile
(≤ 0.2 standard deviations). The estimates are relatively imprecise; the 95% conﬁdence
interval excludes zero only in the ﬁrst quintile.26 This reinforces the pattern that the negative
average eﬀect of tracking is driven by large negative eﬀects on the left tail of the GPA or
high school test score distribution.
There is no necessary relationship between ﬁgures 2 and 3. Figure 2 shows that the
average treatment eﬀect of tracking is large and negative for students with low high school
graduation test scores. Figure 3 shows that the quantile treatment eﬀect of tracking is large
and negative on the left tail of the GPA distribution. The quantile results capture treatment
eﬀect heterogeneity between and within groups of students with similar high school test
scores. However, they do not recover treatment eﬀects on speciﬁc students or groups of
students without additional assumptions. See Bitler, Gelbach, and Hoynes (2010) for further
discussion on this relationship.27
5 Effects of Tracking on Inequality of Outcomes
The counterfactual GPA distribution estimated above also provides information about the
relationship between tracking and academic inequality. Speciﬁcally, I calculate several stan-
dard inequality measures on the observed and counterfactual distributions. The diﬀerences
between these measures are the inequality treatment eﬀects of tracking on the tracked stu-
dents.28 The literature on academic tracking emphasizes inequality concerns (Betts, 2011).
Footnote 26: I construct the 95% confidence interval at each half-percentile using a percentile cluster bootstrap. The
validity of the bootstrap has not been formally established for the nonlinear difference-in-differences model.
However, Athey and Imbens (2006) report that bootstrap confidence intervals have better coverage rates in a
simulation study than confidence intervals based on plug-in estimators of the asymptotic covariance matrix.
Footnote 27: Garlick (2012) presents an alternative approach to rank-based distributional analysis. Using this approach,
I estimate the effect of tracking on the probability that students change their rank in the distribution
of academic outcomes from high school to the first year of university. I find no effect on several measures
of rank changes. Informally, this shows that random dormitory assignment, relative to tracking, helps low-scoring
students to "catch up" to their high-scoring peers but does not facilitate "overtaking."
Footnote 28: I apply the same principle to calculate mean GPA for the counterfactual distribution. The observed
mean is 0.16 standard deviations lower than the counterfactual mean (cluster bootstrap standard error 0.07).
This is consistent with the average effect from the linear difference-in-differences models in section 3.
Figure 3: Quantile Treatment Effects of Tracking on the Tracked Students
[Figure, first panel: observed and counterfactual GPA distributions for tracked dormitory students, with university GPA on the horizontal axis and percentiles (0-100) on the vertical axis. Second panel: quantile treatment effects of tracking, with dotted 95% confidence bands, plotted against percentiles of university GPA.]
Notes: The first panel shows the observed GPA distribution for tracked dormitory students (solid line) and the counterfactual
constructed using the reweighted nonlinear difference-in-differences model discussed in appendix A (dashed line). The propensity
score weights are constructed from a model including student gender, language, nationality, race, a quadratic in high school
graduation test scores, all pairwise interactions, and dormitory fixed effects. The second panel shows the horizontal distance
between the observed and counterfactual GPA distributions evaluated at each half-percentile. The axes are reversed for ease
of interpretation. The dotted lines show a 95% confidence interval constructed from a percentile bootstrap clustering at the
dormitory-year level, stratifying by assignment policy and dormitory status, and re-estimating the weights on each iteration.
Table 3: Inequality Treatment Effects of Tracking
(1) Observed distribution | (2) Counterfactual distribution | (3) Treatment effect | (4) Treatment effect in % terms
Interquartile range 1.023 0.907 0.116 12.79
(0.043) (0.047) (0.062) (6.8)
Interdecile range 2.238 1.857 0.381 20.52
(0.083) (0.091) (0.109) (5.9)
Standard deviation 0.909 0.766 0.143 18.67
(0.027) (0.032) (0.037) (4.8)
Notes: Table 3 reports summary measures of academic inequality for the observed distribution of tracked dormitory students’
GPA (column 1) and the counterfactual GPA distribution for the same students in the absence of tracking (column 2). The
counterfactual GPA is constructed using the reweighted nonlinear diﬀerence-in-diﬀerences model described in appendix A.
Column 3 shows the treatment effect of tracking on the tracked students. Column 4 shows the treatment effect expressed as a
percentage of the counterfactual level. The standard deviation is estimated by (\hat{E}[GPA^2] - \{\hat{E}[GPA]\}^2)^{0.5}, with the expectations
constructed by integrating the area to the left of the relevant GPA distribution. The distribution is evaluated at half-percentiles
to minimize measurement error due to the discreteness of the counterfactual distribution. Standard errors in parentheses are
from 1000 bootstrap replications clustering at the dormitory-year level and stratifying by assignment policy and dormitory
status.
This is the ﬁrst study of which I am aware to measure explicitly the eﬀect of tracking on
inequality. Existing results from the econometric theory literature can be applied directly
to this problem (Firpo, 2007, 2010; Rothe, 2010). Identiﬁcation of these inequality eﬀects
requires no additional assumptions beyond those already imposed in the quantile analysis.
Table 3 shows inequality measures for the observed and counterfactual GPA distribu-
tions. The standard deviation and interquartile and interdecile ranges are all signiﬁcantly
higher under tracking than under the counterfactual.29 Tracking increases the interquartile
range by approximately 12% of its baseline level and the other measures by approximately
20%. This reﬂects the particularly large negative eﬀect of tracking on the lowest quantiles
of the GPA distribution. Tracking thus decreases mean academic outcomes and increases
academic inequality. Knowledge of the quantile and inequality treatment eﬀects permits a
more comprehensive evaluation of the welfare consequences of tracking. These parameters
might inform an inequality-averse social planner’s optimal trade-oﬀ between eﬃciency and
equity if the mean eﬀect of tracking were positive, as found in some other contexts.
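The inequality treatment effects in table 3 reduce to differences in summary statistics between two distributions. A sketch on synthetic normal distributions follows; the moments are chosen to loosely resemble table 3 and are not the paper's estimates, which integrate the estimated CDFs rather than drawing samples.

```python
import numpy as np

def inequality_measures(y):
    """The three measures reported in table 3."""
    q10, q25, q75, q90 = np.quantile(y, [0.10, 0.25, 0.75, 0.90])
    return {"interquartile range": q75 - q25,
            "interdecile range": q90 - q10,
            "standard deviation": float(np.std(y))}

rng = np.random.default_rng(4)
observed = rng.normal(0.00, 0.91, 5000)        # tracked GPA distribution
counterfactual = rng.normal(0.16, 0.77, 5000)  # higher mean, lower spread

obs = inequality_measures(observed)
cf = inequality_measures(counterfactual)
effects = {k: obs[k] - cf[k] for k in obs}           # level effects (column 3)
pct = {k: 100.0 * effects[k] / cf[k] for k in obs}   # in % terms (column 4)
```

With a wider observed distribution, every measure's treatment effect is positive, matching the paper's finding that tracking raises academic inequality.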
Footnote 29: I do not calculate other common inequality measures such as the Gini coefficient and Theil index because
standardized GPA is not a strictly positive variable.
6 Effects of Random Variation in Dormitory Composition
The principal research design uses cross-policy variation by comparing tracked and randomly
assigned dormitory students. My second research design uses cross-dormitory variation in
peer group composition induced by random assignment. I ﬁrst use a standard test to conﬁrm
the presence of residential peer eﬀects, providing additional evidence that the main results
are not driven by confounding factors. I document diﬀerences in dormitory-level peer eﬀects
within and between demographic and academic subgroups, providing some information about
mechanisms. In section 7, I explore whether peer eﬀects estimated using random dormitory
assignment can predict the distributional eﬀects of tracking. I ﬁnd that low-scoring students
are more sensitive to changes in peer group composition than high-scoring students, which
is qualitatively consistent with the eﬀect of tracking. Quantitative predictions are, however,
sensitive to model speciﬁcation choices.
I ﬁrst estimate the standard linear-in-means model (Manski, 1993):
GPA_{id} = \alpha_0 + \alpha_1 HS_{id} + \alpha_2 \overline{HS}_d + \alpha' X_{id} + \mu_d + \epsilon_{id}, \quad (2)
where HS_{id} and \overline{HS}_d are individual and mean dormitory high school graduation test scores,
X_{id} is a vector of student demographic characteristics, and \mu is a vector of dormitory fixed
effects. \alpha_2 measures the average gain in GPA from a one standard deviation increase in
the mean high school graduation test scores of one's residential peers.30 Random dormitory
assignment ensures that \overline{HS}_d is uncorrelated with individual students' unobserved characteristics,
so \alpha_2 can be consistently estimated by least squares.31 However, random assignment
also means that average high school graduation test scores are equal in expectation across dormitories. \alpha_2
is identified using sample variation in scores across dormitories due to finite numbers of
Footnote 30: \alpha_2 captures both "endogenous" effects of peers' GPA and "exogenous" effects of peers' high school
graduation test scores, using Manski's terminology. Following the bulk of the peer effects literature, I do not
attempt to separate these effects.
Footnote 31: The observed dormitory assignments are consistent with randomization. I fail to reject equality of
dormitory means for high school graduation test scores (bootstrap p-value 0.762), proportion black (0.857),
proportion white (0.917), proportion other races (0.963), proportion English-speaking (0.895), proportion
international (0.812), and for all covariates jointly (0.886).
Table 4: Peer Eﬀects from Random Assignment to Dormitories
(1) (2) (3) (4) (5) (6)
Own HS graduation 0.362 0.332 0.331 0.400 0.373 0.373
test score (0.014) (0.014) (0.014) (0.024) (0.023) (0.023)
Own HS graduation 0.137 0.144 0.142
test score squared (0.017) (0.017) (0.017)
Mean dorm HS graduation 0.241 0.222 0.220 0.221 0.208 0.316
test score (0.093) (0.098) (0.121) (0.095) (0.103) (0.161)
Mean dorm HS graduation 0.306 0.311 -0.159
test score squared (0.189) (0.207) (0.316)
Own × mean dorm HS -0.129 -0.132 -0.132
graduation test score (0.073) (0.069) (0.069)
p-value of test against 0.000 0.000 0.000
equivalent linear model
Adjusted R2 0.213 0.236 0.248 0.244 0.270 0.278
# students 3068 3068 3068 3068 3068 3068
# dormitory-year clusters 30 30 30 30 30 30
Notes: Table 4 reports results from estimating equations 2 (columns 1-3) and 4 (columns 4-6). Columns 2, 3, 5, and 6 control for
students’ gender, language, nationality and race. Columns 3 and 6 include dormitory ﬁxed eﬀects. The sample is all dormitory
students in the random assignment period with non-missing high school graduation test scores. Standard errors in parentheses
are from 1000 bootstrap replications clustering at the dormitory-year level.
students in each dormitory. This variation is relatively low: the range and variance of dor-
mitory means are approximately 10% of the range and variance of individual scores. Given
this limited variation, the results should be interpreted with caution.
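The identification argument and its limits can be illustrated by simulating equation 2 on synthetic randomly assigned dormitories. The leave-one-out peer mean is an assumption (the text does not state whether the dormitory mean includes the student herself), and the coefficients are set near the paper's estimates purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# 30 dormitory-year cells, as in table 4, with random assignment.
D, m = 30, 100
dorm = np.repeat(np.arange(D), m)
hs = rng.normal(size=D * m)  # standardized HS graduation test scores

# Leave-one-out mean of dormmates' scores (assumption: mean excludes self).
tot = np.bincount(dorm, weights=hs)
peer_mean = (tot[dorm] - hs) / (m - 1)

# GPA generated with alpha1 = 0.36, alpha2 = 0.22 (near the paper's estimates).
gpa = 0.36 * hs + 0.22 * peer_mean + rng.normal(size=D * m)

X = np.column_stack([np.ones_like(hs), hs, peer_mean])
a0, a1, a2 = np.linalg.lstsq(X, gpa, rcond=None)[0]
```

Because dormitory means vary only through finite-sample noise, the peer-effect coefficient a2 is recovered far less precisely than the own-score coefficient a1, mirroring the wide confidence interval the paper reports for \hat{\alpha}_2.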
I report estimates of equation 2 in table 4, using the sample of all dormitory students in the
random assignment period. I find that \hat{\alpha}_2 = 0.22, which is robust to conditioning on student
demographics and dormitory ﬁxed eﬀects. Hence, moving a student from the dormitory with
the lowest observed mean high school graduation test score to the highest would increase
her GPA by 0.18 standard deviations. These eﬀects are large relative to existing estimates
(Sacerdote, 2011). Stinebrickner and Stinebrickner (2006) suggest a possible reason for this
pattern. They document that peers’ study time is an important driver of peer eﬀects and that
peer eﬀects are larger using a measure that attaches more weight to prior study behavior:
high school GPA instead of SAT scores. I measure peer characteristics using scores on
a content-based high school graduation test, while SAT scores are a common measure in
existing research. However, the coeﬃcient from the dormitory ﬁxed eﬀects regression is fairly
imprecisely estimated (90% conﬁdence interval from 0.02 to 0.42) so the magnitude should
be interpreted with caution.32 This may reﬂect the limited variation in HS d .
Footnote 32: As a robustness check, I use a wild cluster bootstrap to approximate the distribution of the test statistic
under the null hypothesis of zero peer effect. This yields p-values of 0.088 using dormitory-year clusters and
0.186 using dormitory clusters.
Table 5: Subgroup Peer Eﬀects from Random Assignment to Dormitories
(1) (2) (3) (4)
Own HS graduation 0.327 0.327 0.369 0.322
test score (0.016) (0.016) (0.017) (0.017)
Mean dorm HS graduation test 0.203 0.162
score for own race (0.059) (0.083)
Mean dorm HS graduation test -0.007 -0.035
score for other races (0.055) (0.091)
Mean dorm HS graduation test 0.050 0.099
score for own faculty (0.045) (0.048)
Mean dorm HS graduation test 0.198 0.190
score for other faculties (0.062) (0.083)
Adjusted R2 0.219 0.243 0.214 0.249
# students 3068 3068 3068 3068
# dormitory-year clusters 30 30 30 30
Notes: Table 5 reports results from estimating equation 3 using race subgroups (columns 1-2) and faculty subgroups (columns 3-
4). “Faculty” refers to colleges/schools within the university such as commerce and science. Columns 2 and 4 include dormitory
ﬁxed eﬀects and control for students’ gender, language, nationality and race. The sample is all dormitory students in the random
assignment period with non-missing high school graduation test scores. Standard errors in parentheses are from 1000 bootstrap
replications clustering at the dormitory-year level.
The linear-in-means model can be augmented to allow the eﬀect of residential peers to
vary within and across sub-dormitory groups. Speciﬁcally, I explore within- and across-race
peer eﬀects by estimating:
GPA_{ird} = \alpha_0 + \beta_1 HS_{ird} + \beta_2 \overline{HS}_{rd} + \beta_3 \overline{HS}_{-rd} + \beta' X_{ird} + \mu_d + \epsilon_{ird}. \quad (3)
For student i of race r in dormitory d, \overline{HS}_{rd} and \overline{HS}_{-rd} denote the mean high school graduation
test scores for other students in dormitory d of, respectively, race r and all other race
groups. \hat{\beta}_2 and \hat{\beta}_3 equal 0.16 and -0.04 respectively (table 5, column 2). The difference
strongly suggests that peer eﬀects operate primarily within race groups but it is quite impre-
cisely estimated (bootstrap p-value 0.110). I interpret this as evidence that spatial proximity
does not automatically generate peer eﬀects. Instead, peer groups are formed through a
combination of spatial proximity and proximity along other dimensions such as race, which
remains highly salient in South Africa.33 This indicates that students' interaction patterns
may mediate residential peer effects, meaning that peer effect estimates are not policy-invariant.
Footnote 33: I find a similar result using language instead of race to define subgroups. This pattern could also arise if
students sort into racially homogeneous geographic units by choosing rooms within their assigned dormitories.
As I do not observe roommate assignments, I cannot test this mechanism.
As I do not observe roommate assignments, I cannot test this mechanism.
23
I also explore the content of the interaction patterns that generate residential peer ef-
fects by estimating equation 3 using faculty/school/college groups instead of race groups.
The estimated within- and across-faculty peer eﬀects are respectively 0.10 and 0.19 (cluster
bootstrap standard errors 0.05 and 0.08). Despite their relative imprecision, these results
suggest that within-faculty peer effects are not systematically stronger than cross-faculty
peer eﬀects.34 This result is not consistent with peer eﬀects being driven by direct aca-
demic collaboration such as joint work on problem sets or joint studying for examinations.
Interviews with students at the university suggest two channels through which peer eﬀects
operate: time allocation over study and leisure activities, and transfers of tacit knowledge
such as study skills, norms about how to interact with faculty, and strategies for navigating
academic bureaucracy. This is consistent with prior ﬁndings of strong peer eﬀects on study
time (Stinebrickner and Stinebrickner, 2006) and social activities (Duncan, Boisjoly, Kremer,
Levy, and Eccles, 2005).
Combining the race- and faculty-level peer eﬀects results indicates that spatial proximity
alone does not generate peer eﬀects. Some direct interaction is also necessary and is more
likely when students are also socially proximate. However, the relevant form of the interaction
is not direct academic collaboration. The research design and data cannot conclusively
determine what interactions do generate the estimated peer eﬀects.
34
Each student at the University of Cape Town is registered in one of six faculties: commerce, engineering,
health sciences, humanities and social sciences, law, and science. Some students take courses exclusively
within their faculty (engineering, health sciences) while some courses overlap across multiple faculties (in-
troductory statistics is oﬀered in commerce and science, for example). I obtain similar results using course-
speciﬁc grades as the outcome and allowing residential peer eﬀects to diﬀer at the course level. For example,
I estimate equations 2 and 3 with Introductory Microeconomics grades as an outcome. I find that there
are strong peer effects on grades in this course (α̂2 = 0.34 with cluster bootstrap standard error 0.15) but
they are not driven primarily by other students in the same course (β̂2 = 0.06 and β̂3 = 0.17 with cluster
bootstrap standard errors 0.17 and 0.15). This, and other course-level regressions, are consistent with the
main results but the smaller sample sizes yield relatively imprecise estimates that are somewhat sensitive to
the inclusion of covariates.
7 Reconciling Cross-Policy and Cross-Dormitory Results
The linear-in-means model restricts average GPA to be invariant to any group reassignment:
moving a strong student to a new group has equal but oppositely signed eﬀects on her old
and new peers’ average GPA. If the true GPA production function is linear, then the average
treatment eﬀect of tracking relative to random assignment must be zero. I therefore estimate
a more general production function that permits nonlinear peer eﬀects:
GPA_id = γ0 + γ1 HS_id + γ2 HS̄_d + γ11 HS_id² + γ22 HS̄_d² + γ12 HS_id × HS̄_d + γ′X_id + μ_d + ε_id    (4)
This is a parsimonious speciﬁcation that permits average outcomes to vary over assignment
processes but may not be a perfect description of the GPA production process. In particular,
I use only the mean as a summary of peer group characteristics.35 γ12 and γ22 are the key
parameters of the model. γ12 indicates whether own and peer high school graduation test
scores are complements or substitutes in GPA production, and whether GPA is super- or
submodular in own and peer test scores. If γ12 < 0, the GPA gain from high-scoring peers
is larger for low-scoring students. In classic binary matching models, this parameter governs
whether positive or negative assortative matching is output-maximizing (Becker, 1973). In
matching models with more than two agents, γ12 is not suﬃcient to characterize the output-
maximizing set of matches. γ22 indicates whether GPA is a concave or convex function of
peers’ mean high school graduation test scores. If γ22 < 0, total output is higher when mean
test scores are identical in all groups. If γ22 > 0, total output is higher when some groups
have very high means and some groups have very low means. This parameter has received
relatively little attention in the peer eﬀects literature but features prominently in some models
of neighborhood eﬀects (Benabou, 1996; Graham, Imbens, and Ridder, 2013). Tracking will
deliver higher total GPA than random assignment if both parameters are positive and vice
35. See Carrell, Sacerdote, and West (2013) for an alternative parameterization and Graham (2011) for
background discussion. Equation 4 has the attractive feature of aligning with theoretical literatures on
binary matching and on neighborhood segregation. The results are qualitatively similar if dormitory-year
means are replaced with medians.
versa. If the parameters have diﬀerent signs, the average eﬀect of tracking is ambiguous.36
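The closed-form gap derived in footnote 36, E[Y|Tracking] − E[Y|Randomization] = σ²_HS(γ22 + γ12), can be checked with a small simulation. The sketch below uses hypothetical coefficient values and finite groups, so the match is only approximate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficient values; only gamma22 + gamma12 matters for the gap.
g1, g11, g22, g12 = 0.5, 0.05, -0.16, -0.13
n_students, group_size = 100_000, 100

hs = rng.standard_normal(n_students)  # standardized scores, so sigma^2_HS = 1

def mean_gpa(scores):
    """Mean GPA from equation 4 (terms that cancel in the difference are omitted)."""
    groups = scores.reshape(-1, group_size)
    peer = groups.mean(axis=1, keepdims=True)  # dormitory mean test score
    return (g1 * groups + g11 * groups**2
            + g22 * peer**2 + g12 * groups * peer).mean()

tracked = np.sort(hs)             # homogeneous dormitories
randomized = rng.permutation(hs)  # representative dormitories

gap = mean_gpa(tracked) - mean_gpa(randomized)  # close to g22 + g12 = -0.29
```

With finite group sizes the randomized dormitories retain a little variance in peer means (1/group_size), so the simulated gap is slightly attenuated relative to the infinite-population formula.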
Estimates from equation 4 are shown in table 4 columns 4, 5 (controlling for student
demographics) and 6 (with dormitory fixed effects). γ̂12 is negative and marginally statistically significant across all specifications. The point estimate of −0.13 (cluster bootstrap
standard error 0.07) implies the GPA gain from an increase in peers’ mean test scores is 0.2
standard deviations larger for students at the 25th percentile of the high school test score
distribution than students at the 75th percentile. This is consistent with the section 4 result
that low-scoring students are hurt more by tracking than high-scoring students are helped.
However, the sign of γ̂22 flips from positive to negative with the inclusion of dormitory fixed
effects. It is thus unclear whether GPA is concave or convex in mean peer group test scores.
I draw three conclusions from these results. First, there is clear evidence of nonlinear peer
eﬀects from the cross-dormitory variation generated under random assignment. Likelihood
ratio tests prefer the nonlinear models in columns 4-6 to the corresponding linear models in
columns 1-3. Second, peer eﬀects estimates using randomly induced cross-dormitory variation
may be sensitive to the support of the data. Using dormitory ﬁxed eﬀects reduces the variance
of HS d from 0.19 to 0.11. This leads to diﬀerent conclusions about the curvature of the GPA
production function in columns 5 and 6. Third, the results from the ﬁxed eﬀects speciﬁcation
(column 6) are qualitatively consistent with the negative average treatment eﬀect of tracking.
Are the coeﬃcient estimates from equation 4 quantitatively, as well as qualitatively, con-
sistent with the observed treatment eﬀects of tracking? I combine coeﬃcients from estimating
equation 4 for randomly assigned dormitory students with observed values of individual- and
dormitory-level regressors for tracked dormitory students. I then predict the level of GPA
and the treatment eﬀect of tracking for students in the ﬁrst and fourth quartiles of the high
school graduation test score distribution. I compare these predictions to observed GPA for
tracked dormitory students and to the diﬀerence-in-diﬀerences treatment eﬀect of tracking.
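The mechanics of this exercise, estimating on the randomly assigned sample and predicting with tracked regressor values, can be sketched in a few lines of numpy. The data-generating process, coefficient values, and sample sizes below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

def design(hs, peer):
    # Regressors of equation 4, without covariates or dormitory effects
    return np.column_stack([np.ones_like(hs), hs, peer, hs**2, peer**2, hs * peer])

# Hypothetical "randomly assigned" estimation sample: peer means vary little
hs_r = rng.standard_normal(n)
peer_r = 0.3 * rng.standard_normal(n)
gpa_r = 0.5 * hs_r + 0.2 * peer_r - 0.1 * hs_r * peer_r + rng.standard_normal(n)

coef, *_ = np.linalg.lstsq(design(hs_r, peer_r), gpa_r, rcond=None)

# Predict under tracking, where each student's peer mean equals her own score.
# Extreme students land far outside the estimation sample's support of peer means.
hs_t = rng.standard_normal(n)
pred_tracked = design(hs_t, hs_t) @ coef
```

Note that tracked regressor values (peer mean equal to own score) lie well outside the support of peer means observed under random assignment, which is exactly the extrapolation problem the text discusses.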
The results in table 6 show that the predictions are sensitive to speciﬁcation of equation
36
To derive this result, note that E[HS d |HSid ] = HSid under tracking and E[HSid ] under random as-
2 2
signment. Hence, E[HSid HS d ] and E[HS d ] both equal E[HSid ] under tracking and E[HSid ]2 under ran-
dom assignment. Plugging these results into equation 4 for each assignment policy yields E[Yid |Tracking] −
2
E[Yid |Randomization] = σHS (γ22 + γ12 ). This simple demonstration assumes an inﬁnite number of students
and dormitories. This assumption is not necessary but simpliﬁes the exposition.
Table 6: Observed and Predicted GPA Using Diﬀerent Production Function Speciﬁcations
(1) (2)
Quartile 4 Quartile 1
Panel A: Mean GPA
Observed 0.761 -0.486
Predicted, without dormitory ﬁxed eﬀects 0.889 -0.345
Predicted, least squares with dormitory dummies 0.698 -0.433
Predicted, within-group transformation 0.689 -0.503
Panel B: Mean treatment eﬀect of tracking
Estimated from diﬀerence-in-diﬀerences design 0.032 -0.225
Predicted, without dormitory ﬁxed eﬀects 0.223 -0.050
Predicted, least squares with dormitory dummies 0.041 -0.139
Predicted, within-group transformation 0.032 -0.195
Notes: Table 6 panel A reports observed GPA (row 1) and predicted GPA from three diﬀerent models. All predictions use
observed regressor values from tracked dormitory students and estimated coeﬃcients from randomly assigned dormitory students.
The ﬁrst prediction uses coeﬃcients generated by estimating equation 4 without dormitory ﬁxed eﬀects (shown in column 5 of
table 4). The second prediction uses coeﬃcients generated by estimating equation 4 with dormitory indicator variables (shown
in column 6 of table 4). The third prediction uses coeﬃcients generated by estimating equation 4 with data from a within-
dormitory transformation (shown in column 6 of table 4). The second and third predictions diﬀer because the values of the
dormitory ﬁxed eﬀects respectively are and are not used in the prediction.
4. Excluding dormitory fixed effects (row 2) yields very inaccurate predictions, with GPA
and treatment effects too high for students in the top and bottom quartiles. This reflects
the estimated convexity of the GPA production function without dormitory fixed effects
(γ̂22 = 0.31 but insignificant). With dormitory fixed effects included, the production function
is not convex (γ̂22 = −0.16 but insignificant) and own and peer test scores are substitutes
(γ̂12 = −0.13). The fixed effects estimates thus predict negative and zero treatment effects on
the first and fourth quartiles respectively, matching the difference-in-differences estimates.
However, the first quartile estimates are quite sensitive to specifying the fixed effects with
dormitory dummies (row 3) or using a within-group data transformation (row 4).
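The gap between the dummy-variable and within-transformation rows turns on whether estimated dormitory effects enter the prediction, even though the two estimators produce identical slopes. A minimal sketch on simulated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n_groups, per_group = 30, 40
g = np.repeat(np.arange(n_groups), per_group)  # dormitory labels
mu = rng.standard_normal(n_groups)             # dormitory fixed effects
x = rng.standard_normal(n_groups * per_group)
y = 0.4 * x + mu[g] + 0.5 * rng.standard_normal(len(x))

# (a) least squares with dormitory dummies
D = (g[:, None] == np.arange(n_groups)).astype(float)
beta_d = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0]

# (b) within-group (demeaning) transformation
def demean(v):
    return v - (np.bincount(g, weights=v) / per_group)[g]

beta_w = np.linalg.lstsq(demean(x)[:, None], demean(y), rcond=None)[0]

# The slope on x is identical (Frisch-Waugh-Lovell), but predictions differ:
# the dummy model carries the estimated mu_d terms, the within model does not.
pred_dummies = np.column_stack([x, D]) @ beta_d
pred_within = x * beta_w[0]
```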
This exercise illustrates that a simple reduced form GPA production function can come
close to predicting the treatment eﬀects of tracking. However, the predictions are very
sensitive to speciﬁcation choices regarding covariates and group ﬁxed eﬀects, which in turn
inﬂuence the support of the data. These are precisely the choices for which economic theory
is likely to provide little guidance. Statistical model selection criteria are also inconclusive
in this setting.37 This sensitivity may be due to out-of-sample extrapolation, dependence of
GPA on group-level statistics other than the mean, or behavioral responses by students that
37. For example, the Akaike and Bayesian information criteria are lower for the models respectively with
and without dormitory ﬁxed eﬀects, while a likelihood ratio test for equality of the models has p-value 0.083.
Hurder (2012) ﬁnds that model selection criteria are also inconclusive in a similar application.
make peer eﬀects policy-sensitive.
8 Alternative Explanations for the Eﬀects of Tracking
I consider four alternative explanations that might have generated the observed GPA diﬀer-
ence between tracked and randomly assigned dormitory students. The ﬁrst two explanations
are violations of the “parallel time changes” assumption: time-varying student selection re-
garding whether or not to live in a dormitory and diﬀerential time trends in dormitory and
non-dormitory students’ characteristics. The third explanation is that the treatment eﬀects
are an artefact of the grading system and do not reﬂect any real eﬀect on learning. The
fourth explanation is that dormitory assignment aﬀects GPA through a mechanism other
than peer eﬀects; this would not invalidate the results but would change their interpretation.
8.1 Selection into Dormitory Status
The research design assumes that non-dormitory students are an appropriate control group for
any time trends or cohort eﬀects on dormitory students’ outcomes. This assumption may fail
if students select whether or not to live in a dormitory based on the assignment policy. I argue
that such behavior is unlikely and that my results are robust to accounting for selection. First,
the change in dormitory assignment policy was not oﬃcially announced or widely publicized,
limiting students’ ability to respond. Second, table 1 shows that there are approximately
equal time changes in dormitory and non-dormitory students’ demographic characteristics
and high school graduation test scores. Third, the results are robust to accounting for small
diﬀerences in these time changes using regression or reweighting.
Fourth, admission rules cap the number of students from Cape Town who may be admit-
ted to the dormitory system. Given this rule, I use an indicator for whether each student
attended a high school outside Cape Town as an instrument for whether the student lives in
a dormitory. High school location is an imperfect proxy for home address, which I do not
observe. Nonetheless, the instrument strongly predicts dormitory status: 76% of non-Cape
Town students and 8% of Cape Town students live in dormitories. The intention-to-treat and
instrumented treatment eﬀects (table 7, columns 2 and 3) are very similar to the treatment
eﬀects without instruments (table 2).
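The instrument logic can be sketched with simulated data. The first-stage dormitory rates (76% and 8%) come from the text; the treatment effect, instrument split, and sample size are hypothetical, and the sketch illustrates only the Wald computation, not the identification argument:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Instrument: attended a non-Cape Town high school (hypothetical 50/50 split)
z = rng.random(n) < 0.5
# First stage: dormitory rates of 76% and 8% from the text
dorm = rng.random(n) < np.where(z, 0.76, 0.08)

effect = -0.13  # hypothetical true effect of dormitory residence on GPA
y = effect * dorm + rng.standard_normal(n)

first_stage = dorm[z].mean() - dorm[~z].mean()  # about 0.68
reduced_form = y[z].mean() - y[~z].mean()       # intention-to-treat effect
iv = reduced_form / first_stage                 # Wald / IV estimate
```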
8.2 Diﬀerential Time Trends in Student Characteristics
The research design assumes that dormitory and non-dormitory students’ GPAs do not have
diﬀerent time trends for reasons unrelated to the change in assignment policy. I present
three arguments against this concern. First, I extend the analysis to include data from the
2001–2002 academic years (“early tracking”), in addition to 2004–2005 (“late tracking”) and
2007–2008 (random assignment). I do not observe dormitory assignments in 2001–2002 so I
report only intention-to-treat eﬀects.38 The raw data are shown in the ﬁrst panel of ﬁgure
4. I estimate the eﬀect of tracking under several possible violations of the parallel trends
assumption. The average eﬀect of tracking comparing 2001-2005 to 2007-2008 is -0.09 with
standard error 0.04 (table 7, column 4). This estimate is appropriate if one group of students
experiences a transitory shock in 2004/2005. A placebo test comparing the diﬀerence between
Cape Town and non-Cape Town students’ GPAs in 2001-2002 and 2004-2005 yields a small
positive but insigniﬁcant eﬀect of 0.06 (standard error 0.05). I subtract the placebo test
result from the original treatment eﬀect estimate to obtain a “trend-adjusted” treatment
eﬀect of -0.18 with standard error 0.10 (table 7, column 6). This estimate is appropriate
if the two groups of students have linear but non-parallel time trends and are subject to
common transitory shocks (Heckman and Hotz, 1989). Finally, I estimate a linear time trend
in the GPA gap between Cape Town and non-Cape Town students from 2001 to 2005. I
then project that trend into 2007–2008 and estimate the deviation of the GPA gap from
its predicted level. This method yields a treatment eﬀect of random assignment relative to
tracking of 0.14 with standard error 0.09 (table 7, column 5). This estimate is appropriate
if the two groups of students have non-parallel time trends whose diﬀerence is linear. The
eﬀect of tracking is relatively robust across the standard diﬀerence-in-diﬀerences model and
all three models estimated under weaker assumptions. However, there is some within-policy
38. The cluster bootstrap standard errors do not take into account potential clustering within (unobserved)
dormitories in 2001–2002 and so may be downward-biased. I omit the 2003 academic year because the data
extract I received from the university had missing identiﬁers for approximately 80% of students in that year.
I omit 2006 because ﬁrst year students were randomly assigned to dormitories that still contained tracked
second year students. The results are robust to including 2006.
Table 7: Robustness Checks
Outcome (default is GPA): column 1 = dorm student; column 7 = no. of credits; column 10 = % credits excluded; column 11 = GPA | non-exclusion
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11)
Cape Town high school 0.601
(0.019)
Cape Town high school -0.093 -0.090 -0.115
× tracking period (0.034) (0.044) (0.055)
Dormitory × -0.133 -0.013 -0.139 -0.165 0.027 -0.077
tracking period (0.050) (0.038) (0.043) (0.044) (0.005) (0.050)
Cape Town high school 0.141
× randomization period (0.093)
Placebo pre-treatment 0.058
diﬀ-in-diﬀ (0.052)
Trend-corrected -0.175
treatment eﬀect (0.100)
Sample period (default is 2004-2008): 2001-2008 in columns 4-6
Dormitory ﬁxed eﬀects × × × × ×
Student covariates × × × × × × × ×
Missing data indicators × × × × × × × ×
Instruments ×
Faculty ﬁxed eﬀects ×
Pre-treatment time trend ×
Adjusted R2 0.525 0.231 0.231 0.002 0.000 0.000 0.127 0.242 0.229 0.052 0.302
# dorm-year clusters 60 60 60 60 60 60 60 52 60 60 60
# dormitory students 6915 6915 6915 8509 8509 8509 7480 6795 7480 7480 7449
# non-dorm students 6466 6466 6466 14203 14203 14203 7188 7188 7188 7188 7043
Notes: Table 7 reports results from the robustness checks discussed in subsections 8.1 - 8.3. Columns 1–3 show the relationship between students’ GPA (outcome), whether
they live in dormitories (treatment) and whether they graduated from high schools located outside Cape Town (instrument). The coeﬃcient of interest is on the treatment
or instrument interacted with an indicator for whether students attended the university during the tracking period. Column 1 shows the ﬁrst stage estimate, column 2 shows
the reduced form estimate, and column 3 shows the IV estimate. Dormitory ﬁxed eﬀects are excluded because they are collinear with the ﬁrst stage outcome. Columns 4–6 use
data from 2001-2002, 2004-2005, and 2007-2008 to test the parallel time trends assumption. Column 4 reports a diﬀerence-in-diﬀerences estimate comparing all four observed
years of tracking to the two observed years of random assignment. Column 5 reports the diﬀerence between observed GPA under random assignment and predicted GPA from
a linear time trend extrapolated from the tracking period. Column 6 reports the placebo diﬀerence-in-diﬀerences test comparing the ﬁrst two years of tracking to the last two
years of tracking and the diﬀerence between the main and placebo eﬀects following Heckman and Hotz (1989). Column 7 reports a diﬀerence-in-diﬀerences estimate with the
credit-weighted number of courses as the outcome. Column 8 reports a diﬀerence-in-diﬀerences estimate excluding dormitories that are either observed in only one period or
use a diﬀerent admission rule. Column 9 reports a diﬀerence-in-diﬀerences estimate including college/faculty/school ﬁxed eﬀects. Column 10 reports a diﬀerence-in-diﬀerences
estimate with the credit-weighted percentage of courses from which students are academically excluded as the outcome. Column 11 reports a diﬀerence-in-diﬀerences estimate
with GPA calculated using only grades from non-excluded courses as the outcome. Standard errors in parentheses are from 1000 bootstrap replications, stratifying by assignment
policy and dormitory status. The bootstrap resamples dormitory-year clusters except for the 2001-2002 data in columns 4-6, for which dormitory assignments are not observed.
GPA variation through time: intention-to-treat students (those from high schools outside
Cape Town) strongly outperform control students in 2006 and 2007 but not 2008. The
reason for this divergence is unclear.
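The three estimators built from the extended 2001-2008 panel differ only in how they combine period-by-group mean GPAs. A toy numerical sketch of the arithmetic (all means hypothetical, not the paper's data):

```python
# Hypothetical period-by-group mean GPAs illustrating the standard DiD,
# the placebo DiD, and the Heckman-Hotz trend-corrected effect.
means = {
    ("treat", "early_track"): 0.10, ("control", "early_track"): 0.00,
    ("treat", "late_track"):  0.14, ("control", "late_track"):  0.02,
    ("treat", "random"):      0.30, ("control", "random"):      0.08,
}

def gap(period):
    """Treatment-control GPA gap in a given period."""
    return means[("treat", period)] - means[("control", period)]

did = gap("random") - gap("late_track")           # standard difference-in-differences
placebo = gap("late_track") - gap("early_track")  # pre-treatment placebo DiD
trend_adjusted = did - placebo                    # Heckman-Hotz trend correction
```

With these hypothetical means, did = 0.10, placebo = 0.02, and trend_adjusted = 0.08: a nonzero placebo shifts the trend-corrected estimate away from the standard DiD, which is exactly the pattern in table 7 columns 4 and 6.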
Second, the time trends in the proportion of graduating high school students who qualify
for admission to university are very similar for Cape Town and non-Cape Town high schools
between 2001 and 2008 (shown in the second panel of ﬁgure 4). Hence, the pools of potential
dormitory and non-dormitory students do not have diﬀerent time trends. This helps to
address any concern that students make diﬀerent decisions about whether to attend the
University of Cape Town due to the change in the dormitory assignment policy. However,
the set of students who qualify for university admission is an imperfect proxy for the set of
potential students at this university. Many students whose high school graduation test scores
qualify them for admission to a university may not qualify for admission to this relatively
selective university.
Third, the results are not driven by two approximately simultaneous policy changes at
the university. The university charged a ﬂat tuition fee up to 2005 and per-credit fees from
2006. This may have changed the number of courses for which students registered. However,
the credit-weighted number of courses remained constant for dormitory and non-dormitory
students, with a diﬀerence-in-diﬀerences estimate of 0.013, less than 0.4% of the mean (table
7 column 7). The university also closed one dormitory in 2006 and opened a new dormitory in
2007, as well as reserving one cheaper dormitory for low-income students under both policies.
The estimated treatment eﬀect is robust to excluding all three dormitories (table 7 column
8).
8.3 Limitations of GPA as an Outcome Measure
I explore four ways in which the grading system might pose a problem for validity or inter-
pretation of the results: curving, ceiling eﬀects, course choices, and course exclusions. First,
instructors may use “curves” that keep features of the grade distribution constant through
time within each course. Under this hypothesis, the eﬀects of tracking may be negative ef-
fects on dormitory students relative to non-dormitory students, rather than negative eﬀects
on absolute performance. This would not invalidate the main result but would certainly
Figure 4: Long-term Trends in Student Academic Performance
[Two panels. First panel: mean university GPA (y-axis, -0.3 to 0.4) by year, 2001-2008, for students from non-Cape Town high schools, with a linear prediction using 2001-2005 data. Second panel: rate of qualification for university admission (y-axis, 0 to 0.3) by year, 2001-2008, for high schools in and outside Cape Town.]
Notes: The ﬁrst panel shows mean GPA for ﬁrst year university students from high schools outside Cape Town. The time series
covers the tracking period (2001-2005) and the random assignment period (2006-2008). Mean GPA for students from Cape Town
high schools is, by construction, zero in each year. Data for 2003 is missing and replaced by a linear imputation. The dotted
lines show a 95% conﬁdence interval constructed from 1000 replications of a percentile bootstrap stratifying by assignment
policy and dormitory status. The bootstrap resamples dormitory-year clusters for 2004-2008, the only years in which dormitory
assignments are observed. The second panel shows the proportion of grade 12 students whose score on the high school graduation
examination qualiﬁed them for admission to university. The mean qualiﬁcation rate for high schools in Cape Town is 0.138 in
the tracking period (2001 - 2005) and 0.133 in the random assignment period (2007 - 2008). The mean qualiﬁcation rate for
high schools outside Cape Town is 0.250 in the tracking period (2001 - 2005) and 0.245 in the random assignment period (2007
- 2008). The second diﬀerence is 0.001 (bootstrap standard error 0.009) or, after weighting by the number of grade 12 students
enrolled in each school, 0.007 (standard error 0.009).
change its interpretation. This is a concern for most test score measures but I argue that it
is less pressing in this context. Instructors at this university are not encouraged to use grad-
ing curves and many examinations are subject to external moderation intended to maintain
an approximately time-consistent standard. I observe several patterns in the data that are
not consistent with curving. Mean grades in the three largest introductory courses at the
university (microeconomics, management, information systems) show year-on-year changes
within an assignment policy period of up to 6 points (on a 0 to 100 scale, approximately 1/3
of a standard deviation). Similarly, the 75th and 25th percentiles of the grades within these
large ﬁrst-year courses show year-on-year changes of up to 8 and 7 points respectively. This
demonstrates that grades are not strictly curved in at least some large courses. I also examine
the treatment eﬀect of tracking on grades in the introductory accounting course, which builds
toward an external qualifying examination administered by South Africa’s Independent Reg-
ulatory Board for Auditors. This external assessment for accounting students, although it
is administered only after they graduate, reduces the scope for internal assessment to
change through time. Tracking reduces mean grades in the introductory accounting course
by 0.11 standard deviations (cluster bootstrap standard error 0.12, sample size 2107 stu-
dents). This provides some reassurance that tracking reduces the academic competence of
low-scoring students.
Second, tracking may have no eﬀect on high-scoring students if they already obtain near
the maximum GPA and are constrained by ceiling eﬀects. I cannot rule out this concern
completely but I argue that it is unlikely to be central. The nominal grade ceiling of 100
does not bind for any student: the highest grade observed in the dataset is 97/100 and the 99th
percentile is 84/100. Some courses may impose ceilings below the maximum grade, which will
not be visible in my data. However, the course convenors for Introductory Microeconomics,
the largest ﬁrst-year course at the university, conﬁrmed that they used no such ceilings.
The treatment eﬀect of tracking on grades in this course is 0.13 standard deviations (cluster
bootstrap standard error 0.06), so the average eﬀect across all courses is at least similar to
the average eﬀect in a course without grade ceilings.
Third, dormitory students may take diﬀerent classes, with diﬀerent grading standards, in
the tracking and random assignment periods. There are some changes in the type of courses
students take: dormitory students take slightly fewer commerce and science classes in the
tracking than random assignment period, relative to non-dormitory students. However, the
eﬀect of tracking is consistently negative within each type of class. The treatment eﬀects for
each faculty/school/college range between -0.23 for engineering and -0.04 for medicine. The
average treatment eﬀect with faculty ﬁxed eﬀects is -0.17 with standard error 0.04 (table 7,
column 9). I conclude that the main results are not driven by time-varying course-taking
behavior.
Fourth, the university employs an unusual two-stage grading system which does explain
part of the treatment eﬀect of tracking. Students are graded on ﬁnal exams, class tests,
homework assignments, essays, and class participation and attendance, with the relative
weights varying across classes. Students whose weighted scores before the exam are below
a course-speciﬁc threshold are excluded from the course and do not write the ﬁnal exam.
These students receive a grade of zero in the main data, on a 0-100 scale. I also estimate
the treatment eﬀect of tracking on the credit-weighted percentage of courses from which
students are excluded and on GPA calculated using only non-excluded courses (table 7,
columns 10 and 11). Tracking substantially increases the exclusion rate from 3.7 to 6.4%
and reduces GPA in non-excluded courses by 0.08 standard deviations, though the latter
eﬀect is imprecisely estimated. I cannot calculate the hypothetical eﬀect of tracking if all
students were permitted to write exams but these results show that tracking reduces grades
at both the intensive and extensive margins. This ﬁnding is consistent with the negative
eﬀect of tracking being concentrated on low-scoring students, who are most at risk of course
exclusion. The importance of course exclusions also suggests that peer eﬀects operate from
early in the semester, rather than being concentrated during ﬁnal exams.
8.4 Other Channels Linking Dormitory Assignment to GPA
I ascribe the eﬀect of tracking on dormitory students’ GPAs to changes in the distribution
of peer groups. However, some other feature of the dormitories or assignment policy may
account for this diﬀerence. Dormitories diﬀer in some of their time-invariant characteristics
such as proximity to the main university campus and within-dormitory study space. The
negative treatment eﬀect of tracking is robust to dormitory ﬁxed eﬀects, which account for
any relationship between dormitory features and GPA that is common across all types of
students. Dormitory ﬁxed eﬀects do not account for potential interactions between student
and dormitory characteristics. In particular, tracking would have a negative eﬀect on low-
scoring students’ GPAs even without peer eﬀects if there is a negative interaction eﬀect
between high school graduation test scores and the characteristics of low-track dormitories. I
test this hypothesis by estimating equation 2 with an interaction between HSid and the rank
of dormitory d during the tracking period. The interaction term has a small and insigniﬁcant
coeﬃcient: 0.003, cluster bootstrap standard error 0.006. Hence, low-scoring students do not
have systematically lower GPAs when randomly assigned to previously low-track dormitories.
This result is robust to replacing the continuous rank measure with an indicator for below-
median-rank dormitories. I conclude that the results are not explained by time-invariant
dormitory characteristics.
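The cluster bootstrap standard errors used throughout resample dormitory-year clusters. A minimal sketch of that procedure on simulated data, with the paper's 60 clusters but a hypothetical cluster size and shock structure:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data: 60 dormitory-year clusters (matching the paper's design),
# 120 students per cluster (hypothetical), with a common within-cluster shock.
n_clusters, per_cluster = 60, 120
shocks = 0.3 * rng.standard_normal(n_clusters)
y = (shocks[:, None] + rng.standard_normal((n_clusters, per_cluster))).ravel()
cluster = np.repeat(np.arange(n_clusters), per_cluster)

def cluster_bootstrap_se(y, cluster, reps=1000):
    """SE of the mean from resampling whole clusters with replacement."""
    ids = np.unique(cluster)
    groups = [y[cluster == c] for c in ids]  # split the data once
    stats = []
    for _ in range(reps):
        draw = rng.integers(len(ids), size=len(ids))
        stats.append(np.concatenate([groups[i] for i in draw]).mean())
    return np.std(stats, ddof=1)

se = cluster_bootstrap_se(y, cluster)
naive = y.std(ddof=1) / np.sqrt(len(y))  # ignores clustering; too small here
```

Because the within-cluster shock is common to all students in a dormitory-year, the cluster bootstrap SE is several times larger than the naive i.i.d. standard error.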
This does not rule out the possibility of time-varying eﬀects of dormitory characteristics
or of eﬀects of time-varying characteristics. I conducted informal interviews with staﬀ in the
university’s Oﬃce of Student Housing and Residence Life to explore this possibility. There
were no substantial changes to dormitories’ physical facilities but there was some routine staﬀ
turnover, which I do not observe in my data. It is also possible that assignment to a low-
track dormitory may directly harm low-scoring students through stereotype threat. Their
dormitory assignment might continuously remind them of their low high school graduation
test score and undermine their conﬁdence or motivation (Steele and Aronson, 1995). I cannot
directly test this explanation and so cannot rule it out. However, the consistent results from
the cross-policy and cross-dormitory analyses suggest that peer eﬀects explain the bulk of
the observed treatment eﬀect of tracking. Wei (2009) also notes that evidence of stereotype
threat outside laboratory conditions is rare.
9 Conclusion
This paper describes the eﬀect of tracked relative to random dormitory assignment on student
GPAs at the University of Cape Town in South Africa. I show that tracking lowered mean
GPA and increased GPA inequality. This result occurs because living with high-scoring peers
has a larger positive eﬀect on low-scoring students’ GPAs than on high-scoring students’
GPAs. These peer eﬀects arise largely through interaction with own-race peers and the
relevant form of interaction does not appear to be direct academic collaboration. I present
an extensive set of robustness checks supporting a causal interpretation for these results.
My ﬁndings show that diﬀerent peer group assignment policies can have substantial ef-
fects on students’ academic outcomes. Academic tracking into residential groups, and perhaps
other noninstructional groups, may generate a substantially worse distribution of academic
performance than random assignment. However, my results do not permit a comprehensive
evaluation of the relative merits of the two policies. Tracking clearly harms low-scoring stu-
dents but some (imprecise) results suggest a positive eﬀect on high-scoring students. Chang-
ing the assignment policy may thus entail a transfer from one group of students to another
and, as academic outputs are not directly tradeable, Pareto-ranking the two policies may
not be possible. Non-measured student outcomes may also be aﬀected by diﬀerent group
assignment policies. For example, high-scoring students’ GPAs may be unaﬀected by track-
ing because the rise in their peers’ academic proﬁciency induces them to substitute time
away from studying toward leisure. In future work I plan to study the long-term eﬀects of
tracking on graduation rates, time-to-degree, and labor market outcomes. This will permit
a more comprehensive evaluation of the two group assignment policies. One simple revealed
preference measure of student welfare under the two policies is the proportion of dormitory
students who stay in their dormitory for a second year. Tracking reduces this rate for stu-
dents with above- and below-median high school test scores by 0.4 percentage points and 6.7
percentage points respectively (cluster bootstrap standard errors 3.5 and 3.8 respectively).
Low-scoring dormitory students may thus be aware of the negative eﬀect of tracking and
respond by leaving the dormitory system early.
Despite these provisos, my ﬁndings shed light on the importance of peer group assignment
policies. I provide what appears to be the ﬁrst cleanly identiﬁed evidence on the eﬀects of
noninstructional tracking. This complements the small literature that cleanly identiﬁes the
eﬀect of instructional tracking. For example, Duﬂo, Dupas, and Kremer (2011) suggest that a
positive total eﬀect of instructional tracking may combine a negative peer eﬀect of tracking on
low-scoring students with a positive eﬀect due to changes in instructor behavior. My ﬁndings
suggest that policymakers can change the distribution of students’ academic performance by
rearranging the groups in which these students interact without changing the marginal distri-
bution of inputs into the education production function. This is attractive in any setting but
particularly in resource-constrained developing countries. While the external validity of any
result is always questionable, my ﬁndings may be particularly relevant to universities serving
a diverse student body that includes both high performing and academically underprepared
students. This is particularly relevant to selective universities with active aﬃrmative action
programs (Bertrand, Hanna, and Mullainathan, 2010).
The examination of peer effects under random assignment also points to fruitful avenues for future research. As in Carrell, Sacerdote, and West (2013), peer effects estimated under random assignment do not robustly predict the effects of a new assignment policy, and residential peer effects appear to be mediated by students' patterns of interaction. This highlights the risk of relying on reduced-form estimates that do not capture the behavioral content of peer effects. Combining peer effects estimated under different group assignment policies with detailed data on social interactions and explicit models of network formation may provide additional insights.
References
Abadie, A. (2005): “Semiparametric Difference-in-Differences Estimators,” Review of Economic Studies, 72, 1–19.
Abdulkadiroglu, A., J. Angrist, and P. Pathak (2011): “The Elite Illusion: Achievement Effects at Boston and New York Exam Schools,” Working Paper 17264, National Bureau of Economic Research.
Angrist, J., and K. Lang (2004): “Does School Integration Generate Peer Effects? Evidence from Boston’s Metco Program,” American Economic Review, 94(5), 1613–1634.
Arnott, R. (1987): “Peer Group Effects and Educational Attainment,” Journal of Public Economics, 32, 287–305.
Athey, S., and G. Imbens (2006): “Identification and Inference in Nonlinear Difference-in-Differences Models,” Econometrica, 74(2), 431–497.
Becker, G. (1973): “A Theory of Marriage: Part I,” Journal of Political Economy, 81, 813–846.
Benabou, R. (1996): “Equity and Efficiency in Human Capital Investment: The Local Connection,” Review of Economic Studies, 63(2), 237–264.
Bertrand, M., R. Hanna, and S. Mullainathan (2010): “Affirmative Action in Education: Evidence from Engineering College Admissions in India,” Journal of Public Economics, 94(1/2), 16–29.
Betts, J. (2011): “The Economics of Tracking in Education,” in Handbook of the Economics of Education Volume 3, ed. by E. Hanushek, S. Machin, and L. Woessmann, pp. 341–381. Elsevier.
Bhattacharya, D. (2009): “Inferring Optimal Peer Assignment from Experimental Data,” Journal of the American Statistical Association, 104(486), 486–500.
Bitler, M., J. Gelbach, and H. Hoynes (2010): “Can Variation in Subgroups’ Average Treatment Effects Explain Treatment Effect Heterogeneity? Evidence from a Social Experiment,” Mimeo.
Blume, L., W. Brock, S. Durlauf, and Y. Ioannides (2011): “Identification of Social Interactions,” in Handbook of Social Economics Volume 1B, ed. by J. Benhabib, A. Bisin, and M. Jackson, pp. 853–964. Elsevier.
Burke, M., and T. Sass (2013): “Classroom Peer Effects and Student Achievement,” Journal of Labor Economics, 31(1), 51–82.
Cameron, C., D. Miller, and J. Gelbach (2008): “Bootstrap-based Improvements for Inference with Clustered Errors,” Review of Economics and Statistics, 90(3), 414–427.
Carrell, S., F. Malmstrom, and J. West (2008): “Peer Effects in Academic Cheating,” Journal of Human Resources, XLIII(1), 173–207.
Carrell, S., B. Sacerdote, and J. West (2013): “From Natural Variation to Optimal Policy? The Importance of Endogenous Peer Group Formation,” Econometrica, 81(3), 855–882.
Cattaneo, M. (2010): “Efficient Semiparametric Estimation of Multi-Valued Treatment Effects under Ignorability,” Journal of Econometrics, 155, 138–154.
Cooley, J. (2013): “Can Achievement Peer Effect Estimates Inform Policy? A View from Inside the Black Box,” Review of Economics and Statistics, Forthcoming.
Crump, R., J. Hotz, G. Imbens, and O. Mitnik (2009): “Dealing with Limited Overlap in Estimation of Average Treatment Effects,” Biometrika, 96(1), 187–199.
De Giorgi, G., M. Pellizzari, and S. Redaelli (2010): “Identification of Social Interactions through Partially Overlapping Peer Groups,” American Economic Journal: Applied Economics, 2(2), 241–275.
DiNardo, J., N. Fortin, and T. Lemieux (1996): “Labor Market Institutions and the Distribution of Wages, 1973–1992: A Semiparametric Approach,” Econometrica, 64(5), 1001–1044.
Ding, W., and S. Lehrer (2007): “Do Peers Affect Student Achievement in China’s Secondary Schools?,” Review of Economics and Statistics, 89(2), 300–312.
Duflo, E., P. Dupas, and M. Kremer (2011): “Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya,” American Economic Review, 101(5), 1739–1774.
Duncan, G., J. Boisjoly, M. Kremer, D. Levy, and J. Eccles (2005): “Peer Effects in Drug Use and Sex among College Students,” Journal of Abnormal Child Psychology, 33(3), 375–385.
Epple, D., and R. Romano (1998): “Competition Between Private and Public Schools, Vouchers and Peer-Group Effects,” American Economic Review, 88(1), 33–62.
Epple, D., and R. Romano (2011): “Peer Effects in Education: A Survey of the Theory and Evidence,” in Handbook of Social Economics Volume 1B, ed. by J. Benhabib, A. Bisin, and M. Jackson, pp. 1053–1163. Elsevier.
Figlio, D., and M. Page (2002): “School Choice and the Distributional Effects of Ability Tracking: Does Separation Increase Inequality?,” Journal of Urban Economics, 51(3), 497–514.
Firpo, S. (2007): “Efficient Semiparametric Estimation of Quantile Treatment Effects,” Econometrica, 75(1), 259–276.
Firpo, S. (2010): “Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures,” Discussion Paper 4841, IZA.
Foster, G. (2006): “It’s Not Your Peers and it’s Not Your Friends: Some Progress Toward Understanding the Educational Peer Effect Mechanism,” Journal of Public Economics, 90, 1455–1475.
Garlick, R. (2012): “Mobility Treatment Effects: Identification, Estimation and Application,” Mimeo.
Graham, B. (2011): “Econometric Methods for the Analysis of Assignment Problems in the Presence of Complementarity and Social Spillovers,” in Handbook of Social Economics Volume 1B, ed. by J. Benhabib, A. Bisin, and M. Jackson, pp. 965–1052. Elsevier.
Graham, B., G. Imbens, and G. Ridder (2013): “Measuring the Average Outcome and Inequality Effects of Segregation in the Presence of Social Spillovers,” Working Paper 16499, National Bureau of Economic Research.
Hanushek, E., J. Kain, and S. Rivkin (2009): “New Evidence about Brown v. Board of Education: The Complex Effects of School Racial Composition on Achievement,” Journal of Labor Economics, 27(3), 349–383.
Hanushek, E., and L. Woessmann (2006): “Does Educational Tracking Affect Performance and Inequality? Difference-in-Differences Evidence across Countries,” Economic Journal, 116, C63–C76.
Heckman, J., and J. Hotz (1989): “Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training,” Journal of the American Statistical Association, 84(408), 862–880.
Heckman, J., J. Smith, and N. Clements (1997): “Making the Most out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts,” Review of Economic Studies, 64(4), 487–535.
Higher Education South Africa (2009): “Report to the National Assembly Portfolio
Committee on Basic Education,” Available online at www.pmg.org.za/report/20090819-
national-benchmark-tests-project-standards-national-examination-asses.
Hirano, K., G. Imbens, and G. Ridder (2003): “Efficient Estimation of Average Treatment Effects Using the Propensity Score,” Econometrica, 71(4), 1161–1189.
Hoxby, C. (2000): “Peer Effects in the Classroom: Learning from Gender and Race Variation,” Working Paper 7867, National Bureau of Economic Research.
Hoxby, C., and G. Weingarth (2006): “Taking Race out of the Equation: School Reassignment and the Structure of Peer Effects,” Mimeo.
Hsieh, C.-T., and M. Urquiola (2006): “The Effects of Generalized School Choice on Achievement and Stratification: Evidence from Chile’s Voucher Program,” Journal of Public Economics, 90(8-9), 1477–1503.
Hurder, S. (2012): “Evaluating Econometric Models of Peer Effects with Experimental Data,” Mimeo.
Imberman, S., A. Kugler, and B. Sacerdote (2012): “Katrina’s Children: Evidence on the Structure of Peer Effects from Hurricane Evacuees,” American Economic Review, 102(5), 2048–2082.
Kling, J., J. Liebman, and L. Katz (2007): “Experimental Analysis of Neighborhood Effects,” Econometrica, 75(1), 83–119.
Lavy, V., O. Silva, and F. Weinhardt (2012): “The Good, the Bad and the Average: Evidence on the Scale and Nature of Ability Peer Effects in Schools,” Journal of Labor Economics, 30(2), 367–414.
Manski, C. (1993): “Identification of Endogenous Social Effects: The Reflection Problem,” Review of Economic Studies, 60(3), 531–542.
Marmaros, D., and B. Sacerdote (2002): “Peer and Social Networks in Job Search,”
European Economic Review, 46(4-5), 870–879.
McEwan, P. (2013): “Improving Learning in Primary Schools of Developing Countries: A
Meta-Analysis of Randomized Experiments,” Mimeo.
Meghir, C., and M. Palme (2005): “Educational Reform, Ability, and Family Background,” American Economic Review, 95(1), 414–424.
Nechyba, T. (2000): “Mobility, Targeting, and Private-School Vouchers,” American Economic Review, 90(1), 130–146.
Pop-Eleches, C., and M. Urquiola (2013): “Going to a Better School: Effects and Behavioral Responses,” American Economic Review, 103(4), 1289–1324.
Rothe, C. (2010): “Nonparametric Estimation of Distributional Policy Effects,” Journal of Econometrics, 155(1), 56–70.
Sacerdote, B. (2001): “Peer Effects with Random Assignment: Results for Dartmouth Roommates,” Quarterly Journal of Economics, 116(2), 681–704.
Sacerdote, B. (2011): “Peer Effects in Education: How Might They Work, How Big Are They and How Much Do We Know Thus Far?,” in Handbook of the Economics of Education Volume 3, ed. by E. Hanushek, S. Machin, and L. Woessmann, pp. 249–277. Elsevier.
Slavin, R. (1987): “Ability Grouping and Student Achievement in Elementary Schools: A Best-Evidence Synthesis,” Review of Educational Research, 57(3), 293–336.
Slavin, R. (1990): “Ability Grouping and Student Achievement in Secondary Schools: A Best-Evidence Synthesis,” Review of Educational Research, 60(3), 471–499.
Steele, C., and J. Aronson (1995): “Stereotype Threat and the Intellectual Test Performance of African Americans,” Journal of Personality and Social Psychology, 69(5), 797–811.
Stinebrickner, T., and R. Stinebrickner (2006): “What Can Be Learned About Peer Effects Using College Roommates? Evidence from New Survey Data and Students from Disadvantaged Backgrounds,” Journal of Public Economics, 90(8/9), 1435–1454.
Wei, T. (2009): “Stereotype Threat, Gender, and Math Performance: Evidence from the
National Assessment of Educational Progress,” Mimeo.
A Reweighted Nonlinear Difference-in-Differences Model
Athey and Imbens (2006) establish a model for recovering quantile treatment effects on the treated in a difference-in-differences setting. This provides substantially more information than the standard linear difference-in-differences model, which recovers only the average treatment effect on the treated. However, the model requires stronger identifying assumptions.
The original model is identified under five assumptions. Define T as an indicator variable equal to one in the tracking period and zero in the random assignment period, and D as an indicator variable equal to one for dormitory students and zero for non-dormitory students. The assumptions are:
A1 GPA in the absence of tracking is generated by the unknown production function GPA = h(U, T), where U is an unobserved scalar random variable. GPA does not depend directly on D.
A2 The production function h(u, t) is strictly increasing in u for t ∈ {0, 1}.
A3 The distribution of U is constant through time for each group, in this case dormitory and non-dormitory students: U ⊥ T | D.
A4 The support of dormitory students' GPA is contained in that of non-dormitory students' GPA: supp(GPA|D = 1) ⊆ supp(GPA|D = 0).[39]
A5 The distribution of GPA is strictly continuous.[40]
These assumptions are sufficient to identify the counterfactual distribution of tracked dormitory students' GPAs in the absence of tracking, F^{CF}_{GPA|D=1,T=1}(·) = F_{GPA10}(F^{-1}_{GPA00}(F_{GPA01}(·))), where F_{GPAdt} denotes the GPA distribution for group D = d in period T = t. These are the outcomes that tracked students would have obtained if they had been randomly assigned. The q-th quantile treatment effect of tracking on the treated students is defined as the horizontal difference between the observed and counterfactual distributions at quantile q: F^{-1}_{GPA|D=1,T=1}(q) − F^{CF,-1}_{GPA|D=1,T=1}(q).
[39] This assumption is testable and holds in my data.
[40] This assumption is testable and holds approximately in my data. There are 5,505 unique GPA values for 14,668 observations. No value accounts for more than 0.3% of the observations.
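As an illustration, the counterfactual distribution F^{CF}_{GPA|D=1,T=1}(·) = F_{GPA10}(F^{-1}_{GPA00}(F_{GPA01}(·))) is simply a composition of empirical CDFs and quantile functions. The sketch below is not the paper's implementation; the function names and argument names are illustrative.

```python
import numpy as np

def ecdf(sample):
    """Empirical CDF of a sample, returned as a callable."""
    s = np.sort(sample)
    return lambda y: np.searchsorted(s, y, side="right") / len(s)

def ecdf_inv(sample):
    """Empirical quantile function of a sample, returned as a callable."""
    s = np.sort(sample)
    def quantile(p):
        # Smallest order statistic whose CDF value is at least p.
        idx = np.ceil(np.atleast_1d(p) * len(s)).astype(int) - 1
        return s[np.clip(idx, 0, len(s) - 1)]
    return quantile

def cic_counterfactual_cdf(y, gpa10, gpa00, gpa01):
    """Changes-in-changes counterfactual CDF for tracked dormitory students:
    F_CF(y) = F_GPA10(F_GPA00^{-1}(F_GPA01(y))), where gpa_dt holds GPAs
    for group D = d (dormitory) in period T = t (tracking)."""
    return ecdf(gpa10)(ecdf_inv(gpa00)(ecdf(gpa01)(y)))
```

When all three distributions coincide, the counterfactual CDF reduces to the common CDF, which provides a simple sanity check.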
These identifying assumptions may hold conditional on some covariate vector X but not unconditionally. In my application, some of the demographic characteristics show time trends (table 1). If these characteristics are subsumed in U and in turn influence GPA, then the stationarity assumption A3 will fail. The assumption may, however, hold after conditioning on X.
Athey and Imbens discuss two ways to include observed covariates in the model. The first is a fully nonparametric method that applies the model separately to each value of the covariates. This is feasible only if the dimension of X is low. The second is a parametric method that applies the model to the residuals from a regression of GPA on X. This is valid only under the strong assumption that the observed covariates X and the unobserved scalar U are independent (conditional on D) and additively separable in the GPA production function. Substantively, the additively separable model is misspecified if the treatment effect of tracking at any quantile varies with X. For example, different treatment effects on students with high and low high school graduation test scores would violate this restriction.
I instead use a reweighting scheme that avoids the assumption of additive separability and may be more robust to specification errors. Specifically, I define the reweighted counterfactual distribution at each value g as

    F^{RW,CF}_{GPA11}(g) = F_{GPAω10}(F^{-1}_{GPAω00}(F_{GPA01}(g)))        (5)

where F_{GPAωd0}(·) is the distribution function of GPA weighted by Pr(T = 1|D = d, X)/Pr(T = 0|D = d, X). Intuitively, this scheme assigns high weight to students in the random assignment period whose observed characteristics are similar to those in the tracking period.[41] This is a straightforward adaptation of the reweighting techniques used in wage decompositions and program evaluation (DiNardo, Fortin, and Lemieux, 1996; Hirano, Imbens, and Ridder, 2003).
The counterfactual distribution is identified under conditional analogues of assumptions A1–A5.[42] Hence, the q-th quantile treatment effect on the tracked students is

    τ^{QTT}(q) = F^{-1}_{GPA11}(q) − F^{RW,CF,-1}_{GPA11}(q).        (6)

[41] I could instead use F^{RW,CF}_{GPA11}(g) = F_{GPAω10}(F^{-1}_{GPAω00}(F_{GPAω01}(g))) as the counterfactual distribution, with weights Pr(T = 1, D = 1|X)/Pr(T = t, D = d|X) for (d, t) ∈ {0, 1}². This reweights all four groups of students to have the same distribution of observed characteristics. Balancing the distributions may increase the plausibility of the assumption that dormitory and non-dormitory students share the same production function h(x, u, t). Results from this model are similar, but with larger negative effects in the left tail.
I do not formally establish conditions for consistent estimation. Firpo (2007) recommends a series logit model for the propensity score Pr(T = 1|D = d, X), with the polynomial order chosen using cross-validation. I report results with a quadratic function; the treatment effects are similar with linear, cubic, and quartic specifications. I implement the estimator in four steps:
1. For D ∈ {0, 1}, I regress T on student gender, language, nationality, race, a quadratic in high school graduation test scores, and all pairwise interactions. I construct the predicted probability P̂r(T = 1|D, X).
2. I evaluate equation 5 at each half-percentile of the GPA distribution (i.e. quantiles 0.5 to 99.5). I plot this counterfactual GPA distribution for tracked students in the first panel of figure 3, along with the observed GPA distribution.
3. I plot the difference between observed and counterfactual distributions at each half-percentile in the second panel of figure 3.
4. I construct a 95% bootstrap confidence interval at each half-percentile, clustering at the dormitory-year level and stratifying by (D, T).
The Stata code for implementing this estimator is available on my website.
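The core of steps 1 and 2 can be sketched as follows. This is a schematic reimplementation under stated assumptions, not the Stata code referenced above: the array names gpa_dt are hypothetical, and the weights w10 and w00 are assumed to be the estimated ratios P̂r(T = 1|D = d, X)/P̂r(T = 0|D = d, X) from the fitted logit in step 1.

```python
import numpy as np

def weighted_ecdf(values, weights):
    """Empirical CDF with observation weights, normalized to sum to one."""
    order = np.argsort(values)
    v = values[order]
    cum = np.cumsum(weights[order]) / weights.sum()
    return lambda y: np.interp(y, v, cum, left=0.0, right=1.0)

def weighted_quantile(values, weights, p):
    """Left-continuous inverse of the weighted empirical CDF."""
    order = np.argsort(values)
    v = values[order]
    cum = np.cumsum(weights[order]) / weights.sum()
    idx = np.searchsorted(cum, np.atleast_1d(p), side="left")
    return v[np.clip(idx, 0, len(v) - 1)]

def reweighted_qtt(gpa11, gpa10, gpa00, gpa01, w10, w00, q):
    """Quantile treatment effects on the tracked students, equation (6).

    gpa_dt holds GPAs for group D = d in period T = t. By monotonicity,
    the inverse of the counterfactual CDF in equation (5) is
    F^{-1}_{GPA01}(F_{GPAw00}(F^{-1}_{GPAw10}(q))).
    """
    inner = weighted_quantile(gpa10, w10, q)          # F^{-1}_{GPAw10}(q)
    mid = weighted_ecdf(gpa00, w00)(inner)            # F_{GPAw00}(.)
    cf_inv = weighted_quantile(gpa01, np.ones_like(gpa01), mid)
    observed_inv = weighted_quantile(gpa11, np.ones_like(gpa11), q)
    return observed_inv - cf_inv
```

Evaluating reweighted_qtt at q = np.arange(0.005, 1.0, 0.005) would reproduce the half-percentile grid of step 2. With identical samples in all four cells and unit weights, every quantile treatment effect is zero, which serves as a check on the composition.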
[42] Note that the conditional analogues of assumptions A4 and A5 are more restrictive. For example, the common support assumption A4 may hold for the marginal distribution F_{GPA}(·) but not for the conditional distribution F_{GPA|X}(·|x) for some value of x.
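The stratified cluster bootstrap of step 4 resamples whole dormitory-year clusters with replacement within each (D, T) stratum. The sketch below shows the resampling scheme for a generic scalar statistic; in the paper the bootstrapped statistic is the full half-percentile grid of treatment effects, and all names here are illustrative.

```python
import numpy as np

def cluster_bootstrap_ci(stat_fn, data_by_cluster, strata, n_boot=999,
                         alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat_fn(pooled data), resampling whole
    clusters (e.g. dormitory-years) with replacement within each stratum
    (e.g. each (D, T) cell)."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        resampled = []
        for s in set(strata):
            # Draw as many clusters as the stratum originally contains.
            ids = [i for i, st in enumerate(strata) if st == s]
            draw = rng.choice(ids, size=len(ids), replace=True)
            resampled.extend(data_by_cluster[i] for i in draw)
        stats.append(stat_fn(np.concatenate(resampled)))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Resampling clusters rather than individual students preserves within-dormitory dependence, and stratifying by (D, T) holds the relative sizes of the four cells fixed across bootstrap replications.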