WPS7624

Policy Research Working Paper 7624

School Grants and Education Quality
Experimental Evidence from Senegal

Pedro Carneiro, Oswald Koussihouèdé, Nathalie Lahire, Costas Meghir, Corina Mommaerts

Education Global Practice Group
April 2016

Abstract

The effect of increasing school resources on educational outcomes is a central issue in the debate on improving school quality. This paper uses a randomized experiment to analyze the impact of a school grants program in Senegal, which allowed schools to apply for funding for improvements of their own choice. The analysis finds positive effects on test scores at lower grades that persist at least two years. These effects are concentrated among schools that focused funds on human resource improvements rather than school materials, suggesting that teachers and principals may be a central determinant of school quality.

This paper is a product of the Education Global Practice Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at nlahire@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team

School Grants and Education Quality: Experimental Evidence from Senegal

Pedro Carneiro, Oswald Koussihouèdé, Nathalie Lahire, Costas Meghir ∗, Corina Mommaerts

JEL classification: H52; I22; I25; O15

Keywords: Quality of education; Decentralization; School resources; Child Development; Clustered Randomized Control Trials

∗ This is a revised version of "Decentralizing Education Resources: School Grants in Senegal". Pedro Carneiro is at University College London, IFS and CEMMAP. Oswald Koussihouèdé is at the Programme for the Analysis of Education Systems of the Conférence des Ministres de l'Education des Etats et gouvernements de la Francophonie (PASEC). Nathalie Lahire is at the World Bank. Corina Mommaerts and Costas Meghir are at Yale University. We thank David Evans, Deon Filmer and Waly Wane for helpful comments. Financial support was provided by the Education Program Development Fund (EPDF) of the World Bank. We thank Ibrahima Mbengue and officials from the ministry of education as well as the research team at the Université Gaston Berger - Saint-Louis du Sénégal for invaluable help during the field work. Pedro Carneiro thanks the financial support from the Economic and Social Research Council for the ESRC Centre for Microdata Methods and Practice (grant reference RES-589-28-0001), the support of the European Research Council through ERC-2009-StG-240910 and ERC-2009-AdG-249612. Costas Meghir thanks the ISPS and the Cowles foundation at Yale for financial assistance. Corina Mommaerts thanks the NSF Graduate Research Fellowship for support. All errors and views are ours alone and do not necessarily reflect the views and opinions of the World Bank, the funding bodies or those commenting on the paper.

1 Introduction

In the last 50 years, primary school enrollment has increased dramatically in the developing world.
Even in the poorest areas of Sub-Saharan Africa, gross enrollment rates in primary school are approaching 80 percent (e.g., Glewwe and Kremer (2006)). There is, however, widespread evidence that the quality of education in developing countries remains very low. As a result, increases in school enrollment may not translate into corresponding increases in productivity and wellbeing. This is consistent with recent evidence suggesting that education quality, not quantity, matters most for growth (e.g., Hanushek and Woessmann (2010), Glewwe et al. (2013)).

We address the following question: is it possible to improve the quality of poor schools by providing them with cash transfers? The appeal of this idea lies in its simplicity. The assumption behind it is that local decision makers, such as principals and community leaders, are likely to have a deeper understanding of the needs of their schools than central education authorities, and are therefore in the best position to put these resources to their most efficient use.

We study a school grant program in Senegal, which was developed to decentralize at least a small part of the country's education budget. Through this program, every elementary school in Senegal could apply for funds for a specific school project that seeks to improve the quality of learning and teaching, with the best proposals being selected through a competitive process. The maximum amount a school could receive for a project was USD 3,190, which corresponded to 7 percent of the total annual school budget of a typical school (inclusive of teacher salaries).

We find large and statistically significant effects on test scores one year after the start of the intervention, for children who benefited from school grants when they were in second grade, especially for girls.
The effects are larger for schools in the South of the country, where projects tended to focus on training human resources (teaching and management), compared to the North, where priority was placed on the acquisition of school material (e.g., textbooks/manuals). We do not observe similar program impacts for children in other grades. The point estimates are very similar in the second follow up for the same children, pointing to persistent effects.

Since we examine the impact of the intervention across different tests and different groups of students, for inferential purposes we implement a step-down procedure proposed by Romano and Wolf (2005) that controls the probability of falsely rejecting at least one true null hypothesis, and improves upon more conservative prior methods for multiple hypothesis testing such as the Bonferroni procedure. We show that our main conclusions survive and are unlikely to be due to false rejections.

The evidence on the effect of school resources on primary school student achievement in developing countries is at best mixed (see Glewwe and Kremer (2006), Glewwe et al. (2013), and Murnane and Ganimian (2014) for reviews). While some pedagogical resources, such as textbooks and flipcharts, only have positive effects for high-achieving students (see Glewwe et al. (2009), Glewwe et al. (2004)), other resources such as computer-assisted instruction increased test scores by up to one-half of a standard deviation in India (Banerjee et al. (2007)). If local decision-makers can target resources better than a central authority, however, school grants (and other ways of decentralizing funding) could help boost the effect of school resources by targeting funds toward efficient uses of resources (see Galiani and Perez-Truglia (2013) for a review).

The approach used in Senegal is one of decentralization of school resources, in the sense that it is the schools themselves that define their needs.
Recent work on secondary schools in Argentina and primary schools in the Gambia finds positive effects of decentralization (Galiani et al. (2008), Blimpo et al. (2014)). Meanwhile, cross-country comparisons show negative effects of decentralization for developing countries (Hanushek et al. (2013)). We cannot conclude whether the grants approach is superior to an alternative where resources are directed centrally, because no other approach was tried. However, our results indicate that decentralized distribution of resources through school grants can have positive effects on student achievement, and we present suggestive evidence that factors such as teacher quality may have enhanced the impacts.

The paper proceeds as follows. In Section 2 we describe the school grants program in Senegal and the evaluation design. In Section 3 we describe our data and Section 4 describes our empirical approach. In Section 5 we present our main results and examine potential mediating factors through which the impact of the program may have operated. Section 6 concludes.

2 Description of the Program and Evaluation

Primary schooling in Senegal consists of six years of education and is funded through a mix of government, foreign aid, and household resources. [1]

[Footnote 1: Fees collected from parents represented around ten percent of school funding in 2006 (PASEC (2007)) and are a non-trivial financial burden on families: around one-fifth of students who dropped out in the first year of primary school did so because of limited financial resources of their parents (World Bank (2013)).]

Almost all classroom instruction is conducted in French, while the language spoken by students at home is predominantly not French (only 11 percent of the household interviews were conducted in French). Gross enrollment rates in primary schools increased dramatically over the ten years prior to our study, from 67 percent in 2000 to 92 percent in 2009.
Despite this large increase in enrollment, in 2009 only 60 percent of students completed primary school. In an effort to increase the quality of primary education, Senegal's Ministry of Education initiated this school grants program.

2.1 School Grants in Senegal

For the past several years, Senegal has used school grants (projets d'école) as a tool to fund improvements in education quality, based on the premise that school-level actors are in the best position to identify a school's unique deficiencies and the most workable solutions to address them. Beginning in 2009, the emphasis of these grants shifted from strengthening the physical environment toward pedagogic issues. At that point the government also sought technical and financial support from the World Bank to rigorously evaluate the program.

The main goal of the program was to improve school quality, as measured by student learning outcomes, specifically by improving pedagogical resources in the school. Instead of providing general funding for all schools, funds were targeted towards problems identified by the school as major obstacles to quality, and identified by a government evaluation committee (Inspection Départementale de l'Education Nationale, IDEN) as being eligible for funding based on district-level and system-wide priorities. Problems were identified at the local level, in the hope that decentralized decision-making would allow more efficient and effective use of funds.

Generally, the program worked as follows. The Ministry of Education issued a call for proposals, based on the available grant funding, priority areas, and eligible activities (and sometimes eligible regions). Schools that decided to apply for funding completed a grant application for a school project (called the projet d'école) addressing a particular pedagogical issue faced by the school. Another important component of the program was its role in promoting strong community participation in schools.
As a result, grant applications were prepared by a committee of parents, teachers, and local officials. For schools that received a grant, the grant totaled around 1,500,000 CFA Francs (approximately USD 3,190), which represented a roughly 7 percent increase in expenditures per student in a typical school (inclusive of teacher salaries, which comprise over 90 percent of the budget). [2] We next describe the process through which grants were approved and allocated.

[Footnote 2: These numbers are based on collected self-reports from principals and teachers in our sample.]

2.2 Evaluation Design

In the initial stage of this study, all Senegalese schools were eligible to respond to the call for proposals. The IDEN evaluation committee first ranked the applications and discarded low quality and ineligible applications. The remaining ones, referred to as approved applications, were grouped into two categories. The first consisted of very good proposals which were eligible for financing. The second consisted of strong proposals with potential, but which needed revision. These were sent back to schools with comments from the IDEN evaluation committee, then resubmitted. Figure 1 provides a graphical representation of this process.

Figure 1: Evaluation Design

To implement this process, the procedures manual for the projets d'école was amended (relative to versions used for earlier cohorts of school grants) to include the revision of strong proposals needing adjustments. An additional official document issued by the Ministry of Education was circulated throughout the IDENs in the country, establishing the procedure described above as the norm for the allocation of funds for the next cohort of school projects.

Figure 2: Evaluation Timeline

This process resulted in the selection of 633 projects to fund, whose locations are shown in Figure 4 in Appendix B. [3] For the purposes of the evaluation, these 633 projects were randomly allocated to three funding cohorts.
211 schools were selected randomly to receive funding in the first cohort (June 2009), at the end of the school year. This funding could only be executed at the beginning of the following school year (October/November). Of the remaining schools, 211 were to receive funding in June 2010, and another 211 were to receive funding in June 2011. In practice, the disbursement of the second round of grants did not occur until the first trimester of 2011. This means that between mid-2009 and mid-2011, two groups of schools can be compared. The schools in the first cohort received school grants during this period, while the schools in the second and third cohorts did not and therefore can be used as a comparison group for the schools in the first cohort. The school year runs from October/November through June, allowing us to compare the first cohort to both the second and third cohorts for the 2009-2010 school year and the first cohort to the third cohort for the 2010-2011 school year (see Figure 2).

[Footnote 3: Of these projects, 96 percent included a component to improve French outcomes, 70 percent had a component to improve math outcomes, and 52 percent had a component to improve science outcomes. 82 percent of the projects aimed to build capacity, 63 percent aimed to increase teaching time, and 45 percent aimed to reduce repetition and drop-out. The intended beneficiaries of these projects, in addition to students, were the teachers and principal in 84 percent of projects, and the management committee in 29 percent of projects.]

The randomization among eligible schools is critical for our study: it ensures that the three successive cohorts are statistically comparable, which in turn ensures unbiased estimates of the effect of the program. In this process it is crucial that the control group contains only schools that were judged as eligible but were not selected to receive funding by the randomization process until a later date.
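
The random allocation of the 633 approved projects into three equally sized funding cohorts can be illustrated with a short sketch. This is not the procedure actually run for the evaluation, whose implementation details (software, seed, any stratification) are not reported here; the function name `allocate_cohorts`, the integer school identifiers, and the fixed seed are all assumptions made for illustration.

```python
import random


def allocate_cohorts(school_ids, n_cohorts=3, seed=0):
    """Randomly split schools into equally sized funding cohorts (hypothetical helper)."""
    schools = list(school_ids)
    if len(schools) % n_cohorts != 0:
        raise ValueError("number of schools must be divisible by n_cohorts")
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is reproducible
    rng.shuffle(schools)
    size = len(schools) // n_cohorts
    # Cohort 1 is funded first (treatment); cohorts 2 and 3 are funded later
    # and serve as the comparison group in the meantime.
    return {c + 1: schools[c * size:(c + 1) * size] for c in range(n_cohorts)}


# Illustrative identifiers 0..632 stand in for the 633 approved projects.
cohorts = allocate_cohorts(range(633))
cohort_sizes = {c: len(members) for c, members in cohorts.items()}
```

Because assignment depends only on the shuffle, cohort membership is independent of school characteristics by construction, which is what justifies using later cohorts as a comparison group.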
3 Data and Balance

In order to gather data for this study, three waves of surveys were administered to students and their families, teachers, and principals in these schools. A baseline survey was conducted at the start of the 2009-2010 academic year (in November), just as the first round of grants could be executed. Subsequent surveys took place in November 2010 at the beginning of the 2010-2011 academic year (first follow-up), and in May 2011 at the end of the 2010-2011 academic year (second follow-up).

At baseline, we administered written assessments in mathematics and French to a random sample of 6 children in each of grades 2 and 4, and an oral reading assessment (similar to Early Grade Reading Assessment, or EGRA) to a random sample of 3 of those 6 children in grades 2 and 4. Importantly, the same tests were administered across all waves. In addition, we randomly selected 2 of the 3 children in each grade who took all three assessments, and conducted a household survey that included demographic and financial information on all household members. Finally, we collected classroom and school level information by surveying the school principals and the teachers of the students in our sample.

In the first follow-up, we surveyed and tested the same children again (at the start of 3rd and 5th grade, respectively) and their households, teachers and principals. Schools that received grants in the first cohort answered a set of questions on the use of the extra funds. To examine the possibility that funds were disproportionately channelled to students preparing to enter secondary school, we also administered written assessments in mathematics and French to a random sample of children who were in 6th grade at follow-up, and also surveyed their teachers.

In the second follow-up, we re-surveyed and tested the same children who were tested at baseline and first follow-up.
In addition, in the second follow-up we administered the Peabody Picture Vocabulary Test (PPVT) to children and their mothers. We did not collect general school and classroom information in the second follow-up. [4]

Of the 633 schools, split randomly into three cohorts of 211 schools each, we sampled 525. We were able to contact 478 schools at baseline (among which 447 were successfully surveyed), 528 at first follow-up [5] (among which 517 were successfully surveyed), and 340 at second follow-up (among which 325 were successfully surveyed and tested). [6] The schools that were not included at baseline were out of bounds either due to inclement weather or rebel activity in the South. While this may have impacted the representativeness of the baseline sample, it did not affect the balance as accessibility was not correlated with treatment status, as we will report later. Due to budgetary constraints, in the second follow-up we dropped Cohort 2 schools, and ended up with a sample of 352 schools, of which 325 schools were successfully surveyed and tested. Since cohorts were randomly allocated, this did not introduce bias.

Table 1 shows descriptive statistics and balance between treatment and control schools for grades 2 and 4. Columns 2 and 4 show means and standard deviations of baseline characteristics in control schools, and columns 3 and 5 report the differences in characteristics between treatment and control at baseline and their standard errors. Panel A reports test scores. [7] The resulting mean scores for the French, mathematics, and oral tests (calculated as the proportion of correct responses on the exam) were around 20-40 percent. The same tests were administered at first follow-up, so these scores allowed room for noticeable improvement. The fourth row corresponds to an index of the three tests (which is the first principal component of these three tests, standardized to have unit variance). Panel B shows household characteristics of the students.
On average these students live less than a kilometer from the school and miss one day of school per month. Their households spend a fair amount of their income on education expenses as compared to household food consumption, and over half of the parents claim to be involved in school activities. Only 10 percent of the household interviews were conducted in French.

[Footnote 4: During the second follow up we also tested a random sample of 2nd, 4th, and 6th grade students in French and mathematics, but did not collect any household information on them. However, in this study we concentrate on the panel of children we originally selected at baseline as planned in the randomization protocol. This ensures our results are not in any way affected by composition effects due to mobility of children that could have been induced as a result of the program.]

[Footnote 5: We contacted more schools in the first follow-up than we originally sampled because the enumerators accidentally went to an extra treatment and two extra control schools that we had not originally planned on sampling.]

[Footnote 6: See Appendix Table 14 for the corresponding number of student-level observations and attrition. In Appendix Table 15 we show the difference in baseline characteristics between treatment and control schools, for students who did not leave the sample between baseline and first follow-up or second follow-up, respectively. The sample is similarly balanced as our main sample (see below).]

[Footnote 7: The full distribution of test scores is in Appendix A.]

Panel C reports school characteristics. The average school in our sample is not small: it has 347 students and 10 teachers, half of whom hold a baccalaureate degree and half of whom participated in training in the five years preceding the intervention. The schools are varied in their resources: 56 percent have electricity, and 23 percent have a library. Three-quarters of principals have a baccalaureate degree. Treatment and control schools are very well balanced.
All but two differences (parental involvement in school and the percent of teachers who report receiving training in the past five years, both for second grade) are insignificant at the 5% level. It is noteworthy that the precision of the difference in test scores is very high, which bodes very well for our ability to detect even small effects of the program. [8]

[Footnote 8: With the exception of the index score, we chose not to standardize the mathematics, French, and oral scores. The tests were designed to appropriately measure the types of skills taught in the first years of elementary school, and looking at the proportion of right answers in this test is a natural way to assess student knowledge in these subjects, and its progress over time. Furthermore, these scores are specific to Senegal, so standardization would not be useful for international comparisons. Even within sample, we show in Appendix B that the distribution of scores is highly non-normal, so a one standard deviation change in test scores does not have the usual meaning. Nevertheless, for our main results we report standard deviations of control schools to convert results to standard deviations.]

As explained above, some schools were inaccessible at baseline, and thus were only added to the survey in the first follow up (although they participated in the randomization, and the treatment schools in this group were funded as planned). The exclusion from baseline was unrelated to treatment status, which explains why the baseline schools are nevertheless balanced. In Appendix Table 6 we present descriptive statistics for all schools including those added at the first follow up. As we expect, when we compare the characteristics of treatment and control schools which we did not expect to change as a result of the experiment, there is no significant difference, other than possibly in distance from school. However, this is just one significant difference among many differences; jointly there are no differences and this one is very small in magnitude. Hence, whether we look at schools surveyed at baseline or at the first follow up, there is no evidence of imbalances between treatment and control with respect to their time-invariant characteristics.

Another concern is that these 633 schools may be fundamentally different from other primary schools in Senegal as a result of the grant selection process (e.g., these schools were better organized to put together a good grant application). Thus, they may not constitute a random set of schools in Senegal and the results of this study may not generalize. In Appendix Table 7, we show characteristics of a nationally representative sample of Senegalese households using data collected in 2006 by PASEC, [9] a survey aimed at assessing educational attainment in primary school, and variables that correspond to those in our data. Schools in our sample have fewer students and are more likely to have electricity than the average school in Senegal, but are similar on other measures, including the literacy rates, the number of teachers and their education, and whether the school has a library. At least in terms of these variables, our sample does not look drastically different from the average Senegalese primary school.

Table 1: Baseline Descriptive Statistics and Balance, by Grade

                                                Grade 2                             Grade 4
                                                Control           Treat-Control     Control           Treat-Control
Panel A: Test Scores
Percent Correct: French                         0.42 (0.22)       -0.01 (0.02)      0.39 (0.17)       0.00 (0.01)
Percent Correct: Math                           0.37 (0.23)       -0.00 (0.02)      0.33 (0.19)       -0.01 (0.02)
Percent Correct: Oral                           0.22 (0.17)       0.01 (0.02)       0.55 (0.24)       -0.01 (0.02)
Index Score (standardized)                      0.00 (0.98)       -0.00 (0.09)      0.02 (0.98)       -0.05 (0.09)
Panel B: Household Characteristics
Days of school missed last week                 0.17 (0.86)       0.07 (0.07)       0.16 (0.75)       -0.07 (0.04)*
* Student works after school                    0.01 (0.10)       0.01 (0.01)       0.02 (0.14)       -0.01 (0.01)
Household size                                  9.26 (4.06)       0.00 (0.32)       9.07 (4.05)       0.14 (0.33)
Number of children in household                 5.25 (2.61)       -0.03 (0.20)      5.14 (2.77)       0.21 (0.22)
Head has any education                          0.60 (0.49)       -0.02 (0.04)      0.56 (0.50)       -0.06 (0.04)
Percent of adult females with any education     0.37 (0.41)       -0.03 (0.03)      0.31 (0.39)       -0.01 (0.03)
Distance to school (km)                         0.71 (0.91)       -0.07 (0.06)      0.76 (2.56)       -0.03 (0.12)
Parent involved in school                       0.38 (0.49)       0.09 (0.04)**     0.45 (0.50)       -0.07 (0.04)*
Expenditure on household food (1,000s CFA)      21.83 (15.45)     1.26 (1.17)       22.11 (15.50)     0.51 (1.20)
Expenditure on uniform (1,000s CFA)             2.43 (1.19)       0.15 (0.36)       2.28 (1.13)       0.06 (0.35)
Expenditure on tuition (1,000s CFA)             1.10 (1.18)       -0.01 (0.09)      1.03 (1.01)       0.04 (0.09)
Expenditure on supplies (1,000s CFA)            3.85 (5.64)       -0.38 (0.29)      4.34 (4.16)       -0.38 (0.27)
Student has tutor                               0.15 (0.36)       -0.01 (0.03)      0.14 (0.35)       -0.01 (0.03)
Home has electricity                            0.47 (0.50)       0.03 (0.04)       0.45 (0.50)       0.02 (0.04)
Home has modern toilet                          0.54 (0.50)       -0.01 (0.04)      0.50 (0.50)       0.01 (0.04)
Land owned (hectares)                           2.37 (3.47)       0.43 (0.46)       2.89 (9.11)       -0.50 (0.43)
Interview conducted in French                   0.12 (0.32)       -0.04 (0.02)*     0.11 (0.31)       -0.01 (0.02)
Panel C: School and Teacher Characteristics
Distance to nearest city (km)                   18.38 (25.01)     -0.07 (2.18)      18.03 (24.56)     0.21 (2.20)
Locality population (100,000s)                  1.38 (4.40)       0.04 (0.46)       1.41 (4.43)       0.03 (0.45)
Locality has health center                      0.71 (0.45)       0.03 (0.04)       0.71 (0.45)       0.03 (0.04)
School located in South                         0.18 (0.39)       -0.01 (0.04)      0.19 (0.39)       -0.01 (0.04)
School has Electricity                          0.57 (0.50)       0.01 (0.05)       0.57 (0.50)       0.01 (0.05)
Number of Teachers                              9.68 (4.97)       0.44 (0.51)       9.74 (4.93)       0.57 (0.52)
Number of Pupils                                341.11 (252.39)   28.47 (25.60)     343.65 (253.37)   35.57 (26.05)
School has library                              0.21 (0.40)       0.08 (0.04)*      0.21 (0.41)       0.08 (0.04)*
Number of computers                             1.28 (4.39)       -0.01 (0.40)      1.30 (4.39)       0.01 (0.40)
Number of manuals in classroom                  59.90 (45.18)     3.17 (4.58)       66.43 (51.96)     5.68 (5.40)
Percent teachers female                         0.32 (0.24)       0.01 (0.02)       0.32 (0.23)       0.01 (0.02)
Average teacher age                             33.12 (4.24)      -0.13 (0.39)      33.26 (4.23)      -0.10 (0.39)
Percent of teachers with Baccalaureate          0.41 (0.23)       -0.02 (0.02)      0.41 (0.22)       -0.02 (0.02)
Average teacher experience                      6.56 (3.69)       0.08 (0.35)       6.61 (3.69)       0.13 (0.35)
Percent teachers with training in past 5 years  0.47 (0.50)       0.10 (0.05)**     0.47 (0.50)       0.01 (0.05)
Percent of principals with Baccalaureate        0.74 (0.44)       -0.05 (0.04)      0.74 (0.44)       -0.06 (0.04)

Notes: Grouped columns 2 and 4 report means and standard deviations of baseline characteristics in control schools for grades 2 and 4, respectively. Grouped columns 3 and 5 report differences in characteristics between treatment and control schools at baseline and their standard errors, clustered by school. * p < 0.10, ** p < 0.05, *** p < 0.01

4 Empirical Approach and Inference

We use a regression approach to estimate the impacts of the program. Specifically, the impacts are the estimated $\beta^k_t$ coefficients from the following regression:

$$Y^k_{ist} = \alpha^k_t + \beta^k_t G_s + X_{is} \lambda_t + \varepsilon^k_{ist} \qquad (1)$$

where $Y^k_{ist}$ is the proportion of correct answers in test $k$, for student $i$ in school $s$ at follow up $t$ (1 or 2), $G_s$ is a treatment indicator, $X_{is}$ are conditioning variables measured at baseline, and $\varepsilon^k_{ist}$ is an error term. Conditioning variables include household size, number of children, whether the head has any education, distance to school, a wealth index, [10] the interview language, and the baseline scores of all tests. Since household interviews were conducted for only a random subsample of students, two-thirds of our sample has missing household characteristics (at random). In order to keep these observations, we assign zeros to conditioning variables if they are missing and include dummies for observations with missing conditioning variables. [11] We report standard errors, clustered at the school level, and symbols ***, **, and * to denote significance at the 1%, 5%, and 10% level of standard single hypothesis tests, respectively.
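
The estimation step in equation (1), together with the zero-fill-plus-dummy treatment of missing household covariates and school-level clustering, can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the helper names `add_missing_dummies` and `ols_cluster`, the simulated sample sizes, and the effect size of 0.03 are assumptions, and the sandwich variance omits the finite-sample corrections that statistical packages usually apply.

```python
import numpy as np


def add_missing_dummies(X):
    """Zero-fill missing covariates and append indicators for missingness."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = np.where(miss, 0.0, X)
    dummies = miss[:, miss.any(axis=0)].astype(float)
    if dummies.shape[1] > 1:
        # Covariates missing for exactly the same rows give identical
        # (collinear) indicators, so keep only one copy of each pattern.
        dummies = np.unique(dummies, axis=1)
    return np.hstack([filled, dummies])


def ols_cluster(y, G, X, cluster):
    """OLS of y on [1, G, X]; returns coefficients and cluster-robust SEs."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    Z = np.column_stack([np.ones(len(y)), G, X])
    bread = np.linalg.inv(Z.T @ Z)
    coef = bread @ Z.T @ y
    resid = y - Z @ coef
    meat = np.zeros_like(bread)
    for c in np.unique(cluster):
        rows = cluster == c
        score = Z[rows].T @ resid[rows]  # sum of z_i * e_i within one school
        meat += np.outer(score, score)
    V = bread @ meat @ bread  # sandwich estimator, no small-sample correction
    return coef, np.sqrt(np.diag(V))


# Simulated data only: 60 schools with 12 tested pupils each.
rng = np.random.default_rng(0)
school = np.repeat(np.arange(60), 12)
G = (school < 20).astype(float)              # hypothetical treated schools
X = rng.normal(size=(school.size, 2))
X[rng.random(school.size) < 2 / 3] = np.nan  # household data missing at random
Xf = add_missing_dummies(X)
y = 0.5 + 0.03 * G + 0.05 * Xf[:, 0] + rng.normal(0, 0.2, school.size)
coef, se = ols_cluster(y, G, Xf, school)
beta_hat, beta_se = coef[1], se[1]           # estimate of beta and its SE
```

The treatment coefficient sits in position 1 of `coef` because the design matrix is ordered as intercept, treatment indicator, covariates. Since treatment varies only at the school level, ignoring the clustering would typically understate `beta_se`.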
[Footnote 9: Programme for the Analysis of Education Systems of the Conférence des Ministres de l'Education des Etats et gouvernements de la Francophonie.]

[Footnote 10: The wealth index is standardized to have unit variance and is defined as the first principal component of the following variables: the home has electricity, the home has plumbing, the home has a radio, the home has a television, the home has a telephone, the home has a computer, the home has a refrigerator, the home has gas, the home has an iron, the home has a bicycle, the home has an automobile, the home has a bed, the home has a modern toilet, the number of chickens, the number of sheep, the number of cows, the number of horses, the number of donkeys, the amount of land, savings, debt, food expenditure, child expenditure, other expenditure, wall material, ground material, and roof material.]

[Footnote 11: Results without conditioning variables are presented in Appendix Table 8 and they are almost identical, but of course less precise.]

In addition, since we are testing multiple hypotheses at once, we compute levels of significance for each coefficient using the step-down approach of Romano and Wolf (2005). In this way we control for the family-wise error rate (FWE). The FWE is defined as the probability of incorrectly identifying at least one coefficient as significant, which becomes more likely as the number of hypothesis tests increases. The Romano-Wolf approach improves upon more conservative classical methods such as the Bonferroni correction by applying a "step-down" algorithm that takes advantage of the dependence structure of individual tests. Our approach is to control for a FWE of 5 and 10 percent and mark each coefficient that is significant at each of these rates with †† and †, respectively. However, testing too many hypotheses at once may reduce power to detect anything significant. We thus test multiple hypotheses in related groups rather than for all effects reported in the paper.
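
The step-down idea can be sketched in a few lines. The text does not spell out how the resampled statistics were generated, so the sketch below assumes that a matrix `t_boot` of bootstrap t-statistics, already centered so that they mimic the null distribution (for example from resampling whole schools), is available; the function name `stepdown_pvalues` and the toy numbers are illustrative only. It computes adjusted p-values in the max-t form: hypotheses are ordered from most to least significant, and each one is compared with the bootstrap distribution of the largest statistic among the hypotheses not yet removed.

```python
import numpy as np


def stepdown_pvalues(t_obs, t_boot):
    """Step-down (max-t) adjusted p-values from null-centered bootstrap draws.

    t_obs  : length-S array of observed t-statistics
    t_boot : (B, S) array of bootstrap t-statistics (assumed input)
    """
    t_obs = np.abs(np.asarray(t_obs, dtype=float))
    t_boot = np.abs(np.asarray(t_boot, dtype=float))
    n_boot, n_hyp = t_boot.shape
    order = np.argsort(-t_obs)            # most significant hypothesis first
    p_adj = np.empty(n_hyp)
    running = 0.0
    for step, j in enumerate(order):
        remaining = order[step:]          # hypotheses not yet stepped past
        max_boot = t_boot[:, remaining].max(axis=1)
        p = (1 + np.sum(max_boot >= t_obs[j])) / (n_boot + 1)
        running = max(running, p)         # keep adjusted p-values monotone
        p_adj[j] = running
    return p_adj


# Toy example with two hypotheses and four bootstrap draws (made-up numbers).
t_obs = np.array([3.0, 1.0])
t_boot = np.array([[0.5, 0.2],
                   [2.0, 1.1],
                   [3.5, 0.3],
                   [0.1, 1.5]])
p_adj = stepdown_pvalues(t_obs, t_boot)
```

In the toy example the second hypothesis is compared only with bootstrap draws of its own statistic once the first has been stepped past, rather than with the maximum over both. That shrinking comparison set is the source of the power gain over single-step corrections, and using the joint bootstrap draws is how the dependence between tests enters.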
5 Results

5.1 Overall Treatment Effects

We begin by showing the overall effect of the program for students in grades 2 and 4 at baseline (they were in grades 3 and 5 at follow-up). As explained above, at first follow-up we have measurements of student performance in written tests in French and mathematics, as well as an oral test that covers sound, letter and word recognition, and reading comprehension, but (for cost reasons) was only administered to a third of the students who take written tests. For each of these three tests we compute the proportion of correct answers given by each student. In addition, we use the first principal component as a summary index of these three tests, which is standardized to have mean zero and standard deviation 1. For the second follow-up, we also have scores for the Peabody Picture Vocabulary Test, which is standardized to have mean zero and standard deviation 1 (within grade).

The results are in Table 2. Panel A concerns the first follow up, which was administered at the start of grades 3 and 5 respectively, about a year after the disbursement of the project funds, while Panel B relates to the second follow up at the end of grades 3 and 5 for the same children, whom we are following throughout. The first three columns report effects on French, mathematics, and oral test scores. The fourth column provides a summary measure by reporting the first principal component of these three tests. [12] Column 5 reports PPVT scores, which were obtained only in the second follow-up.

[Footnote 12: One interpretation of the individual tests is that they are noisy measurements of one underlying human capital factor. By using the first principal component of the three tests, we may improve precision.]

Table 2: Program Impacts on Grades 3 and 5 Test Scores

                     French          Math            Oral            Index           PPVT
Panel A: Beginning of Grade (First Follow-Up)
Overall              0.021**†        0.019**†        0.019*†         0.080*†
                     (0.010)         (0.010)         (0.010)         (0.044)
Observations         5368            5361            2732            2679
Control Mean (SD)    0.51 (0.23)     0.49 (0.23)     0.50 (0.27)
Grade 3              0.029**†        0.027**†        0.029**†        0.126**†
                     (0.014)         (0.012)         (0.014)         (0.060)
Observations         2720            2718            1385            1350
Control Mean (SD)    0.53 (0.25)     0.54 (0.24)     0.35 (0.22)
Grade 5              0.011           0.010           0.008           0.027
                     (0.011)         (0.012)         (0.013)         (0.053)
Observations         2648            2643            1347            1329
Control Mean (SD)    0.48 (0.20)     0.44 (0.20)     0.64 (0.24)
Panel B: End of Grade (Second Follow-Up)
Overall              0.020*          0.005           0.026**†        0.094*†         0.057
                     (0.012)         (0.011)         (0.012)         (0.054)         (0.082)
Observations         3338            3327            1686            1620            1122
Control Mean (SD)    0.63 (0.22)     0.62 (0.22)     0.58 (0.26)
Grade 3              0.035**†        0.017           0.039**†        0.160**†        0.153
                     (0.016)         (0.015)         (0.018)         (0.077)         (0.096)
Observations         1732            1721            853             826             566
Control Mean (SD)    0.66 (0.23)     0.68 (0.23)     0.45 (0.23)
Grade 5              0.003           -0.008          0.007           0.013           -0.060
                     (0.012)         (0.013)         (0.014)         (0.061)         (0.097)
Observations         1606            1606            833             794             556
Control Mean (SD)    0.59 (0.20)     0.57 (0.20)     0.72 (0.21)

Notes: Standard errors are in parentheses and are adjusted for clustering. * p < 0.10, ** p < 0.05, *** p < 0.01 correspond to p-values from the usual single-hypothesis tests. † corresponds to significance at the 10% level of Romano Wolf (2005) p-values from joint tests of French, mathematics, and oral (3 tests each, by row) or to the index alone. Conditioning variables: Grade, gender, household size, number of children, education of head, distance to school, wealth index, interview language, baseline scores, missing dummies.

In the pooled sample, the index shows an improvement equal to 8.0% of a standard deviation in the first follow up, which is significant at the 5.2% level.
This improvement is maintained and increased to 9.4% of a standard deviation in the second follow-up, which has a p-value of 7.2%. Thus overall the program improved outcomes in the schools. When we break down the index into the individual tests we administered and adjust the p-values for multiple testing, we find that all test scores improved in the first follow-up by similar amounts and their adjusted p-values are less than 10%. In the second follow-up the improvement in math was lost but the improvements in French and the oral test remained and are both significant at least at the 10% level.

The improvement in overall test scores is largely driven by effects in third but not fifth grade. There are large impacts of school grants on third grade test scores across all tests. Test scores increased by almost 3 percentage points, which is a large effect in light of the means (and standard deviations) of test scores. When we look at the aggregate index of the three tests, the school grant increases third grade school performance by 0.126 of a standard deviation at the first follow-up. The effect on the index survives at 0.16 of a standard deviation through the end of grade three, indicating that the program impacts persisted two years after the grant was disbursed to schools.

It is interesting that a relatively small grant is able to improve children's learning outcomes to this extent. By contrast, in Glewwe and Kremer's (2006) survey of the recent literature on the effectiveness of improvements in school resources on students' learning in developing countries, there are several interventions that show no significant impact. In developed countries, there are even fewer examples of successful school resource interventions (Hanushek (2006)). It is possible that the intervention improved outcomes because it provided cash in a decentralized way to local decision makers, who could then put these funds to an efficient use.
Nevertheless, there is abundant evidence of leakages in other similar grant programs across the world (Reinikka and Svensson (2004), Bruns et al. (2011)). If the extent of local capture of these funds is also substantial in Senegal then the results in this paper are even more remarkable because they would have been produced with minimal resources.

However, these effects are absent for fifth graders: the impacts are numerically close to zero and statistically insignificant by any criterion. The standard errors of the estimates are similar across grades, but the point estimates are much smaller.¹³ This is perhaps surprising. However, principals may be investing more in earlier grades driven by a belief that learning delays emerge early in the life of the child. Indeed such a belief is actively promoted by PASEC.

Using data from the teachers' questionnaires at follow-up we investigate whether there were differential impacts of school grants on observable investments in 3rd and 5th grade students in Panel A of Appendix Table 9.¹⁴ Some of the variables we can study are classroom materials (e.g., textbooks/manuals, desks, tables, etc.), and teacher training. We find no differential impact of the program in any of these. When we examine other classroom characteristics or teacher behaviors, the only interesting difference to report concerns student (mis-)behavior in the classroom. While in third grade there was a positive impact of the program on student behavior as measured by the number of times a day a teacher needs to demand silence, in fifth grade there was a negative impact of the program on student behavior measured by this variable, and by the number of times a teacher has to punish a child for impolite behavior. Observable parental investments are not different between grades three and five (see Panel B of Appendix Table 9), which is prima facie evidence that the differences are attributable to the effectiveness or administration of grants between the grades.

13 Therefore, the lack of statistically significant results in grade 5 (but not in grade 3) does not appear to be due to a lack of power. If the point estimates for grade 5 were as large as those for grade 3 it is likely that we would be able to reject that they were statistically equal to zero. When designing our study we anticipated that with our sample we would be able to detect program impacts of between 0.2 and 0.3 standard deviations, which is in line with what we find.
14 Ideally we would want to do this using 2nd and, say, 4th grade students, but we do not have the follow-up data for these teachers, although we have baseline data for them.

5.2 Distributional Impacts

Whether the program has different effects across the distribution is an important question relating to targeting. In Figure 3 we show parameter estimates together with their 95% confidence intervals from a quantile regression of the relevant test scores for grade three in the first follow-up (first column) and second follow-up (second column), on the treatment indicator and including the usual controls, clustered by school. The effects of the grant are generally spread over most of the distribution as shown by the index in the fourth row, although the results are less precise due to smaller sample sizes, particularly in the second follow-up. In the second follow-up, for mathematics the effects are larger at the lower end of the distribution, while for French, oral, and PPVT scores the effects are somewhat larger in the mid- to the upper end of the distribution. In the remaining part of the paper we look in greater detail at these results and consider heterogeneity of effects and underlying mechanisms.

Figure 3: Distributional Impacts on Test Scores in Third Grade. [Figure not reproduced. Columns: Beginning of Grade 3, End of Grade 3. Rows: French, Math, Oral, Index, PPVT. Horizontal axis: decile.] Notes: Point estimates from a quantile regression at each decile with 95% confidence intervals. Index and PPVT coefficients are standardized. Beginning of grade 3 is first follow-up. End of grade three is second follow-up.

5.3 Heterogeneous Impacts

In this section we consider characteristics by which the impact of the school grants may plausibly differ: gender, prior ability, and region (the South is much poorer and geographically distinct from the North). For baseline ability, we convert corresponding baseline test scores into a "high" (above median) or "low" (below median) binary variable.¹⁵ For region, we distinguish schools located in the most southern regions in Senegal (Ziguinchor and Kolda) from schools in the rest of the country. We consider these regions separately because Ziguinchor and Kolda are much poorer regions (ANSD (2007)) and have been beset by problems related to rebel activity.

The regressions we run to construct Table 3 extend equation (1) to include an interaction between the treatment variable $G_s$ and a pre-determined variable $W_{ist}$ (gender, baseline ability, or region):

$$Y^k_{ist} = \alpha^k_t + \beta^k_t G_s + \delta^k_t (G_s * W_{ist}) + \psi^k_t W_{ist} + \varepsilon^k_{ist} \qquad (2)$$

Since our larger estimates of program impacts were for students in 3rd grade, who were first exposed to the program in 2nd grade, we focus this analysis of heterogeneous impacts on them.¹⁶ The results are shown in Table 3. Panel A reports results from the first follow-up, and Panel B reports results from the second follow-up. Each panel reports program impacts for each $W_{ist}$ as well as control means and standard deviations.

There are large differences in program impact by gender.
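To make the role of the interaction term in equation (2) concrete, the following is a minimal sketch on simulated data. It assumes a binary $W_{ist}$, omits the conditioning variables and the clustering of standard errors used in the paper, and every name and number in it (`fit_interaction`, the simulated coefficients) is hypothetical rather than taken from the study.

```python
import numpy as np


def fit_interaction(y, g, w):
    """OLS of y on a constant, G, G*W and W, as in equation (2) without controls."""
    y, g, w = (np.asarray(a, dtype=float) for a in (y, g, w))
    X = np.column_stack([np.ones_like(y), g, g * w, w])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta, delta, psi = coef
    return {
        "alpha": alpha,
        "beta": beta,    # program impact when W = 0
        "delta": delta,  # additional impact when W = 1
        "psi": psi,
        "effect_w0": beta,
        "effect_w1": beta + delta,
    }


# Simulated example: treatment G and subgroup indicator W are both binary.
rng = np.random.default_rng(1)
n = 4000
g = rng.integers(0, 2, n)
w = rng.integers(0, 2, n)
y = 0.50 + 0.01 * g + 0.03 * g * w - 0.02 * w + rng.normal(0.0, 0.2, n)

fit = fit_interaction(y, g, w)
print(f"impact for W=0: {fit['effect_w0']:.3f}")
print(f"impact for W=1: {fit['effect_w1']:.3f}")
```

In this parameterization the subgroup-specific impacts of the kind reported in Table 3 correspond to $\beta$ for the group with $W_{ist} = 0$ and $\beta + \delta$ for the group with $W_{ist} = 1$, while a test of $\delta = 0$ asks whether the two impacts differ.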
For females, the program increased test scores by 3 to 5 percentage points in the first follow-up, and the impacts increased even more in the second follow-up, with the exception of mathematics. This indicates that, for girls, program impacts persisted two years after the grant was disbursed to schools. The effects we report for girls are all individually significant, except for the PPVT and mathematics in the second follow-up. However, we note that, based on the step-down p-values, the effects are not significant in the second follow-up (albeit the sample used is smaller, since we could not use cohort 2 schools). While the individual test score differences between genders are not significant, the difference in the overall index is significant at least at the 10% level.

15 As mentioned, several schools were missing at baseline. In Appendix Table 10 we show that missing schools at baseline are mainly in the South, and that they display worse student performance in the first follow-up than comparable non-missing schools. It is noteworthy that they are not disproportionately control or treatment schools.
16 A similar analysis performed on the test results of students in 5th grade did not produce evidence of any program impacts for this set of students (see Appendix Table 11).

Table 3: Program Impacts on Grade 3 Test Scores by Gender, Ability, and Region

| | French | Math | Oral | Index | PPVT |
|---|---|---|---|---|---|
| Panel A: Beginning of Grade (First Follow-Up) | | | | | |
| Male | 0.022 | 0.024 | 0.011 | 0.041 | |
| | (0.017) | (0.014) | (0.017) | (0.073) | |
| Female | 0.037 ∗∗ † | 0.031 ∗∗ † | 0.047 ∗∗∗ † | 0.217 ∗∗∗ † | |
| | (0.016) | (0.014) | (0.017) | (0.073) | |
| Male Control Mean (SD) | 0.54 (0.25) | 0.56 (0.24) | 0.37 (0.22) | 0.03 (0.97) | |
| Female Control Mean (SD) | 0.53 (0.24) | 0.53 (0.24) | 0.33 (0.22) | -0.13 (0.99) | |
| Low Ability | 0.006 | -0.007 | 0.025 | -0.019 | |
| | (0.018) | (0.016) | (0.018) | (0.081) | |
| High Ability | 0.027 | 0.029 ∗ | 0.005 | 0.136 ∗ † | |
| | (0.020) | (0.016) | (0.019) | (0.075) | |
| Low Control Mean (SD) | 0.47 (0.22) | 0.43 (0.20) | 0.25 (0.17) | -0.46 (0.83) | |
| High Control Mean (SD) | 0.62 (0.24) | 0.68 (0.20) | 0.48 (0.20) | 0.49 (0.85) | |
| North | 0.012 | 0.015 | 0.019 | 0.066 | |
| | (0.016) | (0.014) | (0.016) | (0.067) | |
| South | 0.102 ∗∗∗ † | 0.079 ∗∗∗ † | 0.074 ∗∗∗ † | 0.390 ∗∗∗ † | |
| | (0.030) | (0.027) | (0.028) | (0.123) | |
| North Control Mean (SD) | 0.56 (0.24) | 0.57 (0.23) | 0.38 (0.22) | 0.10 (0.95) | |
| South Control Mean (SD) | 0.41 (0.23) | 0.43 (0.22) | 0.23 (0.19) | -0.63 (0.88) | |
| Panel B: End of Grade (Second Follow-Up) | | | | | |
| Male | 0.026 | 0.016 | 0.024 | 0.079 | 0.207 ∗ |
| | (0.018) | (0.017) | (0.022) | (0.087) | (0.115) |
| Female | 0.043 ∗∗ † | 0.019 | 0.054 ∗∗ † | 0.245 ∗∗ † | 0.096 |
| | (0.019) | (0.018) | (0.023) | (0.102) | (0.128) |
| Male Control Mean (SD) | 0.67 (0.23) | 0.69 (0.22) | 0.46 (0.22) | -0.01 (0.92) | -0.07 (0.95) |
| Female Control Mean (SD) | 0.65 (0.24) | 0.67 (0.23) | 0.43 (0.23) | -0.17 (1.03) | -0.10 (1.01) |
| Low Ability | 0.027 | 0.001 | 0.036 | 0.081 | |
| | (0.022) | (0.022) | (0.026) | (0.116) | |
| High Ability | 0.029 | 0.027 | 0.028 | 0.174 ∗ † | |
| | (0.022) | (0.019) | (0.024) | (0.096) | |
| Low Control Mean (SD) | 0.60 (0.23) | 0.60 (0.22) | 0.34 (0.20) | -0.45 (0.93) | |
| High Control Mean (SD) | 0.75 (0.20) | 0.78 (0.19) | 0.56 (0.20) | 0.39 (0.82) | |
| North | 0.024 | 0.000 | 0.023 | 0.087 | 0.130 |
| | (0.018) | (0.016) | (0.020) | (0.084) | (0.097) |
| South | 0.079 ∗∗ † | 0.084 ∗∗ † | 0.105 ∗∗∗ † | 0.450 ∗∗∗ † | 0.181 |
| | (0.037) | (0.033) | (0.035) | (0.161) | (0.250) |
| North Control Mean (SD) | 0.69 (0.23) | 0.71 (0.22) | 0.48 (0.22) | 0.07 (0.92) | -0.21 (0.88) |
| South Control Mean (SD) | 0.58 (0.24) | 0.57 (0.23) | 0.31 (0.21) | -0.67 (0.96) | 0.43 (1.13) |

Notes: Standard errors are in parentheses and are adjusted for clustering. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01 correspond to p-values from the usual single-hypothesis tests. † corresponds to significance at the 10% level of Romano and Wolf (2005) p-values from joint tests of French, mathematics, and oral (3 tests each, by row) or to the index alone. Conditioning variables: Grade, gender, household size, number of children, education of head, distance to school, wealth index, interview language, baseline scores, missing dummies.

There are also several education interventions that benefit mostly girls. It is much less common to find programs that affect boys alone. Some examples of (early childhood) interventions in developed countries that produce larger cognitive and schooling effects in girls than boys are reviewed in Anderson (2008) (see also the results in Heckman et al. (2010), or Ramey and Campbell (2007), regarding education outcomes of these interventions). Similarly, Krueger (1999) reports that the STAR class size experiment produced smaller short-run impacts, but larger cumulative impacts, for girls than boys, and Chetty et al. (2014) show slightly larger long-term impacts of teacher quality on girls than boys. Although it was not directly an educational intervention (though it may have partly operated through access to better schools), the Moving to Opportunity experiment studied in Kling et al. (2007) also shows much stronger impacts for girls. In developing countries, several papers show stronger impacts of interventions on girls than boys (Kremer and Holla (2009)), although these concern primarily interventions that increase access to school.

At baseline girls score between 10% and 20% of a standard deviation below boys in the cognitive tests we administer (not shown) and hence start from a lower base.
However, this explanation of larger program impacts is hard to reconcile with the fact that, as we show below, effects are larger for those with higher baseline ability (and this is especially true for females). An alternative hypothesis would be that girls bring to elementary schools more discipline, patience, and higher levels of maturity overall than boys at a given age, which may make them better able to enjoy the benefits of additional school resources, such as a better teacher, better training manuals, a library, and so on.

The program also had a large impact for higher-ability students: the index of scores (column 4) increased by 0.14 standard deviations in the first follow-up and 0.17 standard deviations in the second follow-up as a result of the program, though the coefficients are only marginally significant. This is consistent with the idea that investments in skills are complementary over time and hence will be more productive for those with high levels of skill to start with. There are several education interventions that share this characteristic. However, despite the large changes in the point estimates across ability groups, the differences are not significant.

We now turn to differences between the North and the South of the country, two very different regions. There are dramatic differences in program impacts depending on whether the school is located in the South of the country (which is poorer and has worse school results) or in the North.¹⁷ In fact, if we focus on 3rd grade French scores, there are no statistically significant impacts of the program in the North of the country, whereas in the South they are very large. For example, as a result of the program, students in southern schools are able to increase the proportion of correct answers by 10.2 percentage points, which is almost 0.5 of a standard deviation. These effects are qualitatively similar for other tests and persist through the end of the grade (second follow-up).
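As a rough check on the conversion from percentage points to standard-deviation units, one can divide the southern French impact at first follow-up by the corresponding control-group standard deviation in Table 3. This assumes that the southern control-group standard deviation (0.23) is the relevant scale; the text does not state which standard deviation was used.

```latex
\[
\frac{\text{impact (South, French)}}{\text{control SD (South, French)}}
  = \frac{0.102}{0.23} \approx 0.44
\]
```

Under that assumption the impact is a little under half a standard deviation, in line with the statement in the text.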
When we examine all of the tests and correct the p-values for multiple testing, the impacts remain significant despite the high number of hypotheses tested. The South-North differences in estimates of the impact of school grants are striking as well as highly significant overall for the first follow-up (p-value 2.1% for the index). It may be the case that the types of investments made in response to the grants varied by region and took different amounts of time to manifest themselves in test scores. In the remainder of the paper we examine whether there are differences between what school principals, teachers, and parents did in response to the availability of school grants in each of these areas, which could help shed light on the sources of regional differences in the impacts of the program on the performance of students.

5.4 Understanding Differences Between South and North

We start by examining baseline test performance differences of third grade students between schools in the South and in the North. These are shown in Table 4, Panel A. Students in the southern schools perform worse on almost all tests than their counterparts in the North. For control schools in the first follow-up, documented in Panel B, the differences between the North and the South are even larger. As mentioned above, at baseline we were only able to survey a subsample of schools. The missing schools (recovered at follow-up) were, as far as we can see, balanced in their treatment and control status, but they were different from the sampled schools. In fact, as we report in the appendix (Appendix Table 13), among control schools, missing schools are worse than the non-missing schools on a number of time-invariant dimensions, as one might expect. Therefore, it is probably safe to say that, once we look at the schools in the follow-up which we are using to measure program impacts, the schools in the South show much lower test results than the schools in the North.
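The significance of the South-North gap in the index can be approximated from the subgroup estimates in Table 3 alone. The sketch below is a back-of-the-envelope calculation, not the authors' procedure: it assumes the two regional estimates are independent (they come from disjoint sets of schools) and approximately normal, whereas the paper's p-value comes from its own regression with controls and clustered standard errors. The function name `subgroup_difference_test` is hypothetical.

```python
import math


def subgroup_difference_test(b1, se1, b2, se2):
    """Two-sided z-test for the difference between two independent estimates."""
    diff = b1 - b2
    se_diff = math.hypot(se1, se2)  # sqrt(se1**2 + se2**2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return diff, se_diff, z, p_value


# Index impacts at first follow-up from Table 3: South 0.390 (0.123), North 0.066 (0.067).
diff, se_diff, z, p_value = subgroup_difference_test(0.390, 0.123, 0.066, 0.067)
print(f"difference = {diff:.3f}, SE = {se_diff:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

Under these assumptions the implied p-value is close to 0.02, in the neighborhood of the 2.1% reported for the index at first follow-up.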
17 In Appendix Table 12 we show that the samples are balanced within each geographic region.

Table 4: Regional Differences, Second-Third Grade

| | South | North | Difference | (SE) |
|---|---|---|---|---|
| Panel A: Test Scores at Beginning of Second Grade (Baseline) | | | | |
| Percent Correct: French | 0.430 | 0.420 | 0.010 | (0.027) |
| Percent Correct: Math | 0.325 | 0.373 | -0.048 ∗∗ | (0.023) |
| Percent Correct: Oral | 0.154 | 0.242 | -0.088 ∗∗∗ | (0.016) |
| Index Score (standardized) | -0.239 | 0.049 | -0.288 ∗∗ | (0.112) |
| Panel B: Test Scores at Beginning of Third Grade (First Follow-Up, Control Schools) | | | | |
| Percent Correct: French | 0.411 | 0.564 | -0.153 ∗∗∗ | (0.022) |
| Percent Correct: Math | 0.434 | 0.569 | -0.136 ∗∗∗ | (0.021) |
| Percent Correct: Oral | 0.233 | 0.383 | -0.150 ∗∗∗ | (0.022) |
| Index Score (standardized) | -0.629 | 0.100 | -0.730 ∗∗∗ | (0.100) |
| Panel C: Household Characteristics (First Follow-Up, Control Schools) | | | | |
| Household size | 8.625 | 10.216 | -1.591 ∗∗∗ | (0.412) |
| Number of children in household | 5.050 | 5.551 | -0.501 ∗ | (0.276) |
| Head has any education | 0.550 | 0.401 | 0.149 ∗∗∗ | (0.050) |
| Percent of adult females with any education | 0.261 | 0.224 | 0.038 | (0.043) |
| Wealth index | -0.654 | 0.137 | -0.792 ∗∗∗ | (0.092) |
| Interview conducted in French | 0.175 | 0.090 | 0.085 ∗∗ | (0.041) |
| Panel D: Project Characteristics (Second Follow-Up, Treatment Schools) | | | | |
| Months since project began | 15.914 | 23.479 | -7.564 ∗∗∗ | (1.144) |
| Students helped draft application | 0.800 | 0.547 | 0.253 ∗∗∗ | (0.082) |
| Project included manuals | 0.800 | 0.895 | -0.095 | (0.074) |
| Project included computer materials | 0.029 | 0.121 | -0.092 ∗∗ | (0.042) |
| Project included teacher training | 0.914 | 0.752 | 0.162 ∗∗ | (0.062) |
| Project included management training | 0.629 | 0.368 | 0.261 ∗∗∗ | (0.093) |
| Project included building courses | 0.971 | 0.821 | 0.151 ∗∗∗ | (0.046) |
| Project included improving general education | 0.563 | 0.456 | 0.106 | (0.100) |
| Project included improving educational outputs | 0.114 | 0.129 | -0.015 | (0.063) |
| Amount spent on principal (1,000,000s CFA) | 0.082 | 0.034 | 0.048 ∗∗∗ | (0.014) |
| Amount spent on teachers (1,000,000s CFA) | 0.317 | 0.278 | 0.039 | (0.058) |
| Amount spent on management (1,000,000s CFA) | 0.128 | 0.041 | 0.087 ∗∗∗ | (0.022) |
| Amount spent on students (1,000,000s CFA) | 0.505 | 1.025 | -0.520 ∗∗∗ | (0.092) |

Notes: Each coefficient reported in the Difference column is the difference between South and North (South minus North). The mean test scores in the South for French, math, and oral at baseline are 0.400, 0.292, and 0.134 for females and 0.461, 0.358, and 0.173 for males, respectively, and at first follow-up are 0.426, 0.430, and 0.215 for females and 0.457, 0.486, and 0.298 for males. Clustered standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01.

Panel C compares household characteristics of students in the South and in the North. Because of the missing schools at baseline, we take characteristics measured in the first follow-up among students in the control schools. A few interesting patterns emerge. Households in the South are poorer but have fewer children and better educated heads (and more prominently so for the families of female students).

Finally, Panel D considers the characteristics of projects being undertaken by schools with the school grant funds. This information comes from a survey conducted in treatment schools which asked principals about the project for which they got funding. We conducted two of these surveys, one at first follow-up, and one at second follow-up. We report estimates from the second follow-up survey when, presumably, data about the project are more mature and complete. In the South, students were much more frequently named as participants in the drafting of the proposal. Although it is not clear what input students may have had, this could indicate that principals were more sensitive to the needs of the students in the South. It is also significant that projects in the South started later. By the end of year 2 of the study, projects in the North had been running 7.6 months longer than in the South.
If results faded out quickly this could explain why we observe larger effects for the more recent projects than for the earlier ones, but this is unlikely to be the case, given our previous results about the sustainability of program impacts (although those are not very precise). If, on the other hand, a project needed time before it started to influence children's learning (as in the case of activities that take time, such as training a teacher, or building a library), we would expect larger impacts for more mature projects, which goes against what we find in terms of the South-North comparison.

Some of the most remarkable differences relate to the components of the project. The schools in the North were more likely to have components involving the purchase of textbooks/manuals and in particular computer-related materials, while schools in the South were much more likely to have components related to training of teachers, building courses, managerial training, and spending on the principal and the teachers. At the same time the Northern schools reported more spending on students. Thus there are clear differences in the characteristics of projects in schools in the North and the South, as stated by the principals of these schools. Schools in the South seem to be investing more in the teaching and management abilities of their human resources, while schools in the North invest more in equipment. This may well be a force behind the large differences in program impacts in these two sets of schools.

Table 5 reports the impact of the program on principals' (Panel A) and teachers' (Panel B) behaviors. We present separate estimates of program impacts in the South and in the North, and test whether differences in program impacts in these two areas are equal to zero (column 3). There are no broad impacts of the school grants on aspects of school infrastructure.
This was expected because, as we mentioned above, the projects had to have an explicit pedagogical emphasis, which did not (in the government's definition) include physical infrastructure. However, one aspect that can be considered infrastructure was very significantly affected by school grants both in the North and in the South: the existence of a library in the school. While the impact is twice as large in the South as in the North, we cannot reject that the two impacts are statistically equal. In addition, schools in the North that received a school grant spent more money on electricity and water for the school.

Regarding school materials and training, we see that the school grants caused an increase in books in the library in the North and an increase in the amount spent on manuals in both regions. In contrast, schools in the South spent substantially more on tutoring while both sets of schools increased spending on teacher training. All this is very much consistent with the way principals described the grant projects, as reported in Table 4. While the point estimates reveal differences in direction in the North and South, it is difficult to be conclusive since none of the impacts are significantly different between the two (except expenditure on electricity and water).

It is also interesting that there was an increase in the number of students in the North, which is not matched by an equally large increase in the number of teachers, and which could lead to a dilution of treatment effects. In the South both these quantities go down, but not significantly. Finally, school grants decreased teacher turnover, particularly in the South.
Given that teachers are likely to be the most important input in the school production function, the fact that in the South the program significantly affected the amount of training they got and how likely they were to remain in the school from one year to the next is consistent with the finding of strong program impacts on student performance in this region of the country.

Panel B shows program impacts on teacher and classroom characteristics as reported by the 3rd grade teacher in the first follow-up. The number of manuals is not reported by the teacher as having increased significantly either in the North or the South, despite the impact on manuals reported above, and measuring equipment in the South is reported to have increased as a response to the program, but not in the North.

Table 5: Program Impacts on School Characteristics by Region, First Follow-Up

| | South | North | Difference |
|---|---|---|---|
| Panel A: School Characteristics | | | |
| Age of youngest infrastructure | 1.135 | 0.298 | 0.837 |
| | (1.500) | (0.928) | (1.764) |
| Number of teachers | -0.975 | 0.823 | -1.798 |
| | (1.085) | (0.564) | (1.223) |
| Number of students | -29.100 | 51.321 ∗ | -80.421 |
| | (49.317) | (29.009) | (57.216) |
| School has library | 0.201 ∗∗ | 0.120 ∗∗ | 0.081 |
| | (0.086) | (0.049) | (0.099) |
| Number of books in library | 15.343 | 85.753 ∗ | -70.410 |
| | (80.607) | (44.284) | (91.971) |
| Amount spent on infrastructure | 40.337 | 53.156 | -12.819 |
| | (39.718) | (40.678) | (56.853) |
| Amount spent on electricity/water | -10.421 | 29.550 ∗ | -39.972 ∗∗ |
| | (7.867) | (15.735) | (17.592) |
| Amount spent on manuals | 27.388 ∗∗ | 23.019 ∗∗ | 4.369 |
| | (11.112) | (10.507) | (15.293) |
| Amount spent on tutoring | 50.230 ∗ | 13.512 ∗ | 36.718 |
| | (29.365) | (7.731) | (30.366) |
| Amount spent on teacher training | 30.487 ∗∗ | 27.856 ∗ | 2.630 |
| | (13.825) | (14.315) | (19.901) |
| Teacher composition changed in past year | -0.201 ∗∗ | -0.064 | -0.138 |
| | (0.086) | (0.042) | (0.096) |
| Percent teachers female | -0.031 | 0.012 | -0.043 |
| | (0.040) | (0.025) | (0.048) |
| Average teacher age | 0.273 | 0.315 | -0.041 |
| | (0.772) | (0.432) | (0.885) |
| Percent of teachers with Baccalaureate | -0.043 | -0.008 | -0.035 |
| | (0.049) | (0.025) | (0.056) |
| Average teacher experience | 0.224 | 0.098 | 0.126 |
| | (0.583) | (0.396) | (0.705) |
| Panel B: Third Grade Teacher Characteristics | | | |
| Minutes spent preparing lesson | 3.226 | 1.894 | 1.332 |
| | (2.941) | (2.061) | (3.591) |
| Number of manuals | 10.475 | 4.990 | 5.486 |
| | (7.647) | (4.980) | (9.125) |
| Number of measuring instruments | 0.805 ∗∗ | -0.039 | 0.844 ∗∗ |
| | (0.338) | (0.208) | (0.397) |
| Times per day ask for silence | -5.060 ∗∗∗ | -0.774 | -4.287 ∗∗ |
| | (1.847) | (0.699) | (1.975) |
| Times per day punish a student | 0.263 | -0.242 | 0.505 |
| | (0.792) | (0.268) | (0.837) |

Notes: Column 3 reports the difference between the impact of the program in the South and the North. Clustered standard errors in parentheses. ∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01. Amounts in CFA.

Finally, the behavior of students is reported to have improved considerably in the South, but not in the North: treatment affected how often teachers had to ask for silence during the day in southern schools. This corresponds to what we found before when we compared the reports of 3rd and 5th grade teachers (see Appendix Table 9). Student behavior improved among 3rd graders but not among 5th graders, which is exactly what happened in terms of test results. The improved behavior may be an outcome driven by improved teaching more generally, which underlies the improved scores.

We also examined the impact of the program on household behaviors in the South and in the North, which is shown in Appendix Table 13. However, there are no noteworthy impacts of the program on household behaviors, and they do not seem to vary with the region of the country where households are located.

The resulting picture from this section is mixed. There are several differences between the South and the North: households are poorer yet more educated in the South, and projects in the South tended to focus on training and human resources and less on information technology. However, when we look at the impact of the grant on how schools use the money, there are no obviously significant differences between the North and South.
Nevertheless, the improvement in behavior in the South is remarkable and one can expect that fewer classroom disruptions (perhaps due to teacher training) can help learning.

6 Conclusions

There is substantial debate about the importance of resources in schools for student performance. More often than not, increases in school resources are not associated with increases in student performance, although much of the research concerns developed countries and the US in particular. One reason may be that central education authorities lack an understanding of the needs of schools. Principals, on the other hand, could have better information and could target resources more efficiently. The danger is that incentives to improve student performance may vary across school principals and there may be several sources of local pressures for alternative uses of these funds.

This paper studies the impact of a school grant program on student performance and on potential mechanisms that could underlie the change in school performance induced by such a program. We find impacts of school grants on student learning, especially on girls with high ability levels at baseline. Notably, these impacts persist over the two years of our evaluation. However, these impacts occur only in third grade, as opposed to later grades, and they are stronger in the South of the country.

These results suggest that resources distributed in a decentralized manner can have positive impacts on students. While it is difficult to explain the grade differential in program impacts, one conjecture is that principals focus on earlier grades because they see there the foundations for future learning and indeed they are encouraged to do so. We can say a bit more, however, about the North-South difference in program impacts, based on how we see principals spending their resources.
While schools in the North emphasized information technology (IT) and other educational materials, schools in the South emphasized human resources, namely through the training of teachers and school administrators. Our results suggest that the latter type of investments, although perhaps less visible to the local community (and therefore less preferred by say, local politicians, or even local school authorities), is likely to be more eective than the former type of investments. This result is also consistent with the idea that the main determinant of school quality is teachers, not equipment, as suggested by the most recent literature on this topic (e.g., Hanushek and Rivkin (2006)). 26 References Anderson, M. L. (2008). Multiple inference and gender dierences in the eects of early interven- tion: A reevaluation of the abecedarian, perry preschool, and early training projects. Journal of the American statistical Association, 103(484). 5.3 ANSD (2007). La pauvrete au senegal: de la devaluation de 1994 a 2001-2002. Technical report, Agence Nationale de la Statistique et de la Demographie. 5.3 Banerjee, A. V., Cole, S., Duo, E., and Linden, L. (2007). Remedying education: Evidence from two randomized experiments in india. The Quarterly Journal of Economics, 122(3):12351264. 1 Blimpo, M., Evans, D., and Lahire, N. (2014). School-based management and educational out- comes, lessons from a randomized eld experiment. 1 Bruns, B., Filmer, D., and Patrinos, H. A. (2011). Making schools work: New evidence on ac- countability reforms. World Bank Publications. 5.1 Chetty, R., Friedman, J. N., and Rocko, J. E. (2014). Measuring the impacts of teachers ii: Teacher value-added and student outcomes in adulthood. The American Economic Review, 104(9):26332679. 5.3 Galiani, S., Gertler, P., and Schargrodsky, E. (2008). School decentralization: Helping the good get better, but leaving the poor behind. Journal of Public Economics, 92(10):21062120. 1 Galiani, S. and Perez-Truglia, R. 
(2013). School management in developing countries. Education Policy in Developing Countries, 4:193.
Glewwe, P., Hanushek, E. A., Humpage, S., and Ravina, R. (2013). School resources and educational outcomes in developing countries: A review of the literature from 1990 to 2010. Education Policy in Developing Countries, 4:13.
Glewwe, P. and Kremer, M. (2006). Schools, teachers, and education outcomes in developing countries. Handbook of the Economics of Education, 2:945-1017.
Glewwe, P., Kremer, M., and Moulin, S. (2009). Many children left behind? Textbooks and test scores in Kenya. American Economic Journal: Applied Economics, 1(1):112-135.
Glewwe, P., Kremer, M., Moulin, S., and Zitzewitz, E. (2004). Retrospective vs. prospective analyses of school inputs: the case of flip charts in Kenya. Journal of Development Economics, 74(1):251-268.
Hanushek, E. A. (2006). School resources. Handbook of the Economics of Education, 2:865-908.
Hanushek, E. A., Link, S., and Woessmann, L. (2013). Does school autonomy make sense everywhere? Panel estimates from PISA. Journal of Development Economics, 104:212-232.
Hanushek, E. A. and Rivkin, S. G. (2006). Teacher quality. Handbook of the Economics of Education, 2:1051-1078.
Hanushek, E. A. and Woessmann, L. (2010). The economics of international differences in educational achievement. Technical report, National Bureau of Economic Research.
Heckman, J., Moon, S., Pinto, R., Savelyev, P., and Yavitz, A. (2010). Analyzing social experiments as implemented: A reexamination of the evidence from the HighScope Perry Preschool Program. Quantitative Economics, 1(1):1-46.
Kling, J. R., Liebman, J. B., and Katz, L. F. (2007). Experimental analysis of neighborhood effects. Econometrica, 75(1):83-119.
Kremer, M. and Holla, A. (2009). Improving education in the developing world: what have we learned from randomized evaluations? Annual Review of Economics, 1:513.
Krueger, A. B. (1999).
Experimental estimates of education production functions. Quarterly Journal of Economics, pages 497-532.
Murnane, R. J. and Ganimian, A. J. (2014). Improving educational outcomes in developing countries: Lessons from rigorous evaluations. Technical report, National Bureau of Economic Research.
PASEC (2007). Evaluation PASEC Senegal. Technical report, Program on the Analysis of Education Systems.
Ramey, C. T. and Campbell, F. A. (2007). Carolina Abecedarian Project. Technical report, http://phoenixday.org/wp-content/uploads/2012/04/res-Abecedarian-studies-full.pdf.
Reinikka, R. and Svensson, J. (2004). Local capture: evidence from a central government transfer program in Uganda. The Quarterly Journal of Economics, 119(2):679-705.
Romano, J. P. and Wolf, M. (2005). Stepwise multiple testing as formalized data snooping. Econometrica, 73(4):1237-1282.
World Bank (2013). Projet d'amelioration de la qualite et de l'equite de l'education de base. Technical report.

Online Appendix: Not for Publication

Figure 4: Location of Schools in Sample
Note: The allocation of schools to cohorts is the result of the randomization of the sequence with which grants were allocated. Cohort 1 were the first to receive grants, followed by cohort 2 and then cohort 3.

A. Test Score Distributions

In this appendix we document the way the distribution of test scores changed over time and as a result of the experiment. Figure 5 shows the densities of scores in the French, mathematics, and oral tests taken at the beginning of 2nd grade at baseline, the beginning of 3rd grade at first follow-up, and the end of 3rd grade at second follow-up. The dashed lines in the figures correspond to the control schools while the solid lines are the treatment schools.
Figure 5: Distribution of Second/Third Grade Scores
[Figure: density plots in three panels (Mathematics, French, Oral). Horizontal axis: percentage of correct answers. Series: control and treatment schools at baseline, first follow-up, and second follow-up.]

As expected given earlier results, the thinnest, lightest lines show that the distributions of test scores are balanced for treatment and control at baseline (beginning of second grade). The lines with a medium level of thickness correspond to the densities of scores at the beginning of third grade at first follow-up. Although the densities of test scores are balanced at baseline, they are different at first follow-up, indicating that the program has an impact on this group of children. Notice also that students in the first follow-up have a much higher (and statistically significant) proportion of correct answers than at baseline (recall that they take the same test on both occasions). Given that the baseline takes place at the beginning of second grade, and the first follow-up at the beginning of third grade, this is expected if schools provide knowledge about the test material.

The thickest lines in Figure 5 correspond to the densities of scores for the second follow-up, conducted at the end of third grade. We use the same test as in the previous two waves. As before, the evolution of test scores shows (statistically significant) learning during third grade. Program impacts remain strong towards the end of the third grade, roughly two years after the funds were disbursed to the treatment schools.
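Densities of this kind can be approximated with a kernel density estimator fitted separately to each treatment arm and survey wave. The sketch below is illustrative only: the paper does not state which estimator or bandwidth produced Figure 5, so the Gaussian kernel, the bandwidth of 0.1, the function name, and the score lists are all assumptions rather than the authors' method or data.

```python
import math

def gaussian_kde(sample, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `sample` at each grid point."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm
        for x in grid
    ]

# Hypothetical shares of correct answers (not data from the paper).
control_scores = [0.30, 0.45, 0.50, 0.55, 0.70]
treatment_scores = [0.35, 0.50, 0.60, 0.65, 0.80]

# Evaluate both densities on a common grid: 0, 0.05, ..., 1.
grid = [i / 20 for i in range(21)]
control_density = gaussian_kde(control_scores, grid, bandwidth=0.1)
treatment_density = gaussian_kde(treatment_scores, grid, bandwidth=0.1)
```

Plotting the two lists against `grid`, one pair per survey wave, would give a picture comparable in spirit to the panels of Figure 5.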
Figure 6: Distribution of Fourth/Fifth Grade Scores
[Figure: density plots in three panels (Mathematics, French, Oral). Horizontal axis: percentage of correct answers. Series: control and treatment schools at baseline, first follow-up, and second follow-up.]

Analogously, in Figure 6, we display the densities of scores at first and second follow-up for students who were in fourth grade at baseline (and in fifth grade in the subsequent year). The differences between treatment and control schools are smaller than the ones we document for grade 3. In the main body of the paper we show that there is no statistically significant impact of school grants on fifth grade test scores.

B. Other Appendix Tables and Figures

Figure 7: Distributional Impacts on Test Scores in Fifth Grade
[Figure: one panel per outcome (French, Math, Oral, Index, PPVT). Horizontal axis: decile. Series: beginning of Grade 5 and end of Grade 5.]
Notes: Point estimates from a quantile regression at each decile with 95% confidence intervals. Index and PPVT coefficients are standardized.
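As a rough illustration of what a decile-by-decile estimate measures: when the only regressor is a treatment dummy, the quantile-regression coefficient at a given decile equals the gap between the treated and control sample quantiles at that decile. The sketch below relies on that simplification. The function names and scores are hypothetical, and it leaves out any conditioning variables as well as the confidence intervals shown in Figure 7.

```python
def decile(values, d):
    """Return the d-th decile (d = 1..9) as the order statistic at rank ceil(n*d/10)."""
    ordered = sorted(values)
    rank = (len(ordered) * d + 9) // 10  # integer ceiling of n*d/10
    return ordered[rank - 1]

def decile_gaps(treated, control):
    """Treated-minus-control gap at each decile (no controls, no standard errors)."""
    return {d: decile(treated, d) - decile(control, d) for d in range(1, 10)}

# Hypothetical standardized scores (not data from the paper).
control = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.5, 0.9, 1.4]
treated = [-1.1, -0.6, -0.4, -0.1, 0.0, 0.2, 0.3, 0.7, 1.0, 1.6]

gaps = decile_gaps(treated, control)
for d, gap in gaps.items():
    print(f"decile {d}: gap = {gap:+.2f}")
```

A flat profile of gaps across deciles would indicate a uniform shift in the distribution, while gaps concentrated at the top or bottom would point to impacts on stronger or weaker students.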
Table 6: First Follow-up Descriptive Statistics and Balance

Variable | Grade 2: Control | Grade 2: Treat-Control | Grade 4: Control | Grade 4: Treat-Control
Panel A: Household Characteristics
Household size | 9.92 (4.41) | 0.16 (0.34) | 10.04 (4.31) | -0.46 (0.31)
Number of children in household | 5.46 (2.80) | 0.24 (0.21) | 5.69 (2.79) | -0.32 (0.21)
Head has any education | 0.43 (0.50) | -0.00 (0.04) | 0.44 (0.50) | -0.03 (0.04)
Percent adult females with any education | 0.23 (0.34) | 0.00 (0.03) | 0.24 (0.36) | 0.02 (0.03)
Distance to school (km) | 0.61 (0.63) | -0.09** (0.04) | 0.59 (0.60) | 0.01 (0.04)
Home has electricity | 0.44 (0.50) | 0.05 (0.04) | 0.44 (0.50) | 0.02 (0.04)
Home has modern toilet | 0.37 (0.48) | -0.02 (0.04) | 0.35 (0.48) | -0.03 (0.04)
Land owned (hectares) | 2.59 (4.63) | 0.53 (0.48) | 2.33 (3.49) | 0.09 (0.29)
Interview conducted in French | 0.11 (0.31) | 0.01 (0.02) | 0.09 (0.29) | -0.00 (0.02)
Panel B: School Characteristics
Distance to nearest city (km) | 18.35 (24.78) | 0.10 (2.10) | 17.41 (22.61) | 0.91 (2.07)
Population in locality (100,000s) | 0.92 (2.74) | 0.27 (0.31) | 0.93 (2.75) | 0.30 (0.32)
Locality has health center | 0.71 (0.45) | 0.03 (0.04) | 0.71 (0.45) | 0.03 (0.04)
School located in South | 0.19 (0.40) | 0.00 (0.04) | 0.20 (0.40) | -0.00 (0.04)

Notes: Grouped columns 2 and 4 report means and standard deviations of baseline characteristics in control schools for grades 2 and 4, respectively. Grouped columns 3 and 5 report differences in characteristics between treatment and control schools at baseline and their standard errors, clustered by school. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 7: Summary Statistics of Nationally Representative Sample of 2nd and 5th Grade Students

Variable | Mean | Standard Deviation
Panel A: School and Teacher Characteristics
Locality has health center | 0.809 | (0.393)
School has electricity | 0.359 | (0.480)
Number of teachers | 9.809 | (5.007)
Number of students | 500.683 | (386.209)
School has library | 0.217 | (0.412)
Percent of teachers with Baccalaureate | 0.474 | (0.499)
Panel B: Household Characteristics
Father literate | 0.585 | (0.493)
Mother literate | 0.355 | (0.478)
House has electricity | 0.595 | (0.491)
House has TV | 0.598 | (0.490)
House has modern toilet | 0.367 | (0.482)

Notes: Weighted means and standard deviations (in parentheses) shown. Source: PASEC 2006.

Table 8: Program Impacts on Grades 3 and 5 Test Scores, No Controls

Row | French | Math | Oral | Index | PPVT
Panel A: Beginning of Grade (First Follow-Up)
Overall | 0.021 (0.014) | 0.018 (0.014) | 0.017 (0.015) | 0.078 (0.068) |
Observations | 5368 | 5361 | 2732 | 2679 |
Control Mean (SD) | 0.51 (0.23) | 0.49 (0.23) | 0.50 (0.27) | |
Grade 3 | 0.029* (0.017) | 0.026 (0.016) | 0.032* (0.018) | 0.138* (0.081) |
Observations | 2720 | 2718 | 1385 | 1350 |
Control Mean (SD) | 0.53 (0.25) | 0.54 (0.24) | 0.35 (0.22) | |
Grade 5 | 0.011 (0.015) | 0.009 (0.016) | 0.004 (0.018) | 0.016 (0.081) |
Observations | 2648 | 2643 | 1347 | 1329 |
Control Mean (SD) | 0.48 (0.20) | 0.44 (0.20) | 0.64 (0.24) | |
Panel B: End of Grade (Second Follow-Up)
Overall | 0.019 (0.015) | 0.003 (0.015) | 0.026 (0.017) | 0.091 (0.077) | 0.046 (0.085)
Observations | 3338 | 3327 | 1686 | 1620 | 1122
Control Mean (SD) | 0.63 (0.22) | 0.62 (0.22) | 0.58 (0.26) | |
Grade 3 | 0.033* (0.019) | 0.014 (0.018) | 0.045** (0.022) | 0.176* (0.096) | 0.170* (0.098)
Observations | 1732 | 1721 | 853 | 826 | 566
Control Mean (SD) | 0.66 (0.23) | 0.68 (0.23) | 0.45 (0.23) | |
Grade 5 | 0.002 (0.016) | -0.011 (0.018) | 0.007 (0.018) | 0.002 (0.093) | -0.082 (0.097)
Observations | 1606 | 1606 | 833 | 794 | 556
Control Mean (SD) | 0.59 (0.20) | 0.57 (0.20) | 0.72 (0.21) | |

Notes: Standard errors are in parentheses and are adjusted for clustering. * p < 0.10, ** p < 0.05, *** p < 0.01 correspond to p-values from the usual single-hypothesis tests.

Table 9: Program Impacts on Teacher and Household Outcomes, First Follow-Up

Outcome | Grade 3 | Grade 5 | Difference
Panel A: Teacher Outcomes
Teacher has Baccalaureate | -0.052 (0.047) | 0.018 (0.046) | -0.070 (0.064)
Teacher had training in past 5 years | 0.083** (0.039) | 0.101** (0.047) | -0.018 (0.056)
Minutes spent preparing lesson | 2.115 (1.726) | 0.061 (1.614) | 2.054 (2.023)
Number of manuals | 6.209 (4.231) | 6.493 (4.964) | -0.284 (5.296)
Number of measuring instruments | 0.138 (0.179) | 0.228 (0.193) | -0.090 (0.210)
Number of chairs | 0.018 (0.038) | 0.071** (0.035) | -0.053 (0.041)
Teaches with books | 0.004 (0.013) | 0.012 (0.012) | -0.008 (0.015)
Teaches with computers | -0.005 (0.019) | 0.034 (0.024) | -0.038 (0.029)
Times per day ask for silence | -1.638** (0.686) | 1.109 (0.830) | -2.747*** (0.931)
Times per day punish a student | -0.126 (0.277) | 0.741** (0.370) | -0.868** (0.407)
Number of students who left in past year | 0.303 (0.730) | 1.063 (0.800) | -0.760 (1.095)
Number of students who joined in past year | 0.262 (0.212) | -0.030 (0.185) | 0.292 (0.194)
Panel B: Household Outcomes
Days of school missed last week | 0.106 (0.075) | 0.010 (0.055) | 0.096 (0.086)
Student works after school | -0.008 (0.012) | -0.016 (0.014) | 0.008 (0.014)
Parent involved in school | 0.037 (0.038) | 0.026 (0.038) | 0.011 (0.039)
Expenditure on uniform (1,000s CFA) | -0.003 (0.070) | -0.070 (0.056) | 0.067 (0.055)
Expenditure on tuition (1,000s CFA) | 0.303 (0.303) | 0.035 (0.164) | 0.269 (0.313)
Expenditure on supplies (1,000s CFA) | -0.162 (0.232) | -0.111 (0.320) | -0.050 (0.302)
Student has tutor | -0.027 (0.026) | -0.009 (0.029) | -0.017 (0.031)
Expenditure on children (1,000s CFA) | 0.303 (0.611) | -0.175 (0.662) | 0.477 (0.683)

Notes: Column 3 reports the difference in impacts between grade 3 and grade 5. Clustered standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 10: Characteristics of Schools by Baseline Missing Status, First Follow-Up

Variable | Not Missing | Missing | Difference
Panel A: Treatment Status
Treated | 0.337 (0.473) | 0.312 (0.464) | 0.025 (0.089)
Panel B: Control School Characteristics
School located in South | 0.186 (0.389) | 0.439 (0.497) | -0.253** (0.116)
School located in Rural Area | 0.736 (0.441) | 1.000 (0.000) | -0.264*** (0.025)
Locality Population (100,000s) | 0.981 (2.827) | 0.024 (0.026) | 0.957*** (0.173)
Number of Teachers | 9.946 (4.954) | 7.090 (4.144) | 2.856*** (0.991)
Number of Pupils | 341.617 (254.680) | 238.326 (152.758) | 103.291*** (37.826)
Percent Correct: French, Grade 3 | 0.540 (0.245) | 0.418 (0.239) | 0.123*** (0.045)
Percent Correct: Math, Grade 3 | 0.550 (0.236) | 0.408 (0.228) | 0.142*** (0.042)
Percent Correct: Oral, Grade 3 | 0.361 (0.221) | 0.212 (0.178) | 0.149*** (0.037)
Percent Correct: Index, Grade 3 | -0.012 (1.009) | -0.666 (0.939) | 0.653*** (0.203)
Percent Correct: French, Grade 5 | 0.483 (0.201) | 0.379 (0.190) | 0.104*** (0.032)
Percent Correct: Math, Grade 5 | 0.441 (0.204) | 0.342 (0.195) | 0.099*** (0.037)
Percent Correct: Oral, Grade 5 | 0.648 (0.236) | 0.572 (0.257) | 0.076 (0.050)
Percent Correct: Index, Grade 5 | 0.026 (0.943) | -0.483 (0.937) | 0.509*** (0.179)

Notes: Columns 1 and 2 report means and standard deviations (in parentheses) for schools that were not missing at baseline and missing at baseline, respectively. Column 3 reports the difference in means between not missing and missing schools, and clustered standard errors (at the school level) in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 11: Program Impacts on Grade 5 Test Scores by Gender, Ability, and Region

Row | French | Math | Oral | Index | PPVT
Panel A: Beginning of Grade (First Follow-Up)
Male | 0.014 (0.012) | 0.008 (0.013) | 0.020 (0.015) | 0.043 (0.064) |
Female | 0.009 (0.013) | 0.013 (0.013) | -0.005 (0.015) | 0.009 (0.061) |
Male Control Mean (SD) | 0.49 (0.20) | 0.46 (0.21) | 0.66 (0.24) | 0.07 (1.02) |
Female Control Mean (SD) | 0.46 (0.20) | 0.41 (0.20) | 0.63 (0.23) | -0.09 (0.96) |
Low Ability | 0.009 (0.012) | -0.009 (0.013) | 0.003 (0.019) | -0.016 (0.066) |
High Ability | 0.000 (0.016) | 0.013 (0.018) | 0.017 (0.013) | 0.039 (0.071) |
Low Control Mean (SD) | 0.35 (0.14) | 0.33 (0.15) | 0.51 (0.22) | -0.63 (0.76) |
High Control Mean (SD) | 0.62 (0.16) | 0.56 (0.18) | 0.79 (0.13) | 0.72 (0.66) |
North | 0.017 (0.011) | 0.014 (0.013) | -0.002 (0.013) | 0.029 (0.057) |
South | -0.021 (0.026) | -0.010 (0.030) | 0.049 (0.032) | 0.009 (0.130) |
North Control Mean (SD) | 0.50 (0.20) | 0.46 (0.20) | 0.68 (0.22) | 0.13 (0.96) |
South Control Mean (SD) | 0.37 (0.17) | 0.33 (0.18) | 0.52 (0.25) | -0.55 (0.94) |
Panel B: End of Grade (Second Follow-Up)
Male | -0.001 (0.014) | -0.013 (0.015) | 0.019 (0.018) | 0.046 (0.081) | -0.029 (0.123)
Female | 0.008 (0.013) | -0.003 (0.014) | -0.006 (0.018) | -0.022 (0.072) | -0.095 (0.127)
Male Control Mean (SD) | 0.60 (0.20) | 0.59 (0.20) | 0.73 (0.20) | 0.06 (0.99) | 0.05 (1.02)
Female Control Mean (SD) | 0.59 (0.19) | 0.54 (0.21) | 0.71 (0.21) | -0.06 (0.95) | 0.03 (1.01)
Low Ability | -0.002 (0.015) | -0.020 (0.016) | -0.032 (0.023) | -0.120 (0.087) |
High Ability | 0.006 (0.014) | -0.007 (0.018) | 0.012 (0.015) | 0.016 (0.079) |
Low Control Mean (SD) | 0.49 (0.17) | 0.46 (0.18) | 0.63 (0.21) | -0.53 (0.86) |
High Control Mean (SD) | 0.71 (0.15) | 0.68 (0.17) | 0.83 (0.12) | 0.62 (0.65) |
North | 0.003 (0.013) | -0.009 (0.014) | 0.004 (0.015) | 0.004 (0.067) | -0.014 (0.099)
South | -0.002 (0.026) | -0.010 (0.028) | 0.020 (0.036) | 0.046 (0.144) | -0.181 (0.259)
North Control Mean (SD) | 0.62 (0.19) | 0.60 (0.19) | 0.74 (0.19) | 0.11 (0.92) | -0.06 (0.94)
South Control Mean (SD) | 0.49 (0.20) | 0.45 (0.20) | 0.65 (0.24) | -0.47 (1.04) | 0.44 (1.18)
Notes: Standard errors are in parentheses and are adjusted for clustering. * p < 0.10, ** p < 0.05, *** p < 0.01 correspond to p-values from the usual single-hypothesis tests. † corresponds to significance at the 10% level of Romano and Wolf (2005) p-values from joint tests of French, mathematics, and oral (3 tests each, by row) or to the index alone. Conditioning variables: grade, gender, household size, number of children, education of head, distance to school, wealth index, interview language, baseline scores, missing dummies.

Table 12: Baseline Descriptive Statistics and Balance (Grade 2), by Region

Variable | North: Control | North: Treat-Control | South: Control | South: Treat-Control
Panel A: Test Scores
Percent Correct: French | 0.42 (0.21) | 0.01 (0.02) | 0.46 (0.26) | -0.08 (0.05)
Percent Correct: Math | 0.37 (0.23) | 0.00 (0.02) | 0.33 (0.23) | -0.02 (0.05)
Percent Correct: Oral | 0.24 (0.17) | 0.01 (0.02) | 0.14 (0.14) | 0.03 (0.03)
Index Score (standardized) | 0.04 (0.98) | 0.04 (0.09) | -0.17 (0.96) | -0.19 (0.23)
Panel B: Household Characteristics
Days of school missed last week | 0.18 (0.91) | 0.06 (0.08) | 0.10 (0.55) | 0.10 (0.13)
Student works after school | 0.01 (0.11) | -0.00 (0.01) | 0.00 (0.00) | 0.04 (0.02)
Household size | 9.14 (4.03) | 0.11 (0.36) | 9.79 (4.14) | -0.49 (0.69)
Number of children in household | 5.13 (2.56) | 0.06 (0.23) | 5.80 (2.76) | -0.46 (0.47)
Head has any education | 0.58 (0.49) | 0.00 (0.04) | 0.68 (0.47) | -0.10 (0.09)
Percent of adult females with any education | 0.37 (0.41) | -0.03 (0.03) | 0.35 (0.39) | -0.01 (0.07)
Distance to school (km) | 0.68 (0.86) | -0.00 (0.07) | 0.88 (1.06) | -0.38*** (0.12)
Parent involved in school | 0.41 (0.49) | 0.07 (0.04) | 0.25 (0.43) | 0.17* (0.08)
Expenditure on household food (1,000s CFA) | 23.56 (16.04) | 1.19 (1.29) | 14.13 (9.22) | 1.25 (1.87)
Expenditure on uniform (1,000s CFA) | 2.26 (1.10) | 0.10 (0.41) | 3.75 (1.04) | -0.72 (0.70)
Expenditure on tuition (1,000s CFA) | 1.08 (1.17) | -0.00 (0.10) | 1.19 (1.22) | -0.01 (0.23)
Expenditure on supplies (1,000s CFA) | 4.10 (6.14) | -0.46 (0.34) | 2.71 (1.95) | -0.05 (0.35)
Student has tutor | 0.18 (0.38) | -0.01 (0.03) | 0.04 (0.18) | 0.00 (0.03)
Home has electricity | 0.52 (0.50) | 0.02 (0.05) | 0.25 (0.43) | 0.08 (0.09)
Home has modern toilet | 0.60 (0.49) | -0.01 (0.04) | 0.29 (0.46) | -0.04 (0.08)
Land owned (hectares) | 2.42 (3.53) | 0.27 (0.52) | 2.15 (3.18) | 1.15 (0.99)
Interview conducted in French | 0.10 (0.30) | -0.03 (0.02) | 0.22 (0.42) | -0.08 (0.06)
Panel C: School and Teacher Characteristics
Distance to nearest city (km) | 18.83 (26.53) | 0.08 (2.56) | 16.38 (16.54) | -0.87 (3.10)
Locality population (100,000s) | 1.67 (4.83) | 0.04 (0.56) | 0.14 (0.37) | -0.04 (0.07)
Locality has health center | 0.70 (0.46) | 0.03 (0.05) | 0.74 (0.44) | -0.01 (0.10)
School has Electricity | 0.58 (0.49) | 0.02 (0.05) | 0.50 (0.50) | -0.01 (0.11)
Number of Teachers | 9.53 (4.88) | 0.61 (0.56) | 10.35 (5.29) | -0.30 (1.23)
Number of Pupils | 335.33 (252.92) | 36.62 (28.67) | 366.65 (248.77) | -8.09 (56.47)
School has library | 0.24 (0.43) | 0.07 (0.05) | 0.07 (0.25) | 0.10 (0.08)
Number of computers | 1.44 (4.48) | -0.03 (0.46) | 0.58 (3.87) | 0.06 (0.66)
Number of manuals in classroom | 61.41 (45.68) | 5.07 (5.15) | 53.02 (42.26) | -5.60 (9.26)
Percent teachers female | 0.33 (0.23) | 0.01 (0.03) | 0.24 (0.22) | -0.01 (0.04)
Average teacher age | 33.16 (4.45) | -0.16 (0.44) | 32.93 (3.18) | -0.00 (0.77)
Percent teachers with Baccalaureate | 0.41 (0.23) | -0.01 (0.03) | 0.42 (0.21) | -0.05 (0.05)
Average teacher experience | 6.57 (3.90) | 0.17 (0.41) | 6.49 (2.57) | -0.34 (0.57)
Percent teachers with training in past 5 years | 0.50 (0.50) | 0.10* (0.05) | 0.37 (0.48) | 0.09 (0.12)
Percent of principals with Baccalaureate | 0.74 (0.44) | -0.02 (0.05) | 0.73 (0.44) | -0.19* (0.11)

Notes: Grouped columns 2 and 4 report means and standard deviations of baseline characteristics in control schools in the North and South, respectively. Grouped columns 3 and 5 report differences in characteristics between treatment and control schools at baseline and their standard errors, clustered by school.
* p < 0.10, ** p < 0.05, *** p < 0.01

Table 13: Program Impacts on Grade 3 Household Characteristics by Region, First Follow-Up

Outcome | South | North | Difference
Student works after school | 0.015 (0.025) | -0.013 (0.014) | 0.017 (0.025)
Days of school missed last week | 0.345** (0.173) | 0.049 (0.082) | 0.391 (0.291)
Parent involved in school | -0.039 (0.073) | 0.057 (0.043) | -0.017 (0.117)
Expenditure on uniform (1,000s CFA) | 0.008 (0.257) | -0.010 (0.059) | -0.015 (0.254)
Expenditure on tuition (1,000s CFA) | 1.321 (1.431) | 0.056 (0.117) | 1.230 (1.427)
Expenditure on supplies (1,000s CFA) | -0.129 (0.476) | -0.155 (0.261) | -0.015 (0.640)
Student has tutor | -0.018 (0.029) | -0.027 (0.031) | 0.014 (0.048)

Notes: Columns 1 and 2 report program impacts in the South and North, and Column 3 reports the difference in program impacts between South and North. Clustered standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 14: Student Test Score Sample Sizes and Attrition

Row | Grade 3: French | Grade 3: Math | Grade 3: Oral | Grade 5: French | Grade 5: Math | Grade 5: Oral
Baseline Sample Size | 2722 | 2752 | 1388 | 2724 | 2726 | 1362
First Follow-up Sample Size | 2720 | 2718 | 1385 | 2648 | 2643 | 1347
New Observations | 322 | 299 | 177 | 262 | 261 | 155
Total Attrition | 324 | 333 | 180 | 338 | 344 | 170
% Attrition | 0.119 | 0.121 | 0.130 | 0.124 | 0.126 | 0.125
% Attrition, Treated | 0.118 | 0.117 | 0.113 | 0.116 | 0.116 | 0.114
% Attrition, Control | 0.119 | 0.123 | 0.138 | 0.128 | 0.131 | 0.130
Second Follow-up Sample Size | 1732 | 1721 | 853 | 1606 | 1606 | 833
Total Attrition* | 290 | 301 | 208 | 355 | 357 | 186
% Attrition* | 0.157 | 0.162 | 0.222 | 0.197 | 0.197 | 0.206
% Attrition*, Treated | 0.160 | 0.165 | 0.230 | 0.204 | 0.206 | 0.224
% Attrition*, Control | 0.155 | 0.159 | 0.215 | 0.189 | 0.189 | 0.188
Observed in All Waves | 1464 | 1461 | 709 | 1396 | 1392 | 696

Notes: Attrition in the second follow-up is based on cohorts 1 and 3, since cohort 2 schools were dropped in the second follow-up. A student in the second follow-up has attrited if they have a baseline test score but not a second follow-up test score (regardless of their status in the first follow-up).
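The first follow-up rows of Table 14 appear to obey a simple accounting identity: the follow-up sample equals the baseline sample minus attriters plus newly observed students, and the attrition rate is the number of attriters divided by the baseline sample. The sketch below checks both relationships against the figures reported in the table. The dictionary layout and variable names are illustrative, and the check is not applied to the second follow-up rows, which are computed on cohorts 1 and 3 only.

```python
# First follow-up figures copied from Table 14, per grade and test:
# (baseline, first follow-up, new observations, total attrition, reported rate)
table_14 = {
    ("Grade 3", "French"): (2722, 2720, 322, 324, 0.119),
    ("Grade 3", "Math"): (2752, 2718, 299, 333, 0.121),
    ("Grade 3", "Oral"): (1388, 1385, 177, 180, 0.130),
    ("Grade 5", "French"): (2724, 2648, 262, 338, 0.124),
    ("Grade 5", "Math"): (2726, 2643, 261, 344, 0.126),
    ("Grade 5", "Oral"): (1362, 1347, 155, 170, 0.125),
}

checks = {}
for key, (baseline, followup, new_obs, attrited, reported_rate) in table_14.items():
    implied_followup = baseline - attrited + new_obs   # stayers plus new entrants
    implied_rate = round(attrited / baseline, 3)       # share of baseline sample lost
    checks[key] = (
        implied_followup == followup,
        abs(implied_rate - reported_rate) < 1e-9,
    )
    print(key, implied_followup, implied_rate)
```

For example, for Grade 3 French, 2722 - 324 + 322 = 2720 and 324 / 2722 rounds to 0.119, matching the reported follow-up sample size and attrition rate.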
Table 15: Difference in Baseline Characteristics among Non-Attriters

Variable | Grade 2: 1st Follow-up | Grade 2: 2nd Follow-up | Grade 4: 1st Follow-up | Grade 4: 2nd Follow-up
Panel A: Test Scores
Percent Correct: French | -0.005 (0.018) | 0.006 (0.022) | 0.000 (0.014) | 0.001 (0.016)
Percent Correct: Math | -0.005 (0.018) | -0.000 (0.021) | -0.003 (0.016) | -0.004 (0.018)
Percent Correct: Oral | 0.012 (0.015) | 0.009 (0.018) | -0.007 (0.019) | -0.006 (0.023)
Index Score (standardized) | -0.020 (0.095) | 0.052 (0.114) | -0.046 (0.083) | -0.029 (0.098)
Panel B: School and Teacher Characteristics
Distance to nearest city (km) | 0.347 (2.203) | -0.513 (2.436) | 0.035 (2.341) | -0.985 (2.579)
Locality population (100,000s) | -0.063 (0.407) | 0.160 (0.442) | 0.142 (0.473) | 0.295 (0.500)
Locality has health center | 0.032 (0.044) | 0.076 (0.052) | 0.032 (0.043) | 0.069 (0.052)
School located in South | -0.016 (0.038) | -0.024 (0.046) | -0.021 (0.037) | -0.040 (0.044)
School has Electricity | 0.021 (0.049) | -0.019 (0.056) | 0.037 (0.049) | 0.003 (0.057)
Number of Teachers | 0.464 (0.515) | 0.636 (0.587) | 0.534 (0.515) | 0.708 (0.609)
Number of Pupils | 33.657 (25.444) | 43.355 (28.603) | 31.181 (26.044) | 47.491 (30.330)
School has library | 0.085** (0.043) | 0.087* (0.047) | 0.094** (0.044) | 0.079 (0.049)
Number of computers | -0.000 (0.397) | -0.144 (0.485) | 0.052 (0.409) | -0.170 (0.515)
Number of manuals in classroom | 3.810 (4.564) | 3.572 (5.758) | 6.530 (5.498) | 10.830* (6.374)
Percent teachers female | 0.007 (0.023) | 0.028 (0.026) | 0.012 (0.023) | 0.032 (0.026)
Average teacher age | -0.126 (0.394) | -0.056 (0.459) | 0.051 (0.392) | 0.020 (0.454)
Percent teachers with Baccalaureate | -0.012 (0.023) | -0.016 (0.027) | -0.017 (0.023) | -0.017 (0.027)
Average teacher experience | 0.085 (0.345) | 0.138 (0.394) | 0.220 (0.355) | 0.168 (0.404)
Percent teachers with training in past 5 years | 0.092* (0.050) | 0.071 (0.058) | 0.012 (0.050) | 0.024 (0.058)
Percent principals with Baccalaureate | -0.034 (0.045) | -0.022 (0.052) | -0.037 (0.045) | -0.026 (0.053)
Panel C: Household Characteristics
Days of school missed last week | 0.084 (0.077) | 0.068 (0.109) | -0.078* (0.042) | -0.139** (0.058)
Student works after school | 0.009 (0.009) | 0.009 (0.011) | -0.015* (0.008) | -0.025* (0.013)
Household size | -0.126 (0.335) | 0.498 (0.373) | 0.217 (0.356) | 0.111 (0.430)
Number of children in household | -0.098 (0.214) | 0.125 (0.242) | 0.249 (0.239) | 0.159 (0.289)
Head has any education | -0.030 (0.039) | -0.023 (0.047) | -0.077* (0.040) | -0.067 (0.048)
Percent of adult females with any education | -0.016 (0.034) | -0.010 (0.040) | -0.010 (0.030) | 0.002 (0.036)
Distance to school (km) | -0.093 (0.064) | -0.125 (0.076) | -0.020 (0.134) | -0.143 (0.239)
Parent involved in school | 0.070* (0.040) | 0.081* (0.048) | -0.069* (0.040) | -0.066 (0.049)
Expenditure on household food (1,000s CFA) | 0.948 (1.236) | 2.343 (1.449) | 1.216 (1.286) | 0.372 (1.575)
Expenditure on uniform (1,000s CFA) | 0.226 (0.437) | 0.033 (0.593) | -0.124 (0.336) | -0.234 (0.410)
Expenditure on tuition (1,000s CFA) | 0.029 (0.095) | -0.028 (0.119) | 0.019 (0.086) | -0.071 (0.121)
Expenditure on supplies (1,000s CFA) | -0.509 (0.314) | -0.529 (0.512) | -0.362 (0.287) | -0.480 (0.396)
Student has tutor | -0.022 (0.028) | -0.005 (0.030) | 0.006 (0.030) | -0.004 (0.034)
Home has electricity | 0.043 (0.045) | 0.041 (0.053) | 0.052 (0.045) | 0.048 (0.055)
Home has modern toilet | -0.007 (0.042) | -0.015 (0.050) | 0.016 (0.045) | 0.004 (0.053)
Land owned (hectares) | 0.293 (0.504) | 0.730 (0.561) | -0.214 (0.322) | -0.155 (0.395)
Interview conducted in French | -0.041* (0.023) | -0.036 (0.029) | -0.014 (0.025) | 0.001 (0.029)

Notes: Columns 2 and 4 report the difference in means between treatment and control. Standard deviations in parentheses in columns 1 and 3. Clustered standard errors in parentheses in columns 2 and 4. * p < 0.10, ** p < 0.05, *** p < 0.01
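The balance tables report treatment-control differences with standard errors clustered by school. A minimal sketch of such a calculation follows, assuming that treatment is assigned at the school level and that no further covariates are used. Under those assumptions the cluster-robust (sandwich) variance of the difference in means reduces to a sum of squared within-school residual totals. The function name and the scores are hypothetical, no small-sample correction is applied, and the authors' exact estimator may differ.

```python
import math

def clustered_diff_in_means(schools):
    """Difference in means (treated - control) with a cluster-robust standard error.

    `schools` is a list of (treated_flag, [student outcomes]) pairs, one per school.
    Uses the uncorrected sandwich variance for a regression of the outcome on a
    constant and a school-level treatment dummy.
    """
    # Mean outcome and number of students in each treatment arm.
    stats = {}
    for arm in (True, False):
        outcomes = [y for flag, ys in schools if flag == arm for y in ys]
        stats[arm] = (sum(outcomes) / len(outcomes), len(outcomes))

    # Each school contributes (sum of its residuals / arm size) squared.
    variance = 0.0
    for flag, ys in schools:
        mean, n_arm = stats[flag]
        residual_total = sum(y - mean for y in ys)
        variance += (residual_total / n_arm) ** 2

    return stats[True][0] - stats[False][0], math.sqrt(variance)

# Hypothetical shares of correct answers for four schools (not data from the paper).
schools = [
    (False, [0.4, 0.6]),
    (False, [0.5, 0.5]),
    (True, [0.6, 0.8]),
    (True, [0.7, 0.5]),
]
diff, se = clustered_diff_in_means(schools)
print(f"difference = {diff:.3f}, clustered SE = {se:.4f}")
```

Summing residuals within a school before squaring is what allows students in the same school to be correlated; treating every student as independent would instead square each residual separately.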