RESULTS-BASED FINANCING
RBF EDUCATION | EVIDENCE

TANZANIA
Can a Simple Teacher Incentive System Improve Learning?

MARCH 2018

REACH co-funded an evaluation that examined whether a simple or more complex teacher performance pay system was more effective in increasing student learning.

The Results in Education for All Children (REACH) Trust Fund supports and disseminates research on the impact of results-based financing on learning outcomes. The EVIDENCE series highlights REACH grants around the world to provide empirical evidence and operational lessons helpful in the design and implementation of successful performance-based programs.

This note was adapted from Mbiti, Isaac, Karthik Muralidharan, Mauricio Romero, and Youdi Schipper (2017). Designing Teacher Performance Pay Programs: Experimental Evidence from Tanzania, (mimeo).

Despite several major reforms and significant new investments in public education over the last decade, student learning levels in East Africa remain low. Results-based financing (RBF) has been used in many developing countries in an attempt to incentivize teachers and other stakeholders to achieve better results. RBF mechanisms work by making financing conditional on achieving measurable results such as student test scores or other intermediate education outcomes. Teacher performance pay systems are one example of RBF that has been shown to improve student learning in many settings, although its results have been mixed.

Education systems with limited administrative capacity currently face a tradeoff between adopting more complex incentive systems that may be more effective but are harder to implement and choosing simpler systems that are easier to implement but may be less effective.

The Results in Education for All Children (REACH) Trust Fund at the World Bank co-funded an evaluation that compared the effectiveness of two different teacher performance pay systems in early primary schools in Tanzania. These performance pay systems are part of KiuFunza, an experimental teacher pay program introduced by Twaweza East Africa, a civil society organization, in collaboration with the Abdul Latif Jameel Poverty Action Lab (J-PAL), Innovations for Poverty Action (IPA), and Economic Development Initiatives (EDI). The first incentive design rewarded teachers based on the number of specific milestones (or proficiency levels) each of their students could achieve. The second was a more complex system that first grouped students by baseline test scores, and then rewarded teachers based on the rank ordering of each of their students within each group. Hence, this system rewards teachers based on the gains of their students within the structure of a rank order tournament. In theory, rewarding learning gains through a rank order tournament with rankings determined within sets of similar students at baseline should produce better results because all teachers are rewarded regardless of their students’ initial learning levels and because it should incentivize teachers to improve learning across the entire student distribution.

The evaluation found that both systems raised test scores. However, despite the theoretical advantages of the more complex learning “gains” system, the simple learning “levels” system was at least as effective in raising student learning levels. Furthermore, the benefits of the simpler scheme were more equitably distributed across students from all five quintiles, while the more complex scheme primarily benefited students in the top quintile. These results highlight the critical importance of the design of RBF schemes. By rewarding teachers for student achievement at multiple learning levels rather than just one, the simple scheme overcame one of the disadvantages of similar proficiency levels-based systems with minimal added complexity.

CONTEXT

Tanzania invests 3.5 percent of its GDP in its education sector, which is below the Sub-Saharan Africa average of 4.5 percent. Student learning levels in the country remain low, with large majorities of children unable to read or do arithmetic at the required level.[1] While these challenges are well known, reforms have largely failed to improve these results.

One of the many issues in Tanzanian schools is that no one is held accountable or is incentivized to improve learning. Teachers are paid regardless of their attendance or performance. Even when teachers are in the school, they are often not in the classroom teaching. Teachers spend only 40 percent of their time on task doing instructional activities, according to a World Bank survey of service delivery indicators.[2] Teacher salaries and benefits account for almost two-thirds of Tanzania’s education budget, while the average teacher in Sub-Saharan Africa earns almost four times per capita GDP.[3] Despite already high wages, the Ministry of Education (MoE) has faced sustained pressure from the teachers’ union to increase teacher pay, with proponents arguing that this would motivate teachers to improve student learning. However, a large body of evidence has shown that there is little correlation between teacher compensation and student learning.[4][5][6] Without addressing teacher accountability and incentives, simply increasing the volume of resources is unlikely to be effective in raising the test scores of Tanzanian students.

In contrast, introducing teacher performance pay systems could give teachers an incentive to help their students to learn by linking their pay to their students’ learning outcomes. Twaweza, an East African civil society organization, first developed and launched teacher incentive programs in Tanzania in 2013 under a broader umbrella program called KiuFunza (“thirst for learning” in Swahili). The first teacher incentive program had a simple design that would be relatively easy to implement at scale. Both of the incentive programs evaluated by this report were implemented by Twaweza East Africa in partnership with EDI, a Tanzanian research firm, and local partners in each district.

WHY WAS THE INTERVENTION CHOSEN?

Teacher performance pay systems have been implemented in several developing countries, but evidence of the effectiveness of these programs is mixed. This heterogeneity is driven in part by large differences in the way in which the incentives were designed.[7] In general, incentives designed to reward teachers based on student learning gains have been more effective than systems that reward teachers based on simple learning levels. However, it is difficult to compare these results because of differences in the context, design, and budgets of the different schemes. There is little research that directly compares different systems.

Furthermore, there may be tradeoffs between choosing incentive schemes that reward learning gains and choosing those that reward learning levels. Rewarding teachers based on a simple student proficiency level may penalize those teachers who serve students from disadvantaged backgrounds and encourage teachers to focus only on those students who are close to the threshold. On the other hand, rewarding teachers based on learning gains should in theory incentivize them to improve learning across the entire student distribution and should be more equitable for all teachers regardless of their students’ initial levels. However, this type of scheme requires maintaining a complex database of students’ performance to calculate the “value added” for each teacher, which is difficult for countries with limited administrative capacity like Tanzania to implement. These systems may also be more difficult for teachers to understand, which may weaken the incentive.

Therefore, the objective of this evaluation was to compare the effectiveness of two teacher incentive programs, both implemented in the same context and with the same budget. One scheme had a simple learning “levels” design that rewarded teachers based on the number of students who reached specific proficiency levels, and the second was a more complex scheme that rewarded teachers based on the average learning “gains” that their students achieved relative to their initial learning levels.

HOW DID THE INTERVENTION WORK?

In the simple learning “levels” scheme, students were tested at the end of the school year, and teachers were rewarded for the number of students who reached various levels of proficiency. Teachers were rewarded for students’ mastery of grade-specific and subject-specific skills, ranging from very basic to more advanced (for example, grade one students in Swahili were assessed on three skills: letters, words, and sentences). The total amount of money available for teacher bonuses was the same for each type of skill, so that more advanced skills that were achieved by fewer students led to higher teacher bonuses per student who reached the required level.

In the learning “gains” scheme, students in all schools participating in the scheme were tested at the beginning of the school year and grouped according to their initial learning levels. At the end of the school year, the students were tested again and ranked within their group. Teachers were paid in proportion to their students’ ranking within each group. This incentive design has two theoretical advantages. First, it does not penalize teachers who serve disadvantaged students, so it incentivizes all teachers to exert more effort, regardless of their students’ initial learning levels. Second, because the rewards are given for improvements across the entire distribution of students, teachers are encouraged to focus on all students rather than only on students near the learning thresholds. In theory, under certain circumstances this design can maximize learning gains across the entire student population.

[Diagram: in the “levels” design, the teacher bonus depends on each student’s proficiency level (a, b, c); in the “gains” design, the teacher bonus amount depends on each student’s test score ranking within their group.]

The program focused on math and Swahili teachers in grades one, two, and three. Both incentive designs had a fixed bonus pool of $75,000 split between each subject-grade combination, with an average bonus of $3 per student or roughly $125 per teacher. This was to ensure that the budgets of the two designs would be directly comparable. However, this also led to some uncertainty about the size of teachers’ payments since they could not be calculated until after student outcomes were measured. This may have affected how the teachers responded to the incentives. The program implementers provided information about each incentive program to schools and their communities at public meetings at the beginning of each school year. The implementers used culturally appropriate materials, examples, and analogies to convey the features of the program. They also revisited each school in the middle of the school year to refresh teachers’ knowledge of the program and test their understanding, which was generally considerable in the case of both incentive schemes.

The evaluation was implemented in 180 randomly selected schools across 10 districts in Tanzania, with 60 schools in each incentive scheme and 60 more in the control group. All students completed a baseline test, a “high stakes” endline test that was used to determine teacher bonuses and assess the program’s impact, and a “low stakes” endline test that was only used to measure the program’s impact. The two endline tests were similar except that the low stakes test was longer and covered a wider range of curricular concepts and learning domains, including material that was not incentivized. In addition, the enumerators collected data on the characteristics of schools, head teachers, individual teachers, and individual students as control variables and to measure the program’s impact across these characteristics.

WHAT WERE THE RESULTS?

Both incentive schemes significantly raised test scores. However, despite the theoretical advantages of the more complex learning “gains” design, the simpler learning “levels” design was at least as effective in raising student learning.

When the “low stakes” test scores were used to measure program impact, math test scores increased by 0.07 standard deviations (SD) under both systems in the second year of the evaluation. However, the Swahili test scores increased by 0.11 SD under the simple “levels” design compared to only 0.06 SD under the more complex “gains” design, a difference of 0.057 SD (p=0.16). Similarly, when the “high stakes” test scores were used to measure program impact, math scores increased by 0.142 SD in the “levels” design compared to 0.091 SD in the “gains” design, a difference of 0.044 SD (p=0.31), and the Swahili scores increased by 0.187 SD in the “levels” design compared to only 0.098 SD in the “gains” design, a statistically significant difference of 0.093 SD (p=0.045). Overall, these gains of 0.06 to 0.187 SD are comparable in magnitude to the results from a recent meta-analysis on the use of teacher and student incentives as well as the results of other interventions to improve student test scores, such as computer-assisted learning, teacher training, reducing class size, providing instructional materials, and providing school grants.[8]

[Figure 1: Effects of Teacher Incentives on Student Test Scores. Vertical axis: standard deviation. Bars compare the “levels” design and the “gains” design for math and Swahili in year 1 and year 2, on the low stakes and high stakes tests.]

In addition, the learning gains from the simple “levels” design were more equitably distributed across all students. In the first year of the program, teachers in the scheme with the “levels” design focused on the top half of their class in math, while in Swahili the top four quintiles of students made substantial learning gains. However, in the scheme with the “gains” design only students in the top quintile improved their test scores, which suggests that teachers focused only on the very best students. In the second year, the gains in math were more broadly distributed across all students in both types of incentive schemes, even those in the bottom two quintiles. However, in the “gains” scheme, Swahili teachers seem to have continued to focus mostly on students in the top quintile, while all students achieved gains in the “levels” scheme, suggesting that teachers focused on all students. Therefore, despite the theoretical advantage of the “gains” design in motivating teachers to help all students across the distribution, the results suggest that this kind of incentive scheme actually had the opposite effect.

The learning gains were broadly distributed across various student, teacher, and school characteristics. There were no significant differences in students’ learning gains by gender, age, or pre-school attendance. Likewise, there were no significant differences by teachers’ gender, age, or content knowledge. Lastly, while there were no significant differences in learning gains based on school facilities or proximity to urban areas, schools with higher student-teacher ratios benefited less in math in the “gains” design.

Learning gains did not come at the expense of other subjects and grade levels. One potential concern about implementing teacher incentives only for some subjects and grade levels was that teachers might cut back the effort that they put into teaching non-incentivized subjects and that schools might shift resources away from other grades to grades one to three. On the other hand, it was possible that positive learning gains would spill over into other subjects or could persist over time to later grades. Overall, neither incentive scheme had a significant effect on grade four learning, although the point estimates were positive for the “levels” design, ranging from 0.04 to 0.13 SD. This suggests that schools did not shift any resources away from students in higher grades and that the learning gains made by grade three students in the first year of the scheme may have persisted when they moved into grade four in the second year. Likewise, there was no significant effect on science test scores for grades one to three, although the point estimates were generally positive, suggesting that the incentives may have had some positive spillover effects on other subjects.

Teachers exerted more effort in the simple “levels” design than in the “gains” incentive scheme, and the program results were not driven by any differences in teacher comprehension. While there were no differences in teacher attendance, teachers in schools in the “levels” scheme were more likely to be on task, less likely to report that their students were disengaged, and assigned more homework. In the first year, teachers in the “levels” scheme were also more likely to provide extra help to their students. Furthermore, teachers were given comprehension tests to ensure that they understood the incentive program to which they were assigned. Their comprehension was generally good and roughly equal in both programs. In fact, the point estimates of teacher comprehension were higher for the “gains” scheme, so there is no evidence that the lower learning gains in that incentive design were driven by a lack of comprehension of the incentives by teachers.

WHAT WERE THE LESSONS LEARNED?

Previous research in Tanzania had found that the effectiveness of the “levels” incentive design was limited, particularly for students far above or below the threshold, when it only set one proficiency level for the whole curriculum.[9] To address the issue of teachers focusing only on students close to the threshold, the KiuFunza version of the “levels” design that was evaluated here included multiple thresholds at various points along the student learning distribution, so that teachers could earn bonuses for helping a broad set of students. However, even with multiple thresholds, because the simple “levels” design did not consider students’ initial learning levels, it still did not offer rewards to teachers for all students’ improvements across the entire distribution of test scores. These results suggest that including multiple thresholds in the simple “levels” design overcame the limitations of the earlier incentives scheme. However, there are still many other potential variations of the incentive design that have not been tested. Continuing to experiment with small tweaks to the design could have big payoffs in terms of maximizing learning gains. While it may not be feasible to conduct randomized evaluations of several incentive designs, these design tweaks could be tested and compared using a series of smaller experiments, for example, using an “A/B test” approach in which two alternative designs are compared on an outcome that can be assessed quickly. These tests could be used to collect low stakes test scores over a short period of time or intermediate outcomes such as classroom observations.

CONCLUSION

When it comes to RBF in Tanzania, simpler is better, but further research will be needed to establish the most effective ways to use teacher performance pay systems.

A simple teacher incentive scheme that rewarded teachers based on the number of students who achieved specific learning levels improved learning at least as much as a more complex scheme that rewarded teachers based on learning gains. Furthermore, contrary to expectations, the simpler design also benefited a broader set of students, while the more complex scheme led teachers to focus primarily on the best students. Given the limited administrative capacity in Tanzania and other developing countries to implement complex RBF schemes, this kind of simple incentive design that rewards learning levels may be the most suitable to be implemented on a wide scale. However, within this simple incentive scheme, certain design features are critical, particularly the need to use multiple learning thresholds rather than just one threshold so that teachers can earn bonuses for learning achievements across the entire student distribution. Further research will be needed to establish the most effective ways to use teacher performance pay systems to narrow the learning gap and effectively target the most vulnerable students.

[1] Uwezo. (2017). Are Our Children Learning? Uwezo Tanzania Sixth Learning Assessment Report. Dar es Salaam: Twaweza East Africa (Tech. Rep.).
[2] World Bank. (2011). Service delivery indicators: Tanzania (Tech. Rep.). The World Bank, Washington D.C.
[3] World Bank. (2017). World Development Indicators. http://databank.worldbank.org/data/reports.aspx?source=world-development-indicators
[4] Kane, T. J., J.E. Rockoff, and D.O. Staiger (2008). “What does certification tell us about teacher effectiveness? Evidence from New York City.” Economics of Education Review, 27 (6), 615-631.
[5] Bettinger, E. P., and B.T. Long (2010). “Does cheaper mean better? The impact of using adjunct instructors on student outcomes.” The Review of Economics and Statistics, 92 (3), 598-613.
[6] Woessmann, L. (2011). “Cross-country evidence on teacher performance pay.” Economics of Education Review, 30 (3), 404-418.
[7] Glewwe, P. and K. Muralidharan (2016). “Improving education outcomes in developing countries: Evidence, knowledge gaps, and policy implications” in The Handbook of the Economics of Education, Volume 5: 653-743, Elsevier, New York, NY.
[8] McEwan, P. J. (2015). “Improving Learning in Primary Schools of Developing Countries: A Meta-Analysis of Randomized Experiments.” Review of Educational Research 85(3): 353-394.
[9] Mbiti, I., K. Muralidharan, M. Romero, Y. Schipper, C. Manda, and R. Rajani (2017). “Inputs, incentives, and complementarities in primary education: Experimental evidence from Tanzania.” (mimeo).

PHOTO CREDITS:
Cover: “A principal helps a student in class” by GPE/Kelley Lynch, license: CC BY-NC-ND 2.0
Page 3: “A teacher grades homework outside of class” by GPE/Kelley Lynch, license: CC BY-NC-ND 2.0
Page 5: “Students take an exam outdoors” by GPE/Kelley Lynch, license: CC BY-NC-ND 2.0
Page 7: “A principal walks through a classroom” by GPE/Kelley Lynch, license: CC BY-NC-ND 2.0

RESULTS IN EDUCATION FOR ALL CHILDREN (REACH)
worldbank.org/reach
reach@worldbank.org
REACH is funded by the Government of Norway through NORAD, the Government of the United States of America through USAID, and the Government of Germany through the Federal Ministry for Economic Cooperation and Development.
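
ILLUSTRATIVE SKETCH OF THE TWO BONUS RULES

The two bonus rules described in this note (an equal bonus pool per skill in the “levels” scheme, and payment in proportion to students’ rank within baseline groups in the “gains” scheme) can be illustrated with a minimal Python sketch. This is not the KiuFunza implementation. The function names, the equal-sized baseline groups, the linear mapping from within-group rank to points, the treatment of ties, and all example numbers are assumptions made purely for illustration; only the broad logic is taken from the text.

```python
from collections import defaultdict


def levels_bonuses(passes, budget):
    """'Levels' rule. passes = {skill: {teacher: students who mastered it}}."""
    bonuses = defaultdict(float)
    pool_per_skill = budget / len(passes)  # same pool for every skill
    for by_teacher in passes.values():
        total_passing = sum(by_teacher.values())
        if total_passing == 0:
            continue  # assumption: a pool nobody earns is not paid out
        per_student = pool_per_skill / total_passing  # rarer skill pays more
        for teacher, n_students in by_teacher.items():
            bonuses[teacher] += n_students * per_student
    return dict(bonuses)


def gains_bonuses(students, budget, n_groups):
    """'Gains' rule. students = [(teacher, baseline_score, endline_score)]."""
    # Assumption: groups are equal-sized bins of students sorted by baseline.
    ordered = sorted(students, key=lambda s: s[1])
    size = -(-len(ordered) // n_groups)  # ceiling division
    points = defaultdict(float)
    for start in range(0, len(ordered), size):
        # Rank students inside the group by endline score (ties not handled).
        group = sorted(ordered[start:start + size], key=lambda s: s[2])
        for rank, (teacher, _, _) in enumerate(group):
            # Assumption: points rise linearly from 0 (lowest) to 1 (highest).
            points[teacher] += rank / max(len(group) - 1, 1)
    total = sum(points.values())
    if total == 0:
        return {}
    # The fixed budget is shared in proportion to each teacher's points.
    return {t: budget * p / total for t, p in points.items()}


# Hypothetical data: many students master "letters", few master "words".
levels = levels_bonuses(
    {"letters": {"A": 8, "B": 2}, "words": {"A": 1, "B": 1}}, budget=100
)
print(levels)  # A: 65.0, B: 35.0

# Hypothetical data: each teacher has the top-ranked student in one group.
gains = gains_bonuses(
    [("A", 10, 30), ("B", 12, 20), ("A", 50, 55), ("B", 52, 70)],
    budget=100,
    n_groups=2,
)
print(gains)  # both teachers receive 50.0
```

In the “levels” example, each student who masters the rarer skill is worth 25 to the teacher, against 5 for the common skill, which is the sense in which more advanced skills led to higher bonuses per student. In the “gains” example, teacher A earns as much as teacher B despite the lower baseline scores of A’s top-ranked student, because ranks are compared only within groups of similar students.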