90606 International Comparison Program [04.01]

Towards an Output Approach to Estimate the Value of Education Services in Developing and Transitional Countries

4th Technical Advisory Group Meeting, October 20-21, 2010, Washington DC

Table of Contents
1. Introduction – Towards an Output Approach
2. Volume measures – the number of pupils
3. Adjustments to quantity and quality – time spent learning
4. Quality - Imputed scores and grouping countries
5. Adjustments to quality – non-school contributions to learning, repetition, and dropout
6. REFERENCES
7. APPENDIX – OVERVIEW OF INTERNATIONAL AND REGIONAL ASSESSMENT SERIES

Towards an Output Approach to Estimate the Value of Education Services in Developing and Transitional Countries

Education Policy and Data Center 1

1. Introduction – Towards an Output Approach

The International Comparison Program (ICP), charged with calculating purchasing power parities (PPPs), has been working with the Education Policy and Data Center (EPDC) since October 2009 to develop a new output-based methodology for education PPP adjustments. Both the ICP and the EPDC are guided by the Technical Advisory Group (TAG). The ICP also works with Eurostat, which has developed a methodology to measure education output for OECD countries (Eurostat, 2010). The methodology proposed here is complementary to, and builds on, the methodology prepared by Eurostat for OECD countries.

PPP adjustments have been made to GDP counts for a number of decades. In 2005, the ICP used an input-based approach to PPP conversions for education, based on teacher salaries and other costs. The ICP recognized that an output-oriented approach would, if feasible, be a more direct measure of the value of education services. Meanwhile, Eurostat, which calculates the PPPs for OECD countries, has moved towards an output-oriented approach based largely on enrollments and learning scores (Eurostat, 2010). This paper proposes a methodology for an output-based approach to measuring the value of education services for the non-OECD countries included in the ICP.

The Eurostat approach provides the basis for the non-OECD education-output methodology, but a number of modifications are necessary. Education systems in non-OECD countries vary widely in access to school, attendance during the school year, retention to higher grades, children's backgrounds, and levels of learning -- all of which will become evident in the sections below. Moreover, there are more data gaps, and more of the data is unreliable, than in the OECD countries. The high variability and the data issues need to be incorporated into the methodology to derive measures of education output across non-OECD countries. The methodology described here has been tested with data available in late 2010. For some measures, notably learning scores, there will be more data by the next ICP round in 2011.
We believe that the approach can and will be modified in the future as better methodologies are developed, but that what is proposed here is a reasonable starting point. A number of issues remain open and need to be resolved with the ICP; these issues are marked throughout the text.

1 The EPDC is a project of the Academy for Educational Development, Washington DC. For more information, visit www.epdc.org.

The authors are indebted to members of the TAG, the ICP Global Office, and experts from the World Bank (Harry Patrinos, Emilio Porta, and Olivier Dupriez) who provided input, ideas, and corrections that improved the proposed methodology far beyond what the EPDC would have been able to accomplish on its own. Where possible, the individuals who provided particular contributions are acknowledged – recognizing that most of the contributions emerged from group discussions.

The proposed conceptual model for an output-based educational value consists of two basic elements: 1) the volume, or quantity, of educational services acquired; and 2) the quality of the output acquired as a result of these services (adjusted for the effects of non-education factors and the duration of schooling). At the highest level, the conceptual equation for the calculation of the PPP adjustment factor is:

    PPP = Expenditures / (Volume × Q(adj)),

where the Expenditures are collected through existing ICP procedures; Volume is a measure of the number of pupils receiving education services and the hours received per year; Q(adj) is a measure of the quality adjustments derived from learning scores and corrections for household background and system inefficiencies; and PPP is the purchasing power parity adjustment.

The paper focuses on the measurement of volume and quality and is organized in the following manner:
1. Volume measures – the number of pupils
2. Adjustments to volume – absenteeism
3. Quality measure – actual and imputed learning scores
4. Accounting for uncertainty – grouping countries
5. Quality adjustments for pupil background
6. Quality adjustments for inefficiency – repetition and dropout

2. Volume measures – the number of pupils

The primary determinant of the volume of education services is the number of pupils enrolled in an education system – the recipients of these services. The starting point for student volumes will be administrative data, not because it is the best measure but rather because it is used throughout the ICP, and starting out with other numbers, such as those coming from household surveys, would likely raise objections from the statistical offices with whom the Global Office works (from a note provided by Alan Heston, TAG). Many ICP countries have an administrative system in place, called the Education Management Information System (EMIS), to take an annual (or semi-annual) census of all pupils in primary, secondary and tertiary schools, as well as in preschools. While much EMIS data can be considered reliable, this is not universally the case. Concern about administrative data arises, at least in part, from the discrepancy between school participation rates as counted by administrative sources (EMIS) and by household surveys. Moreover, the actual time that pupils spend learning in school within a given school year can vary considerably due to absenteeism and time in school spent wastefully.
This section discusses the EMIS data itself (2.1) and the discrepancy between administrative and household survey sources (2.2); absenteeism and time in school on task are discussed in the next section.

2.1 EMIS pupil counts and data verification

EMIS information is typically collected by school headmasters and/or teachers, who fill in standard questionnaires on the number of pupils and teachers at their school (disaggregated by relevant categories, such as sex, grade level, or training and experience), the availability of instructional materials, the number of classrooms and facilities, and, periodically, school finances including fees. This information is channeled up the organizational hierarchy and pooled at the national level. In some cases the available counts are out of date, and in some countries they do not exist at all. Further, schools outside the mainstream system, such as community schools and part-time and specialized education (vocational and professional training, arts, sports, etc.), are often not included in these questionnaires (Moses, 2010).

The EMIS pupil counts need to be verified for accuracy and reliability. Over the years, education statistics experts have developed, tested and implemented methodologies to deal with measurement error and missing data. Together, these methodologies provide well-defined steps to procuring reasonable-to-excellent estimates of the number of pupils who are attending schools in all countries. The methods can be divided into two broad categories: first, steps to ensure the reliability of EMIS data directly; second, comparisons to pupil estimates from other sources, such as household surveys.

Methods to ensure reliability of EMIS pupil enrollment data. Effective EMIS systems mitigate these sources of error with the following four steps (Moses, 2010):

1. Correct for incomplete coverage of schools (if not all schools report data). Based on the percentage of schools reporting, either the previous year's data for a school are inserted (since, if the school has not closed, its enrollment in the prior year is likely to be close to that for the current year) or a percentage adjustment is made for unreported schools (a minimal sketch of such an adjustment is given below).

2. Correct for incomplete or inaccurate reporting (headmasters count incorrectly or falsify enrollments for financial reasons). Inaccurate reporting is countered through a verification process in which, using a 2-5% sample, actual base records are checked by headquarters or regional personnel. This provides both an independent assessment and a reference to original documents.

3. Count enrolment more than once during the year. Enrollment is typically counted at the beginning of the school year, but enrollments often decline over the year. Some countries track enrollment more than once – typically at each term – or track actual attendance weekly or monthly, and adjust pupil counts according to actual attendance.

4. Adjust for sectors of the system not counted in any annual census, such as adult education. EMIS analysts determine that all education sub-sectors are reporting, and ensure consistent "composition" of reported education sub-sectors.

This approach to data accuracy refinement is supported by recent work conducted in seven African countries (UNESCO, 2010a). Countries participating in the ICP could be asked to document which of these four techniques were in use for which years. Where documentation of the data verification is absent, the ICP should encourage countries to implement it.
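The coverage correction in step 1 amounts to either carrying forward last year's figure for each non-reporting school or scaling the reported total by the inverse of the reporting rate. A minimal sketch of that logic, with hypothetical function and variable names, is:

```python
def adjust_for_nonreporting(reported_total, schools_reporting, schools_total,
                            prior_year_by_school=None, missing_school_ids=None):
    """Correct an EMIS enrolment total for incomplete school coverage.

    Preferred option: carry forward last year's enrolment for each school
    that did not report this year.
    Fallback: scale the reported total by the inverse of the reporting rate,
    assuming non-reporting schools enrol pupils at the same average rate.
    """
    if prior_year_by_school and missing_school_ids:
        carried_forward = sum(prior_year_by_school.get(s, 0) for s in missing_school_ids)
        return reported_total + carried_forward
    reporting_rate = schools_reporting / schools_total
    return reported_total / reporting_rate

# Hypothetical example: 1,200 of 1,500 schools reported 480,000 pupils.
print(adjust_for_nonreporting(480_000, 1_200, 1_500))  # -> 600000.0
```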
The information on data verification can be used, along with a cross-comparison with household survey data (discussed below), to decide whether the EMIS pupil counts should be adjusted.

A related approach to obtaining more reliable counts is to use data from the UNESCO Institute of Statistics (UIS). The UIS has several data validation and verification techniques in the electronic survey software that countries use to provide their EMIS data to UIS, which reduce input options and chances for error. In attempting to coordinate the exchange of international data, UIS is also pursuing the standardization of indicator data exchanges through the introduction of specific software. 2

2 See the UIS website.

2.2 Corroborating EMIS counts with data from other sources, and adjustments

A second method to test and improve the accuracy of EMIS pupil counts uses household surveys. It has long been recognized that school participation as measured by EMIS systems and by household surveys differs considerably in many countries. UNESCO has developed approaches to analyze the two data sources and identify the number most likely to be accurate. The ICP should adopt a similar approach, where EMIS pupil counts are compared against household survey counts and, where differences are larger than some acceptable margin (10% may be sufficient for ICP purposes), an expert investigation, along the lines of Stukel and Feroz (2010) discussed below, determines which of the sources is more likely to be accurate; those numbers are then used to estimate pupil volume.

Figure 1 shows the difference between primary school gross enrolment rates counted by EMIS systems (pupils enrolled in school/children of primary school age) and primary school gross attendance rates counted by household surveys (children who attended school in the last week or year/children of primary school age) in 60 developing countries post-2005. 3 Surveys are distinguished as DHS surveys and all other surveys (largely MICS). The DHS surveys are highlighted because, over the decades, they have earned a reputation of being highly reliable and internationally comparable.

Figure 1. Difference between primary school gross enrolment rates counted by EMIS systems and primary school gross attendance rates counted by household surveys in 60 developing countries, post-2005. Surveys distinguished as DHS surveys and all other surveys (largely MICS).

3 The difference is calculated as 100*(GAR/GER – 1), the survey attendance rate relative to the administrative enrolment rate; this is the same calculation as used in the UIS paper by Stukel and Feroz (2010) that analyzes these differences and is referenced later in this section.

The administrative and survey results for enrolment and attendance are within 10% of each other in a little over 60% of the countries in the graph, leaving a little under 40% with larger differences. The discrepancies show both higher and lower survey values (relative to administrative), with more discrepancies on the low end. There is a notable difference between the DHS and the other surveys: among the large differences (>10%), DHS attendance rates tend to fall below the administrative data (by a ratio of 4:1), whereas for the other surveys it is the other way around, with a 1:3 ratio and attendance rates 10% or more above the administrative data.
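As a rough illustration of the comparison rule proposed above, the relative difference and the 10% review margin could be computed along the following lines (hypothetical helper names and figures, not actual country data):

```python
def rate_discrepancy(admin_ger, survey_gar):
    """Relative difference of the survey attendance rate to the administrative
    enrolment rate, in percent (negative when the survey rate is lower)."""
    return 100 * (survey_gar / admin_ger - 1)

def flag_for_review(rates, margin=10.0):
    """Return the countries whose EMIS and household-survey rates differ by
    more than `margin` percent and therefore warrant expert investigation."""
    return {country: round(rate_discrepancy(ger, gar), 1)
            for country, (ger, gar) in rates.items()
            if abs(rate_discrepancy(ger, gar)) > margin}

# Hypothetical figures, not actual country data: (EMIS GER, survey GAR).
rates = {"Country A": (95.0, 82.0), "Country B": (70.0, 68.0)}
print(flag_for_review(rates))  # -> {'Country A': -13.7}
```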
When the administrative rates are higher than the survey rates, this can be because more pupils enroll in school than actually attend; when the survey rates are higher, it may be because students entered school later in the year and missed the enrolment counts. But the differences can also be due to errors or problems:

1) The administrative units incorrectly counted (or reported) the number of pupils.

2) The population data underlying the administrative enrolment rates are faulty. If the population estimates are wrong, the administrative and survey counts of pupils could in principle agree even though the rates differ.

3) There are problems with the survey sample, response rates, questionnaire, or implementation of the questionnaire; for example, the survey asked only about current attendance (this week) and missed pupils who were out of school because of illness, a school break, or another temporary absence.

Differences can also be caused by a lack of coherence between the age distribution in the survey sample and the age distribution of the population estimates. This causes problems because attendance rates are unequally distributed over age: if particular age groups with lower or higher attendance rates are over- or under-represented, the absolute pupil estimate from the surveys will be skewed.

The UNESCO organizations, charged with estimating the number of children in school globally, and the number of out-of-school children, have struggled with this discrepancy for years. In 2005, UIS and UNICEF jointly developed a process for corroborating EMIS data with household survey data. First, the UNESCO/UNICEF method includes consistent definitions of school levels and school participation. If school levels and defined participation are consistent and if measurement error is minimal, then school attendance rates from a household survey and school enrollment rates from EMIS data for a given country, school level and year should be very close. UIS and UNICEF consider differences smaller than 5 percentage points to be acceptable, and see such small deviations as an indication that both sources of information are reliable. In such cases, EMIS counts of pupils can be used with reasonable confidence. Where a sizeable difference remains (>5%), it is likely that one or the other data source has an error or an omission that needs to be reconciled. The criteria and a process to locate errors and decide which of the two sources is more credible are refined in UNESCO (2010b) and are basically an expert-based analysis.

One UIS publication of particular interest to the ICP presents an in-depth analysis of the differences in absolute pupil counts from EMIS and DHS surveys for 10 developing countries (Stukel and Feroz, 2010). EMIS pupil counts were obtained directly from UIS. Pupil counts from the DHS were obtained by multiplying the number of pupils counted by the survey in the various strata (sub-sections of the country) by the respective strata population weights. The original population weights are not provided with the DHS datasets but were requested from MacroInternational. Stukel and Feroz were able to obtain the original weights for 11 of the 16 countries they requested (one country was subsequently dropped from their analysis). The analysis was done for Bangladesh, Côte d'Ivoire, Egypt, Indonesia, Mozambique, Namibia, Nigeria, Rwanda, Tanzania, and Vietnam. The observed pupil count differences were large (>10%) for 9 out of 10 countries - an even lower correspondence than for the attendance ratios shown in Figure 1.
Stukel and Feroz comment: "… eight countries (Bangladesh, Côte d'Ivoire, Egypt, Indonesia, Mozambique, Namibia, Rwanda and Tanzania) have values highlighted where the relative percent differences (in absolute numbers) are greater than 10% but less than 25%. For all countries except Indonesia and Tanzania, the values are positive, indicating that the enrolment figures are substantially higher than the attendance figures. In Indonesia and Tanzania, the inverse is true. There is one country where the discrepancy exceeds 25% – Vietnam (47.2%)." (Stukel and Feroz 2010:15). The numbers are shown in Table 1.

Stukel and Feroz analyzed the causes of these differences and found that adjustments to these raw numbers were necessary, and that the pupil numbers from the two sources can be aligned in 7 out of 10 cases when appropriate adjustments are made. For one country, an analysis of the questionnaire suggested which source was more appropriate to choose, leaving just one country out of 10 with unexplained pupil count discrepancies in excess of 10% (Tanzania) where the EMIS pupil counts could need to be adjusted – interestingly, in this country the rates are only 6% apart.

a) In one country, Vietnam, the pupil count difference between EMIS and DHS was 47% (8.5 million pupils vs. 4.0 million), but the enrolment and attendance rates are identical – 96%. An inquiry to MacroInternational showed that the pupil count difference is the result of a decision by MacroInternational to use old population weights. "(T)he Viet Nam DHS for 2002 was based on the previous DHS in 1997, which in turn was a sub-sample of the 1996 Multi-Round Demographic Survey. Nevertheless, Macro International made the decision not to boost the final weights for DHS 2002 by the inverse of the sub-sampling rate since their main interest was to produce ratio estimates and omitting this step would not matter. If the weights had been boosted by the inverse of the sub-sampling rates for Viet Nam, this would have generated survey-based estimates of totals …, which would have been close to the estimate from the corresponding alternate source …" (Stukel and Feroz 2010:16)

b) Most of the pupil count discrepancies can be reduced by aligning the age distribution of the household survey sample with that of census-based population estimates or counts. Pupil discrepancies can result from differences in the population age distribution because attendance varies greatly by age, and an over- or under-representation of a particular age group can skew the aggregated pupil counts. Often, surveys adjust the population sampling weights using a technique called post-stratification or benchmarking. It is not obligatory, however, because post-stratification has little effect on ratios, which are the results of interest for surveys. Stukel and Feroz were informed by MacroInternational that while "DHS countries do perform empirical checks to ensure that there is coherence in age distributions between surveys and national population sources, and that post-stratification takes place when it is deemed necessary (i.e. when there is a lack of coherence), none of the DHS countries considered in this report have used post-stratification" (Stukel and Feroz, 2010:32).
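Post-stratification itself is a mechanical reweighting: within each age group, the survey weights are scaled so that the weighted sample total matches the external (census- or UN-based) population estimate for that group. A minimal sketch under that assumption, with hypothetical names and no real survey data:

```python
def post_stratify(weights, ages, population_by_age):
    """Rescale survey weights so that each age group's weighted sample total
    matches the external (census- or UN-based) population estimate.

    weights, ages: parallel lists, one entry per sampled child.
    population_by_age: dict mapping age -> population estimate.
    """
    totals = {}
    for w, a in zip(weights, ages):
        totals[a] = totals.get(a, 0.0) + w
    return [w * population_by_age[a] / totals[a] for w, a in zip(weights, ages)]

def weighted_pupil_count(weights, attends_school):
    """Survey-based pupil estimate: the sum of the (adjusted) weights of the
    children reported to be attending school."""
    return sum(w for w, attending in zip(weights, attends_school) if attending)

# Hypothetical micro-example: three sampled children in two age groups.
new_weights = post_stratify([100, 100, 100], [7, 7, 8], {7: 500, 8: 250})
print(new_weights)  # -> [250.0, 250.0, 250.0]
```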
When Stukel and Feroz post-stratified the population weights of the DHS surveys to match the UN population estimates used by UIS, the percent difference in pupil counts was reduced to under 10% for an additional seven of the 10 countries (not including Vietnam, which was discussed in point a).

c) Some pupil count discrepancies arise because some surveys inquired only about attendance in the last week, thus missing pupils who were out of school temporarily due to illness, vacation, or other causes. One country where the pupil estimate was still significantly lower for the household survey than for the EMIS system, even after post-stratification, was Bangladesh – survey pupil estimates were 12.7 million compared to 15.0 million from EMIS. Bangladesh is also a country where the DHS inquired only about attendance in the last week, and Stukel and Feroz say: "in Bangladesh, the academic year spans all 12 months of the year. Given that the missing information on past attendance may have constituted a significant portion of attendance for this country, the enrolment figure for Bangladesh may be considered more credible than the attendance figure."

Table 1. Selected results from Stukel and Feroz (2010), analysis of discrepancies in school participation rates and pupil numbers between EMIS and DHS, for 10 countries. All pupil counts in thousands. "Within 2 s.d.?" indicates whether the difference in counts is within two standard deviations of the sampling error of the DHS estimate.

Country         Enrolment rate (EMIS)   Attendance rate (DHS)   Difference NER/NAR   EMIS counts (from UIS)   DHS counts   Relative difference to EMIS   Within 2 s.d.?   DHS counts, post-stratified   Relative difference to EMIS
Bangladesh      93                      79.6                    13%                  15,020                   12,467       -17%                          NO               12,728                        -13%
Cote d'Ivoire   53.7                    52.2                    2%                   1,474                    1,304        -11%                          YES              1,421                         -4%
Egypt           93.5                    85.5                    8%                   7,340                    6,531        -11%                          NO               6,731                         -8%
Indonesia       100.9                   95.3                    6%                   25,185                   29,527       17%                           YES              23,588                        -6%
Mozambique      62.9                    59.9                    3%                   2,318                    1,842        -21%                          NO               2,243                         -3%
Namibia         74.2                    78.6                    -4%                  283                      226          -21%                          NO               298                           5%
Nigeria         62.1                    62                      0%                   13,211                   12,030       -9%                           YES              13,299                        1%
Rwanda          71.1                    71.9                    -1%                  1,046                    910          -13%                          NO               1,058                         1%
Tanzania        47.7                    53.8                    -6%                  3,105                    3,444        11%                           YES              3,492                         12%
Vietnam         96.1                    96.3                    0%                   8,498                    4,487        -47%                          NO               8,494                         0%

In conclusion, the ICP should implement an approach similar to that of UIS/UNICEF, where EMIS pupil counts are compared against household survey counts and, where differences are larger than some acceptable margin (10% may be sufficient for ICP purposes), an expert investigation along the lines of Stukel and Feroz determines which of the sources is more likely to be accurate; those numbers are then used to estimate pupil volume.

Transforming pupil numbers into a volume index

Once the pupil volume estimates have been found and agreed upon, they will be transformed into a volume index for use in the PPP adjustments, which adjusts for population size. Eurostat uses an index based on pupils as a proportion of the population: the volume index is the pupil ratio divided by the geometric mean of all countries' ratios,

    V_i = r_i / (r_1 × r_2 × … × r_n)^(1/n),

where V_i is the volume index for country i, r_i is the ratio of pupils to population in country i, and n is the number of countries. The map in Figure 2 shows the volume index for all countries. Note the higher volume indices localized in the countries with a high proportion of youth in Sub-Saharan Africa and Asia, and the relatively low indices in Eastern Europe.
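A minimal sketch of this normalization, with hypothetical figures in place of real country data:

```python
import math

def volume_index(pupils, population):
    """Eurostat-style volume index: each country's pupils-to-population ratio
    divided by the geometric mean of all countries' ratios."""
    ratios = {c: pupils[c] / population[c] for c in pupils}
    geo_mean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))
    return {c: round(r / geo_mean, 2) for c, r in ratios.items()}

# Hypothetical figures in millions, not actual country data.
pupils = {"A": 6.0, "B": 1.2, "C": 0.8}
population = {"A": 30.0, "B": 10.0, "C": 16.0}
print(volume_index(pupils, population))  # -> {'A': 1.88, 'B': 1.13, 'C': 0.47}
```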
The overall range of the volume index is 0.49 (Serbia) to 4.22 (Uganda), indicating that the volume adjustments to education output measures will be quite large – in fact, these turn out to be the largest adjustments, compared to all of the quality adjustments discussed in the next sections.

Figure 2. Map of the volume index for primary school pupils based on EMIS pupil counts. The country-level data are available in the appendix.

3. Adjustments to quantity and quality – time spent learning

It is widely acknowledged that pupil counts by themselves do not necessarily reflect educational volume because they do not consider how often pupils come to school, nor how many hours of education they receive on the days that they do come. Yet it is the productive exchange between teacher and student that is at the core of education service production (Hill, 1975; Schreyer and Lequiller, 2007). The OECD (2007) notes the importance of collecting pupil hours (and also repetition and dropout) by level of education or grade to calculate education volume.

One might argue, as Eurostat does for the OECD countries (Eurostat, 2010), that the time spent learning, or the opportunities to learn, per pupil are already a component of education quality and need not be collected separately for the purposes of the education PPPs. Gallais (2006) argues that how much pupils learn (as measured in assessments, discussed next) already includes the pupil-hours component. That said, it is still useful to analyze the distribution of time spent learning, and to assess the data availability and the variability in time for those countries where data are available.

The most straightforward measure of time per pupil is the official amount of instruction recommended for each school year. Though this information is not systematically maintained in an international database, many national guidelines or recommendations on instructional time can be found in curricular documentation, much of which has been compiled in the UNESCO International Bureau of Education World Data on Education Factbook. 4

4 http://www.ibe.unesco.org/Countries/WDE/2006/index.html

National guidelines or recommendations on instruction time come in a variety of units of measurement – hours, days, or weeks per academic year. Hours per year is the preferred unit of measurement because it is directly comparable across countries (countries may have a different number of hours in a school day or days in a school week), but instruction time recommendations are more commonly expressed in days or weeks. No matter what the unit of measurement, there is considerable variation in the recommended duration of instruction time per country. Colombia, for example, recommends 1,000 annual hours of instruction time for primary students while The Gambia recommends 674. Because of this wide variation in the amount of instruction time provided per country, this factor must be taken into account in a measure of volume.

A large body of literature suggests that the actual hours of time in the classroom in developing countries are far lower than the official hours of school per year (Hill, 1975; Atkinson, 2005; Schreyer, 2009; Konijn and Gallais, 2006; Lequiller, 2006; Abadzi 2007a, 2007b; OECD, 2007; Fraumeni, 2008; Moore et al. 2010). Furthermore, the literature finds that even in the classroom, time is not always spent on learning tasks (time on task), sometimes because the materials necessary for teaching and learning are not available.
For example, Moore et al. find that in over 100 schools surveyed in four countries, more than half of the school year was lost due to school closures, teacher absences, student absences, late starts, prolonged breaks and other reasons. Abadzi reports similar findings. This is concerning, because opportunities to learn turn out to be an important predictor of how much children learn and thus of school quality (Schuh Moore et al., 2010; Abadzi, 2007a; Woessmann, 2005).

Some countries collect information on some of the time lost due to absenteeism in attendance records; more countries could do the same (as suggested by, e.g., Chaudary et al. 2006; UNESCO, 2010b). Beyond absenteeism, Abadzi (2007b) suggests using instructional time surveys for time loss adjustments. Fraumeni et al. (2008) suggest aggregating pupil hours to account for actual service delivery, to reflect a time component of educational quantity. Although Fraumeni et al. (2008) do not recommend methods for collecting actual teacher/pupil interaction time, others assume this component can be constructed using official school contact hours (e.g. Eurostat, 2001; OECD, 2007). Lequiller (2006) suggests gathering this information via attendance figures, while another approach is classroom observation (discussed below).

There is no comprehensive data source for absenteeism or time spent learning. The EPDC extracted the average student attendance from 2006 MICS surveys for 30 countries, collected data on teacher absenteeism from reports, and gathered anecdotal evidence on actual in-class time spent on learning activities. Figure 3 shows the average number of days of school missed by students as a percentage of the official school days, in the 30 countries with MICS 2006 surveys where the data could be collected. Absenteeism ranges from a minimum of -2% in Côte d'Ivoire, where children on average attend a bit of additional school time (presumably in supplemental, private institutions), to a high of 21% in Jamaica. There is no regional pattern for absenteeism.

Figure 3. Percent of days of school missed by primary pupils in 30 countries, arranged by region. Country values shown in blue and regional average values shown in red.

Teacher absenteeism for 12 countries was found in Kremer et al. and Abadzi. The range of teacher absenteeism is a little higher than for pupils, from 8% in the Dominican Republic to 30% in Kenya. The effects of pupil and teacher absenteeism are multiplicative, as both have to be present for learning to take place, but the EPDC found both data points for only one country, Bangladesh. There, pupils are absent 5% of the school days and teachers 16%, resulting in a probability of about 79% that both the pupil and the teacher are in the classroom on the same day.

In conclusion, while absenteeism and in-classroom time lost may be significant factors reducing the amount of time that students learn and how much they learn, the data coverage for these factors is sparse, and even regional averages are not feasible.
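The multiplicative logic cited for Bangladesh can be written as a one-line adjustment; the sketch below is purely illustrative, given how sparse the underlying data are:

```python
def joint_presence(pupil_absence, teacher_absence):
    """Probability that pupil and teacher are both present on a given day,
    assuming their absences are independent."""
    return (1 - pupil_absence) * (1 - teacher_absence)

# Bangladesh figures cited above: pupils absent 5% of days, teachers 16%.
print(round(joint_presence(0.05, 0.16), 3))  # -> 0.798, roughly the 79-80% noted above
```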
If the data situation eventually improves, it may be useful to include absenteeism, although from the sample of available countries it appears that the maximum volume adjustment would be a bit over 30% at the extreme, with a much lower average.

Classroom observation to capture opportunities to learn

In addition to absenteeism, time is lost within the classroom. Abadzi (2010) provides averages for four countries, and the total time lost can range up to 50%. Clearly, these are serious time losses, but there is no empirical way to estimate them for all countries. Opportunities to learn are a measure of the effective time spent in the classroom – the combination of resources, practices, and conditions available to provide all students with the opportunity to learn the material in the national curriculum. To capture opportunities to learn, Moore et al. use a classroom observation method with several instruments to collect their data, including "Concepts about Print (CAPs); Early Grade Reading Assessments (EGRA); Stallings Classroom observation protocols; school observations; and interviews with teachers and principals" (p. 1). They collect data on: % of days school is open; student and teacher attendance; % of the school day available for instruction; % of student time-on-task; % of students with a textbook; observed textbook use; % of time spent reading; class size; school support; and, as a measure of output, grade 3 reading fluency. The findings allow educators to diagnose where instructional time is lost and thus where improvements can be made. That said, time-on-task data do not present information on pedagogical performance; they simply provide information about how long students and teachers are on task and what activities are happening in the classroom.

Research like this 5 is time and resource intensive. While in research studies data can be collected from a small number of schools and data collectors can spend an entire day or two at a school collecting the data, at a national level data collection would need to be done on a sample basis, requiring simplification of all instrumentation. Such direct school observations may not be feasible for the ICP, but sampling of schools to collect some of the attendance and day-use measures may be feasible.

5 Conducted by organizations like the Academy for Educational Development (AED) and the Research Triangle Institute (RTI).

4. Quality - Imputed scores and grouping countries

One of the greatest challenges of estimating the value of education services is the lack of a readily available common measure by which to assess the quality of output. In education, the word "output" encompasses the amount of learning that is transferred from the education system to its recipients – students enrolled in formal schooling. Therefore, the volume of services produced in education must be adjusted for an estimate of the quality of learning. Education professionals agree that learning achievement is but one aspect of quality education, and that standardized tests cannot capture the entirety of the transfer and acquisition of knowledge and skills. Nonetheless, large-scale standardized tests administered across large samples of students in a growing number of countries provide the best available common metric against which to compare learning outcomes.
Figure 4.1. Geographic coverage of major international assessments (PISA 2006, TIMSS 2007, PIRLS 2006).

However, the geographic coverage of each individual international assessment, such as PISA, TIMSS, and PIRLS, as well as their combined geographic coverage, is far from universal, with representation particularly low among low-income countries (see Figure 4.1). How can one put all countries on the same metric when participation is so uneven across tests and missing information so nonrandom? EPDC proposes a methodology that allows for a combination of available scores from international assessments into a single metric, and the imputation of missing learning scores by exploiting the relationships between quality scores and an array of macro-level indicators. This section explains this methodology.

General Description

The method for the construction of the common metric of learning scores proposed by EPDC is regression imputation. 6 In short, a set of OLS models was first fit to the available data to generate predicted values of the target assessment metric (PISA). The best available predicted values were then used in place of missing learning scores in a series of recursive steps, until all missing values were filled.

Table 4.1. Descriptive statistics from international achievement studies

Assessment                   N    Range    Minimum   Maximum   Mean     Std. Deviation   Variance
PISA 2006 (Science)          54   241.29   322.03    563.32    471.29   55.27            3054.28
TIMSS 2007 8th grade Math    51   291.00   307.00    598.00    452.22   72.47            5252.13
TIMSS 2007 4th grade Math    37   383.00   224.00    607.00    471.89   91.79            8425.02
PIRLS 2007                   37   262.00   302.00    564.00    498.34   72.18            5210.65
SERCE 2007 6th grade Math    16   221.83   415.64    637.47    500.00   55.65            3096.63
SACMEQ 2003 Math             13   153.70   430.90    584.60    501.52   52.29            2734.59

6 EPDC abandoned the originally proposed multiple imputation (MI), both due to the complexity of the missing data pattern and the concerns voiced by the TAG.

For each country, the best available score is either its actual PISA score or a score predicted by the model with the smallest amount of error available for that country. The target of the imputation, the PISA assessment, was selected due both to its wide coverage and its conceptual characteristics (measuring the stock of learning and skills). The key features of the data available from the international and regional assessments are provided in Table 4.1. The imputation process can be broken into three broad stages.

Stage 1. Imputations based on international assessments and the regional SERCE assessment. The key predictor variable in these OLS models (with PISA as the dependent variable) was a score from the 2007 TIMSS 8th grade math test, the 2007 TIMSS 4th grade math test, the 2007 PIRLS test (reading and literacy in 4th grade), and finally, for a group of Latin American countries, their score on the SERCE test. While the strongest influence on the predicted PISA score was a test score from another assessment, a set of covariates was used in these models to improve the precision of the estimates.
The general equation used for regression modeling in this stage is as follows:

    PISA_i = β_0 + β_1 T_i + B X_i + Γ Z_i + ε_i

where:
- PISA_i is country i's score on the PISA assessment in 2006;
- β_0 is the constant term;
- T_i is its score on another assessment (TIMSS, PIRLS, or SERCE);
- X_i is a vector of available indicators of the formal schooling system of country i (such as per pupil expenditure, secondary gross enrollment rate, primary completion rate, repetition rate, and pupil-teacher ratio);
- Z_i is a group of variables indicating the level of economic development and the demographic features of country i (log of household consumption per capita or GDP per capita, percent of youth ages 0-14, percent of the population living in urban areas, and a dummy designating oil-producing countries);
- B and Γ are the corresponding coefficient vectors;
- and ε_i is the error term.

The variable for geographic region was not introduced at this stage, because a sufficient number of cases with actual PISA scores and information on the other variables was not available in each region. For the most part, no imputation of missing values in the predictor variables was performed; therefore, a large number of models were fit using different combinations of variables to take advantage of all available information for each country. The only exception to this rule was mean years of schooling in the adult population. Because this variable is later used to adjust the PISA* scores, it was imputed with regional and income-group means. Every effort was made to include all of the available indicators, and the choice among the set of predicted values generated for a given country was made in favor of the model that showed the least amount of error.

The best predicted values from Stage 1 models were incorporated in the "best value of PISA" variable, or PISA*, which now included the 56 actual scores and the 27 imputed scores. This variable was later regressed on the available predictors in Stage 3 to obtain a new set of predicted scores, and then used to further impute missing values.

Stage 2. Imputations based on the SACMEQ assessment. Because there were no 2006 PISA scores for the group of Sub-Saharan African countries that participated in SACMEQ in 2003, the only linkages available for these countries were the Stage 1 imputed PISA scores from other tests - TIMSS 8th grade math in 2007 for Botswana and PIRLS 2006 for South Africa. However, given that only two predicted PISA* values were available, imputing using a regression of PISA* on SACMEQ with covariates was impossible at this stage. Therefore, we began by calculating the average index ratio of Botswana and South Africa's SACMEQ scores to their predicted PISA* scores from Stage 1, and applying this ratio to compute the "starting values" of PISA for subsequent adjustment in Stage 3. The process is similar to Altinok and Murselli (2006), with the exception that in this case we only have two doubloon countries. The resulting starting values were incorporated into a new variable, PISA** (the ** denotes a duplicate of PISA* with the addition of the SACMEQ-based starting values), which by now included the actual PISA scores, the scores imputed in Stage 1, and the starting values for the SACMEQ countries. This variable was regressed on a set of indicators in Stage 3, and as a result of both the greater N of the models and the additional information gained from other variables (see the variables in Table A2 in the Appendix), the predicted scores for the SACMEQ participants from these models were deemed more reliable than the starting values based on the ratio-linking.
Therefore, we replaced the starting values with the more reliable regression-adjusted predicted values for the SACMEQ countries. Table 4.2 shows the SACMEQ countries with their scores on SACMEQ, the transformed starting values of PISA, and the final set of scores after adjustment by other predictors in Stage 3.

Table 4.2. SACMEQ starting values and regression-adjusted PISA scores

Country        SACMEQ Math   PISA** starting values   PISA*
Mauritius      584.6         377.59                   363.76
Kenya          563.3         373.81                   311.11
Seychelles     554.3         382.74                   359.56
Mozambique     530.0         352.56                   337.58
Tanzania       522.4         359.84                   319.15
Swaziland      516.5         352.36                   363.15
Botswana       512.9         393.57                   393.57
Uganda         506.3         333.02                   330.97
South Africa   486.2         298.60                   298.60
Lesotho        447.2         302.61                   339.70
Malawi         432.9         290.28                   314.73
Zambia         432.2         293.82                   282.98
Namibia        430.9         296.31                   312.81

Stage 3. In this round, PISA** was first regressed on a set of predictors, which included geographic region as well as other macro-level variables. Once the SACMEQ-based scores were adjusted by regression and incorporated in PISA**, that variable served as the dependent variable for a set of models that exploited the relationships between the now larger pool of PISA values (actual and imputed) and the countries' predictors of quality. Due to the level of missing data on a number of predictors, we first fit a set of models with the largest number of predictors, minimizing the mean squared error of the estimates and the residual variance. These values were once again incorporated into PISA*, prior to fitting more parsimonious models for countries lacking data on all but a few predictors. This was done to strengthen the robustness of the sample that went into the last few models and to minimize the residual variance. As in Stage 1, for each country the value from the model with the smallest amount of error was used to impute a missing score in Stage 3. The general functional form for models in this stage was as follows:

    PISA*_i = β_0 + B X_i + Γ Z_i + δ R_i + ε_i

where the dependent variable PISA*_i is the "best available value of PISA" for country i, the "other test" variable is no longer present, and R_i is a dummy variable designating geographic region. The variables were used to the greatest extent possible, but missing data patterns dictated the use of more parsimonious models in order to obtain predicted values (fewer variables in the vectors X_i and Z_i).

Tables 4.3.1 and 4.3.2 provide a brief overview of the models used, the percentage of their contribution to the ultimate PISA* metric, and the summary statistics for the models used in the imputation. The values obtained in the imputation process are presented in Appendix A, in Table A1. That table includes an indicator of which model was used for a given imputed score, so that the reader may obtain a sense of the error associated with that predicted score by looking up the model statistics in Table 4.3.2. Regression results from the models selected for imputation are presented in Table A3. The SPSS syntax used to run the models and to incorporate predicted values in PISA* is available upon request.
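The recursive logic described above can be sketched as follows. This is an illustrative reconstruction in Python, not the EPDC's actual implementation (which is the SPSS syntax mentioned above); each specification in model_specs stands for one combination of an assessment score and the schooling-system and macro-level indicators in the vectors X and Z:

```python
import statsmodels.api as sm   # df below is assumed to be a pandas DataFrame

def fit_and_predict(df, target, predictors):
    """Fit OLS of `target` on `predictors` using complete cases; return the
    residual standard deviation and predictions for every row that has
    complete predictor data (including rows where `target` is missing)."""
    train = df.dropna(subset=[target] + predictors)
    fit = sm.OLS(train[target], sm.add_constant(train[predictors])).fit()
    usable = df.dropna(subset=predictors)
    preds = fit.predict(sm.add_constant(usable[predictors]))
    return fit.resid.std(), preds

def impute_best(df, target, model_specs):
    """Fill missing `target` values, preferring the specification with the
    smallest residual standard deviation among those covering a country."""
    best = df[target].copy()
    fits = sorted((fit_and_predict(df, target, spec) for spec in model_specs),
                  key=lambda pair: pair[0])      # smallest residual SD first
    for _, preds in fits:
        best = best.fillna(preds)                # fills only still-missing rows
    return best
```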
Table 4.3.1. Contributions to PISA* scores from EPDC imputation models

Model   Test used                       Number of scores obtained   Percent   Cumulative percent
–       Actual PISA score               56                          30.4      30.4
A       TIMSS 2007 8th grade science    14                          7.6       38.0
B       TIMSS 2007 8th grade science    1                           0.5       38.6
C       TIMSS 2007 8th grade science    1                           0.5       39.1
D       TIMSS 2007 8th grade science    4                           2.2       41.3
E       TIMSS 2007 8th grade science    1                           0.5       41.8
F       TIMSS 2007 8th grade science    1                           0.5       42.4
G       TIMSS 2007 4th grade science    2                           1.1       43.5
H       PIRLS 2006                      4                           2.2       45.7
I       SERCE (Latin America)           8                           4.3       50.0
J       SERCE (Latin America)           1                           0.5       50.5
K       SACMEQ (reg adjustment)         8                           4.3       54.9
L       None                            3                           1.6       56.5
M       None                            31                          16.8      73.4
N       None                            1                           0.5       73.9
O       None                            17                          9.2       83.2
P       None                            23                          12.5      95.7
Q       None                            5                           2.7       98.4
R       None                            3                           1.6       100.0
Total                                   184

Table 4.3.2. Summary statistics from PISA* imputation models

Model   N of the model   F statistic   R-squared   Adj. R-squared   Mean squared error   Residual SD   % residuals < 1 PISA SD   % residuals < 0.5 PISA SD
A       11.0             3866.5        1.00        1.00             0.7                  0.3           100.0                     100.0
B       12.0             111.9         1.00        0.99             40.0                 3.8           100.0                     100.0
C       13.0             106.1         0.99        0.98             50.0                 5.0           100.0                     100.0
D       15.0             121.1         0.99        0.99             43.0                 4.3           100.0                     100.0
E       12.0             125.1         1.00        0.99             35.8                 3.6           100.0                     100.0
F       17.0             69.1          0.98        0.97             101.8                7.6           100.0                     100.0
G       10.0             35.5          0.99        0.96             175.1                7.6           100.0                     100.0
H       24.0             7.3           0.76        0.66             658.4                21.4          95.8                      87.5
I       6.0              1.1           0.82        0.10             415.3                9.1           100.0                     100.0
J       6.0              0.3           0.58        -1.12            980.4                14.0          100.0                     100.0
K       56.0             25.6          0.90        0.86             675.4                22.4          96.4                      83.9
L       75.0             24.8          0.84        0.81             836.2                26.3          97.3                      68.0
M       51.0             20.4          0.87        0.82             702.7                23.1          98.0                      70.6
N       45.0             25.9          0.91        0.87             523.6                19.5          100.0                     82.2
O       60.0             21.6          0.82        0.78             788.9                25.3          98.3                      70.0
P       75.0             15.9          0.71        0.67             1394.6               34.7          93.3                      61.3
Q       151.0            42.9          0.73        0.72             1176.3               33.3          89.4                      65.6
R       156.0            50.6          0.73        0.72             1128.3               32.7          89.7                      66.7

Non-causal nature of the models

When examining the variables used in the imputation models in Table A3, one must keep in mind that these models are built for prediction only. No causal relationship is assumed between any of these variables and the outcome of interest. Furthermore, the predictors may be collinear or endogenous to achievement, and the coefficients therefore inconsistent. The sole purpose of including these variables is to account for some of the variation around the learning scores. Caution must be exercised when examining and interpreting these coefficients.

Reduction of bias: OECD and oil-producing countries

In order to minimize the potential bias in predicted values created by the presence of OECD countries with higher PISA scores (a higher intercept in regression models), every effort was made to exclude the 25 high-income OECD countries from the prediction models. This was possible in Stage 1 with the models based on the TIMSS 2007 8th grade and 4th grade science tests, and in all of the Stage 3 models. However, with the models based on PIRLS 2006, the geographic distribution required the inclusion of all cases in order to compute standard errors and predicted estimates within the plausible range. In addition, because wealth is generally associated with higher mean scores, we controlled for potential over-prediction in oil-producing countries by including either the percent of GDP resulting from oil extraction or a simple oil-producer dummy (based on Fearon and Laitin 2003, and imputed using the UN Trade % GDP from fuels data). The negative coefficient on this variable, although generally not statistically significant, would account for the lower scores of countries that are resource-rich but have lower levels of achievement in their public schools than countries at a similar level of economic development.
Error and uncertainty in the predicted scores

While it is, by definition, impossible to evaluate the size of the error for countries with no observed PISA test data, one can obtain a gauge of the error by examining the size of the residuals from the models predicting scores for countries for which data are available. Generally, the smaller the residuals relative to the range of the overall score distribution, the better the model is at predicting the outcome of interest. As noted above, for each country with a missing PISA score, the predicted estimate from the model with the smallest residual standard deviation available for that country was used as the "best value". If a predicted score was not available for a given country from the model with the smallest residual standard deviation (due to a missing value on one of the predictors), the next best estimate for that country was used, from the model with the next lowest residual standard deviation.

The key assumption made here is that a model is as good at predicting scores for unobserved countries as it is for observed ones; this assumption may not always hold, and it cannot be empirically tested until more learning assessment data become available. In other words, if the countries with missing PISA scores are vastly different from those with observed scores (or, in Stage 3, with PISA* scores) in ways that cannot be controlled for by the available indicators (see the variables in Table A2), then the relationships between the variables in the model may produce estimates that are farther from the true (but unobserved) values for these countries.

As Table 4.3.2 demonstrates, the standard deviation of the residuals ranged from 0.3 score points (on the PISA metric; see Table 4.1) in models with greater numbers of predictors, to almost 35 points. The conventional 95% confidence interval, therefore, may range from roughly 0.6 points above and below the estimate to about 70 points above and below, depending on the model. This means that, if the assumptions hold, the true score should be within 70 points above or below the imputed estimate, with a 5% chance that the true score will be beyond those limits. The choice was made in favor of greater reliability for the countries with more data, at the expense of consistency across the entire sample (under which all country scores would contain a more or less equally large amount of error). As the table shows, most if not all of the values predicted for countries that have actual PISA scores were within 1 standard deviation of their true score. The same is true for countries with PISA* scores from Stage 1: most of the Stage 3 predictions are captured within 1 SD of their PISA* score. There is no guarantee that this will hold for the rest of the countries with unobserved learning scores, but it gives us some degree of confidence in the reliability of the models. However, caution must be exercised when comparing countries directly using the imputed scores, as small differences (up to 55 points) may be due to random error.
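In symbols, the interval quoted here is the usual two-sided normal band around an imputed score; evaluating it at the smallest and largest residual standard deviations in Table 4.3.2 gives the two extremes mentioned above:

$$
\widehat{\mathrm{PISA}}_i \;\pm\; 1.96\,\hat\sigma_{\mathrm{resid}},
\qquad 1.96 \times 0.3 \approx 0.6
\quad\text{and}\quad
1.96 \times 34.7 \approx 68\ \text{points}.
$$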
Figure 4.2. PISA* imputed values and alternative predicted scores.

Figure 4.2 plots the predicted values obtained for the countries in the dataset against the final imputed values. As is evident from the graph, there is a considerable degree of fluctuation around the estimates, and the variability differs by country. Generally, predicted estimates are sensitive to model specifications, especially for countries with large missing data rates on many predictors. Information on the fluctuation of predicted scores for each country is provided in Table A2 in the Appendix.

Resulting distribution of the imputed vs. actual values

As could be expected, most of the imputed values are in the lower half of the overall distribution, as the countries with missing scores tend to be those where the education systems are generally considered weaker. Figure 4.3 plots these imputed values against the set of predictors used in the modeling, to give the reader some understanding of the underlying relationships between these indicators and the quality of education.

Figure 4.3. Distribution of imputed (red) vs. actual (blue) PISA scores.

In terms of geographic representation, the imputation process gradually expanded coverage until it reached all 184 countries participating in the ICP 2011 round.

Figure 4.4. Geographic representation of the imputation sequence.

Dealing with the uncertainty of the imputed scores by aggregating countries into groups

Given the error range in the imputed scores, in particular for the countries in the lower half of the scoring range, it may be best not to use the individual imputed scores for each country, but rather to group countries and apply the average score of the group to all countries in it (TAG suggestion). The grouping can be done a priori – using a variable that has been found to correlate well with scores, such as GDP per capita, household consumption, or secondary enrolment rate. Countries would be grouped into 5 or 10 categories depending on the value of that variable. Alternatively, the grouping can be done ex post – using the model for imputed quality described above. The advantage of the a priori grouping is its transparency – one variable, which is known and measured, is used. The disadvantage is that, empirically, measured learning scores do not line up exactly with any one indicator, and unmeasured learning is also unlikely to. The a priori grouping will result in some countries being placed in higher or lower education quality groups than they "should" have been, given the actual measured learning scores or imputed scores based on a more complete model.
This disadvantage would not apply to the ex post grouping, because the countries can be divided exactly according to their measured or imputed score; on the other hand, this division is somewhat less transparent, and there is more opportunity for countries to raise objections to the model that is used and to their placement in the education quality groups. Aside from these conceptual considerations, the grouping method can be informed by an empirical observation of how countries would line up under alternative groupings. The graphs below present a selection of three possible alternatives:

1. A priori grouping by one (or two) selected indicator(s);
2. A priori grouping by one (or two) selected indicator(s), but using actual scores for countries with a PISA test;
3. Ex post grouping based on the models for imputed and actual scores described above.

The graphs show the countries arranged by each of these three groupings, with their country-level imputed scores and the average score for each group overlaid on the country-level scores (Figure 4, Figure 6 and Figure 7). The actual indicators used in the a priori grouping graphs can be exchanged for others – we tried several alternatives – but the general insights remained the same. Depending on the indicator(s) used for the a priori grouping, however, countries land in different groups. This redistribution of countries is shown in Figure 5 with three illustrative indicators.

Figure 4 shows the a priori grouping by GDP per capita, which is the strongest predictor of observed PISA scores where observation is possible. The countries are in five groups, with GDP per capita cut-off points at $1,500, $4,500, $10,000 and $20,000 (these are illustrative and provide an approximate distribution in quintiles). The figure shows, in blue, the imputed or actual scores and, overlaid in red, the average score for each of the income-determined country groups (the actual country names are not included here). The average values for each group are included in the figure and range from 307 to 482. The figure shows that while there is some tendency for countries with higher scores to be in the higher-average groups (reflected in the ascending line-up of average scores), there is also considerable overlap, that is, countries that have similar actual or imputed scores land in different country groups.

The same exercise with different indicators gives the same picture, but with a different set of countries in each group. Figure 5 shows the shifts in groupings by three indicator sets – GDP per capita, secondary GER, and the product of both. The countries are arranged by income group (as in Figure 4) with blue diamonds, but in addition a marker is shown for the group that each country would be in according to secondary GER (red squares) and the product of both (green triangles). The vertical lines show the shifts for particular countries, depending on each of these three criteria. For fewer than half of the countries (79 out of 184) there are no shifts; about half shift by one group (90); and 15 shift by two or more groups depending on the indicator chosen. It seems to us that both the high level of overlap and the shifts in country placement depending on the indicator chosen outweigh the benefits of transparency of the a priori grouping.

Figure 4. Imputed country scores and group scores by GDP/capita in 2008 (or most recent data).
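A minimal sketch of the a priori grouping just described, using the illustrative GDP-per-capita cut-offs from the text and hypothetical country data:

```python
def assign_group(gdp_per_capita, cutoffs=(1500, 4500, 10000, 20000)):
    """Place a country into one of five groups by GDP per capita."""
    return sum(gdp_per_capita >= c for c in cutoffs) + 1   # groups 1..5

def group_scores(countries):
    """Average the (actual or imputed) scores within each GDP group and assign
    that group mean back to every member country."""
    members = {}
    for name, (gdp, score) in countries.items():
        members.setdefault(assign_group(gdp), []).append(score)
    means = {g: sum(s) / len(s) for g, s in members.items()}
    return {name: round(means[assign_group(gdp)], 1)
            for name, (gdp, _) in countries.items()}

# Hypothetical data: country -> (GDP per capita in USD, imputed PISA* score).
countries = {"A": (900, 310), "B": (1100, 330), "C": (3200, 380),
             "D": (12000, 450), "E": (25000, 500)}
print(group_scores(countries))
# -> {'A': 320.0, 'B': 320.0, 'C': 380.0, 'D': 450.0, 'E': 500.0}
```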
Figure 4. Imputed country scores and group scores by GDP/capita in 2008 (or most recent data).
[Chart: original country scores and group scores by log GDP/capita; individual predicted scores with group averages of 307, 361, 394, 418 and 482 overlaid.]

Figure 5. Country quintile groups by three criteria – GDP per capita, secondary GER, and the product of these two – most recent data available in late 2010. Country names are omitted, as the graph serves only to illustrate the different slots that countries would land in depending on the indicator chosen. Actual country values are available in the appendix.
[Chart: country group (1–5) for each country according to GDP/capita, secondary GER, and the two combined.]

An alternative grouping that would recognize the uncertainty of the imputed scores, while using the certainty of the actual PISA scores, would be to use the actual PISA scores where available and group only the countries with imputed scores. Figure 6 shows one possible distribution under this alternative, based on GDP per capita with the same group cut-off points as above (different indicators can be chosen and the cut-off points can vary, with the same general result). This alternative does not reduce the level of overlap: for example, there are still a number of countries in the highest-average group (group score of 428) with the same country score as countries in the lowest group (group score of 307).

Figure 6. Actual PISA scores, and, for non-PISA countries, imputed country scores and group scores by GDP/capita in 2008 (or most recent data).
[Chart: country scores and group scores by log GDP/capita, using actual PISA scores where available; group averages of 307, 361, 392, 392 and 428.]

The third grouping would be based on the model that is used to impute scores, described above. By definition, there is no overlap between groups, because the model for selecting the countries is now identical to that which is used to impute the scores. The distribution of average group scores is not much different from the a priori grouping by income (a range of 307 to 509 instead of 307 to 483), but the categorization of the countries is now unambiguous and there is no overlap issue. The EPDC suggests that countries be grouped by the derived model for quality, rather than by a single indicator. A possible alternative would be to use actual PISA scores for the quality index where these are available (or imputed PISA scores based on TIMSS and PIRLS) and group only the remaining countries, for which uncertainty is higher.

Figure 7. Actual and imputed country scores and groups by scores.
[Chart: country scores grouped by score; group averages of 304, 364, 400, 437 and 509.]

Transformation of group (and observed) scores into a measure of education output

Even if the data and the methodology underlying the above quality measures are accepted, the question remains of how to use the imputed PISA scores. This issue was raised and discussed by the ICP Global Office and TAG members, and what is provided below is largely a summary of that discussion (most of the text below is taken from written contributions from Yuri Dikhanov and Kim Zieschang, both of whom we thank, noting that this is not the EPDC's area of expertise).
The main question here is how to interpret differences in PISA scores. What does it mean if one country's score is 110 and the score for another country is 100? (These would be normalized scores; PISA publishes scores centered at 500 with a standard deviation of 100, but many users, including the OECD, normalize them to 100.) The simplest proposal is that the first country's educational output would be raised by 10 percent when compared to the second one. What about the difference between scores of 70 and 80, then? Would it be the same? Does this mean, for example, that if country A has twice as many students per capita as country B, and the scores are 60 and 120 in countries A and B respectively, then country A would have the same educational output per capita as country B? If the transformation is linear, this would be the case – at first glance, a strange result. But if not a linear form, then what analytical form should be used to transform the scores into education output? And why? Perhaps we could learn about the analytical form by analyzing the education system within one country; it should be able to explain the willingness to pay for perceived differences in the quality of education.

One proposal is to derive the education output from school-level data. Say we have the education sub-aggregate of government expenditure whose relative between two countries we are trying to deflate with a purchasing power parity (PPP) to get the relative volume of education services between the two countries (say, A and B):

$$\frac{E^A}{E^B}.$$

We characterize education expenditures as the product

$$E^A = (\text{cost per pupil hour in } A) \times (\text{pupil hours in } A) = P^A(\text{quality factors}^A) \cdot Q^A,$$

where the cost per pupil hour ($P$) is a function of factors affecting the quality of a pupil hour, and $Q$ is total pupil hours. At the top of these quality indicators is performance on a standardized test such as PISA, or more and more distant proxies for that score in lieu of available PISA information – the most distant proxy being the percentage of pupils having a parent with post-secondary education, which explains about 63 percent of the variation in the PISA score for countries where PISA has been administered.

When we form the PPP, or price relative, for pupil hours between countries A and B, we will have to account for the fact that the quality factors differ, and adjust to allow us to compare like with like. Price-index-wise, standard practice is to take the following geometric average, simplifying at the end to acknowledge that we have one, results-oriented quality factor, the PISA score:

$$\mathrm{PPP}^{AB} = \left[\frac{P^A(\text{quality factors}^A)}{P^B(\text{quality factors}^A)} \cdot \frac{P^A(\text{quality factors}^B)}{P^B(\text{quality factors}^B)}\right]^{1/2} = \left[\frac{P^A(\mathrm{PISA}^A)}{P^B(\mathrm{PISA}^A)} \cdot \frac{P^A(\mathrm{PISA}^B)}{P^B(\mathrm{PISA}^B)}\right]^{1/2}.$$

Let us suppose, to keep it initially simple, that we have a PISA dataset over schools for all countries, and we run, for each country, a regression of school cost per pupil hour on the school PISA score. The parameters of that regression give us, for each country, say A, the function $P^A(\mathrm{PISA\ score}_{Ai})$, where $i$ runs over schools in country A. To compare country A with country B, we will have to find representative values of $\mathrm{PISA\ score}_A$ and $\mathrm{PISA\ score}_B$ to plug into the above geometric mean (Fisher-type) price index formula. Ideally, the representative value should be such that, when it is plugged into the price function of the same country as that of the representative PISA score, it yields the national cost per pupil hour of that country. If there were no other complications, this would be our education PPP. Note that the function $P^A(\mathrm{PISA\ score}_{Ai})$ need not be linear, and probably should not be, in order to fit the actual school cost per pupil hour data vis-à-vis the PISA score. Note also that when we deflate the expenditure ratio by this quality-adjusted PPP, we will not get the ratio of the Qs between the two countries, but a quality-adjusted version of that.
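As a purely numerical illustration of the Fisher-type index just described, the sketch below evaluates the geometric mean using two hypothetical fitted price functions (cost per pupil hour as a function of the PISA score). The functional forms, coefficients, representative scores and expenditure figures are all assumptions made for the example; they are not estimates from any school-level dataset.

```python
# Illustrative sketch of the Fisher-type, quality-adjusted education PPP described
# above. The price functions P_A and P_B (cost per pupil hour as a function of the
# PISA score) and the representative scores are hypothetical assumptions.
from math import sqrt

def P_A(pisa):
    """Hypothetical fitted cost per pupil hour in country A (local currency)."""
    return 0.5 + 0.004 * pisa + 0.00002 * pisa ** 2

def P_B(pisa):
    """Hypothetical fitted cost per pupil hour in country B (local currency)."""
    return 20.0 + 0.15 * pisa + 0.0008 * pisa ** 2

# Representative PISA scores for each country (assumed values).
pisa_A, pisa_B = 380.0, 470.0

# Geometric mean of the price relatives, evaluated at A's and B's quality levels.
ppp_AB = sqrt((P_A(pisa_A) / P_B(pisa_A)) * (P_A(pisa_B) / P_B(pisa_B)))

# Deflating the expenditure ratio by this PPP yields a quality-adjusted
# volume relative (not the raw ratio of pupil hours Q_A / Q_B).
E_A, E_B = 5.0e9, 9.0e10  # assumed education expenditures in local currencies
volume_relative = (E_A / E_B) / ppp_AB
print(ppp_AB, volume_relative)
```

Because the price relatives are evaluated at both countries' quality levels and then averaged geometrically, neither country's quality mix is privileged, which is the point of the Fisher-type construction.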
Of course, there is a complication, which is that we do not have PISA scores for all countries, so we have to predict the PISA score based on school data where we have both the PISA score and one of several proxies, and/or schools where we have overlaps between proxies, allowing us to link the predicted PISAs, or actual PISAs where available, together (end of communications from Yuri Dikhanov and Kim Zieschang).

While this approach may lead to a better approximation of the relation of education output to learning scores, empirically it is not possible to pursue at this time, because school-level data on expenditures are not available, and school-level PISA scores (plus TIMSS and PIRLS scores, which are reasonable proxies) are available only for the countries that took these tests. With only country-level data to derive the output, the only practical approach at this time appears to be to assume a linear relationship between education output and the learning score indices.

5. Adjustments to quality – non-school contributions to learning, repetition, and dropout

There are additional adjustments necessary to the learning scores. One is for non-school contributions to learning. Children from backgrounds with more educated, wealthier parents have higher learning scores, because their backgrounds have contributed to their learning. At the country level, the average contribution of households should be taken out of the calculation of education services. Second, the children who take the PISA test have successfully remained in school through primary and lower secondary; these may be the more successful learners. If that is so, then the PISA test scores, or their imputed values, are biased upwards, and the scores need to be corrected for that bias. This section discusses those two corrections.

Adjustments for the stock of human capital

While assessments of learning outcomes are often taken as evidence of the contribution of the education system, a substantial portion of student achievement is the result of the enabling educational environment that students enjoy at home. It has been shown repeatedly in many studies that students with more highly educated parents attain better results in learning, all other factors held constant. In this exercise, the EPDC attempted to take out the variation in quality that is associated with the level of educational development in the adult population of the ICP countries. The assumption is that the more educated, on average, the adults in a given country are, the better prepared students are to undertake their studies, and therefore the higher their average scores. Consequently, teachers in low-income countries not only have to deal with the regular challenges of educating their youth; they also may not have the same support from parents as that enjoyed in wealthier Western states.
The EPDC's methodology is simple: we regress the actual and imputed PISA* scores on the country's mean years of schooling, controlling for country wealth (GDP per capita and household consumption per capita), and use this coefficient to adjust the scores downward. Each country's score is reduced, and the reduction is greater in countries with a higher average level of schooling among adults. Table A1 in the Appendix shows the unadjusted and the adjusted PISA* scores. The equations are as follows:

1. $\mathrm{PISA}^*_c = \alpha + \beta \cdot \mathrm{MYS}_c + (\text{controls for GDP per capita and household consumption per capita}) + \varepsilon_c$
2. $\mathrm{PISA}^{*\,\mathrm{adj}}_c = \mathrm{PISA}^*_c - \hat{\beta} \cdot \mathrm{MYS}_c$

where $\mathrm{MYS}_c$ is the mean years of schooling of the adult population in country $c$. The coefficient $\hat{\beta}$ was equal to roughly 6.2; therefore, roughly 6.2 PISA* points were subtracted from each country's score for every year of mean schooling among its adults. Figure 5.1 shows how the distribution of the scores changed as a result of this process. The two upper lines are the curves of the actual unadjusted and adjusted scores, and the two lower lines are the curves of the imputed unadjusted and mean-schooling-adjusted scores.

Figure 5.1. Adjustment by mean schooling, actual and imputed PISA* scores.
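The adjustment itself amounts to a single subtraction per country. The sketch below applies it to two hypothetical countries; the coefficient of 6.2 follows the text above, while the country values are placeholders rather than figures from Table A1.

```python
# Sketch of the non-school (mean-schooling) adjustment: subtract beta points from a
# country's PISA* score for every year of mean schooling in its adult population.
# Country values below are hypothetical; beta = 6.2 follows the text above.
BETA = 6.2  # PISA* points per year of mean adult schooling (regression coefficient)

countries = {
    # name: (unadjusted PISA* score, mean years of schooling among adults 25+)
    "CountryA": (430.0, 10.0),
    "CountryB": (320.0, 1.5),
}

adjusted = {name: score - BETA * mys for name, (score, mys) in countries.items()}
print(adjusted)  # CountryA loses 62 points, CountryB about 9
```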
Adjustments for possible bias from the lower achievement of early school dropouts

In many of the lower-income countries, a significant portion of students drop out of school early. An education system that consistently fails a part of its population would be considered of lower quality, or lower value, even if reasonably good results are achieved with the remaining students. One reason is that, due to the nonrandom nature of dropout, the students who remain in the classrooms are likely to differ from those who dropped out, and the bias created by this distortion will increase with the rate of dropout. It is not possible, from any available data that we know of, to measure the level of learning of students who drop out compared to students who continue. To get this, one would need a longitudinal study in which all students initially take a reading or math test in grade X and their school survival is subsequently tracked; this is not available. Available data do provide levels of literacy by highest level of education attained, but it is not possible to know whether the reading level of someone whose highest grade is, say, grade 3 differs from the grade 3 reading level of an individual who continued schooling.

That said, it is possible to make some dropout bias adjustments even assuming that the learning of dropouts was equal to that of those who continued. Basically, the adjustment applies a 0–1 value to education depending on the acquisition of literacy. People who do not attain literacy before they drop out of school are given a value of zero, and those who do acquire literacy have a value of 1. The adjustment would be the proportion of students in any given year who are going to drop out of school without learning to read.

The DHS and MICS surveys give a very brief, one-sentence reading test to the women interviewed. The women are categorized into three groups: "cannot read a sentence"; "can read with difficulty"; and "can read a sentence with ease". The surveys also provide the highest education level, by grade, of the interviewees, and with these data it is possible to construct a curve that approximates the attainment of literacy by grade for each country. The curve is likely to be a (slight?) under-estimation because, as mentioned, persons with lower grades as their highest level of educational attainment are likely to have been the poorer learners. Of interest to the adjustment, though, are the illiteracy rates of the dropouts. These are used to calculate the proportion of children in school who will drop out without learning to read, and from that, the proportion of education services that will receive a value of zero. In words, the proportion of zero-value education services is set equal to the sum, over grades (up to grade 8), of the percentage of students who drop out in each grade, multiplied by the percent who are still illiterate when they drop out and by the number of years they have been in school (proxied by grade, disregarding repetition), or:

$$Z = \sum_{g=1}^{8} d_g \cdot i_g \cdot g,$$

where $Z$ is the proportion of education services that will have a value of zero; $d_g$ is the percent of students who drop out in grade $g$; $i_g$ is the percent of dropouts who are still illiterate; and $g$ is the grade number – or, the years the students dropping out have been in school prior to leaving.

The DHS/MICS surveys provide information on the literacy rates of those who drop out of school, by primary school grade. Figure 8 shows the proportions of young women aged 15-19 who can read a sentence, by the highest grade completed, for sub-Saharan African and Asian countries. The graphs for Europe, Arab countries, and Latin America are not shown because very few countries are available. There is a fairly clear regional difference, with literacy rates in African countries increasing significantly more slowly than in the Asian countries. The average rates for the observed countries are shown in a thicker, black curve. At the same time, the dropout rates, particularly in sub-Saharan Africa, are high, suggesting that the low literacy rates of those who drop out early in their school career affect a sizeable portion of the school population. Figure 9 shows the survival rates – the complement of the dropout rates – to grade 6. These are the cumulative values of all dropouts up to grade 6, to illustrate the extent of dropout in some countries in these regions. For the calculation of the zero-value education services, grade-specific dropout rates are used. Particularly in sub-Saharan Africa, remaining in school until grade 6 is not, unfortunately, the norm; it is somewhat more so in Asia.

Figure 8. Proportion of people who can read a sentence (with difficulty or with ease) by the highest grade attained, shown separately by region. Average values for each region are shown in the black curve.
[Two panels: percent who can read (0–100) by highest grade completed (1–9), for individual sub-Saharan African and Asian countries, with the regional average shown as a black curve.]

Figure 9. Survival rates to grade 6, sub-Saharan Africa and Asia.
[Two panels: survival rates to grade 6 (0–100 percent) by country, for sub-Saharan Africa and for Asia.]

The data are supplemented by regional averages where missing.
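A minimal sketch of the zero-value calculation just described, applying the formula above; the grade-specific dropout and illiteracy rates are invented for illustration and are not taken from any DHS or MICS survey.

```python
# Sketch of the zero-value adjustment: the share of education services delivered to
# pupils who will drop out without learning to read. Rates below are hypothetical,
# not values from DHS/MICS; grades run from 1 to 8 as in the text.
dropout_rate = {1: 0.08, 2: 0.06, 3: 0.05, 4: 0.04, 5: 0.04, 6: 0.03, 7: 0.03, 8: 0.02}
illiterate_share_of_dropouts = {1: 0.95, 2: 0.85, 3: 0.60, 4: 0.40,
                                5: 0.25, 6: 0.15, 7: 0.10, 8: 0.05}

# Z = sum over grades of (dropout rate) x (share of dropouts still illiterate) x (grade),
# the grade standing in for the years of schooling received before leaving.
zero_value_share = sum(dropout_rate[g] * illiterate_share_of_dropouts[g] * g
                       for g in range(1, 9))
print(f"Proportion of zero-value education services: {zero_value_share:.1%}")
```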
The resulting estimates of zero-value education services for all countries where more than 5% of education services are lost are shown in Figure 10; actual country values are shown in the appendix. This adjustment affects a few countries more strongly. So long as it is relatively easy to include in the quality adjustment, it can be included; however, for the overall PPP adjustment to each country, even for education services alone, the effect is small, so it is not an adjustment that should receive many ICP resources.

Figure 10. Percent of pupils in school who will drop out without learning to read, or, a proxy of the proportion of education services that have zero value.
[Bar chart: percent of pupils in school who will drop out without learning to read (0–45 percent), by country.]

6. REFERENCES – TO BE PROVIDED

Abadzi (2010), "Instructional Time Loss in Developing Countries: Concepts, Measurement, and Implications".

7. APPENDIX – OVERVIEW OF INTERNATIONAL AND REGIONAL ASSESSMENT SERIES

• Progress in International Reading Literacy Study – PIRLS
PIRLS is administered by the International TIMSS and PIRLS Center under the auspices of the International Association for the Evaluation of Educational Achievement (IEA). This study tested 4th grade students for reading and literacy skills in 2001 and 2006. In 2006, 38 countries participated in PIRLS, and the next round is scheduled for 2011. A comprehensive survey of families was administered in both years as part of the study, providing a wealth of information not only on student background but also on the household practices contributing to reading and literacy.

• Program for International Student Assessment – PISA
PISA tests the science, math and reading skills of 15-year-olds and has been administered four times since 2000, in three-year cycles (2000, 2003, 2006, 2009). Organized and managed by the OECD, PISA is designed to capture and compare the preparedness of older students for entry into the labor force across countries. PISA has a diversity of academic content often not found in other international assessments and, similarly to other international studies, includes surveys of student and family demographic data, which allows for the control of non-school factors during analysis.

• Trends in International Mathematics and Science Study – TIMSS
TIMSS is another assessment administered by the International TIMSS and PIRLS Center under the auspices of the IEA. Target groups are 4th and 8th grade students, and target areas are mathematics and the natural sciences.

• Regional Assessments
The major regional assessments of learning outcomes are SACMEQ (Southern Africa), LLECE (Latin America) and PASEC (French-speaking Africa).

Table A1. Actual and imputed PISA scores, scores adjusted for mean schooling, and the variability across the predicted values, by country.
Income group | Country | Actual or Imputed | PISA* unadjusted score (Oct 14) | Adjusted for mean schooling | Model | SD of predicted values
Lower middle income | Albania | Imputed | 427.68 | 365.8 | P | 5.3
Upper middle income | Algeria | Imputed | 379.88 | 328.48 | A | 29.17
Lower middle income | Angola | Imputed | 334.37 | 292.65 | P | 6.77
High income: nonOECD | Antigua and Barbuda | Imputed | 374.22 | 305.09 | P | 6.27
Upper middle income | Argentina | Actual | 391.24 | 338.09 | | 8.83
Lower middle income | Armenia | Imputed | 427.11 | 364.61 | D | 9.47
High income: OECD | Australia | Actual | 526.88 | 456.44 | |
High income: OECD | Austria | Actual | 510.84 | 440.61 | | 5.5
Lower middle income | Azerbaijan | Actual | 382.33 | 320.46 | | 6.08
High income: nonOECD | Bahamas | Imputed | 451.81 | 382.63 | P | 11.98
High income: nonOECD | Bahrain | Imputed | 439.6 | 382.42 | A | 11.39
Low income | Bangladesh | Imputed | 435.7 | 405.01 | M | 37.34
High income: nonOECD | Barbados | Imputed | 421.92 | 352.79 | Q | 0.01
Upper middle income | Belarus | Imputed | 446.55 | 386.78 | P | 4.37
High income: OECD | Belgium | Actual | 510.36 | 443.45 | | 7.09
Lower middle income | Belize | Imputed | 374.59 | 327.39 | M | 6.86
Low income | Benin | Imputed | 296.73 | 276.97 | M | 13.02
Lower middle income | Bhutan | Imputed | 399.41 | 361.65 | O | 17.42
Lower middle income | Bolivia | Imputed | 391.44 | 331.44 | M | 12
Upper middle income | Bosnia and Herzegovina | Imputed | 421.12 | 361.35 | F | 11.51
Upper middle income | Botswana | Imputed | 393.57 | 348.62 | A | 26.77
Upper middle income | Brazil | Actual | 390.33 | 352.22 | | 16.55
High income: nonOECD | Brunei Darussalam | Imputed | 408.45 | 357.05 | P | 2.29
Upper middle income | Bulgaria | Actual | 434.08 | 368.81 | | 21.18
Low income | Burkina Faso | Imputed | 318.09 | 307.07 | M | 13.09
Low income | Burundi | Imputed | 292.6 | 267.37 | O | 10.8
Low income | Cambodia | Imputed | 413.67 | 386.91 | M | 14.12
Lower middle income | Cameroon | Imputed | 349.55 | 312.34 | M | 14.71
High income: OECD | Canada | Actual | 534.47 | 453.72 | |
Lower middle income | Cape Verde | Imputed | 319.22 | 277.49 | O | 14.14
Low income | Central African Republic | Imputed | 384.64 | 359.38 | M | 30.65
Low income | Chad | Imputed | 330.01 | 314.88 | M | 25.42
Upper middle income | Chile | Actual | 438.18 | 372.9 | | 7.91
Lower middle income | China | Imputed | 422.88 | 377.74 | P | 10.19
Upper middle income | Chinese Taipei (Taiwan) | Actual | 532.47 | 466.13 | | 18.96
Upper middle income | Colombia | Actual | 388.04 | 340.6 | | 6.22
Low income | Comoros | Imputed | 318.88 | 291.58 | M | 12.41
Low income | Congo, Dem. Rep. | Imputed | 268.44 | 243.2 | P | 10.05
Lower middle income | Congo, Rep. | Imputed | 294.2 | 252.48 | O | 8.25
Upper middle income | Costa Rica | Imputed | 407.7 | 359.47 | I | 12.39
Lower middle income | Côte d'Ivoire | Imputed | 350.6 | 328.69 | M | 8.6
High income: nonOECD | Croatia | Actual | 493.2 | 443.17 | | 9.66
Upper middle income | Cuba | Imputed | 447.89 | 389.26 | I | 47.12
High income: nonOECD | Cyprus | Imputed | 461.34 | 402.5 | A | 10.5
High income: OECD | Czech Republic | Actual | 512.86 | 452.66 | | 6.8
High income: OECD | Denmark | Actual | 495.89 | 421.46 | | 5.92
Lower middle income | Djibouti | Imputed | 408.05 | 370.67 | P | 36.04
Upper middle income | Dominica | Imputed | 397.32 | 349.33 | O | 10.56
Upper middle income | Dominican Republic | Imputed | 393.07 | 341.85 | I | 16.88
Lower middle income | Ecuador | Imputed | 406.14 | 356.76 | J | 11.29
Lower middle income | Egypt | Imputed | 357.59 | 318.05 | D | 16.65
Lower middle income | El Salvador | Imputed | 352.53 | 308.56 | A | 20.95
High income: nonOECD | Equatorial Guinea | Imputed | 320.07 | | R |
Low income | Eritrea | Imputed | 324.38 | 297.95 | M | 15.47
High income: nonOECD | Estonia | Actual | 531.39 | 462.94 | | 9.04
Low income | Ethiopia | Imputed | 315.23 | 305.78 | M | 21.07
Upper middle income | Fiji | Imputed | 456.45 | 377.39 | M | 20.91
High income: OECD | Finland | Actual | 563.32 | 483.54 | | 15.56
High income: OECD | France | Actual | 495.22 | 439.72 | | 5.72
Upper middle income | Gabon | Imputed | 359.16 | 312.26 | P | 2.84
Low income | Gambia, The | Imputed | 282.54 | 257.3 | O | 18.02
Lower middle income | Georgia | Imputed | 395.71 | 333.84 | A | 31.28
High income: OECD | Germany | Actual | 515.65 | 436.55 | | 3.02
Low income | Ghana | Imputed | 293.56 | 249.68 | A | 44.38
High income: OECD | Greece | Actual | 473.38 | 417.83 | | 8.09
Upper middle income | Grenada | Imputed | 314.25 | 302.71 | M | 29.17
Lower middle income | Guatemala | Imputed | 373.91 | 343.03 | I | 7.56
Low income | Guinea | Imputed | 271.1 | 256.27 | M | 32.73
Low income | Guinea-Bissau | Imputed | 297.47 | 272.24 | P | 4.46
Lower middle income | Guyana | Imputed | 368.3 | 315.86 | M | 29.07
Low income | Haiti | Imputed | 346.6 | 318.02 | Q | 0.01
Lower middle income | Honduras | Imputed | 351.63 | 320.62 | M | 19.32
High income: nonOECD | Hong Kong, China | Actual | 542.21 | 472.88 | | 13.13
Upper middle income | Hungary | Actual | 503.93 | 439.31 | | 15.02
High income: OECD | Iceland | Actual | 490.79 | 426.19 | | 10.45
Lower middle income | India | Imputed | 383.91 | 354.81 | M | 32.04
Lower middle income | Indonesia | Actual | 393.48 | 350.24 | | 20.01
Lower middle income | Iran | Imputed | 446.18 | 405.34 | A | 21.77
Lower middle income | Iraq | Imputed | 365.59 | 328.2 | P | 3.78
High income: OECD | Ireland | Actual | 508.33 | 443.34 | | 5.45
High income: nonOECD | Israel | Actual | 453.9 | 402.51 | | 11.31
High income: OECD | Italy | Actual | 475.4 | 420.04 | | 13.4
Upper middle income | Jamaica | Imputed | 401.79 | 353.8 | O | 10.9
High income: OECD | Japan | Actual | 531.39 | 462.37 | |
Lower middle income | Jordan | Actual | 421.97 | 373.78 | | 12.31
Upper middle income | Kazakhstan | Imputed | 412.2 | 346.22 | G | 9.61
Low income | Kenya | Imputed | 311.11 | 267.82 | K | 13.34
Lower middle income | Kiribati | Imputed | 377.98 | 328.32 | P | 24.83
High income: OECD | Korea, Rep. | Actual | 522.15 | 454.66 | | 1.49
High income: nonOECD | Kuwait | Imputed | 465.04 | 413.64 | A | 36.06
Low income | Kyrgyzstan | Actual | 322.03 | 259.2 | | 0.02
Low income | Laos | Imputed | 365.96 | 336.88 | O | 12.15
Upper middle income | Latvia | Actual | 489.54 | 419.06 | | 12.36
Upper middle income | Lebanon | Imputed | 405.49 | 354.09 | D | 12.98
Lower middle income | Lesotho | Imputed | 339.7 | 297.97 | L | 5.58
Low income | Liberia | Imputed | 256.44 | 231.21 | O | 37.75
Upper middle income | Libya | Imputed | 378.62 | 327.22 | P | 8.58
Upper middle income | Lithuania | Actual | 487.96 | 422.91 | | 9.73
High income: OECD | Luxembourg | Actual | 486.32 | 425.93 | | 10.19
High income: nonOECD | Macao, China | Actual | 510.84 | 459.26 | | 9.31
Upper middle income | Macedonia | Imputed | 409.52 | 363 | H | 17.38
Low income | Madagascar | Imputed | 329.03 | 295.24 | M | 12.14
Low income | Malawi | Imputed | 314.73 | 287.38 | K | 17.16
Upper middle income | Malaysia | Imputed | 428.7 | 375.04 | E | 10.74
Lower middle income | Maldives | Imputed | 272.97 | 235.58 | M | 42.83
Low income | Mali | Imputed | 309.58 | 299.35 | M | 6.06
High income: nonOECD | Malta | Imputed | 446.23 | 387.98 | A | 15.69
Low income | Mauritania | Imputed | 328.2 | 308.63 | M | 4.73
Upper middle income | Mauritius | Imputed | 363.76 | 319.92 | K | 11.59
Upper middle income | Mexico | Actual | 409.65 | 361.97 | | 7.04
Lower middle income | Micronesia | Imputed | 373.74 | 324.07 | P | 14.18
Lower middle income | Moldova | Imputed | 437.72 | 375.84 | H | 7.12
Lower middle income | Mongolia | Imputed | 427.23 | 366.79 | M | 23.72
Upper middle income | Montenegro | Actual | 411.79 | 352.02 | | 0
Lower middle income | Morocco | Imputed | 437.08 | 412.37 | A | 43.45
Low income | Mozambique | Imputed | 337.58 | 321.21 | K | 14.29
Low income | Myanmar | N/A | | | |
Upper middle income | Namibia | Imputed | 312.81 | 270.67 | K | 15.33
Low income | Nepal | Imputed | 367.44 | 349.22 | M | 21.24
High income: OECD | Netherlands | Actual | 524.86 | 457.12 | | 6.07
High income: OECD | New Zealand | Actual | 530.38 | 450.09 | |
Lower middle income | Nicaragua | Imputed | 419.34 | 369.51 | I | 21.87
Low income | Niger | Imputed | 326.72 | 318.3 | M | 21.47
Lower middle income | Nigeria | Imputed | 312.07 | 278.59 | P | 4.32
High income: OECD | Norway | Actual | 486.53 | 407.42 | | 11.79
High income: nonOECD | Oman | Imputed | 432.72 | 381.33 | A | 25.31
Lower middle income | Pakistan | Imputed | 413.76 | 389.38 | M | 17.93
Lower middle income | Palestinian Autonomous Territories | N/A | | | |
Upper middle income | Panama | Imputed | 391.14 | 337.31 | I | 7.34
Lower middle income | Papua New Guinea | Imputed | 378.56 | 328.9 | Q | 0.01
Lower middle income | Paraguay | Imputed | 395.56 | 347.96 | I | 9.71
Upper middle income | Peru | Imputed | 407.29 | 356.53 | I | 21.26
Lower middle income | Philippines | Imputed | 438.61 | 383.31 | M | 19.12
Upper middle income | Poland | Actual | 497.81 | 436.55 | | 10.28
High income: OECD | Portugal | Actual | 474.31 | 430.07 | |
High income: nonOECD | Qatar | Actual | 349.31 | 297.92 | | 43.64
Upper middle income | Romania | Actual | 418.39 | 360.78 | | 15.68
Upper middle income | Russian Federation | Actual | 479.47 | 420.91 | | 12.59
Low income | Rwanda | Imputed | 341.05 | 315.22 | M | 15.31
Lower middle income | Samoa | Imputed | 407.63 | 357.97 | O | 10.46
Lower middle income | Sao Tome and Principe | Imputed | 325.67 | 283.94 | P | 2.41
High income: nonOECD | Saudi Arabia | Imputed | 385.71 | 340.12 | B | 9.36
Low income | Senegal | Imputed | 325.7 | 300.47 | O | 2.86
Upper middle income | Serbia | Actual | 435.64 | 375.87 | | 9.11
Upper middle income | Seychelles | Imputed | 359.56 | 314.61 | L | 18.99
Low income | Sierra Leone | Imputed | 299.01 | 273.77 | P | 5.2
High income: nonOECD | Singapore | Imputed | 561.14 | 508.64 | C | 24.5
Upper middle income | Slovakia | Actual | 488.43 | 428.67 | | 14.51
Upper middle income | Slovenia | Actual | 518.82 | 451.28 | | 23.68
Lower middle income | Solomon Islands | Imputed | 367.18 | 317.52 | P | 0.62
Upper middle income | South Africa | Imputed | 298.6 | 251.7 | H | 27.14
High income: OECD | Spain | Actual | 488.42 | 441.45 | | 7.92
Lower middle income | Sri Lanka | Imputed | 397.96 | 337.9 | M | 3.76
Upper middle income | St. Kitts and Nevis | Imputed | 409.04 | 361.05 | O | 14.06
Upper middle income | St. Lucia | Imputed | 407.51 | 359.53 | O | 11.71
Upper middle income | St. Vincent and the Grenadines | Imputed | 418.68 | 370.69 | O | 23.26
Lower middle income | Sudan | Imputed | 319.06 | 277.33 | P | 0.43
Upper middle income | Suriname | Imputed | 389.29 | 341.31 | P | 3.3
Lower middle income | Swaziland | Imputed | 363.15 | 288.96 | K | 16.69
High income: OECD | Sweden | Actual | 503.33 | 435.63 | | 4.9
High income: OECD | Switzerland | Actual | 511.52 | 437.65 | |
Lower middle income | Syria | Imputed | 373.09 | 339.43 | A | 8.48
Low income | Tajikistan | Imputed | 367.54 | 304.92 | P | 1.86
Low income | Tanzania | Imputed | 319.15 | 292.83 | L | 6.37
Lower middle income | Thailand | Actual | 421.01 | 376.73 | | 8.87
Low income | Timor-Leste | Imputed | 362.71 | 333.63 | P | 31.83
Low income | Togo | Imputed | 386.04 | 360.29 | Q | 0.01
Lower middle income | Tonga | Imputed | 382.08 | 332.42 | O | 15.77
High income: nonOECD | Trinidad and Tobago | Imputed | 382.49 | 313.36 | H | 12.06
Lower middle income | Tunisia | Actual | 385.51 | 348.12 | | 18.23
Upper middle income | Turkey | Actual | 423.83 | 389.33 | | 8.81
Lower middle income | Turkmenistan | Imputed | 393.58 | 329.36 | Q | 0.01
Low income | Uganda | Imputed | 330.97 | 295.19 | K | 18.73
Lower middle income | Ukraine | Imputed | 424.74 | 365.86 | D | 13.2
High income: nonOECD | United Arab Emirates | Imputed | 445.4 | 394.01 | A | 28.99
High income: OECD | United Kingdom | Actual | 514.77 | 453.73 | |
High income: OECD | United States | Actual | 488.91 | 416.42 | |
Upper middle income | Uruguay | Actual | 428.13 | 378.16 | | 13.27
Lower middle income | Vanuatu | Imputed | 467 | 417.33 | O | 22.82
Upper middle income | Venezuela | Imputed | 377.16 | 329.17 | O | 18.08
Low income | Vietnam | Imputed | 406.7 | 372.06 | M | 22.71
Low income | Yemen | Imputed | 263.06 | 225.67 | G | 72.75
Low income | Zambia | Imputed | 282.98 | 241.94 | K | 14.25
Low income | Zimbabwe | Imputed | 263.56 | 215.07 | N | 51.64

Figure A.1. Imputed (red) vs. actual (blue) PISA scores by log household consumption.
[Scatter plot: PISA* (200–600) against log household consumption per capita, with country abbreviations as markers; imputed scores in red, actual scores in blue.]

Figure A.2. Imputed (red) vs. actual (blue) PISA scores by mean years of schooling in adults 25+.
[Scatter plot: PISA* (200–600) against mean years of schooling in the population aged 25+ (EdStats, 2005), with country abbreviations as markers; imputed scores in red, actual scores in blue.]