77689

HNP DISCUSSION PAPER

Monitoring Monitoring: Assessing Results Measurement at the World Bank

A Case Study from the Latin America and Caribbean Region Human Development Department

Daniel Cotlear and Dorothy Kronick

September 2010 (original report completed June 2009)

The Fate of Project Development Objective (PDO) Indicators: Of every 10 PDO indicators listed in PADs, seven appear in at least one ISR, five have at least one follow-up measurement in an ISR, and just one is measured at least once per year.

To the non-Bank reader: please excuse the many acronyms. "PDO" stands for "Project Development Objective," "PAD" for "Project Appraisal Document," and "ISR" for "Implementation Status Report." The idea should be clear.

Acronyms

DMT - Departmental Management Team
HNP - Health, Nutrition, Population
IBRD - International Bank for Reconstruction and Development
ICR - Implementation Completion Report
IDA - International Development Association
IEG - Independent Evaluation Group
ISR - Implementation Status Report
LCSHD - Latin America and Caribbean Region Human Development Department
M&E - Monitoring and Evaluation
OPCS - Operations Policy & Country Services
PAD - Project Appraisal Document
PDO - Project Development Objective
PSR - Project Status Report
RBD - Results-Based Disbursement
RSG - Results Steering Group
SM - Sector Manager
TTL - Task Team Leader

Health, Nutrition and Population (HNP) Discussion Paper

This series is produced by the Health, Nutrition, and Population Family (HNP) of the World Bank's Human Development Network. The papers in this series aim to provide a vehicle for publishing preliminary and unpolished results on HNP topics to encourage discussion and debate. The findings, interpretations, and conclusions expressed in this paper are entirely those of the author(s) and should not be attributed in any manner to the World Bank, to its affiliated organizations or to members of its Board of Executive Directors or the countries they represent. Citation and the use of material presented in this series should take into account this provisional character.

For free copies of papers in this series please contact the individual author(s) whose name appears on the paper. Enquiries about the series and submissions should be made directly to the Editor, Homira Nassery (HNassery@worldbank.org). Submissions should have been previously reviewed and cleared by the sponsoring department, which will bear the cost of publication. No additional reviews will be undertaken after submission. The sponsoring department and author(s) bear full responsibility for the quality of the technical contents and presentation of material in the series. Since the material will be published as presented, authors should submit an electronic copy in a predefined format (available at www.worldbank.org/hnppublications on the Guide for Authors page). Drafts that do not meet minimum presentational standards may be returned to authors for more work before being accepted.

For information regarding this and other World Bank publications, please contact the HNP Advisory Services at healthpop@worldbank.org (email), 202-473-2256 (telephone), or 202-522-3234 (fax).
© 2010 The International Bank for Reconstruction and Development / The World Bank
1818 H Street, NW
Washington, DC 20433
All rights reserved.

Health, Nutrition and Population (HNP) Discussion Paper

Monitoring Monitoring: Assessing Results Measurement at the World Bank

Daniel Cotlear (a) and Dorothy Kronick (b)

(a) Health, Nutrition and Population Department, Human Development Network, World Bank, Washington, DC, USA
(b) Formerly of the Latin America and Caribbean Human Development Department, World Bank, Washington, DC, USA

Paper prepared for the Human Development Department of the Latin America and Caribbean Region

Abstract: This paper documents a review of results monitoring in the portfolio of the Latin America and Caribbean Region Human Development Department (LCSHD) of the World Bank. The review assembled and assessed quantitative data, drawn from 67 Project Appraisal Documents (PADs) and nearly 600 Implementation Status Reports (ISRs), and qualitative data drawn from interviews. The main conclusion was that, while several aspects of results monitoring have improved in recent years, and while a focused department-level action plan (based on this diagnostic study) made additional progress, further improvements require Bank-wide action. Specifically, the report suggests that the Bank does not have a fully functioning system of results monitoring. In other words, the Bank does not have a set of rules or procedures governing the measurement of development results, nor does it have a physical platform for reporting results, nor does it have an incentive structure designed to encourage results monitoring. The paper discusses this finding along with potential recommendations.

This paper received the 2010 IEG Award for Monitoring and Evaluation. The paper was completed in 2009 and was made available in a restricted manner; given expressed interest in the report, the HNP Discussion Papers Series is now publishing it to make it available to a wider audience. Since the completion of the report, the Bank has introduced reforms to the monitoring of project indicators and to the focus on results of investment lending; these reforms are consistent with the key recommendations of the paper.

Keywords: monitoring, results, indicators, outcomes, evaluation, project objectives

Disclaimer: The findings, interpretations and conclusions expressed in the paper are entirely those of the authors, and do not represent the views of the World Bank, its Executive Directors, or the countries they represent.

Correspondence Details: Daniel Cotlear, MSN: G 7-701, 1818 H St. NW, Washington, DC 20433, USA, tel: 202-473-5083, fax: 202-522-3234, email: dcotlear@worldbank.org, website: www.worldbank.org/hnp. Dorothy Kronick, tel: 858-353-8196, email: dkronick@stanford.edu

Table of Contents

ACKNOWLEDGEMENTS ................................................................ V
EXECUTIVE SUMMARY .............................................................. VII
INTRODUCTION ..................................................................... 1
"I'M STILL IN THERAPY FROM FORM 590." SECTOR MANAGER ............................ 1
LITERATURE REVIEW & BACKGROUND .................................................. 3
METHODOLOGY & DESCRIPTIVE STATISTICS ............................................ 6
THE ESTABLISHMENT OF BASELINE AND TARGET VALUES FOR INDICATORS ................. 8
MONITORING DURING PROJECT IMPLEMENTATION ....................................... 8
FINDINGS ....................................................................... 10
THE GOOD NEWS IS THAT: ........................................................ 10
THE DEPARTMENT-LEVEL PROBLEMS—AND OUR SOLUTIONS—WERE AS FOLLOWS: ............. 12
CONCLUSION AND RECOMMENDATIONS ................................................ 19
FIRST: THINK BIG .............................................................. 19
SECOND: DEFINE SUCCESS ........................................................ 20
THIRD: TALK TO TTLS ........................................................... 20
REFERENCES .................................................................... 21
ANNEX ......................................................................... 22

ACKNOWLEDGEMENTS

Many of our colleagues provided support and direction without which we would not have been able to complete this report. Ariel Fiszbein's prior work on monitoring in LCSHD provided inspiration and a starting point for our efforts. In the very early stages of this project—one year before publication—we benefited from the guidance of a small steering group. Laura Rawlings, Alan Carroll, and Suzana Abbott directed our planning and research design process, helping us select key questions and figure out how to answer them. We presented the resultant research design, in May of 2008, at a meeting open to all department members. Aline Coudouel provided especially useful comments at that meeting. The following summer was devoted entirely to data collection and analysis; Daniele Ferreira supplied invaluable assistance in the data entry process. TTLs Cornelia Tesliuc, Manuel Salazar, Polly Jones, Andrea Guedes, and Ricardo Silveira generously made time for interviews about results monitoring.

At a series of three meetings in the fall of 2008, we presented our findings to the steering group, to the Departmental Management Team (DMT), and to the department as a whole. We thank all participants for insightful comments and lively discussion. Laura Rawlings and Alan Carroll provided particularly detailed reviews. We also thank Stefan Koberle and Denis Robitaille for productive conversations about our process and results. In December of 2008, we distributed a full draft of the report to the DMT, whom we thank for their review. Sector Managers Chingboon Lee, Helena Ribe, and Keith Hansen gave us excellent and abundant feedback. We are especially grateful to Keith Hansen, whose intelligent advice on framing and storyline made this report immeasurably better.

Finally, we owe a great debt of gratitude to our Sector Director at the time the work was conducted, Evangeline Javier. Vangie's constructive criticism, participation (often as chair) at numerous meetings, supply of resources, and infinite support were instrumental in bringing this project to a successful conclusion. All remaining errors are our own.
The authors are grateful to the World Bank for publishing this report as an HNP Discussion Paper. v vi EXECUTIVE SUMMARY Motivation & Methodology As part of an ongoing effort to better manage for results, the Latin America and Caribbean Region Human Development Department (LCSHD) conducted a review of results monitoring in our portfolio. Quantitative and qualitative data informed the study. The quantitative data were drawn from the Project Appraisal Documents (PADs) and status reports (PSRs & ISRs) of the department’s entire active portfolio at the outset of the study; this portfolio included 67 projects, which together comprised 519 PDO indicators, 1,168 intermediate indicators, and 594 status reports (PSRs & ISRs). The qualitative data were drawn from interviews with department staff members. Results Our findings fall into three categories: (1) Good news: where we have made progress on results monitoring; (2) Department-level problems: issues we have now resolved through an internal action plan; and (3) Bank-level problems: issues that cannot be resolved at the level of the department. The good news is that: • Quality at entry is improving. Compared with older projects, recent projects are more likely to have strong results frameworks. • The likelihood that a given indicator has a baseline value in the PAD has increased substantially. Roughly 3/4 of PDO indicators in new projects have a baseline in the PAD, compared with 1/2 of PDO indicators in pre-2005 projects. • Five of our projects use results-based disbursement; monitoring intensity on these projects exceeds average monitoring intensity. • The few available data suggest that LCSHD monitors results well relative to other units, according to a comparison of the results of this study with those of similar studies in other regions and by IEG. The department-level problems—and our corresponding solutions—were: • Mismatch between PDOs as expressed in the main text of the PAD and in the results annex of the PAD: about 1/3 of PADs were internally inconsistent in this way. TTLs and SMs are now correcting this problem during project preparation. • Remaining issues with results framework quality: TTLs and SMs are now using lessons from the review of quality-at-entry to further improve clarity and measurability of PDOs and indicators. • Underuse of results-based disbursement: staff are now actively considering results- based disbursement for projects in the pipeline. vii The Bank-level problems are symptoms of the fact that the Bank does not have a system of results monitoring (this conclusion is explored further in the main text): • There is no mechanism to enable learning from project to project regarding articulation of PDOs and indicators (i.e., which PDOs and indicators can be accurately measured in a cost-effective manner, which create desirable incentives, etc.), despite considerable homogeneity among results frameworks. • There is no mechanism for monitoring results over the lifetime of project: – ~1/4 of PDO indicators lack a baseline – ~1/4 of PDO indicators in PADs are never listed in ISRs – 1/2 of PDO indicators in PADs lack even one follow-up measurement in ISRs – <10% of PDO indicators measured at least once per year • Existing quality control measures are designed to raise flags concerning process (i.e., results framework) rather than actual monitoring of PDO indicators. Recommendations These findings have led us to three recommendations for those working on the results agenda: 1. Think Big. 
Fixing the results-monitoring problem is not about tinkering with the ISR or enforcing existing regulations, but rather about reimagining the incentives and platforms that shape our operations. 2. Define Success. Articulating the objective of results monitoring reform—an objective such as, “The results monitoring system should provide the Bank and policymakers with the incentive and capacity to (1) track project development indicators and (2) enable learning over time about what works”—would lend clarity and focus to results monitoring reform efforts. 3. Talk to TTLs. Results reforms should incorporate the perspective of those working in operations. viii ix INTRODUCTION “I’M STILL IN THERAPY FROM FORM 590.” SECTOR MANAGER Something of a results-monitoring fever has swept the development community in recent years. Monitoring handbooks abound. Monitoring specialists multiply. Monitoring consultancies fetch extravagant fees among the growing ranks of corporate social responsibility units. The United States government, the United Nations, the UK’s Department for International Development, the Gates Foundation, and many other institutions allocate ever-greater resources to measuring the results of development projects.1 There is even, as of November 1999, an entire agency within the Organization for Economic Cooperation and Development devoted to improving development monitoring worldwide.2 The Bank, while perhaps not at the forefront of this trend, is certainly part of it. OPCS founded the Results Secretariat in 2003; out of this emerged, in 2006, the Results Steering Group.3 At least three major initiatives (the criterion for “major” being the possession of a nomenclative acronym) seek to establish formal systems for results monitoring.4 This flurry of activity constitutes a new response to an old concern: the Wapenhans Report, published in 1992, found that “Portfolio management now systematically monitors implementation, disbursements and loan service, but not development results … attention to actual development impact remains inadequate.”5 It is in this context that the Latin America and the Caribbean Region Human Development Department (LCSHD) conducted an internal review of project monitoring.6 The objective of the review was to describe and assess results monitoring in LCSHD—to paint a picture of how (and, by extension, how well) the department defines development 1 See, for example, the US government’s new Program Assessment Rating Tool, the UNDP’s new Handbook on Monitoring and Evaluating for Results (PDF), and/or various DFID publications. The Paris Declaration of 2005 is further evidence of the momentum around monitoring. 2 The Partnership in Statistics for Development in the 21st Century (PARIS21), dedicated to “promoting a culture of evidence-based policymaking and monitoring in all countries.” 3 Intranet users can read the minutes of the RSG meetings here. 4 These are: IDA RMS (Results Measurement System), Africa RMS (Results Monitoring System) and IFC DOTS (Development Outcome Tracking System). There are also a number of major impact evaluation initiatives (for example, DIME and the Spanish Impact Evaluation Fund). This is not a comprehensive list of the Bank’s pro-monitoring moves (which include the establishment of CMU OPCS advisors and a nascent HDN initiative). 
5 It is as difficult to pinpoint the start date of the Bank’s monitoring focus as it is to locate the catalyst of the broader interest–the former, if not the Wapenhans Report itself, may well be Susan Stout and Timothy Johnston’s 1999 study; the latter the establishment of the Millennium Development Goals in 2000. 6 The study design, conclusions, and recommendations were completed by the authors, Daniel Cotlear and Dorothy Kronick; the data analysis and writing were completed by Dorothy Kronick. Laura Rawlings and Alan Carroll served as advisors. 1 objectives, articulates indicators by which to measure progress toward those objectives, and measures indicators over time. In other words: how much do we know about what has changed in client countries as a result of the projects in our portfolio? The central conclusion of this study is that the Bank does not have a results- monitoring system. In other words, the Bank does not have a set of rules or procedures governing the measurement of development results, nor does it have a physical platform for reporting results, nor does it have an incentive structure designed to encourage results monitoring. This conclusion is discussed further in Section V. Section II summarizes existing literature on Bank results monitoring and explains the value added of this study, Section III describes our methodology, Section IV enumerates the main findings, and Section V presents conclusions and recommendations. 2 LITERATURE REVIEW & BACKGROUND Bank staff have been studying the institution’s results-monitoring activities at least since the 1970s, when the Operations Evaluation Department was established and when the value of monitoring and evaluation earned official sanction (new Operational Manual Statements, for example, mandated the use of “key performance indicators” (1974) and recommended that all projects include some form of monitoring and evaluation (1977)). In the early 1990s, decades of diffuse discussion on monitoring crystallized in the Wapenhans Report, which declared in no uncertain terms that the traditional method of evaluating portfolio performance—calculating economic rates of return—was entirely insufficient to the task of gauging development impact. “Portfolio management now systematically monitors implementation, disbursements and loan service, but not development results,” the authors wrote. 
"The radical premise of this paper is that the Bank should be as concerned about and accountable for the 'development worth' of its loan portfolio as it is now for its performance as a financial intermediary."7

7 The increased focus on results at the time of the Wapenhans Report could have been driven in part by the growing percentage of Bank lending allocated to the social sectors. As health, education, and social protection lending climbed from 5% of all lending (in the 60s, 70s, and early 80s) to 15% or 20% of all lending (in the 90s and early 00s), the need for alternative (alternative to ERR) methods of cost-benefit analysis (as it was then called) sharpened. This change in portfolio composition was in turn driven, of course, by the (then) relatively new focus on poverty reduction.

Two years later, in 1994, the Operations Evaluation Department (OED) published a 100-page report called "An Overview of Monitoring and Evaluation in the World Bank."8 This report echoed the Wapenhans view that Bank monitoring and evaluation was woefully inadequate: a cover memo to executive directors stated bluntly, "The record of Monitoring and Evaluation in the Bank has been disappointing." The authors found low levels of compliance with late-1980s Operational Directives requiring project teams to consider monitoring in appraisal designs; they also found that "the Bank's record on the implementation of M&E is worse than the unsatisfactory performance already established at appraisal."

8 The authors of the OED report were certainly passionate about their subject, even going so far as to personify it in statements such as, "But the fortunes of M&E were about to reverse again."

A 1999 review of development effectiveness in HNP operations (written by Susan Stout and Timothy Johnston) arrived at a similar conclusion, stating that "assessing the impact of health interventions can be challenging, but excessive Bank focus on inputs and the low priority given to M&E are also to blame."

Near the turn of the century two events—the establishment of the Millennium Development Goals in 2000 and the Monterrey Conference in 2002—launched results measurement to the forefront of the development agenda. At Monterrey, a joint statement of the heads of multilateral development banks publicly committed the Bank to invest more in managing for results. The aforementioned rush of results-monitoring activity ensued: OPCS established the Results Secretariat in 2003; that same year, IDA Deputies demanded that the Bank provide more information on country outcomes and on IDA's contribution to country outcomes. Pilot monitoring projects began in the IDA13 period (FY03-05) and grew into larger initiatives such as the Africa Results Monitoring System (AfricaRMS), a results reporting platform that aims to allow some aggregation of results across projects and requires staff to report on a set of standard indicators. A Results Steering Group took shape in 2006; among this group's self-assigned missions is building a Bank-wide "Results Monitoring and Reporting Platform."

In recent years, two studies have attempted rigorous diagnoses of the state of results monitoring in the Bank; these studies are the most closely related to Monitoring Monitoring. The first, a 2006 review of monitoring and evaluation in HNP operations in the South Asia Region, involved an in-depth examination of twelve projects.
The SAR report considered selection of indicators, initial quality of results framework, collection of baseline data, use and analysis of data, and several other dimensions of monitoring; they found (1) baseline data for only 39% of indicators and (2) 25% of projects in which the data collection plan “was actually implemented.” Asking a panel of evaluators to independently rate the quality of PDO indicators, the SAR study found that there is no consensus among specialists about what constitutes a good indicator. This lack of consensus has led some to conclude that there is a need for rigorous study of which indicators are best, with the goal of identifying indicators with special qualities. Others have concluded what is needed is a convention—any convention, almost—around which to align measurement effort. This logic suggests that consistency of measurement is a substantial part of quality of measurement: the major achievement of the MDGs, for example, was to unite the development community around common indicators, rather than to identify the best, most relevant indicators. Like earlier reviewers, then, the SAR report concluded that monitoring and evaluation at the Bank is in some dimension inadequate. The second of the two studies closely related to Monitoring Monitoring is IEG’s very recent (2009) review of monitoring and evaluation in the HNP portfolio. After conducting an in-depth study of dozens of HNP projects since 1997, IEG concluded that, despite some improvements, HNP monitoring and evaluation still suffers from “important weaknesses.” Chief among the report’s findings are: (1) that just 29% of HNP projects (compared with 36% of all projects) have “high” or “substantial” M&E ratings in ICRs and (2) that more than 25% of projects approved in FY07 did not establish baselines for any outcome indicators at the outset of the project. IEG also found that 71% of recently closed HNP projects reported difficulties in collecting data. “M&E is very important for both learning and accountability,” the report states, “but there are very serious gaps in its quality and implementation.” This report confirms many of the findings of these previous studies: we provide new evidence for the assertion that the Bank makes no serious attempt to measure the development results of operations. In addition to validating previous findings, this report contributes to our understanding of Bank results monitoring by asking and answering new questions. First, while previous studies view results monitoring largely through the lens of initial and final documents, Monitoring Monitoring systematically aggregates data from all of the ISRs of all of the projects in the sample, thereby providing the first quantitative evidence on indicator 4 tracking during implementation (the first of which we are aware). In doing so, Monitoring Monitoring also sheds new light on the function of the ISR as a vehicle of real-time intra-Bank communication about results. This—the systematic study of indicator tracking in ISRs—is the principal value added of this study; it is highly relevant to the Bank’s ongoing efforts to build a results monitoring system. In addition, Monitoring Monitoring systematically compares results frameworks across projects, developing quantitative measures of the similarity (or dissimilarity) among PDOs and indicators in different projects in the portfolio. 
Finally, Monitoring Monitoring makes use of a broader sample than previous work; the SAR and IEG studies look only at HNP projects, while this paper considers HNP, education, and social protection.

METHODOLOGY & DESCRIPTIVE STATISTICS

As stated above, the objective of this review was to describe and assess results monitoring in LCSHD. Quantitative and qualitative data informed this effort. The quantitative data were drawn from the Project Appraisal Documents (PADs) and status reports (PSRs and ISRs; henceforth, when we refer to "ISRs," we mean all status reports, both PSRs and ISRs) of 67 active projects. Together, these 67 projects comprised the department's entire active portfolio at the outset of the study.

From each PAD we extracted: (a) basic project identification information (project number, loan amount, approval date, etc.); (b) the PDO as stated in the body text; (c) information regarding the extent of planning for monitoring and evaluation; (d) the results framework as set out in the annex; and (e) any baseline and target values accompanying the results framework. These data were recorded in a spreadsheet, into which we also entered the following data from each ISR: (a) the date of the ISR; (b) for each indicator in the PAD results framework: (i) whether the indicator appeared in the ISR; (ii) for those that appeared, whether the ISR included a new measurement of the indicator; and (iii) for those that appeared and had a new measurement, the value of the measurement; (c) the M&E performance rating; and (d) information regarding discussion of monitoring issues in comments sections.

The 67 resultant spreadsheets (one for each project in the sample) served two analytic ends: first, they permitted the aggregation of data on the 67 projects into a single file, which in turn allowed us to generate tabulations of variables of interest; second, they facilitated comparison of the texts of the 67 results frameworks.
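To make the aggregation and tabulation step concrete, the sketch below shows one way per-project extracts of this kind could be combined and summarized. It is illustrative only: the file layout and column names (project_id, indicator, baseline_in_pad, appears_in_isr, new_measurement) are assumptions made for the example, not the actual spreadsheets or tools used in this review.

# Illustrative sketch (not the review's actual tool): aggregate hypothetical
# per-project indicator extracts into a single file and tabulate variables of interest.
from pathlib import Path

import pandas as pd

# Assumed layout: one CSV per project, one row per (PDO indicator, ISR) pair, with
# columns project_id, indicator, baseline_in_pad, isr_date, appears_in_isr, new_measurement.
frames = [pd.read_csv(path) for path in Path("project_extracts").glob("*.csv")]
portfolio = pd.concat(frames, ignore_index=True)

# Collapse to one row per indicator, recording whether it ever appears in an ISR
# and whether any ISR reports a new measurement for it.
per_indicator = (
    portfolio.groupby(["project_id", "indicator"])
    .agg(
        baseline_in_pad=("baseline_in_pad", "max"),
        ever_in_isr=("appears_in_isr", "max"),
        ever_measured=("new_measurement", "max"),
    )
    .reset_index()
)

# Example tabulations of the kind described above (shares of PDO indicators).
print("With a baseline in the PAD:        ", per_indicator["baseline_in_pad"].mean())
print("Appearing in at least one ISR:     ", per_indicator["ever_in_isr"].mean())
print("With at least one follow-up value: ", per_indicator["ever_measured"].mean())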
The qualitative data for this study were drawn from interviews with department staff members; we held individual meetings with task managers and task team leaders, as well as group discussions with sector managers, department management, and the department as a whole.

As we reviewed only active-portfolio projects, we did not examine Implementation Completion Reports (ICRs). The limitations of this approach are discussed in detail below.

The 67 projects in our sample represent $4.7 billion in commitments and include a wide variety of project types and sizes. There are projects from all three HD sectors (education, social protection, and health), from all six country management units in LCR, and from every year between 2000 and 2008 (by date of PAD). The largest project is a $570 million loan for Brazil's Bolsa Familia conditional cash transfer program; the smallest is a $2 million Technical Assistance Loan to Colombia. The sample contains 519 PDO indicators, 1,174 intermediate indicators, and 594 ISRs. See Table 1 for more comprehensive descriptive statistics.

Table 1.1. The Portfolio Under Review

Unit                       N       Average per project   Range
Projects                   67      .                     .
PDO indicators             519     8                     1 – 36
Intermediate indicators    1,168   17                    0 – 41
ISRs                       594     9                     1 – 19
Total funds (Million $)    4,750   70.9                  2 – 572.2

Table 1.2. Distribution by Sector

Sector              Projects   Funds (Million $)   $ per Project   Range
Health              32         2,025               63.2            3.5 – 240
Education           25         1,669               66.7            3 – 350
Social Protection   10         1,058               105.8           2 – 572.2

Four phases of the results monitoring process formed the foci of our research: (1) the definition and articulation of Project Development Objectives (PDOs) and indicators; (2) the establishment of baseline and target values for indicators; (3) planning for monitoring; and (4) monitoring during project implementation, or how project teams track indicators between approval and closing. We describe the methodology for each in turn.

The definition and articulation of Project Development Objectives (PDOs) and indicators (results frameworks).

• Our central question about results frameworks was: to what extent are PDOs similar across projects and to what extent are the indicators used to measure a given PDO similar across projects? (In other words, how much variation is there among PDOs, and how much variation is there among indicators?) Secondary questions were: (a) to what extent are PDOs in the body text of PADs consistent with PDOs in the results annexes of PADs? and (b) to what extent do results frameworks conform to OPCS standards for measurability, clarity, and other desirable characteristics?

• To address the first question (regarding thematic similarity or dissimilarity among PDOs and indicators), we developed a typology of results frameworks. PDOs were classified according to (a) target group (such as level of schooling or category of health problem) and (b) type of intervention (such as coverage expansion or quality improvement). PDO indicators were classified according to (a) the objective they purport to measure and (b) method of measurement (for an objective related to coverage, for example, one method of measurement would be enrollment in a program; a second would be access to a given facility or service).

• To address the secondary question (regarding the quality of results frameworks in light of OPCS standards, or "quality at entry"), we answered three questions about each PDO: (a) is the PDO stated in terms of measurable results? (b) Is the PDO stated in terms of results that the project can directly influence, as opposed to higher-level objectives beyond the project? And, (c) is the PDO clear? Of each PDO indicator we asked, (a) is the PDO indicator framed in terms of measurable results? And, (b) does the PDO indicator measure one of the PDOs?

THE ESTABLISHMENT OF BASELINE AND TARGET VALUES FOR INDICATORS.

• To measure the extent to which projects establish baseline and target values for indicators, we generated tabulations of (a) whether each indicator has corresponding baseline and target values and (b) in what document existing baseline and target values were established (PAD, first ISR, second ISR, etc.).

• Not all indicators require measurement effort to establish a baseline: some are categorical or start at zero, such as "1000 schools reopened," or "Proposal for rationalization and reorganization of social spending developed." Of the 519 PDO indicators in the sample, 154 were of this type; the remaining 365 required some measurement effort in order to establish a baseline. In tabulating baseline values, we considered only those which require some measurement effort in order to establish a baseline.

MONITORING DURING PROJECT IMPLEMENTATION.
• We define results monitoring during implementation as the collection and recording of information about the key and intermediate indicators set out in the PAD (or, in the case of restructured projects, in the restructuring document).

• To assess results monitoring during implementation, we measured the extent to which ISRs include follow-up measurements of the indicators defined in PADs. Specifically, we asked questions such as: how many indicators have at least one follow-up measurement in an ISR? How many indicators are measured at least once per year? How many projects report at least one follow-up measurement for all of the PDO indicators in the PAD? How many projects report no follow-up measurements for any of the PDO indicators in the PAD? Etc. (An illustrative tabulation of this kind appears in the sketch at the end of this section.)

• Explanation of the absence of data on a given indicator ("Survey not complete," for example) was not considered a follow-up measurement. However, description of the progress of a qualitative indicator ("Designed concluded under review; 20 sites identified and ready for implementation in the second semester of 2006,") was, in most cases, considered an interim measurement. If an indicator was reworded but substantively unchanged, we considered it as the original. If a project was formally restructured and new indicators appeared in one or more ISRs, we considered only the new indicators (in other words, the original indicators of restructured projects were dropped from the sample).

• Some discussants of this study objected to this ISR-centric approach, arguing that there is much results-related information that never appears in ISRs. A study of results monitoring through an examination of ISRs, some suggested, may conflate an information issue with a reporting issue (as one commenter said, "the study mixes up what teams do with what teams report"). Despite these limitations, the ISR remains the only formal mechanism for communicating about results, and therefore an appropriate focal point for this study.

This is a study of project monitoring, not of impact evaluation. "Whoever put 'monitoring' and 'evaluation' together into 'M&E' did a great disservice to monitoring," said an early discussant of this report. We agree: monitoring and evaluation are distinct activities, deserving of separate reviews.
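As an illustration of how the follow-up questions above could be tabulated from such records, the sketch below computes, for each PDO indicator, whether it has at least one follow-up measurement and whether it is measured at least once per year of its ISR history. The column names and the once-per-year rule are assumptions made for this example; this is not the review's actual code.

# Illustrative sketch (assumed column names): per-indicator follow-up metrics from a
# table with one row per (project_id, indicator, isr_date) and a boolean new_measurement flag.
import pandas as pd

def follow_up_metrics(records: pd.DataFrame) -> pd.DataFrame:
    records = records.copy()
    records["isr_date"] = pd.to_datetime(records["isr_date"])

    def summarize(group: pd.DataFrame) -> pd.Series:
        # Years of ISR history for this indicator (count at least one year).
        span_days = (group["isr_date"].max() - group["isr_date"].min()).days
        years = max(span_days / 365.25, 1.0)
        n_measurements = int(group["new_measurement"].sum())
        return pd.Series({
            "has_follow_up": n_measurements >= 1,
            "measured_yearly": n_measurements / years >= 1.0,
        })

    return (
        records.groupby(["project_id", "indicator"])
        .apply(summarize)
        .reset_index()
    )

# Example: shares across all PDO indicators in the sample.
# metrics = follow_up_metrics(portfolio)
# print(metrics[["has_follow_up", "measured_yearly"]].mean())

Under these assumptions, an indicator that never receives a new measurement counts as lacking follow-up even if it is listed in every ISR, which mirrors the distinction drawn above between listing an indicator and actually measuring it.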
FINDINGS

Our findings fall into three categories: (1) Good news: where we have made progress on results monitoring; (2) Department-level problems: issues we have now resolved through our own action plan; and (3) Bank-level problems: issues that cannot be resolved at the level of the department.

THE GOOD NEWS IS THAT:

Quality at entry is improving. Almost all projects now have measurable, clear PDOs and measurable, clear, relevant indicators. OPCS efforts to provide additional guidance on results framework definition appear to have had some effect, according to interviews with department staff. Furthermore, the number of PDO indicators per project has declined (see Figure 1). In keeping with the findings of the SAR study—which found no agreement among reviewers as to the quality of various sets of PDOs and indicators—we found that evaluating the quality of results frameworks was a rather subjective exercise; different observers may have different opinions about whether a PDO is "clear" or an indicator is "measurable."

The likelihood that a given indicator has a baseline value in the PAD has increased substantially (see Figure 2). Roughly 3/4 of PDO indicators in new projects have a baseline in the PAD, compared with 1/2 of PDO indicators in pre-2005 projects. This increase may be attributable to the introduction, in 2005, of a second results annex table titled "Arrangements for Results Monitoring," which includes columns for baseline and target values.

We have had some success with results-based disbursement, which appears to be correlated with higher results-monitoring intensity. In the five projects in our portfolio utilizing results-based disbursement, 97% of PDO indicators had baselines and 95% appeared in ISRs, compared with 70% and 74%, respectively, in non-results-based-disbursement projects (see Figure 3). 55% of PDO indicators in results-based-disbursement projects had at least one follow-up measurement, compared with 44% in other projects.

Our department monitors results well relative to other units, according to a comparison of the results of this study with those of similar studies in South Asia and the HNP sector. In the SAR study of M&E in HNP operations, for example, just 39% of PDO indicators had baseline data, while more than 70% of PDO indicators in our sample had baseline data.

THE DEPARTMENT-LEVEL PROBLEMS—AND OUR SOLUTIONS—WERE AS FOLLOWS:

PADs do not consistently articulate projects' PDOs. One-third of PADs included two substantively distinct versions of the PDO (i.e., the PDO in the body text of the PAD was substantively, not just semantically, different from that in the results annex). TTLs and SMs are now correcting this problem during project preparation.

• For example, one results annex replaced the phrase "improving the quality of preschool and primary education by enhancing the teacher training system and introducing new teaching and learning instruments in the classroom" (from the PDO in the body text) with the phrase "Improve the quality of preschool and primary education (ages 4 to 11) with a focus on socioeconomically disadvantaged and very disadvantaged contexts." (See Figure A in the Annex for more examples of discrepancies between body-text and results-annex PDOs.)

• Other projects featured body-text PDOs designed to expand coverage of a given service, only to switch to disease incidence or educational achievement in the results annex. For example, the PDO in the body text of one PAD contained the phrase, "scaling up prevention programs targeting high-risk groups as well as the general population," while the corresponding part of the results annex read, "Reduce the mortality and morbidity attributed to HIV/AIDS."

• In a few cases, the lists of indicators in the "results framework" table did not completely correspond with the list of indicators in the "arrangements for results monitoring" table, even though the tables appear on adjacent pages.

• While we do not purport to establish a definitive explanation for such mismatch, we note that, in interviews, staff members cited division of tasks (different people responsible for different parts of the PAD) and diverse pressures (different demands from different managers and partners) as potential causes. Several commentators suggested that, were we to review legal agreements, we might find yet more iterations of the PDOs.

Despite the aforementioned improvement in logical frameworks (quality-at-entry), a few PDOs and indicators remained unclear, un-measurable, and/or incommensurate with the size of the project. TTLs and SMs are now using the lessons from our review of quality-at-entry to further improve results framework quality.
• For example, “improve attention to quality and the relevance of learning” was considered an un-measurable objective in that “attention” is an effectively unobservable institutional characteristic. A number of PDOs contained phrases such as “reduce poverty,” “create sustainable economic growth,” or “improve competitiveness,” none of which are results an individual project can directly influence. PDOs considered “unclear” were those so indefinite as to have little meaning ( “To improve the quality and equity of the Borrower's Tertiary Education system through sub sector's response to society's needs for high quality 12 human capital that will enhance competitiveness in the global market,” for example.) • Similarly, there are a number of PDO indicators which are not stated in terms of measurable results. For example, “Presence/absence of clear lines of authority, written policies, strategic planning, budgetary and financial structures and processes,” was considered so nonspecific as to be un-measurable. PDO indicators that refer to the perpetuation of an institution or policy over an indefinite period of time, such as “Sustaining a core permanent leadership team with national accountability,” were judged un-measurable in that they imply infinity. Other PDO indicators were phrased in such a way that they would be impossible to observe, such as, “Percent of sick children correctly assessed and treated in health facility.” • Some of the PDO clarity problems arose as a result of the effort to include both outputs and outcomes. As one regional manager put it, “The discussion of outputs vs. outcomes has become ideological and often indecipherable to anyone but M&E experts. Talking to them is like going to an Ignatian college.” • The PAD guidelines are of little help on this issue, stating only, “Ideally, each project should have one project development objective focused on the primary target group. The PDO should focus on the outcome for which the project reasonably can be held accountable, given the project’s duration, resources, and approach. The PDO should not encompass higher level objectives that depend on other efforts outside the scope of the project … At the same time the PDO should not merely restate the project’s components or outputs.” At the outset of this study, few projects in our portfolio were making use of results-based disbursement. Now, department staff are actively considering results-based disbursement for projects in the pipeline. Many of our findings point to a problem that cannot be resolved within LCSHD: the fact that the Bank does not have a system of results monitoring. This conclusion is discussed further in Section V. Among the findings that lead to this conclusion are the following: There is no mechanism to enable learning from project to project regarding articulation of PDOs and indicators, despite considerable homogeneity among results frameworks. Each team essentially starts from scratch in constructing the results framework. • Two bodies of evidence support this finding: quantitative evidence drawn from a comparison of the results frameworks of the 67 projects in our sample, and qualitative evidence drawn from interviews with department staff. The results- framework-comparison exercise attests to the similarity of PDOs and indicators (and thus the potential for learning over time); the interviews assert the lack of such learning. 13 • The 67 projects in our sample comprise a far smaller number of PDOs (in other words, many projects have similar PDOs). 
Of 25 health projects, for example, ten address HIV/AIDS and nine address maternal-child health; the remaining six focus on other health challenges. Types of interventions were even more homogenous: almost all of the health projects sought to (1) expand coverage of a health program and/or expand access to care, (2) improve the quality of health services, and (3) improve government capacity to administer and/or deliver health care. The 32 education projects in the sample encompass a similar number of target groups and intervention types. In short, the vast majority of projects set common objectives. Very few PDOs are unique. (See Figure 4 for a complete typology.) Figure 4. Many Projects Address Similar Issues 14 • Correspondingly, the indicators in our sample comprise a far smaller number of ways to measure outcomes (in other words, many PDO indicators are alike). For example: indicators used to measure health project PDOs related to coverage fall into one of three categories: (1) enrollment in a program; (2) access to a given facility or service; or (3) availability of a given facility or service (output). Similarly, indicators used to measure health project PDOs related to institutional capacity are generally of one of four types: (1) development and/or implementation of new regulations; (2) budgeting practice and use of resources; (3) ministerial capacity building; (4) monitoring & evaluation. Indicators used to measure objectives in education and social protection are similarly homogenous. In short, the vast majority of indicators have homologues in other projects. Very few indicators are unique. • Despite this substantive similarity, however, staff members report in interviews that results frameworks are defined anew at the outset of each project. A series of pre-appraisal workshops involving extensive discussions and reformulations follows initial consultation with clients; systematic consideration of the results frameworks of previous projects is not generally part of this process. As one team leader said, somewhat indignantly, “I never look at other people’s PDOs!” While review meetings may be attended by people with related experience, involving staff members with experience on similar operations “is not really built in.” Another team leader commented that a recent IEG review of ten years of Honduras lending was helpful because “we don’t usually have that kind of perspective.” • “Each project is a different animal,” said one staff member by way of explanation. (In fact, as we have seen, most projects address familiar issues.) • “Intellectually we love the strategy discussion, the weight of the conversation about the heart of a project,” said one regional manager. “So we end up reinventing the wheel.” Staff receive conflicting messages on baselines; consequently, baselines are not consistently established: about 1/4 of PDO indicators do not have a corresponding baseline either in the PAD or in any of the ISRs (See Figure 5). • Missing baselines are not distributed evenly across all projects; only half of projects have PDO indicators with missing baselines (see Table 2). • Of those indicators that have a corresponding baseline, approximately 73% had a baseline in the PAD; the remainder gained a baseline in one of the ISRs. 15 • As discussed above, these tabulations exclude categorical or starting-from-zero indicators. Some such indicators did specify a baseline. 
For example, one project recorded the baseline, “Undifferentiated lines of authority, responsibility, information and execution in HRM across the central level,” for the indicator, “MEC’s role as rector in its normative, regulatory, and evaluation functions for HRM is clarified and implemented.” Another project included the PDO indicator, “Benchmarks for second HD PSRL met and documented using improved monitoring and evaluation systems and data;” the PAD specified the baseline as “Progress as of start of first HD PSRL.” Few projects specify “qualitative baselines” of this type, and there is no policy or guideline as to whether qualitative baselines are required or desirable. There are no rules regarding transfer of the results framework from PAD to ISR; the results framework is therefore often discarded in transfer: about 1/4 of PDO indicators set out in PADs never appear in an ISR (see Figure 6 below). • In other words, 1/4 of PDO indicators effectively vanish from the record during the entire period between the PAD and the Implementation Completion Report (ICR). Moreover, the “missing” indicators are not concentrated in a few delinquent projects: only 60% of projects have ISRs that contain all of the PDO indicators. As one senior manager described it, “The results framework we construct so carefully during project preparation is essentially torn down immediately after approval.” • According to interviews with staff, there are no common criteria for selecting which indicators appear in ISRs. “I just include the most important ones,” said one task manager. “Because some of them, you know, are ones the government insisted on.” Another task manager said, “Well, you can’t put all of them in! I mean, I just put the minimum required by the ISR.” One staff member reported that training sessions advise teams to select a reduced set of indicators for the 16 ISR; the ISR instructions counsel the same. Furthermore, the indicator set is not static across ISRs: it often changes with the arrival of a new TTL, and in every project it changed with the switch from PSRs to ISRs. There is no mechanism for monitoring results over the lifetime of project: 1/2 of all PDO indicators set out in PADs do not have even one follow-up measurement in ISRs, and fewer than 10% of PDO indicators are measured once per year or more (see Figure 6 below). • This is in contrast to the periodicity anticipated in the PADs, which is often annual; this periodicity is itself in conflict with the PAD guidelines, which state that “PDO indicators normally cannot be observed or measured before the end of the project.”9 Moreover, as with “missing” baseline values, “missing” interim measurements are distributed across almost all projects: just 15% of the 67 projects have at least one interim measurement for all of their PDO indicators. • Intermediate indicators are measured even less often: 75% lack even one interim value. • Identifying the sources of variation in monitoring intensity across projects and across indicators is largely beyond the scope of this study. It is not clear whether the absence of interim measurements for so many indicators stems from Bank procedural issues or from country system issues—in other words, it is not clear whether Bank teams are neglecting to absorb and record available information or whether countries are neglecting to generate information. Rather, it is clear that both problems are present; which is more significant, and to what degree, we have not determined. 
• “This is a development challenge, not a bureaucracy challenge,” said one discussant of this report. “We can’t solve this problem with checklists and flags.” This argument has some intuitive appeal. On the other hand, the variation in monitoring intensity across projects—imperfectly correlated, as it is, with client country—suggests that Bank effort is a determining factor. 9 A survey of a small sample of PADs indicated that the majority of PDO indicators are intended to be measured annually or semi-annually. 17 18 CONCLUSION AND RECOMMENDATIONS The conclusion of this report is that the Bank does not have a system for monitoring the development results of its investment projects. In other words: the Bank does not have a system capable of measuring the development outcomes of the tens of billions of dollars in grants and loans provided to client governments each year. As nearly two decades of analytic work such as this paper have shown, the Bank does not know how many more children are in school, how many more people have access to health care, or how many more families can afford food as a result of our operations. There is no set of rules or procedures governing the measurement of these outcomes (there is no formal policy, for example, regarding the establishment of baseline and target values for indicators— indeed, the PAD Guidelines state only that indicators “should be presented with baselines” and include a sample results annex in which most indicators lack baselines (see Figure B in the Annex)), nor is there a physical platform for reporting on them, nor is there an incentive structure designed to encourage results measurement. The Annual Report contains no information on development outcomes. A simple comparison further illustrates the central point (the central point being, to reiterate, that the Bank does not have a system for measuring results). Consider for a moment the procurement system: governed by a strict set of principles and regulations set out in numerous lengthy documents, the procurement system has its own filing platform, its own specialists, and its own training courses. The procurement system exacts strict penalties for noncompliance with regulations. The procurement system uses the ISR only to flag major issues, reserving substantive discussion of progress and problems for a separate set of reports. The results-measurement arrangement, in contrast, has no governing principles, no reporting platform, few penalties for noncompliance—in other words, no system. As described above, the absence of such a system means that development results are largely not measured. This conclusion entails three principal implications for the stewards of the Bank’s “results agenda:” FIRST: THINK BIG. Fixing the results-monitoring problem is not about tinkering with the ISR or enforcing existing regulations, but rather about reimagining the incentives and platforms that shape our operations. 19 SECOND: DEFINE SUCCESS. Articulating the goal or objective of results monitoring is important because the nature of this objective should determine the design of a results monitoring platform.10 An objective such as, “The results monitoring system should provide the Bank and policymakers with the incentive and capacity to (1) track project development indicators and (2) enable learning over time about what works” would lend clarity and focus to results monitoring reform efforts. THIRD: TALK TO TTLS. 
Only those whose work program is centered in operations have a concrete sense of how a new system would (or would not) facilitate Bank workflow. Results reforms should incorporate the perspective of those closest to our clients.

10 Other international organizations have struggled to define and measure the objective of results monitoring: in a 2008 review of results-based management at the United Nations, for example, the institution concluded that "the purpose of the results-based management enterprise has not been clearly articulated … there is no clear common understanding of the objectives of results-based management."

REFERENCES

"Accelerating the Results Agenda: Progress and Next Steps." Operations Policy and Country Services, The World Bank (2006).

"Getting Results: The World Bank's Agenda for Improving Development Effectiveness." The World Bank (1993).

Johnston, Timothy and Susan Stout. "Investing in Health: Development Effectiveness in the Health, Nutrition, and Population Sector." Operations Evaluation Department, The World Bank (1999).

Loevinsohn, Ben. "Measuring Results: A Review of Monitoring and Evaluation in HNP Operations in South Asia and some Practical Suggestions for Implementation." South Asia Human Development Sector, The World Bank (2006).

Rice, Edward. "An Overview of Monitoring and Evaluation in the World Bank." Operations Evaluation Department, The World Bank (1994).

United Nations General Assembly. "Review of results-based management at the United Nations." Office of Internal Oversight Services, United Nations (2008).

Villar Uribe, Manuela. "Monitoring and Evaluation in the HNP portfolio." Independent Evaluation Group, The World Bank (2009).

Wapenhans, Willi and Portfolio Management Task Force. "Effective Implementation: Key to Development Impact." The World Bank (1992).

ANNEX

Figure A. Comparison between body-text and results-annex PDOs: five examples

Example 1
Body text: Reduce the mortality and morbidity attributed to HIV/AIDS; Reduce the impact of HIV/AIDS on individuals, families, and the community; Consolidate sustainable organizational and institutional framework for managing HIV/AIDS.
Results annex: Reduce the incidence of HIV infections; Mitigate the negative impact of HIV/AIDS on persons infected and affected.

Example 2
Body text: To: a) increase enrollment for Preschool, Primary and Secondary education; b) improve attention to quality and relevance of learning; c) improve systems of governance and accountability, including measures to strengthen community participation in the education sector; and d) harmonize donor assistance in the sector.
Results annex: The project's development objective is to benefit the primary target group (children of school going age) with more quantity and improved quality of education. These objectives are achieved by first attaining a regulated and coordinated donor financing governance and accountability.

Example 3
Body text: To improve Honduras' social safety net for children and youth. This would be achieved by (i) improving nutritional and basic health status of young children by expanding successful AIN-C program, and (ii) increasing employability of disadvantaged youth by piloting a First Employment program.
Results annex: Improved capacity to supervise and monitor social protection interventions for CY; Improved interinstitutional coordination; Improved social protection policy for children and youth.
Example 4
Body text: To (a) improve coverage and equity at the primary school level through the expansion and consolidation of PRONADE schools and by providing scholarships primarily for indigenous girls in rural communities; (b) improve efficiency and quality of primary education by supporting bilingual education, providing textbooks and didactic materials in 18 linguistic areas; expanding multigrade schools; and improving teacher qualifications; (c) facilitate MINEDUC and the Ministry of Culture and Sports to jointly design and execute a program to enhance the goals of cultural diversity and pluralism contained in the National Constitution, the Guatemalan Peace Accords, and the April 2000 National Congress on Cultural Policies; (d) assist decentralization and modernization of MINEDUC by supporting efforts to strengthen the organization and management of the education system.
Results annex: To ensure universal access to primary education for all Guatemalan children, to improve the quality and efficiency of basic education, to enhance cultural diversity and pluralism, and to decentralize and strengthen the capacity of the education system.

Example 5
Body text: To increase coverage and quality of health services and related programs that would improve the health of the population, and to empower communities to improve their health status; and to strengthen local capacity to respond to health needs.
Results annex: None.

Figure B. PAD Template and Guidelines Results Annex Has No Baselines

About this series...

Enquiries about the series and submissions should be made directly to the Editor, Homira Nassery (hnassery@worldbank.org), or to the HNP Advisory Service (healthpop@worldbank.org, tel 202 473-2256, fax 202 522-3234). For more information, see also www.worldbank.org/hnppublications.

THE WORLD BANK
1818 H Street, NW
Washington, DC USA 20433
Telephone: 202 473 1000
Facsimile: 202 477 6391
Internet: www.worldbank.org
E-mail: feedback@worldbank.org