77689

HNP DISCUSSION PAPER

Monitoring Monitoring: Assessing Results Measurement at the World Bank

A Case Study from the Latin America and Caribbean Region Human Development Department

Daniel Cotlear and Dorothy Kronick

September 2010 (original report completed June 2009)

The Fate of Project Development Objective (PDO) Indicators: Of every 10 PDO indicators listed in PADs, seven appear in at least one ISR, five have at least one follow-up measurement in an ISR, and just one is measured at least once per year.

To the non-Bank reader: please excuse the many acronyms. "PDO" stands for "Project Development Objective," "PAD" for "Project Appraisal Document," and "ISR" for "Implementation Status Report." The idea should be clear.

Acronyms

DMT - Departmental Management Team
HNP - Health, Nutrition, Population
IBRD - International Bank for Reconstruction and Development
ICR - Implementation Completion Report
IDA - International Development Association
IEG - Independent Evaluation Group
ISR - Implementation Status Report
LCSHD - Latin America and Caribbean Region Human Development Department
M&E - Monitoring and Evaluation
OPCS - Operations Policy & Country Services
PAD - Project Appraisal Document
PDO - Project Development Objective
PSR - Project Status Report
RBD - Results-Based Disbursement
RSG - Results Steering Group
SM - Sector Manager
TTL - Task Team Leader

Health, Nutrition and Population (HNP) Discussion Paper

This series is produced by the Health, Nutrition, and Population Family (HNP) of the World Bank's Human Development Network. The papers in this series aim to provide a vehicle for publishing preliminary and unpolished results on HNP topics to encourage discussion and debate. The findings, interpretations, and conclusions expressed in this paper are entirely those of the author(s) and should not be attributed in any manner to the World Bank, to its affiliated organizations or to members of its Board of Executive Directors or the countries they represent. Citation and the use of material presented in this series should take into account this provisional character.

For free copies of papers in this series please contact the individual author(s) whose name appears on the paper. Enquiries about the series and submissions should be made directly to the Editor, Homira Nassery (HNassery@worldbank.org). Submissions should have been previously reviewed and cleared by the sponsoring department, which will bear the cost of publication. No additional reviews will be undertaken after submission. The sponsoring department and author(s) bear full responsibility for the quality of the technical contents and presentation of material in the series. Since the material will be published as presented, authors should submit an electronic copy in a predefined format (available at www.worldbank.org/hnppublications on the Guide for Authors page). Drafts that do not meet minimum presentational standards may be returned to authors for more work before being accepted.

For information regarding this and other World Bank publications, please contact the HNP Advisory Services at healthpop@worldbank.org (email), 202-473-2256 (telephone), or 202-522-3234 (fax).
© 2010 The International Bank for Reconstruction and Development / The World Bank
1818 H Street, NW
Washington, DC 20433
All rights reserved.

Health, Nutrition and Population (HNP) Discussion Paper

Monitoring Monitoring: Assessing Results Measurement at the World Bank

Daniel Cotlear (a) and Dorothy Kronick (b)

(a) Health, Nutrition and Population Department, Human Development Network, World Bank, Washington, DC, USA
(b) Formerly of the Latin America and Caribbean Human Development Department, World Bank, Washington, DC, USA

Paper prepared for the Human Development Department of the Latin America and Caribbean Region

Abstract: This paper documents a review of results monitoring in the portfolio of the Latin America and Caribbean Region Human Development Department (LCSHD) of the World Bank. The review assembled and assessed quantitative data, drawn from 67 Project Appraisal Documents (PADs) and nearly 600 Implementation Status Reports (ISRs), and qualitative data drawn from interviews. The main conclusion was that, while several aspects of results monitoring have improved in recent years, and while a focused department-level action plan (based on this diagnostic study) made additional progress, further improvements require Bank-wide action. Specifically, the report suggests that the Bank does not have a fully functioning system of results monitoring. In other words, the Bank does not have a set of rules or procedures governing the measurement of development results, nor does it have a physical platform for reporting results, nor does it have an incentive structure designed to encourage results monitoring. The paper discusses this finding along with potential recommendations.

This paper received the 2010 IEG Award for Monitoring and Evaluation. The paper was completed in 2009 and was made available in a restricted manner; given expressed interest in the report, the HNP Discussion Papers Series is now publishing it to make it available to a wider audience. Since the completion of the report, the Bank has introduced reforms to the monitoring of project indicators and to the focus on results of investment lending; these reforms are consistent with the key recommendations of the paper.

Keywords: monitoring, results, indicators, outcomes, evaluation, project objectives

Disclaimer: The findings, interpretations and conclusions expressed in the paper are entirely those of the authors, and do not represent the views of the World Bank, its Executive Directors, or the countries they represent.

Correspondence Details: Daniel Cotlear, MSN: G 7-701, 1818 H St. NW, Washington, DC 20433, USA, tel: 202-473-5083, fax: 202-522-3234, email: dcotlear@worldbank.org, website: www.worldbank.org/hnp. Dorothy Kronick, tel: 858-353-8196, email: dkronick@stanford.edu

Table of Contents

ACKNOWLEDGEMENTS ................................................................ V
EXECUTIVE SUMMARY .............................................................. VII
INTRODUCTION ..................................................................... 1
"I'M STILL IN THERAPY FROM FORM 590." SECTOR MANAGER ............................ 1
LITERATURE REVIEW & BACKGROUND .................................................. 3
METHODOLOGY & DESCRIPTIVE STATISTICS ............................................ 6
THE ESTABLISHMENT OF BASELINE AND TARGET VALUES FOR INDICATORS ................. 8
MONITORING DURING PROJECT IMPLEMENTATION ....................................... 8
FINDINGS ....................................................................... 10
THE GOOD NEWS IS THAT: ........................................................ 10
THE DEPARTMENT-LEVEL PROBLEMS—AND OUR SOLUTIONS—WERE AS FOLLOWS: ............. 12
CONCLUSION AND RECOMMENDATIONS ................................................ 19
FIRST: THINK BIG .............................................................. 19
SECOND: DEFINE SUCCESS ........................................................ 20
THIRD: TALK TO TTLS ........................................................... 20
REFERENCES .................................................................... 21
ANNEX ......................................................................... 22

ACKNOWLEDGEMENTS

Many of our colleagues provided support and direction without which we would not have been able to complete this report. Ariel Fiszbein's prior work on monitoring in LCSHD provided inspiration and a starting point for our efforts. In the very early stages of this project—one year before publication—we benefited from the guidance of a small steering group. Laura Rawlings, Alan Carroll, and Suzana Abbott directed our planning and research design process, helping us select key questions and figure out how to answer them. We presented the resultant research design, in May of 2008, at a meeting open to all department members. Aline Coudouel provided especially useful comments at that meeting. The following summer was devoted entirely to data collection and analysis; Daniele Ferreira supplied invaluable assistance in the data entry process. TTLs Cornelia Tesliuc, Manuel Salazar, Polly Jones, Andrea Guedes, and Ricardo Silveira generously made time for interviews about results monitoring.

At a series of three meetings in the fall of 2008, we presented our findings to the steering group, to the Departmental Management Team (DMT), and to the department as a whole. We thank all participants for insightful comments and lively discussion. Laura Rawlings and Alan Carroll provided particularly detailed reviews. We also thank Stefan Koberle and Denis Robitaille for productive conversations about our process and results. In December of 2008, we distributed a full draft of the report to the DMT, whom we thank for their review. Sector Managers Chingboon Lee, Helena Ribe, and Keith Hansen gave us excellent and abundant feedback. We are especially grateful to Keith Hansen, whose intelligent advice on framing and storyline made this report immeasurably better.

Finally, we owe a great debt of gratitude to our Sector Director at the time the work was conducted, Evangeline Javier. Vangie's constructive criticism, participation (often as chair) at numerous meetings, supply of resources, and infinite support were instrumental in bringing this project to a successful conclusion. All remaining errors are our own.
The authors are grateful to the World Bank for publishing this report as an HNP Discussion Paper. v vi EXECUTIVE SUMMARY Motivation & Methodology As part of an ongoing effort to better manage for results, the Latin America and Caribbean Region Human Development Department (LCSHD) conducted a review of results monitoring in our portfolio. Quantitative and qualitative data informed the study. The quantitative data were drawn from the Project Appraisal Documents (PADs) and status reports (PSRs & ISRs) of the department’s entire active portfolio at the outset of the study; this portfolio included 67 projects, which together comprised 519 PDO indicators, 1,168 intermediate indicators, and 594 status reports (PSRs & ISRs). The qualitative data were drawn from interviews with department staff members. Results Our findings fall into three categories: (1) Good news: where we have made progress on results monitoring; (2) Department-level problems: issues we have now resolved through an internal action plan; and (3) Bank-level problems: issues that cannot be resolved at the level of the department. The good news is that: • Quality at entry is improving. Compared with older projects, recent projects are more likely to have strong results frameworks. • The likelihood that a given indicator has a baseline value in the PAD has increased substantially. Roughly 3/4 of PDO indicators in new projects have a baseline in the PAD, compared with 1/2 of PDO indicators in pre-2005 projects. • Five of our projects use results-based disbursement; monitoring intensity on these projects exceeds average monitoring intensity. • The few available data suggest that LCSHD monitors results well relative to other units, according to a comparison of the results of this study with those of similar studies in other regions and by IEG. The department-level problems—and our corresponding solutions—were: • Mismatch between PDOs as expressed in the main text of the PAD and in the results annex of the PAD: about 1/3 of PADs were internally inconsistent in this way. TTLs and SMs are now correcting this problem during project preparation. • Remaining issues with results framework quality: TTLs and SMs are now using lessons from the review of quality-at-entry to further improve clarity and measurability of PDOs and indicators. • Underuse of results-based disbursement: staff are now actively considering results- based disbursement for projects in the pipeline. vii The Bank-level problems are symptoms of the fact that the Bank does not have a system of results monitoring (this conclusion is explored further in the main text): • There is no mechanism to enable learning from project to project regarding articulation of PDOs and indicators (i.e., which PDOs and indicators can be accurately measured in a cost-effective manner, which create desirable incentives, etc.), despite considerable homogeneity among results frameworks. • There is no mechanism for monitoring results over the lifetime of project: – ~1/4 of PDO indicators lack a baseline – ~1/4 of PDO indicators in PADs are never listed in ISRs – 1/2 of PDO indicators in PADs lack even one follow-up measurement in ISRs – <10% of PDO indicators measured at least once per year • Existing quality control measures are designed to raise flags concerning process (i.e., results framework) rather than actual monitoring of PDO indicators. Recommendations These findings have led us to three recommendations for those working on the results agenda: 1. Think Big. 
Fixing the results-monitoring problem is not about tinkering with the ISR or enforcing existing regulations, but rather about reimagining the incentives and platforms that shape our operations. 2. Define Success. Articulating the objective of results monitoring reform—an objective such as, “The results monitoring system should provide the Bank and policymakers with the incentive and capacity to (1) track project development indicators and (2) enable learning over time about what works”—would lend clarity and focus to results monitoring reform efforts. 3. Talk to TTLs. Results reforms should incorporate the perspective of those working in operations. viii ix INTRODUCTION “I’M STILL IN THERAPY FROM FORM 590.” SECTOR MANAGER Something of a results-monitoring fever has swept the development community in recent years. Monitoring handbooks abound. Monitoring specialists multiply. Monitoring consultancies fetch extravagant fees among the growing ranks of corporate social responsibility units. The United States government, the United Nations, the UK’s Department for International Development, the Gates Foundation, and many other institutions allocate ever-greater resources to measuring the results of development projects.1 There is even, as of November 1999, an entire agency within the Organization for Economic Cooperation and Development devoted to improving development monitoring worldwide.2 The Bank, while perhaps not at the forefront of this trend, is certainly part of it. OPCS founded the Results Secretariat in 2003; out of this emerged, in 2006, the Results Steering Group.3 At least three major initiatives (the criterion for “major” being the possession of a nomenclative acronym) seek to establish formal systems for results monitoring.4 This flurry of activity constitutes a new response to an old concern: the Wapenhans Report, published in 1992, found that “Portfolio management now systematically monitors implementation, disbursements and loan service, but not development results … attention to actual development impact remains inadequate.”5 It is in this context that the Latin America and the Caribbean Region Human Development Department (LCSHD) conducted an internal review of project monitoring.6 The objective of the review was to describe and assess results monitoring in LCSHD—to paint a picture of how (and, by extension, how well) the department defines development 1 See, for example, the US government’s new Program Assessment Rating Tool, the UNDP’s new Handbook on Monitoring and Evaluating for Results (PDF), and/or various DFID publications. The Paris Declaration of 2005 is further evidence of the momentum around monitoring. 2 The Partnership in Statistics for Development in the 21st Century (PARIS21), dedicated to “promoting a culture of evidence-based policymaking and monitoring in all countries.” 3 Intranet users can read the minutes of the RSG meetings here. 4 These are: IDA RMS (Results Measurement System), Africa RMS (Results Monitoring System) and IFC DOTS (Development Outcome Tracking System). There are also a number of major impact evaluation initiatives (for example, DIME and the Spanish Impact Evaluation Fund). This is not a comprehensive list of the Bank’s pro-monitoring moves (which include the establishment of CMU OPCS advisors and a nascent HDN initiative). 
5 It is as difficult to pinpoint the start date of the Bank’s monitoring focus as it is to locate the catalyst of the broader interest–the former, if not the Wapenhans Report itself, may well be Susan Stout and Timothy Johnston’s 1999 study; the latter the establishment of the Millennium Development Goals in 2000. 6 The study design, conclusions, and recommendations were completed by the authors, Daniel Cotlear and Dorothy Kronick; the data analysis and writing were completed by Dorothy Kronick. Laura Rawlings and Alan Carroll served as advisors. 1 objectives, articulates indicators by which to measure progress toward those objectives, and measures indicators over time. In other words: how much do we know about what has changed in client countries as a result of the projects in our portfolio? The central conclusion of this study is that the Bank does not have a results- monitoring system. In other words, the Bank does not have a set of rules or procedures governing the measurement of development results, nor does it have a physical platform for reporting results, nor does it have an incentive structure designed to encourage results monitoring. This conclusion is discussed further in Section V. Section II summarizes existing literature on Bank results monitoring and explains the value added of this study, Section III describes our methodology, Section IV enumerates the main findings, and Section V presents conclusions and recommendations. 2 LITERATURE REVIEW & BACKGROUND Bank staff have been studying the institution’s results-monitoring activities at least since the 1970s, when the Operations Evaluation Department was established and when the value of monitoring and evaluation earned official sanction (new Operational Manual Statements, for example, mandated the use of “key performance indicators” (1974) and recommended that all projects include some form of monitoring and evaluation (1977)). In the early 1990s, decades of diffuse discussion on monitoring crystallized in the Wapenhans Report, which declared in no uncertain terms that the traditional method of evaluating portfolio performance—calculating economic rates of return—was entirely insufficient to the task of gauging development impact. “Portfolio management now systematically monitors implementation, disbursements and loan service, but not development results,” the authors wrote. 
"The radical premise of this paper is that the Bank should be as concerned about and accountable for the 'development worth' of its loan portfolio as it is now for its performance as a financial intermediary."7

7 The increased focus on results at the time of the Wapenhans Report could have been driven in part by the growing percentage of Bank lending allocated to the social sectors. As health, education, and social protection lending climbed from 5% of all lending (in the 60s, 70s, and early 80s) to 15% or 20% of all lending (in the 90s and early 00s), the need for alternative (alternative to ERR) methods of cost-benefit analysis (as it was then called) sharpened. This change in portfolio composition was in turn driven, of course, by the (then) relatively new focus on poverty reduction.

Two years later, in 1994, the Operations Evaluation Department (OED) published a 100-page report called "An Overview of Monitoring and Evaluation in the World Bank."8 This report echoed the Wapenhans view that Bank monitoring and evaluation was woefully inadequate: a cover memo to executive directors stated bluntly, "The record of Monitoring and Evaluation in the Bank has been disappointing." The authors found low levels of compliance with late-1980s Operational Directives requiring project teams to consider monitoring in appraisal designs; they also found that "the Bank's record on the implementation of M&E is worse than the unsatisfactory performance already established at appraisal."

8 The authors of the OED report were certainly passionate about their subject, even going so far as to personify it in statements such as, "But the fortunes of M&E were about to reverse again."

A 1999 review of development effectiveness in HNP operations (written by Susan Stout and Timothy Johnston) arrived at a similar conclusion, stating that "assessing the impact of health interventions can be challenging, but excessive Bank focus on inputs and the low priority given to M&E are also to blame."

Near the turn of the century two events—the establishment of the Millennium Development Goals in 2000 and the Monterrey Conference in 2002—launched results measurement to the forefront of the development agenda. At Monterrey, a joint statement of the heads of multilateral development banks publicly committed the Bank to invest more in managing for results. The aforementioned rush of results-monitoring activity ensued: OPCS established the Results Secretariat in 2003; that same year, IDA Deputies demanded that the Bank provide more information on country outcomes and on IDA's contribution to country outcomes. Pilot monitoring projects began in the IDA13 period (FY03-05) and grew into larger initiatives such as the Africa Results Monitoring System (AfricaRMS), a results reporting platform that aims to allow some aggregation of results across projects and requires staff to report on a set of standard indicators. A Results Steering Group took shape in 2006; among this group's self-assigned missions is building a Bank-wide "Results Monitoring and Reporting Platform."

In recent years, two studies have attempted rigorous diagnoses of the state of results monitoring in the Bank; these studies are the most closely related to Monitoring Monitoring. The first, a 2006 review of monitoring and evaluation in HNP operations in the South Asia Region, involved an in-depth examination of twelve projects.
The SAR report considered selection of indicators, initial quality of results framework, collection of baseline data, use and analysis of data, and several other dimensions of monitoring; they found (1) baseline data for only 39% of indicators and (2) 25% of projects in which the data collection plan “was actually implemented.” Asking a panel of evaluators to independently rate the quality of PDO indicators, the SAR study found that there is no consensus among specialists about what constitutes a good indicator. This lack of consensus has led some to conclude that there is a need for rigorous study of which indicators are best, with the goal of identifying indicators with special qualities. Others have concluded what is needed is a convention—any convention, almost—around which to align measurement effort. This logic suggests that consistency of measurement is a substantial part of quality of measurement: the major achievement of the MDGs, for example, was to unite the development community around common indicators, rather than to identify the best, most relevant indicators. Like earlier reviewers, then, the SAR report concluded that monitoring and evaluation at the Bank is in some dimension inadequate. The second of the two studies closely related to Monitoring Monitoring is IEG’s very recent (2009) review of monitoring and evaluation in the HNP portfolio. After conducting an in-depth study of dozens of HNP projects since 1997, IEG concluded that, despite some improvements, HNP monitoring and evaluation still suffers from “important weaknesses.” Chief among the report’s findings are: (1) that just 29% of HNP projects (compared with 36% of all projects) have “high” or “substantial” M&E ratings in ICRs and (2) that more than 25% of projects approved in FY07 did not establish baselines for any outcome indicators at the outset of the project. IEG also found that 71% of recently closed HNP projects reported difficulties in collecting data. “M&E is very important for both learning and accountability,” the report states, “but there are very serious gaps in its quality and implementation.” This report confirms many of the findings of these previous studies: we provide new evidence for the assertion that the Bank makes no serious attempt to measure the development results of operations. In addition to validating previous findings, this report contributes to our understanding of Bank results monitoring by asking and answering new questions. First, while previous studies view results monitoring largely through the lens of initial and final documents, Monitoring Monitoring systematically aggregates data from all of the ISRs of all of the projects in the sample, thereby providing the first quantitative evidence on indicator 4 tracking during implementation (the first of which we are aware). In doing so, Monitoring Monitoring also sheds new light on the function of the ISR as a vehicle of real-time intra-Bank communication about results. This—the systematic study of indicator tracking in ISRs—is the principal value added of this study; it is highly relevant to the Bank’s ongoing efforts to build a results monitoring system. In addition, Monitoring Monitoring systematically compares results frameworks across projects, developing quantitative measures of the similarity (or dissimilarity) among PDOs and indicators in different projects in the portfolio. 
Finally, Monitoring Monitoring makes use of a broader sample than previous work; the SAR and IEG studies look only at HNP projects, while this paper considers HNP, education, and social protection.

METHODOLOGY & DESCRIPTIVE STATISTICS

As stated above, the objective of this review was to describe and assess results monitoring in LCSHD. Quantitative and qualitative data informed this effort. The quantitative data were drawn from the Project Appraisal Documents (PADs) and status reports (PSRs and ISRs; henceforth, when we refer to "ISRs," we mean all status reports, both PSRs and ISRs) of 67 active projects. Together, these 67 projects comprised the department's entire active portfolio at the outset of the study.

From each PAD we extracted: (a) basic project identification information (project number, loan amount, approval date, etc.); (b) the PDO as stated in the body text; (c) information regarding the extent of planning for monitoring and evaluation; (d) the results framework as set out in the annex; and (e) any baseline and target values accompanying the results framework. These data were recorded in a spreadsheet, into which we also entered the following data from each ISR: (a) the date of the ISR; (b) for each indicator in the PAD results framework: (i) whether the indicator appeared in the ISR; (ii) for those that appeared, whether the ISR included a new measurement of the indicator; and (iii) for those that appeared and had a new measurement, the value of the measurement; (c) the M&E performance rating; and (d) information regarding discussion of monitoring issues in comments sections.

The 67 resultant spreadsheets (one for each project in the sample) served two analytic ends: first, they permitted the aggregation of data on the 67 projects into a single file, which in turn allowed us to generate tabulations of variables of interest; second, they facilitated comparison of the texts of the 67 results frameworks.
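To make the aggregation and tabulation step concrete, the sketch below shows one way per-project extracts of this kind could be combined and summarized. It is illustrative only: the file layout and column names (project_id, indicator, baseline_in_pad, appears_in_isr, new_measurement) are assumptions made for the example, not the actual spreadsheets or tools used in this review.

# Illustrative sketch (not the review's actual tool): aggregate hypothetical
# per-project indicator extracts into a single file and tabulate variables of interest.
from pathlib import Path

import pandas as pd

# Assumed layout: one CSV per project, one row per (PDO indicator, ISR) pair, with
# columns project_id, indicator, baseline_in_pad, isr_date, appears_in_isr, new_measurement.
frames = [pd.read_csv(path) for path in Path("project_extracts").glob("*.csv")]
portfolio = pd.concat(frames, ignore_index=True)

# Collapse to one row per indicator, recording whether it ever appears in an ISR
# and whether any ISR reports a new measurement for it.
per_indicator = (
    portfolio.groupby(["project_id", "indicator"])
    .agg(
        baseline_in_pad=("baseline_in_pad", "max"),
        ever_in_isr=("appears_in_isr", "max"),
        ever_measured=("new_measurement", "max"),
    )
    .reset_index()
)

# Example tabulations of the kind described above (shares of PDO indicators).
print("With a baseline in the PAD:        ", per_indicator["baseline_in_pad"].mean())
print("Appearing in at least one ISR:     ", per_indicator["ever_in_isr"].mean())
print("With at least one follow-up value: ", per_indicator["ever_measured"].mean())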
The qualitative data for this study were drawn from interviews with department staff members; we held individual meetings with task managers and task team leaders, as well as group discussions with sector managers, department management, and the department as a whole.

As we reviewed only active-portfolio projects, we did not examine Implementation Completion Reports (ICRs). The limitations of this approach are discussed in detail below.

The 67 projects in our sample represent $4.7 billion in commitments and include a wide variety of project types and sizes. There are projects from all three HD sectors (education, social protection, and health), from all six country management units in LCR, and from every year between 2000 and 2008 (by date of PAD). The largest project is a $570 million loan for Brazil's Bolsa Familia conditional cash transfer program; the smallest is a $2 million Technical Assistance Loan to Colombia. The sample contains 519 PDO indicators, 1,174 intermediate indicators, and 594 ISRs. See Table 1 for more comprehensive descriptive statistics.

Table 1.1. The Portfolio Under Review

Unit                       N       Average per project   Range
Projects                   67      .                     .
PDO indicators             519     8                     1 – 36
Intermediate indicators    1,168   17                    0 – 41
ISRs                       594     9                     1 – 19
Total funds (Million $)    4,750   70.9                  2 – 572.2

Table 1.2. Distribution by Sector

Sector              Projects   Funds (Million $)   $ per Project   Range
Health              32         2,025               63.2            3.5 – 240
Education           25         1,669               66.7            3 – 350
Social Protection   10         1,058               105.8           2 – 572.2

Four phases of the results monitoring process formed the foci of our research: (1) the definition and articulation of Project Development Objectives (PDOs) and indicators; (2) the establishment of baseline and target values for indicators; (3) planning for monitoring; and (4) monitoring during project implementation, or how project teams track indicators between approval and closing. We describe the methodology for each in turn.

The definition and articulation of Project Development Objectives (PDOs) and indicators (results frameworks).

• Our central question about results frameworks was: to what extent are PDOs similar across projects and to what extent are the indicators used to measure a given PDO similar across projects? (In other words, how much variation is there among PDOs, and how much variation is there among indicators?) Secondary questions were: (a) to what extent are PDOs in the body text of PADs consistent with PDOs in the results annexes of PADs? and (b) to what extent do results frameworks conform to OPCS standards for measurability, clarity, and other desirable characteristics?

• To address the first question (regarding thematic similarity or dissimilarity among PDOs and indicators), we developed a typology of results frameworks. PDOs were classified according to (a) target group (such as level of schooling or category of health problem) and (b) type of intervention (such as coverage expansion or quality improvement). PDO indicators were classified according to (a) the objective they purport to measure and (b) method of measurement (for an objective related to coverage, for example, one method of measurement would be enrollment in a program; a second would be access to a given facility or service).

• To address the secondary question (regarding the quality of results frameworks in light of OPCS standards, or "quality at entry"), we answered three questions about each PDO: (a) is the PDO stated in terms of measurable results? (b) Is the PDO stated in terms of results that the project can directly influence, as opposed to higher-level objectives beyond the project? And, (c) is the PDO clear? Of each PDO indicator we asked, (a) is the PDO indicator framed in terms of measurable results? And, (b) does the PDO indicator measure one of the PDOs?

THE ESTABLISHMENT OF BASELINE AND TARGET VALUES FOR INDICATORS.

• To measure the extent to which projects establish baseline and target values for indicators, we generated tabulations of (a) whether each indicator has corresponding baseline and target values and (b) in what document existing baseline and target values were established (PAD, first ISR, second ISR, etc.).

• Not all indicators require measurement effort to establish a baseline: some are categorical or start at zero, such as "1000 schools reopened," or "Proposal for rationalization and reorganization of social spending developed." Of the 519 PDO indicators in the sample, 154 were of this type; the remaining 365 required some measurement effort in order to establish a baseline. In tabulating baseline values, we considered only those which require some measurement effort in order to establish a baseline.

MONITORING DURING PROJECT IMPLEMENTATION.
• We define results monitoring during implementation as the collection and recording of information about the key and intermediate indicators set out in the PAD (or, in the case of restructured projects, in the restructuring document).

• To assess results monitoring during implementation, we measured the extent to which ISRs include follow-up measurements of the indicators defined in PADs. Specifically, we asked questions such as: how many indicators have at least one follow-up measurement in an ISR? How many indicators are measured at least once per year? How many projects report at least one follow-up measurement for all of the PDO indicators in the PAD? How many projects report no follow-up measurements for any of the PDO indicators in the PAD? Etc. (An illustrative tabulation of this kind appears in the sketch at the end of this section.)

• Explanation of the absence of data on a given indicator ("Survey not complete," for example) was not considered a follow-up measurement. However, description of the progress of a qualitative indicator ("Designed concluded under review; 20 sites identified and ready for implementation in the second semester of 2006,") was, in most cases, considered an interim measurement. If an indicator was reworded but substantively unchanged, we considered it as the original. If a project was formally restructured and new indicators appeared in one or more ISRs, we considered only the new indicators (in other words, the original indicators of restructured projects were dropped from the sample).

• Some discussants of this study objected to this ISR-centric approach, arguing that there is much results-related information that never appears in ISRs. A study of results monitoring through an examination of ISRs, some suggested, may conflate an information issue with a reporting issue (as one commenter said, "the study mixes up what teams do with what teams report"). Despite these limitations, the ISR remains the only formal mechanism for communicating about results, and therefore an appropriate focal point for this study.

This is a study of project monitoring, not of impact evaluation. "Whoever put 'monitoring' and 'evaluation' together into 'M&E' did a great disservice to monitoring," said an early discussant of this report. We agree: monitoring and evaluation are distinct activities, deserving of separate reviews.
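As an illustration of how the follow-up questions above could be tabulated from such records, the sketch below computes, for each PDO indicator, whether it has at least one follow-up measurement and whether it is measured at least once per year of its ISR history. The column names and the once-per-year rule are assumptions made for this example; this is not the review's actual code.

# Illustrative sketch (assumed column names): per-indicator follow-up metrics from a
# table with one row per (project_id, indicator, isr_date) and a boolean new_measurement flag.
import pandas as pd

def follow_up_metrics(records: pd.DataFrame) -> pd.DataFrame:
    records = records.copy()
    records["isr_date"] = pd.to_datetime(records["isr_date"])

    def summarize(group: pd.DataFrame) -> pd.Series:
        # Years of ISR history for this indicator (count at least one year).
        span_days = (group["isr_date"].max() - group["isr_date"].min()).days
        years = max(span_days / 365.25, 1.0)
        n_measurements = int(group["new_measurement"].sum())
        return pd.Series({
            "has_follow_up": n_measurements >= 1,
            "measured_yearly": n_measurements / years >= 1.0,
        })

    return (
        records.groupby(["project_id", "indicator"])
        .apply(summarize)
        .reset_index()
    )

# Example: shares across all PDO indicators in the sample.
# metrics = follow_up_metrics(portfolio)
# print(metrics[["has_follow_up", "measured_yearly"]].mean())

Under these assumptions, an indicator that never receives a new measurement counts as lacking follow-up even if it is listed in every ISR, which mirrors the distinction drawn above between listing an indicator and actually measuring it.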
FINDINGS

Our findings fall into three categories: (1) Good news: where we have made progress on results monitoring; (2) Department-level problems: issues we have now resolved through our own action plan; and (3) Bank-level problems: issues that cannot be resolved at the level of the department.

THE GOOD NEWS IS THAT:

Quality at entry is improving. Almost all projects now have measurable, clear PDOs and measurable, clear, relevant indicators. OPCS efforts to provide additional guidance on results framework definition appear to have had some effect, according to interviews with department staff. Furthermore, the number of PDO indicators per project has declined (see Figure 1). In keeping with the findings of the SAR study—which found no agreement among reviewers as to the quality of various sets of PDOs and indicators—we found that evaluating the quality of results frameworks was a rather subjective exercise; different observers may have different opinions about whether a PDO is "clear" or an indicator is "measurable."

The likelihood that a given indicator has a baseline value in the PAD has increased substantially (see Figure 2). Roughly 3/4 of PDO indicators in new projects have a baseline in the PAD, compared with 1/2 of PDO indicators in pre-2005 projects. This increase may be attributable to the introduction, in 2005, of a second results annex table titled "Arrangements for Results Monitoring," which includes columns for baseline and target values.

We have had some success with results-based disbursement, which appears to be correlated with higher results-monitoring intensity. In the five projects in our portfolio utilizing results-based disbursement, 97% of PDO indicators had baselines and 95% appeared in ISRs, compared with 70% and 74%, respectively, in non-results-based-disbursement projects (see Figure 3). 55% of PDO indicators in results-based-disbursement projects had at least one follow-up measurement, compared with 44% in other projects.

Our department monitors results well relative to other units, according to a comparison of the results of this study with those of similar studies in South Asia and the HNP sector. In the SAR study of M&E in HNP operations, for example, just 39% of PDO indicators had baseline data, while more than 70% of PDO indicators in our sample had baseline data.

THE DEPARTMENT-LEVEL PROBLEMS—AND OUR SOLUTIONS—WERE AS FOLLOWS:

PADs do not consistently articulate projects' PDOs. One-third of PADs included two substantively distinct versions of the PDO (i.e., the PDO in the body text of the PAD was substantively, not just semantically, different from that in the results annex). TTLs and SMs are now correcting this problem during project preparation.

• For example, one results annex replaced the phrase "improving the quality of preschool and primary education by enhancing the teacher training system and introducing new teaching and learning instruments in the classroom" (from the PDO in the body text) with the phrase "Improve the quality of preschool and primary education (ages 4 to 11) with a focus on socioeconomically disadvantaged and very disadvantaged contexts." (See Figure A in the Annex for more examples of discrepancies between body-text and results-annex PDOs.)

• Other projects featured body-text PDOs designed to expand coverage of a given service, only to switch to disease incidence or educational achievement in the results annex. For example, the PDO in the body text of one PAD contained the phrase, "scaling up prevention programs targeting high-risk groups as well as the general population," while the corresponding part of the results annex read, "Reduce the mortality and morbidity attributed to HIV/AIDS."

• In a few cases, the lists of indicators in the "results framework" table did not completely correspond with the list of indicators in the "arrangements for results monitoring" table, even though the tables appear on adjacent pages.

• While we do not purport to establish a definitive explanation for such mismatch, we note that, in interviews, staff members cited division of tasks (different people responsible for different parts of the PAD) and diverse pressures (different demands from different managers and partners) as potential causes. Several commentators suggested that, were we to review legal agreements, we might find yet more iterations of the PDOs.

Despite the aforementioned improvement in logical frameworks (quality-at-entry), a few PDOs and indicators remained unclear, un-measurable, and/or incommensurate with the size of the project. TTLs and SMs are now using the lessons from our review of quality-at-entry to further improve results framework quality.
• For example, “improve attention to quality and the relevance of learning” was considered an un-measurable objective in that “attention” is an effectively unobservable institutional characteristic. A number of PDOs contained phrases such as “reduce poverty,” “create sustainable economic growth,” or “improve competitiveness,” none of which are results an individual project can directly influence. PDOs considered “unclear” were those so indefinite as to have little meaning ( “To improve the quality and equity of the Borrower's Tertiary Education system through sub sector's response to society's needs for high quality 12 human capital that will enhance competitiveness in the global market,” for example.) • Similarly, there are a number of PDO indicators which are not stated in terms of measurable results. For example, “Presence/absence of clear lines of authority, written policies, strategic planning, budgetary and financial structures and processes,” was considered so nonspecific as to be un-measurable. PDO indicators that refer to the perpetuation of an institution or policy over an indefinite period of time, such as “Sustaining a core permanent leadership team with national accountability,” were judged un-measurable in that they imply infinity. Other PDO indicators were phrased in such a way that they would be impossible to observe, such as, “Percent of sick children correctly assessed and treated in health facility.” • Some of the PDO clarity problems arose as a result of the effort to include both outputs and outcomes. As one regional manager put it, “The discussion of outputs vs. outcomes has become ideological and often indecipherable to anyone but M&E experts. Talking to them is like going to an Ignatian college.” • The PAD guidelines are of little help on this issue, stating only, “Ideally, each project should have one project development objective focused on the primary target group. The PDO should focus on the outcome for which the project reasonably can be held accountable, given the project’s duration, resources, and approach. The PDO should not encompass higher level objectives that depend on other efforts outside the scope of the project … At the same time the PDO should not merely restate the project’s components or outputs.” At the outset of this study, few projects in our portfolio were making use of results-based disbursement. Now, department staff are actively considering results-based disbursement for projects in the pipeline. Many of our findings point to a problem that cannot be resolved within LCSHD: the fact that the Bank does not have a system of results monitoring. This conclusion is discussed further in Section V. Among the findings that lead to this conclusion are the following: There is no mechanism to enable learning from project to project regarding articulation of PDOs and indicators, despite considerable homogeneity among results frameworks. Each team essentially starts from scratch in constructing the results framework. • Two bodies of evidence support this finding: quantitative evidence drawn from a comparison of the results frameworks of the 67 projects in our sample, and qualitative evidence drawn from interviews with department staff. The results- framework-comparison exercise attests to the similarity of PDOs and indicators (and thus the potential for learning over time); the interviews assert the lack of such learning. 13 • The 67 projects in our sample comprise a far smaller number of PDOs (in other words, many projects have similar PDOs). 
Of 25 health projects, for example, ten address HIV/AIDS and nine address maternal-child health; the remaining six focus on other health challenges. Types of interventions were even more homogenous: almost all of the health projects sought to (1) expand coverage of a health program and/or expand access to care, (2) improve the quality of health services, and (3) improve government capacity to administer and/or deliver health care. The 32 education projects in the sample encompass a similar number of target groups and intervention types. In short, the vast majority of projects set common objectives. Very few PDOs are unique. (See Figure 4 for a complete typology.) Figure 4. Many Projects Address Similar Issues 14 • Correspondingly, the indicators in our sample comprise a far smaller number of ways to measure outcomes (in other words, many PDO indicators are alike). For example: indicators used to measure health project PDOs related to coverage fall into one of three categories: (1) enrollment in a program; (2) access to a given facility or service; or (3) availability of a given facility or service (output). Similarly, indicators used to measure health project PDOs related to institutional capacity are generally of one of four types: (1) development and/or implementation of new regulations; (2) budgeting practice and use of resources; (3) ministerial capacity building; (4) monitoring & evaluation. Indicators used to measure objectives in education and social protection are similarly homogenous. In short, the vast majority of indicators have homologues in other projects. Very few indicators are unique. • Despite this substantive similarity, however, staff members report in interviews that results frameworks are defined anew at the outset of each project. A series of pre-appraisal workshops involving extensive discussions and reformulations follows initial consultation with clients; systematic consideration of the results frameworks of previous projects is not generally part of this process. As one team leader said, somewhat indignantly, “I never look at other people’s PDOs!” While review meetings may be attended by people with related experience, involving staff members with experience on similar operations “is not really built in.” Another team leader commented that a recent IEG review of ten years of Honduras lending was helpful because “we don’t usually have that kind of perspective.” • “Each project is a different animal,” said one staff member by way of explanation. (In fact, as we have seen, most projects address familiar issues.) • “Intellectually we love the strategy discussion, the weight of the conversation about the heart of a project,” said one regional manager. “So we end up reinventing the wheel.” Staff receive conflicting messages on baselines; consequently, baselines are not consistently established: about 1/4 of PDO indicators do not have a corresponding baseline either in the PAD or in any of the ISRs (See Figure 5). • Missing baselines are not distributed evenly across all projects; only half of projects have PDO indicators with missing baselines (see Table 2). • Of those indicators that have a corresponding baseline, approximately 73% had a baseline in the PAD; the remainder gained a baseline in one of the ISRs. 15 • As discussed above, these tabulations exclude categorical or starting-from-zero indicators. Some such indicators did specify a baseline. 
For example, one project recorded the baseline, “Undifferentiated lines of authority, responsibility, information and execution in HRM across the central level,” for the indicator, “MEC’s role as rector in its normative, regulatory, and evaluation functions for HRM is clarified and implemented.” Another project included the PDO indicator, “Benchmarks for second HD PSRL met and documented using improved monitoring and evaluation systems and data;” the PAD specified the baseline as “Progress as of start of first HD PSRL.” Few projects specify “qualitative baselines” of this type, and there is no policy or guideline as to whether qualitative baselines are required or desirable. There are no rules regarding transfer of the results framework from PAD to ISR; the results framework is therefore often discarded in transfer: about 1/4 of PDO indicators set out in PADs never appear in an ISR (see Figure 6 below). • In other words, 1/4 of PDO indicators effectively vanish from the record during the entire period between the PAD and the Implementation Completion Report (ICR). Moreover, the “missing” indicators are not concentrated in a few delinquent projects: only 60% of projects have ISRs that contain all of the PDO indicators. As one senior manager described it, “The results framework we construct so carefully during project preparation is essentially torn down immediately after approval.” • According to interviews with staff, there are no common criteria for selecting which indicators appear in ISRs. “I just include the most important ones,” said one task manager. “Because some of them, you know, are ones the government insisted on.” Another task manager said, “Well, you can’t put all of them in! I mean, I just put the minimum required by the ISR.” One staff member reported that training sessions advise teams to select a reduced set of indicators for the 16 ISR; the ISR instructions counsel the same. Furthermore, the indicator set is not static across ISRs: it often changes with the arrival of a new TTL, and in every project it changed with the switch from PSRs to ISRs. There is no mechanism for monitoring results over the lifetime of project: 1/2 of all PDO indicators set out in PADs do not have even one follow-up measurement in ISRs, and fewer than 10% of PDO indicators are measured once per year or more (see Figure 6 below). • This is in contrast to the periodicity anticipated in the PADs, which is often annual; this periodicity is itself in conflict with the PAD guidelines, which state that “PDO indicators normally cannot be observed or measured before the end of the project.”9 Moreover, as with “missing” baseline values, “missing” interim measurements are distributed across almost all projects: just 15% of the 67 projects have at least one interim measurement for all of their PDO indicators. • Intermediate indicators are measured even less often: 75% lack even one interim value. • Identifying the sources of variation in monitoring intensity across projects and across indicators is largely beyond the scope of this study. It is not clear whether the absence of interim measurements for so many indicators stems from Bank procedural issues or from country system issues—in other words, it is not clear whether Bank teams are neglecting to absorb and record available information or whether countries are neglecting to generate information. Rather, it is clear that both problems are present; which is more significant, and to what degree, we have not determined. 
• “This is a development challenge, not a bureaucracy challenge,” said one discussant of this report. “We can’t solve this problem with checklists and flags.” This argument has some intuitive appeal. On the other hand, the variation in monitoring intensity across projects—imperfectly correlated, as it is, with client country—suggests that Bank effort is a determining factor. 9 A survey of a small sample of PADs indicated that the majority of PDO indicators are intended to be measured annually or semi-annually. 17 18 CONCLUSION AND RECOMMENDATIONS The conclusion of this report is that the Bank does not have a system for monitoring the development results of its investment projects. In other words: the Bank does not have a system capable of measuring the development outcomes of the tens of billions of dollars in grants and loans provided to client governments each year. As nearly two decades of analytic work such as this paper have shown, the Bank does not know how many more children are in school, how many more people have access to health care, or how many more families can afford food as a result of our operations. There is no set of rules or procedures governing the measurement of these outcomes (there is no formal policy, for example, regarding the establishment of baseline and target values for indicators— indeed, the PAD Guidelines state only that indicators “should be presented with baselines” and include a sample results annex in which most indicators lack baselines (see Figure B in the Annex)), nor is there a physical platform for reporting on them, nor is there an incentive structure designed to encourage results measurement. The Annual Report contains no information on development outcomes. A simple comparison further illustrates the central point (the central point being, to reiterate, that the Bank does not have a system for measuring results). Consider for a moment the procurement system: governed by a strict set of principles and regulations set out in numerous lengthy documents, the procurement system has its own filing platform, its own specialists, and its own training courses. The procurement system exacts strict penalties for noncompliance with regulations. The procurement system uses the ISR only to flag major issues, reserving substantive discussion of progress and problems for a separate set of reports. The results-measurement arrangement, in contrast, has no governing principles, no reporting platform, few penalties for noncompliance—in other words, no system. As described above, the absence of such a system means that development results are largely not measured. This conclusion entails three principal implications for the stewards of the Bank’s “results agenda:” FIRST: THINK BIG. Fixing the results-monitoring problem is not about tinkering with the ISR or enforcing existing regulations, but rather about reimagining the incentives and platforms that shape our operations. 19 SECOND: DEFINE SUCCESS. Articulating the goal or objective of results monitoring is important because the nature of this objective should determine the design of a results monitoring platform.10 An objective such as, “The results monitoring system should provide the Bank and policymakers with the incentive and capacity to (1) track project development indicators and (2) enable learning over time about what works” would lend clarity and focus to results monitoring reform efforts. THIRD: TALK TO TTLS. 
Only those whose work program is centered in operations have a concrete sense of how a new system would (or would not) facilitate Bank workflow. Results reforms should incorporate the perspective of those closest to our clients.

10 Other international organizations have struggled to define and measure the objective of results monitoring: in a 2008 review of results-based management at the United Nations, for example, the institution concluded that "the purpose of the results-based management enterprise has not been clearly articulated … there is no clear common understanding of the objectives of results-based management."

REFERENCES

"Accelerating the Results Agenda: Progress and Next Steps." Operations Policy and Country Services, The World Bank (2006).

"Getting Results: The World Bank's Agenda for Improving Development Effectiveness." The World Bank (1993).

Johnston, Timothy and Susan Stout. "Investing in Health: Development Effectiveness in the Health, Nutrition, and Population Sector." Operations Evaluation Department, The World Bank (1999).

Loevinsohn, Ben. "Measuring Results: A Review of Monitoring and Evaluation in HNP Operations in South Asia and some Practical Suggestions for Implementation." South Asia Human Development Sector, The World Bank (2006).

Rice, Edward. "An Overview of Monitoring and Evaluation in the World Bank." Operations Evaluation Department, The World Bank (1994).

United Nations General Assembly. "Review of results-based management at the United Nations." Office of Internal Oversight Services, United Nations (2008).

Villar Uribe, Manuela. "Monitoring and Evaluation in the HNP portfolio." Independent Evaluation Group, The World Bank (2009).

Wapenhans, Willi and Portfolio Management Task Force. "Effective Implementation: Key to Development Impact." The World Bank (1992).

ANNEX

Figure A. Comparison between body-text and results-annex PDOs: five examples

Example 1
Body text: Reduce the mortality and morbidity attributed to HIV/AIDS; Reduce the impact of HIV/AIDS on individuals, families, and the community; Consolidate sustainable organizational and institutional framework for managing HIV/AIDS.
Results annex: Reduce the incidence of HIV infections; Mitigate the negative impact of HIV/AIDS on persons infected and affected.

Example 2
Body text: To: a) increase enrollment for Preschool, Primary and Secondary education; b) improve attention to quality and relevance of learning; c) improve systems of governance and accountability, including measures to strengthen community participation in the education sector; and d) harmonize donor assistance in the sector.
Results annex: The project's development objective is to benefit the primary target group (children of school going age) with more quantity and improved quality of education. These objectives are achieved by first attaining a regulated and coordinated donor financing governance and accountability.

Example 3
Body text: To improve Honduras' social safety net for children and youth. This would be achieved by (i) improving nutritional and basic health status of young children by expanding successful AIN-C program, and (ii) increasing employability of disadvantaged youth by piloting a First Employment program.
Results annex: Improved capacity to supervise and monitor social protection interventions for CY; Improved interinstitutional coordination; Improved social protection policy for children and youth.
Example 4
Body text: To (a) improve coverage and equity at the primary school level through the expansion and consolidation of PRONADE schools and by providing scholarships primarily for indigenous girls in rural communities; (b) improve efficiency and quality of primary education by supporting bilingual education, providing textbooks and didactic materials in 18 linguistic areas; expanding multigrade schools; and improving teacher qualifications; (c) facilitate MINEDUC and the Ministry of Culture and Sports to jointly design and execute a program to enhance the goals of cultural diversity and pluralism contained in the National Constitution, the Guatemalan Peace Accords, and the April 2000 National Congress on Cultural Policies; (d) assist decentralization and modernization of MINEDUC by supporting efforts to strengthen the organization and management of the education system.
Results annex: To ensure universal access to primary education for all Guatemalan children, to improve the quality and efficiency of basic education, to enhance cultural diversity and pluralism, and to decentralize and strengthen the capacity of the education system.

Example 5
Body text: To increase coverage and quality of health services and related programs that would improve the health of the population, and to empower communities to improve their health status; and to strengthen local capacity to respond to health needs.
Results annex: None.

Figure B. PAD Template and Guidelines Results Annex Has No Baselines

About this series...

Enquiries about the series and submissions should be made directly to the Editor, Homira Nassery (hnassery@worldbank.org), or to the HNP Advisory Service (healthpop@worldbank.org, tel 202 473-2256, fax 202 522-3234). For more information, see also www.worldbank.org/hnppublications.

THE WORLD BANK
1818 H Street, NW
Washington, DC USA 20433
Telephone: 202 473 1000
Facsimile: 202 477 6391
Internet: www.worldbank.org
E-mail: feedback@worldbank.org