Impact Evaluations and Development
NONIE Guidance on Impact Evaluation

» Activities » Outputs » Outcomes » Impacts

What Is NONIE?

NONIE is a Network of Networks for Impact Evaluation comprising the Organisation for Economic Co-operation and Development's Development Assistance Committee (OECD/DAC) Evaluation Network, the United Nations Evaluation Group (UNEG), the Evaluation Cooperation Group (ECG), and the International Organization for Cooperation in Evaluation (IOCE)--a network drawn from the regional evaluation associations.

NONIE was formed to promote quality impact evaluation. It fosters a program of impact evaluation activities based on a common understanding of the meaning of impact evaluation and of approaches to conducting impact evaluation. NONIE focuses on impact evaluation and does not attempt to address wider monitoring and evaluation issues. To this end, NONIE aims to--

· Build an international collaborative research effort for high-quality and useful impact evaluations as a means of improving development effectiveness.
· Provide its members with opportunities for learning, collaboration, guidance, and support, leading to commissioning and carrying out impact evaluations.
· Develop a platform of resources to support impact evaluation by member organizations.

www.worldbank.org/ieg/nonie

Impact Evaluations and Development
NONIE Guidance on Impact Evaluation

Frans Leeuw, Maastricht University
Jos Vaessen, Maastricht University and University of Antwerp

©2009 NONIE--The Network of Networks on Impact Evaluation, Frans Leeuw, and Jos Vaessen
c/o Independent Evaluation Group
1818 H Street, NW
Washington, DC 20433
Internet: www.worldbank.org/ieg/nonie/

All rights reserved

This volume is a product of the volume's authors, Frans Leeuw and Jos Vaessen, who were commissioned by NONIE.
The findings, interpretations, and conclusions expressed in this volume are those of the authors and do not necessarily reflect the views of NONIE, its members, or other participating agencies. NONIE does not guarantee the accuracy of the data included in this work and accepts no responsibility for any consequence of their use.

Rights and Permissions

The material in this publication is copyrighted. Copying and/or transmitting portions or all of this work without permission may be a violation of applicable law. NONIE encourages dissemination of its work and will normally grant permission to reproduce portions of the work promptly. All queries on rights and licenses, including subsidiary rights, should be addressed to NONIE, c/o IEG, 1818 H St., NW, Washington, DC 20433; ieg@worldbank.org.

Cover: Pakistani girl reading. Photo by Curt Carnemark, courtesy of World Bank Photo Library.

ISBN-10: 1-60244-120-0
ISBN-13: 978-1-60244-120-0

Printed on recycled paper

Contents

Acknowledgments
Executive Summary
Introduction

Part I--Methodological and Conceptual Issues in Impact Evaluation

1 Identify the (type and scope of the) intervention
  1.1. The impact evaluation landscape and the scope of impact evaluation
  1.2. Impact of what?
  1.3. Impact on what?
  Key message
2 Agree on what is valued
  2.1. Stakeholder values in impact evaluation
  2.2. Intended versus unintended effects
  2.3. Short-term versus long-term effects
  2.4. The sustainability of effects
  Key message
3 Carefully articulate the theories linking interventions to outcomes
  3.1. Seeing interventions as theories: The black box and the contribution problem
  3.2. Articulating intervention theories on impact
  3.3. Testing intervention theories on impact
  Key message
4 Address the attribution problem
  4.1. The attribution problem
  4.2. Quantitative methods addressing the attribution problem
  4.3. Applicability of quantitative methods for addressing the attribution problem
  4.4. Other approaches
  Key message
5 Use a mixed-methods approach: The logic of the comparative advantages of methods
  5.1. Different methodologies have comparative advantages in addressing particular concerns and needs
  5.2. Advantages of combining different methods and sources of evidence
  5.3. Average effect versus distribution of costs and benefits
  Key message
6 Build on existing knowledge relevant to the impact of interventions
  Key message

Part II--Managing Impact Evaluations

7 Determine if an impact evaluation is feasible and worth the cost
  Key message
8 Start collecting data early
  8.1. Timing of data collection
  8.2. Data availability
  8.3. Quality of the data
  8.4. Dealing with data constraints
  Key message
9 Front-end planning is important
  9.1. Planning tools
  9.2. Staffing and resources
  9.3. The balance between independence and collaboration between evaluators and stakeholders
  9.4. Ethical issues
  9.5. Norms and standards
  9.6. Ownership and capacity building
  Key message

Appendices
  1. Examples of diversity in impact evaluation
  2. The General Elimination Methodology as a basis for causal analysis
  3. Overview of quantitative techniques of impact evaluation
  4. Technical aspects of quantitative impact evaluation techniques
  5. Evaluations using quantitative impact evaluation approaches
  6. Decision tree for selecting quantitative evaluation designs to deal with selection bias
  7. Hierarchical modeling and other statistical approaches
  8. Multi-site evaluation approaches
  9. Methodological frameworks for assessing the effects of interventions, mainly based on quantitative methods
  10. Where to find reviews and synthesis studies on mechanisms underlying processes of change
  11. Evaluations based on qualitative and quantitative descriptive methods
  12. Further information on review and synthesis approaches in impact evaluation
  13. Basic education in Ghana
  14. Hierarchy of quasi-experimental designs
  15. International experts who contributed to the subgroup documents

Endnotes
References

Boxes
  1.1. "Unpacking" the aid chain
  3.1. Social funds and government capacity: Competing theories
  3.2. Social and behavioral mechanisms as heuristics for understanding processes of change and impact
  4.1. Using propensity scores to select a matched comparison group--The Vietnam Rural Roads Project
  4.2. Participatory impact monitoring in the context of the poverty reduction strategy process
  5.1. Brief illustration of the logic of comparative advantages
  6.1. Narrative review and synthesis study: Targeting and impact of community-based development initiatives
  A7.1. Impact of the Indonesian financial crisis on the poor: Partial equilibrium modeling and CGE modeling with microsimulation

Figures
  ES.1. Levels of intervention, programs, and policies and types of impact
  ES.2. Simple graphic of net impact of an intervention
  1.1. Levels of intervention, programs, and policies and types of impact
  3.1. Basic intervention theory of a fictitious small business support project
  4.1. Graphic display of the net impact of an intervention
  4.2. Regression discontinuity analysis
  A4.1. Estimation of the effect of class size with and without the inclusion of a variable correlated with class size
  A11.1. Final impact assessment triangulation
  A11.2. Generic representation of a project's theory of change
  A11.3. Components of impact evaluation framework
  A11.4. Project outputs and outcomes
  A11.5. Framework to establish contribution
  A11.6. Model linking outcome to impact

Tables
  1.1. Aspects of complication in interventions
  1.2. Aspects of complexity in interventions
  4.1. Double difference and other designs
  8.1. Evaluation scenarios with time, data, and budget constraints
  A11.1. Project outcome
  A11.2. Change in key ecological attributes over time
  A11.3. Current threats to the global environment benefits

Acknowledgments

This Guidance document could not have existed without the numerous contributions of Network of Networks on Impact Evaluation (NONIE) members and others in terms of papers, PowerPoint® presentations, and suggestions.

In particular, this Guidance document builds on two existing draft guidance documents: a document on experimental and quasi-experimental approaches to impact evaluation (NONIE subgroup 1, May 17, 2007) and a document on qualitative approaches to impact evaluation (NONIE subgroup 2, January 9, 2008). A third draft document prepared by NONIE members on the impact evaluation of macroeconomic policies and new aid modalities such as budget support is outside the scope of this Guidance document. The subgroup 1 document was prepared mainly by Howard White and Antonie De Kemp. The subgroup 2 document, which was somewhat broader in content than methodology, was coordinated by Sukai Prom-Jackson. The primary authors were Patricia Rogers, Zenda Ofir, Sukai Prom-Jackson, and Christine Obester. Case studies were prepared by Jocelyn Delarue, Fabrizio Felloni, Divya Nair, Christine Obester, Lee Risby, Patricia Rogers, David Todd, and Rob van den Berg. The development of this document benefited extensively from a reference group of international evaluators.

Whereas the two subgroup documents provided the basis for the current Guidance document, the purpose of the current document was to develop a new structure that could accommodate some of the diversity in perspectives on impact evaluation. In addition, within this new structure, new content was added where necessary to support key points. The process of developing this Guidance was supervised by a steering committee of NONIE members. An external peer reviewer critically assessed the first draft of this document.

The Guidance document represents the views of the authors, who were commissioned by NONIE. Given that perspectives on the definition, scope, and appropriate methods of impact evaluation differ widely among practitioners and other stakeholders, the document should not be taken to represent the agreed positions of all of the individual NONIE members. The network membership and the authors recognize that there is scope to develop the arguments further in several key areas.

We would like to thank all of the above people for their contributions to the process of writing the Guidance document. First, we thank the authors of the subgroup documents for providing building blocks for this document. In addition, we would like to thank the steering committee of this project, Andrew Warner, David Todd, Zenda Ofir, and Henri Jorritsma, for their pertinent suggestions. We also would like to thank Antonie De Kemp for exchanging ideas on design questions. We are grateful to Patricia Rogers, the external peer reviewer, for providing valuable input to this document. Our thanks also go to Victoria Gunnarsson and Andrew Warner from the NONIE secretariat for accompanying us throughout the whole process and providing excellent feedback. Nick York, Howard White, David Todd, Indran Nadoo, and John Mayne provided helpful insights in the final phase of this project. We thank Arup Banerji for drafting the executive summary. Comments from NONIE members were received at the Lisbon European Evaluation Society Conference (October 2008) and the Cairo Conference on Impact Evaluation (March 2009). Networks within NONIE, such as the International Organization for Cooperation in Evaluation and the European Evaluation Society, contributed by submitting written comments. Moreover, many individual NONIE members also sent in their feedback through email. We would like to thank all NONIE members for the stimulating discussions and inputs on impact evaluation.

Finally, within the restricted time available for writing this document, we have tried to combine different complementary perspectives on impact evaluation into an overall framework, in line with our own views on these topics and feedback from the steering committee and others. Though we have not included all perspectives on impact evaluation, an important and quite diverse selection of the thinking and practice on the subject has been incorporated. The result, we hope, represents a balance between coherence, a comprehensive structure of key issues, and diversity. Any remaining errors are our own.

Frans Leeuw, frans.leeuw@maastrichtuniversity.nl
Jos Vaessen, jos.vaessen@maastrichtuniversity.nl

Executive Summary

In international development, impact evaluation is principally concerned with the final results of interventions (programs, projects, policy measures, reforms) on the welfare of communities, households, and individuals, including taxpayers and voters. Impact evaluation is one tool within the larger toolkit of monitoring and evaluation (including broad program evaluations, process evaluations, ex ante studies, etc.).
The Network of Networks for Impact Evaluation (NONIE) was established in 2006 to foster more and better impact evaluations by its membership--the evaluation networks of bilateral and multilateral organizations focusing on development issues, as well as networks of developing country evaluators. NONIE's member networks conduct a broad set of evaluations, examining issues such as project and strategy performance, institutional development, and aid effectiveness. But the focus of NONIE is narrower. By sharing methodological approaches and promoting learning by doing on impact evaluations, NONIE aims to promote the use of this more specific approach by its members within their larger portfolio of evaluations. This document, by Frans Leeuw and Jos Vaessen, has been developed to support this focus.1

The Guidance document was written by and represents the views of the authors. Given that perspectives on the definition, scope, and appropriate methods of impact evaluation differ widely among practitioners and other stakeholders, the document should not be taken to represent the agreed positions of all of the individual NONIE members.

Why promote impact evaluations? For development practitioners, impact evaluations play a key role in the drive for better evidence on results and development effectiveness. They are particularly well suited to answer important questions about whether development interventions do or do not work, whether they make a difference, and how cost-effective they are. Consequently, they can help ensure that scarce resources are allocated where they can have the most developmental impact.

Although there is debate within the profession about the precise definition of impact evaluation, NONIE's use of the term proceeds from its adoption of the definition of impact of the Development Assistance Committee of the Organisation for Economic Co-operation and Development (DAC): "the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended."2

Adopting the DAC definition of impact leads to a focus on two underlying premises for impact evaluations:

· Attribution: The words "effects produced by" in the DAC definition imply an approach to impact evaluation that is about attributing impacts to interventions, rather than just assessing what happened.
· Counterfactual: It follows that in most contexts, knowledge about the impacts produced by an intervention requires an attempt to gauge what would have occurred in the absence of the intervention and a comparison with what has occurred with the intervention implemented.

These two premises do not, however, lead to a determination of a set of analytical methods that is above all others in all situations. In fact, this Guidance note underlines that--

· No single method is best for addressing the variety of questions and aspects that might be part of impact evaluations.
· However, depending on the specific questions or objectives of a given impact evaluation, some methods have a comparative advantage over others in analyzing a particular question or objective.
· Particular methods or perspectives complement each other in providing a more complete "picture" of impact.

The document is structured around nine key issues that provide guidance on conceptualizing, designing, and implementing an impact evaluation:

Methodological guidance:
1. Identify the type and scope of the intervention.
2. Agree on what is valued.
3. Carefully articulate the theories linking interventions to outcomes.
4. Address the attribution problem.
5. Use a mixed-methods approach--the logic of the comparative advantages of methods.
6. Build on existing knowledge relevant to the impact of interventions.

Guidance on managing impact evaluations:
7. Determine if an impact evaluation is feasible and worth the cost.
8. Start collecting data early.
9. Front-end planning is important.

1. Identify the (type and scope of the) intervention

Interventions range along a continuum from single-"strand" initiatives with explicit objectives to complex institutional policies, and the particular type of impact evaluation would be affected by the type and scope of the intervention. Yet across this continuum, the scope of an impact evaluation can be identified through the lens of two questions: the impact of what and the impact on what?

When asking the "of what" question, it is useful to differentiate among intervention characteristics. Take single-strand initiatives with explicit objectives--for example, the change in crop yield after introduction of a new technology, or the reduction in malaria prevalence after the introduction of bed nets. Such interventions can be isolated, manipulated, and measured, and experimental and quasi-experimental designs may be appropriate for assessing causal relationships between these single-strand initiatives and their effects.

At the other end of the continuum are programs with an extensive range and scope that have activities cutting across sectors, themes, and geographic areas. These can be complicated--multiple agencies, multiple simultaneous causes for the outcomes, and causal mechanisms differing across contexts--and complex (recursive, with feedback loops, and with emergent outcomes) (Rogers, 2008). In such cases, impact evaluations have to proceed systematically: first, by locating and prioritizing key program components through a comprehensive mapping of the potential influences shaping the program, including possible feedback loops and emerging outcomes; second, by evaluating program components by subsets of this prioritized program mapping.

When asking the "on what" question, impact evaluations have to unpack interventions that affect multiple institutions, groups, individuals, and sites. For tractability, this guidance distinguishes between two principal levels of impact: impact at the institutional level and impact at the beneficiary level (figure ES1). Examples of the former are policy dialogues, training programs, and strategic support to institutional actors such as governmental and civil society institutions or private corporations and public-private partnerships.

Figure ES1: Levels of intervention, programs, and policies and types of impact
[Figure: international conferences, treaties, declarations, protocols, and policy networks feed into institutional-level impact (donor capacities/policies; government capacities/policies; other actors such as INGOs, NGOs, banks, and cooperatives; macro-earmarking, e.g., debt relief and GBS; meso-earmarking, e.g., SBS; and micro-earmarking). These may constitute multiple programs (e.g., health reform), projects (e.g., agricultural extension), and policy measures (e.g., tax increases), leading to beneficiary-level impact on communities, households, and individuals (taxpayers, voters, citizens, etc.), with replication and scaling up and wider systemic effects.]

Most policy makers and stakeholders are, however, primarily interested in beneficiary-level interventions that directly affect communities, households, and individuals--whether trade liberalization measures, technical assistance programs, antiretroviral treatments, cash transfer programs, construction of schools, etc. This Guidance document accordingly focuses on this level. But it should be recognized that policy interventions primarily geared at inducing sustainable changes at institutional levels can also have indirect effects at the beneficiary level.

2. Agree on what is valued

When conducting impact evaluations, evaluators also need to ask a third question--not only the impact of what and on what, but impact for whom. The fundamental principles to follow here are to agree on the most important, and most valued, objectives of the intervention, and then as much as possible to translate these objectives into measurable indicators while keeping track of important aspects that are difficult to measure.

The "for whom" question is inherently a question about stakeholder values--which impacts and processes are judged as significant or valuable, and whose values are used to judge the distribution of costs and benefits? The first and most important reference source to answer this question is the objectives of an intervention, as stated in the official documents. However, interventions evolve, and objectives might be implicit or may change. To bring stakeholder values to the surface, evaluators may need to have informal or structured (e.g., "values inquiry") consultations with representatives from different stakeholder groups or use a participatory evaluation approach to include stakeholder values directly in the evaluation.

Three other issues are critical to creating measurable indicators to capture the effects of an intervention. First, the evaluation has to consider the possibility of unintended effects that go beyond those envisaged in the program theory of the intervention--for example, governments reducing spending on a village targeted by an aid intervention. Second, there may be long-term effects of an intervention (such as environmental changes, or changes in social impacts on subsequent generations) or time lags not captured in an impact evaluation that occurs relatively soon after the intervention period. Third, and related, is evidence on the sustainability of effects, which few impact evaluations will be able to directly capture. Impact evaluations therefore need to identify shorter-term impacts and, where possible, indicate whether longer-term impacts are likely to occur.

3. Carefully articulate the theories linking interventions to outcomes

Development policies and interventions are typically aimed at changing the behavior or knowledge of households, individuals, and organizations. Underlying the design of the intervention is a "theory"--explicit or implicit--with social, behavioral, and institutional assumptions indicating why a particular policy intervention will work to address a given development challenge.

For evaluating the nature and direction of an impact, understanding this theory is critical. But often, these theories are partly "hidden" and require reconstruction and articulation. This articulation can use one or more pieces of evidence--ranging from the intervention's existing logical framework, to insights and expectations of policy makers and other stakeholders on the expected way target groups are affected, to theoretical and empirical research on processes of change or past experiences of similar interventions. However, it is important to critically look for, and articulate, plausible explanations for the changes.

After articulating the assumptions on the effect of an intervention on outcomes and impacts, these assumptions will need to be tested. This can be done in two ways--by carefully constructing the causal "story" about the way the intervention has produced results (as by using "causal contribution analysis") or by formally testing the causal assumptions using appropriate methods.

4. Address the attribution problem

The steps above are important to identify the "factual"--the observed outcome that is a result of the intervention. But given that multiple factors can affect the outcomes pertaining to individuals and institutions, the unique point of an impact evaluation is to go beyond the factual--to know the added value of the policy intervention under consideration, separate from these other factors.
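The attribution logic described here, separating an intervention's added value from other factors, can be sketched numerically. The following minimal simulation uses fabricated data (every variable name and number is invented for illustration, not taken from the Guidance document): a naive before-after comparison bundles the intervention's effect with a background trend, while a with-without comparison isolates the net impact.

```python
import random
import statistics

random.seed(1)

# Fabricated illustration: a program aims to raise household income by 10,
# while a region-wide trend ("other factors") raises everyone's income by 8
# over the same period.
n = 1000
trend, program_effect = 8.0, 10.0

income_before = [random.gauss(100, 15) for _ in range(n)]

# The "with" situation: observed outcomes for program households.
with_program = [y + trend + program_effect + random.gauss(0, 2)
                for y in income_before]

# The "without" situation (the counterfactual): the same households had the
# program not existed. In real evaluations this is unobservable and must be
# estimated, e.g., from a comparison group.
without_program = [y + trend + random.gauss(0, 2) for y in income_before]

mean = statistics.mean

# Naive before-after difference: bundles the program effect with the trend.
before_after = mean(with_program) - mean(income_before)

# With-without difference: nets out the other factors, leaving the impact.
net_impact = mean(with_program) - mean(without_program)

print(f"before-after difference: {before_after:.1f}")  # close to 18 (trend + effect)
print(f"with-without difference: {net_impact:.1f}")    # close to 10 (effect only)
```

Because the "without" series cannot be observed directly in practice, the quantitative techniques discussed in this summary (randomization, pipeline comparisons, matching, double difference) are, in essence, different strategies for estimating it.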
Any observed changes will be, in general, only For evaluating the nature and direction of an partly caused by the intervention of interest. Other impact, understanding this theory is critical. interventions inside or outside the core area will often interact and strengthen/reduce the effects of the intervention of interest for the evaluation. Figure ES2: Simple graphic of net impact of an Therefore, addressing this "attribution problem" intervention implies both isolating and accurately measuring the particular contribution of an intervention and ensuring that causality runs from the interven- a tion to the outcome. target variable Value c Analysis of the attribution problem compares b the situation "with" an intervention to what would have happened in the absence of an Before After intervention, the "without" situation (the Time counterfactual, figure ES2). The impact is not xii ExEcutIvE summary measured by either the value of a target variable control group ends up being exposed to the (point a) or even the difference between the intervention (either because of geographic before and after situation (a­b, measured on proximity or because of the presence of simi- the vertical axis). The net impact is the differ- lar parallel interventions affecting the control ence between the target variable's value after group). the intervention and the value the variable would have had in case the intervention had Quasi-experimental techniques can simulate not taken place (a­c). comparable intervention and comparison groups. In doing impact evaluations, there is no "gold standard" (in the sense of a single method that is · A pipeline approach takes advantage of proj- best in all cases). 
However, depending on factors ects that are rolled out gradually and compares such as the scope, objectives, and design of the outcomes for households or communities that intervention, as well as data availability, some have already experienced the intervention methods can be better than others in specific (the treatment group) with households or cases. communities that are selected but that have not yet participated (the control group). But Quantitative techniques can be broadly cat- for pipeline approaches to be valid, it is critical egorized into experimental, quasi-experimental, that both the treatment and control groups and regression-based techniques. These, if well have similar characteristics. Self-selection (due done, have a comparative advantage in address- to earlier participation by those eager to re- ing the issue of attribution. In each case, the ceive the intervention) or geographical biases counterfactual is simulated by examining the (such as moving from rural to urban areas) do situation of a participant group (receiving introduce selection biases. benefits from or affected by an intervention, · In propensity score matching, a control the "treatment" group) with the situation of an group is created ex post by selecting its mem- equivalent comparison or "control" group that bers on the basis of observed and relevant is not affected by the intervention. A key issue characteristics that are similar to those of these techniques aim to tackle is selection bias-- members of the treatment group. The pairs when those in the treatment group are different are formed not by matching every character- in some way from those in the control group. istic exactly, but by selecting groups that have similar probabilities of being included in the Experimental techniques avoid selection effects sample as the treatment group on the basis by randomly selecting treatment and control of observable characteristics. 
But the tech- groups from the same eligible population, before nique does not solve the potential bias that the intervention starts. results from the omission of unobserved dif- ferences between the groups and may require · In a randomized controlled trial (RCT), both a large sample for the selection of the com- groups are expected to have similar average parison group. This is usually accounted for characteristics, with the single exception that through the added use of double difference the treatment group received the interven- or difference-in-difference, which measures tion. Thus, a simple comparison of average differences between the two groups, before outcomes in the two groups solves the attri- and after the intervention, thus netting out bution problem and yields accurate estimates the unobservables (as long as they remain of the impact of the intervention. But, despite constant over time). the clean design, RCTs have to be managed · Judgmental matching is a less precise carefully to ensure that the two groups do not method using descriptive information to have different rates of attrition and that there construct comparison groups--first consult- is a minimum of "contamination," when the ing with clients and other knowledgeable xiii I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n persons to identify relevant matching comes before the intervention with the regres- characteristics, and then combining geo- sion line after. But this method assesses the graphic information, secondary data (such marginal impact of the program only around as household surveys), interviews, and key the cut-off point for eligibility and not across informants to select comparison areas or in- the whole spectrum of the people affected by dividuals/households with the best match of the intervention. Moreover, care must be taken characteristics. 
But the element of subjectiv- that individuals were not able to manipulate ity may induce biases, and further qualitative the selection process or threshold. work is essential to tease out unobserved differences. Quantitative techniques are not foolproof and can have limitations that go beyond the Regression-based techniques are more flexible technical constraints identified above. Narrow tools for ex post impact evaluation, which can counterfactual estimation is not applicable in flexibly deal with a range of issues--heterogene- full-coverage interventions such as price policies ity of treatment, multiple interventions, hetero- or regulation on land use, which affect everybody geneity of participant characteristics, interactions (although to different degrees)--so regression- between interventions, and interactions between based techniques that focus on the variability in interventions and specific characteristics. With exposure/participation are called for. There are a regression approach, it may be possible to also some pragmatic constraints--such as ethical estimate the contribution of a separate interven- objections to randomization or lack of data tion to the total effect or to estimate the effect of representing the baseline situation of interven- the interaction between two interventions. tion target groups. And simple quantitative approaches may not be appropriate in "complex" · Dealing with unobservables and endogene- contexts--though the methodological difficul- ity: "Difference-in-difference" approaches in a ties of evaluating complicated interventions can regression model, by examining the changes to some extent be "neutralized" by deconstruct- within groups over time, can have unobserved ing them into their "active ingredients." (time invariant) variables drop from the equa- tion. The approach is similar to a fixed-effects Nonquantitative techniques are often less ef- regression model. 
"Instrumental variables" can help with endogeneity, as a good instrument correlates with the original endogenous variable in the equation, but not with the error term. But the difference-in-difference method is more vulnerable than others to the presence of measurement error in the data, and good instruments are not always possible to find, given the available data.

· Regression discontinuity takes advantage of programs that have a cut-off point regarding who receives the treatment (for example, geographic boundaries or income thresholds). It compares the treatment group just within the cut-off point with a control group of those just beyond. At that point, it is unlikely that there are unobserved differences between the two groups. Estimating the impact can now be done by comparing the mean difference between the regression line of treatment outcomes before the intervention with the regression line after. But this method assesses the marginal impact of the program only around the cut-off point for eligibility and not across the whole spectrum of the people affected by the intervention. Moreover, care must be taken that individuals were not able to manipulate the selection process or threshold.

Quantitative techniques are not foolproof and can have limitations that go beyond the technical constraints identified above. Narrow counterfactual estimation is not applicable in full-coverage interventions such as price policies or regulation on land use, which affect everybody (although to different degrees)--so regression-based techniques that focus on the variability in exposure/participation are called for. There are also some pragmatic constraints--such as ethical objections to randomization or lack of data representing the baseline situation of intervention target groups. And simple quantitative approaches may not be appropriate in "complex" contexts--though the methodological difficulties of evaluating complicated interventions can to some extent be "neutralized" by deconstructing them into their "active ingredients."
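The regression discontinuity comparison described above can also be sketched numerically. This is an illustrative simplification, not the Guidance's own procedure: it compares raw means in a narrow band around the threshold rather than fitting regression lines, and the cut-off, bandwidth, and data are invented:

```python
# Regression discontinuity (simplified): units just below an eligibility
# cut-off (treated) are compared with units just above it (untreated), where
# unobserved differences between the two groups are likely to be negligible.
# As the text notes, the estimate is local to the cut-off point.

def rd_estimate(units, cutoff, bandwidth):
    """Mean outcome just inside the cut-off minus mean outcome just outside.

    `units` is a list of (eligibility_score, outcome) pairs; treatment
    goes to units with eligibility_score < cutoff.
    """
    inside = [y for s, y in units if cutoff - bandwidth <= s < cutoff]
    outside = [y for s, y in units if cutoff <= s < cutoff + bandwidth]
    return sum(inside) / len(inside) - sum(outside) / len(outside)

# Hypothetical data: (income score, outcome), with the threshold at 50.
# Units far from the cut-off (scores 30 and 70) are excluded by the bandwidth.
data = [(48, 75), (49, 77), (50, 70), (51, 68), (30, 90), (70, 50)]
print(rd_estimate(data, cutoff=50, bandwidth=2))  # 7.0
```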
Participa- between the regression line of treatment out- tory approaches can be valuable in identifying a xiv ExEcutIvE summary more comprehensive and/or more appropriate For example, RCTs are arguably better than most set of valued impacts, greater ownership and a other methods in terms of internal validity, better level of understanding among stakehold- because if well designed, the counterfactual can ers, and a better understanding of processes be cleanly identified--the randomized project of change and the ways in which interventions benefits (within a relatively homogenous popula- affect people. But the higher the degree of partic- tion) would ensure that there are no systematic ipation, the more costly and difficult it is to set differences between those that receive benefits up an impact evaluation--and thus these may and those that do not. But RCTs control for differ- be inappropriate for large-scale comprehensive ences between groups within the particular interventions such as sector programs. Also, setting that is covered by the evaluation; other there are serious limitations to the validity of settings have different characteristics that are not information based only on stakeholder percep- controlled, so the external validity of such RCTs tions. Finally, strategic responses, manipulation, may be limited--unless there has been a system- or advocacy by stakeholders can also influence atic and large set of RCTs undertaken that test the validity of the data collection and analysis. the intervention across the range of settings and policy options found in reality. Overall, for impact evaluations, well-designed quantitative methods are usually preferable for Again, in-depth qualitative methods that attempt addressing attribution and should be pursued to capture complexity and diversity of institu- when possible. 
5. Use a mixed-methods approach

Each methodology mentioned above has comparative advantages in addressing particular concerns and needs in impact evaluation. A useful lens for examining these comparative advantages is the four different types of validity:

· Internal validity: Establishing the causal relationship between intervention outputs and processes of change leading to outcomes and impacts
· Construct validity: Ensuring that the variables measured adequately represent the underlying realities of development interventions linked to processes of change
· External validity: Establishing the generalizability of findings to other settings
· Statistical conclusion validity: For quantitative techniques, ensuring the degree of confidence about the existence of a relationship between intervention and impact variable and the magnitude of change.

For example, RCTs are arguably better than most other methods in terms of internal validity, because if well designed, the counterfactual can be cleanly identified--the randomized allocation of project benefits (within a relatively homogenous population) would ensure that there are no systematic differences between those that receive benefits and those that do not. But RCTs control for differences between groups only within the particular setting that is covered by the evaluation; other settings have different characteristics that are not controlled, so the external validity of such RCTs may be limited--unless a systematic and large set of RCTs has been undertaken that tests the intervention across the range of settings and policy options found in reality.

Again, in-depth qualitative methods that attempt to capture the complexity and diversity of institutional and social change can have a comparative advantage in construct validity when assessing the contribution of complex and multidimensional interventions or impacts. Take the example of impacts on poverty or governance--these may be difficult to capture fully in terms of the distinct, quantifiable indicators usually employed by RCTs and some quasi-experimental methods and may be better addressed through qualitative techniques. Yet these methods may in turn be lacking in terms of external validity. In such cases, the methods with a comparative advantage are large-sample quantitative approaches that cover substantial diversity in context and people.

A mix of methods--"triangulating" information from different approaches--can be used to assess different facets of complex outcomes or impacts, yielding greater validity than one method alone. For example, if looking at the impact of incentives on farmers' labor utilization and livelihoods, a randomized experiment can test the effectiveness of different individual incentives on labor and income effects (testing internal validity); survey data and case studies can deepen the analysis by looking at the distribution of these effects among different types of farm households (triangulating with the RCT evidence on internal validity and increasing external validity); and semistructured interviews and focus group conversations can broaden the information about the nature of effects in terms of production, consumption, poverty, and so on (establishing construct validity).

Finally, it is important to note that an analysis of the distribution of costs and benefits as a result of an intervention--distinguishing between coverage, effects on those that are directly affected, and indirect effects--cannot be addressed with one particular method.
If one is interested in all these questions, then inevitably one needs a framework of multiple methods and sources of evidence.

6. Build on existing knowledge relevant to the impact of interventions

Review and synthesis methods can play a pivotal role in marshalling existing evidence to deepen the power and validity of an impact evaluation, to contribute to future knowledge building, and to meet the information needs of stakeholders. Specifically, these methods can serve two major purposes:

· They strengthen external validity by evaluating comparable interventions across different countries and regions--thus assessing the relative effectiveness of alternative interventions in different contexts.
· Because many interventions rely on similar mechanisms of change, they help refine the hypotheses or expected results chain, allowing potentially greater selectivity for the impact evaluation.

There are several methods that fall into this category:

· Systematic reviews are syntheses of primary studies that, from an initial explicit statement of objectives, follow a transparent, systematic, and replicable methodology of literature search, inclusion and exclusion of studies according to clear criteria, and extraction and synthesis of information from the resulting body of knowledge.
· Meta-analyses, a common type of systematic review, quantitatively synthesize "scores" for the impact of a similar set of interventions from a number of individual studies across different environments. They follow a strict procedure to search for and select appropriate evidence, typically using a hierarchy of methods, with more quantitatively rigorous (experimental) studies being ranked higher as sources of evidence.
· Narrative reviews are descriptive accounts of intervention processes and/or results covering a series of interventions, relying on a common analytical framework and template to extract data from the individual studies and summarizing the main findings in a narrative account and/or tables and matrices representing key aspects of the interventions.
· Realist syntheses are theory based and do not use a hierarchy of methods. They collect earlier research findings by placing the policy instrument or intervention that is evaluated in the context of other similar instruments and describe the intervention in terms of its context, social and behavioral mechanisms (what makes the intervention work), and outcomes (the deliverables).
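The quantitative synthesis of "scores" that a meta-analysis performs can be sketched as an inverse-variance weighted average of study-level effect estimates. This is a generic fixed-effect illustration, not a procedure prescribed by the Guidance, and the study effects and standard errors below are invented:

```python
# Fixed-effect meta-analysis: pool effect estimates from several studies,
# weighting each study by the inverse of its variance so that more precise
# studies count for more in the pooled "score."

def pooled_effect(studies):
    """`studies` is a list of (effect_estimate, standard_error) pairs."""
    weights = [1.0 / se**2 for _, se in studies]
    weighted = [w * eff for w, (eff, _) in zip(weights, studies)]
    return sum(weighted) / sum(weights)

# Hypothetical impact estimates from three evaluations of similar interventions.
studies = [(0.30, 0.10), (0.10, 0.20), (0.25, 0.10)]
print(round(pooled_effect(studies), 3))  # 0.256
```

A fixed-effect pooling like this assumes the studies estimate a common underlying effect; where contexts differ substantially, as the text emphasizes, a random-effects model or a non-quantitative synthesis may be more appropriate.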
7. Determine if an impact evaluation is feasible and worth the cost

Impact evaluations can be costly exercises in terms of their need for human, financial, and often political resources. They complement rather than replace other types of monitoring and evaluation activities and should therefore be seen as one of several in a cycle of potentially useful evaluations in the lifetime of an intervention. Thus, at each juncture of deciding whether to set up an impact evaluation, it is useful to examine its objectives, benefits, and feasibility and to weigh these against the cost.

Impact evaluations are feasible when they have a clearly defined purpose and design, adequate resources, support from influential stakeholders, and data availability, and when they are appropriate given the nature and context of the intervention. They provide the greatest value when there is an articulated need to obtain the information from them--either to know whether a specific intervention worked, to learn from the intervention, to increase transparency of the intervention, or to know its "value for money." If they are feasible, their value can then be weighed against the expected costs--including the costs of establishing a credible counterfactual, or what would have happened without the intervention.
8. Start collecting data early

As good baseline data are essential to understanding and estimating impact, starting early is critical to the success of the eventual evaluation. When working with secondary data, a lack of information on the quality of data collection can restrict data analysis options and the validity of findings. Those managing an impact evaluation have to take notice of and deal effectively with the constraints--of time, data, and resources--under which an impact evaluation has to be carried out.

Depending on the type of intervention, the collection of baseline data and the setup of other aspects of the impact evaluation require an efficient relationship between the impact evaluators and the implementers of the intervention--thus policy makers and commissioners need to involve experts in impact evaluation as early as possible in the intervention to design high-quality impact evaluations.

9. Front-end planning is important

For every impact evaluation, front-end planning is important to help manage the study, its reception, and its use.

When managing the evaluation, it is critical to manage costs and staffing and to make essential and transparent decisions on ethical issues and on the level of independence of the evaluating team vis-à-vis the stakeholders with whom they are collaborating.

To ensure that the evaluation is used, it is also important, at the beginning, to pay attention to country and regional ownership of the impact evaluation and to build capacity to understand and use it. Providing a space for consultation and agreement on impact evaluation priorities among the different stakeholders of an intervention will help enhance utilization and ownership.

Introduction

Over the last 15-20 years, governments and other (public sector) organizations have been paying much more attention to evaluation. It has become a growth industry in which systems of evaluation exist, with their own methodologies, organizational infrastructures, textbooks, and professional societies (Leeuw and Furubo, 2008).

In the development world, the growth of monitoring and evaluation (M&E) in particular has been acknowledged as crucial. Kusek and Rist (2004) have articulated its underlying philosophy. M&E stimulates capacity development within countries and organizations to do their "own" evaluations and to produce their "own" performance data. M&E is not focused on one type of evaluation but concerns all of them, including, for example, ex ante studies, rapid appraisals, process evaluations, cost-benefit analyses, and impact evaluations.

Part of the philosophy of evaluation, and therefore also of M&E, is to put questions first. Different questions raise a need for different approaches. If the question an evaluator is confronted with is directed toward understanding what a program or policy is about, what the underlying theory of change or logic is, and what the risk factors are when implementing the program, an evaluability assessment or an ex ante evaluation will be an appropriate route to follow. If the question is focused on the implementation of the program or policy, or on the role agencies play, then an implementation analysis or a review of the performance of agencies can be appropriate. This can include an audit or inspection. However, if the question is about whether and to what extent the policy intervention made a significant difference (compared with the status quo, compared with other factors and interventions, and with or without side effects), then an impact evaluation is the appropriate answer. This Guidance document looks at the latter type of question and its corresponding evaluative inquiry, impact evaluation. It discusses what impact evaluation is about, when it is appropriate, and how to do it.

The Network of Networks for Impact Evaluation (NONIE) was established in 2006 to foster more and better impact evaluations by its membership. NONIE uses the definition of the Organisation for Economic Co-operation and Development's Development Assistance Committee (DAC), defining impacts as "[p]ositive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended" (OECD-DAC, 2002: 24).

The impact evaluations that NONIE pursues are expected to reinforce and complement the broader evaluation work by NONIE members. The DAC definition refers to the "effects produced by," stressing the attribution aspect. This implies an approach to impact evaluation that is about attributing impacts rather than assessing what happened. In most contexts, adequate empirical knowledge about the effects produced by an intervention requires at least an accurate estimate of what would have occurred in the absence of the intervention and a comparison with what has occurred with the intervention implemented.

Following this line of argument, this document subscribes to a somewhat more comprehensive view of impact than the DAC definition does. Much of the work on impact evaluation that stresses the attribution problem is in fact about attributing short- and medium-term outcomes to interventions. In practice, this type of attribution analysis is also referred to as impact evaluation, although (in a strict sense) it does not fall within the scope of the DAC definition. This document includes a discussion of the latter type of analysis as well as of the more long-term effects emphasized in the DAC definition (for further discussion of these issues, see White, 2009).

The purpose of NONIE is to promote more and better impact evaluations among its members. Issues relating to evaluations in general are more effectively dealt with within the parent networks and are thus not the primary focus of NONIE. NONIE will focus on sharing methods and learning by doing to promote the practice of impact evaluation. This Guidance document was developed to support those purposes.

The Guidance document was written by, and represents the views of, the authors, Frans Leeuw and Jos Vaessen, who were commissioned by NONIE. In writing the document, the authors drew on previous work by NONIE members and took account of their comments in finalizing the document. Given that perspectives on the definition, scope, and appropriate methods of impact evaluation differ widely among practitioners and other stakeholders, the document should not be taken to represent the agreed positions of all of the individual NONIE members.
The current Guidance document, highlighting key conceptual and methodological issues in impact evaluation, provides ample coverage of such topics as delimitation, intervention theory, attribution, and combining methods in impact evaluation. It also presents an introduction to such topics as participatory approaches to impact evaluation and assessing impact for complex interventions. These and other topics, such as the evaluation of new aid modalities and country perspectives on impact evaluation, should be developed further in the future.

Impact evaluation in development assistance has received considerable attention over the last few years. The major reason is that many outside of development agencies believe that achievement of results has been poor, or at best not convincingly established. Many development interventions appear to leave no trace of sustained positive change after they have been terminated, and it is hard to determine the extent to which interventions are making a difference. However, the development world is not "alone" in attaching increasing importance to impact evaluations. In fields such as crime and justice, education, and social welfare, impact evaluations have over the last decade become more and more important.1 Evidence-based (sometimes "evidence-informed") policies are high on the (political) agenda, and some even refer to the "Evidence Movement" (Rieper et al., 2009). This includes the development of knowledge repositories, where the results of impact evaluations are summarized. In some fields, such as criminology, and in some professional associations, such as the Campbell Collaboration, methodological standards and scales are used to grade impact evaluations,2 although not without discussion (Leeuw, 2005; Worral, 2002, 2007).

Important reasons for doing impact evaluations are the following:

· Impact evaluations provide evidence on "what works and what doesn't" (under what circumstances) and how large the impact is. As the Independent Evaluation Group (IEG) of the World Bank (IEG, 2005) puts it: measuring the outcomes and impacts of an activity and distinguishing these from the influence of other, external factors is one of the rationales behind impact evaluation.
· Measuring impacts and relating the changes in dependent variables to development policies and programs is not something that can be done "from an armchair." Impact evaluation is the instrument for these tasks.
· Impact evaluation can gather evidence on the sustainability of the effects of interventions.
· Impact evaluations produce information that is relevant from an accountability perspective; they disclose knowledge about the (societal) effects of programs that can be linked to the (financial) resources used to reach these effects.
· Individual and organizational learning can be stimulated by doing impact evaluations. This is true for organizations in developing countries but also for donor organizations. Informing decision makers on whether to expand, modify, or eliminate projects, programs, and policies is linked to this point, as is IEG's (2005) argument that impact evaluations enable sponsors, partners, and recipients to compare the effectiveness of alternative interventions.

The authors of this Guidance document believe that the ultimate reason for promoting impact evaluations is to learn about "what works and what doesn't and why" and thus to contribute to the effectiveness of (future) development interventions. In addition to this fundamental motive, impact evaluations have a key role to play in the international drive for better evidence on results and development effectiveness. They are particularly well suited to answering important questions about whether development interventions made a difference (and how cost-effective they were). Well-designed impact evaluations also shed light on why an intervention did or did not work, which can vary across time and space.

Decision makers need better evidence on impact and its causes to ensure that resources are allocated where they can have the most impact and to maintain future public funding for international development. The pressures for this are already strong and will increase as resources are scaled up for international development. Without such evidence there is a risk of the case for aid and future funding sources being undermined.

Using the words "effects" and "effectiveness" implies that the changes in the "dependent variable[s]" measured within the context of an impact evaluation are caused by the intervention under study. The concept of "goal achievement" is used when causality is not necessarily present. Goals can also be achieved independently of the intervention: changes in financial or economic situations, in the world of health and agriculture, or in other social conditions can help realize goal achievement, even in a situation where the "believed-to-be-effective" intervention under review is not working.

The question of whether impact evaluation should always attempt to measure all possible impacts is not easy to answer. Impact evaluation involves finding the appropriate balance between the desire to understand and measure the full range of effects in the most rigorous manner possible and the practical need to delimit and prioritize on the basis of the interests of stakeholders as well as resource constraints.

Key issues addressed in this document

The guidance is structured around nine key issues in impact evaluation:

1. Identify the (type and scope of the) intervention.
2. Agree on what is valued.
3. Carefully articulate the theories linking interventions to outcomes.
4. Address the attribution problem.
5. Use a mixed-methods approach: the logic of the comparative advantages of methods.
6. Build on existing knowledge relevant to the impact of interventions.
7. Determine if an impact evaluation is feasible and worth the cost.
8. Start collecting the data early.
9. Front-end planning is important.

The discussion of these nine issues constitutes the structure of this Guidance document. The first part, comprising the first six issues, deals with methodological and conceptual issues in impact evaluation and constitutes the core of the document. A shorter second part focuses on managing impact evaluation and addresses aspects of evaluability, the benefits and costs of impact evaluation, and planning.

There is no universally accepted definition of "rigorous" impact evaluation, and some equate rigorous impact evaluation with particular methods and designs. Given the diversity in thinking and practice on the topic and the variety of interventions and contexts in which impact evaluation is being applied, the writing of this document has been guided by three basic premises:

· No single method is best for addressing the variety of questions and aspects that might be part of impact evaluations.
· However, depending on the specific questions or objectives of a given impact evaluation, some methods have a comparative advantage over others in analyzing a particular question or objective.
· Particular methods or perspectives complement each other in providing a more complete "picture" of impact.

Moreover, in our view, rigorous impact evaluation is more than methodological design. It requires addressing the issues described above in an appropriate manner, especially the core methodological and conceptual issues described in Part I.

Part I - Methodological and Conceptual Issues in Impact Evaluation

Chapter 1: Identify the (type and scope of the) intervention

In international development, impact evaluation is principally concerned with the final results of interventions (programs, projects, policy measures, reforms) on the welfare of communities, households, and individuals.

1.1. The impact evaluation landscape and the scope of impact evaluation

Impact is often associated with progress at the level of the Millennium Development Goals, which primarily comprise indicators of the welfare of households and individuals. The renewed attention on results- and evidence-based thinking, and the ensuing interest in impact evaluation, provides new momentum for applying rigorous methods and techniques in assessing the impact of interventions.
There is today more than ever a "continuum" of interventions. At one end of the continuum are relatively simple projects characterized by single-"strand" initiatives with explicit objectives, carried out within a relatively short timeframe, where interventions can be isolated, manipulated, and measured. An impact evaluation in the agricultural sector, for example, will seek to attribute changes in crop yield to an intervention such as a new technology or agricultural practice. In a similar guise, in the health sector, a reduction in malaria will be analyzed in relation to the introduction of bed nets. For these types of interventions, experimental and quasi-experimental designs may be appropriate for assessing causal relationships, along with attention to the other tasks of impact evaluation. At the other end of the continuum are comprehensive programs with an extensive range and scope (increasingly at the country, regional, or global level), with a variety of activities that cut across sectors, themes, and geographic areas, and with emergent specific activities. Many of these interventions address aspects that are assumed to be critical for effective development yet are difficult to define and measure, such as human security, good governance, political will and capacity, sustainability, and effective institutional systems.

Some evidence of this continuum is provided in appendix 1, in which two examples of impact evaluations are presented, implemented at different (institutional) levels and based on divergent methodologies with different timeframes (see also figure 1.1).

The endorsement in 2000 of the Millennium Development Goals by all heads of state, together with other defining events and occurrences, has propelled new action that challenges development evaluation to enter new arenas. There is a shift away from fragmented, top-down, and asymmetrical approaches. Increasingly, ideals such as "harmonization," "partnership," "participation," "ownership," and "empowerment" are being emphasized by stakeholders.

However, this trend in policy is not yet reflected in evaluative practices, including impact evaluation. Institutional policies such as anticorruption policies--but also regional and global policy networks and public-private partnerships, with their different forms and structures1--appear to be less often part of the goal of impact evaluations than (top-down) small programs for specific groups of beneficiaries. Ravallion (2008: 6) is of the opinion that there is "a 'myopia bias' in our knowledge, favoring development projects that yield quick results."2 In the promotion of more rigorous impact evaluation, development agencies, national governments, civil society organizations, and other stakeholders in development should be aware of this bias in focus, keeping in mind the full range of policy interventions that (eventually) affect the welfare of developing societies.

Besides a continued interest in the impact of individual projects, donors, governments, and nongovernmental institutions are increasingly interested in the impact of comprehensive programs and sector or country strategies, often comprising multiple instruments, stakeholders, sites of intervention, and target groups. There is a growing demand for assessing the impact of new instruments and modalities, such as--

· International treaties governing the actions of multiple stakeholders (e.g., the Paris Declaration, the Kyoto Protocol)
· New aid modalities such as sector budget support or general budget support
· Instruments such as institutional capacity building, institutional reform, partnership development, and stakeholder dialogues at national or regional levels.

In most countries donor organizations are (still) the main promoters of impact evaluation. The shift of the unit of analysis to the macro and (government) institutional level requires that impact evaluators pay more attention to complicated and more complex interventions at the national, sector, or program level. Multi-site, multi-governance, and multiple (simultaneous) causal strands are important elements of this (see Rogers, 2008).

At the same time, the need for more rigorous impact evaluation at the "project level" remains urgent. The majority of aid money is (still) micro-earmarked money for particular projects managed by donors in collaboration with national institutions. Furthermore, the ongoing efforts in capacity building for national M&E systems (see Kusek and Rist, 2004) and the promotion of country-led evaluation efforts stress the need for further guidance on impact evaluation at the "single" intervention level.

Evaluating the impact of policies--with their own settings and levels--requires appropriate methodological responses. These can be usefully discussed under the banner of two key issues: the impact of what and the impact on what. These two issues point to a key challenge in impact evaluation: the scope of the impact evaluation.

1.2. Impact of what?

What is the independent variable (intervention) we are looking at? In recent years, we have seen a broadening in the range of policy interventions that should or could be subject to impact evaluation.

One of the trends in development is that donors are moving up the aid chain. In the past, donors were very much involved in "micro-managing" their own projects and (sometimes) bypassing government systems. In contrast, nowadays a sizeable chunk of aid is allocated to national support for recipient governments. Conditionality to some extent has shifted from micro-earmarking (e.g., donor money destined for an irrigation project in district x) to meso-earmarking (e.g., support for the agricultural sector) or macro-earmarking (e.g., support for the government budget being allocated according to country priorities).

Earlier we referred to a continuum of interventions. At one end of the continuum are relatively simple projects characterized by single-"strand" initiatives with explicit objectives, carried out within a relatively short timeframe, where interventions can be relatively easily isolated, manipulated, and measured. Examples of these kinds of interventions include building new roads, repairing roads, reducing the price of fertilizer for farmers, providing clean drinking water at lower cost, etc. It is important to be precise about what the interventions are and what they focus on. In the case of new roads or the rehabilitation of existing ones, the goal often is a reduction in journey time and therefore a reduction of societal transaction costs.

At the other end of the continuum are comprehensive programs with an extensive range and scope (increasingly at the country, regional, or global level), with a variety of activities that cut across sectors, themes, and geographic areas and with emergent specific activities. Rogers (2008) has outlined several aspects of what constitutes complicated interventions (multiple agencies, alternative and multiple causal strands) and complex interventions3 (recursive causality and emergent outcomes; see tables 1.1 and 1.2).

Rogers (2008: 40) recently argued that "the greatest challenge [for the evaluator] comes when interventions have both complicated aspects (multi-level and multi-site) and complex aspects (emergent outcomes)." These aspects often converge in interventions in the context of public-private partnerships or new aid modalities, which have become more important in the development world. Demands for accountability and learning about results at the country, agency, sector, or program and strategy levels are also increasing, which has made the need for appropriate methodological frameworks to assess their impact more pressing.

Table 1.1: Aspects of complication in interventions

Aspect of complication      | Simple intervention   | Complicated intervention
Governance and location     | Single organization   | Multiple agencies, often interdisciplinary and cross-jurisdictional
Simultaneous causal strands | Single causal strand  | Multiple simultaneous causal strands
Alternative causal strands  | Universal mechanism   | Different causal mechanisms operating in different contexts

Source: Rogers (2008).

Table 1.2: Aspects of complexity in interventions

Aspect of complexity                            | Simple intervention                         | Complex intervention
Recursive causality and disproportionate effect | Linear, constant dose-response relationship | Recursive, with feedback loops, including reinforcing loops; disproportionate effects at critical limits
Emergent outcomes                               | Pre-identified outcomes                     | Emergent outcomes

Source: Rogers (2008).

Pawson (2005) has distinguished five principles for complicated programs that can be helpful when designing impact evaluations of aid:

1. Locate key program components. Evaluation should begin with a comprehensive scoping study, mapping out the potential conjectures and influences that appear to shape the program under investigation. One can envisage stage-one mapping as the hypothesis generator. It should alert the evaluator to the array of decisions that constitute a program, as well as provide some initial deliberation on their intended and wayward outcomes.

2. Prioritize among program components. The general rule here is to concentrate on (i) those components of the program (intervention) theory that seem likely to have the most significant bearing on overall outcomes, and (ii) those segments of program theory about which the least is known.

3. Evaluate program components by subsets. This principle is about when and where to locate evaluation efforts in relation to a program. The evaluation should take on subsets of program theory and should occur in ongoing portfolios rather than one-off projects. Suites of evaluations and reviews should track program theories as and wherever they unfold.

4. Identify bottlenecks in the program network. "Theories of Change" analysis perceives programs as implementation chains and asks, "What are the flows and blockages as we put a program into action?" The basic strategy is to investigate how the implementation details sustain or hinder program outputs. The main analytic effort is directed at configurations made up of selected segments of the implementation chains across a limited range of program locations.

5. Provide feedback on the conceptual framework. What the theory-based approach initiates is a process of thinking through the pathways along which a successful program has to travel. What would be described are the main series of decision points through which an initiative has proceeded, and the findings would be used to alert stakeholders to the caveats and considerations that should inform those decisions.

To a large extent, interventions can be identified and categorized on the basis of the main theme addressed. Examples of thematic areas of interventions are roads and railroads, protected area management, alternative livelihoods, and research on innovative practices.

A second way to identify interventions is to find out which generic policy instruments, and which combinations of them, constitute the intervention: economic incentives (e.g., tax reductions, subsidies), regulations (e.g., laws or restrictions), or information (e.g., education or technical assistance). As argued by authors such as Pawson (2006), Salamon (1981), and Vedung (1998), using this relatively simple classification helps identify the interventions: "Rather than focusing on individual programs, as is now done, or even collections of programs grouped according to major 'purpose,' as is frequently proposed, the suggestion here is that we should concentrate on the generic tools of government that come to be used, in varying combinations in particular public programs" (Salamon, 1981).
1981: 256). Acknowledging the central role of The most durable and practical recommen- policy instruments enables evaluators to take dations that evaluators can offer come from into account lessons from the application of research that begins with a theory and ends particular (combinations of) policy interven- with a refined theory. tions elsewhere (see Bemelmans-Videc and Rist, 1998). If interventions are complicated, in that they have multiple active components, it is helpful to Third, the separate analysis of intervention state these separately and treat the intervention components implies interventions being as a package of components. Depending on the unpacked in such a way that the most important context, the impact of intervention components social and behavioral mechanisms believed to can be analyzed separately and/or as part of a make the "package" work are spelled out (see package.4 chapter 3). 6 IdEntIfy thE (typE and scopE of thE) IntErvEntIon Box 1.1: "Unpacking" the aid chain The importance of distinguishing among different levels of im- ances mechanisms, etc.) and is likely to be affected by donor pact is also discussed by Bourguignon and Sundberg (2007), who policies and aid. (institutional level impact) "unpack" the aid effectiveness box by differentiating among · External donors and international financial institutions to policy three essential links between aid and final policy outcomes: makers: How do external institutions influence the policy-mak- ing process through financial resources, dialogue, technical · Policies to outcomes: How do policies, programs and projects assistance, conditionalities, etc.? (institutional-level impact) affect investment, production, growth, social welfare, and poverty levels? (beneficiary level impact) The above links can be perceived as channels through which · Policy makers to policies: How does the policy-making process aid eventually affects beneficiary-level impact. 
At the same time, at national and local levels lead to "good policies"? This is the processes triggered by aid generate lasting impacts at insti- about governance (institutional capacities, checks and bal- tutional levels. Source: Bourguignon and Sundberg (2007). Although complicated interventions are intermediate changes and being contingent on becoming more important and therefore should more external variables (e.g., from stakeholder be subject to impact evaluation, this evolution dialogue, to changes in policy priorities, to should not imply a reduction of interest in changes in policy implementation, to changes in evaluating the impact of relatively simple, single- human welfare). strand interventions. The sheer number of these interventions makes doing robust impact evalua- Given this diversity, we think it is useful for tions of great importance. purposes of "scoping" to distinguish between two principal levels of impact: at the institu- 1.3. Impact on what? tional level and at the beneficiary level.6 It This topic concerns the "dependent variable broadens impact evaluation beyond either problem." Interventions often affect multiple simply measuring whether objectives have been institutions, groups, and individuals. What level achieved or assessing direct effects on intended of impact should we be interested in? beneficiaries. It includes the full range of impacts at all levels of the results chain, including ripple The causality chain linking policy interventions effects on families, households, and communi- to ultimate policy goals (e.g., poverty alleviation) ties; on institutional, technical, or social systems; can be relatively direct and straightforward (e.g., and on the environment. In terms of a simple the impact of vaccination programs on mortal- logic model, there can be multiple intermediate ity levels) but also complex and diffuse. 
Impact (short- and medium-term) outcomes over time evaluations of, for example, sector strategies or that eventually lead to impact--some or all of general budget support potentially encompass which may be included in an evaluation of impact multiple causal pathways, resulting in long-term at a specific moment in time. direct and indirect impacts. Some of the causal pathways linking interventions to impacts might Interventions that can be labeled as institu- be "fairly" straightforward5 (e.g., from training tional primarily aim at changing second-order programs in alternative income generating conditions (i.e., the capacities, willingness, and activities to employment and to income levels), organizational structures enabling institutions to whereas other pathways are more complex design, manage, and implement better policies and diffuse in terms of going through more for communities, households, and individuals). 7 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n Examples are policy dialogues, policy networks, the discussion on choice of scope and method in training programs, institutional reforms, and impact evaluation. strategic support to institutional actors (i.e., governmental and civil society institutions, Having illustrated this differentiation, it is private corporations, and hybrids) and public- important to note that for many in the develop- private partnerships. ment community, impact assessment is essentially about impact at the beneficiary level. The main Other types of interventions directly aim at/ concern is how (sets of) policy interventions affect communities, households, and individu- directly or indirectly affect the welfare of benefi- als, including voters and taxpayers. Examples ciaries and to what extent changes in welfare are fiscal reforms, trade liberalization measures, can be attributed to these interventions. 
In line technical assistance programs, cash transfer with this interpretation of impact evaluation, 8 programs, construction of schools, etc. throughout this document we will focus on impact assessment at the beneficiary level (see Figure 1.1. graphically presents different levels of the dotted oval in figure 1.1.), addressing key intervention and levels of impact. The differentia- methodological concerns and methodological tion between impact at the institutional level and approaches as well as the choice of methodologi- impact at the beneficiary level7 can be useful in cal approach in a particular evaluation context. Figure 1.1: Levels of intervention, programs, and policies and types of impact International conferences, treaties, declarations, protocols, policy networks Institutional-level impact Donor capacities/policies Government capacities/policies Other actors (INGOs, NGOs, Macro-earmarking (e.g., debt relief, banks, cooperatives, etc.) Micro-earmarking, GBS) meso-earmarking (e.g., SBS) May constitute Programs multiple Projects Policy measures (e.g., health reform) (e.g., agricultural (e.g., tax increases) extension) Beneficiary-level impact Communities Households Individual (taxpayers, voters, citizens, etc.) Replication and scaling up Wider systemic effects 8 IdEntIfy thE (typE and scopE of thE) IntErvEntIon Where necessary, other levels and settings of as interventions financed through these modali- impact will be addressed (see the dashed oval in ties (aim to) affect the lives of households and figure 1.1.). The implication is that with respect individuals.9 We do not address the question of to the impact evaluation of, for example, new aid how to do impact evaluations of new aid modali- modalities (e.g., general budget support or sector ties as such (see Lister and Carter, 2006; Elbers budget support), this will only be discussed as far et al., 2008). Key message Identify the scope and type of the intervention. 
In- and therefore should be subject to impact evaluation, terventions range from single-strand initiatives with this should not imply a reduction of interest in evaluating explicit objectives to complicated institutional policies. the impact of relatively simple, single-strand interven- Across this continuum, the scope of an impact evalu- tions. The sheer number of these interventions makes ation can be identified by answering two questions: doing robust impact evaluations of great importance. the impact of what and on what? Look closely at the In addition, one should be clear about the level of im- nature of the intervention, for example, on the basis pact to be evaluated. Although most policy makers and of the main theme addressed or by the generic policy stakeholders are primarily interested in beneficiary- instruments used. If interventions are complicated in level impact (e.g., impact on poverty), specific policy that they have multiple active components, state these interventions are primarily geared at inducing sustain- separately and treat the intervention as a package of able changes at the institutional (government) level components that should be unpacked. ("second-order" effects), with only indirect effects at Although complicated interventions, sometimes of the beneficiary level. an institutional nature, are becoming more important 9 Chapter 2 Agree on what is valued I mpact evaluation requires finding a balance between taking into account the values of stakeholders and paying appropriate attention to the empiri- cal complexity of processes of change induced by an intervention. Some of this complexity has been unpacked in the discussion on the topic of scope of the impact evaluation, where we distinguished between levels of impact that neatly capture the often complex and diffuse causal pathways from interven- tion to different outcomes and impact: institutional or beneficiary level and replicatory impact. 
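The two scoping questions of chapter 1 ("impact of what?" and "impact on what?") can be made concrete as a small data structure. The sketch below is purely illustrative and not part of the NONIE guidance itself: all class, field, and example names are invented for this example. It shows the practice recommended above of stating a complicated intervention's components separately and tagging each expected effect with the level (institutional or beneficiary) at which it is supposed to occur.

```python
# Illustrative sketch only: an intervention as a package of components,
# with expected effects tagged by level of impact. All names are invented
# for this example and are not taken from the guidance.

from dataclasses import dataclass, field

@dataclass
class ExpectedEffect:
    description: str
    level: str  # "institutional" or "beneficiary"

@dataclass
class Intervention:
    name: str                                    # answers "impact of what?"
    components: list = field(default_factory=list)
    effects: list = field(default_factory=list)  # answers "impact on what?"

    def effects_at(self, level):
        """List the expected effects at one level of impact."""
        return [e.description for e in self.effects if e.level == level]

# A complicated intervention stated as a package of components:
rural_roads = Intervention(
    name="Rural road rehabilitation program",
    components=["road repair works", "maintenance training for district staff"],
    effects=[
        ExpectedEffect("district capacity to plan road maintenance", "institutional"),
        ExpectedEffect("reduced journey time for households", "beneficiary"),
        ExpectedEffect("lower transaction costs for farmers", "beneficiary"),
    ],
)

print(rural_roads.effects_at("beneficiary"))
```

Separating the package into tagged components in this way is only a bookkeeping device, but it forces the evaluator to answer both scoping questions explicitly before choosing methods.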
It is best to--as much as possible--translate objectives into measurable indicators, but at the same time not lose track of important aspects that are difficult to measure.

After addressing the issue of stakeholder values, we briefly discuss three dimensions that are particularly important and at the same time challenging to capture in terms of measurable indicators: intended versus unintended effects, short-term versus long-term effects, and the sustainability of effects.

2.1. Stakeholder values in impact evaluation
Impact evaluation needs to assess the value of the results derived from an intervention. This is not only an empirical question but inherently a question about values--which impacts are judged as significant (whether positive or negative), what types of processes are valued in themselves (either positive or negative), and what and whose values are used to judge the distribution of the costs and benefits of interventions.

First, stakeholder values are reflected in the objectives of an intervention, as stated in the official documents produced by an intervention. However, interventions evolve and objectives might change. In addition, stakeholder groups, besides funding and implementing agencies, might harbor expectations not adequately covered by official documents. Impact evaluations need to answer questions related to "for whom" the impacts have been intended and how context influences impacts of interest. Some of the main tasks of an impact evaluation are, therefore, to be clear about who decides what the right aims are and to ensure that the legitimate different perspectives of different stakeholders are given adequate weight. Where there are multiple aims, there must be agreement about the standards of performance required in the weighting of these--for example, can an intervention be considered a success overall if it fails to meet some of the targets but does well in terms of the main intended outcome?

Depending on the evaluation context, there are different ways for evaluators to address stakeholder values:

· Informal consultation with representatives from different stakeholder groups
· Using values inquiry1 (Henry, 2002) as a basis for more systematic stakeholder consultation
· Using a participatory evaluation approach to include stakeholder values in the evaluation (see, e.g., Cousins and Whitmore, 1998).

2.2. Intended versus unintended effects
In development programs and projects, intended effects are often translated into measurable indicators as early as the design phase. Impact evaluation should go beyond assessing the expected effects, given an intervention's logical framework and objectives. Interventions often change over time, with consequences for how they affect institutional and people's realities. Moreover, effects are sometimes context specific, where different contexts trigger particular processes of change. Finally, in most cases, the full scope of an intervention's effects is not known in advance. A well-articulated intervention theory can help anticipate some of the unintended effects of an intervention (see chapter 3).

Classic impact evaluations assume that there are no impacts for nonparticipants, but this is unlikely to be true for most development interventions. Spillover effects or replicatory effects (see chapter 1) can stem from market responses (given that participants and nonparticipants trade in the same markets), the (nonmarket) behavior of participants/nonparticipants, or the behavior of intervening agents (governmental/nongovernmental organizations). For example, aid projects often target local areas, assuming that the local government will not respond; yet if one village gets the project, the local government may well cut its spending on that village and move to the control village (Ravallion, 2008).

2.3. Short-term versus long-term effects
In some types of interventions, impacts emerge quickly. In others, impact may take much longer and change over time. The timing of the evaluation is therefore important. Development interventions are usually assumed to contribute to long-term development (with the exception of humanitarian disaster and emergency situations). However, focusing on short-term or intermediate outcomes often provides more useful and immediate information for policy and decision making. Intermediate outcomes may be misleading, often differing markedly from those achieved in the longer term. Many of the impacts of interest from development interventions will only be evident in the longer term, such as environmental changes or changes in social impacts on subsequent generations. Searching for evidence of such impacts too early might mistakenly lead to the conclusion that they have failed.

In this context, the exposure time of an intervention in making an impact is an important point. A typical agricultural innovation project that tries to change farmers' behavior with incentives (training, technical assistance, credit) is faced with time lags in both the adoption effect (farmers typically are risk averse and face resource constraints, and start adopting innovations on an experimental scale) and the diffusion effect (other farmers want to see evidence of results before they copy any new behavior). In such gradual, nonlinear processes of change with cascading effects, the timing of the ex post measurement (of land use) is crucial. Ex post measurements that occur just after project closure could either underestimate impact (full adoption/diffusion of interesting practices has not taken place yet) or overestimate impact (as farmers will stop investing in those land use practices that are not attractive enough to be maintained without project incentives).

2.4. The sustainability of effects
Focusing on short- or intermediate-term outcomes may underestimate the importance of designs that are able to measure effects (positive or negative) in the long term. One example is an effective strategy to reduce child malnutrition in a certain population that may quite quickly produce impressive results, yet fail soon after in the absence of systems, resources, and capacities to maintain the work--or follow-up work--after termination of the intervention.

Few impact evaluations will probably provide direct evidence of long-term impacts, and in any case results are needed before these impacts become evident to inform decisions on continuation, next phases, and scaling-up. Impact evaluations therefore need to identify short-term impacts and, where possible, indicate whether longer-term impacts are likely to occur.

To detect negative impacts in the long term, it is important to include early warning indicators. A well-articulated intervention theory (see chapter 3) that also addresses the time horizons over which different types of outcomes and impacts could reasonably be expected to occur can help to identify impacts that can and should be explored in an evaluation. The sustainability of positive impacts is also likely to be evident only in the longer term. Impact evaluations therefore can focus on other impacts that will be observable in the short term, such as the institutionalization of practices and the development of organizational capacity, that are likely to contribute to the sustainability of impacts for participants and communities in the longer term.2

Key message
Agree on what is valued. Select objectives that are important to the stakeholders' values. Do not be afraid of selecting one objective; focus and clarity are virtues, not vices. As much as possible, try to translate objectives into measurable indicators, but at the same time do not lose track of important aspects that are difficult to measure. In addition, keep in mind the dimensions of exposure time and the sustainability of changes.

Chapter 3
Carefully articulate the theories linking interventions to outcomes

When evaluators talk about the black box "problem," they are usually referring to the practice of viewing interventions primarily in terms of effects, with little attention paid to how and why those effects are produced. The common thread underlying the various versions of theory-based evaluation is the argument that "interventions are theories incarnate" and evaluation constitutes a test of intervention theory or theories.

3.1. Seeing interventions as theories: The black box and the contribution problem
Interventions are embodiments of theories in at least two ways. First, they comprise an expectation that the introduction of a program or policy intervention will help ameliorate a recurring social problem. Second, they involve an assumption or set of assumptions about how and why program activities and resources will bring about changes for the better. The underlying theory of a program often remains hidden, typically in the minds of policy architects and staff. Policies--be they relatively small-scale direct interventions like information campaigns, training programs, or subsidization; meso-level interventions such as public-private partnerships and social funds; or macro-level interventions such as "general budget support"--rest on social, behavioral, and institutional assumptions indicating why "this" policy intervention will work, which at first view are difficult to uncover.

By seeing interventions as theories and by using insights from theory-based evaluations, it is possible to open up the black box. Development policies and interventions, in one way or another, have to do with changing behavior/intentions/knowledge of households, individuals, and organizations (grass roots, private, and public sector). Crucial for understanding what can change behavior is information on behavioral and social mechanisms. An important insight from theory-based evaluations is that policy interventions are (often) believed to address and trigger certain social and behavioral responses among people and organizations; in reality this may not be the case.

3.2. Articulating intervention theories on impact
Program theory (or intervention theory) can be identified (articulated) and expressed in many ways--a graphic display of boxes and arrows, a table, a narrative description, and so on. The methodology for constructing intervention theory, as well as the level of detail and complexity, also varies significantly (e.g., Connell et al., 1995; Leeuw, 2003; Lipsey, 1993; McClintock, 1990; Rogers et al., 2000; Trochim, 1989; Wholey, 1987). Too often the role of methodology is neglected, and it is assumed that "intervention theories" are like manna falling out of the sky. That is not the case. Often the underlying theory has to be dug up. Moreover, much of what passes as theory-based evaluation today is simply a form of "analytic evaluation [which] involves no theory in anything like a proper use of that term" (Scriven, 1998: 59).

The intervention theory provides an overall framework for making sense of potential processes of change induced by an intervention. Several pieces of evidence can be used for articulating the intervention theory:

· An intervention's existing logical framework as a starting point for mapping causal assumptions linked to objectives, and other written documents produced within the framework of an intervention
· Insights provided by and expectations harbored by policy makers and staff (and other stakeholders) on how they think the intervention will affect/is affecting/has affected target groups
· (Written) evidence on past experiences of similar interventions (including those implemented by other organizations)
· Literature on mechanisms and processes of change in certain institutional contexts, for particular social problems, in specific sectors, etc.

Sometimes stakeholders have contrasting assumptions and expectations about an intervention's impact, which has implications for reconstructing the intervention theory. Basically, there are two ways to address this issue. The first is to try to combine the perspectives of different people (for example, program managers and target group members) into an overarching intervention theory that consists of (parts of) arguments from these different sources. The overall theory might be created through an iterative process of dialogue and refinement and as such might contribute to a shared vision among stakeholders (see, e.g., Pawson and Tilley, 1997). Second, when differences are substantial, several competing intervention theories have to be reconstructed. Carvalho and White (2004) give an example of a "theory" and an "anti-theory" dealing with the assumed impact of social funds (see box 3.1).

Box 3.1: Social funds and government capacity: Competing theories

Proponents of social funds argue they will develop government capacity in several ways. Principal among these are that the social fund will develop superior means of resource allocation and monitoring, which will be transferred to the government either directly through collaborative work or indirectly by copying the procedures shown to be successful by the social fund. But critics argue that social funds bypass normal government channels and so undermine government capacity, an effect reinforced by drawing away the government's best people by paying a project premium. Hence, these are rather different theories of how social funds affect government capacity. Carvalho and White (2004) refer to both sets of assumptions in terms of "theory" and "anti-theory." Their study found that well-functioning, decentralized social funds, such as the Zambia Social Investment Fund, worked through--rather than parallel to--existing structures and that the social fund procedures were indeed adopted more generally by district staff. But at the national level there was generally little evidence of either positive or negative effects on capacity--with some exceptions, such as the promotion of poverty mapping in some countries.

Source: Carvalho and White (2004).

For an example of what an impact theory might look like, consider the case of a small business development project that provides training to young managers who have started a business. The direct goal is to help make small businesses financially sustainable and the indirect goal is to generate more employment in the region. Closer scrutiny reveals that the project might have a positive influence on the viability of small businesses in two ways: First, by training young
people in basic management and accounting skills, the project intends to have a positive effect on financial viability and ultimately on the growth and sustainability of the business; second, by supporting the writing of a business plan, the project aims to increase the number of successful applications for credit with the local bank, which previously excluded the project's target group because of the small loan sizes (high transaction costs) and high risks involved. Following this second causal strand, efficient and effective spending of the loan is also expected to contribute to the strength of the business. Outputs are measured in terms of the number of people trained by the project and the number of loans the bank extends (see figure 3.1).

Figure 3.1: Basic intervention theory of a fictitious small business support project. [The figure shows two causal strands: small business owners (SBOs) receive training -> SBOs' capacity to manage the business increases -> growth and sustainability of the business -> employment generation in the region; and SBO writes business plan -> SBO receives loan from bank -> growth and sustainability of the business.]

Any further empirical analysis of the impact of the project requires insight into the different factors--besides the project itself--that affect small business development and employment generation. Even in this rather simple example, the number of external variables that affect the impact variables either directly or by moderating the causal relations specified in figure 3.1 is manifold. Some examples are the following:

· Short-term demands on the labor efforts of business owners in other activities may lead to suboptimal strategic choices, jeopardizing the sustainability of the business.
· Inefficient or ineffective use of loans because of short-term demands for cash for other expenditures might jeopardize repayment and the financial viability of the business.
· Deteriorating market conditions (in input or output markets) may jeopardize the future of the business.
· The availability and quality of infrastructure or skilled labor at any point may become constraining factors on business development prospects.
· The efforts of other institutions promoting small business development or any particular aspect of it might positively (or negatively) affect businesses.

Methods for reconstructing the underlying assumptions of project/program/policy theories are the following (see Leeuw, 2003):

· A policy-scientific method, which focuses on interviews, documents, and argumentation analysis
· A strategic assessment method, which focuses on group dynamics and dialogue
· An elicitation method, which focuses on cognitive and organizational psychology.

Central in all three approaches is the search for mechanisms that are believed to be "at work" when a policy is implemented. Box 3.2 discusses social and behavioral mechanisms for understanding impact.

3.3. Testing intervention theories on impact
After articulating the assumptions on how an intervention is expected to affect outcomes and impacts, the question arises as to what extent these assumptions are valid. In practice,

Box 3.2: Social and behavioral mechanisms as heuristics for understanding processes of change and impact

Hedström (2005: 25) has defined the concept of social mechanisms as "a constellation of entities and activities that are organized such that they regularly bring about a particular type of outcome." Mechanisms form the "nuts and bolts" (Elster, 1989) or the "engines" (Leeuw, 2003) of interventions (policies and programs), making them work, given certain contexts (Pawson and Tilley, 1997). Hedström and Swedberg (1998: 296-98), building on the work of Coleman (1990), discuss three types of mechanisms: situational mechanisms, action-formation mechanisms, and transformational mechanisms.

Examples of situational mechanisms are self-fulfilling and self-denying prophecies and crowding out (e.g., by striving to force people who are already largely compliant with laws and regulations into full compliance, the opposite is realized, because due to the extra focus on laws and regulation, the internal motivation of people to comply is reduced).

Action-formation mechanisms are the heuristics that people develop to deal with their bounded rationality, such as--

· Framing and the endowment effect--"the fact that people often demand much more to give up an object than they would be willing to pay to acquire it," but also the tendency for people to have a stronger preference for more immediate payoffs than for later payoffs, the closer to the present both payoffs are
· Types of learning (social learning, vicarious learning)
· "Game-theoretical" mechanisms, such as the "grim strategy" (to repeatedly refuse to cooperate with another party as a punishment for the other party's failure to cooperate previously) and the shadow-of-the-future/shadow-of-the-past mechanisms
· Mechanisms such as the "fight-or-flight" response to stress and the "tend-and-befriend" mechanism.

Transformational mechanisms illuminate how processes and results of interacting individuals and groups are "transformed" into collective outcomes. Examples are the following:

· [...] is related to this, as are group think, the common knowledge effect, and herd behavior.
· "Tipping points," "where a small additional effort can have a disproportionately large effect, can be created through virtuous circles, or be a result of achieving certain critical levels" (Rogers, 2008: 35).

Relevance of mechanisms for impact evaluations
Development policies and interventions, in one way or another, have to do with changing behavior/intentions/knowledge of households, individuals, and organizations (grass roots, private, and public sector). Crucial for understanding what can change behavior is information about these mechanisms. The mechanisms underlying processes of change might not necessarily be those that are assumed to be at work by policy makers, program designers, and staff. Creating awareness on the basis of (public) information campaigns does not always lead to behavioral change. Subsidies and other financial incentives run the risk of causing unintended side effects, such as benefit snatching, but also create the "Mitnahme effect" (people already tended to behave in the way the incentive wanted them to behave before the incentive existed). Mentoring dropouts in education might cause "learned helplessness" and therefore increase dropout rates. Many other examples are available in the literature. The relevance of knowing which social and behavioral mechanisms are believed to do the work increases as the complication and complexity of interventions increase.

A focus on mechanisms helps evaluators and managers open up and test the theory underlying an intervention. Spending time and money on programs based on "pet theories" of policy makers or implementation agents that are not corroborated by relevant research should probably not be high on the agenda. If a policy intervention is based on mechanisms that are known not to work (in a given context or in general), that is a signal that the intervention probably will not be very effective.
This can be found out on the basis of desk research as a first test of the relevance and validity · Cascading is a process by which people influence one another, of an intervention theory, that is, by confronting the theory with so much so that participants ignore their private knowledge existing knowledge about mechanisms. That knowledge stems and rely instead on the publicly stated judgments of others. from synthesis and review studies (see chapter 6). Further em- The bandwagon phenomenon (the tendency to do [or believe] pirical impact evaluation can generate more contextualized and things because many other people do [or believe] the same) precise tests of the intervention theory. 18 c a r E f u l ly a r t I c u l at E t h E t h E o r I E s l I n k I n G I n t E r v E n t I o n s t o o u t c o m E s evaluators have at their disposal a wide range of · The theory of change--or key elements methods and techniques to test the intervention thereof--is verified by evidence: the chain of theory. We can distinguish between two broad expected results occurred. approaches. The first is that the theory consti- · Other influencing factors have been assessed tutes the basis for constructing a "causal story" and either shown not to have made a sig- about how and to what extent the intervention nificant contribution or their relative role in has produced results. Usually different methods contributing to the desired result has been and sources of evidence are used to further refine recognized. the theory in an iterative manner until a credible and reliable causal story has been generated. The analysis is best done iteratively, building up The second approach is to use the theory as a more robust assessment of causal contribu- an explicit benchmark for testing (some of) tion. The overall aim is to reduce the uncertainty the assumptions in a formal manner. 
Besides about the contribution the intervention is making providing a benchmark, the theory provides the to the observed results through an increased template for method choice, variable selection, understanding of why the observed results have and other data collection and analysis issues. This occurred (or not) and the roles played by the approach is typically applied in statistical analysis intervention and other factors. At the impact level but is not in any way restricted to this type of this is the most challenging, and a "contribution method. In short, theory-based methodological story" has to be developed for each major strategy designs can be situated anywhere in between that is part of an intervention, at different levels "telling the causal story" and "formally testing of analysis. They would be linked, as each would causal assumptions." treat the other strategies as influencing factors. The systematic development and corrobo- One of the key challenges in the foregoing ration of the causal story can be achieved analysis is to pinpoint the exact causal effect from through causal contribution analysis (Mayne, intervention to its impact. Despite the potential 2001), which aims to demonstrate whether the strength of the causal argumentation on the evaluated intervention is one of the causes of links between the intervention and impact, and observed change. Contribution analysis relies despite the possible availability of data on indica- on chains of logical arguments that are verified tors, as well as data on contributing factors, etc., through careful analysis. Rigor in causal contri- there remains uncertainty about the magnitude bution analysis involves systematically identify- of the impact as well as the extent to which the ing and investigating alternative explanations for changes in impact variables are really due to the observed impacts. This includes being able to intervention or to other influential variables. 
This rule out implementation failure as an explana- is called the attribution problem and is discussed tion for lack of results and developing testable in chapter 4. hypotheses and predictions to identify the conditions under which interventions contribute to specific impacts. Key message Carefully articulate the assumptions behind the The causal story is inferred from the following theories linking interventions to outcomes. What evidence: are the causal pathways linking intervention out- puts to processes of change and impact? Be criti- · There is a reasoned theory of change for the cal if an "intervention theory" appears to assert or intervention: it makes sense, is plausible, and assume changes without much explanation. The is agreed to by key players. focus should be on dissecting the causal (social, · The activities of the intervention were imple- behavioral, and institutional) mechanisms that make mented. interventions "work." 19 Chapter 4 Address the attribution problem M ultiple factors can affect the livelihoods of individuals or the capaci- ties of institutions. For policy makers as well as stakeholders it is important to know what the added value of the policy intervention is, apart from these other factors. 4.1. the attribution problem time spent fetching water. If nothing else of The attribution problem is often referred to as importance happened during the period under the central problem in impact evaluation. The study, attribution is so clear that there is no need central question is to what extent changes in to resort to anything other than before versus outcomes of interest can be attributed to a after to determine this impact. particular intervention. Attribution refers to both isolating and estimating accurately the particu- In general, the observed changes are only partly lar contribution of an intervention and ensuring caused by the intervention of interest. 
Other that causality runs from the intervention to the interventions inside or outside the core area will outcome. often interact and strengthen/reduce the effects of the intervention of interest for the evaluation. The changes in welfare for a particular group of In addition, other unplanned events or general people can be observed by undertaking before change processes will often influence develop- and after studies, but these rarely accurately ment, such as natural catastrophes, urbaniza- measure impact. Baseline data (before the tion, growing economies, business cycles, war, intervention) and end-line data (after the or long-term climate change. For example, intervention) give facts about the development in evaluating the impact of microfinance on over time and describe "the factual" for the poverty, we have to control for the influences treatment group (not the counterfactual). But of changing market conditions, infrastruc- changes observed by comparing before-after (or ture developments, or climate shocks such as pre-post) data are rarely caused by the interven- droughts, and so on. tion alone, as other interventions and processes influence developments, both in time and space. A discussion that often comes up in impact There are some exceptions in which before evaluation is the issue of attribution of what. versus after will suffice to determine impact. For This issue is complementary to the indepen- example, supplying village water pumps reduces dent variable question discussed in chapter 1. 
How the impact of the intervention is measured may be stated in several ways:

· What is the impact of an additional dollar of funding to program X?¹
· What is the impact of country Y's contribution to a particular intervention?
· What is the impact of intervention Z?

In this guidance we will focus on the third level of attribution: What is the impact of a particular policy intervention (from very simple to complex), independent of the specific monetary and nonmonetary contributions of the (institutional) actors involved?

The issue of attributing impact to a particular intervention can be quite complicated in itself (especially for complicated interventions such as sector strategies or programs). Additional levels of attribution, such as tracing impact back from interventions to the specific (financial) contributions of different donors, are either meaningless or too complicated to achieve in a pragmatic and cost-effective manner.

Analyzing attribution requires comparing the situation "with" an intervention to what would have happened in the absence of the intervention, the "without" situation (the counterfactual). Such comparison of the situation with and without the intervention is challenging because it is not possible to observe how the situation would have been without the intervention; the counterfactual has to be constructed by the evaluator. The counterfactual is illustrated in figure 4.1.

Figure 4.1: Graphic display of the net impact of an intervention
[Diagram: the value of a target variable over time, before and after the intervention; point b is the value before the intervention, point a the observed value after it, and point c the value the variable would have had without the intervention.]

The value of a target variable (point a) after an intervention should not be regarded as the intervention's impact, nor is the impact simply the difference between the before and after situation (a–b, measured on the vertical axis). The net impact (at a given point in time) is the difference between the target variable's value after the intervention and the value the variable would have had if the intervention had not taken place (a–c).

The starting point for an evaluation is a good account of the factual--what happened in terms of the outputs/outcomes targeted by the intervention? A good account of the factual requires articulating the intervention theory (or theories) and connecting the different causal assumptions from intervention outputs to outcomes and impacts, as discussed in chapter 3. This guidance will discuss several options for measuring the counterfactual.

Evaluations can be either experimental, as when the evaluator purposely collects data and designs the evaluation in advance, or quasi-experimental, as when data are collected to mimic an experimental situation. Multiple regression analysis is an all-purpose technique that can be used in virtually all settings (provided that data are available); when the experiment is organized in such a way that no controls are needed, a simple comparison of means can be used instead of a regression, because both will give the same answer. (Experimental and quasi-experimental approaches will be discussed in § 4.2.) We briefly introduce the general principles and the most common approaches. The idea of (quasi-)experimental counterfactual analysis is that the situation of a participant group (receiving benefits from/affected by an intervention) is compared over time with the situation of an equivalent comparison group that is not affected by the intervention.

Several designs exist that combine ex ante and ex post measurements of participant and control groups (see § 4.2). Randomization of intervention participation is considered the best way to create equivalent groups.
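The distinction between the naive before-after difference (a–b) and the net impact (a–c) of figure 4.1 is plain arithmetic once the three points are known. The sketch below uses hypothetical numbers, chosen only to illustrate the calculation; in practice point c can never be observed and has to be estimated, for example from a comparison group.

```python
# Hypothetical values for the target variable in figure 4.1.
# "b" and "a" are observed; "c" (the counterfactual) cannot be observed
# directly and has to be constructed by the evaluator.

b = 50.0  # value of the target variable before the intervention
a = 70.0  # observed value after the intervention
c = 62.0  # estimated value had the intervention not taken place

before_after = a - b  # naive before-after change: 20.0
net_impact = a - c    # net impact of the intervention: 8.0

# The naive comparison attributes the whole change (20.0) to the
# intervention; the counterfactual comparison shows that 12.0 of it
# would have happened anyway.
print(before_after, net_impact)
```

The gap between the two numbers is exactly the change that other interventions and general development processes would have produced without the intervention.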
Random assignment to the participant and control groups leads to groups with similar average characteristics² for both observables and non-observables, except for the intervention. As a second-best alternative, several matching techniques (e.g., propensity score matching) can be used to create control groups that are as similar to participant groups as possible (see below).

4.2. Quantitative methods addressing the attribution problem³
In this section we discuss experimental (e.g., randomized controlled trials), quasi-experimental (e.g., propensity score matching), and regression-based techniques.⁴ ⁵

Three related problems that quantitative impact evaluation methods attempt to address are the following:

· The establishment of a counterfactual: What would have happened in the absence of the intervention(s)?
· The elimination of selection effects, leading to differences between the intervention group (or treatment group) and the control group
· A solution for the problem of unobservables: the omission of one or more unobserved variables, leading to biased estimates.

Selection effects occur, for example, when those in the intervention group are more or less motivated than those in the control group. This is particularly a problem when the variable in question, in this case motivation, is not easily observable. As long as selection is based on observable characteristics and these are measured in the evaluation, they may be included--and thus controlled for--in the regression analysis. However, not all relevant characteristics are observed or measured. This problem of selection on unobservables is one of the main problems in impact evaluation.

In the following sections we will discuss different techniques of quantitative impact evaluation, mainly focusing our discussion on the selection bias issue. In trying to deal systematically with selection effects, (quasi-)experimental design-based approaches such as the randomized controlled trial (RCT) or the pipeline approach can be compromised by two sets of problems: contamination and unintended behavioral responses.

Contamination: Contamination (or contagion, treatment diffusion) refers to the problem of groups of people who are not supposed to be exposed to certain project benefits but in fact are benefiting from them. Contamination comes from two possible sources. The first is the intervention itself, as a result of spill-over effects. Interventions are most often planned and implemented within a delimited space (a village, district, nation, region, or institution). The influence zone of an intervention may, however, be larger than the core area where the intervention takes place or is intended to generate results (geographical spill-over effects). To avoid contamination, control and comparison groups must be located outside the influence zone. Second, the selected comparison group may be subject to similar interventions implemented by different agencies, or even to somewhat dissimilar interventions that affect the same outcomes. The counterfactual is thus a different type of intervention rather than no intervention. This problem is often overlooked. A good intervention theory, used as the basis for designing a measurement instrument that records the different potential problems of contamination, is a good way to address this problem.

Unintended behavioral responses: In any experiment people may behave differently when they know that they are part of the intervention or treatment; consequently, this will affect the data. The resulting bias is even more pronounced when the researcher has to rely on recall data or self-reported effects. Several unintended behavioral responses not caused by the intervention or by "normal" conditions might therefore disrupt the validity of comparisons between groups and hence the ability to attribute changes to project incentives. Important possible effects are the following (see Shadish et al., 2002; Rossi et al., 2004):

· Expected behavior or compliance behavior: Participants react in accordance with intervention staff expectations, for reasons such as compliance with the established contract or certain expectations about future benefits from the organization (not necessarily the project).
· Compensatory equalization: Discontent among staff or recipients with inequality between incentives might result in compensation of groups that receive less than other groups.
· Compensatory rivalry: Differentiation of incentives to groups of people might result in social competition between those receiving (many) intervention benefits and those receiving fewer or no benefits.
· Hawthorne effect: The fact of being part of an experiment, rather than the intervention as such, causes people to change their behavior.
· Placebo effect: The behavioral effect is not the result of the incentives provided by the intervention but of people's perception of the incentives and the subsequent anticipatory behavior.

These problems are relevant in most experimental and quasi-experimental design approaches that are based on ex ante participant and control/comparison group designs.⁶ They are less relevant in regression-based approaches that use statistical matching procedures or that do not rely on the participant-control group comparison for counterfactual analysis.⁷

4.2.1. Randomized controlled trial
The safest way to avoid selection effects is a randomized selection of the intervention and control groups before the experiment starts. When the experimental group and the control group are selected randomly from the same population, both groups will have similar average characteristics (except that one group has been subjected to the intervention and the other has not). Consequently, in a well-designed and correctly implemented RCT, a simple comparison of average outcomes in the two groups can adequately resolve the attribution problem and yield accurate estimates of the impact of the intervention on a variable of interest; by design, the only difference between the two groups was the intervention.

To determine whether the intervention had a statistically significant impact, one simply performs a test of equality between the mean outcomes in the experiment and control groups. Statistical analysis will tell you whether the impact is statistically significant and how large it is. Of course, with larger samples the statistical inferences will be increasingly precise; but if the impact of an intervention really is large, it can be detected and measured even with a relatively small sample.

A proper RCT addresses many attribution issues but has to be planned and managed carefully to avoid contamination and other risks. Risks of an RCT are (i) different rates of attrition in the two groups, possibly caused by a high dropout in one of the two groups; (ii) spillover effects (contamination), resulting in the control group receiving some of the treatment; and (iii) unintended behavioral responses.

4.2.2. Pipeline approach
One of the problems for the evaluation of development projects or programs is that evaluators rarely get involved early enough to design a good evaluation (although this is changing). Often, households or individuals are selected for a specific project, but not everybody participates (directly) in the project. A reason may be a gradual implementation of the project. Large projects (such as in housing or construction of schools) normally have a phased implementation.

In such a case, it may be possible to exploit this phasing of the project by comparing the outcomes of households or communities that actually participate (the experiment group) with households or communities that are selected but do not participate (the comparison group). A specific project (school building) may start, for instance, in a number of villages and be implemented later in other villages. This creates the possibility of evaluating the effect of school building on enrollment. One has to be certain, of course, that the second selection--the actual inclusion in the project--does not introduce a selection bias. If, for instance, at the start of the project a choice is made to start construction in a number of specific villages, the (relevant) characteristics of these villages must be similar to those of other villages that are eligible for new schools. Self-selection (of villages that are eager to participate) or other selection criteria (starting in remote areas or in urban areas) may introduce a selection bias.

4.2.3. Propensity score matching
When no comparison group has been created at the start of the project or program, a comparison group may be created ex post through a matching procedure: for every member of the treatment group, one or more members of a control group are selected on the basis of similar observed (and relevant) characteristics.

Suppose there are two groups, one a relatively small intervention group of 100 pupils who will receive a specific reading program. If we want to analyze the effects of this program, we must compare the results of the pupils in the program with other pupils who were not included in the program. We cannot select just any control group, because the intervention group may have been self-selected on the basis of specific characteristics (pupils with relatively good results or relatively bad results, pupils from rural areas, from private schools or public schools, boys, girls, orphans, etc.). Therefore, we need to select a group with similar characteristics. One way of doing this would be to find, for every boy age 10 from a small rural school with a high pupil:teacher ratio in a poor district, another boy with the same observed characteristics. This would be a time-consuming procedure, especially for 100 pupils.

An alternative way to create a control group for this case is the method of propensity score matching. This technique involves forming pairs, not by matching every characteristic exactly, but by selecting groups that have similar probabilities of being included in the sample as the treatment group. The technique uses all available information to construct a control group (see box 4.1).⁸ Rosenbaum and Rubin (1983) showed that this method makes it possible to create a control group ex post with characteristics similar to those the intervention group would have had if its members had been selected randomly before the beginning of the project.

It should be noted that the technique only deals with selection bias on observables and does not solve potential endogeneity bias (see appendix 4), which results from the omission of unobserved variables. Nevertheless, propensity score matching may be combined with the technique of double differencing to correct for the influence of time-invariant unobservables (see below). Moreover, the technique may require a large sample for the selection of the comparison group, which might pose a problem if secondary data are not available (see chapter 8).

Box 4.1: Using propensity scores to select a matched comparison group--The Vietnam Rural Roads Project
The survey sample included 100 project communes and 100 non-project communes in the same districts. Using the same districts simplified survey logistics and reduced costs, but communes were still far enough apart to avoid "contamination" (control areas being affected by the project). A logit model of the probability of participating in the project was used to calculate the propensity score for each project and non-project commune. Comparison communes were then selected with propensity scores similar to the project communes. The evaluation was also able to draw on commune-level data collected for administrative purposes that cover infrastructure, employment, education, health care, agriculture, and community organization. These data will be used for contextual analysis, to construct commune-level indicators of welfare, and to test program impacts over time. The administrative data will also be used to model the process of project selection and to assess whether there are any selection biases.
Sources: Van De Walle and Cratty (2005); Bamberger (2006).

4.2.4. Judgmental matching⁹
A less precise method for selecting control groups uses descriptive information from, for example, survey data to construct comparison groups.
Matching areas on observables. In consultation with clients and other knowledgeable persons, the researcher identifies characteristics that should be matched (e.g., access to services, type or quality of house construction, economic level, location, or types of agricultural production). Information from maps (sometimes including geographic information system data and/or aerial photographs), observation, secondary data (e.g., censuses, household surveys, school records), and key informants is then combined to select comparison areas with the best match of characteristics. Operating under real-world constraints means that it will often be necessary to rely on easily observable or identifiable characteristics (e.g., types of housing and infrastructure). Although this may expedite matters, there may also be unobservable differences; the researcher must address these as much as possible through qualitative research and attach the appropriate caveats to any results.

Matching individuals or households on observables. Procedures similar to those noted above can be used to match individuals and households. Sample selection can sometimes draw on existing survey data or ongoing household surveys; in many cases, however, researchers must find their own ways to select the sample. Sometimes the selection is based on physical characteristics that can be observed (type of housing, distance from water and other services, type of crops or area cultivated), whereas in other cases selection is based on characteristics that require screening interviews (e.g., economic status, labor market activity, school attendance). In these latter cases, the interviewer must conduct quota sampling.

4.2.5. Double difference (difference-in-difference)
Differences between the intervention group and the control group may be unobserved and therefore problematic. Nevertheless, even though such differences cannot be measured, the technique of double difference (or difference-in-difference) deals with them as long as they are time invariant. The technique measures the differences between the two groups, before and after the intervention (hence the name double difference).

Suppose there are two groups, an intervention group I and a control group C. One measures, for instance, enrollment rates before (0) and after (1) the intervention. According to this method, the effect is

(I1 – I0) – (C1 – C0), or equivalently, (I1 – C1) – (I0 – C0).

For example, if enrollment rates at t = 0 were 80% for the intervention group and 70% for the control group, and at t = 1 these rates were, respectively, 90% and 75%, then the effect of the intervention would be (90% – 80%) – (75% – 70%) = 5 percentage points.

Table 4.1: Double difference and other designs

                          Intervention group   Control group   Difference across groups
Baseline                  I0                   C0              I0 – C0
Follow-up                 I1                   C1              I1 – C1
Difference across time    I1 – I0              C1 – C0         Double difference: (I1 – C1) – (I0 – C0) = (I1 – I0) – (C1 – C0)

Source: Adapted from Maluccio and Flores (2005).

The techniques of propensity score matching (see above) and double difference may be combined. Propensity score matching increases the likelihood that the treatment and control groups have similar characteristics, but it cannot guarantee that all relevant characteristics are included in the selection procedure. The double difference technique can eliminate the effects of an unobserved selection bias, but it may work better when differences between the intervention group and the control group are eliminated as much as possible.
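The double-difference arithmetic of § 4.2.5, including the enrollment-rate example from the text, is a one-line calculation:

```python
# Double difference (difference-in-difference) as defined in § 4.2.5.

def double_difference(i0, i1, c0, c1):
    """(I1 - I0) - (C1 - C0); algebraically equal to (I1 - C1) - (I0 - C0)."""
    return (i1 - i0) - (c1 - c0)

# Enrollment-rate example from the text (rates in percent).
effect = double_difference(i0=80, i1=90, c0=70, c1=75)
print(effect)  # 5 -> the intervention raised enrollment by 5 percentage points

# Both orderings of the subtraction give the same answer:
assert double_difference(80, 90, 70, 75) == (90 - 75) - (80 - 70)
```

Note that the estimate is unbiased only under the assumption carried through the rest of the section: any unobserved differences between the two groups must be constant over time.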
The approach to deal with unobserved selection effects is the eliminates initial differences between the two application of the "difference-in-difference" groups (e.g., differences in enrollment rates) approach in a regression model (see appendix and therefore gives an unbiased estimate of the 4). In such a model we do not analyze the (cross- effects of the intervention, as long as these differ- section) effects between groups, but the changes ences are time invariant. When an unobserved (within groups) over time. Instead of taking the variable is time variant (changes over time), the specific values of a variable in a specific year, we measured effect will still be biased. analyze the changes in these variables over time. In such an analysis, unobserved time-invariant 4.2.6. Regression analysis and double variables drop from the equation.10 difference In some programs the interventions are all or Again, the quality of this method as a solution nothing (a household or individual is subjected depends on the validity of the assumption that to the intervention or not); in others they vary unobservables are time invariant. Moreover, the continuously over a range, as when programs vary quality of the method also depends on the quality the type of benefit offered to target groups. One of the underlying data. The method of double example is a cash transfer program or a microfi- differencing is more vulnerable than some other nance facility where the amount transferred or methods to the presence of measurement error loaned may depend on the income of the partici- in the data. pant; improved drinking water facilities are another example. These facilities differ in capacity 4.2.7. Instrumental variables and are implemented in different circumstances An important problem when analyzing the impact with beneficiaries living at different distances to of an intervention is the problem of endogeneity. these facilities. 
4.2.7. Instrumental variables

An important problem when analyzing the impact of an intervention is the problem of endogeneity. The most common example of endogeneity is when a third variable causes two other variables to correlate without there being any causality. For example, doctors are observed to be frequently in the presence of people with fevers, but doctors do not cause the fevers; it is the third variable (the illness) that causes the two other variables to correlate (people with fevers and the presence of doctors). In econometric language, when there is endogeneity an explanatory variable will be correlated with the error term in a mathematical model (see appendix 4). When an explanatory variable is endogenous, it is not possible to give an unbiased estimate of the causal effect of this variable.

Selection effects also give rise to bias. Consider the following example. Various studies in the field of education find that repeaters produce lower test results than non-repeaters. A preliminary and false conclusion would be that repetition does not have a positive effect on student performance and that it is simply a waste of resources. But such a conclusion neglects the endogeneity of repetition: intelligent children with well-educated parents are more likely to perform well and therefore not repeat. Less intelligent children, on the other hand, will probably not achieve good results and are therefore more likely to repeat. So the two groups of pupils (i.e., repeaters and non-repeaters) have different characteristics, which at first view makes it impossible to draw conclusions based on a comparison between them.

The technique of instrumental variables is used to address the endogeneity problem. An instrumental variable (or instrument) is a third variable that is used to get an unbiased estimate of the effect of the original endogenous variable (see appendix 4). A good instrument correlates with the original endogenous variable in the equation, but not with the error term. Suppose a researcher is interested in the effect of a training program. Actual participation in the program may be endogenous because, for instance, the most motivated employees may subscribe to the training. Therefore, one cannot compare employees who had the training with employees who did not without incurring bias. The effect of the training may be determined if a subset were assigned to the training by accident or through some process unrelated to personal motivation. In this case, the instrumental variables procedure essentially uses only data from that subset to estimate the impact of training.

4.2.8. Regression discontinuity analysis

The basic idea of regression discontinuity analysis is simple. Suppose program participation depends on income. On the left side of the cut-off point, people (or households) have an income that is just low enough to be eligible for participation; on the right side of the cut-off point, people are no longer allowed to participate, even though their income is only slightly higher. There may be more criteria that define the threshold, and these criteria may be explicit or implicit. Regression discontinuity analysis compares the treatment group with the control group at the cut-off point. At that point, it is unlikely that there are unobserved differences between the two groups.

Suppose we want to analyze the effect of a specific program to improve learning achievements. This program focuses on the poorest households: the program includes only households with an income below a certain level. We know that learning achievements are correlated with income,11 and therefore we cannot compare households participating in the program with households that do not participate. Other factors may also induce an endogeneity bias (such as differences in the educational background of parents or the distance to the school). Nevertheless, at the cut-off point there is no reason to assume that there are systematic differences between the two groups of households (apart from small differences in income). Estimating the impact can now be done, for example, by comparing the mean difference between the regression line of learning achievements as a function of income before the intervention with the regression line after (see figure 4.2).

[Figure 4.2: Regression discontinuity analysis. Learning achievements (standardized score) plotted against income (local currency).]

A major disadvantage of a regression discontinuity design is that the method assesses the marginal impact of the program only around the cut-off point for eligibility. Moreover, it must be possible to construct a specific threshold, and individuals should not be able to manipulate the selection process (ADB, 2006: 14). Many researchers prefer regression discontinuity analysis over propensity score matching, because the technique generates a higher likelihood that estimates will not be biased by unobserved variables.12
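The comparison of regression lines at the cut-off can be sketched in plain Python. All numbers here are invented for illustration; fitting a separate line on each side of the threshold and comparing the two fitted lines at the cut-off is one simple way to implement the idea described above:

```python
# Toy regression discontinuity sketch (invented numbers): households
# with income below a cut-off join the program, and learning
# achievement also rises smoothly with income. Fitting a regression
# line on each side of the cut-off and comparing the two lines AT the
# cut-off isolates the jump caused by the program.
CUTOFF = 50.0        # eligibility threshold (income, local currency)
TRUE_EFFECT = 2.0    # jump in achievement caused by the program

def achievement(income):
    baseline = 1.0 + 0.05 * income      # smooth relation with income
    return baseline + (TRUE_EFFECT if income < CUTOFF else 0.0)

def fit_line(xs, ys):
    # ordinary least squares for y = a + b * x, in pure Python
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b               # intercept, slope

xs_left = [40.0, 42.0, 44.0, 46.0, 48.0]    # just eligible
xs_right = [52.0, 54.0, 56.0, 58.0, 60.0]   # just ineligible
a_l, b_l = fit_line(xs_left, [achievement(x) for x in xs_left])
a_r, b_r = fit_line(xs_right, [achievement(x) for x in xs_right])

# Gap between the two regression lines at the cut-off = program effect.
rd_estimate = (a_l + b_l * CUTOFF) - (a_r + b_r * CUTOFF)
print(round(rd_estimate, 2))  # close to the true effect of 2.0
```

Because the income gradient is estimated and netted out on both sides, the remaining gap at the threshold is attributable to the program, which is precisely why the estimate is only valid near the cut-off.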
4.3. Applicability of quantitative methods for addressing the attribution problem

There are some limitations to the applicability of the techniques discussed in the previous section. We briefly highlight some of the more important ones (for a more comprehensive discussion see, e.g., Bamberger and White, 2007). First, in general, counterfactual estimation is not applicable in full-coverage interventions such as price policies or regulation on land use, which affect everybody (although to different degrees). In this case there are still possibilities to use statistical "counterfactual-like" analyses, such as those that focus on the variability in exposure/participation in relation to changes in an outcome variable (see, e.g., Rossi et al., 2004). Second, there are several pragmatic constraints to applying this type of analysis, especially with respect to randomization and other design-based techniques. For example, there might be ethical objections to randomization or a lack of data representing the baseline situation of intervention target groups (see chapter 8). Third, the applicability of quantitative approaches (experimental and non-experimental) also largely depends on the number of observations (n) available for analysis. Quantitative analysis is only meaningful if n is reasonably large: statistically based approaches are not applicable if there is a small n. The small n problem can arise either because the intervention was applied to a single unit (e.g., capacity building in a single ministry or a national policy change) or a small number of units, or because there is heterogeneity in the intervention so that only a small number of units received support of a specific type. Where there is a small n, a variety of other approaches can be used (see § 4.4.).

An important critique of the applicability of these methods refers to the nature of the intervention and the complexity of the context in which the intervention is embedded. The methodological difficulties of evaluating complicated interventions can to some extent be "neutralized" by deconstructing them into their "active ingredients" (see, e.g., Vaessen and Todd, 2008).13 Consider the example of school reform in Kenya as described by Duflo and Kremer (2005). School reform constitutes a set of different simultaneous interventions at different levels, ranging from revisions in and decentralization of the budget allocation process, to addressing links between teacher pay and performance, to vouchers and school choice. Although the total package of interventions constituting school reform represents an impressive landscape of causal pathways of change at different levels, directly and indirectly affecting individual school, teacher, and student welfare in different ways, it can be unpacked into different (workable) components, such as teacher incentives and their effects on student performance indicators, or school vouchers and their effects on student performance.

True experimental designs have been relatively rare in development settings (though not rare in developing countries, as medical tests routinely use a randomized approach). Alternatively, quasi-experiments using non-random assignment to participant and control groups are more widely applicable. Preferably, double difference (participant-control group comparisons over time) designs should be used. However, it is more usual for impact assessments to be based on less rigorous--and less reliable--designs, where:

· Baseline data are reconstructed or collected late during the implementation phase.
· Baseline data are collected only for the treatment group.
· There are no baseline data for the treatment or control group.

If no baseline data exist, then the impact of the intervention is measured by comparing the situation afterward between the treatment and control groups. This comparison of end-line data is measured by a single difference (see also appendix 14).

Some impact evaluations are based on pure "before and after" comparisons of change only for the treatment group, with no comparison group at all. The measure in such cases is also a single difference, but the lack of a proxy for the counterfactual makes conclusions based on this design less robust. This design gives a valid measure of impacts only in the rare situations when no other factors can explain the observed change, or when the intervention of interest is the only factor influencing the conditions. In other words, all other factors are stable, or there are no cause-effect relationships other than between the intervention and the observed change. Systematic control of the influence of other factors can significantly increase the reliability of findings (see also chapter 8).

Some final remarks on attribution are in order. Given the centrality of the attribution issue in impact evaluation, we concur with many of our colleagues that there is scope for more quantitative impact evaluation, as these techniques offer a comparative advantage in formally addressing the counterfactual. Therefore, with a relatively large n, a quantitative approach is usually preferred. At the same time, it must be admitted that, given the limitations discussed above, the application of experimental and quasi-experimental design-based approaches will necessarily be limited to only a part of the total set of interventions in development.14

The combination of theory-based evaluation and quantitative impact evaluation provides a powerful methodological basis for rigorous impact evaluation for several reasons:

· The intervention theory will help indicate which of the intervention components are amenable to quantitative counterfactual analysis through, for example, quasi-experimental evaluation, and how this part of the analysis relates to other elements of the theory.15
· The intervention theory approach will help identify key determinants of impact variables to be taken into account in a quantitative impact evaluation.
· The intervention theory approach can provide a basis for analyzing how an intervention affects particular individuals or groups in different ways; although quantitative impact evaluation methods typically result in quantitative measures of average net effects of an intervention, an intervention theory can help support the analysis of the distribution of costs and benefits (see chapter 5).
· The intervention theory can help strengthen the interpretation of findings generated by quantitative impact evaluation techniques.

This symbiosis between theory-based evaluation and quantitative impact evaluation has been acknowledged by a growing number of authors, both in the general impact evaluation literature (e.g., Cook, 2000; Shadish et al., 2002; Rossi et al., 2004; Morgan and Winship, 2007) and in the literature on development impact evaluation (e.g., Bamberger et al., 2004; Bourguignon and Sundberg, 2007; Ravallion, 2008). When this combination is not feasible, alternative methods embedded in a theory-based evaluation framework should be applied.
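The design hierarchy discussed in this section, single difference versus double difference, can be made concrete with a toy calculation. The numbers are invented: a secular trend improves everyone's outcome between baseline and follow-up, so a before-and-after comparison of the treatment group alone wrongly credits the trend to the program:

```python
# Invented numbers: everyone's outcome improves by 3 between baseline
# and follow-up (secular trend); the program adds a true effect of 4.
trend, true_effect = 3.0, 4.0

treat_before, control_before = 10.0, 9.0
treat_after = treat_before + trend + true_effect    # 17.0
control_after = control_before + trend              # 12.0

# "Before and after" on the treatment group only (single difference):
# the secular trend is wrongly attributed to the program.
single_difference = treat_after - treat_before
print(single_difference)    # 7.0 -- overstates the effect

# Comparing only end-line data across groups is also a single
# difference; it is biased by the groups' initial difference.
endline_difference = treat_after - control_after
print(endline_difference)   # 5.0 -- still biased

# Double difference nets out both the common trend and the initial gap.
double_difference = ((treat_after - treat_before)
                     - (control_after - control_before))
print(double_difference)    # 4.0 -- the true effect
```

The toy example also shows why baseline data are critical: without the two "before" measurements, only the biased end-line comparison is available.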
4.4. Other approaches

In this section we introduce a range of methodological approaches that can be used to address the attribution problem or particular aspects of the impact evaluation.16

4.4.1. Alternative approaches for addressing the attribution problem

The methods discussed in the previous sections have the advantage of allowing for an estimation of the magnitude of change attributable to a particular intervention using counterfactual analysis. There are also other (qualitative) methods that can be useful in addressing the issue of attribution. However, these methods as such do not quantify effects attributable to an intervention.17

A first example of an alternative approach is the so-called General Elimination Methodology (GEM). This approach is epistemologically related to Popper's falsification principle. Michael Scriven added it to the methodology of (impact) evaluations. Although in some papers he suggested that the GEM approach was particularly relevant for dissecting causality chains within case studies, both in his earlier work and in a more recent paper (Scriven, 1998) he makes clear that the GEM approach is relevant for every type of expert practice, including RCTs and case studies (see appendix 2 for a more detailed discussion).

What is the relevance of this approach for impact evaluation? Given the complexity of solving the attribution problem, GEM can help "test" different counterfactuals that have been put forward in a theoretical way. When doing (quasi-)experiments, using GEM can be an extra check on the validity of the conclusions and can help one understand why the results are as they are. Pawson and Tilley (1997) criticized experimentalists by highlighting what they perceive as a lack of attention to explanatory questions in (quasi-)experiments. Consequently, GEM can be helpful by involving the evaluator in setting up a "competition" between the conclusions from the evaluation and possible other hypotheses.

A second example is causal contribution analysis (see Mayne, 2001; described in chapter 3). Contribution analysis relies on chains of logical arguments that are verified through careful analysis. Rigor in this type of causal analysis involves systematically identifying and investigating alternative explanations for observed impacts. This includes being able to rule out implementation failure as an explanation for lack of results, and developing testable hypotheses and predictions to identify the conditions under which interventions contribute to specific impacts.

Some of these hypotheses can be tested using the quantitative methods discussed previously. In this sense, contribution analysis and other variants of theory-based analysis provide a framework in which quantitative methods of impact evaluation can be used to test particular causal assumptions. If the latter is not possible, the verification and refinement of the causal story should rely exclusively on other (multiple) methods of inquiry (see chapter 5).

4.4.2. Participatory approaches18

Nowadays, participatory methods have become mainstream tools in development in almost every area of policy intervention. The roots of participation in development lie in the rural sector, where Chambers (1995) and others developed the now widely used principles of participatory rural appraisal. Participatory evaluation approaches (see, e.g., Cousins and Whitmore, 1998) are built on the principle that stakeholders should be involved in some or all stages of the evaluation. As Greene (2006: 127ff) illustrates, "[P]articipatory approaches to evaluation directly engage the micropolitics of power by involving stakeholders in important decision-making roles within the evaluation process itself. Multiple, diverse stakeholders collaborate as co-evaluators, often as members of an evaluation team." Participatory evaluation can be perceived as a developmental process in itself, largely because it is "the process that counts" (Whitmore, 1991). In the case of impact evaluation, participation includes aspects such as the determination of objectives and the indicators to be taken into account, as well as stakeholder participation in data collection and analysis. In practice it can be useful to differentiate between stakeholder participation as a process and stakeholder perceptions and views as sources of evidence (Cousins and Whitmore, 1998).

Participatory approaches to impact evaluation can be important for several reasons. First, one could ask the legitimate question of impact "according to whom." Participatory approaches can be helpful in engaging stakeholders on the issue of what is to be valued in a particular impact evaluation. By engaging a range of stakeholders, a more comprehensive and/or appropriate set of valued impacts is likely to be identified (see the second key issue of this Guidance document). When identifying the (type and scope of the) intervention to be evaluated (see first chapter), participatory methods might be of particular use; aspects that might be "hidden" behind official language and political jargon (in documents) can be revealed by narrative analyses and by consulting stakeholders. More generally, the process of participation can in some cases enhance stakeholder ownership, the level of understanding of a problem among stakeholders, and the utilization of impact evaluation results.

In the light of the attribution issue, stakeholder perspectives can help improve an evaluator's understanding of the complex reality surrounding causal relationships among interventions, outcomes, and impacts. In addition, insight into the multiple and (potentially) contrasting assumptions about causal relationships between an intervention and processes of change can help enrich an evaluator's perspective on the attribution issue. As discussed in chapter 3, stakeholder perspectives can be an important source for reconstructing an intervention theory or multiple theories,19 which subsequently can be refined or put to the test during further analysis.

Some of the latter benefits can also be realized by using qualitative methods that are nonparticipatory (see Mikkelsen, 2005; see also appendix 9). This brings us to an important point. There is a common misperception that there is a finite and clearly defined set of so-called "participatory" evaluation methods. Although certain methods are often (justifiably) classified under the banner of participatory methods because stakeholder participation is a defining feature, many methods not commonly associated with stakeholder participation (including, for example, (quasi-)experimental methods) can also be used in more or less participatory ways, with or without stakeholder involvement. The participatory aspect of a methodology is largely determined by the issues of who is involved and who does or decides on what and how. For example, the methodology for testing water quality to ascertain the impact of treatment facilities can become participatory if community-level water users are involved in deciding, for example, what aspects of water quality to measure and how to collect the data and report the results.

Methodologies commonly found under the umbrella of participatory (impact) evaluation include appreciative inquiry; beneficiary assessment; participatory impact pathway analysis; participatory impact monitoring (see box 4.2); poverty and social impact analysis; social return on investment; systematic client consultation; self-esteem, associative strength, resourcefulness, action planning and responsibility; citizen report cards; community score cards; and the Participatory Learning and Action toolbox20 (see, for example, IFAD, 2002; Mikkelsen, 2005; Pretty et al., 1995; Salmen and Kane, 2006).

These methods rely on different degrees of participation, ranging from consultation to collaboration to joint decision making. In general, the higher the degree of participation, the more costly and difficult it is to set up the impact evaluation. In addition, a high degree of participation might be difficult to realize in large-scale comprehensive interventions such as sector programs.21

Apart from the previously discussed potential benefits of an impact evaluation involving some element of stakeholder participation, disadvantages of participatory approaches include the following:

· Limitations to the validity of information based on stakeholder perceptions (only); this problem is related to the general issue of shortcomings in individual and group perceptional data.
· The risk that strategic responses, manipulation, or advocacy by stakeholders will influence the validity of the data collection and analysis.22
· Limitations to the applicability of impact evaluation with a high degree of participation, especially in large-scale, comprehensive, multi-site interventions (aspects of time and cost).

4.4.3. Useful methods for data collection and analysis that are often part of impact evaluation designs23

In this section we distinguish a set of methods that are useful:

· For testing/refining particular parts (i.e., assumptions) of the impact theory, but not specifically focused on impact assessment as such
· For strengthening particular lines of argumentation with additional/detailed knowledge, useful for triangulation with other sources of evidence
· For deepening the understanding of the nature of particular relationships between intervention and processes of change.

The literature on (impact) evaluation methodology, as in any other field of methodology, is riddled with labels representing different (and sometimes not so different) methodological approaches. In essence, however, methodologies are built upon specific methods. Survey data collection and (descriptive) analysis, semi-structured interviews, and focus-group interviews are but a few of the specific methods that are found throughout the landscape of methodological approaches to impact evaluation.

Evaluators, commissioners, and other stakeholders in impact evaluation should have a basic knowledge of the more common research techniques:24

Box 4.2: Participatory impact monitoring in the context of the poverty reduction strategy process

Participatory impact monitoring builds on the voiced perceptions and assessments of the poor and aims to strengthen these as relevant factors in decision making at national and subnational levels. In the context of poverty reduction strategy monitoring, it will provide systematic and fast feedback on implementation progress, early indications of outcomes and impact, and the unintended effects of policies and programs. The purposes are as follows:

· Increase the voice and the agency of poor people through participatory monitoring and evaluation
· Enhance the effectiveness of poverty-oriented policies and programs in countries with poverty reduction strategies
· Contribute to methodology development, strengthen the knowledge base, and facilitate cross-country learning on the effective use of participatory monitoring at the policy level, and in the context of poverty reduction strategy processes in particular.

Conceptually, the proposed project impact monitoring approach combines (1) the analysis of relevant policies and programs at the national level, leading to an inventory of "impact hypotheses," with (2) extensive consultations at the district/local government level, and (3) joint analysis and consultations with poor communities on their perceptions of change, their attributions to causal factors, and their contextualized assessments of how policies and programs affect their situation.

Source: Booth and Lucas (2002).

Descriptive statistical techniques (e.g., of survey or registry data): The statistician Tukey (e.g., Tukey, 1977) argued for more attention to exploratory data analysis techniques as powerful and relatively simple ways to understand patterns in data. Examples include univariate and bivariate statistical analysis of primary or secondary data using graphical analysis and simple statistical summaries (e.g., for univariate analysis: mean, standard deviation, median, interquartile range; for bivariate analysis: series of boxplots, scatterplots, odds ratios).

Inferential statistical techniques (e.g., of survey or registry data): Univariate analysis (e.g., confidence intervals around the mean; t-test of the mean), bivariate analysis (e.g., t-test for difference in means), and multivariate analysis (e.g., cluster analysis, multiple regression) can be rather useful in estimating impact effects or in testing particular causal assumptions of the intervention theory. These techniques (including those in the previous paragraph) are also used in the (quasi-)experimental and regression-based approaches described in § 4.2. For more information, see Agresti and Finlay (1997) or Hair et al. (2005) or, more specifically for development contexts, Casley and Lury (1987) or Mukherjee et al. (1998).

Qualitative methods include widely used methods such as semi-structured interviews, open interviews, focus group interviews, participant observation, and discourse analysis, but also less conventional approaches such as mystery guests and unobtrusive measures (e.g., through observation; see Webb et al., 2000). For more information, see Patton (2002) or, more specifically for development contexts, Mikkelsen (2005) or Roche (1999).25
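A minimal sketch of the univariate and bivariate techniques listed above, using only Python's standard library. The data and variable names are invented for illustration and are not from the Guidance:

```python
# Descriptive and inferential summaries on two small samples
# (e.g., test scores for a treatment and a control group).
import statistics

treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.4]
control = [11.0, 12.2, 11.8, 12.9, 12.4, 11.6]

# Descriptive (exploratory) summaries, in the spirit of Tukey:
for name, xs in (("treatment", treatment), ("control", control)):
    print(name,
          round(statistics.mean(xs), 2),     # mean
          round(statistics.median(xs), 2),   # median
          round(statistics.stdev(xs), 2))    # sample standard deviation

# Inferential: a two-sample t statistic for the difference in means
# (equal-variance form; compare against t tables with n1 + n2 - 2 df).
n1, n2 = len(treatment), len(control)
m1, m2 = statistics.mean(treatment), statistics.mean(control)
s1, s2 = statistics.variance(treatment), statistics.variance(control)
pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
t_stat = (m1 - m2) / ((pooled * (1 / n1 + 1 / n2)) ** 0.5)
print(round(t_stat, 2))
```

A t statistic well above the critical value would suggest a real difference in means, but, as the chapter stresses, such a bivariate comparison only supports attribution when it is embedded in a design (experimental, quasi-experimental, or theory-based) that rules out competing explanations.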
Key message

Address the attribution problem. Although there is no single method that is best in all cases (a gold standard), some methods are indeed best in specific cases. When empirically addressing the attribution problem, experimental and quasi-experimental designs embedded in a theory-based evaluation framework have clear advantages over other designs. If addressing the attribution problem can only be achieved by doing a contribution analysis, be clear about that and specify the limits and opportunities of this approach. Overall, for impact evaluations, well-designed quantitative methods may better address the attribution problem. Baseline data are critical when using quantitative methods. Qualitative techniques cannot quantify the changes attributable to interventions, but they should be used to evaluate important issues for which quantification is not feasible or practical, and to develop complementary and in-depth perspectives on processes of change induced by interventions (see next section). Evaluators need a good basic knowledge of all techniques before determining what method to use to address this problem.

Chapter 5
Use a mixed-methods approach: The logic of the comparative advantages of methods

The work by Campbell and others on validity and threats to validity within experiments and other types of evaluations has left deep marks on the way researchers and evaluators have addressed methodological challenges in impact evaluation (see Campbell, 1957; Campbell and Stanley, 1963; Cook and Campbell, 1979; Shadish et al., 2002).

5.1. Different methodologies have comparative advantages in addressing particular concerns and needs

Validity can be broadly defined as the "truth of, or correctness of, or degree of support for an inference" (Shadish et al., 2002: 513). Campbell distinguished among four types of validity, which can be explained concisely by looking at the questions underlying each type:

· Internal validity: How do we establish that there is a causal relationship between intervention outputs and processes of change leading to outcomes and impacts?
· Construct validity: How do we make sure that the variables we are measuring adequately represent the underlying realities of development interventions linked to processes of change?
· External validity: How do we (and to what extent can we) generalize findings to other settings (interventions, regions, target groups, etc.)?
· Statistical conclusion validity: How do we make sure that our conclusion about the existence of a relationship between intervention and impact variable is in fact true? How can we be sure about the magnitude of change?1

Applying the logic of comparative advantages makes it possible for evaluators to compare methods on the basis of their relative merits in addressing particular aspects of validity. This provides a useful basis for methodological design choice; given the evaluation's priorities, methods that better address particular aspects of validity are selected in favor of others. In addition, the logic of comparative advantages can support decisions on combining methods to be able to address multiple aspects of validity simultaneously.

We will illustrate this logic using the example of RCTs. Internal validity usually receives (and justifiably so) a lot of attention in impact evaluation, as it lies at the heart of the attribution problem; is there a causal link between intervention outputs and outcomes and impacts? Arguably, RCTs (see § 4.2.) are viewed by many as the best method for addressing the attribution problem from the point of view of internal validity. Random allocation of project benefits reduces the likelihood that there are systematic (observable and unobservable) differences between those that receive benefits and those that do not. However, this does not necessarily make it the best method overall. For example, RCTs control for differences between groups within the particular setting that is covered by the study. Other settings have other characteristics that are not controlled; hence there are limitations of external validity here.

To resolve this issue, Duflo and Kremer (2005) propose to undertake series of RCTs on the same type of instrument in different settings. However, as argued by Ravallion, "The feasibility of doing a sufficient number of trials--sufficient to span the relevant domain of variation found in reality for a given program, as well as across the range of policy options--is far from clear. The scale of randomized trials needed to test even one quite large national program could well be prohibitive" (Ravallion, 2008: 19).

Another limitation of RCTs (also valid for other approaches discussed in § 4.2.) lies in the realm of construct validity. Does a limited set of indicators adequately represent the impact of a policy on a complex phenomenon such as poverty? In-depth qualitative methods can more adequately capture the complexity and diversity of aspects that define (and determine) poverty than the singular or limited set of impact indicators taken into account in RCTs. Consequently, qualitative methods have a comparative advantage in addressing construct validity concerns. However, a downside of most qualitative approaches is that the focus is local and findings are very context specific, with limited external validity. External validity can be adequately addressed by, for example, quantitative quasi- and non-experimental approaches that are based on large samples covering substantial diversity in context and people.

Theory-based evaluation provides the basis for combining different methodological approaches that have comparative advantages in addressing validity concerns. In addition, the intervention theory, as a structure for making causal assumptions explicit, generalizing findings, and making in-depth analyses of specific assumptions, can help strengthen internal, external, and construct validity claims.

To conclude:

· There is no single best method in impact evaluation that can always address the different aspects of validity better than others.
· Methods have particular advantages in dealing with particular validity concerns; this provides a strong rationale for combining methods.

5.2. Advantages of combining different methods and sources of evidence

In principle, each impact evaluation is in some way supported by different methods and sources of evidence. For example, even the quite technical quantitative approaches described in § 4.2 include other modes of inquiry, such as the research review to identify key variables that should be controlled for in, for example, a quasi-experimental setting. Nevertheless, there is a growing literature on the explicit use of multiple methods to strengthen the quality of the analysis.2 At the same time, the discordance between the practice and "theory" of mixed-methods research (Bryman, 2006) suggests that mixed-methods research is often more an art than a science.

Triangulation is a key concept that embodies much of the rationale behind doing mixed-methods research and represents a set of principles to fortify the design, analysis, and interpretation of findings in impact evaluation.3 Triangulation is about looking at things from multiple points of view, a method "to overcome the problems that stem from studies relying upon a single theory, a single method, a single set of data ... and from a single investigator" (Mikkelsen, 2005: 96). As can be deduced from this definition, there are different types of triangulation. Broadly, these are the following (Mikkelsen, 2005):

· Data triangulation--To study a problem using different types of data, different points in time, or different units of analysis
· Investigator triangulation--Multiple researchers looking at the same problem
· Discipline triangulation--Researchers trained in different disciplines looking at the same problem
· Theory triangulation--Using multiple competing theories to explain and analyze a problem
· Methodological triangulation--Using different methods, or the same method over time, to study a problem.

As can be observed from this list, particular methodologies already embody aspects of triangulation. Quantitative double-difference impact evaluation (see § 4.2.), for example, embodies aspects of methodological and data triangulation.

Institutions can be seen as "rules of the game" (see North, 1990), and interventions such as policies can be considered attempts to establish specific rules with the expectation (through a "theory of change") of generating certain impacts (Picciotto and Wiesner, 1997). In addition, the literature on behavioral and social mechanisms (see appendix 10; see also chapter 6) provides a wealth of explanatory insights that help evaluators better understand and frame processes of change triggered by interventions.

A good methodological practice in impact evaluation is to encourage applying these principles of triangulation as much as possible. Advantages of mixed-methods approaches to impact evaluation are the following:
Participatory impact evaluation approaches · A mix of methods can be used to assess impor- are often used to seek out and reconstruct tant outcomes or impacts of the intervention multiple (sometimes contrasting) perspectives being studied. If the results from different on processes of change and impact using diverse methods converge, then inferences about methods, often relying on teams of researchers the nature and magnitude of these impacts with different disciplinary backgrounds (that will be stronger. For example, triangulation of may include members of target groups). Theory- standardized indicators of children's educa- based evaluation often involves theory triangula- tional attainments with results from an analy- tion (see chapter 3; see also Carvalho and White sis of samples of children's academic work [2004], who refer to competing theories in their yields stronger confidence in the educational study on social funds). Moreover, it also allows for impacts observed than either method alone methodological and data triangulation by relying (especially if the methods employed have off- on different methods and sources of evidence to setting biases). test particular causal assumptions. · A mix of methods can be used to assess differ- ent facets of complex outcomes or impacts, Discipline triangulation and theory triangula- yielding a broader, richer portrait than one tion both point to the need for more diversity method alone can. For example, standardized in perspectives for understanding processes of indicators of health status could be mixed with change in impact evaluation. 
Strong pleas have onsite observations of practices related to recently been made for development evalua- nutrition, water quality, environmental risks, tors to recognize and make full use of the wide or other contributors to health, jointly yield- spectrum of frameworks and methodologies ing a richer understanding of the interven- that have emerged from different disciplines tion's impacts on targeted health behaviors. and that provide evaluation with a rich arsenal In a more general sense, quantitative impact of possibilities (Kanbur, 2003; White, 2002; evaluation techniques work well for a limited Bamberger and White, 2007). For example, set of pre-established variables (preferably when doing impact evaluations, evaluators can determined and measured ex ante) but less benefit from approaches developed in different well for capturing unintended, less expected disciplines and subdisciplines. Neo-institution- (indirect) effects of interventions. Qualita- alist economists have shown ways to study the tive methods or descriptive (secondary) data 37 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n analysis can be helpful in better understand- · Case 4: A theory-based approach with qualita- ing the latter. tive methods (GEF, 2007). · One set of methods could be used to assess outcomes or impacts and another set to assess 5.3. Average effect versus distribution of the quality and character of program imple- costs and benefits mentation, including program integrity and Sometimes policy makers and stakeholders are the experiences during the implementation concerned with the question of whether an phase. intervention (for a specific context and group of · Multiple methods can help ensure that the people) has been effective overall. 
This is typically sampling frame and the sample selection a question that can be addressed by using (quasi) strategies cover the whole of the target inter- experimental evaluation techniques. However, vention and comparison populations. Many another important question, one that might sampling frames leave out important sectors not be easily answered with these techniques, is of the population (usually the most vulnerable whether and how people are differently affected groups or people who have recently moved by an intervention. 4 This question can be into the community), while respondent se- answered by using regression analysis. A regres- lection procedures often under-represent sion model can incorporate different moderator women, youth, the elderly, or ethnic minori- variables (e.g., through modeling interaction ties. This is critical because important positive effects) to analyze to what extent important or negative impacts on vulnerable groups (or characteristics co-determine outcome variables. other important sectors) are completely ig- In addition, many qualitative methods such as nored if they do not even get included in the those used for case studies can help evaluators sample. This is particularly important (and study in detail how interventions work differ- frequently ignored) where the evaluation uses ently in different situations. From a methodolog- secondary data sets, as the evaluator often ical design perspective, a mixed-methods study does not have access to information on how combining quasi-experimental survey data with the sample was selected. a limited number of in-depth, semistructured · Multiple methods are needed to address the interviews among different types of people complementary questions of average effect from the target population is an example of a and distribution of costs and benefits of an potentially good framework to provide credible intervention (see § 5.3.) answers to both questions (see box 5.1.). 
Appendix 11 presents four interesting examples When talking about the issue of distribution of of impact evaluations that are based on a mixed costs and benefits of an intervention, it is useful method perspective: to distinguish between different levels or foci. First, one should consider the issue of outreach · Case 1: Combining qualitative and quantitative or coverage. Who are the people (individuals, descriptive methods--Ex post impact study households, and communities) directly affected of the Noakhali Rural Development Project by an intervention? Sometimes this question in Bangladesh can be answered in a relatively straightforward · Case 2: Combining qualitative and quantitative manner, such as when the intervention is clearly descriptive methods--Mixed-methods impact delineated and targeted to a specific group of evaluation of International Fund for Agricul- people (e.g., a training program). In other cases tural Development projects in Gambia, Ghana, (e.g., a tax cut or construction of a road), coverage and Morocco or outreach, or indeed the delineation of the · Case 3: Combining qualitative and quanti- group of people affected by the intervention, is tative descriptive methods--Impact evalu- not that easy to determine. In the last case, the ation: agricultural development projects in issue of delineation is closely linked to the second Guinea level, how an intervention has different effects on 38 u s E a m I x E d - m E t h o d s a p p r o a c h : t h E l o G I c o f t h E c o m pa r at I v E a d va n ta G E s o f m E t h o d s Box 5.1: Brief illustration of the logic of comparative advantages Consider the example of an intervention that provides monetary · Survey data and case studies could tell how incentives have incentives and training to farmers to promote land use changes different effects on particular types of farm households (po- leading to improved livelihoods conditions. 
tentially strengthens internal validity and increases external We could use the following methods in the impact evaluation: validity of findings) · Semistructured interviews and focus group conversations · A randomized experiment could be used to assess the ef- could tell us more about the nature of effects in terms of fectiveness of different incentives on land use change and/or production, consumption, poverty, etc. (potentially enhances socio-economic effects of these changes (potentially strength- construct validity of findings). ens internal validity of findings) groups of people (e.g., how the construction of Important to note is that an analysis of the distribu- a road affects different types of businesses and tion of costs and benefits as a result of an interven- households near or farther from the new road). tion--distinguishing among coverage, effects In the case of a simple training program, the first on those who are directly affected, and indirect level (who participates, who is directly affected) effects--cannot be addressed with one particular can be neatly separated from the second (how method. If one is interested in all these questions, an intervention affects participants in different then inevitably one needs a framework of multiple ways). A third level concerns the indirect effects methods and sources of evidence. For example, of an intervention. For example, an objective of descriptive analysis of survey data can help to a training program may be that participants in map coverage, quasi-experiments can help to turn become teachers for the population at large. assess attribution of change among those directly While this is an intended indirect effect, multiple affected, and case studies and survey data analysis indirect effects on participants, their families, and can help to map indirect effects over time. non-participants may occur, some of which may be quite substantial. 
Time and scale are important Key message dimensions here (see also chapter 2). Use a mixed-methods design. Bear in mind the logic Often, impact evaluation is about level of the comparative advantages of designs and meth- two--determining the effects on those that are ods. A mix of methods can be used to assess differ- directly targeted by/participating in the interven- ent facets of complex outcomes or impacts, yielding tion. In those cases, it is often assumed that level more breadth, depth, and width in the portrait than one (targeting, outreach) is fully known and one method alone can. One set of methods could be mapped. In other cases, level one--outreach used to assess outcomes or impacts and another set and coverage or indeed the determination of to assess the quality and nature of intervention im- the scope of direct effects of an intervention on plementation, thus enhancing impact evaluation with the population at risk--is the great "unknown" information about program integrity and program and should be a first priority in an impact evalua- experiences. It is important to note that an analysis tion exercise. Level three--indirect processes of the distribution of costs and benefits of an inter- of change induced by an intervention, with vention--distinguishing among coverage, effects on potentially important implications for the those directly affected, and indirect effects--cannot distribution of costs and benefits among target be addressed with one particular method. Answer- populations and beyond--is often outside the ing these questions requires a framework of multiple scope of impact evaluations (see Ravallion, methods and sources of evidence. 2008). 39 Chapter 6 Build on existing knowledge relevant to the impact of interventions R eview and synthesis approaches are commonly associated with sys- tematic reviews and meta-analyses. 
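The core computation behind a meta-analysis, the pooled effect score, can be sketched concretely. One common variant (among several) is fixed-effect, inverse-variance pooling, in which each study's effect estimate is weighted by the inverse of its squared standard error. The figures below are invented purely for illustration:

```python
import math

# Hypothetical effect estimates (e.g., standardized mean differences) and
# their standard errors from five separate impact studies of one intervention.
studies = [(0.30, 0.10), (0.15, 0.08), (0.45, 0.20), (0.25, 0.12), (0.10, 0.15)]

weights = [1 / se ** 2 for _, se in studies]  # inverse-variance weights
pooled = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

In a real synthesis this arithmetic is the last and simplest step; it is only meaningful after the systematic search, selection, and quality-assessment procedures discussed in this chapter, and a random-effects model may be preferred when study contexts differ substantially.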
Using these methods, comparable interventions evaluated across countries and regions can provide the empirical basis to identify "robust" performance goals and to help assess the relative effectiveness of alternative interventions under different country contexts and settings. These methods can lead to increased emphasis on the rigor of impact evaluations so they can contribute to future knowledge building as well as meet the information needs of stakeholders. These methods can also lead to a more selective approach to extensive impact evaluation, where existing knowledge is more systematically reviewed before undertaking a local impact evaluation.

"Systematic review" is a term that is used to indicate a number of methodologies that deal with synthesizing lessons from existing evidence. In general, one can define a systematic review as a synthesis of primary studies that contains an explicit statement of objectives and is conducted according to a transparent, methodical, and replicable methodology (Greenhalgh et al., 2004). Typical features of a protocol underlying a systematic review are the following (Oliver et al., 2005):

· Defining the review question(s)
· Developing the protocol
· Searching for relevant bibliographic sources
· Defining and applying criteria for including and excluding documents
· Defining and applying criteria for assessing the methodological quality of the documents
· Extracting information1
· Synthesizing the information into findings.

A meta-analysis is a quantitative aggregation of effect scores established in individual studies. The synthesis is often limited to a calculation of an overall effect score that expresses the impact attributable to a specific intervention or group of interventions. To arrive at such a calculation, meta-analysis involves a strict procedure to search for and select appropriate evidence of the impact of single interventions. The selection of evidence is based on an assessment of the methodology of the single-intervention impact study. In this type of assessment, usually a hierarchy of methods is applied in which RCTs rank highest and provide the most rigorous sources of evidence for meta-analysis. Meta-analysis differs from multicenter clinical trials in that in the former, the evaluator has no control over the single-intervention evaluations as such. As a result, despite the fact that homogeneity of implementation of similar interventions is a precondition for successful meta-analysis, meta-analysis is inevitably confronted with higher levels of variability in individual project implementation, context, and evaluation methodology than in multicenter clinical trials.

Meta-analysis is most frequently applied in professional fields such as medicine, education, and (to a lesser extent) criminal justice and social work (Clarke, 2006). Knowledge repositories such as the Campbell Collaboration and the Cochrane Collaboration rely heavily on meta-analysis as a rigorous tool for knowledge management on what works. Criticism has emerged both from within these professional fields and from other fields. In part, this criticism reflects a resistance to the idea of a "gold standard" underlying the practice of meta-analysis. The discussion has been useful in that it has helped define the boundaries of applicability of meta-analysis and the idea that, given the huge variability in parameters characterizing evaluations, there is no such thing as a gold standard (see Clarke, 2006).
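The variability concern can at least be measured. Before pooling, meta-analysts commonly compute a heterogeneity statistic such as Cochran's Q, and the I² statistic derived from it, to judge whether study-level estimates differ by more than sampling error alone would predict. A minimal sketch, again with invented figures:

```python
# Hypothetical study-level effects and standard errors (illustrative only).
effects = [0.30, 0.15, 0.45, 0.25, 0.10]
ses = [0.10, 0.08, 0.20, 0.12, 0.15]

weights = [1 / s ** 2 for s in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled effect.
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
# I^2: rough share of total variability beyond what chance alone would predict.
i2 = max(0.0, (q - df) / q) * 100

print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

When I² is large, reporting a single pooled effect can be misleading, and a random-effects model (or no pooling at all) may be more appropriate; this is one formal expression of the "no gold standard" point above.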
Box 6.1: Narrative review and synthesis study: Targeting and impact of community-based development initiatives

The study was performed by Mansuri and Rao (2004), who reviewed the evidence on community-based development (CBD) projects funded by the World Bank. At the time, an estimated US$7 billion of World Bank projects were about CBD.

Review questions
1. Does community participation improve the targeting of private benefits such as welfare or relief?
2. Are the public goods created by community participation projects better targeted to the poor?
3. Are such goods of higher quality, or better managed, than similar public goods provided by the government?
4. Does participation lead to the empowerment of marginalized groups--does it lessen exclusion, increase the capacity for collective action, or reduce the possibility that locally powerful elites will capture project benefits?
5. Do the characteristics of external agents--donors, governments, nongovernmental organizations (NGOs), and project facilitators--affect the quality of participation or project success or failure?
6. Can community participation projects be sustainably scaled up?

To obtain relevant and reliable evidence on CBD projects, the reviewers decided to restrict the review process to peer-reviewed publications or studies conducted by independent researchers. This provided an exogenous rule that improved the quality and reduced the level of potential bias while casting a wide enough net to let in research from a variety of disciplinary perspectives on different types of CBD projects. The following sources of evidence were included: impact evaluations, which use statistical or econometric techniques to assess the causal impact of specific project outcomes; and ethnographic or case studies, which use anthropological methods such as participant observation, in-depth interviews, and focus group discussions.

Some conclusions
· Projects that rely on community participation have not been particularly effective at targeting the poor; there is some evidence that CBD/community-driven development projects create effective community infrastructure, but not a single study establishes a causal relationship between any outcome and participatory elements of a CBD project.
· A naïve application of complex contextual concepts like "participation," "social capital," and "empowerment" is endemic among project implementers and contributes to poor design and implementation.

Source: Mansuri and Rao (2004).

Partly as a response to the limitations in applicability of meta-analysis as a synthesis tool, more comprehensive methodologies of systematic review have been developed. One example is a systematic review of health behavior among young people in the United Kingdom that involves both quantitative and qualitative synthesis (see Oliver et al., 2005). The case shows that meta-analytic work on evidence stemming from what the authors call "intervention studies" (evaluation studies on similar interventions) can be combined with qualitative systematic review of "non-intervention studies," mainly research on relevant topics related to the problems addressed by the intervention. Regarding the latter, similar to the quantitative part, a systematic procedure for evidence search, assessment, and selection is applied. The difference lies mostly in the synthesis part, which in the latter case is a qualitative analysis of major findings. The two types of review can subsequently be used for triangulation purposes, reinforcing the overall synthesis findings.

Other examples of review and synthesis approaches are the narrative review and the realist synthesis. A narrative review is a descriptive account of intervention processes and/or results covering a series of interventions (see box 6.1.). Often, the evaluator relies on a common analytical framework, which serves as a basis for a template that is used for data extraction from the individual studies. In the end, the main findings are summarized in a narrative account and/or tables and matrices representing key aspects of the interventions.

A realist synthesis is a theory-based approach that helps synthesize findings across interventions. It focuses on the question of which mechanisms are assumed to be at work in a given intervention, taking into account the context the intervention operates in (see appendix 10). Although interventions often appear different, they often rely on strikingly similar mechanisms. Recognition of this can broaden the range of applicable evidence from other studies.

Combinations of meta-approaches are also possible. In a recent study on the impact of public policy programs designed to reduce and/or prevent violence in the public arena, Van der Knaap et al. (2008) have shown the relevance of combining synthesis approaches (see appendix 12).

Key message
Build on existing knowledge relevant to the impact of interventions. Review and synthesis methods can play a pivotal role in impact evaluation in synthesizing results and contributing to knowledge. Although interventions often appear different, they may rely on strikingly similar mechanisms. Recognition of this can broaden the range of applicable evidence. As there are several approaches available, it is worthwhile to try to combine (some of) them. Review and synthesis work can provide a useful basis for empirical impact analysis of a specific intervention and in some cases may even take away the need for further in-depth impact evaluation.
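To make the idea of a common analytical framework for data extraction concrete: the template can be as simple as a fixed list of fields applied to every study, with missing entries flagged explicitly so that gaps in the evidence are visible in the final tables. The field names below are invented for illustration; a real review would derive them from its review questions:

```python
import csv
import io

# Hypothetical extraction template for a narrative review: every study is
# summarized against the same fields so findings can later be tabulated.
TEMPLATE = ["study_id", "intervention_type", "context", "design",
            "outcome_measures", "main_findings", "quality_rating"]

def extract(record):
    """Force one study summary into the common template, flagging gaps."""
    return {field: record.get(field, "NOT REPORTED") for field in TEMPLATE}

# One invented study record; in practice these come from reading the studies.
studies = [
    {"study_id": "S1", "intervention_type": "farmer training",
     "design": "quasi-experiment", "main_findings": "modest income gains",
     "quality_rating": "medium"},
]

rows = [extract(s) for s in studies]
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=TEMPLATE)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The resulting matrix is exactly the kind of "tables and matrices representing key aspects of the interventions" that a narrative review reports, and the "NOT REPORTED" flags feed directly into the quality-assessment step.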
Part II
Managing Impact Evaluations

Chapter 7
Determine if an impact evaluation is feasible and worth the cost

Managers and policy makers sometimes assume that impact evaluation is synonymous with any other kind of evaluation. They might request an "impact evaluation" when the real need is for a quite different kind of evaluation (e.g., to provide feedback on an implementation process or to assess the accessibility of program services to vulnerable groups). Ensuring clarity about the information needed and for what purposes is a prerequisite to defining the type of evaluation to conduct.

Moreover, impact evaluation is not "the" alternative but draws on and complements rather than replaces other types of M&E activities. It should therefore be seen as one element in a cycle of potentially useful evaluations over the lifetime of an intervention. The rather traditional difference between ex ante and ex post impact evaluations remains important, where the ex ante impact assessment is, by nature, largely an activity in which "predictions" are made of any effects and side effects a particular intervention might have. Ex post impact evaluation, or simply "impact evaluation," as defined by the development community (and elsewhere), can test whether and to what extent these ex ante predictions have been correct. In fact, one of the potential uses of impact evaluations, not yet frequently applied in the field of development intervention, could be to strengthen the process of ex ante impact assessments.

When should an impact evaluation ideally be conducted?

· When there is an articulated need to obtain the information from an impact evaluation to know whether the intervention worked, to learn from it, to increase transparency of the intervention, and to know its "value for money."
· When a "readiness assessment" shows that political, technical, resource, and other practical considerations are adequate and it is feasible to do an impact evaluation. More specifically, this would include the following conditions:
  · The evaluation has a clearly defined purpose and an agreed-upon intended use, appropriate to its timing and with support of influential stakeholders.
  · There is clarity about the evaluation design. The evaluation design has to be clearly described and well justified after due consideration of alternatives and constraints.
  · The evaluation design has a chance to be credibly executed, given the nature and context of the intervention, the data and information needs, and the availability of adequate resources and expertise to conduct the evaluation.
· When an intervention has been functioning long enough to have visible effects.
· When there is sufficient scale (e.g., in terms of funding, number of people affected) to justify a thorough assessment.
· When the evaluation is likely to produce "new" knowledge, adding value to the public knowledge on the effectiveness of particular (innovative) types of interventions and the mechanisms that "do the work."

Impact evaluations may not be appropriate at particular times:

· When other valuable forms of evaluation will yield more useful information to support decisions to be made or other purposes
· When they move too many resources and too much attention away from the need to develop and use a rich spectrum of evaluation approaches and capacities
· When political, technical, practical, or resource considerations are likely to prevent a credible, rigorous, and useful evaluation
· When there are signs that the evaluation will not be used (or may be misused, for example, for political reasons).

Not all interventions should be submitted to elaborate and costly impact evaluation exercises. Rather, those sectors, regions, and intervention approaches about which less is known (including new, innovative ideas) should receive funding and support for impact evaluation. Ideally, organizations should pool their resources and expertise to select interventions of interest for rigorous and elaborate impact evaluation and consequently contribute jointly to the public good of knowledge on the impact of (under-evaluated) interventions.

Key message
Determine if an impact evaluation is feasible and worth the cost. Costs can be significant; what are the benefits? In what ways does the impact evaluation contribute to accountability, learning, and information about the "value for money" of what works? What is the likely added value of an impact evaluation in relation to what is already known about a particular intervention? What are the costs? What are the costs of estimating or measuring what would have happened without the intervention? Is the likelihood of getting accurate information on impact high enough to justify the cost of the evaluation?

Chapter 8
Start collecting data early

Although issues of data and data collection such as availability and quality often sound like "mere" operational issues that only need to be discussed on a technical level, it should not be forgotten that these aspects are of crucial importance for any impact evaluation (and any evaluation in general). Data are needed to test whether there have been changes in the dependent variables or to represent the counterfactual estimate of what the population's situation would have been if the project had not taken place. The data issue is strongly linked to the method of evaluation.

8.1. Timing of data collection
Ideally, impact evaluations should be based on data from both before and after an intervention has been implemented.1 An important question is whether the baseline period or end-line period is representative or normal. If the baseline or end-line year (or season) is not normal, then this affects the observed change over time. If, for example, the baseline year is influenced by unusually high or low agricultural production or a natural disaster, then the observed change up to the end-line year can be strongly biased. In most cases it is the timing of the intervention, or of the impact evaluation, that determines the timing of the baseline and end-line studies. This timing is not random, and evaluators need to investigate whether the baseline/end-line data are representative of "normal" periods before they draw conclusions. If not, even rigorous evaluations may produce unreliable conclusions about impacts.

An additional issue concerns short-term versus long-term effects. Depending on the intervention and its context, at the time of ex post data collection some effects might not have occurred or might not be visible yet, whereas others might wither over time. Evaluators should be aware of how this affects conclusions about impact.

8.2. Data availability
In practice, impact evaluation starts with an appraisal of existing data, the data that have been produced during the course of an intervention on inputs, processes, and outputs (and outcomes). This inventory is useful for several reasons:

· Available data are useful for reconstructing the intervention theory that further guides primary and secondary data collection efforts.
· Available data might affect the choice of methodological design or options for further data processing and analysis; for example, ex ante and ex post data sets of target groups might be complemented with other data sets to construct useful control groups; the amount and type of data available might influence the choice of whether to organize additional primary data collection efforts.
· Available data from different sources allow for triangulation of findings.

In addition, evaluators can rely on a variety of data from other sources that can be used in the evaluation process:

· National census data
· General household surveys such as Living Standards Measurement Surveys
· Specialized surveys such as demographic and health surveys
· Administrative data collected by line ministries and other public agencies (e.g., on school enrolment, use of health facilities, market prices for agricultural produce)
· Studies conducted by donor agencies, NGOs, and universities
· Administrative data from agencies, ministries, or other organizations
· Mass media (newspapers, television documentaries, etc.); these can be useful, among other things, for understanding the local economic and political context of an intervention.

Appendix 13 describes an example of an impact evaluation implemented by IEG. In 1986 the government of Ghana embarked on an ambitious program of educational reform, shortening the length of pre-university education from 17 to 12 years, reducing subsidies at the secondary and tertiary levels, lengthening the school day, and taking steps to eliminate unqualified teachers from schools. There was no clearly defined "project" for this study, but the focus was World Bank support to the subsector through four large operations. These operations had supported a range of activities, from rehabilitating school buildings to assisting in the formation of community-based school management committees. The impact evaluation relied heavily on existing data sets such as the Ghana Living Standards Survey for impact analyses.

A useful stepwise approach for assessing data availability is the following:

1. Make an inventory of the availability of data and assess its quality. Sometimes secondary data can be used to carry out the whole impact study. This is especially true for national or sector-wide interventions. More usually, secondary data can be used to buttress other data.
2. Analyze, from the perspective of the intervention theory, the necessity of additional data. The process of data gathering must be based on the evaluation design, which is, in turn, partly based on the intervention theory. Data must be collected across the results chain, not just on outcomes.
3. Assess the best way(s) to obtain additional data.
4. A comparison group sample must be of adequate size and subject to the same, or virtually the same, questionnaire or other data collection instruments. While some intervention-specific questions may not be appropriate, similar questions of a more general nature can help test for contagion.
5. It is necessary to check if other interventions, unexpected events, or other processes have influenced developments in the comparison group or the treatment group (i.e., check whether the comparison group is influenced by other processes than the treatment group).
6. Multiple instruments (e.g., at the household and facility level) are usually desirable and must be coded in such a way that they can be linked.
7. Baseline data must cover the relevant welfare indicators but preferably also the main determinants of the relevant welfare elements, so it will be easier to investigate later whether processes other than the intervention have influenced welfare developments over time. End-line data must be collected across the results chain, not just on intended outcomes.

When there is no baseline, the option of a field survey using recall on the variables of interest may be considered. Many commentators are critical of relying on recall. But all survey questions in the end are recall, so it is a question of degree. The evaluator needs to use his or her judgment (and knowledge about cognitive processes) as to what are credible data, given a respondent's capacity to recall.

8.3. Quality of the data
The quality of data can make or break any impact evaluation. Mixed methods and triangulation are strategies to reduce the problem of data quality. Yet in terms of the quality control that is needed to ensure that evaluation findings are not (heavily) biased because of data quality problems, they are insufficient.

The evaluator should ask several questions:

· What principles should we follow to improve the quality of data (collection)? Some exam-

examples are recall problems or the sensitivity of certain topics) is equally relevant for semistructured interviews and similar techniques in qualitative research. With respect to group processes in qualitative research, Cooke (2001) discusses three of the most widely cited problems: risky shift, groupthink, and coercive persuasion. A detailed discussion of these issues is beyond the scope of this guidance. However, they lead us to some important points:

· Data based on the perceptions, views, and opinions of people on the causes and effects of an intervention (e.g., target groups) do not necessarily adequately reflect the real causes of an intervention; data collected through observation, measurement, or counting (e.g., assets, farm size, infrastructure, profits) in general are less prone to measurement error (but are not always easy to collect or sufficient to cover all information needs).
ples of subquestions: · The quality of data is more often than not a · How to address missing data (missing ob- constraining factor in the overall quality of servations in a data set, missing variables). the impact evaluation; it cannot be solved by · How to address measurement error--Does sophisticated methods but might be solved the value of a variable or the answer to a in part through triangulation among data question represent the true value? sources. · How to address specification error--Does the question asked or variable measured 8.4. dealing with data constraints represent the concept that it was intended According to Bamberger et al. (2004: 8), to cover? "Frequently, funds for the evaluation were not · Does the quality of the data allow for (ad- included in the original project budget and the vanced) statistical analysis? New advances in evaluation must be conducted with a much and the more widespread use of quasi-experi- smaller budget than would normally be allocated mental evaluation and multivariate data analy- for this kind of study. As a result, it may not be sis are promising in light of impact evaluation. possible to apply the desirable data collection Yet often data quality is a constraining factor instruments (tracer studies or sample surveys, in terms of the quality of the findings (see for example), or to apply the methods for Deaton, 2005). reconstructing baseline data or creating control · In the case of secondary data, what do we groups." Data problems are often correlated with know about the data collection process that or compounded by time and budget constraints. might strengthen or weaken the validity of The scenarios laid out in table 8.1 can occur. our findings?2 Bamberger et al. (2004) describe scenarios for De Leeuw et al. (2008) discuss data quality issues working within these constraints. For example, in survey data analysis. 
Much of their discussion the implications for quasi-experimental designs on measurement error (errors resulting from are that evaluators have to rely on less robust respondent, interviewer, method, and question- designs such as ex post comparisons only (see related sources or a combination of these; appendix 14). 51 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n Table 8.1: Evaluation scenarios with time, data, and budget constraints the constraints under which the evaluation must be conducted time Budget data typical Scenarios X The evaluator is called in late in the project and is told that the evaluation must be completed by a certain date so that it can be used in a decision making process or contribute to a report. The budget may be adequate but it may be difficult to collect or analyze survey data within the time frame. X The evaluation is only allocated a small budget, but there is not necessarily excessive time pressure. However, it will be difficult to collect sample survey data because of the limited budget. X The evaluator is not called in until the project is well advanced. Consequently no baseline survey has been conducted either on the project population or on a control group. The evaluation does have an adequate scope, either to analyze existing household survey data or to collect additional data. In some cases the intended project impacts may also concern changes in sensitive areas such as domestic violence, community conflict, women`s empowerment, community leadership styles, or corruption, on which it is difficult to collect reliable data--even when time and budget are not constraints. X X The evaluator has to operate under time pressure and with a limited budget. Secondary survey data may be available but there is little time and few resources to analyze it. X X The evaluator has little time and no access to baseline data or a control group. 
Funds are available to collect additional data, but the survey design is constrained by the tight deadlines.
· Budget and data: The evaluator is called in late and has no access to baseline data or control groups. The budget is limited, but time is not a constraint.
· Time, budget, and data: The evaluator is called in late, is given a limited budget, has no access to baseline survey data, and no control group has been identified.

Source: Bamberger et al. (2004).

Key message
Start collecting data early. Good baseline data are essential to understanding and estimating impact. Depending on the type of intervention, the collection of baseline data, as well as the setup of other aspects of the impact evaluation, requires an efficient relationship between the impact evaluators and the implementers of the intervention. Policy makers and commissioners need to involve experts in impact evaluation as early as possible in the intervention design to be able to design high-quality evaluations. Ensuring high-quality data collection should be part and parcel of every impact evaluation. When working with secondary data, a lack of information on the quality of data collection can restrict data analysis options and the validity of findings. Take notice of, and deal effectively with, the restrictions under which an impact evaluation has to be carried out (time, data, and money).

Chapter 9: Front-end planning is important

Front-end planning refers to the initial planning and design phase of an impact evaluation. Ad hoc commissioned impact evaluations usually do not have a long planning period, thereby risking a suboptimally planned and executed evaluation process. As good impact evaluation relies on good data, preferably including baseline data, attention to proper front-end planning should be a priority issue. Ideally, front-end planning of impact evaluations should be closely articulated to the initial design and planning phase of the policy intervention. Indeed, this articulation is most clearly visible in an RCT, in which intervention and impact evaluation are inextricably linked.

9.1. Planning tools
Clear definition of scope (chapters 1 and 2) and sound methodological design (chapters 3-6) cannot be captured in standardized frameworks. Decision trees on assessing data availability (see § 8.2) and method choice (see appendix 6) are useful, though they provide only partial answers to methodological design choice issues. Pragmatic considerations of time, budget, and data (see § 8.4) play a role, but so do culture and politics. Two tools that are particularly helpful in the planning phase of an impact evaluation are the approach paper and the evaluation matrix.

The approach paper outlines what the evaluation is about and how it will be implemented. This document can be widely circulated and gives stakeholders and others a chance to comment on and improve the intended evaluation design from an early stage. It also helps to generate broad "buy-in" or, at worst, to define the main grounds of potential disagreement between evaluators and practitioners. In addition, it is wise to use an evaluation matrix when planning and executing the work. This tool ensures that key questions are identified, together with the ways to address them, sources of data, the role of theory, and so on. It can also play an important role in stakeholder consultation, ensuring that important elements are not omitted.

9.2. Staffing and resources
Resources are important, and spending should be safeguarded up front. The longer the time horizon of a study, the more difficult this is. Resources are also important to realize the much-needed independence of the evaluator and the evaluation team. A template for assessing the independence of evaluation organizations can be downloaded from http://www.ecgnet.org/docs/ecg.doc. This document specifies a number of criteria and questions that can be asked.

Evaluation is not only a financial business but even more a people's business. So is the planning of an evaluation. As evaluation projects are usually no longer "lonely hunter" activities, staffing is crucial. So when starting the preparation of the study, a crucial point concerns addressing a number of questions:

· Who are the people who will do the evaluation?
· Under which (contractual) conditions are they doing the job?
· What is their expertise?
· Which roles will they be carrying out?

Topics that deserve attention are the following:

· The mix of disciplines and traditions that are brought together in the team.
· The competencies the team has "in stock." Competencies range from methodological expertise to negotiating with institutional actors and stakeholders, getting involved in "hearing both sides" (those evaluated and the principal), and the clearance of the report.
· The structure of the evaluation team. For the evaluation to be planned and carried out effectively, the roles of the project director, staff, and other evaluators must be made clear to all parties.
· The responsibilities of the team members.
· The more an evaluation is linked to a political "hot spot," the more it is necessary that at least one member of the team have a "political nose"--not primarily to deal with administrators and (local) politicians, but to understand when an evaluation project becomes too much of what is known as a partnerial evaluation (Pollitt, 1999).
· Also, staff should be active in realizing an adequate documentation and evaluation trail.

A range of skills is needed in evaluation work. The quality and eventual utility of the impact evaluation can be greatly enhanced with coordination between team members and policymakers from the outset. It is therefore important to identify team members as early as possible, agree on roles and responsibilities, and establish mechanisms and resources for communication during key points of the evaluation.

9.3. The balance between independence and collaboration between evaluators and stakeholders
One of the questions within the world of impact evaluations is what degree of institutional separation to put between the evaluation providers and the evaluation users. There is much to be gained from the objectivity provided by having the evaluation carried out independently of the institution responsible for the project being evaluated. Pollitt (1999) warned against "partnerial" evaluations, where the positions of stakeholders, commissioners, and evaluators blur too much.1 However, evaluations often have multiple goals, including building evaluation capacity within government agencies and sensitizing program operators to the realities of their projects once they are carried out in the field. At a minimum, the evaluation users, who can range from government agencies in client countries to bilateral and multilateral donors, international NGOs, and grassroots/civil society organizations, must remain sufficiently involved in the evaluation to ensure that the evaluation process is recognized as legitimate and that the results produced are relevant to their information needs. Otherwise, the evaluation results are less likely to be used to inform policy. The evaluation manager and his or her clients must achieve the right balance between involving the users of evaluations and maintaining the objectivity and legitimacy of the results (Baker, 2000).

9.4. Ethical issues
It is important to take ethical objections and political sensitivities seriously. There can be ethical concerns with deliberately denying a program to those who need it and providing the program to those who do not; this applies to both experimental and non-experimental methods. For example, with too few resources, randomization may be seen as a fair solution, possibly after conditioning on observables. However, the information available to the evaluator (for conditioning) is typically a partial subset of the information available "on the ground" (including to voters/taxpayers). The idea of "intention-to-treat" helps alleviate these concerns; one has a randomized assignment, but anyone is free not to participate. Even then, the "randomized out" group may include people in great need. All these issues must be discussed openly and weighed against the (potentially large) longer-term welfare gains from better information for public decision making (Ravallion, 2008).2

9.5. Norms and standards
As noted before, impact evaluations are often designed, implemented, analyzed, disseminated, and used under budget, time, and data constraints while facing diverse and often competing political interests. Given these constraints, the management of a real-world evaluation is much more complicated than textbook descriptions suggest.

Evaluations sometimes fail because the stakeholders were not involved, or the findings were not used because they did not address the stakeholders' priorities. Others fail because of administrative or political difficulties in getting access to the required data, in meeting with all the individuals and groups that should be interviewed, or in being able to ask all the questions that the evaluator feels are necessary. Many other evaluations fail because the sampling frame, often based on existing administrative data, omits important sectors of the target population--often without anyone being aware of this. In other cases the budget was insufficient, or was too unpredictable to permit an adequate evaluation to be conducted. Needless to say, evaluations also fail because of emphasizing stakeholders' participation too much, leading to partnerial evaluations (Pollitt, 1999), and because of insufficient methodological and theoretical expertise.

Although many of these constraints are presented in the final evaluation report as being completely beyond the control of the evaluator, in fact their effects could very probably have been reduced by more effective management of the evaluation. For example, a more thorough scoping analysis could have revealed many of these problems, and the client(s) could then have been made aware of the likely limitations on the methodological rigor of the findings. The client(s) and evaluator could then strategize to either seek ways to increase the budget or extend the time, or agree to limit the scope of the evaluation and what it promises to deliver. If clients understand that the current design will not hold up under the scrutiny of critics, they can find ways to help address some of the constraints: "We have found that impact evaluations generally provide rudimentary documentation of the data being used. There is evidently a trade-off between decision makers' and bureaucrats' appeal for short and crisp reports and principles for scientific documentation, but we want to emphasise that displaying descriptive statistics improves the transparency of the methodological approach" (Jerve and Villanger, 2008: 34).

International evaluation standards (such as the OECD-DAC or the United Nations Evaluation Group Norms and Standards and/or the standards and guidelines developed by national or regional evaluation associations) should be applied where appropriate (Picciotto, 2004). Greater emphasis on impact evaluation for evidence-based policy making can create greater risk of manipulation aimed at producing desirable results (House, 2008). Impact evaluations require an honest search for the truth and thus place high demands on the integrity of those commissioning and conducting them. For the sake of honest commitment to development, evaluators and evaluation units need to ensure that impact evaluations are designed and executed in a manner that limits manipulation of processes or results that lean toward any ideological or political agenda. They should also ensure that there are realistic expectations of what can be achieved by a single evaluation within existing time and resource constraints, and that findings from the evaluation are presented in ways that are accessible to the intended users. This includes finding a balance between simple, clear messages and properly acknowledging the complexities and limitations of the findings.

9.6. Ownership and capacity building
Capacity building at the level of the governmental or non-governmental agencies involved should be an explicit purpose in impact evaluation. The interaction among the international development evaluation community, the countries/regions themselves, and the academic evaluation communities should also be stimulated, as it is likely to affect the pace and quality of capacity building in impact evaluation. Capacity building will also strengthen (country and regional) ownership of impact evaluation. Providing a space for consultation and agreement on impact evaluation priorities among the different stakeholders of an intervention will also help enhance utilization and ownership.

In cases where sector-wide investment programs are financed by multidonor co-financing schemes, participating donors would make natural partners for a joint evaluation (OECD-DAC, 2000). Other factors in selecting donors as partners in joint evaluation work may also be relevant. Selecting donors with similar development philosophies, cultures, evaluation procedures and techniques, and regional affiliations, and that are geographically close, may make working together easier. Another issue may be keeping the total number of donors "manageable." Where more donors are involved, a key group of development partners (including national actors) could assume management responsibilities, and the role of the others can be more limited.

Once appropriate donors that have a likely stake in an evaluation topic are identified, the next step is to contact them and discern whether they are interested in participating. In some cases, an appropriate consortium or group may already exist, where the issue of a joint evaluation can be raised and expressions of interest easily solicited. The DAC Working Party on Aid Evaluation, the United Nations Evaluation Group, and the Evaluation Cooperation Group have a tradition of cooperation, a shared vision on evaluation, and longstanding relationships, and they have fostered numerous joint evaluations.

Key message
Front-end planning is important. It can help manage the study, its reception, and its use. When managing the evaluation, keep a clear eye on items such as costs, staffing, ethical issues, and the level of independence of the evaluator and the team versus the level of collaboration with stakeholders. Pay attention to country and regional ownership of impact evaluation and capacity building, and promote them. Providing a space for consultation and agreement on impact evaluation priorities among the different stakeholders of an intervention will help enhance utilization and ownership.

Appendixes

APPENDIX 1: EXAMPLES OF DIVERSITY IN IMPACT EVALUATION

Example 1. Evaluating the impact of a European Union-funded training project on Low External Input Agriculture in Guatemala
Within the framework of a European Union-funded integrated rural development project, financial support was provided to a training project aimed at the promotion of Low External Input Agriculture (LEIA) as a viable agricultural livelihood approach for small farmers in the highlands of western Guatemala.

The impact evaluation design of this project was based on a quasi-experimental design complemented by qualitative methods of data collection (Vaessen and De Groot, 2004). An intervention theory was reconstructed on the basis of field observations and relevant literature to make explicit the different causal assumptions of the project, facilitating further data collection and analysis. The quasi-experimental design included data collection on the ex ante and ex post situation of participants, complemented with ex post data collection involving a control group (based on judgmental matching using descriptive statistical techniques). Without complex matching procedures and with limited statistical power, the strength of the quasi-experiment relied heavily on additional qualitative information. This shift in emphasis should not give the impression of a lack of rigor. Problems such as the influence of selection bias were explicitly addressed, even if not in a formal statistical way.

Farmers' adoption behavior after the termination of the project can be characterized as selective and partial. Given the particular circumstances of small farmers (e.g., risk aversion, high opportunity costs of labor), it is not realistic to assume that a training project will bring about a complete transformation from a conventional farming system to a LEIA farming system (as assumed in the objectives). In line with the literature, the most popular practices (in this case, for example, organic fertilizers and medicinal plants) were those that offer a clear short-term return while not requiring significant investments in terms of labor or capital. Finally, an ideological faith in the absolute supremacy of LEIA practices is not in the best interest of the farmers. Projects promoting LEIA should focus on the complementary effects of LEIA practices and conventional farming techniques, encouraging each farmer to choose the best balance fitted to his/her needs.

Example 2. Assessing the impact of Swedish program aid
White and Dijkstra (2003) analyzed the impact of Swedish program aid. Their analysis accepted from the start that it is impossible to separate the impact of Swedish money from that of other donors' money. Therefore, the analysis focuses on all program aid, with nine (country) case studies that trace how program aid has affected macro-economic aggregates (like imports and government spending) and, through these indicators, economic growth. The authors discern two channels for influencing policy: money and policy dialogue. The main evaluation questions are--

1. How has the policy dialogue affected the pattern and pace of reform (and what has been the contribution of program aid to this process)?
2. What has the impact of the program aid funds (on imports, government expenditure, investment, etc.) been?
3. What has the impact of the reform programs been?

Their analytical model treats donor funds and the policy dialogue as inputs; specific economic, social, and political indicators as outputs; the main program objectives (like economic growth, democracy, human rights, and gender equality) as outcomes; and poverty reduction as the overall goal.

The analysis focuses on marginal impact and uses a combination of quantitative and qualitative approaches (interviews, questionnaires, and e-mail enquiries). The analysis of the impact of aid is largely quantitative, while the analysis of the impact of the policy dialogue is mainly qualitative.

An accounting approach is used to identify aid impact on expenditure levels and patterns, using a number of ad hoc techniques, such as analyzing behavior during surges and before versus after breaks in key series, and searching the data for other explanations of the patterns observed.

Moreover, the authors analyze the impact of aid on stabilization through--

a. The effect on imports
b. Its impact on the markets for domestic currency and foreign exchange
c. The reduction of inflationary financing of the government deficit.

In terms of the impact of program aid on reform, domestic political considerations are a key factor in determining reform: most countries have initiated reform without help from donors and have carried out some measure of reform not required by them, while ignoring others that have been required.

APPENDIX 2: THE GENERAL ELIMINATION METHODOLOGY AS A BASIS FOR CAUSAL ANALYSIS

What are the core elements of the General Elimination Methodology (also known as the modus operandi approach)? We follow Scriven (2008).1

i. The general premise is the deterministic principle: all macro events (or conditions, etc.) have a cause. This is only false at the micro level, where the uncertainty principle applies, but the latter principle has essentially no detectable effect on the truth of macro determinism (though it is easy enough to deliberately create bizarre experiments where it does).

ii. The first "premise from practice" is the list of possible causes (LOPC) of events of the type in which we are interested, e.g., learning gains, reduction of poverty, and extension of life for AIDS patients. We have used LOPCs for more than a million years, in tracking and cooking and healing and repairing, and today every detective knows the list for murder, just as every competent mechanic knows the list for a big-end rattle or a brake failure, though the knowledge is as often tacit as explicit, outside the classroom and the maintenance videos. An LOPC usually refers to causes at a certain temporal or spatial remove from the effect, and at a certain level of conceptualization, and will vary depending on these parameters; of course, the context of the investigation determines the appropriate distance parameters. The distant LOPC for murder is the list of possible motives; a more proximate one, developed in a particular case by applying the general one, is the list of suspects. When dealing with new effects, we may not be certain the list is complete, but we work with the list we have and extend it when necessary.

iii. The second practical premise is the list of the modus operandi for each of the possible causes (the MOL). Each cause has a set of footprints, a short one if it's a proximate cause, a long one if it's a remote cause, but in general the modus operandus is a sequence of intermediate or concurrent events, or a set of conditions, or a chain of events, that has to be present when the cause is effective. There's often a rubric for this; for example, in criminal (and most other) investigations into human agency, we use the rubric of means/motives/opportunity to get from the motives to the list of "suspects." The list of modus operandi is the magnifying lens that fleshes out the candidate causes from the LOPC so that we can start fitting them to the case or rejecting them, for which we use the next premise.

iv. The fourth premise comprises the "facts of the case," and these are now assembled selectively, by looking for the presence or absence of factors listed in the modus operandi of each of the LOPCs. Only those causes are (eventually) left standing whose modus operandi are completely present. Ideally, there will be just one of these, but sometimes more than one, which are then co-causes. (Note that there is no reference to counterfactuals.)

APPENDIX 3: OVERVIEW OF QUANTITATIVE TECHNIQUES OF IMPACT EVALUATION

[This appendix is a diagram in the original. It arrays quantitative techniques by whether selection effects are observed or unobserved, and by type of analysis (analysis of intervention(s) (with/without), with an explicit counterfactual, versus analysis of multiple interventions and influences). The techniques shown are: propensity score matching, regression analysis, randomized controlled trial, difference-in-difference regression, the pipeline approach, fixed effects regression, double difference (difference-in-difference), instrumental variables, and regression discontinuity.]

APPENDIX 4: TECHNICAL ASPECTS OF QUANTITATIVE IMPACT EVALUATION TECHNIQUES

Endogeneity
Selection on unobservables is an important cause of endogeneity: a correlation of one of the explanatory variables with the error term in a mathematical model. This correlation occurs when an omitted variable has an effect at the same time on the dependent variable and an explanatory variable.1 We then try to estimate the equation

Yi = a + bPi + ei ,

while in effect we have

Yi = a + bPi + (ei + ex),

where ei is a random error term and ex is the effect of the unobserved variable. P and ex are correlated, and therefore P is endogenous. Ignoring this correlation results in a biased estimate of b.
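The omitted-variable problem described here can be illustrated with a short simulation. The sketch below is not from the guidance; the coefficient values, sample size, and variable names are illustrative assumptions chosen only to make the bias visible.

```python
import numpy as np

# Illustrative sketch of omitted-variable bias (not from the guidance).
# True model: Y = a + b*P + c*X + e, where participation P is correlated
# with an unobserved variable X.
rng = np.random.default_rng(0)
n = 100_000
a, b, c = 1.0, 2.0, 3.0

X = rng.normal(size=n)             # unobserved confounder
P = 0.8 * X + rng.normal(size=n)   # intervention, correlated with X
e = rng.normal(size=n)
Y = a + b * P + c * X + e

def ols(y, *regressors):
    """OLS via least squares; returns coefficients [constant, slopes...]."""
    Z = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

b_naive = ols(Y, P)[1]    # X omitted: P "picks up" part of c*X
b_full = ols(Y, P, X)[1]  # X included: approximately unbiased

print(f"true b = {b}, X omitted: {b_naive:.2f}, X included: {b_full:.2f}")
```

With X omitted, the estimate is pulled away from b toward b + c·cov(P, X)/var(P) (about 3.46 with these illustrative parameters); including X recovers b, which is the regression-based fix described in the text for the case where the source of the selection bias is observable.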
When Intervention the source of the selection bias (X) is known, inclusion of this variable (or these variables) leads Exogenous variable to an unbiased estimate of the effect Yi = a + bPi + cXi + ei . Result An example is the effect of class size on learning achievements. The school choice of motivated When a third variable is not included in the model, (and probably well-educated) parents is probably the effect of the variable becomes part of the correlated with class size, as these parents tend error term and contributes to the "unexplained to send their children to schools with low variance." As long as this variable does not have pupil:teacher ratios. The neglect of the endoge- an effect at the same time on one of the explana- neity of class size may lead to biased estimates tory variables in the model, this does not lead (with an overestimation of the real effect of class to biased estimates. However, when this third size). When the selection effects are observable, variable has an effect on one of the explanatory a regression-based approach may be used to get variables, this explanatory variable will "pick up" an unbiased estimate of the effects. part of the error and therefore will be correlated with the error. In that case, omission of the third Figure A4.1 gives the relation between class variable leads to a biased estimate. size and learning achievements for two groups of schools: the left side of the figure shows Suppose we have the relation private schools in urban areas with pupils with relatively rich and well educated parents; the Yi = a + bPi + cXi + ei , right side shows public schools with pupils from poor remote rural areas. A neglect of the differ- where Yi is the effect, Pi is the program or ences between the two schools leads to a biased intervention, Xi is an unobserved variable, and ei estimate, as shown by the black line. Including is the error term. 
Ignoring X we try to estimate these effects in the equation leads to the smaller the equation effect of the dotted lines. 65 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n Figure A4.1: Estimation of the effect of class size with and without the inclusion of a variable correlated with class size 10 9 8 7 Learning achievements 6 5 4 3 2 1 0 0 20 40 60 80 100 120 Class size double difference and regression program and with the anticipated effect of the analysis program, but we have no data on the year of The technique of "double differencing" can also birth, we may get an unbiased estimate by taking be applied in a regression analysis. Suppose the first differences of the original variables. This that the anticipated effect (Y) is a function of technique helps to get rid of the problem of participation in the project (P) and of a vector "unobservables."2 of background characteristics. In a regression equation we may estimate the effect as Instrumental variables The use of instrumental variables is another Yi = a + bPi + cXi + ei , technique to get rid of the endogeneity problem. A good instrument correlates with the where e is the error term and a, b, and c the (endogenous) intervention, but not with the parameters to be estimated. error term. This instrument is used to get an unbiased estimate of the effect of the endoge- When we analyze changes over time, we get nous variable. (taking the first differences of the variables in the model): In practice, researchers often use the method of two-stage least squares: in the first stage an (Yi,1 ­ Yi,0) = a + b(Pi,1 ­ Pi,0) + c (Xi,1 ­ Xi,0) + ei. exogenous variable (Z) is used to give an estimate of the endogenous intervention-variable (P): When the (unobserved) variables X are time invariant, (Xi,1 ­ Xi,0) = 0, and these variables Pi' = a + dZi + ei . drop from the equation. Suppose, for instance that a variable X denotes the "year of birth." 
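The first-differencing argument in the double-difference section can also be checked with a minimal simulation (hypothetical data; a cohort-style variable plays the role of the time-invariant unobservable that drives selection):

```python
import numpy as np

# First-differencing sketch: a time-invariant unobservable X biases the
# levels regression, but X drops out of the differenced equation.
# All data and parameter values are invented for illustration.
rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=n)                       # time-invariant unobservable
P0 = np.zeros(n)                             # nobody treated at baseline
P1 = (X + rng.normal(size=n) > 0).astype(float)  # selection depends on X
b = 2.0                                      # true program effect
Y0 = 1.0 + b * P0 + 1.5 * X + rng.normal(size=n)
Y1 = 1.5 + b * P1 + 1.5 * X + rng.normal(size=n)  # intercept shift = time trend

def slope(y, p):
    A = np.column_stack([np.ones_like(y), p])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

b_level = slope(Y1, P1)            # levels: biased, treated units have higher X
b_diff = slope(Y1 - Y0, P1 - P0)   # first differences: X cancels out
```

`b_diff` recovers the true effect of 2.0 up to sampling noise, while `b_level` is badly biased upward, illustrating why differencing removes time-invariant "unobservables."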
For In the second stage this new variable is used every individual the year of birth in year 1 = year to get an unbiased estimate of the effect of the of birth in year and therefore (Xi,1 ­ Xi,0) = 0. So, intervention: if we expect that the year of birth is correlated with the probability of being included in the Yi = a + bP'i + cXi + ei . 66 a p p E n d I x 4 : t E c h n I c a l a s p E c t s o f q u a n t I tat I v E I m pa c t E va l u at I o n t E c h n I q u E s the computation of propensity scores where pi is the probability of being included in The method of propensity score matching the intervention group and X, Y, and Z denote involves forming pairs by matching on the specific observed characteristics. In this model, probability that subjects have been part of the the probability is a function of the observed treatment group. The method uses all available characteristics. Rosenbaum and Rubin (1983) information to construct a control group. A proved that when subjects in the control group standard way to do this is using a probit or logit have the same probability of being included in regression model. In a logit specification, we get the treatment group as subjects who actually belong to the treatment group, the treatment and ln (pi / (1­pi)) = a + bXi + cYi + dZi +ei , control groups will have similar characteristics. 67 APPENDIX 5: EVALUATIONS USING QUANTITATIVE IMPACT EVALUATION APPROACHES1 Agriculture and rural development Case study: Philippines The project: The Second Rural Credit Projects Case study: Pakistan (SRCP) operated between 1969 and 1974 with a The projects: Irrigation in Pakistan suffers from US$12.5 million loan from the World Bank. SRCP the "twin menaces" of salinity and waterlogging. was the continuation of a pilot credit project These problems have been tackled through started in 1965 and completed in 1969. 
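The two-stage least squares procedure can be sketched under the stated assumptions (Z shifts participation P but is unrelated to the error term). All values below are invented for illustration:

```python
import numpy as np

# Two-stage least squares sketch (hypothetical data): the instrument Z is
# exogenous; the unobserved confounder U makes P endogenous.
rng = np.random.default_rng(2)
n = 20_000
Z = rng.normal(size=n)                      # instrument: exogenous
U = rng.normal(size=n)                      # unobserved confounder
P = 0.8 * Z + U + rng.normal(size=n)        # participation: depends on U
Y = 1.0 + 2.0 * P + U + rng.normal(size=n)  # true effect of P is 2.0

def ols(y, x):
    A = np.column_stack([np.ones_like(y), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

b_naive = ols(Y, P)[1]       # biased: P correlates with the error via U
a1, d = ols(P, Z)            # first stage: P' = a + d*Z
P_hat = a1 + d * Z
b_2sls = ols(Y, P_hat)[1]    # second stage: approximately the true 2.0
```

The second-stage coefficient on the fitted participation variable is close to the true 2.0, while the naive regression of Y on P is biased upward, matching the text's account of why the instrument is needed.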
As its Salinity Control and Reclamation Projects successful predecessor, SRCP aimed to provide (SCARPs), financed in part by the Bank. Although credit to small and medium rice and sugar farmers technically successful, SCARP tubewells imposed for the purchase of farm machinery, power tillers, an unsustainable burden on the government's and irrigation equipment. Credits were to be budget. The project was to address this problem channeled through 250 rural banks scattered in areas with plentiful groundwater by closing around the country. An average financial contribu- public tubewells and subsidizing farmers to tion to the project of 10% was required from both construct their own wells. rural banks and farmers. The SRCP was followed by a third loan of US$22.0 million from 1975 to Methodology: The Independent Evaluation Group 1977 and by a fourth loan of US$36.5 million that (IEG) commissioned a survey in 1994 to create was still in operation at the time of the evaluation a panel from two earlier surveys undertaken in (1983). 1989 and 1990. The survey covered 391 farmers in project areas and 100 from comparison areas. Methodology: The study uses data of a survey of Single and double differences of group means are 738 borrowers (nearly 20% of total project benefi- reported. ciaries) from seven provinces of the country. Data were collected through household question- Findings: The success of the project was that naires on land, production, employment, and the public tubewells were closed without measures of standard of living. In addition, 47 the public protests that had been expected. banks were surveyed to measure the impact on Coverage of private tubewells grew rapidly. their profitability, liquidity, and solvency. The However, private tubewells grew even more study uses before-and-after comparisons of rapidly in the control area. This growth may means and ratios to assess the project impact be a case of contagion, though a demonstra- on farmers. 
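The logit-based propensity score computation described in Appendix 4 can be sketched as follows. This is one simple way to implement it, with invented data and a single observed characteristic: the logit is fit by Newton-Raphson, and each treated subject is paired with the untreated subject whose estimated score is closest.

```python
import numpy as np

# Propensity score sketch (hypothetical data): fit ln(p/(1-p)) = a + b*X,
# then pair each treated subject with the nearest-scoring control.
rng = np.random.default_rng(3)
n = 2_000
X = rng.normal(size=n)                              # observed characteristic
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * X)))     # true participation probability
T = (rng.random(size=n) < p_true).astype(float)     # treatment indicator

A = np.column_stack([np.ones(n), X])
beta = np.zeros(2)
for _ in range(25):                                 # Newton-Raphson for the logit MLE
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve((A.T * W) @ A, A.T @ (T - p))

scores = 1.0 / (1.0 + np.exp(-A @ beta))            # estimated propensity scores
treated, control = np.where(T == 1)[0], np.where(T == 0)[0]
pairs = {i: control[np.argmin(np.abs(scores[control] - scores[i]))]
         for i in treated}                          # nearest-neighbor matching
```

The matched controls then resemble the treated group on the observed characteristics, which is the Rosenbaum-Rubin balancing property the text cites.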
National level data are often used to tion effect. But it seems more likely that other validate the effects observed. Regarding the rural factors (e.g., availability of cheaper tubewell banks, the study compares measures of financial technology) were behind the rapid diffusion performance before and after the project, taking of private water exploitation. Hence the advantage of the fact that the banks surveyed project did not have any impact on agricultural joined the project at different stages. productivity or incomes. It did, however, have a positive rate of return by virtue of the savings Findings: The mechanization of farming did not in government revenue. produce an expansion of holding sizes (though 69 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n the effect of a contemporaneous land reform in nutritional status of participating children over should be taken into account). Mechanization did time, differential participation, and differential not change cropping patterns, and most farmers project impact across social groups. Data on the were concentrating on a single crop at the time of change in nutritional status in project areas are the interviews. No change in cropping intensity compared to secondary data on the nutritional was observed, but production and productivity status of children outside the project areas. With were found to be higher at the end of the project. some assumptions, the use of secondary data The project increased the demand for both family makes the findings plausible. and hired labor. Farmers reported an increase in incomes and savings, and in several other welfare Findings: The study concludes that the implemen- indicators, as a result of the project. 
Regarding the tation of GMP programs on a large scale is project impact on rural banks, the study observes feasible and that this had a positive impact on the an increase in the net income of the sample banks nutritional status of children of Tamil Nadu. More from 1969 to 1975 and a decline thereafter. Banks' specifically, these are the findings of the study: liquidity and solvency position was negatively affected by poor collection and loan arrears. · Program participation: Among children par- , ticipating in GMP all service delivery indica- health, nutrition, and population tors (age at enrolment, regular attendance of sessions, administration of vitamin A, and de- Case study: India worming) show a substantial increase between The project: The Tamil Nadu Integrated Nutrition 1982 and 1986, though subsequently they de- Project (TINP) operated between 1980 and 1989, clined to around their initial levels. Levels of with a credit of US$32 million from the Interna- service delivery, however, are generally high. tional Development Association (IDA). The · Nutritional status: Mean weight and malnutrition overall objective of the project was to improve rates of children aged between 6 and 36 months the nutritional and health status of pre-school and participating in GMP have improved over children, pregnant women, and nursing mothers. time. Data on non-project areas in Tamil Nadu The intervention consisted of a package of and all-India data show a smaller improvement services including nutrition education, primary over the same time period. Regression analy- health care, supplementary feeding, administra- sis of nutritional status on a set of explanatory tion of vitamin A, and periodic de-worming. The variables, including the participation in a cotem- project was the first to employ Growth Monitor- poraneous nutrition project (the National Meal ing and Promotion (GMP) on a large scale. 
The Program) shows that the latter had no addi- evaluation is concerned with the impact of the tional benefit on nutritional outcomes. Positive project on the nutritional status of children. associations are also found between nutritional status and intensive participation in the pro- Methodology: The study uses three cross- gram and complete immunization. sectional rounds of data collected by the TINP · Targeting: Using tabulations and regression Monitoring Office. Child and household charac- analysis, it is shown that initially girls have ben- teristics of children participating in the program efited more from the program, but that at the were collected in 1982, 1986, and 1990, each end of the program boys have benefited more. round consisting of between 1,000 and 1,500 Children from the scheduled caste are shown observations. The study uses before-and-after to have benefited more than other groups. Nu- comparisons of means, regression analysis, and tritional status was observed to be improving at charts to provide evidence of the following: all income levels, the highest income category frequency of project participation, improvement benefiting slightly more than the lowest. 70 APPENDIX 6: DECISION TREE FOR SELECTING QUANTITATIVE EVALUATION DESIGNS TO DEAL WITH SELECTION BIAS decision tree for impact evaluation possible? If the treatment group is chosen at design using quantitative impact random, then a random sample drawn from the evaluation techniques sample population is a valid control group and 1. If the evaluation is being designed before will remain so provided they are outside the the intervention (ex ante), is randomization influence zone and contamination is avoided. Implement an ex Yes ante randomized experimental design Is a randomized Implement a suitable Yes design Yes quasi-experimental feasible? design Is evaluation Is selection Use panel being No based on Yes data-based designed observables? design ex ante? 
Are the unobservables No No time invariant? Is selection based on No observables? Use Is there a group of Can a means well-triangulated No as-yet-untreated No be found to plausible beneficiaries? observe them? association Use the pipeline Yes Yes approach Implement a suitable Yes quasi-experimental design Source: SG1 (2008). 71 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n This approach does not mean that targeting (a panel of persons, households, etc.) and specific analytical units is not possible. The selection is determined by unobservables, random allocation may be to a subgroup of then some means of observing the supposed the total population, e.g., from the poorest unobservables should be sought. If that is districts. not possible, then a pipeline approach can 2. If randomization is not possible, are all be used if there are as-yet untreated benefi- selection determinants observed? If they ciaries. For example, the Asian Development are, then there are a number of regression- Bank's impact study of microfinance in the based approaches that can remove the Philippines matched treatment areas with selection bias. areas that were in the program but that had 3. If the selection determinants are unobserved not yet received the intervention. and if they are thought to be time invari- 5. If none of the above mentioned procedures is ant, then using panel data will remove their possible, then the problem of selection bias influence, so a baseline is essential (or some cannot be addressed. The impact evaluation means of substituting for a baseline). will have to rely heavily on the intervention 4. If the study is done ex post so it is not possible theory and triangulation to build an argument to get information for exactly the same units by plausible association. 
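The five-step decision logic of the Appendix 6 tree can be condensed into a function. The argument names and returned labels are illustrative, not taken verbatim from the source:

```python
# Condensed sketch of the design-selection decision tree (labels are invented).

def choose_design(ex_ante: bool, randomization_possible: bool,
                  selection_observed: bool, unobservables_time_invariant: bool,
                  pipeline_group_exists: bool) -> str:
    if ex_ante and randomization_possible:
        return "randomized experimental design"
    if selection_observed:
        return "regression-based quasi-experimental design"
    if unobservables_time_invariant:
        return "panel data-based design (baseline essential)"
    if not ex_ante and pipeline_group_exists:
        return "pipeline approach (as-yet-untreated beneficiaries as comparison)"
    return "intervention theory, triangulation, and plausible association"

print(choose_design(ex_ante=False, randomization_possible=False,
                    selection_observed=False, unobservables_time_invariant=False,
                    pipeline_group_exists=True))
# → pipeline approach (as-yet-untreated beneficiaries as comparison)
```

The final branch corresponds to step 5: when no bias-removing design is available, the evaluation falls back on intervention theory and plausible association.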
APPENDIX 7: HIERARCHICAL MODELING AND OTHER STATISTICAL APPROACHES

This group of approaches covers a quite diverse set of advanced modeling and statistical approaches. Detailed discussion of these technical features is beyond the scope of this document. The common element that binds these approaches is the purpose of modeling and estimating direct and indirect effects of interventions at various levels of aggregation (from micro to macro). At the risk of substantial oversimplification we briefly mention a few of the approaches. In hierarchical modeling, evaluators and researchers look at the interrelationships between different levels of a program. The goal is "to measure the true and often intertwined effects of the program. In a typical hierarchical linear model analysis, for example, the emphasis is on how to model the effect of variables at one level on the relations occurring at another level. Such analyses often attempt to decompose the total effect of the program into the effect across various program levels and that between program sites within a level (Dehejia, 1999)" (Yang et al., 2004: 494).

Also part of this branch of approaches is a range of statistical approaches such as nested models, models with latent variables, multi-level regression approaches, and others (see, for example, Snijders and Bosker 1999). Other examples are typical economist tools such as partial equilibrium analyses; computable general equilibrium models (CGEs) are often used to assess the impact of, for example, macroeconomic policies on markets and, subsequently, on household welfare (see box A7.1).

Box A7.1: Impact of the Indonesian financial crisis on the poor: Partial equilibrium modeling and CGE modeling with microsimulation

General equilibrium models permit the analyst to examine explicitly the indirect and second-round consequences of policy changes. These indirect consequences are often larger than the direct, immediate impact, and may have different distributional implications. General equilibrium models and partial equilibrium models may thus lead to significantly different conclusions. A comparison of conclusions reached by two sets of researchers, examining the same event using different methods, reveals the differences between the models. Levinsohn et al. (1999) and Robillard et al. (2001) both look at the impact of the Indonesian financial crisis on the poor--the former using partial equilibrium methods, the latter using a CGE model with micro-simulation.

The Levinsohn study used consumption data for nearly 60,000 households from the 1993 SUSENAS survey, together with detailed information on price changes over the 1997-98 crisis period, to compute household-specific cost-of-living changes. It finds that the poorest urban households were hit hardest by the shock, experiencing a 10%-30% increase in the cost of living (depending on the method used to calculate the change). Rural households and wealthy urban households actually saw the cost of living fall.

These results suggest that the poor are just as integrated into the economy as other classes but have fewer opportunities to smooth consumption during a crisis. However, the methods used have at least three serious drawbacks. First, the consumption parameters are fixed; that is, no substitution is permitted between more expensive and less expensive consumption items. Second, the results are exclusively nominal, in that the welfare changes are due entirely to changes in the price of consumption and do not account for any concomitant change in income. Third, this analysis cannot control for other exogenous events, such as the El Niño drought and resulting widespread forest fires.

Robillard et al. (2001) use a CGE model, connected to a microsimulation model. The results are obtained in two steps. First, the CGE is run to derive a set of parameters for prices, wages, and labor demand. These results are fed into a micro-simulation model to estimate the effects on each of 10,000 households in the 1996 SUSENAS survey. In the microsimulation model, workers are divided into groups according to sex, residence, and skill. Individuals earn factor income from wage labor and enterprise profits, and households accrue profits and income to factors in proportion to their endowments. Labor supply is endogenous. The micro-simulation model is constrained to conform to the aggregate levels provided by the CGE model. The Robillard team finds that poverty did increase during the crisis, although not as severely as the previous results suggest. Also, the increase in poverty was due in equal parts to the crisis and to the drought. Comparing their microsimulation results to those produced by the CGE alone, the authors find that the representative household model is likely to underestimate the impact of shocks on poverty. In contrast, ignoring both substitution and income effects, as Levinsohn et al. (1999) do, is likely to lead to overestimating the increase in poverty, since it does not permit the household to reallocate resources in response to the shock.

Source: World Bank (2003).

APPENDIX 8: MULTI-SITE EVALUATION APPROACHES

Multi-site evaluation approaches involve primary data collection processes and analyses at multiple sites or interventions. They usually focus on programs encompassing multiple interventions implemented in different sites (Turpin and Sinacore, 1991; Straw and Herrell, 2002). Although these approaches are often referred to as a family of methodologies, in what follows, and in line with the literature, we will use a somewhat narrower definition of multi-site evaluations alongside several specific methodologies to address the issue of aggregation and cross-site evaluation of multiple interventions.

Straw and Herrell (2002) use the term "multi-site evaluation" both as an overarching concept, i.e., including cluster evaluation and multi-center clinical trials, as well as a particular type of multi-level evaluation distinguishable from cluster evaluation and multi-center clinical trials. Here we use the latter definition to refer to a particular (though rather flexible) methodological framework applicable to the evaluation of comprehensive multilevel programs addressing health, economic, environmental, or social issues.

The multi-center clinical trial is a methodology in which empirical data collection in a selection of homogenous intervention sites is systematically organized and coordinated. Basically it consists of a series of randomized controlled trials. The latter are experimental evaluations in which treatment is randomly assigned to a target group while a similar group not receiving the treatment is used as a control group. Consequently, changes in impact variables between the two groups can be traced back to the treatment, as all other variables are assumed to be similar at group level. In the multi-center clinical trial, sample size is increased and multiple sites are included in the experiment in order to strengthen the external validity of the findings. Control over all aspects of the evaluation is very tight to keep as many variables as possible constant over the different sites. Applications are mostly found in the health sector (see Kraemer, 2000).

Multi-site evaluation distinguishes itself from cluster evaluation in the sense that its primary purpose is summative. In addition, multi-site evaluations are less participatory in nature vis-à-vis intervention staff. In contrast to settings in which multi-center clinical trials are applied, multi-site evaluations address large-scale programs that, because of their (complex) underlying strategies, implementation issues, or other reasons, are not amenable to controlled experimental impact evaluation designs. Possible variations in implementation among intervention sites, and variations in terms of available data, require a different, more flexible approach to data collection and analysis than in the case of the multi-center clinical trials. A common framework of questions and indicators is established to counter this variability, enabling data analysis across interventions in function of establishing generalizable findings (Straw and Herrell, 2002).

Cluster evaluation is a methodology that is especially useful for evaluating large-scale interventions that address complex societal themes such as education, social service delivery, and health promotion. Within a cluster of projects under evaluation, implementation among interventions may vary widely, but single interventions are still linked in terms of common strategies, target populations, or problems that are addressed (Worthen and Schmitz, 1997).

The approach was developed by the Kellogg Foundation in the 1990s and since then has been taken up by other institutions. Four elements characterize cluster evaluation (Kellogg Foundation, 1991):
· It focuses on a group of projects in order to identify common issues and patterns.
· It focuses on what happened as well as why.
· It is based on a collaborative process involving all relevant actors, including evaluators and individual project staff.
· Project-specific information is confidential and not reported to the higher level; evaluators only report aggregate findings; this type of confidentiality between evaluators and project staff induces a more open and collaborative environment.

Cluster evaluation is typically applied during program implementation (or during the planning stage) in close collaboration with stakeholders from all levels. Its purpose is, on the one hand, formative, as evaluators in close collaboration with stakeholders at project level try to explore common issues as well as variations between sites. At the program level the evaluation's purpose can be both formative, in terms of supporting planning processes, as well as summative, i.e., judging what went wrong and why. A common question at the program level would be, for example, to explore the factors that in the different sites are associated with positive impacts. In general, the objective of cluster evaluations is not so much to prove as to improve, based on a shared understanding of why things are happening the way they are (Worthen and Schmitz, 1997). It should be noted that not only cluster evaluations but also multi-site evaluations are applicable to homogenous programs with little variation in terms of implementation and context among single interventions.

APPENDIX 9: METHODOLOGICAL FRAMEWORKS FOR ASSESSING THE EFFECTS OF INTERVENTIONS, MAINLY BASED ON QUALITATIVE METHODS1

outcome mapping
Outcome mapping (IDRC, 2001) is a methodology that focuses on outcomes as behavioral change. The outcomes can be logically linked to an intervention's activities, although they may not necessarily be directly caused by them. These changes are aimed at contributing to specific aspects of human and ecological well-being by providing partners with new tools, techniques, and resources to contribute to the development process. "Boundary partners" are individuals, groups, and organizations with whom the intervention interacts directly and with whom the intervention anticipates opportunities for influence; most activities will involve multiple outcomes because they have multiple boundary partners.

Most significant change
The most significant change technique (Davies and Dart, 2005) is a form of participatory monitoring and evaluation. It is participatory because many intervention stakeholders are involved both in deciding the types of change to be recorded and in analyzing the data. It is a form of monitoring because it occurs throughout the intervention cycle and provides information to help people manage the intervention. It contributes to impact evaluation in part because it provides data on impact and outcomes that can be used to help assess the performance of the intervention as a whole--but largely through providing a tool for identifying and rating the impacts that are valued by different stakeholders.

Success case method
The success case method (Brinkerhoff, 2003) is a widely adopted example of a mixed-method framework, drawing from several established traditions, including theory-based evaluation, organizational development, appreciative inquiry, narrative analysis, and quantitative statistical analysis of impact. It has been expanded in scope by those who combine it with realist methodologies (e.g., Dart) and soft systems methodologies (e.g., Williams). It also shares much in common with the positive deviance approach that has been applied to health interventions in many developing countries. The success case method identifies individual cases that have been particularly successful (and unsuccessful) and uses case study analytical methods to develop credible arguments about the contribution of the intervention to these.

MAPP
The Method for Impact Assessment of Projects and Programs (Späth, 2004) is a methodological framework for combining a qualitative approach with participatory assessment instruments, including a quantification step. It orients itself toward principles and procedures of Participatory Rural Appraisal methodology, including triangulation, "optimal ignorance," and communal learning. A major element of this methodology is conducting workshops with representatives of relevant stakeholders. Perceived key processes are jointly reflected on in structured group discussions in which at least six interlinked and logically connected steps are accomplished: (i) lifeline; (ii) trend analysis; (iii) activity list; (iv) influence matrix; (v) transect--or data cross-checking; and (vi) development and impact profile.
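The core idea behind hierarchical modeling, that program effects live at more than one level, can be illustrated with a minimal two-level simulation. The data and parameters below are invented; the decomposition into between-site and within-site variance (the intraclass correlation) is the simplest version of what multi-level models elaborate on:

```python
import numpy as np

# Two-level sketch (hypothetical data): outcomes vary both across program
# sites (level 2) and across individuals within a site (level 1).
rng = np.random.default_rng(4)
n_sites, n_per_site = 50, 40
site_effect = rng.normal(scale=1.0, size=n_sites)        # level-2 variation
y = site_effect[:, None] + rng.normal(scale=2.0, size=(n_sites, n_per_site))

site_means = y.mean(axis=1)
between = site_means.var()            # variance of site means (level 2)
within = y.var(axis=1).mean()         # average within-site variance (level 1)
icc = between / (between + within)    # rough intraclass correlation
```

With these parameters the intraclass correlation comes out near 0.2, i.e., about a fifth of the outcome variance sits between sites; a hierarchical linear model would go on to explain that site-level share with site-level variables.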
APPENDIX 10: WHERE TO FIND REVIEWS AND SYNTHESIS STUDIES ON MECHANISMS UNDERLYING PROCESSES OF CHANGE

Books on social mechanisms
Authors like Elster (1989; 2007), Farnsworth (2007), Hedström and Swedberg (1998), Swedberg (2005), Bunge (2004), and Mayntz (2004) have summarized and synthesized the research literature on different (types of) social mechanisms. Elster's explanation of social behavior (2007) summarizes insights from the neurosciences to economics and political science and discusses 20-plus mechanisms. They range from motivations, emotions, and self-interest to rational choice, games and behavior, and collective decision making.

Farnsworth (2007) takes legal arrangements like laws and contracts as a starting point and dissects which (types of) mechanisms play a role when one wants to understand why laws sometimes do or do not work. He combines insights from psychology, economics, and sociology and discusses mechanisms such as the "slippery slope," the endowment effect, framing effects, and public goods production.

review journals
Since the 1970s review journals have been developed to address important developments within a discipline. An example is Annual Reviews, which publishes analytic reviews in 37 disciplines within the biomedical, life, physical, and social sciences.

Knowledge repositories
Hansen and Rieper (2009) have inventoried a number of second-order evidence-producing organizations within the social (and behavioral) sciences. In recent years the production of systematic reviews has been institutionalized in these institutions. There are two main international organizations: the Cochrane Collaboration, working within the health field; and the Campbell Collaboration, working within the fields of social welfare, education, and criminology. Both organizations subscribe to the idea of producing globally valid knowledge about the effects of interventions, if possible through synthesizing the results of primary studies designed as RCTs and using meta-analysis as the form of synthesis. In many (Western) countries second-order knowledge-producing organizations have been established at the national level, not all of which are based on findings from RCTs. Hansen and Rieper (2009) present information about some 15 of them, including web addresses.

Knowledge repositories and development intervention impact
The Coalition for Evidence-Based Policy offers "Social Programs That Work," a Web site providing policy makers and practitioners with clear, actionable information on what works in social policy, as demonstrated in scientifically valid studies (www.evidencebasedprograms.org/).

The International Organization for Cooperation in Evaluation, a loose alliance of regional and national evaluation organizations from around the world, builds evaluation leadership and capacity in developing countries, fosters the cross-fertilization of evaluation theory and practice around the world, addresses international challenges in evaluation, and assists evaluation professionals to take a more global approach to identifying and solving problems. It offers links to other evaluation organizations; forums that network evaluators internationally; news of events and important initiatives; and opportunities to exchange ideas, practices, and insights with evaluation associations, societies, and networks (http://ioce.net).

The Abdul Latif Jameel Poverty Action Lab (J-PAL) fights poverty by ensuring that policy decisions are based on scientific evidence. Located in the Economics Department at the Massachusetts Institute of Technology, J-PAL brings together a network of researchers at several universities who work on randomized evaluations. It works with governments, aid agencies, bilateral donors, and nongovernmental organizations to evaluate the effectiveness of antipoverty programs using randomized evaluations, disseminate findings and policy implications, and promote the use of randomized evaluations, including by training practitioners to carry them out (www.povertyactionlab.com/).

The Development Impact Evaluation Initiative (DIME) is a World Bank-led effort involving thematic networks and regional units under the guidance of the Bank's Chief Economist. Its objectives are--
· To increase the number of Bank projects with impact evaluation components
· To increase staff capacity to design and carry out such evaluations
· To build a process of systematic learning based on effective development interventions with lessons learned from completed evaluations.

APPENDIX 11: EVALUATIONS BASED ON QUALITATIVE AND QUANTITATIVE DESCRIPTIVE METHODS

Case 1: Combining qualitative and quantitative descriptive methods--Ex post impact study of the Noakhali Rural Development Project in Bangladesh1

1. Summary
The evaluation examined the intended and unintended socio-economic impacts of the project, with particular attention to the impact on women and to the sustainability and sustainment of these impacts. The evaluation drew on a wide range of existing evidence and also used mixed methods to generate additional evidence; because the evaluation was conducted nine years after the project had ended, it was possible to directly investigate the extent to which impacts had been sustained. Careful attention was paid to differential impacts in different contexts to interpret the significance of before/after and with/without comparisons; the intervention was only successful in contexts that provided the other necessary ingredients for success. The evaluation had significant resources and was preceded by considerable planning and review of existing evidence.

2. Summary and main characteristics
The Noakhali Rural Development Project (NRDP) was an integrated rural development project (IRDP) in Bangladesh, funded for DKK 389 million by Danida. It was implemented in two phases over a period of 14 years, 1978-92, in the greater Noakhali district, one of the poorest regions of Bangladesh, which had a population of approximately 4 million. More than 60 long-term expatriate advisers--most of them Danish--worked 2-3 years each on the project together with a Bangladeshi staff of up to 1,000 (at the peak).

During NRDP-I the project comprised activities in 14 different areas grouped under four headings:
· Infrastructure (roads, canals, market places, public facilities)
· Agriculture (credit, cooperatives, irrigation, extension, marketing)
· Other productive activities (livestock, fish ponds, cottage industries)
· Social sector (health & family planning, education).

The overarching objective of NRDP-I was to promote economic growth and social progress, in particular aiming at the poorer sections of the population. The poorer sections were to be reached through the creation of temporary employment in construction activities (infrastructure) and engaging them in income-generating activities (other productive activities). There was also an aim to create more employment in agriculture for landless laborers through intensification. Almost all the major activities started under NRDP-I continued under NRDP-II, albeit with some modifications and additions. The overarching objective was kept, with one notable addition: to promote economic growth and social progress, in particular aiming at the poorer segments of the population, including women. A special focus on women was thus included, based on the experience that most of the benefits of the project had accrued to men.

3. Purpose, intended use, and key evaluation questions
This ex post impact study was carried out nine years after the project was terminated. At the time of implementation NRDP was one of the
At the time of implementation NRDP was one of the largest projects funded by Danida, and it was considered an excellent example of integrated rural development, which was a common type of support during the 1970s and '80s. In spite of the potential lessons to be learned from the project, it was not evaluated upon completion in 1992. This fact and an interest in the sustainability factor in Danish development assistance led to the commissioning of the study. What type of impact could still be traced in Noakhali nine years after Danida terminated its support to the project?

Although the study dealt with aspects of the project implementation, its main focus was on the project's socioeconomic impact in the Noakhali region. The study aimed to identify the intended as well as unintended impact of the project, in particular whether it had stimulated economic growth and social development and improved the livelihoods of the poor, including women, which the project had set out to do.

The evaluation focused on the following questions:

· What has been the short- and long-term--intended as well as unintended--impact of the project?
· Has the project stimulated economic growth and social development in the area?
· Has the project contributed to improving the livelihoods of the poorest section of the population, including women?
· Have the institutional and capacity-building activities engendered or reinforced by the project produced sustainable results?

4. Concise description of the evaluation

Identifying impacts of interest
This study focuses on the impact of NRDP, in particular the long-term impact (i.e., nine years after). But impact cannot be understood in isolation from implementation, so the study analyzes various elements and problems in the way the project was designed and executed. Nor can impact be understood in isolation from the context, both the natural/physical context and in particular the societal (social, cultural, economic, political) context. In comparison with ordinary evaluations, this study puts far more emphasis on understanding the national and, in particular, the local context.

Gathering evidence of impacts
One of the distinguishing features of this impact study, compared to normal evaluations, is the order and kind of fieldwork. The fieldwork lasted four months and involved a team of eight researchers (three European and five Bangladeshi) and 15 assistants. The researchers spent 1.5-3.5 months in the field, the assistants 2-4 months.

The following is a list of the methods used:

· Documentary study (project documents, research reports, etc.)
· Archival work (in the Danish embassy, Dhaka)
· Questionnaire with former advisers and Danida staff members
· Stakeholder interviews (Danida staff, former advisers, Bangladeshi staff, etc.)
· Quantitative analysis of project monitoring data
· Key informant interviews
· Compilation and analysis of material about context (statistics, articles, reports, etc.)
· Institutional mapping (particularly NGOs in the area)
· Representative surveys of project components
· Assessment of buildings, roads, and irrigation canals (function, maintenance, etc.)
· Questionnaire-based interviews with beneficiaries and non-beneficiaries
· Extensive and intensive village studies (surveys, interviews, etc.)
· Observation
· Focus group interviews
· In-depth interviews (issue-based and life stories).

In the history of Danish development cooperation no other project has been subject to so many studies and reports, not to speak of the vast number of newspaper articles. Most important for the impact study have been the appraisal reports and the evaluations, plus the final project completion report. But in addition to these, there exists an enormous number of reports on all aspects of the project. A catalogue from 1993 lists more than 1,500 reports produced by and for the NRDP. Both the project and the local context were, moreover, intensively studied in a research project carried out in cooperation between the Centre for Development Research and the Bangladesh Institute of Development Studies.

A special effort was made to solicit the views of a number of key actors (or stakeholders) in the project and other key informants. These included numerous former NRDP and BRDB officers, expatriate former advisers, as well as former key Danida staff, based both in the Danish Embassy in Dhaka and in the Ministry of Foreign Affairs in Copenhagen. They were asked about their views on the strengths and weaknesses of the project and the components they knew best, about their own involvement, and about their judgment regarding likely impact. A questionnaire survey was carried out among the around 60 former expatriate long-term advisers and 25 former key staff members in the Danish embassy, Danida, and other key informants. In both cases about half returned the filled-in questionnaires. This was followed up by a number of individual interviews.

The main method in four of the five component studies was surveys with interviews, based on standardized questionnaires, with a random--or at least reasonably representative--sample of beneficiaries (of course combined with documentary evidence, key informant interviews, etc.). A great deal of effort went into ensuring that the survey samples were reasonably representative.

The infrastructure component was studied by partly different methods, because in this case the beneficiaries were less well defined. It was decided to make a survey of all the buildings that were constructed during the first phase of the project to assess their current use, maintenance standard, and benefits. In this phase the emphasis was on construction; in the second phase it shifted to maintenance. Moreover, a number of roads were selected for study, covering both their current maintenance standard, their use, etc., and the employment the road construction and maintenance generated, particularly for groups of destitute women. The study also attempted to assess the socio-economic impact of the roads on different groups (poor/better-off, men/women, etc.).

Assessing causal contribution
The impact of a development intervention is a result of the interplay of the intervention and the context. It is the matching of what the project has to offer with people's needs and capabilities that produces the outcome and impact. Moreover, the development processes engendered unfold in a setting that is often characterized by inequalities, structural constraints, and power relations. This has certainly been the case in Noakhali. As a consequence there will be differential impacts, varying between individuals and according to gender, socio-economic group, and political leverage.

In addition to the documentary studies, interviews, and questionnaire survey, the actual fieldwork employed a range of both quantitative and qualitative methods. The approach can be characterized as a contextualized, tailor-made ex post impact study. There is considerable emphasis on uncovering elements of the societal context in which the project was implemented, covering both the national context and the local context. The approach is tailor-made in the sense that it was made to fit the study design outlined above and to apply an appropriate mix of methods.

An element in the method is the incorporation in the study of both before/after and with/without perspectives. These, however, are not seen as the ultimate test of impact (success or failure) but are interpreted cautiously, bearing in mind that the area's development has also been influenced by a range of other factors (market forces, changing government policies, other development interventions, etc.), both during the 14 years the project was implemented and during the 9 years after its termination.
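The cautious before/after and with/without reading used in the study can be sketched in code. The following is a minimal illustration with invented numbers and context labels: a double difference (the with/without contrast of before/after changes) is computed within context strata so that differential impacts stay visible rather than being averaged away. It is a reading aid, not the study's actual procedure.

```python
# Hypothetical sketch of a before/after x with/without comparison computed
# per context stratum. All villages, scores, and strata are invented.

def change(before, after):
    """Before/after change in an outcome score for one (group of) village(s)."""
    return after - before

def double_difference(project, comparison):
    """With/without contrast of before/after changes. Interpreted cautiously:
    other factors (markets, policies, other interventions) also drove change."""
    return change(*project) - change(*comparison)

# (before, after) outcome scores, grouped by local context
villages = {
    "favourable context":   {"+NRDP": (40, 58), "-NRDP": (41, 47)},
    "unfavourable context": {"+NRDP": (38, 43), "-NRDP": (37, 42)},
}

for context, pair in villages.items():
    dd = double_difference(pair["+NRDP"], pair["-NRDP"])
    print(f"{context}: double difference = {dd:+d}")
```

With these invented numbers the favourable-context stratum shows a positive double difference while the unfavourable one shows none, mirroring the study's finding that the intervention only worked where the context supplied the other ingredients of success.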
Considerable weight was accorded to studying what has happened in the villages that had previously been studied and for which some comparable data exist. Four villages were studied intensively in 1979 and briefly restudied in 1988 and 1994. These studies--together with a thorough restudy in the year 2001--provide a unique opportunity to compare the situation before, during, and after the project. Moreover, 10 villages were monitored under the project's village-wise impact monitoring system in the years 1988-90, some of these being with (+NRDP) and some (largely) without (-NRDP) the project. Analysis of the monitoring data combined with a restudy of a sample of these villages illuminates the impact of the project in relation to other factors. It was decided to study a total of 15 villages, 3 intensively (all +NRDP, about 3 weeks each) and 12 extensively (9 +NRDP, 3 -NRDP, 3-5 days each). As a matter of principle, this part of the study looks at impact in terms of the project as a whole. It brings into focus the project benefits as perceived by different groups and individuals and tries to study how the project has impinged on economic and social processes of development and change. At the same time it provides a picture of the considerable variety found in the local context.

In the evaluation of the mass education program, the problem of attribution was dealt with as carefully as possible. First, a parallel comparison was made between the beneficiaries on the one hand and non-beneficiaries on the other, to identify any changes directly or indirectly related to the program. Such a comparison was vital due to the absence of any reliable and comparable baseline data. Second, specific queries were made in relation to the impact of the program as perceived by the beneficiaries and other stakeholders of the program, assuming that they would be able to perceive the impact of the intervention on their own lives in a way that would not be possible for others. And finally, the views of non-beneficiaries and non-stakeholders were sought, to obtain opinions from people who did not have any valid reason for either understating or overstating the impact of the program. It was through such a cautious approach that the question of attribution was addressed. Arguably, elements of subjectivity may still have remained in the conclusions and assumptions, but that is unavoidable in a study that seeks to uncover the impact of an education project.

Managing the impact evaluation
The impact study was commissioned by Danida and carried out by the Centre for Development Research, which also co-funded the study as a component of its Aid Impact Research Program. The research team comprised independent researchers from Bangladesh, Denmark, and the UK. A reference group of nine persons (former advisers, Danida officers, and researchers) followed the study from beginning to end. It discussed the approach paper in an initial meeting and the draft reports in a final meeting. In between, it received three progress reports from the team leader and took up discussions by e-mail correspondence. The study was prepared during the year 2000 and fieldwork was carried out in the period January-May 2001. The study consists of a main report and seven topical reports.

The first step in establishing a study design was the elaboration of an approach paper (study outline) by the team leader. This was followed by a two-week visit to Dhaka and the greater Noakhali area. During this visit, Bangladeshi researchers and assistants were recruited to the team, and more detailed plans for the subsequent fieldwork were drafted. Moreover, a background paper by Hasnat Abdul Hye, former Director General of BRDB and Secretary, Ministry of Local Government, was commissioned.

The fieldwork was preceded by a two-day methodology-cum-planning workshop in Dhaka. The actual fieldwork lasted four months, from mid-January to mid-May 2001. The study team comprised 23 people: 5 Bangladeshi researchers, 3 European researchers, 6 research assistants, and 9 field assistants (all from Bangladesh). The researchers spent 1.5-3.5 months in the field, the assistants 2-4 months. Most of the time the team worked 60-70 hours a week. It thus takes a good deal of resources to accomplish such a big and complex impact study.

Case 2: Combining qualitative and quantitative descriptive methods--Mixed-method impact evaluation of IFAD projects in Gambia, Ghana, and Morocco2

1. Summary
The evaluation included intended and unintended impacts and examined the magnitude, coverage, and targeting of changes. It used mixed methods to gather evidence of impacts and of the quality of processes, with cross-checking among sources. With regard to assessing causal contribution, it must be noted that no baseline data were available. Instead a comparison group was constructed, and an analysis of other contributing factors was made to ensure appropriate comparisons. The evaluation was undertaken within significant resource constraints and was carried out by an interdisciplinary team.
2. Introduction and background
Evaluations of rural development projects and country programs are routinely conducted by the Office of Evaluation of IFAD. The ultimate objective of these evaluations is to set a basis for accountability by assessing development results and to contribute to learning and to the improvement of design and implementation by providing lessons learned and practical recommendations. These evaluations follow a standardized methodology and a set of evaluation questions, including the following: (i) project performance (relevance, effectiveness, and efficiency), (ii) project impact, (iii) overarching factors (sustainability, innovation, and replication), and (iv) the performance of the partners. As can be seen, impact is but one of the key evaluation questions, and the resources allocated to the evaluation (budget, specialists, and time) have to be shared across the entirety of the evaluation.

Thus, these evaluations are to be conducted under resource constraints. In addition, very limited data are available on socio-economic changes taking place in the project area that can be ascribed to the intervention. IFAD adopts an impact definition which is similar to the DAC definition. The key feature of IFAD evaluations is that they are conducted just before or immediately after project conclusion: the effects can be observed after 4-7 years of operations, and the future evolution can be estimated through an educated guess on sustainability perspectives. Several impact domains are considered, including household income and assets, human capital, social capital, food security, environment, and institutions.

3. Sequencing of the process and choice of methods
This short case study is based on evaluations conducted in Gambia, Ghana, and Morocco between 2004 and 2006. As explained above, the evaluations had multiple questions to answer, and impact assessment was but one of them. Moreover, the impact domains were quite diverse. This meant that some questions and domains required quantitative evidence (e.g., in the case of household income and assets), whereas a more qualitative assessment would be in order for other domains (e.g., social capital). In many instances, however, more than one method would have to be used to answer the same questions, to cross-check the validity of findings, identify discrepancies, and formulate hypotheses on the explanation of apparent inconsistencies.

As the final objective of the evaluation was not only to assess results but also to provide future intervention designers with adequate knowledge and insights, the evaluation design could not be confined to addressing a dichotomy between "significant impact has been observed" and "no significant impact has been observed." Findings would need to be rich enough and grounded in field experience to provide a plausible explanation that would lead, when suitable, to a solution to identified problems and to recommendations to improve the design and the execution of the operations.

The countries and projects considered in this case study were diverse. In all cases, however, the first step in the evaluation consisted of a desk review of the project documentation. This allowed the evaluation team to understand or reconstruct the intervention theory (often implicit) and the logical framework. In turn, this helped to identify a set of hypotheses on changes that might be observed in the field, as well as on the intermediary steps that would lead to those changes.

In particular, the preliminary desk analysis highlighted that the results assessment would have to be supplemented with some analysis of implementation performance. The latter would include some insight into the business processes (e.g., the management and resource allocation made by the project implementation unit) and the quality of service rendered (e.g., the topics and the communication quality of an extension service, or the construction quality of a feeder road or of a drinking water scheme).

The second step was to conduct a preparatory mission. This mission was instrumental in fine-tuning the hypotheses on project results and designing the methods and instruments. Given the special emphasis of the IFAD interventions on the rural poor, the impact evaluation would need to shed light, to the extent possible, on the following dimensions of impact: (i) magnitude of changes, (ii) coverage (i.e., the number of persons or households served by the projects), and (iii) targeting (i.e., gauging the distribution of project benefits according to social, ethnic, or gender grouping).

As pointed out before, a major concern was the absence of a baseline survey which could be used as a reference for impact assessment. This required reconstructing the "before project" situation. By the same token, it was clear that the observed results could not simply be attributed to the evaluated interventions. In addition to exogenous factors such as weather changes, other important factors were at play, for example, changes in government strategies and policies (such as the increased support to grassroots associations by Moroccan public agencies) or operations supported by other development organizations in the same or in adjacent zones. This meant that the evaluated interventions would interplay with existing dynamics and interact with other interventions. Understanding synergies or conflicts between parallel dynamics could not be done simply through inferential statistical instruments but required interaction with a wider range of stakeholders.

The third step in the process was the fielding of a data collection survey (after pre-testing the instruments) that would help the evaluation cope with the dearth of impact data. The selected techniques for data collection included a quantitative survey covering 200-300 households (including both project and control groups) and a more limited set of focus group discussions with groups of project users and "control groups," stratified based on the economic activities in which they had engaged and the area where they were living.

In the quantitative survey, standardized questionnaires were administered to final project users (mostly farmers or herders) as well as to non-project groups (control observations) on the situation before (recall methods) and after the project. Recall methods were adopted to make up for the absence of a baseline.
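A recall-based comparison of this kind can be sketched as follows. The household records, field names, and numbers below are invented; the point is only to show how, absent a baseline, the "before" values come from respondents' recall and an impact estimate is read from the difference in average before/after changes between project users and control observations.

```python
# Illustrative sketch (invented data): "before" values are recalled by
# respondents, "after" values are observed; impact is estimated as the
# difference in average change between project and control households.

def mean(xs):
    return sum(xs) / len(xs)

# each record: (recalled income before the project, reported income after)
project_households = [(100, 160), (120, 150), (90, 140)]
control_households = [(110, 125), (95, 110), (105, 115)]

def avg_change(records):
    return mean([after - before for before, after in records])

impact_estimate = avg_change(project_households) - avg_change(control_households)
print(round(impact_estimate, 1))
```

Note that, as the text warns, recall is least reliable for exactly the monetary indicator used here; in practice such an estimate would be cross-checked against easier-to-remember facts and focus group evidence.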
In the course of the focus group interviews, open-ended discussion guidelines were adopted; results were mostly of a qualitative nature. Some of the focus group facilitators had also been involved in the quantitative survey and could refer the discussion to observations previously made. After the completion of data collection and analysis, a first cross-checking could be made between the results of the quantitative and qualitative analysis.

As a fourth step, an interdisciplinary evaluation team would be fielded. Results from the preliminary data collection exercise were made available to the evaluation team. The data collection coordinator was a member of the evaluation team or in a position to advise its members. The evaluation team would conduct field visits, conduct a further validation survey, and collect focus group data through participant observations and interviews with key informants (and further focus group discussions if necessary). The team would also spend adequate time with project management units to gain a better insight into implementation and business processes.

The final impact assessment would be made by means of triangulation of evidence captured from the (scarce) existing documentation, the preliminary data collection exercise, and the main interdisciplinary mission (figure A11.1).

Figure A11.1: Final impact assessment triangulation. Desk review and secondary data, the quantitative survey, and focus groups (key informants, participant observations) feed through the interdisciplinary mission into the final assessment.

4. Constraints in data gathering and analysis
Threats to the validity of recall methods. According to the available literature sources3 and our own experience, the reliability of recall methods may be questionable for monetary indicators (e.g., income) but higher for easier-to-remember facts (e.g., household appliances, approximate herd size). Focus group discussions helped identify possible sources of bias in the quantitative survey and ways to address them.

Finding "equivalent" samples for with- and without-project observations. One of the challenges was to extract a control sample that would be "similar" in the salient characteristics to the project sample. In other words, problems of sampling bias and endogeneity should have been controlled for (e.g., more entrepreneurial people are more likely to participate in a rural finance intervention). In sampling control observations, serious attempts were made to match project and non-project households based on similarity of main economic activities, agro-ecological environment, household size, and resource endowment. In some instances, households that had just started to be served by the projects ("new entries") were considered control groups, on the grounds that they would broadly satisfy the same eligibility criteria at entry as "older" project clients. However, no statistical technique (e.g., instrumental variables, Heckman's procedure, or propensity score matching) was adopted to test for sampling bias, due to limited time and resources.

Coping with linguistic gaps. Given the broad scope of the evaluations, a team of international sector specialists was required. However, international experts were not necessarily the best suited for data collection and analysis, which call for fluency in the local vernacular, knowledge of local practices, and the skills to obtain as much information as possible within a limited time frame. Staggering the process in several phases was a viable solution. The preliminary data collection exercise was conducted by a team of local specialists, with university students, local teachers, or literate nurses serving as enumerators.
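The household matching described under "Finding equivalent samples" can be illustrated with a crude nearest-neighbour rule over the matching criteria named in the text (main economic activity, agro-ecological zone, household size, resource endowment). The data, weights, and distance function below are all invented for illustration; this is emphatically not the propensity-score or Heckman-type correction that the evaluators note was beyond their time and resources.

```python
# Hedged sketch: pair each project household with the most "similar"
# non-project household across the matching criteria named in the text.
# All households, weights, and the distance rule itself are invented.

def distance(a, b):
    """Ad hoc dissimilarity score between two households."""
    score = 0
    score += 0 if a["activity"] == b["activity"] else 1   # main economic activity
    score += 0 if a["zone"] == b["zone"] else 1           # agro-ecological zone
    score += abs(a["hh_size"] - b["hh_size"]) / 10        # household size
    score += abs(a["land_ha"] - b["land_ha"]) / 5         # resource endowment
    return score

def match_controls(project, pool):
    """Pair each project household with its nearest non-project household."""
    return {p["id"]: min(pool, key=lambda c: distance(p, c))["id"]
            for p in project}

project = [
    {"id": "P1", "activity": "herding", "zone": "north", "hh_size": 6, "land_ha": 2.0},
    {"id": "P2", "activity": "farming", "zone": "south", "hh_size": 4, "land_ha": 1.0},
]
pool = [
    {"id": "C1", "activity": "herding", "zone": "north", "hh_size": 5, "land_ha": 2.5},
    {"id": "C2", "activity": "farming", "zone": "south", "hh_size": 7, "land_ha": 0.5},
    {"id": "C3", "activity": "farming", "zone": "north", "hh_size": 4, "land_ha": 1.0},
]
print(match_controls(project, pool))
```

Without a formal test for sampling bias, such a rule only makes the matching judgment explicit and repeatable; it does not remove endogeneity.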
5. Main value added of mixed methods and opportunities for improvement
The choice of methods was made taking into account the objectives of the evaluations and the resource constraints (time, budget, and expertise) in conducting the exercise. The combination of multiple methods allowed us to cross-check the evidence and understand, for example, when survey questions were likely to be misinterpreted or to generate over- or under-reporting. In turn, quantitative evidence allowed us to shed light on the prevalence of certain phenomena highlighted during the focus group discussions. Finally, the interactions with key informants and with project managers and staff helped us better understand the reasons for under- or over-achievements and come up with more practical recommendations.

The findings, together with the main conclusions and recommendations in the report, were used to design new projects or a new country strategy. There was also interest from the concerned project implementation agencies in adopting the format of the survey to conduct future impact assessments on their own. Due to time constraints, only inferential analysis was conducted on the quantitative survey data; a full-fledged econometric analysis would have been desirable. By the same token, further analysis of the focus group discussion outcomes would in principle be desirable.

6. A few highlights on the management
The overall process design, as well as the choice of methods and the design of the data collection instruments, was made by the lead evaluator in the Office of Evaluation of IFAD, in consultation with international sectoral specialists and the local survey coordinator. The pre-mission data collection exercise was coordinated by a local rural sociologist, with the help of a statistician for the design of the sampling framework and the data analysis.

The time required for conducting the survey and focus groups was as follows:

· Develop draft questionnaire and sampling frame, identify enumerators: 3 weeks.
· Conduct a quick trip on the ground, contact project authorities, and pre-test questionnaires: 3 days.
· Train the enumerators' and coders' team: 3 days.
· Survey administration: depending on the length of the questionnaire, on average an enumerator will be able to fill in no more than three to five questionnaires per day. In addition, time needs to be allowed for travel and rest. With a team of 6 enumerators, up to 200 questionnaires can be filled in within 9-10 working days, in the absence of major transportation problems.
· Data coding: this may vary depending on the length and complexity of the questionnaire. It is safe to assume 5-7 days.
· Conducting focus group discussions: 7 days, based on the hypothesis that around 10 FGDs would be conducted by 2 teams.
· Data analysis: depending on the analysis requirements, it will take one to two weeks just to generate the tables and the summaries of the focus group discussions.
· Drafting the survey report: 2 weeks.

Note: As some of the above tasks can be conducted simultaneously, the total time for conducting a preliminary data collection exercise may be lower than the sum of its parts.

Case 3: Combining qualitative and quantitative descriptive methods--Impact evaluation: agricultural development projects in Guinea4

1. Summary
The evaluation focused on impact in terms of poverty alleviation; the distribution of benefits was of particular interest, not just the mean effect. All data gathering was conducted after the intervention had been completed; mixed methods were used, including attention to describing the different implementation contexts. Assessing causal contribution is the major focus of the case study. A counterfactual was created by constructing a comparison group, taking into account the endogenous and exogenous factors affecting impacts. Modeling was used to develop an estimate of the impact. With regard to the management of the impact evaluation, it should be noted that the study was undertaken as part of doctoral dissertation work; stakeholder engagement and subsequent use of the evaluation were limited.

This impact evaluation concerned two types of agricultural projects based in the Kpèlè region, in Guinea. The first one5 was the Guinean Oil Palms and Rubber Company (SOGUIPAH). It was founded in 1987 by the Guinean government to take charge of developing palm oil and rubber production at the national level.
Case 3: Combining qualitative and quantitative descriptive methods--Impact evaluation: Agricultural development projects in Guinea4

1. Summary

The evaluation focused on impact in terms of poverty alleviation; the distribution of benefits was of particular interest, not just the mean effect. All data gathering was conducted after the intervention had been completed; mixed methods were used, including attention to describing the different implementation contexts. Assessing causal contribution is the major focus of the case study. A counterfactual was established by constructing a comparison group, taking into account the endogenous and exogenous factors affecting impacts. Modeling was used to develop an estimate of the impact. With regard to the management of the impact evaluation, it should be noted that the study was undertaken as part of doctoral dissertation work; stakeholder engagement and subsequent use of the evaluation were limited.

This impact evaluation concerned two types of agricultural projects based in the Kpèlè region, in Guinea. The first one5 was the Guinean Oil Palms and Rubber Company (SOGUIPAH), founded in 1987 by the Guinean government to take charge of developing palm oil and rubber production at the national level. With the support of several donors, SOGUIPAH quickly set up a program of industrial plantations6 by negotiating the ownership of 22,830 ha with villagers. In addition, several successive programs were implemented between 1989 and 1998 with SOGUIPAH to establish contractual plantations7 on farmers' own land and at the request of the farmers (1,552 ha of palm trees and 1,396 ha of rubber trees) and to improve 1,093 ha of lowland areas for irrigated rice production.

The impact evaluation took place in a context of policy debates among different rural stakeholders at a regional level: two seminars had been held in 2002 and 2003 between the farmers' syndicates, the state administration, the private sector, and development partners (donors, NGOs) to discuss a regional strategy for agricultural development. These two seminars revealed that there was little evidence of what should be done to alleviate rural poverty, despite a long history of development projects. The impact of these projects on farmers' income therefore seemed particularly relevant to assess, notably to compare the projects' efficiency.

This question was investigated through doctoral thesis work that was entirely managed by AGROPARISTECH.8 It was financed by AFD, one of the main donors in the rural sector in Guinea. The thesis proposed a new method, systemic impact evaluation, aiming at quantifying impact using a qualitative approach. It enabled an understanding of the process through which impact materializes and a rigorous quantification of the impact of agricultural development projects on farmers' income, using a counterfactual. The analysis is notably based on an understanding of the agrarian dynamics and the farmers' strategies, and permits not only the quantification of ex post impact but also the construction of a model of ex ante evolution for the following years.

2. Gathering evidence of impact

The data collection was carried out entirely ex post. Several types of surveys and interviews were used to collect evidence of impact.

First, a contextual analysis, carried out throughout the research work with key informants, was necessary to describe the project implementation scheme, the contemporaneous events, and the existing agrarian dynamics. It was also used to assess qualitatively whether those dynamics were attributable to the project. A series of surveys and historical interviews (focused on the pre-project situation) were conducted to establish the most reliable baseline possible. An area considered a "witness" to the agrarian dynamic that would have existed in the project's absence was identified.

Second, a preliminary structured survey (of about 240 households) was implemented, using recall to collect data on the farmers' situation in the pre-intervention period and during the project. It served as the basis of a judgment sample for in-depth interviews (see below), which aimed at describing the farming systems and rigorously quantifying the farmers' income.

3. Assessing causal attribution

By conducting an early contextual analysis, the evaluator was able to identify a typology of farming systems that existed before the project. To set up a sound counterfactual, a judgment sample was drawn from the 240 households surveyed, by choosing 100 production units that had belonged to the same initial types of farming system and that had evolved with the project (in the project area) or without it (in the witness area).

In-depth understanding of the endogenous and exogenous factors influencing the evolution and possible trajectories of farming systems enabled the evaluator to rigorously identify the individuals whose evolution with or without the project was comparable. This judgment-sampling phase was followed by in-depth interviews with the hundred farmers. The evaluator's direct involvement in data collection was then essential, hence the importance of a small sample. It would not have been possible to gather reliable data on yields, modifications to production structures over time, and producers' strategies from a large survey sample in a rural context.
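The judgment-sampling step described above, keeping only production units whose pre-project farming-system type is represented both with and without the project, can be sketched as follows. The type names and records are invented for illustration; only the pairing logic comes from the case.

```python
# Hedged sketch of judgment sampling for a counterfactual: group households
# by pre-project farming-system type and keep only types present in BOTH the
# project area and the "witness" area, so each unit has a comparable match.
# Types and households below are hypothetical, not the study's data.

def judgment_sample(households):
    """households: list of dicts with 'id', 'type', and 'area'
    ('project' or 'witness'). Returns, per type, the ids in each area,
    restricted to types observed in both areas."""
    by_type = {}
    for h in households:
        group = by_type.setdefault(h["type"], {"project": [], "witness": []})
        group[h["area"]].append(h["id"])
    return {t: g for t, g in by_type.items() if g["project"] and g["witness"]}

households = [
    {"id": 1, "type": "lowland_rice",  "area": "project"},
    {"id": 2, "type": "lowland_rice",  "area": "witness"},
    {"id": 3, "type": "palm_contract", "area": "project"},   # no witness match
    {"id": 4, "type": "upland_mixed",  "area": "witness"},   # no project match
]
pairs = judgment_sample(households)
```

Only the type present in both areas survives, which mirrors the case's requirement that compared units share the same initial farming system.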
Several types of surveys and interviews over time, and producers' strategies from a large were used to collect evidence of impact. survey sample in a rural context. 89 I m pa c t E va l u at I o n s a n d d E v E l o p m E n t ­ n o n I E G u I d a n c E o n I m pa c t E va l u at I o n Then, based on the understanding of the way original farming system and the various trajecto- the project proceeded and of the trajectories ries with and without the project, which could of these farmers, with or without the project, it not be ignored. Whereas former civil servants or was possible to build a quantitative model, based traditional landlords beneficiated large contrac- on Gittinger's method of economic analysis tual plantations, other villagers were deprived of of development projects (Gittinger, 1982). As their land for the needs of the project or received the initial diversity of production units was surfaces of plantations too limited to improve well identified before sampling, this model was their economic situation. constructed for each type of farming system existing before the project. Understanding the Therefore, it seems important that the impact possible evolution of each farming system with evaluation of a complex development project and without the project allowed for the estima- include an analysis of the diversity of cases created tion of the differential created by the project on by the intervention, directly or indirectly. farmers' income, i.e., its impact. The primary interest of this new method was to 4. Ensuring rigor and quality give the opportunity to build a credible impact Although the objective differences between each assessment entirely ex post. 
Second, it gave an production unit studied appear to leave room estimate of the impact on different types of farming for the researcher's subjectivity when construct- systems, making explicit the existing inequalities ing the typology and sample, the rationale in the distribution of the projects' benefits. Third, behind the farming system concept made it it permitted a subtle understanding of the reasons possible to transcend this possible arbitrariness. why the desired impacts materialized or not. What underlies this methodological jump from a small number of interviews to a model is the 6. Influence demonstration that a finite number of types of The results from this impact assessment were farming systems exists in reality. available after four years of field work and data treatment. They were presented to the Guinean Moreover, the use of a comparison group, the authorities and to the local representatives of the triangulation of most data collected by in-depth main donors in the rural sector. In the field, the interviews through direct observation and results were delivered to the local communities contextual analysis, and the constant implication interviewed and to the farmers' syndicates. The of the principal researcher were key factors to Minister of Agriculture declared that he would try ensure rigor and quality. to foster more impact evaluations on agricultural development projects. Unfortunately, there is 5. Key findings little hope that the conclusions of this research The large survey of 240 households identified will change the national policy about these types 11 trajectories related to the implementation of of projects, in the absence of an institutional- the project. Once each trajectory and impact was ized forum for discussing it among the different characterized and quantified through in-depth stakeholders. 
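The mean impact "on the basis of the weight of each type in the population" is a weighted average of per-type impact estimates, which also makes the distributional point visible: a positive mean can coexist with losses for some types. The per-type impacts and population shares below are invented for illustration; only the weighting idea comes from the case.

```python
# Hedged sketch: village-level mean impact as a population-weighted average
# of per-type impact estimates. Figures are hypothetical.

def mean_impact(per_type):
    """per_type: list of (impact_per_household, population_share) tuples."""
    total_share = sum(w for _, w in per_type)
    return sum(i * w for i, w in per_type) / total_share

village = [
    (10.0, 0.5),   # type A: half the households, modest gain
    (40.0, 0.3),   # type B: large contractual plantations
    (-5.0, 0.2),   # type C: some types may even lose from the project
]
avg = mean_impact(village)  # 10*0.5 + 40*0.3 - 5*0.2 = 16.0 per household/year
```

The per-type breakdown, not the single mean, is what reveals the inequalities in the distribution of benefits that the case emphasizes.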
6. Influence

The results from this impact assessment were available after four years of field work and data treatment. They were presented to the Guinean authorities and to the local representatives of the main donors in the rural sector. In the field, the results were delivered to the local communities interviewed and to the farmers' syndicates. The Minister of Agriculture declared that he would try to foster more impact evaluations of agricultural development projects. Unfortunately, there is little hope that the conclusions of this research will change national policy on these types of projects, in the absence of an institutionalized forum for discussing them among the different stakeholders.

Case 4: A theory-based approach with qualitative methods--Global Environment Facility impact evaluation 20079, 10

Evaluation of three GEF-protected area projects in East Africa

1. Description of evaluation

The objectives of this evaluation included--

· To test evaluation methodologies that can assess the impact of GEF interventions. The key activity of the GEF is "providing new and additional grant and concessional funding to meet the agreed incremental costs of measures to achieve agreed global environmental benefits."11 The emphasis of this evaluation was therefore on verifying the achievement of agreed global environmental benefits.
· Specifically, to test a theory of change approach to evaluation in GEF's biodiversity focal area, and to assess its potential for broader application within GEF evaluations.
· To assess the sustainability and replication of the benefits of GEF support and extract lessons. The evaluation examined whether and how project benefits have continued, and will continue, after project closure.

Primary users

The primary users of the evaluation are GEF entities. They include the GEF Council, which requested the evaluation; the GEF Secretariat, which will approve future protected area projects; implementing agencies (such as the World Bank, UN agencies, and regional development banks); and national stakeholders who will implement future protected area projects.

2. Evaluation design

Factors driving selection of evaluation design

The Approach Paper to the impact evaluation12 considered the overall GEF portfolio to develop an entry point that could provide a good opportunity to develop and refine effective and implementable impact evaluation methodologies. Themes and projects that are relatively straightforward to evaluate were emphasized. The Evaluation Office adopted the DAC definition of impact, which determined that closed projects would be evaluated to assess the sustainability of GEF interventions.

Biodiversity and protected areas

The biodiversity focal area has the largest number of currently active and completed projects within the GEF portfolio. In addition, biodiversity has developed more environmental indicators and global data sets than other focal areas, both within the GEF and in the broader international arena. The Evaluation Office chose protected areas as the central theme for this phase of the impact evaluation because protected areas are one of the primary approaches supported by the GEF biodiversity focal area and its implementing agencies, and the GEF is the largest supporter of protected areas globally; previous evaluations have noted that an evaluation of GEF support for protected areas has not been carried out and recommended that such a study be undertaken; protected areas are based on a set of explicit change theories, not just in the GEF but in the broader conservation community; in many protected area projects, substantial field research has been undertaken, and some have usable baseline data on key factors to be changed by the intervention; a protected areas strategy can be addressed at both a thematic and a regional cluster level (as in East Africa, the region chosen for the study); and the biodiversity focal area team has made considerable progress in identifying appropriate indicators for protected areas through its "managing for results" system.
The choice of projects

Lessons from a set of related interventions (or projects) are more compelling than those from an isolated study of an individual project. To test the potential for aggregation of project results, enable comparisons across projects, and ease logistics, it was decided to adopt a sub-regional focus and select a set of projects that are geographically close to each other. East Africa is the sub-region with the largest number of complete and active projects in the GEF portfolio with a protected area component, utilizing large GEF and cofinancing expenditure.

The following three projects were selected for evaluation:

· Bwindi Impenetrable National Park and Mgahinga Gorilla National Park Conservation Project, Uganda (World Bank)
· Lewa Wildlife Conservancy, Kenya (World Bank)
· Reducing Biodiversity Loss at Cross-Border Sites in East Africa, Regional: Kenya, Tanzania, Uganda (UNDP).

These projects were implemented on behalf of the GEF by the World Bank and UNDP. They have a variety of biodiversity targets, some of which are relatively easy to monitor (gorillas, zebras, rhinos). Also, these projects were evaluated positively by terminal and other evaluations, and the continuance of long-term results was predicted. The Bwindi Impenetrable National Park and Mgahinga Gorilla National Park Conservation Project is a $6.7 million full-size project and the first GEF-sponsored trust fund in Africa. The Lewa Wildlife Conservancy is a medium-sized project, within a private wildlife conservation company. The Reducing Biodiversity Loss at Cross-Border Sites in East Africa project is a $12 million project, implemented at field level by government agencies, that aims to foster an enabling environment for the sustainable use of biodiversity.

The advantages of a theory of change approach

An intervention generally consists of several complementary activities that together produce intermediate outcomes, which are then expected to lead to impact (see figure A11.2). The process of these interventions, in a given context, is determined by the contribution of a variety of actions at multiple levels, some of which are outside the purview of the intervention (e.g., actions of external actors at the local, national, or global levels, or changes in political situations, regional conflicts, and natural disasters). Consequently, an intervention may have different levels of achievement in its component parts, giving mixed results towards its objectives.

Figure A11.2: Generic representation of a project's theory of change. The results continuum runs from inputs (the human, organizational, financial, and material resources contributed to a project), through outputs (the immediate product of project actions) and outcomes (an intermediate result brought about by producing outputs), to impacts (the ultimate result of a combination of outcomes contributed by the project). At each stage, the intervention process is carried forward by activities (tasks carried out by the project) resting on assumptions (the theory behind the activity).

The use of a hybrid evaluation model

During field testing it was decided that, given the intensive data requirements of a theory of change approach and the intention to examine project impacts, the evaluation would mainly focus on the later elements of each project's theory of change, when outcomes are expected to lead to impact. Based on this approach, the evaluation developed a methodology composed of three components (see figure A11.3):
· Assessing implementation success and failure: To understand the contributions of the project at earlier stages of the results continuum, leading to project outputs and outcomes, a logframe analysis is used. Though the normally complex and iterative process of project implementation is not captured by this method, the logframe provides a means of tracing the realization of declared objectives. GEF interventions aim to "assist in the protection of the global environment and promote thereby environmentally sound and sustainable economic development."13
· Assessing the level of contribution (i.e., impact): To provide a direct measure of project impacts, a targets-threats analysis (threats-based analysis) is used to determine whether global environmental benefits have actually been produced and safeguarded.14 The robustness of the global environment benefits identified for each project (targets) is evaluated by collecting information on attributes relating to the targets' biological composition, environmental requirements, and ecological interactions. This analysis of targets is complemented by an assessment of the level of "threat" (e.g., predation, stakeholder attitude and behavior) faced by the global environment benefits. For targets and significant threats, trends over time (at project start, at project close, and currently) and across project and non-project areas are sought, so that a comparison is available to assess levels of change.
· Explanations for observed impact: To unpack the processes by which the project addresses and contributes to impact, an outcomes-impacts theory of change analysis is used. This theory of change approach constructs and validates the project logic connecting outcomes and ultimate project impact. It involves a comprehensive assessment of the activities undertaken after project closure, along with their explicit and implicit assumptions. This component enables an assessment of the sustainability and/or catalytic nature of project interventions and provides a composite qualitative ranking for the achievements of the projects. Elements of the varied aspects of sustainability include behavior change and the effectiveness of capacity-building activities, financial mechanisms, legislative change, and institutional development.

Figure A11.3: Components of the impact evaluation framework. The framework combines a project logframe analysis (tracing outputs to outcomes), an outcomes-impacts theory of change analysis (tracing an outcome, through states/conditions and their assumptions, to impact), and a threats-based analysis (assessing reduced threats to, and enhanced status of, the global environment benefits).

The model incorporates three different elements that may be involved in the transformation of project outcomes into impacts. These are as follows, and each was scored for the level of achievement of the project in converting outcomes into impacts:

· Intermediary states. These are conditions that are expected to be produced on the way to delivering the intended impacts.
· Impact drivers. These are significant factors or conditions that are expected to contribute to the ultimate realization of project impacts. The existence of an impact driver in relation to the project being assessed suggests a good likelihood that the intended project impact will have been achieved; its absence suggests that the intended impact may not have occurred or may be diminished.
· External assumptions. These are potential events or changes in the project environment that would negatively or positively affect the ability of a project outcome to lead to the intended impact, but that are largely beyond the power of the project to influence or address.
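The report says each of these elements was scored for the level of achievement in converting outcomes into impacts. A hedged sketch of one possible scoring scheme follows: the numeric scale and the weakest-link aggregation rule are assumptions made here for illustration, not the evaluation's actual procedure, and the element descriptions are paraphrased examples.

```python
# Hypothetical scoring of the three outcome-to-impact elements (intermediary
# states, impact drivers, external assumptions). Scale and aggregation rule
# are assumptions; the idea of scoring each element comes from the report.

LEVELS = {"not achieved": 1, "partially achieved": 2,
          "well achieved": 3, "fully achieved": 4}

def outcome_to_impact_score(elements):
    """elements: dict mapping each element to a qualitative level.
    Uses a weakest-link rule: a missing impact driver or a failed external
    assumption limits the whole outcome-to-impact conversion."""
    return min(LEVELS[level] for level in elements.values())

outcome3 = {
    "intermediary state: community support for conservation": "well achieved",
    "impact driver: capacity building scaled up to meet demand": "partially achieved",
    "external assumption: livelihood gains do not raise population": "well achieved",
}
score = outcome_to_impact_score(outcome3)  # limited by its weakest element: 2
```

A weakest-link rule is one design choice among several; an average would instead let strong drivers compensate for failed assumptions, which may or may not reflect how an evaluator would judge the causal chain.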
3. Data collection and constraints

Logical framework and theory of change model

The approach built on existing project logical frameworks, implying that a significant part of the framework could be relatively easily tested through an examination of existing project documentation, terminal evaluation reports, and, where available, monitoring data. Where necessary, targeted consultations and additional studies were carried out.

Assessing conservation status and threats to global environment benefits

A data collection framework for assessing the status of the targets and associated threats was developed, identifying indicators for each, along with the potential sources of information. For the Bwindi and Lewa projects, the task of collecting and assessing this information was undertaken by scientists from the Institute of Tropical Forest Conservation, headquartered in Bwindi Impenetrable National Park, and the Lewa Research Department, respectively. For the Cross-Borders project, this exercise was done by the Conservation Development Center, based on the existing project documentation, a field visit to the project site, and consultations with key informants. The objective of this exercise was to provide quantitative measures for each indicator from before the project (baseline), at project close, and at the present day. Where quantitative data were not available, detailed qualitative data were collected.

Improving rigor

Internal validity: The evaluation used a participatory approach with substantial involvement of former project staff in drawing out theories of change and subsequently providing data for verification. These data were verified by local independent consultants, via a process of triangulating information from project documentation and external sources. Given that all three projects are now closed, the participation of former project staff enabled a candid and detailed exchange of information (during workshops in Uganda and Kenya). The participants in return found the process to be empowering, as it clarified and supported the rationale for their actions (by drawing out the logical connections between activities, goals, and assumptions) and enabled them to plan for future interventions.

External validity: Given the small number of projects, their variety, and their age (approved in varied past GEF replenishment phases), the evaluation did not expect to produce findings that could be directly aggregated. Nevertheless, given the very detailed analysis of the interventions a few years after project closure, it did provide a wealth of insights into the functioning of protected area projects, particularly elements of their sustainability after project closure. This allowed limited generalization on key factors associated with the achievement of impact, on the basis of different levels of results related to a set of common linkages in the theoretical models. On this basis, the Evaluation Office recommended that the GEF Secretariat ensure specific monitoring of progress toward institutional continuity of protected areas throughout the life of a project.

Weaknesses

Impact evaluations are generally acknowledged to be highly challenging. The objective of this particular study, examining GEF's impact at a "global" level in biodiversity, makes the study particularly complex. A few concerns:

· The nature of changes in biodiversity is still under debate. Such changes are often non-linear, with uncertain time scales even in the short run, interactions within and across species, and exogenous factors (e.g., climate change). Evidence regarding the achievement of global environment benefits and their sustainability must therefore be presented with numerous caveats.
· Numerous explanations and assumptions may be identified for each activity that is carried out.
· The approach may not always uncover unexpected outcomes or synergies, unless they are anticipated in the theories or assumptions of the evaluation team. However, fieldwork should be able to discern such outcomes, as was the case in the Bwindi case study, which produced evidence of a number of unexpected negative impacts on local indigenous people.
· The association between activities and outcomes in the theory of change approach depends on measuring the level of activities carried out, and then consciously (logically) linking them with impact through a chain of intermediate linkages and outcomes. Information on these intermediate outcomes may be difficult to obtain, unless former project implementers participate fully in the evaluation.

4. Concluding thoughts on the evaluation approach

For biodiversity, GEF's first strategic priority is catalyzing sustainability of protected area systems, which aims for an expected impact whereby "biodiversity [is] conserved and sustainably used in protected area systems."

The advantage of the hybrid evaluation model used was that, by focusing toward the end of the results chain, it examined the combination of mechanisms in place that led to a project's impacts and ensured the sustainability of results. It is during this later stage, after project closure, that the ecological, financial, political, socio-economic, and institutional sustainability of the project are tested, along with its catalytic effects. During project conceptualization, given the discounting of time, links from outcome to impact are often weak. Once a project closes, the role of actors, activities, and resources is often unclear; this evaluation highlighted these links and assumptions.

Adopting a theory of change approach also had the potential to provide a mechanism that helped GEF understand what has worked and what has not worked, and it allows for predictions regarding the probability of success for similar projects. The Evaluation Office team concluded that the most effective combination for its next round of impact evaluation (phase-out of ozone-depleting substances in Eastern Europe) should seek to combine theory of change approaches with opportunistic use of existing data sets, which might provide some level of quantifiable counterfactual information.

Application: Impact of Lewa Wildlife Conservancy (Kenya)15

Context

The Lewa GEF medium-sized project provided support for the further development of Lewa Wildlife Conservancy ("Lewa"), a not-for-profit private wildlife conservation company that operates on 62,000 acres of land in Meru District, Kenya. The GEF awarded Lewa a grant of $0.75 million for the period 2000 to the end of 2003, with co-financing amounting to $3.193 million.

Since the GEF grant, Lewa has been instrumental in initiating the formation of the Northern Rangelands Trust (NRT) in 2004. NRT is an umbrella local organization with a goal of collectively developing strong community-led institutions as a foundation for investment in community development and wildlife conservation in the northern rangelands of Kenya. The NRT membership comprises community conservation conservancies and trusts, local county councils, the Kenya Wildlife Service, the private sector, and NGOs established and working within the broader ecosystem. The establishment and functioning of the NRT has therefore been a very important aspect in understanding and assessing the ultimate achievement of impacts from the original GEF investment.

The Lewa case study implemented the three elements of the impact evaluation framework, which are summarized below.
Assess implementation success and failure

Given that no project logical framework or outcomes were defined as such in the original GEF project brief, the GEF Evaluation Office team for the Study of Local Benefits in Lewa, with the participation of senior Lewa staff, identified project outcomes and associated outputs that reflected the various intervention strategies employed by the project, and identified missed opportunities in achieving the project goals. The assessment provided an understanding of the project logic used (figure A11.2) and a review of the fidelity with which the project activities were implemented (figure A11.3).

Figure A11.4: Project outputs and outcomes. Outputs 1.1 (management capacity of Lewa strengthened), 1.2 (Lewa revenue streams and funding base enhanced), and 1.3 (strategic plans and partnerships developed to improve effectiveness) feed Outcome 1: long-term institutional and financial capacity of Lewa to provide global and local benefits from wildlife conservation strengthened. Outputs 2.1 (security of endangered species increased) and 2.2 (research and monitoring of wildlife and habitats improved) feed Outcome 2: protection and management of endangered wildlife species in the wider ecosystem strengthened. Outputs 3.1 (capacity of local communities to undertake conservation-compatible income-generating activities strengthened), 3.2 (community natural resource management institutions and structures enhanced), and 3.3 (community skills and roles developed to optimise wildlife benefits) feed Outcome 3: community-based conservation and natural resource management initiatives strengthened.

Table A11.1: Project outcomes

Outcome | Assessment
Outcome 1: Long-term institutional and financial capacity of Lewa to provide global and local benefits from wildlife conservation strengthened | Fully achieved (5)
Outcome 2: Protection and management of endangered wildlife species in the wider ecosystem strengthened | Well achieved (4)
Outcome 3: Community-based conservation and natural resource management initiatives strengthened | Well achieved (4)
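Table A11.1 pairs each qualitative assessment with a numeric score ("Fully achieved (5)", "Well achieved (4)"), which makes outcome ratings aggregable across a project. A hedged sketch follows: only the two labels shown in the table are documented; the lower rungs of the scale and the averaging rule are assumptions for illustration.

```python
# Mapping qualitative achievement labels to the numeric scores used in
# Table A11.1. Labels below "well achieved" are assumed, as is the use of a
# simple average as a composite project rating.

RATING = {
    "fully achieved": 5,
    "well achieved": 4,
    "partially achieved": 3,   # assumed lower rungs of the scale
    "poorly achieved": 2,
    "not achieved": 1,
}

def project_rating(outcome_assessments):
    """Average numeric score across outcomes, one possible way to produce a
    composite ranking of a project's achievements."""
    scores = [RATING[a.lower()] for a in outcome_assessments]
    return sum(scores) / len(scores)

lewa = ["Fully achieved", "Well achieved", "Well achieved"]  # outcomes 1-3
composite = project_rating(lewa)  # (5 + 4 + 4) / 3
```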
Assess the level of contribution (i.e., impact)

A targets-threats analysis of those ecological features identified as global environment benefits (black rhinos and Grevy's zebra) was undertaken with input from scientists from the Lewa and NRT research departments. Tables A11.2 and A11.3 provide an overview of the variables considered, to increase the robustness of the understanding of the ecological changes that have taken place since before the project started.

Table A11.2: Change in key ecological attributes over time

Black rhino
Key ecological attribute | Indicator | Unit | Baseline | Project end | Now
Population size | Total population size of black rhino on Lewa | Number | 29 | 40 | 54
Productivity | Annual growth rates at Lewa | Percent | 12 | 13 | 15
Suitable secure habitat | Size of Lewa rhino sanctuary | Acres | 55,000 | 55,000 | 62,000
Genetic diversity | Degree of genetic variation | -- | No data available

Grevy's zebra
Key ecological attribute | Indicator | Unit | Baseline | Project end | Now
Population size | Total population size of Grevy's zebra on Lewa | Number | 497 | 435 | 430
Productivity | Annual foaling rates on Lewa | Percent | 11 | 11 | 12
Population distribution and connectivity | Number of known sub-populations | -- | No data available
Suitable habitat (grassland and water) | Community conservancies set aside and secure for conservation under NRT | Number | 3 | 4 | 15
Genetic diversity | Degree of genetic variation | -- | No data available

Table A11.3: Current threats to the global environment benefits

Threat | Severity score (1-4)a | Scope score (1-4)b | Overall ranking
Black rhino
Poaching and snaring | 3 | 3 | 3
Insufficient secure areas | 2 | 3 | 2
Habitat loss (due to elephant density) | 1 | 1 | 1
Grevy's zebra
Poaching | 2 | 2 | 2
Disease | 4 | 2 | 3
Predation | 3 | 1 | 2
Habitat loss/degradation | 3 | 3 | 3
Insufficient secure areas | 2 | 2 | 2
Hybridization with Burchell's zebra | 1 | 1 | 1

a. Severity (level of damage): destroy or eliminate the GEBs / seriously degrade the GEBs / moderately degrade the GEBs / slightly impair the GEBs.
b. Scope (geographic extent): very widespread or pervasive / widespread / localized / very localized.
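Table A11.3 combines a severity score and a scope score (each 1-4) into an overall ranking but does not state the combination rule. One reading consistent with every row shown is the mean of the two scores rounded down; the sketch below encodes that assumed rule and checks it against the printed rows.

```python
# Hedged sketch: an assumed rule for Table A11.3's overall threat ranking
# (floor of the mean of severity and scope). The rule is an inference from
# the printed rows, not a documented formula.

def overall_ranking(severity, scope):
    """Combine severity and scope (1-4 each) into an overall rank 1-4.
    E.g., disease for Grevy's zebra: severity 4, scope 2 -> (4+2)//2 = 3."""
    return (severity + scope) // 2

table_rows = [  # (severity, scope, overall) as printed in Table A11.3
    (3, 3, 3), (2, 3, 2), (1, 1, 1),   # black rhino threats
    (2, 2, 2), (4, 2, 3), (3, 1, 2),   # Grevy's zebra threats
    (3, 3, 3), (2, 2, 2), (1, 1, 1),
]
assert all(overall_ranking(s, c) == o for s, c, o in table_rows)
```

Other rules (e.g., taking the maximum) would fail on at least one row, so the floor-of-mean reading is the simplest fit, though the evaluators may also have applied judgment case by case.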
Provide explanations for observed impact

Theory of change models were developed for each project outcome to establish contribution; the framework reflected in figure A11.5 was used. This analysis enabled an examination of the links between observed project interventions and observed impact. As per GEF principles, factors that were examined as potentially influencing results included the appropriateness of the intervention, the sustainability of the intervention, and its catalytic effect--these are referred to as impact drivers. The next step involved the identification of intermediary states, examining whether the successful achievement of a specific project outcome would directly lead to the intended impacts and, if not, identifying additional conditions that would need to be met to deliver the impact. Taking cognizance of factors that are beyond project control, the final step identified those factors that are necessary for the realization and sustainability of the intermediary state(s) and ultimate impacts, but outside the project's influence.

An example is provided by a consideration of Outcome 3, community-based conservation and natural resource management initiatives strengthened, which was expected to achieve enhanced conservation of black rhinos and Grevy's zebras. The theory of change model linking Outcome 3 to the intended impacts is illustrated in figure A11.6. The overall logframe assessment of the project's implementation for community-based conservation and natural resource management was well achieved. All intermediate factors/impact drivers/external assumptions that were identified received a score of partially to well achieved, indicating that, together with all its activities, this component was well conceived and implemented.

In sum for Lewa

The analysis provided indication that the black rhino and Grevy's zebra populations on the Lewa Conservancy are very well managed and protected. Perhaps the most notable achievement has been the visionary, catalytic, and supporting role that Lewa has provided for the conservation of these endangered species in the broader ecosystem, beyond Lewa. Lewa has played a significant role in the protection and management of about 40% of Kenya's black rhino population and is providing leadership in finding innovative ways to increase the coverage of secure sanctuaries for black rhinos. Regarding the conservation of Grevy's zebra, Lewa's role in the establishment of community conservancies, which have added almost 1 million acres of land set aside for conservation, has been unprecedented in East Africa and is enabling the recovery of Grevy's zebra populations within their natural range. However, the costs and resources required to manage and protect this increasing conservation estate are substantial, and unless continued and increasing financing streams are maintained, it is possible that the substantial gains in the conservation of this ecosystem and its global environmental benefits could eventually be reversed.

In conclusion

The assessment of project conceptualization and implementation of project activities in Lewa has been favorable, but this is coupled with indications that threats from poaching, disease, and habitat loss in and around Lewa continue to be severe. Moreover, evaluation of the other case studies (the Bwindi Impenetrable National Park and Mgahinga Gorilla National Park Conservation Project, Uganda, and Reducing Biodiversity Loss at Cross-Border Sites in East Africa, Regional: Kenya, Tanzania, Uganda) confirmed that the absence of a specific plan for institutionalized continuation would, in particular, reduce long-term results in the generation of global environment benefits--this was the major conclusion of the GEF's pilot impact evaluation.
All intermediate factors/ long-term results in the generation of global impact drivers/external assumptions that were environment benefits the absence of a specific identified received a score of partially to well plan for institutionalized continuation would, achieved, indicating that together with all its in particular, reduce results over time--this was activities, this component was well-conceived the major conclusion of the GEF's pilot impact and implemented. evaluation. 98 a p p E n d I x 1 1 : E va l u at I o n s b a s E d o n q u a l I tat I v E a n d q u a n t I tat I v E d E s c r I p t I v E m E t h o d s Figure A11.5: Framework to establish contribution External assumption Intermediate Impact Impact (enhanced Project outcome state (reduced threats) conservation status) Impact driver Figure A11.6: Model linking outcome to impact LWC capacity building in local community institutions is scaled up to meet demand [S2/ C2] Increased Reduced threats community from poaching support and and the lack of outcome 3 land set aside secure areas Community- for conservation Impact based Enhanced conservation conservation status and natural Community of GEBs Reduced pressure resource natural resource on local natural management needs better resource base/ initiatives met in long wildlife habitat strengthened term Other community Conservation- land uses Livelihood based land uses complement and improvements don't make a significant do not undermine lead to increased contribution to conservation-based population livelihoods [A2] land uses [A1] 99 APPENDIX 12: FURTHER INFORMATION ON REVIEW AND SYNTHESIS APPROACHES IN IMPACT EVALUATION realist synthesis and the realist synthesis. Both perspectives This approach is different from the systematic have something to offer. Opening the black box research reviews. 
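The contribution logic just described (project outcome, intermediary state, intended impact, each conditioned on impact drivers and external assumptions) can be sketched as a small data structure and check. This is an illustrative sketch only; the entries and ratings below are hypothetical examples, not data from the Lewa evaluation or GEF tooling.

```python
# Illustrative sketch of the outcome -> intermediary state -> impact chain
# used in a GEF-style theory-of-change assessment. All entries are invented.

CHAIN = {
    "outcome": "Community-based conservation initiatives strengthened",
    "intermediary_states": ["Reduced threats from poaching and lack of secure areas"],
    "impact": "Enhanced conservation status of GEBs",
    # Conditions within project influence (impact drivers) and outside it
    # (external assumptions); each is rated as holding (True) or not (False).
    "impact_drivers": {"Capacity building scaled up to meet demand": True},
    "external_assumptions": {
        "Other land uses do not undermine conservation": True,
        "Financing streams are maintained": False,
    },
}

def contribution_check(chain):
    """Report which conditions along the theory of change are unmet."""
    unmet = [n for n, held in chain["impact_drivers"].items() if not held]
    unmet += [n for n, held in chain["external_assumptions"].items() if not held]
    if not unmet:
        return "All drivers and assumptions hold; the outcome can plausibly deliver the impact."
    return "Impact delivery at risk; unmet conditions: " + "; ".join(unmet)

print(contribution_check(CHAIN))
```

The point of the sketch is only that the assessment is a walk along the chain, flagging any driver or assumption that fails, rather than a single before/after comparison.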
APPENDIX 12: FURTHER INFORMATION ON REVIEW AND SYNTHESIS APPROACHES IN IMPACT EVALUATION

Realist synthesis
This approach is different from systematic research reviews. It conceptualizes interventions, programs, and policies as theories, and collects earlier research findings by interpreting the specific policy instrument that is evaluated as an example or specimen of more generic instruments and tools (of governments). Next it describes the intervention in terms of its context, mechanisms (what makes the program work), and outcomes (the deliverables).

Instead of synthesizing results from evaluations and other studies per intervention or per program, realist evaluators first open the black box of an intervention and synthesize knowledge about social and behavioral mechanisms. Examples are Pawson's study of incentives (Pawson, 2002), his work on naming and shaming and on Megan's law (Pawson, 2006), and Kruisbergen's work (2005) on fear-arousal communication campaigns trying to reduce the smuggling of cocaine.

Contrary to producers of systematic research reviews, realist evaluators do not use a hierarchy of research designs. For realists, an impact study using an RCT design is not necessarily better than a comparative case study design or a process evaluation. The problem (of an evaluation) that needs to be addressed is crucial in selecting the design or method, not vice versa.

Combining different meta approaches
In a study of the question of which public policy programs designed to reduce and/or prevent violence in the public arena work best, Van der Knaap et al. (2008) have shown the relevance of combining the systematic research review and the realist synthesis. Both perspectives have something to offer. Opening the black box of an intervention under review will be helpful for experimental evaluators if they want to understand why interventions have (no) effects and/or side effects. Realists are confronted with the problem of the selection of studies to be taken into account, ranging from opinion surveys, oral history, and newspaper content analysis to results based on more sophisticated methodologies. As the methodological quality of evaluations can be, and sometimes is, a problem, particularly with regard to the measurement of the impact of a program, realists can benefit from a stricter methodology and protocol, like the one used by the Campbell Collaboration, when doing a synthesis. For example, knowledge that is to be generalized should be credible and valid.

To combine Campbell standards and the realist evaluation approach, Van der Knaap et al. (2008) first conducted a systematic review according to the Campbell standards. The research questions were formulated, and next the inclusion and exclusion criteria were determined. This included a number of questions. What types of interventions are included? At which participants should interventions be aimed? What kinds of outcome data should be reported? At this stage, criteria were also formulated for the inclusion and exclusion of study designs and for methodological quality. As a third step, the search for potential studies was explicitly described. Once potentially relevant studies had been identified, they were screened for eligibility according to the inclusion and exclusion criteria.

After selecting the relevant studies, the quality of these studies had to be determined. Van der Knaap et al. (2008) used the Maryland Scientific Methods Scale (MSMS) (Sherman et al., 1998; Welsh and Farrington, 2006). This is a five-point scale that enables researchers to draw conclusions on the methodological quality of outcome evaluations in terms of internal validity. The MSMS is applied to rate the strength of scientific evidence on a scale of 1-5, with 1 being the weakest and 5 the strongest scientific evidence for inferring cause and effect.

Based on the MSMS scores, the authors then classified each of the 36 interventions that were inventoried (by analyzing some 450 English, German, French, and Dutch articles and papers) into the following categories: effective, potentially effective, potentially ineffective, and ineffective. However, not all studies could be grouped into one of the four categories: in 16 cases the quality of the study design was not good enough to decide on the effectiveness of a measure. Of the remaining 20 interventions, nine were labeled effective and six potentially effective; four interventions were labeled potentially ineffective and one was labeled ineffective in preventing violence in the public and semi-public domain.

The realist approach was applied after finishing the Campbell-style systematic review. This means that only then did the underlying mechanisms and contexts described in the studies included in the review come onto the agenda of the evaluator. This was done for all four types of interventions, whether they were measured as being effective, potentially effective, potentially ineffective, or ineffective. As a first step, information was collected concerning the social and behavioral mechanisms that were assumed to be at work when the program or intervention was implemented. Pawson (2006: 24) refers to this process as looking "beneath the surface [of a program] in order to inspect how they work." One way of doing this is to search the articles under review for statements that address the why question: why will this intervention work, or why has it not worked? Two researchers independently articulated these underlying mechanisms. The focus was on the behavioral and social "cogs and wheels" of the intervention (Elster, 1989, 2007).

In a second step the studies under review were searched for information on contexts (schools, streets, banks, etc., but also types of offenders and victims and types of crime) and on outcomes. This completed the C[ontext], M[echanism], and O[utcome] approach that characterizes realist evaluations. However, not every original evaluation study described which mechanisms were assumed to be at work when the program was implemented. The same goes for contexts and outcomes. This meant that in most cases missing links in or between different statements in the evaluation study had to be identified through argumentational analysis.

Based on the evaluations analyzed, Van der Knaap et al. (2008) traced three mechanisms at work in programs that had demonstrated their impact, or were very likely to do so:
· The first is of a cognitive nature, focusing on learning, teaching, and training.
· The second (overarching) mechanism concerns the way the (social) environment rewards or punishes behavior (through bonding, community development, and the targeting of police activities).
· The third mechanism is risk reduction, for instance, promoting protective factors.

Concluding remarks on review and synthesis approaches
Given the "fleets" (Weiss, 1998) and streams of studies (Rist and Stame, 2006) in the world of evaluation, it is not recommended to start an impact evaluation of a specific program, intervention, or tool of government without making use of the accumulated evidence to be found in systematic reviews and other types of meta-studies. One reason concerns the efficiency of the investments: what has been sorted out does not (always) need to be sorted out again. If it has been found over and over again that awareness-raising leads to behavior changes only under specific conditions, then it is wise to have that knowledge ready before designing a similar program or evaluation. A second reason is that by using results from synthesis studies the test of an intervention theory can be done with more rigor. The larger the discrepancy between what is known about the mechanisms a policy or program believes to be at work and what the policy in fact tries to set into motion, the smaller the chances of an effective intervention.

Different approaches in the world of (impact) evaluation are a wise thing to have, but (continuous) paradigm wars ("randomistas versus relativistas," realists versus experimentalists) run the risk of developing into intellectual ostracism. Wars also run the risk of vesting the image of evaluations as a "helter-skelter mishmash [and] a stew of hit-or-miss procedures" (Perloff, 2003), which is not the best perspective to live with. Combining perspectives and paradigms should therefore be stimulated.
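The two-stage procedure described above, first screening studies on methodological quality (MSMS level) and then classifying the interventions they evaluate, can be sketched as follows. The cut-off level and the example records are assumptions for illustration only; they are not Van der Knaap et al.'s actual inclusion rules or data.

```python
# Sketch of a Campbell-style screening and classification step.
# Assumption for illustration: studies below a minimum MSMS level (here 3 on
# the 1-5 scale) are set aside as "quality insufficient"; the rest are
# classified by their reported effect. Example records are invented.

MIN_MSMS_LEVEL = 3  # hypothetical inclusion threshold

studies = [
    {"intervention": "mentoring program",    "msms": 4, "effect": "positive"},
    {"intervention": "awareness campaign",   "msms": 2, "effect": "positive"},
    {"intervention": "cctv in parking lots", "msms": 5, "effect": "none"},
]

def classify(study):
    if study["msms"] < MIN_MSMS_LEVEL:
        return "quality insufficient to decide"
    if study["effect"] == "positive":
        # A single high-quality positive study: "potentially effective";
        # replicated positive findings would be needed for "effective".
        return "potentially effective"
    if study["effect"] == "none":
        return "potentially ineffective"
    return "ineffective"

labels = {s["intervention"]: classify(s) for s in studies}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Note that quality screening and effect classification are deliberately separate steps, mirroring the review's finding that 16 of 36 interventions could not be judged at all for design-quality reasons.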
APPENDIX 13: BASIC EDUCATION IN GHANA

Introduction
In 1986 the government of Ghana embarked on an ambitious program of educational reform: shortening the length of pre-university education from 17 to 12 years, reducing subsidies at the secondary and tertiary levels, increasing the length of the school day, and taking steps to eliminate unqualified teachers from schools. These reforms were supported by four World Bank credits: the Education Sector Adjustment Credits I and II, the Primary School Development Project, and the Basic Education Sector Improvement Project. An impact study by IEG looked at what had happened to basic education (grades 1-9, in primary and junior secondary school) over this period.

Data and methodology
In 1988-89 the Ghana Statistical Service (GSS) undertook the second round of the Ghana Living Standards Survey (GLSS 2). Half of the 170 areas surveyed around the country were chosen at random to have an additional education module, which administered math and English tests to all those aged 9-55 years with at least three years of schooling, and which surveyed schools in the enumeration areas. Working with both GSS and the Ministry of Education, Youth and Sport (MOEYS), IEG resurveyed these same 85 communities and their schools in 2003, applying the same survey instruments. In the interests of comparability the same questions were kept, although new ones were added pertaining to school management, as were two whole new questionnaires: a teacher questionnaire for five teachers at each school and a local language test in addition to the math and English tests. The study thus had a possibly unique data set: not only could children's test scores be linked to both household and school characteristics, but this could be done in a panel of communities over a 15-year period. The test scores are directly comparable because exactly the same tests were used in 2003 as had been applied 15 years earlier.

There was no clearly defined project for this study; rather, it examined support to the sub-sector through four large operations. The four projects had supported a range of activities, from rehabilitating school buildings to assisting in the formation of community-based school management committees. To identify the impact of these various activities, a regression-based approach was adopted that analyzed the determinants of school attainment (years of schooling) and achievement (learning outcomes, i.e., test scores). For some of these determinants, notably books and buildings, the contribution of the World Bank to better learning outcomes could then be quantified.

The methodology adopted a theory-based approach to identify the channels through which a diverse range of interventions were having their impact. As discussed below, the qualitative context of the political economy of education reform in Ghana at the time proved to be a vital piece of the story.

Findings
The first major finding from the study was a factual one. Contrary to official statistics, enrollments in basic education had been rising steadily over the period. This discrepancy was readily explained: in the official statistics, both the numerator and the denominator were wrong. The numerator was wrong because it relied on administrative data from the school census, which had incomplete coverage of the public sector and did not cover the rapidly growing private sector. A constant mark-up was made to allow for private sector enrollments, but the IEG analysis showed that these had gone up fourfold (from 5% to 20% of total enrollments) over the 15 years. The denominator was based on the 1984 census, with an assumed rate of growth that turned out to be too high once the 2000 census became available, thus underestimating enrolment growth.

More strikingly still, learning outcomes have improved markedly: 15 years ago nearly two-thirds (63%) of those who had completed grades 3-6 were, using the English test as a guide, illiterate. By 2003 this figure had fallen to 19%. The finding of improved learning outcomes flies in the face of qualitative data from many, though not all, key informant interviews. But such key informants display a middle-class bias that persists against reforms that were essentially populist in nature.

Also striking are the improvements in school quality revealed by the school-level data:
· In 1988, fewer than half of schools could use all their classrooms when it was raining, but in 2003 over two-thirds could do so.
· Fifteen years ago over two-thirds of primary schools reported occasional shortages of chalk. Only one in 20 does so today, with 86% saying there is always enough.
· The percentage of primary schools having at least one English textbook per pupil has risen from 21% in 1988 to 72% today, and for math books in junior secondary school (JSS) these figures are 13% and 71%, respectively.

School quality has improved across the country, in poor and non-poor communities alike. But there is a growing disparity within the public school sector. Increased reliance on community and district financing has meant that schools in relatively prosperous areas continue to enjoy better facilities than do those in less-well-off communities.

The IEG study argues that Ghana has been a case of quality-led quantity expansion in basic education. The education system was in crisis in the seventies; school quality was declining and absolute enrolments falling. But by 2000, more than 90% of Ghanaians 15 and older had attended school, compared to 75% 20 years earlier. In addition, drop-out rates have fallen, so completion rates have risen: by 2003, 92% of those entering grade 1 completed JSS (grade 9). Gender disparities have been virtually eliminated in basic enrolments. Primary enrolments have risen both in disadvantaged areas and amongst the lowest income groups. The differentials between the poorest areas and other parts of the country, and between enrollments of the poor and non-poor, have narrowed but still exist.

Statistical analysis of the survey results showed the importance of building school infrastructure based on enrollments. Building a school, and so reducing children's travel time, has a major impact on enrollments. Although the majority of children live within 20 minutes of school, some 20% do not, and school building has increased enrollments among these groups. In one area surveyed, average travel time to the nearest school was cut from nearly an hour to less than 15 minutes, with enrollments increasing from 10% to 80%. In two other areas, average travel time was reduced by nearly 30 minutes and enrollments increased by more than 20%. Rehabilitating classrooms so that they can be used when it is raining also positively affects enrollments; complete rehabilitation can increase enrollments by as much as one-third. Across the country as a whole, the changes in infrastructure quantity and quality accounted for a 4% increase in enrolments between 1988 and 2003, about one-third of the increase over that period. The World Bank has been the main source of finance for these improvements. Before the first World Bank program, communities were responsible for building their own schools, and these structures collapsed after a few years. The Bank has financed 8,000 school pavilions around the country, providing more permanent structures that can better withstand the weather.

Learning outcomes depend significantly on school quality, including textbook supply. Bank-financed textbook provision accounts for around one-quarter of the observed improvement in test scores. But other major school-level determinants of achievement, such as teaching methods and the supervision of teachers by the head teacher and circuit supervisor, have not been affected by the Bank's interventions. The Bank has not been heavily involved in teacher training, and plans to extend in-service training have not been realized. Support to "hardware" has been shown to have made a substantial positive contribution to both attainment and achievement. But when satisfactory levels of inputs are reached (which is still far from the case for the many relatively deprived schools), future improvements could come from focusing on what happens in the classroom. However, the Bank's one main effort to change incentives (providing head teacher housing under the Primary School Development Project in return for the head teacher signing a contract on school management practices) was not a great success. Others, notably DFID and USAID, have made better progress in this direction, but with limited coverage.

The policy context, meaning government commitment, was an important factor in making the Bank's contributions work. The government was committed to improving the quality of life in rural areas, through the provision of roads, electricity, and schools, as a way of building a political base. Hence there was a desire to make the reforms work. Party loyalists were placed in key positions to keep the reform on track, the army distributed textbooks in support of the new curriculum in the early 1990s to make sure they reached schools on time, and efforts were made to post teachers to new schools and make sure that they received their pay on time. Teachers also benefited from the large civil service salary increase in the run-up to the 1992 election.

Better education leads to better welfare outcomes. Existing studies on Ghana show how education reduces fertility and mortality. Analysis of IEG's survey data shows that education improves nutritional outcomes, with this effect being particularly strong for children of women living in poorer households. Regression analysis shows there is no economic return to primary and JSS education as such (i.e., average earnings are not higher for children who have attended primary school and JSS than for children who have not), but there is a return to cognitive achievement. Children who attain higher test scores as a result of attending school can expect to enjoy higher incomes; children who learn little in school will not reap any economic benefit.

Some policy implications
The major policy finding from the study relates to the appropriate balance between hardware and software in support for education. The latter is now stressed, but the study highlights the importance of hardware: books and buildings. It was also of course important that teachers were in their classrooms; the government's own commitment (borne out of a desire to build political support in rural areas) helped ensure this happened.

In the many countries and regions in which educational facilities are inadequate, hardware provision is a necessary step in increasing enrollments and improving learning outcomes. The USAID project in Ghana encourages teachers to arrange children's desks in groups rather than rows, but many of the poorer schools don't have desks. In the words of one teacher, "I'd like to hang posters on my walls but I don't have posters. In fact, as you can see, I don't have any walls."

These same concerns underlie a second policy implication. Central government finances teachers' salaries and little else in basic education. Other resources come from donors, districts, or the communities themselves. There is thus a real danger of poorer communities falling behind, as they lack both the resources and the connections needed to access external resources. The reality of this finding was reinforced both by qualitative data (field trips to the best and worst performing schools in a single district in the same day) and by the quantitative data, which show the poorer performance of children in these disadvantaged schools. Children of poorer communities are left behind and account for the remaining illiterate primary graduates, which should be a pressing policy concern.
The study highlighted other areas of concern: first, low teacher morale, manifested in increased absenteeism; and second, the growing importance of the private sector, which now accounts for 20% of primary enrolments compared to 5% 15 years earlier. This is a sector that has had limited government involvement and none from the Bank.

APPENDIX 14: HIERARCHY OF QUASI-EXPERIMENTAL DESIGNS

Notation: T1 = start of project (pre-test); T2 = mid-term evaluation; T3 = end of project (post-test); P = project participants; C = control group; P1, P2, C1, C2 = first and second observations; X = project intervention (a process rather than a discrete event). Each entry also gives the stage of the project cycle at which the evaluation design can be used.

Relatively robust quasi-experimental designs

1. Pre-test/post-test non-equivalent control group design with statistical
   matching of the two groups. Participants are either self-selected or are
   selected by the project implementing agency. Statistical techniques (such as
   propensity score matching), drawing on high-quality secondary data, are used
   to match the two groups on a number of relevant variables.
   Observations: P1 X P2 / C1 C2. Can be used from: start of project.

2. Pre-test/post-test non-equivalent control group design with judgmental
   matching of the two groups. Participants are either self-selected or are
   selected by the project implementing agency. Control areas are usually
   selected judgmentally, and subjects are randomly selected from within these
   areas.
   Observations: P1 X P2 / C1 C2. Can be used from: start of project.

Less robust quasi-experimental designs

3. Pre-test/post-test comparison where the baseline study is not conducted
   until the project has been under way for some time (most commonly around the
   mid-term review).
   Observations: X P1 P2 / C1 C2. Can be used: during project implementation
   (often at mid-term).

4. Pipeline control group design. When a project is implemented in phases,
   subjects in Phase 2 (i.e., who will not receive benefits until some later
   point in time) can be used as the control group for Phase 1 subjects.
   Observations: P1 X P2 / C1 C2. Can be used from: start of project.

5. Pre-test/post-test comparison of project group combined with post-test
   comparison of project and control group.
   Observations: P1 X P2 / C2. Can be used from: start of project.

6. Post-test comparison of project and control groups.
   Observations: X P1 / C1. Can be used at: end of project.

Non-experimental designs (the least robust)

7. Pre-test/post-test comparison of project group.
   Observations: P1 X P2. Can be used from: start of project.

8. Post-test analysis of project group.
   Observations: X P1. Can be used at: end of project.

Source: Bamberger et al. (2006).
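Using the table's notation, the impact estimates behind these designs differ in which comparisons they retain. Designs 1-4 support a double difference (project change net of control change), while designs 6 and 7 each drop one comparison and therefore rest on stronger assumptions. A schematic sketch with invented numbers:

```python
# Schematic impact estimates in the table's notation.
# P1, P2 = project group before/after; C1, C2 = control group before/after.
# The values are invented for illustration (e.g., enrollment rates in percent).

P1, P2 = 40.0, 65.0   # project communities
C1, C2 = 42.0, 50.0   # matched control communities

# Designs 1-4: double difference -- project change net of control change.
double_difference = (P2 - P1) - (C2 - C1)

# Design 6: post-test-only comparison -- valid only if the groups started equal.
post_test_only = P2 - C2

# Design 7: before/after on the project group -- attributes all change,
# including secular trends, to the project.
before_after = P2 - P1

print(double_difference, post_test_only, before_after)  # 17.0 15.0 25.0
```

With these numbers the before/after estimate (25.0) overstates impact because the control group also improved by 8 points; the double difference (17.0) nets that trend out, which is why the designs that permit it sit higher in the hierarchy.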
Technology Sulley Gariba: Consultant, Executive Director, Thomas Schwandt: University Distinguished Institute of Policy Alternatives Teacher/Scholar and Professor of Education, Jennifer Greene: Professor, Educational University of Illinois at Urbana-Champaign Psychology, University of Illinois at Urbana- Nicoletta Stame: Professor, University of Rome Champaign "La Sapienza" Ernie House: Emeritus Professor, School of Bob Williams: Consultant, Author, member of the Education, University of Colorado Editorial Boards of the American Journal of Mel Mark: Professor of Psychology, Penn State Evaluation and New Directions in Evaluation University John Mayne: Consultant, Author, Adviser on public sector performance 111 ENDNOTES Executive Summary tions. Both replicatory and systemic effects can result 1. Available at www.worldbank.org/ieg/nonie. from processes of change at institutional or benefi- 2. OECD-DAC (2002): "Glossary of Key Terms ciary levels. With respect to the first, evaluations that in Evaluation and Results Based Management," cover replicatory effects are quite scarce. This is in dire OECD-DAC, Paris. contrast with the manifest presence of replication (and the related concept of scaling up) as explicit objectives Introduction in many policy interventions. For further discussion 1. The history of impact evaluations in some on replication, see, for example, GEF (2007). These countries goes back many decades (Oakley, 2000). dimensions can be addressed in a theory-based impact 2. The Maryland Scientific Methods Scale (MSMS) evaluation framework (see chapter 3). is, for example, used in parts of criminology and in 8. This is the interpretation that has received several countries (see Leeuw, 2005). RCTs are believed the most attention in methodological guidelines of to be the top design (level 5). international organizations working on impact evalua- tion, such as the World Bank or the Asian Develop- chapter 1 ment Bank. 1. 
An interesting overview of public-private partnerships and their evaluation is given by Utce Ltd. and Japan Pfi Association (2003).
2. "We probably also under-invest in evaluative research on types of interventions that tend to have diffused, wide-spread benefits" (Ravallion, 2008: 6). See also Jones et al. (2008), who have identified geographical and sectoral biases in impact evaluation.
3. Complexity in terms of the nature of change processes induced by an intervention.
4. For example, Elbers et al. (2008) directly assess the impact of a set of policy variables (i.e., the equivalent of a multi-stranded program) on outcome variables by means of a regression-based evaluation approach (see chapter 4).
5. Though not necessarily easy to measure.
6. Please note that the two levels should not be regarded as a dichotomy. In fact, a particular intervention might induce a "cascade" of processes of change at different institutional levels (e.g., national, provincial government, cooperatives) before finally affecting the welfare of individuals.
7. A third and fourth level of impact, more difficult to pinpoint, refer to the replicatory impact and the wider systemic effects of interventions, respectively.
9. In this context one can distinguish between the effect of aid modalities on "the way business is being done" (additionality of funding, direction of funding, public sector performance, coherence of policy changes, quality of intervention design, etc.; see, e.g., Lawson et al., 2005), i.e., what we call institutional-level impact, and subsequently the impact of interventions funded (in part) by general budget support, sector budget support, or debt relief funds at the beneficiary level. In the latter case, we are talking about impact evaluation as it is understood in most of the literature.

Chapter 2
1. "Values inquiry refers to a variety of methods that can be applied to the systematic assessment of the value positions surrounding the existence, activities, and outcomes of a social policy and program" (Mark et al., 1999: 183).
2. For a discussion of different dimensions of sustainability in development interventions, see Mog (2004).

Chapter 4
1. Economists employ several useful techniques for estimating the marginal impact of an extra dollar invested in a particular policy intervention. See, for example, appendix 1, second example. We consider these methods to be complementary to impact evaluation and beyond the scope of this guidance.
2. The larger the sample size, the more likely it is that groups are equivalent, on average.
3. We would like to thank Antonie de Kemp of IOB for insightful suggestions. See also SG1 (2008).
4. Alternative, more nuanced classifications distinguish between experimental, quasi-experimental, and passive observational (correlational) research designs. Features that distinguish one type of design from another are (i) control over exposure to the treatment; (ii) control over the nature of the treatment; and (iii) control over the timing and nature of measurement. In experiments one has control over (i), (ii), and (iii); in quasi-experiments one usually controls (ii) and (iii) only; and in passive observational studies one does not have full control over any of these features (see, e.g., Posavac and Carey, 2002; personal communication, J. Scott Bayley).
5. We discuss only a selection of available methods. See Shadish et al. (2002) or Mohr (1995) for additional (quasi-experimental and regression-based) methods.
6. It is difficult to identify general guidelines for avoiding these problems. Evaluators have to be aware of the possibility of these effects affecting the validity of the design. For other problems, as well as solutions, see Shadish et al. (2002).
7. For further discussion of the approaches discussed below, see appendices 3–6.
8. For an explanation, see Wooldridge (2002), chapter 18.
9. This subsection draws largely on Bamberger (2006).
10. The approach is similar to a fixed-effects regression model that uses deviations from individual means to deal with (unobserved) selection effects.
11. Although in reality one will not find as clear a linear correlation as in figure 4.2.
12. With instrumental variables one may try to get rid of an expected bias, but the technique cannot guarantee that endogeneity problems will be solved completely (the instrumental variable may also be endogenous). Moreover, with weak instruments the precision of the estimate may be low.
13. Alternatively, impact evaluation in the case of complex interventions or complex processes of change can rely on several statistical modeling approaches to capture the complexity of a phenomenon. For example, an extension of the reduced-form regression-based approaches to impact evaluation referred to earlier are structural equation models, which can be used to model some of the more complex causal relationships that underlie interventions, using, for example, an intervention theory as a basis.
14. In general, regression-based techniques (and quasi-experimental techniques that rely on existing data) are primarily constrained by the availability of existing data (see chapter 8). In contrast, experimental and quasi-experimental techniques that rely on design-based group comparisons face more pressing constraints in terms of the need for ex ante involvement of evaluators in a policy intervention (see appendix 14). Consequently, there is probably more scope for extending the use of the former group of techniques.
15. This might need to be analyzed using other methods (see §4.4 and chapter 5).
16. See appendices 7 and 8 for brief discussions of additional approaches applicable to impact evaluation problems in multi-level settings.
17. However, as explained below, in some cases these methods can be articulated with quantitative methods of impact evaluation (see also chapter 5).
18. See also SG2 (2008).
19. One of the methods that relies on the reconstruction of stakeholder perspectives is called the strategic assessment approach, also known as assumptional analysis. It can be found in a series of studies (Jackson, 1989) but has as its core knowledge basis Mason and Mitroff's (1981) book Challenging Strategic Planning Assumptions (see also Leeuw, 2003; see also chapter 3).
20. Participatory Learning and Action, as a generic approach with an associated set of methods, has its origins in rapid rural appraisal and participatory rural appraisal. Participatory poverty assessment processes have built strongly on this tradition.
21. Although particular case studies of localized intervention activities within the sector program might be conducted in a participatory manner.
22. When addressing the attribution problem, the role of participatory approaches is also restricted, because perceptions and experiences of participants collected through participatory methods run the risk of making an evaluation "partnerial." In such a situation, the distinction between evaluator and evaluated is blurred. As policies and programs often, implicitly or explicitly, deal with interests, incentives, and disincentives, this complicates the process and the reliability of the evaluation outcomes. (See also §8.3 for a wider discussion of data quality issues.)
23. Throughout this document we have used the rather generic terms "quantitative" and "qualitative" methods of research/evaluation. Although we are aware of the limitations of these concepts, we have opted to use them because of their widespread and accepted use. In practice, often but not always, a distinction can be made between methods of data collection and methods of data analysis. In addition, one should distinguish between the type of method and the scale of measurement (type of data). For example, quantitative data (that is, data measured on interval or ratio scales) can be collected using what are often called qualitative methods. Rather than spending a lot of effort on coherently separating these issues, we decided to keep things simple for the sake of argument (and space).
24. Please note that different methods rely on different types of sampling or selection of units of analysis. For example, quantitative descriptive analysis (preferably) relies on data based on random (simple, stratified, clustered) samples or on census data. In contrast, many qualitative methods rely on nonrandom sampling techniques such as purposive or snowball sampling, or do not rely on sampling at all, as they might focus on a relatively small number of observations.
25. Appendix 9 presents a list of qualitative methodological frameworks that combine several qualitative (and occasionally quantitative) methods for the purposes of evaluating the effects of an intervention (see also chapter 5 on combining methods).

Chapter 5
1. This dimension is only addressed by quantitative impact evaluation techniques.
2. The most commonly used term is mixed methods (see, for example, Tashakkori and Teddlie, 2003). In the case of development research and evaluation, see Bamberger (2000) and Kanbur (2003).
3. This is true for the broad interpretation of the concept of triangulation as used by Mikkelsen (2005). Other authors use the concept in a more restrictive way (e.g., Bamberger [2000] uses triangulation in the narrower sense of validating findings by looking at different data sources).
4. This is an issue that is closely related to the idea of external validity. If one knows how an intervention affects groups of people in different ways, then one can more easily generalize findings to other, similar settings.

Chapter 6
1. This step may rely on statistical methods (meta-analysis) for analyzing and summarizing the results of included studies, if quantitative evidence at the level of single-intervention studies is available and if interventions are considered similar enough.

Chapter 8
1. In some cases, talking about the "end" of an intervention is not applicable or is less applicable, for example, in institutional reforms, new legislation, fiscal policy, etc.
2. For example, with secondary data sets, what do we know about the quality of the data collection (e.g., sampling errors, training and supervision of interviewers) or data processing (e.g., dealing with missing values, weighting issues)? We cannot simply take for granted that a data set is free from error and bias. Lack of information on the process of generating the database inevitably constrains any subsequent data analysis efforts.
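The meta-analytic pooling step mentioned in the chapter 6 endnote above can be sketched as a minimal fixed-effect (inverse-variance) calculation. This is an illustrative sketch only: the function name and the two effect sizes are hypothetical, not taken from the guidance.

```python
import math

def inverse_variance_pool(effects, variances):
    """Fixed-effect meta-analysis: pool study-level effect sizes,
    weighting each study by the inverse of its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error

# Two hypothetical impact studies; the more precise study
# (variance 0.01) dominates the pooled estimate.
pooled, se = inverse_variance_pool([0.2, 0.4], [0.01, 0.04])
```

The pull of the pooled estimate toward the more precise study is exactly the "summarizing" the endnote refers to; random-effects variants add a between-study variance component when interventions are less homogeneous.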
Chapter 9
1. An example from Europe stresses this point. In some situations, educational evaluators of the Danish Evaluation Institute discussed their reports with up to 20-plus stakeholders before a report was cleared and published (Leeuw, 2003).
2. For a broader discussion of ethics in evaluation, see Simons (2006).

Appendix 2
1. The text is a literal citation of Scriven (2008: 21–22).

Appendix 4
1. In traditional usage, a variable is endogenous if it is determined within the context of a model. In econometrics, the term is used to describe any situation in which an explanatory variable is correlated with the disturbance term. Endogeneity arises as a result of omitted variables, measurement error, or situations in which one of the explanatory variables is determined along with the dependent variable (Wooldridge, 2002: 50).
2. The approach is similar to a fixed-effects regression model, using deviations from individual means.
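The "deviations from individual means" device invoked in the Appendix 4 endnotes (the within transformation behind fixed-effects regression) can be illustrated with a small sketch. The function and the panel data below are hypothetical, constructed so that an unobserved individual effect is correlated with the regressor, i.e., the omitted-variable form of endogeneity described in note 1.

```python
def within_slope(panel):
    """Estimate a slope by OLS on deviations from individual means
    (the within transformation used in fixed-effects regression)."""
    x_dev, y_dev = [], []
    for observations in panel.values():
        xs = [x for x, _ in observations]
        ys = [y for _, y in observations]
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        x_dev += [x - x_bar for x in xs]
        y_dev += [y - y_bar for y in ys]
    return sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)

# Hypothetical noiseless panel: y = alpha_i + 2*x, where the unobserved
# effect alpha_i (0, 10, 20) rises with each individual's x level, so
# pooled OLS would overstate the slope; demeaning removes alpha_i.
panel = {
    i: [(x, alpha + 2 * x) for x in (base + 1, base + 2, base + 3)]
    for i, (alpha, base) in enumerate([(0, 0), (10, 10), (20, 20)])
}
slope = within_slope(panel)  # recovers the true slope, 2.0
```

Because the individual effect is constant within each unit, subtracting the unit means eliminates it entirely; this is why the transformation "deals with" time-invariant unobserved selection, while any time-varying confounder would survive it.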
Appendix 5
1. For further examples see White (2006).

Appendix 9
1. Source: SG2 (2008).

Appendix 11
1. This case study is drawn from the 2002 report published by the Ministry of Foreign Affairs, Denmark (SG2, 2008).
2. Source: SG2 (2008).
3. Typical problems with recall methods are incorrect recall and telescoping, i.e., projecting an event backward or forward in time: for example, the purchase of a durable good that took place seven years ago (before the project started) could be projected to four years ago, during project implementation (see, e.g., Bamberger et al., 2004).
4. Source: SG2 (2008).
5. The second project was inland valley development for irrigated rice cultivation and is not presented here.
6. Industrial plantations are the property of SOGUIPAH and are worked by salaried employees.
7. A contract between SOGUIPAH and the farmer binds the farmer to reimburse the cost of the plantation and to deliver his production to SOGUIPAH.
8. AGROPARISTECH is a member of the Paris Institute of Technology, a consortium of 10 of the foremost French graduate institutes in science and engineering. AGROPARISTECH is a leading institute in life sciences and engineering.
9. Source: SG2 (2008).
10. The GEF Evaluation Office section of the GEF website contains the 11 papers produced by the impact evaluation in 2007, under the heading "ongoing evaluations."
11. Instrument for the Establishment of the Restructured Global Environment Facility.
12. GEF Evaluation Office, "Approach Paper to Impact Evaluation," February 2006.
13. See the Preamble, "Instrument for the Establishment of the Restructured Global Environment Facility."
14. This is based on the Nature Conservancy's conservation action planning methodology.
15. Full case study at http://www.thegef.org/uploadedFiles/Evaluation_Office/Ongoing_Evaluations/Ongoing_Evals-Impact-8Case_Study_Lewa.pdf.

Appendix 13
1. White (2006).

References

ADB (2006) Impact Evaluation – Methodological and Operational Issues, Economics and Research Department, Asian Development Bank, Manila.
Agresti, A., and B. Finlay (1997) Statistical Methods for the Social Sciences, Prentice Hall, New Jersey.
Baker, J.L. (2000) Evaluating the Impact of Development Projects on Poverty, The World Bank, Washington, D.C.
Bamberger, M. (2000) "Opportunities and challenges for integrating quantitative and qualitative research", in: M. Bamberger (ed.) Integrating Quantitative and Qualitative Research in Development Projects, World Bank, Washington, D.C.
Bamberger, M. (2006) Conducting Quality Impact Evaluations under Budget, Time and Data Constraints, World Bank, Washington, D.C.
Bamberger, M., J. Rugh, M. Church and L. Fort (2004) "Shoestring Evaluation: Designing Impact Evaluations under Budget, Time and Data Constraints", American Journal of Evaluation 25(1), 5–37.
Bamberger, M., J. Rugh and L. Mabry (2006) Real-World Evaluation: Working Under Budget, Time, Data, and Political Constraints, Sage Publications, Thousand Oaks, CA.
Bamberger, M., and H. White (2007) "Using strong evaluation designs in developing countries: Experience and challenges", Journal of Multidisciplinary Evaluation 4(8), 58–73.
Bemelmans-Videc, M.L., and R.C. Rist (eds.) (1998) Carrots, Sticks and Sermons: Policy Instruments and their Evaluation, Transaction Publishers, New Brunswick.
Booth, D., and H. Lucas (2002) "Good Practice in the Development of PRSP Indicators", Working Paper 172, Overseas Development Institute, London.
Bourguignon, F., and M. Sundberg (2007) "Aid effectiveness: opening the black box", American Economic Review 97(2), 316–321.
Brinkerhoff, R. (2003) The Success Case Method, Berrett-Koehler, San Francisco.
Bryman, A. (2006) "Integrating quantitative and qualitative research: How is it done?", Qualitative Research 6(1), 97–113.
Bunge, M. (2004) "How Does It Work? The Search for Explanatory Mechanisms", Philosophy of the Social Sciences 34(2), 182–210.
Campbell, D.T. (1957) "Factors relevant to the validity of experiments in social settings", Psychological Bulletin 54, 297–312.
Campbell, D.T., and J.C. Stanley (1963) "Experimental and quasi-experimental designs for research on teaching", in: N.L. Gage (ed.) Handbook of Research on Teaching, Rand McNally, Chicago.
Carvalho, S., and H. White (2004) "Theory-based evaluation: The case of social funds", American Journal of Evaluation 25(2), 141–160.
Casley, D.J., and D.A. Lury (1987) Data Collection in Developing Countries, Oxford University Press, New York.
CGD (2006) When Will We Ever Learn? Improving Lives through Impact Evaluation, Report of the Evaluation Gap Working Group, Center for Global Development, Washington, D.C.
Chambers, R. (1995) "Paradigm Shifts and the Practice of Participatory Research and Development", in: S. Wright and N. Nelson (eds.) Power and Participatory Development: Theory and Practice, Intermediate Technology Publications, London.
Clarke, A. (2006) "Evidence-Based Evaluation in Different Professional Domains: Similarities, Differences and Challenges", in: I.F. Shaw, J.C. Greene and M.M. Mark (eds.) The SAGE Handbook of Evaluation, Sage Publications, London.
Coleman, J.S. (1990) Foundations of Social Theory, Belknap Press, Cambridge.
Connell, J.P., A.C. Kubisch, L.B. Schorr and C.H. Weiss (eds.) (1995) New Approaches to Evaluating Community Initiatives, The Aspen Institute, Washington, D.C.
Cook, T.D. (2000) "The false choice between theory-based evaluation and experimentation", in: P.J. Rogers, T.A. Hacsi, A. Petrosino and T.A. Huebner (eds.) Program Theory in Evaluation: Challenges and Opportunities, New Directions for Evaluation 87, Jossey-Bass, San Francisco.
Cook, T.D., and D.T. Campbell (1979) Quasi-Experimentation: Design and Analysis for Field Settings, Rand McNally, Chicago.
Cooke, B. (2001) "The Social Psychological Limits of Participation?", in: B. Cooke and U. Kothari (eds.) Participation: The New Tyranny?, Zed Books, London.
Cousins, J.B., and E. Whitmore (1998) "Framing Participatory Evaluation", in: E. Whitmore (ed.) Understanding and Practicing Participatory Evaluation, New Directions for Evaluation 80, Jossey-Bass, San Francisco.
Davies, R., and J. Dart (2005) The 'Most Significant Change' Technique, http://www.mande.co.uk/docs/MSCGuide.pdf (last consulted May 12, 2009).
Deaton, A. (2005) "Some remarks on randomization, econometrics and data", in: G.K. Pitman, O.N. Feinstein and G.K. Ingram (eds.) Evaluating Development Effectiveness, Transaction Publishers, New Brunswick, NJ.
Dehejia, R. (1999) "Evaluation in multi-site programs", Working paper, Columbia University and NBER, http://emlab.berkeley.edu/symposia/nsf99/papers/dehejia.pdf (last consulted January 12, 2009).
De Leeuw, E.D., J.J. Hox and D.A. Dillman (eds.) (2008) International Handbook of Survey Methodology, Lawrence Erlbaum Associates, London.
Duflo, E., and M. Kremer (2005) "Use of randomization in the evaluation of development effectiveness", in: G.K. Pitman, O.N. Feinstein and G.K. Ingram (eds.) Evaluating Development Effectiveness, Transaction Publishers, New Brunswick.
Elbers, C., J.W. Gunning and K. De Hoop (2008) "Assessing sector-wide programs with statistical impact evaluation: a methodological proposal", World Development 37(2), 513–520.
Elster, J. (1989) Nuts and Bolts for the Social Sciences, Cambridge University Press, Cambridge.
Elster, J. (2007) Explaining Social Behavior – More Nuts and Bolts for the Social Sciences, Cambridge University Press, Cambridge.
Farnsworth, W. (2007) The Legal Analyst – A Toolkit for Thinking about the Law, University of Chicago Press, Chicago.
GEF (2007) "Evaluation of the Catalytic Role of the GEF", Approach Paper, GEF Evaluation Office, Washington, D.C.
Gittinger, J.P. (1982) Economic Analysis of Agricultural Projects, Johns Hopkins University Press, Baltimore.
Greene, J.C. (2006) "Evaluation, democracy and social change", in: I.F. Shaw, J.C. Greene and M.M. Mark (eds.) The SAGE Handbook of Evaluation, Sage Publications, London.
Greenhalgh, T., G. Robert, F. Macfarlane, P. Bate and O. Kyriakidou (2004) "Diffusion of Innovations in Service Organizations: Systematic Review and Recommendations", The Milbank Quarterly 82(1), 581–629.
Hair, J.F., B. Black, B. Babin, R.E. Anderson and R.L. Tatham (2005) Multivariate Data Analysis, Prentice Hall, New Jersey.
Hansen, H.F., and O. Rieper (2009) "Institutionalization of second-order evidence producing organizations", in: O. Rieper, F.L. Leeuw and T. Ling (eds.) The Evidence Book: Concepts, Generation and Use of Evidence, Transaction Publishers, New Brunswick.
Hedström, P. (2005) Dissecting the Social: On the Principles of Analytical Sociology, Cambridge University Press, Cambridge.
Hedström, P., and R. Swedberg (1998) Social Mechanisms: An Analytical Approach to Social Theory, Cambridge University Press, Cambridge.
Henry, G.T. (2002) "Choosing Criteria to Judge Program Success – A Values Inquiry", Evaluation 8(2), 182–204.
House, E. (2008) "Blowback: Consequences of Evaluation for Evaluation", American Journal of Evaluation 29(4), 416–426.
IDRC (2001) Outcome Mapping: Building Learning and Reflection into Development Programs, International Development Research Centre (IDRC), Ottawa.
IEG (2005) "OED and Impact Evaluation: A Discussion Note", Operations Evaluation Department, World Bank, Washington, D.C.
IFAD (2002) Managing for Impact in Rural Development: A Practical Guide for M&E, IFAD, Rome.
Jackson, M.C. (1989) "Assumptional analysis", Systems Practice 14, 11–28.
Jerve, A.M., and E. Villanger (2008) The Challenge of Assessing Aid Impact: A Review of Norwegian Practice, Study commissioned by NORAD, Chr. Michelsen Institute, Bergen.
Jones, N., C. Walsh, H. Jones and C. Tincati (2008) Improving Impact Evaluation Coordination and Uptake – A Scoping Study Commissioned by the DFID Evaluation Department on Behalf of NONIE, Overseas Development Institute, London.
Kanbur, R. (ed.) (2003) Q-Squared: Combining Qualitative and Quantitative Methods in Poverty Appraisal, Permanent Black, Delhi.
Kellogg Foundation (1991) Information on Cluster Evaluation, Kellogg Foundation, Battle Creek.
Kraemer, H.C. (2000) "Pitfalls of Multisite Randomized Clinical Trials of Efficacy and Effectiveness", Schizophrenia Bulletin 26, 533–541.
Kruisbergen, E.W. (2005) "Voorlichting: doen of laten? Theorie van afschrikwekkende voorlichtingscampagnes toegepast op de casus van bolletjesslikkers", Beleidswetenschap 19(3), 3–1.
Kusek, J., and R.C. Rist (2004) Ten Steps to a Results-Based Monitoring and Evaluation System: A Handbook for Development Practitioners, World Bank, Washington, D.C.
Lawson, A., D. Booth, M. Msuya, S. Wangwe and T. Williamson (2005) Does General Budget Support Work? Evidence from Tanzania, Overseas Development Institute, London.
Leeuw, F.L. (2003) "Reconstructing Program Theories: Methods Available and Problems to be Solved", American Journal of Evaluation 24(1), 5–20.
Leeuw, F.L. (2005) "Trends and Developments in Program Evaluation in General and Criminal Justice Programs in Particular", European Journal on Criminal Policy and Research 11, 18–35.
Leeuw, F.L., and J.E. Furubo (2008) "Evaluation Systems – What Are They and Why Study Them?", Evaluation 14(2), 157–169.
Levinsohn, J., S. Berry and J. Friedman (1999) "Impacts of the Indonesian Economic Crisis: Price Changes and the Poor", Working Paper 7194, National Bureau of Economic Research, Cambridge.
Lipsey, M.W. (1993) "Theory as Method: Small Theories of Treatments", in: L.B. Sechrest and A.G. Scott (eds.) Understanding Causes and Generalizing about Them, New Directions for Program Evaluation 57, Jossey-Bass, San Francisco.
Lister, S., and R. Carter (2006) Evaluation of General Budget Support: Synthesis Report, Joint Evaluation of General Budget Support 1994–2004, Department for International Development, University of Birmingham.
Maluccio, J.A., and R. Flores (2005) "Impact evaluation of a conditional cash transfer program: The Nicaraguan Red de Protección Social", International Food Policy Research Institute, Washington, D.C.
Mansuri, G., and V. Rao (2004) "Community-Based and -Driven Development: A Critical Review", The World Bank Research Observer 19(1), 1–39.
Mark, M.M., G.T. Henry and G. Julnes (1999) "Toward an Integrative Framework for Evaluation Practice", American Journal of Evaluation 20, 177–198.
Mason, I., and I. Mitroff (1981) Challenging Strategic Planning Assumptions, Wiley, New York.
Mayne, J. (2001) "Addressing Attribution through Contribution Analysis: Using Performance Measures Sensibly", Canadian Journal of Program Evaluation 16(1), 1–24.
Mayntz, R. (2004) "Mechanisms in the Analysis of Social Macro-phenomena", Philosophy of the Social Sciences 34(2), 237–259.
McClintock, C. (1990) "Administrators as applied theorists", in: L. Bickman (ed.) Advances in Program Theory, New Directions for Evaluation, Jossey-Bass, San Francisco.
Mikkelsen, B. (2005) Methods for Development Work and Research, Sage Publications, Thousand Oaks, CA.
Mog, J.M. (2004) "Struggling with sustainability: A comparative framework for evaluating sustainable development programs", World Development 32(12), 2139–2160.
Mohr, L.B. (1995) Impact Analysis for Program Evaluation, Sage Publications, Newbury Park, CA.
Morgan, S.L., and C. Winship (2007) Counterfactuals and Causal Inference – Methods and Principles for Social Research, Cambridge University Press, Cambridge.
Mukherjee, C., H. White and M. Wuyts (1998) Econometrics and Data Analysis for Developing Countries, Routledge, London.
North, D.C. (1990) Institutions, Institutional Change and Economic Performance, Cambridge University Press, New York.
Oakley, A. (2000) Experiments in Knowing: Gender and Method in the Social Sciences, Polity Press, Cambridge.
OECD-DAC (2000) Effective Practices in Conducting a Multi-donor Evaluation, OECD-DAC, Paris.
OECD-DAC (2002) Glossary of Key Terms in Evaluation and Results Based Management, OECD-DAC, Paris.
Oliver, S., A. Harden, R. Rees, J. Shepherd, G. Brunton, J. Garcia and A. Oakley (2005) "An Emerging Framework for Including Different Types of Evidence in Systematic Reviews for Public Policy", Evaluation 11(4), 428–446.
Patton, M.Q. (2002) Qualitative Research and Evaluation Methods, Sage Publications, Thousand Oaks, CA.
Pawson, R. (2002) "Evidence-based Policy: The Promise of 'Realist Synthesis'", Evaluation 8(3), 340–358.
Pawson, R. (2005) "Simple Principles for the Evaluation of Complex Programmes", in: A. Killoran, M. Kelly, C. Swann, L. Taylor, L. Milward and S. Ellis (eds.) Evidence-Based Public Health, Oxford University Press, Oxford.
Pawson, R. (2006) Evidence-Based Policy: A Realist Perspective, Sage Publications, London.
Pawson, R., and N. Tilley (1997) Realistic Evaluation, Sage Publications, Thousand Oaks, CA.
Perloff, R. (2003) "A potpourri of cursory thoughts on evaluation", Industrial-Organizational Psychologist 40(3), 52–54.
Picciotto, R. (2004) "The value of evaluation standards: A comparative assessment", Paper presented at the European Evaluation Society's 6th Biennial Conference on Democracy and Evaluation, Berlin.
Picciotto, R., and E. Wiesner (eds.) (1997) Evaluation and Development: The Institutional Dimension, World Bank Series on Evaluation and Development, Transaction Publishers, New Brunswick.
Pollitt, C. (1999) "Stunted by stakeholders? Limits to collaborative evaluation", Public Policy and Administration 14(2), 77–90.
Posavac, E.J., and R.G. Carey (2002) Program Evaluation: Methods and Case Studies, Prentice Hall, Englewood Cliffs, NJ.
Pretty, J.N., I. Guijt, J. Thompson and I. Scoones (1995) A Trainers' Guide to Participatory Learning and Action, IIED Participatory Methodology Series, IIED, London.
Ravallion, M. (2008) "Evaluation in the practice of development", Policy Research Working Paper 4547, World Bank, Washington, D.C.
Rieper, O., F.L. Leeuw and T. Ling (eds.) (2009) The Evidence Book: Concepts, Generation and Use of Evidence, Transaction Publishers, New Brunswick, NJ.
Rist, R., and N. Stame (eds.) (2006) From Studies to Streams – Managing Evaluative Systems, Transaction Publishers, New Brunswick.
Robilliard, A.S., F. Bourguignon and S. Robinson (2001) "Crisis and Income Distribution: A Micro-Macro Model for Indonesia", International Food Policy Research Institute, Washington, D.C.
Roche, C. (1999) Impact Assessment for Development Agencies: Learning to Value Change, Oxfam, Oxford.
Rogers, P.J. (2008) "Using programme theory for complex and complicated programs", Evaluation 14(1), 29–48.
Rogers, P.J., T.A. Hacsi, A. Petrosino and T.A. Huebner (eds.) (2000) Program Theory in Evaluation: Challenges and Opportunities, New Directions for Evaluation 87, Jossey-Bass, San Francisco.
Rosenbaum, P.R., and D.B. Rubin (1983) "The central role of the propensity score in observational studies for causal effects", Biometrika 70, 41–55.
Rossi, P.H., M.W. Lipsey and H.E. Freeman (2004) Evaluation: A Systematic Approach, Sage Publications, Thousand Oaks, CA.
Salamon, L. (1981) "Rethinking public management: Third party government and the changing forms of government action", Public Policy 29(3), 255–275.
Salmen, L., and E. Kane (2006) Bridging Diversity: Participatory Learning for Responsive Development, World Bank, Washington, D.C.
Scriven, M. (1976) "Maximizing the Power of Causal Investigations: The Modus Operandi Method", in: G.V. Glass (ed.) Evaluation Studies Review Annual, Vol. 1, Sage Publications, Beverly Hills, CA.
Scriven, M. (1998) "Minimalist theory: The least theory that practice requires", American Journal of Evaluation 19(1), 57–70.
Scriven, M. (2008) "Summative Evaluation of RCT Methodology: An Alternative Approach to Causal Research", Journal of Multidisciplinary Evaluation 5(9), 11–24.
SG1 (2008) NONIE: Impact Evaluation Guidance – Sections 1 and 2, Subgroup 1, Network of Networks on Impact Evaluation.
SG2 (2008) NONIE Impact Evaluation Guidance, Subgroup 2, Network of Networks on Impact Evaluation.
Shadish, W.R., T.D. Cook and D.T. Campbell (2002) Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin Company, Boston.
Sherman, L.W., D.C. Gottfredson, D.L. MacKenzie, J. Eck, P. Reuter and S.D. Bushway (1998) "Preventing crime: What works, what doesn't, what's promising", National Institute of Justice Research Brief, July 1998, Washington, D.C.
Simons, H. (2006) "Ethics in evaluation", in: I.F. Shaw, J.C. Greene and M.M. Mark (eds.) The SAGE Handbook of Evaluation, Sage Publications, London.
Snijders, T., and R. Bosker (1999) Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Sage Publications, London.
Späth, B. (2004) Current State of the Art in Impact Assessment: With a Special View on Small Enterprise Development, Report for SDC.
Straw, R.B., and J.M. Herrell (2002) "A Framework for Understanding and Improving Multisite Evaluations", in: J.M. Herrell and R.B. Straw (eds.) Conducting Multiple Site Evaluations in Real-World Settings, New Directions for Evaluation 94, Jossey-Bass, San Francisco.
Swedberg, R. (2005) Principles of Economic Sociology, Princeton University Press, Princeton, NJ.
Tashakkori, A., and C. Teddlie (eds.) (2003) Handbook of Mixed Methods in Social and Behavioral Research, Sage Publications, Thousand Oaks, CA.
Trochim, W.M.K. (1989) "An introduction to concept mapping for planning and evaluation", Evaluation and Program Planning 12, 1–16.
Tukey, J.W. (1977) Exploratory Data Analysis, Addison-Wesley, Reading, MA.
Turpin, R.S., and J.M. Sinacore (eds.) (1991) Multisite Evaluations, New Directions for Evaluation 50, Jossey-Bass, San Francisco.
Utce Ltd. and Japan Pfi Association (2003) Impact Evaluation Study on Public-Private Partnerships: The Case of Angat Water Supply Optimization Project and the Metropolitan Waterworks and Sewerage System, Republic of the Philippines.
Vaessen, J., and J. De Groot (2004) "Evaluating Training Projects on Low External Input Agriculture: Lessons from Guatemala", Agricultural Research & Extension Network Papers 139, Overseas Development Institute, London.
Vaessen, J., and D. Todd (2008) "Methodological challenges of evaluating the impact of the Global Environment Facility's biodiversity program", Evaluation and Program Planning 31(3), 231–240.
Van der Knaap, L.M., F.L. Leeuw, S. Bogaerts and L.T.J. Nijssen (2008) "Combining Campbell standards and the realist evaluation approach: the best of two worlds?", American Journal of Evaluation 29(1), 48–57.
Van De Walle, D., and D. Cratty (2005) "Do Donors Get What They Paid For? Micro Evidence on the Fungibility of Development Project Aid", Policy Research Working Paper 3542, World Bank, Washington, D.C.
Vedung, E. (1998) "Policy instruments: Typologies and theories", in: M.L. Bemelmans-Videc and R.C. Rist (eds.) Carrots, Sticks and Sermons: Policy Instruments and their Evaluation, Transaction Publishers, New Brunswick.
Webb, E.J., D.T. Campbell, R.D. Schwartz and L. Sechrest (2000) Unobtrusive Measures, Sage Publications, Thousand Oaks, CA.
Weiss, C.H. (1998) Evaluation – Methods for Studying Programs and Policies, Prentice Hall, New Jersey.
Welsh, B., and D.P. Farrington (eds.) (2006) Preventing Crime: What Works for Children, Offenders, Victims and Places, Springer, Berlin.
White, H. (2002) "Combining quantitative and qualitative approaches in poverty analysis", World Development 30(3), 511–522.
White, H. (2006) Impact Evaluation Experience of the Independent Evaluation Group of the World Bank, World Bank, Washington, D.C.
White, H. (2009) "Some Reflection on Current Debates in Impact Evaluation", Working Paper 1, International Initiative for Impact Evaluation, New Delhi.
White, H., and G. Dijkstra (2003) Programme Aid and Development: Beyond Conditionality, Routledge, London.
Whitmore, E. (1991) "Evaluation and empowerment: It's the process that counts", Empowerment and Family Support Networking Bulletin 2(2), 1–7.
Wholey, J.S. (1987) "Evaluability Assessment: Developing Program Theory", in: L. Bickman (ed.) Using Program Theory in Evaluation, New Directions for Program Evaluation, Jossey-Bass, San Francisco.
Wooldridge, J.M. (2002) Econometric Analysis of Cross Section and Panel Data, The MIT Press, Cambridge.
World Bank (2003) A User's Guide to Poverty and Social Impact Analysis, Poverty Reduction Group and Social Development Department, World Bank, Washington, D.C.
Worrall, J. (2002) "What evidence in evidence-based medicine?", Philosophy of Science 69, 316–330.
Worrall, J. (2007) "Why there's no cause to randomize", The British Journal for the Philosophy of Science 58(3), 451–488.
Worthen, B.R., and C.C. Schmitz (1997) "Conceptual Challenges Confronting Cluster Evaluation", Evaluation 3(3), 300–319.
Yang, H., J. Shen, H. Cao and C. Warfield (2004) "Multilevel Evaluation Alignment: An Explication of a Four-Step Model", American Journal of Evaluation 25(4), 493–507.