THE WORLD BANK Research Observer

EDITOR
Emmanuel Jimenez, World Bank

CO-EDITOR
Gershon Feder, World Bank

EDITORIAL BOARD
Barry Eichengreen, University of California-Berkeley
Jeffrey S. Hammer, Princeton University
Ravi Kanbur, Cornell University
Howard Pack, University of Pennsylvania
Ana L. Revenga, World Bank
Luis Serven, World Bank
Sudhir Shetty, World Bank

The World Bank Research Observer is intended for anyone who has a professional interest in development. Observer articles are written to be accessible to nonspecialist readers; contributors examine key issues in development economics, survey the literature and the latest World Bank research, and debate issues of development policy. Articles are reviewed by an editorial board drawn from across the Bank and the international community of economists. Inconsistency with Bank policy is not grounds for rejection.

The journal welcomes editorial comments and responses, which will be considered for publication to the extent that space permits. On occasion the Observer considers unsolicited contributions. Any reader interested in preparing such an article is invited to submit a proposal of not more than two pages to the Editor. Please direct all editorial correspondence to the Editor, The World Bank Research Observer, 1818 H Street, NW, Washington, DC 20433, USA.

The views and interpretations expressed in this journal are those of the authors and do not necessarily represent the views and policies of the World Bank or of its Executive Directors or the countries they represent. The World Bank does not guarantee the accuracy of data included in this publication and accepts no responsibility whatsoever for any consequences of their use. When maps are used, the boundaries, denominations, and other information do not imply on the part of the World Bank Group any judgment on the legal status of any territory or the endorsement or acceptance of such boundaries.
For more information, please visit the Web sites of the Research Observer at www.wbro.oxfordjournals.org, the World Bank at www.worldbank.org, and Oxford University Press at www.oxfordjournals.org.

THE WORLD BANK Research Observer
Volume 24 · Number 1 · February 2009

Rural Poverty: Old Challenges in New Contexts
Stefan Dercon

Symposium on Evaluation

Evaluation in the Practice of Development
Martin Ravallion

Timing and Duration of Exposure in Evaluations of Social Programs
Elizabeth M. King and Jere R. Behrman

Symposium on Financial Sector

Competition in the Financial Sector: Overview of Competition Policies
Stijn Claessens

Access to Financial Services: Measurement, Impact, and Policies
Thorsten Beck, Asli Demirgüç-Kunt, and Patrick Honohan

Subscriptions

A subscription to The World Bank Research Observer (ISSN 0257-3032) comprises 2 issues. Prices include postage; for subscribers outside the Americas, issues are sent air freight.

Annual Subscription Rate (Volume 24, 2 issues, 2009)

Academic libraries
Print edition and site-wide online access: US$151/£101/€151
Print edition only: US$143/£95/€143
Site-wide online access only: US$143/£95/€143

Corporate
Print edition and site-wide online access: US$227/£151/€227
Print edition only: US$216/£144/€216
Site-wide online access only: US$216/£144/€216

Personal
Print edition and individual online access: US$52/£35/€52

Please note: US$ rate applies to US & Canada, Euros applies to Europe, UK£ applies to UK and Rest of World.

Readers with mailing addresses in non-OECD countries and in socialist economies in transition are eligible to receive complimentary subscriptions on request by writing to the UK address below.

There may be other subscription rates available; for a complete listing, please visit www.wbro.oxfordjournals.org/subscriptions. Full pre-payment in the correct currency is required for all orders.
Payment should be in US dollars for orders being delivered to the USA or Canada; Euros for orders being delivered within Europe (excluding the UK); GB£ sterling for orders being delivered elsewhere (i.e., not being delivered to USA, Canada, or Europe). All orders should be accompanied by full payment and sent to your nearest Oxford Journals office. Subscriptions are accepted for complete volumes only. Orders are regarded as firm, and payments are not refundable. Our prices include Standard Air as postage outside of the UK. Claims must be notified within four months of despatch/order date (whichever is later). Subscriptions in the EEC may be subject to European VAT. If registered, please supply details to avoid unnecessary charges. For subscriptions that include online versions, a proportion of the subscription price may be subject to UK VAT. Subscribers in Canada, please add GST to the prices quoted. Personal rate subscriptions are only available if payment is made by personal cheque or credit card, delivery is to a private address, and is for personal use only.

Back issues: The current year and two previous years' issues are available from Oxford University Press. Previous volumes can be obtained from the Periodicals Service Company, 11 Main Street, Germantown, NY 12526, USA. E-mail: psc@periodicals.com. Tel: (518) 537-4700. Fax: (518) 537-5899.

Contact information: Journals Customer Service Department, Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, UK. E-mail: jnls.cust.serv@oxfordjournals.org. Tel: +44 (0)1865 353907. Fax: +44 (0)1865 353485. In the Americas, please contact: Journals Customer Service Department, Oxford University Press, 2001 Evans Road, Cary, NC 27513, USA. E-mail: jnlorders@oxfordjournals.org. Tel: (800) 852-7323 (toll-free in USA/Canada) or (919) 677-0977. Fax: (919) 677-1714. In Japan, please contact: Journals Customer Service Department, Oxford University Press, 4-5-10-8F Shiba, Minato-ku, Tokyo, 108-8386, Japan.
E-mail: custserv.jp@oxfordjournals.org. Tel: +81 3 5444 5858. Fax: +81 3 3454 2929.

Postal information: The World Bank Research Observer (ISSN 0257-3032) is published twice a year, in February and August, by Oxford University Press for the International Bank for Reconstruction and Development/THE WORLD BANK. Postmaster: send address changes to The World Bank Research Observer, Journals Customer Service Department, Oxford University Press, 2001 Evans Road, Cary, NC 27513-2009. Periodicals postage paid at Cary, NC and at additional mailing offices. Communications regarding original articles and editorial management should be addressed to The Editor, The World Bank Research Observer, The World Bank, 1818 H Street, NW, Washington, D.C. 20433, USA.

Oxford Journals Environmental and Ethical Policies: Oxford Journals is committed to working with the global community to bring the highest quality research to the widest possible audience. Oxford Journals will protect the environment by implementing environmentally friendly policies and practices wherever possible. Please see http://www.oxfordjournals.org/ethicalpolicies.html for further information on Oxford Journals' environmental and ethical policies.

Digital Object Identifiers: For information on DOIs and to resolve them, please visit www.doi.org.

Permissions: For information on how to request permissions to reproduce articles or information from this journal, please visit www.oxfordjournals.org/jnls/permissions.

Advertising: Advertising, inserts, and artwork enquiries should be addressed to Advertising and Special Sales, Oxford Journals, Oxford University Press, Great Clarendon Street, Oxford, OX2 6DP, UK. Tel: +44 (0)1865 354767; Fax: +44 (0)1865 353774; E-mail: jnlsadvertising@oxfordjournals.org.
Disclaimer: Statements of fact and opinion in the articles in The World Bank Research Observer are those of the respective authors and contributors and not of the International Bank for Reconstruction and Development/THE WORLD BANK or Oxford University Press. Neither Oxford University Press nor the International Bank for Reconstruction and Development/THE WORLD BANK make any representation, express or implied, in respect of the accuracy of the material in this journal and cannot accept any legal responsibility or liability for any errors or omissions that may be made. The reader should make her or his own evaluation as to the appropriateness or otherwise of any experimental technique described.

Paper used: The World Bank Research Observer is printed on acid-free paper that meets the minimum requirements of ANSI Standard Z39.48-1984 (Permanence of Paper).

Indexing and abstracting: The World Bank Research Observer is indexed and/or abstracted by ABI/INFORM, CAB Abstracts, Current Contents/Social and Behavioral Sciences, Journal of Economic Literature/EconLit, PAIS International, RePEc (Research in Economic Papers), Social Services Citation Index, and Wilson Business Abstracts.

Copyright 2009 The International Bank for Reconstruction and Development/THE WORLD BANK. All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise without prior written permission of the publisher or a license permitting restricted copying issued in the UK by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1P 9HE, or in the USA by the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Typeset by Techset Composition Limited, Chennai, India; Printed by Edwards Brothers Incorporated, USA

Rural Poverty: Old Challenges in New Contexts

Stefan Dercon

Poverty is still a predominantly rural phenomenon.
However, the context of rural poverty has been changing across the world, with high growth in some economies and stagnation in others. Furthermore, increased openness in many economies has affected the specific role of agricultural growth for rural poverty reduction. This paper revisits an 'old' question: how do growth and poverty reduction come about if most of the poor live in rural areas and are dependent on agriculture? What is the role of agricultural and rural development in this respect? Focusing on Sub-Saharan Africa, and using economic theory and the available evidence, the author comes to the conclusion that changing contexts have meant that agricultural growth is only crucial as an engine for growth in particular settings, more specifically in landlocked, resource-poor countries, which are often also characterized by relatively low potential for agriculture. However, extensive market failures in key factor markets and likely spatial effects give a remaining crucial role for rural development policies, including focusing on agriculture, to assist the inclusion of the rural poor in growth and development. How to overcome these market failures remains a key issue for further research. JEL codes: O41, Q10, O55

Even though poverty has been reducing in some parts of the world in recent years, not least in East Asia and China, and more recently in South Asia, its persistence in large parts of Africa and elsewhere keeps it high on the agenda. Poverty persistence is to a large extent related to a poor growth performance in national economies. Furthermore, in large parts of the world, poverty also remains, in terms of sheer numbers, a mainly rural and agricultural phenomenon, in that most of the poor depend on the rural sector for their livelihood.

In this paper I will revisit some well-rehearsed but nevertheless relevant questions. What is the place of rural development and agricultural growth in growth and poverty reduction?
What are the main rural constraints on growth and poverty? Has more recent economic theory and empirical research given more guidance as to what can be done? These issues will be discussed, keeping in mind some of the poorest parts of the world, most notably Sub-Saharan Africa. I will use some recent theoretical models, including some building on Lewis (1954), as my guide to structure the discussion, including that of the empirical evidence, even though this is incomplete, to corroborate my conclusions.

© The Author 2009. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org. doi:10.1093/wbro/lkp003. Advance Access publication April 9, 2009. 24:1-28

These questions are well rehearsed and feature prominently in many general discussions on development. Textbooks typically engage with these issues at various levels, for example Ray (1998) and Bardhan and Udry (1999). Longer treatments can be found in Timmer (2002) or de Janvry and others (2002). What is different in this paper is that I want to revisit some of the theory and evidence specifically with Sub-Saharan Africa in mind; as a result it is highly selective. Overall, the experience with agricultural growth, growth in general, and poverty reduction over the last decades has been disheartening in this region. It has provided the context for renewed calls for a strong focus on agriculture in Africa as a necessary condition for growth and poverty reduction. For example, Sachs has been most vocal in calling for a 'green revolution' in Africa as an essential part of its development strategy (Sachs 2005). More nuanced analyses, such as the recent World Development Report 2008 on agriculture for development, emphasize the crucial role of agriculture as being "vital for stimulating growth in other parts of the economy" (World Bank 2007, p.
xiii) in Sub-Saharan Africa, requiring strong productivity increases in smallholder agriculture.

In the context of the substantial heterogeneity in circumstances and opportunities across Sub-Saharan Africa, the next section will first briefly summarize the current evidence on the evolution of rural poverty, contrasting the African experience with other parts of the world. The evidence is consistent with a classic view that a move out of agriculture is correlated with overall poverty reduction. In the African context, neither a large-scale poverty reduction nor a move out of the rural sector is occurring. Of course, this does not prove a causal link. Nor does it prove that a focus on agricultural growth is not warranted. I will ask how theory as well as the available evidence can inform us on what role agricultural and rural development has to play in both growth and poverty reduction in Africa.

In the next section I offer a discussion of the relevance of a macroview on the role of agriculture in growth and poverty reduction. This will require us to revisit some of the old and seemingly unfashionable questions related to sectoral and urban-rural linkages to understand better the relevance of agricultural and rural development in the African setting. One of the key problems in this review is the relatively sparse evidence, so I rely largely on a simple but suggestive model of rural-urban linkages to make sense of this evidence. Combining this with recent approaches to the question of the scope for growth in Africa, emphasizing its heterogeneity in opportunities (Ndulu and others 2008), allows us to identify those specific settings in which agricultural development is likely to be essential in stimulating growth and poverty reduction, as well as the nature and role of rural development in other settings.
In particular, I will argue that the role of agriculture is likely to be very different in different settings, depending on whether a country can take advantage of manufacturing opportunities, whether it is dependent on others for its natural resources, or whether it is landlocked and with few natural resources of its own. I will argue that, especially in the latter case, a focus on agricultural growth may be an essential, if difficult, route out of poverty.

In the final section, the strong assumptions regarding the nature of markets implied in the macroview will be complemented by a microview which emphasizes different market failures and the possibility of poverty traps. I will focus on three examples of serious market failures (those related to credit, to risk, and to spatial effects), and I will review their theoretical impacts and the available evidence for them. These market failures, especially in circumstances where they may lead to poverty traps, bring rural and agricultural policies back to the fore, leading to suggestive policy conclusions on the role of agriculture and rural development.

Rural Poverty Patterns

Poverty is still predominantly a rural phenomenon. Pick a random poor person in the world and the odds are that this person will be living and working in the rural areas as a farmer or agricultural worker. Even though the data are not without problems, the most recent estimates suggest that about 76 percent of the poor in the world live in rural areas, well above the overall population share living in rural areas, which is 58 percent (Ravallion and others 2007).1 Sub-Saharan Africa is no exception: while it has the highest poverty rate overall, rural poverty is about a quarter higher than urban poverty, with 65 percent of the population and 70 percent of the poor living in rural areas. At current patterns of growth, poverty reduction, and population growth,
poverty is likely to remain a predominantly rural phenomenon for the next few decades (Ravallion and others 2007).

Is this changing? The figures in Ravallion and others (2007) also offer a suggestive insight into the recent patterns of urbanization of poverty over the period 1993 to 2002. Even though the urban poverty rate has marginally decreased in the world, the urban share of poverty has been increasing (from about 19 to 24 percent) as the urban population has grown faster than the rural population, largely due to in-migration.2 There is considerable variation in this pattern as well. In Sub-Saharan Africa, a very marginal decrease in rural poverty and stagnating urban poverty rates, with a growing share of the urban population in the total population, are responsible for a more substantial urbanization of poverty and little change in total poverty. The global overall urbanization of poverty in the context of relatively substantial poverty reduction, and with the larger part of the 'stock' of the poor living in rural areas, also implies that rural poverty has contributed most to overall declines in poverty: using a simple decomposition, Ravallion and others (2007) calculate that about 80 percent of aggregate poverty declines stem from rural poverty declines. But this obviously does not prove any causality between, say, urbanization and poverty, or indeed that what happens in the rural or agricultural economy is in itself the cause of poverty declines. In particular, these patterns have to be seen in the broader context of the economy.

Table 1. Growth of GDP and Agricultural GDP (percent a year)

                         GDP growth              Agricultural GDP growth
                         1990-2000   2000-04     1990-2000   2000-04
Low income                  4.7        5.5          3.1        2.7
Middle income               3.8        4.7          2.0        3.4
East Asia & Pacific         8.5        8.1          3.4        3.4
South Asia                  5.6        5.8          3.1        1.9
Sub-Saharan Africa          2.5        3.9          3.3        3.6

Source: World Development Indicators, World Bank.
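The population and poverty shares quoted at the start of this section can be turned into relative poverty rates with simple arithmetic. The following is a minimal sketch in Python; the function name is hypothetical and the only inputs are the shares cited in the text (76 and 58 percent for the world, 70 and 65 percent for Sub-Saharan Africa). It checks that the Sub-Saharan figures are indeed consistent with rural poverty being about a quarter higher than urban poverty.

```python
def rural_urban_poverty_ratio(rural_share_of_poor, rural_share_of_population):
    """Ratio of the rural poverty rate to the urban poverty rate.

    The poverty rate in an area is proportional to
    (share of the poor living there) / (share of the population living there),
    so total poverty and total population cancel out of the ratio.
    """
    urban_share_of_poor = 1.0 - rural_share_of_poor
    urban_share_of_population = 1.0 - rural_share_of_population
    rural_rate = rural_share_of_poor / rural_share_of_population
    urban_rate = urban_share_of_poor / urban_share_of_population
    return rural_rate / urban_rate


# Shares as cited in the text (Ravallion and others 2007).
world = rural_urban_poverty_ratio(0.76, 0.58)
africa = rural_urban_poverty_ratio(0.70, 0.65)

print(f"World: rural poverty rate is {world:.2f} times the urban rate")
print(f"Sub-Saharan Africa: rural poverty rate is {africa:.2f} times the urban rate")
```

On these shares the Sub-Saharan ratio is close to 1.25, while the world ratio is well above 2, which illustrates how much smaller the rural-urban gap is in Africa than in the world as a whole.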
Growth in GDP per capita and poverty reduction are well known to coincide with a gradual decrease of the share of GDP from agriculture as well as the share of the population engaged in agriculture. For example, across all low income countries, growth stood at about 5 percent a year, with the share of agriculture declining from 32 to 23 percent of GDP between 1990 and 2004 (World Bank 2005b). Contrasting the experience in the 1990s of Sub-Saharan Africa with the relatively fast growing regions in East Asia and Pacific (including China) and South Asia, where substantial poverty reduction is taking place, is instructive (table 1). We observe growth of overall GDP far outpacing agricultural GDP growth in both East Asia and South Asia, but not so in Sub-Saharan Africa, where overall growth and growth in agricultural GDP were more similar at 2.5 and 3.3 percent a year respectively.3 With population growth still at about 2.3 percent per year in Sub-Saharan Africa, much higher than in these other regions, per capita growth has been minimal. In other words, there is little sign of a structural transformation in Sub-Saharan Africa, despite signs of some urbanization of poverty, compared to the patterns observed in more successful regions.

For some, agricultural growth is seen as the key engine of growth (Timmer 2007). The higher recent growth rate in African agriculture is, then, a sign of hope, even if at current rates it can hardly be viewed as evidence of rapid agricultural transformation. In such a view, the key is to raise agricultural productivity to allow growth to take off. A standard conceptualization of this process focuses on the existence of linkages in inputs and outputs between agriculture and other sectors (Johnston and Mellor 1961). Showing sympathy with this view,
in the World Development Report 2008 it is argued that, in currently agriculture-based economies, high growth in agriculture may well be the road to economic take off.4

Even though the historical evidence on the role of agriculture in allowing the Industrial Revolution to start in Europe is suggestive, and agricultural growth was also a significant element in the growth success of East Asia and China, it is much harder to argue that agricultural growth is essential to allow growth to take off; the evidence for this is not clear-cut and is hard to come by. Historians are still debating vigorously whether an agricultural revolution, in terms of a period of rapid increases in productivity during the 18th and 19th centuries, with its seeds earlier, was a key cause of the Industrial Revolution and subsequent growth, first in England and then elsewhere in Europe (Crafts 1985; Allen 1999). The timing of the growth in labor productivity is disputed, some dating it much earlier than commonly suggested (Allen 1999), while some even dispute that there was any significant productivity increase in agriculture in England in the period 1560 to 1850, so that its role as a precursor for overall growth increases is even more minimal (Clark 2002). Productivity increases in agriculture may have been a consequence of the Industrial Revolution, linked to competition over labor, rather than a precursor (Grantham 1989). Furthermore, recent comparisons of European agricultural productivity with levels in the Yangtze Delta in the 17th to early 19th centuries suggest that Chinese land and labor productivity in agriculture were close to the best performers in Europe in that period (England and the Low Countries), undermining the view that agricultural circumstances allowed industrialization to take off in these countries and not elsewhere.
More recently, while the important role of policy changes leading to agricultural growth in the growth success of China is not disputed, this does not necessarily extend to other Asian success stories. For example, Korea appears not to have invested in agricultural productivity in the period leading up to its rapid industrialization (Amsden 1989).5

Growth linkages from agriculture in currently developing countries, including Africa, are commonly suggested to be considerable (for a review, see Staatz and Dembele 2007). While this is possible, the evidence is hard to compile and subject to considerable methodological problems. Simultaneity difficulties affect the econometric evidence from time series for particular countries, while panel data analysis produces ambiguous results (World Bank 2007). Most analysis depends on simulation models (such as those based on computable general equilibrium models or on input-output models), which inevitably have to impose strong and untested behavioral assumptions to derive results (Dorosh and Haggblade 2003).

There is stronger evidence in favor of the view that agricultural growth helps growth to be more pro-poor, albeit possibly dependent on context. For example, in China, it is estimated that agricultural growth contributed up to four times more to poverty reduction than growth from industry or the service sector (Ravallion and Chen 2007). However, the rather favorable land distribution there may well have played a key role in this, a factor contributing to similar evidence in other East Asian settings, such as Vietnam. Evidence from India suggests a more subtle message: the impact of the growth in agriculture is matched by growth in services in terms of their poverty-reducing effects, although these effects from non-farm growth are larger in states with higher initial farm productivity (Ravallion and Datt 1996, 2002).
Furthermore, evidence from Foster and Rosenzweig (2004) suggests that areas in India with the slowest growth in agricultural productivity had the largest growth in the rural non-farm tradable sector.

This discussion of patterns and change in rural poverty helps to motivate our further analysis on Africa. It is clear that poverty is highest in rural areas, but is that sufficient reason to focus on rural areas and agriculture? Successful poverty reduction is not simply equated with relatively high growth in agriculture. At best, during periods of rapid poverty reduction correlated with growth, rural growth is likely to be important for poverty reduction, but successful growth is associated with growth in the non-agricultural sector fast outpacing agricultural growth. In any case, this implies that understanding rural poverty changes cannot naively focus only on what happens in the rural sector or in agriculture. Analysis will have to be done in the context of overall growth and changes, taking into account rural-urban linkages.

Arguably, the first systematic treatment of these issues is in the Lewis model (Lewis 1954). This model is effectively part of a theory of urban and rural interactions, albeit with particular assumptions about the functioning of markets in urban settings and, especially, the nature of incentives and decisionmaking. Much work has refined this type of analysis: a shared underlying research question is how do growth and poverty reduction come about if much activity and labor is initially in the agricultural or rural sector?6

These questions have been somewhat forgotten in much recent research. However, the context of the rural sector has also changed rather dramatically in the developing world, when compared, say, to the 1970s and 1980s. In most of the developing world, not least in Sub-Saharan Africa,
market-based reforms ("getting the prices right") have moved forward dramatically. In Africa, the context of farming and agricultural markets has substantially changed, and virtually everywhere important liberalization, mostly domestic but also increasingly of international trade, has been taking place. Globalization and increased openness, as well as investment in commercial agriculture and marketing, are changing the context slowly but in an irreversible way. High commodity prices, including for cereals, provide new opportunities for agriculture. All these factors provide a different context in which to ask (again): Can and should agriculture be the engine for growth? Can and should it be an engine for poverty reduction in the context of growth?

A Role for Rural and Agricultural Growth in Growth and Poverty Reduction in Africa?

Much basic analysis of the importance of agricultural growth for poverty reduction is based on simplistic premises. For example, it tends to be stated that since the poor are employed in agriculture, agriculture must be the basis of poverty reduction efforts. Against this is one of the basic insights from most data on poverty: that a systematic increase in prosperity tends to be linked to having fewer people dependent on agriculture for their livelihoods. The key question in this respect is how to get them out of the agricultural sector in a sustainable way.

As it is one of the clearest analyses of this issue, I will build on the work by Eswaran and Kotwal (1993a, 1993b, 1994, 2002), which, though containing excellent economic theoretical insights, is bereft of even a single equation. The relevance of their work lies mainly in the clarity of the questions asked and answered; most of the points have been made by others, but rarely as crisply and convincingly. In this section, I will briefly summarize the key points they make and give a flavor of their analysis,
which is done with India in mind. Then I will discuss how applicable these insights are for other parts of the world, most notably for Africa.

Eswaran and Kotwal's analysis can be thought of as a Lewis model within a proper general equilibrium framework. It also drops some of the most difficult assumptions underlying Lewis's original analysis, which have remained present in many of the subsequent contributions. To put it bluntly, there is no assumption that rural labor markets work in a way equivalent to agricultural workers spending too much time sitting under trees (surplus labor that can be freely extracted). Furthermore, industrial workers do not choose to eat shirts during a critical phase of the growth process (that is, there is no assumption that, when total output in agriculture is going down, they are willing to shift to consuming less food and more manufacturing goods).7

Theoretical Framework

The Eswaran and Kotwal model assumes a two-sector economy: industry and agriculture. There are two goods: shirts and food. Production in both sectors is characterized by (different) constant returns to scale production technology, using labor as well as land in agriculture. There are landowners and workers in the rural economy, and workers in the industrial sector. A crucial assumption is that preferences are lexicographic: people will first need to have enough food before they will buy shirts. This captures an Engel effect, that richer people will spend a smaller share on basic essentials, but, by making it more extreme, its relevance comes out more directly. An alternative way of looking at this is to state that there is no circumstance in which, for very poor people, lower prices for manufactured goods can induce them to cut back on essential basic commodities. Even though in reality it may not be as clear-cut, this view takes seriously that poverty is related to deprivation in essential food intake. As Eswaran and Kotwal show,
relaxing the assumptions does not fundamentally change the result, but it does make the dichotomy in implications under different processes of technological and other development marginally less striking.

There is also an initial inequality in this economy. Some (the rich) have assets such as land; the poor only have labor. At first, the poor only eat food, since they do not have enough to satisfy their basic needs; once sated with food, they do not eat more. So there is a maximum level of spending on food, and a poor person is someone who only spends on food.

It is further assumed that there are clearing and integrated labor markets across rural and urban areas.8 This means that people are indifferent as to whether they work in agriculture or industry: contrary to Lewis, these markets are not perfectly segmented, but integrated.9 Clearing product markets are those where demand equals supply in each. All these assumptions about markets imply that poverty will go down if labor demand increases, resulting in an increase of real wages. In other words, real wage increases for the initially poor determine whether poverty declines. But how does this work? Using these assumptions, a generic model can be developed that can be used to contrast a number of alternative strategies to achieve this. Understanding the context and situation in which these different strategies are effective ways of reducing poverty will also help us to understand how important it is to focus on agricultural and rural development.

First, in a closed economy, the policy to be considered is (neutral) industrial progress via total factor productivity (TFP) growth. Under these assumptions, Eswaran and Kotwal show that this implies that more shirts are being produced for the same amount of labor. Prices for shirts will go down, but crucially the poor do not care for these cheaper shirts, since they still do not have enough food.
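The lexicographic preferences described above can be illustrated with a small sketch. This is not taken from Eswaran and Kotwal, whose work is described here as containing no equations; the function name, the prices, the incomes, and the satiation level are all hypothetical values chosen for illustration. The point it shows is that a fall in the price of shirts changes nothing for a person whose income does not yet cover the food satiation level.

```python
def lexicographic_demand(income, food_price, shirt_price, food_satiation):
    """Return (food, shirts) bought by one person with lexicographic preferences.

    Food is bought first, up to the satiation level; only income left over
    after that is spent on shirts. All arguments are illustrative assumptions.
    """
    food = min(income / food_price, food_satiation)
    leftover = income - food * food_price
    shirts = leftover / shirt_price
    return food, shirts


# A poor person (income below the cost of the satiation level of food)
# before and after shirts become cheaper.
poor_before = lexicographic_demand(1.0, 1.0, 1.0, food_satiation=2.0)
poor_after = lexicographic_demand(1.0, 1.0, 0.5, food_satiation=2.0)

# A rich person (already sated with food) facing the same price change.
rich_before = lexicographic_demand(5.0, 1.0, 1.0, food_satiation=2.0)
rich_after = lexicographic_demand(5.0, 1.0, 0.5, food_satiation=2.0)

print("poor:", poor_before, "->", poor_after)
print("rich:", rich_before, "->", rich_after)
```

In this sketch only the rich person's shirt purchases respond to the cheaper shirts; the poor person's bundle is identical before and after the price change.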
The result is that there is no incentive for anyone to move out of agriculture, since total food supply would go down and demand for food would go up. In the end, the TFP growth only benefits the rich, who have enough food and are already consuming shirts, which they can consume more of. The marginal product of labor in industry goes up, but the price of shirts goes down. Employment is unchanged, and nominal wages, food prices, and therefore real wages for the poor are the same as before. Poverty is simply unchanged, despite TFP growth in the industrial sector.

Next, consider again a closed economy and a policy of (neutral) technological progress in agriculture. More food is being produced for the same labor. This is obviously of interest to all the workers: there is more food for the same amount of work. Once there is more food consumed, some may cross the threshold and be sated with food, and become interested in buying shirts. The result is that shirt prices are being pushed up. This creates incentives for firms to expand production and attract more labor to deliver this increased production. Higher demand for labor will require increasing nominal wages. Rural wages will move up as well, while food prices will go down somewhat, due to the higher production and shift to shirts by some previously poor consumers. In equilibrium, labor will have moved out of agriculture into shirt production, and higher equilibrium real wages will imply a reduction in poverty.

The contrasting results are striking and lead to the conclusion that, in a closed economy, growth in agriculture may well be essential for poverty reduction, while industrial progress has no impact. The presence of demand linkages is the key factor, but, for poverty reduction, the relevant linkages are only via commodities consumed by the poor. Mellor (1999) has long emphasized this process as well.
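The demand side that drives these closed-economy results can be made concrete with a minimal sketch. It is an illustration only, not Eswaran and Kotwal's own formulation: the function name, the satiation level, and all prices and incomes are hypothetical numbers chosen for the example. It shows that a fall in the shirt price leaves a poor consumer's purchases unchanged, whereas a rise in income measured in food (a higher real wage) lifts the consumer over the satiation threshold and creates demand for shirts.

```python
def lexicographic_demand(income, food_price, shirt_price, food_satiation):
    """Return (food, shirts) bought by a consumer with lexicographic preferences.

    Food is bought first, up to the satiation level; only income left over
    after that is spent on shirts. All arguments are illustrative assumptions.
    """
    food = min(income / food_price, food_satiation)
    leftover = income - food * food_price
    shirts = leftover / shirt_price
    return food, shirts


# Hypothetical numbers: satiation at 10 units of food, food price of 1.
poor_before = lexicographic_demand(
    income=8.0, food_price=1.0, shirt_price=2.0, food_satiation=10.0)

# Industrial progress in a closed economy: shirts get cheaper,
# but the poor consumer still spends everything on food.
poor_cheaper_shirts = lexicographic_demand(
    income=8.0, food_price=1.0, shirt_price=1.0, food_satiation=10.0)

# A higher real wage: income in food terms rises past the satiation level,
# so the consumer starts to buy shirts.
poor_higher_wage = lexicographic_demand(
    income=12.0, food_price=1.0, shirt_price=2.0, food_satiation=10.0)

print(poor_before, poor_cheaper_shirts, poor_higher_wage)
```

The sketch covers only individual demand; the equilibrium adjustment of wages, prices, and employment described in the text is not modeled here.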
But there is a difference from Mellor's emphasis here: the issue is not just growth linkages but also the link with poverty. Agriculture is then the central engine for poverty-reducing growth.

The results are nevertheless strongly affected by the assumptions about openness. In an open economy, the central demand and supply constraints do not matter anymore for traded commodities. Basic food staples can typically be imported, while shirts can be exported. Therefore, assume that both goods are tradable goods, so that only world prices matter. We can now revisit both cases.

First, consider the impact of industrial progress. More shirts are being produced for the same labor input, but prices of shirts remain the same, as world prices are not affected. Firms have an incentive to expand production, so the demand for labor and nominal wages will increase. Even though food supply goes down, in this case workers can move since food imports can go up. So the marginal product of labor goes up in agriculture as well, allowing rural and urban wages to increase in nominal and real terms. Food imports with higher real wages will mean that more food is being consumed and that some workers will start consuming shirts. The result is that poverty declines.

Second, the impact of agricultural technological progress is now very similar to industrial progress. The demand linkages are not crucial anymore for the link between real wages and output, and real wages increase with more people buying shirts than before. To put it simply, poverty reduction can then be achieved by any source of increased domestic competitiveness relative to the rest of the world.

Relevance for Africa?

Are these results relevant for Sub-Saharan Africa? In the Eswaran and Kotwal model, if growth is driven by agricultural growth that is technologically neutral or labor intensive, then the scope for poverty reduction will be larger.
Land is typically not highly unequally distributed in most African countries, where there is relatively low landlessness and where, in some countries, such as Ethiopia, there is a remarkably equal land distribution. Productivity increases will reward the poorer farmers as well, so that growth may have substantial poverty impacts. The changing circumstances in African agriculture, in terms of allowing many factor and product markets to work more freely, especially in terms of removing much of the urban bias, have improved the opportunities for this process to materialize. A "green revolution for Africa" may have substantial returns.

However, the necessity of agricultural growth to deliver both growth and poverty reduction is not so clear-cut. As in the model, opening up the economy has removed the crucial dependence on progress in agriculture for delivering poverty reduction, as the crucial demand linkage with agriculture is removed: growth in other sectors, provided it is labor intensive, can similarly promote poverty reduction.12 To assess the case for the crucial role of agricultural growth, the growth opportunities for Africa have to be considered.

Recent work by Ndulu and others (2008) provides the foundation for a suggestive three-way description of the growth opportunities of Sub-Saharan African countries; a similar description is used in Collier (2007). These researchers make distinctions in terms of growth opportunities: first, there are resource-rich economies; second, there are coastal and other well-located countries; third, there are landlocked economies without natural resources. Each of these groups has very different problems at its core when trying to boost growth and to reduce poverty.

For resource-rich economies, the key issue is to manage wealth: how to translate the underlying wealth controlled by the nation into the basis for sustainable and shared prosperity. The key problems they tend to face are Dutch disease and governance problems,
and they are more likely to be ravaged by violent conflict: think Nigeria, Angola, or Congo.

For coastal and other well-located countries the challenge is very different. They have no natural resources, and so no immediate source of wealth. Wealth needs to be generated. They have two production factors they can put to good use: they have people and they have their location to their advantage. Much of coastal Africa, not least Ghana, Côte d'Ivoire, Kenya, and South Africa, springs to mind. Their main challenge is how to take advantage of the opportunities offered by their location. They are countries that in principle should be able to take advantage of world trade opportunities, so their priorities are likely to have to include building up trade infrastructure, managing market institutions and regulation, investing in skills, and supporting the formation of well-working labor markets. These are very different challenges, but globalization offers serious opportunities for them. Without working on their constraints, they are bound to be left behind; but the potential is there.

This leaves the landlocked economies without natural resources. They are suffering most from the agglomeration effects: they have little to offer, and they totally depend on their neighbors to overcome these effects. Matters are made worse if their better located or better endowed neighbors are mishandling their economies, or indeed if they are in conflict with these states. All these factors are creating further negative externalities. Examples are Burundi, Burkina Faso, and Ethiopia.

So when is agricultural growth essential? How does agricultural growth fit into this framework? First, take the resource-rich countries. Agriculture is unlikely to be an essential source of growth.
Nevertheless, such an economy needs to find ways of diversifying and building up its productive capacity, and agriculture could play a role in this. In this context, the burden of agricultural growth to drive overall growth is not present, so that efforts for intensification or diversification can have a much more pro-poor bias. This could involve a focus on smallholder agriculture, for example by supporting new technologies and activities with higher labor productivity. But clearly, there are a variety of ways to encourage the distribution of the wealth of such a country, and it would be hard to argue that stimulating agricultural growth is essential. Moreover, investing in rural areas, including in basic services such as health, education, and infrastructure, could be an effective alternative form of redistribution, and it may have higher long-term returns in terms of transforming the economy than a narrow focus on agriculture. For example, there would be less of a burden to ensure that these investments were largely in high potential areas.

Second, consider the well-located economies, which are best placed to take advantage of world economic opportunities. Managing their comparative advantage, via labor markets, skills, regulation, and investment climate, is most essential. The role of agriculture is similar to the Eswaran and Kotwal model in an open economy: "industrial" progress is most likely the best route and a vehicle to take advantage of trade opportunities. The role of agriculture is then more subsidiary: it makes sense to encourage progress in agriculture as well, if only as a means of managing an exit from agriculture when trade-based growth takes off.11 Skill creation via better health and education, also in the rural sector, is then most helpful as well, if only since it will facilitate the development of better skilled labor that can in due course be absorbed.
The experience of Indonesia in the late 1970s and 1980s is rather reminiscent of this, with active rural policies that ultimately led to more absorption in the urban economy. For African economies, the key challenge is to overcome their relative marginalization in the world economy, as latecomers are put at a disadvantage when competing with countries that already have an established industrial base, such as many Asian economies. This may require specific support for manufacturing industrial development for this potential to materialize (Collier and Venables 2007), but this is still the best route for growth.

The landlocked, resource-poor countries are a rather different problem. In many cases (think of Ethiopia or Burkina Faso) the agricultural base is at best highly vulnerable. But their risk of total marginalization relative to the world economy is also highest. They are mainly dependent on the ability of their better located neighbors to pull them into trade-oriented opportunities, often involving migrant labor. In terms of active policies, the opportunities are limited: infrastructure and skill creation are sensible, but as locations for investment they are likely to remain down the pecking order for a long time, not least because many of their better located neighbors are still only barely integrating with the world economy at present. As a consequence, the best way to think of these economies is as if they were effectively closed economies, irrespective of active trade liberalization. As the model predicted, agricultural growth is then essential for both growth and poverty reduction, but don't expect any miracles. Technological progress in agriculture has to be actively pursued, as well as other measures to raise rural productivities, as the main way of delivering growth that also has clear poverty-reducing impacts. Rural and agricultural development is likely to be hardest here, as they are often agriculturally more vulnerable areas,
but it is here where it has to be attempted with most vigor. In order to achieve poverty reduction, there are also likely to be important trade-offs between stimulating agriculture in the high potential areas (as likely to be required for growth) and promoting it in more marginal areas where poverty may be highest. With commodity prices high, as in recent years, this tension between growth and poverty is even more present, as promoting overall agricultural output becomes even more crucial.

The contrast between, say, Ethiopia and India is striking here. Ethiopia may have been trying to open up in recent years, but its neighbors that are better located have not been taking advantage of trade opportunities, and in general its relations with them are at best frosty (such as is the case with Somalia and Eritrea). This limits its options dramatically. At best it can create a basic infrastructure and create skills, but growth is likely to have to come via agriculture to encourage any systematic and persistent decline in rural poverty. Its population cannot take advantage of growth opportunities in neighboring countries for the purposes of trade or even of migration. India has pockets and even states that face similar constraints on local natural resources, and other problems, as does Ethiopia (for example, Bihar). But its economy is broadly integrated, and with some states taking advantage of its increased openness and location, growth externalities and employment opportunities provide options for these lagging states to at least take partial advantage of the overall change, with some beneficial impacts on local poverty reduction.

But even so, Ethiopia cannot expect miracles from a strategy of focusing on agriculture alone.
Its relatively landlocked nature as well as its dependence on rainfall imply that some (unexpectedly) good harvests can temporarily push down food prices received by farmers to extremely low levels, as export parity prices (the world price minus transactions costs to supply the world market from the farm) are bound to be low. In 2001/02, for example, after a brief period of yield increases and area expansion, good weather contributed to a bumper harvest for maize, but prices were pushed to levels at which farmers found it not profitable to harvest. Such events undermine agricultural transformation, making it imperative that demand growth takes place to prevent prices from collapsing occasionally in such landlocked economies with high transactions costs.

Market Failures and Poverty Traps?

The previous section focused on a macro- or general equilibrium narrative on the role of agricultural and rural development in growth and poverty reduction. Changing contexts, not least that of increased openness of developing countries and the move toward more market-based economies, change the role of agriculture and rural development in growth and poverty reduction, one reason being that urban-rural and sectoral interactions have to be properly integrated in the analysis.

However, this analysis was conducted using the assumption of well-functioning factor and product markets. As much of the rural development research in recent decades has highlighted, market failures are prevalent, even despite the removal of many policy-induced market imperfections, such as in agricultural product markets.12 Resources are then not allocated efficiently, to the point where marginal returns to different factors of production are equalized. Factor markets, including those for labor, serve different people differently, resulting in heterogeneity in the extent to which people with different initial circumstances can take advantage of opportunities, such as those offered by growth.
Rural non-market institutions may have developed partially to substitute or compensate for market failures, but not completely. This has important implications, not just for the extent of growth that can be achieved, but most importantly, for our purposes, for the extent to which the poor can partake in growth. While the underlying taxonomy of general growth opportunities in different parts of Sub-Saharan Africa, which was developed in the previous section, is unlikely to be fundamentally affected, the ability of these growth narratives to deliver poverty reduction and broad inclusion in the growth process definitely is.

In this section, I will try to put some of the key lessons from the extensive micro-level research on rural households and institutions from the last few decades into the broader context of poverty reduction and growth. The key question is: What do we know about the key constraints of the rural poor to participate in or contribute to growth? Much of the recent academic literature on rural issues has explored many of these market failures in factor markets, such as land, labor, credit, and insurance. This typically forms the core of much of the teaching offered in graduate schools in the microeconomics of development.13 However, the key issue for our purposes is to explore how micro-level issues can be put into the bigger picture of growth and poverty reduction.

Following this I will consider the question of what would cause some poor people to remain locked in rural poverty, even in those countries that manage to start growth processes that are intensive in terms of labor. Furthermore, for countries that are absolutely dependent on rural development efforts for poverty reduction via growth, such as landlocked, resource-poor countries with "poor" neighbors, the need to unlock rural potential means that we should be especially careful when examining what may cause growth in particular rural areas to lag behind.
I will focus on three instances that illustrate a more general principle and finding in recent theoretical work, one suggested by empirical evidence: that initial poverty and market failures conspire to keep some poor people persistently poor or even in a poverty trap. Specifically, I will focus on three problems linked to market failures which may induce such processes: access to capital (credit market failure); risk (insurance market failure); and spatial externalities (the curse of geography).

Credit Market Failure and Poverty Traps

The most obviously observable market failure is that of credit markets to conform to the assumptions of perfectly competitive markets. Under perfect and complete markets, anyone with a profitable project should be able to get a loan at the current interest rate. If markets were perfect and efficient, no bank would ask for collateral to secure the loan.14 In practice, without collateral, one typically would not get the loan. Collateral requirements can be understood as an important means by which credit markets handle the central problems that bedevil these markets: asymmetric information, such as moral hazard and adverse selection, and enforcement problems. Since imperfect information means that lenders may not be able to know which projects are more risky among many risky projects, or whether borrowers will implement other actions than initially committed to after the loans have been granted, collateral may be asked for to secure the loans. Collateral may also help to enforce the repayment of loans.

Starting from initial asset poverty for some, it is obvious that this may be a market failure that is particularly hurtful for the poor, by excluding them from profitable opportunities. A number of careful, suggestive models,
most notably Eswaran and Kotwal (1986), show the key implications of this: the rich do not just earn more income because they have more assets, but they can also use assets more efficiently. Market failures force some of the poor to be inefficient, and they exacerbate any initial inequality: some of the poor may stay behind.

There is much suggestive evidence that similar processes are common in agricultural settings, where they are often linked to credit market failures. A key prediction of this model is that the marginal return to bringing more land into production by the poor outweighs that of the rich, and that average output per hectare is larger for the poor than for the rich. This negative correlation between cultivated land area and output per hectare is commonly observed in developing countries. Binswanger and others (1995) provided a comprehensive overview of the evidence and looked into different explanations. Land quality heterogeneity is certainly part of the story, but factor market failures, including those related to credit, are likely to be relevant as well.

The model described is effectively a static model, but its potential dynamic implications are intuitively appealing. Starting from some inequality in assets, those with more wealth earn higher returns and plausibly can accumulate at a high rate, while the poor enter into technology or activity portfolios with lower returns and may not be able to start accumulating any wealth. This intuition is at the basis of a number of growth models leading to poverty traps for some and accumulation for others. Banerjee and Newman (1993) showed the adverse impact of asset inequality on growth, linked to credit market failures. When threshold levels of assets are needed to enter into different activities, then entry into profitable activities is closed off for those with limited assets, and they are trapped in poverty, while others can climb the occupational ladder.
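The threshold mechanism can be illustrated with a stylized simulation. This is a sketch under loudly stated assumptions, not the Banerjee and Newman model itself: the accumulation rule, the function names, and every parameter value (returns, wage, subsistence need, savings rate, depreciation, and the asset threshold) are hypothetical and chosen only to make the two regimes visible.

```python
def next_assets(assets, threshold=5.0, low_return=0.05, high_return=0.6,
                wage=1.0, subsistence=0.8, savings_rate=0.2, depreciation=0.1):
    """One period of asset accumulation with an entry threshold and no credit.

    All parameter values are made up for illustration.
    """
    # Without access to credit, the high-return activity is open only to
    # households that already hold at least the threshold level of assets.
    rate = high_return if assets >= threshold else low_return
    income = wage + rate * assets
    # Households save a fixed share of income above subsistence needs.
    saving = savings_rate * max(0.0, income - subsistence)
    return (1.0 - depreciation) * assets + saving


def simulate(initial_assets, periods=50):
    """Iterate the accumulation rule and return assets after `periods` periods."""
    assets = initial_assets
    for _ in range(periods):
        assets = next_assets(assets)
    return assets


for start in (4.0, 6.0):
    print(f"start={start:.1f} -> assets after 50 periods: {simulate(start):.2f}")
```

With these made-up numbers, a household starting with 4 units of assets drifts down toward a low steady state, while one starting with 6 keeps accumulating. A one-off transfer that lifts the first household above the threshold (for instance, calling `simulate(5.5)`) moves it onto the second path.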
A poverty trap is an equilibrium outcome and a situation from which one cannot emerge without outside help, for example via a positive windfall to this group, such as by redistribution or aid, or via a fundamental change in the functioning of markets. Much other work suggests poverty traps and overall efficiency and growth losses, due to poverty and inequality combined with credit market failure, where some people are unable to exploit growth-promoting opportunities for investment (for example Galor and Zeira 1993; Benabou 1996; Aghion and Bolton 1997). This model of credit market failure is also a central part of the narrative underlying the 2006 World Development Report (World Bank 2005a); it also discusses some suggestive evidence of its implications.

If poverty traps induced by the credit market are present in rural areas or, more generally, if there is underinvestment, then growth is limited. Furthermore, if growth picks up but access to profitable opportunities requires some minimal investment (for example being able to invest in the migration of some household members, or sunk costs for a newly profitable activity), then credit market failures may result in the exclusion of some of the poor from the benefits of growth. Proving the existence of credit-induced poverty traps is difficult, but there is definitely evidence in rural Africa of entry costs (in particular activities) and assets (in the presence of limited access to investment capital) leading to poorer households holding less profitable portfolios (Dercon 1998; Barrett and others 2005).

What to do about them is less clear. Credit market interventions have long been a favored intervention, in recent times mainly via microfinance schemes, although other interventions may well be better and more useful for resolving credit market failures (Besley 1994).
It is not altogether clear that the poorest are benefiting most from microfinance (Armendáriz de Aghion and Morduch 2005): more needs to be learned from careful evaluations of specific products that could have the highest impact (Karlan and Goldberg 2006). For Africa, the key question is also whether microcredit is not overrated as a means to fostering inclusion of the rural poor in the economy: often, if applied to rural settings, it is seen as a means of helping to get the rural poor into off-farm business activities, usually with restrictions on the eligible activities, via the support offered by the microfinance institution.

Even though credit market restrictions may exclude them from certain profitable opportunities, it is unlikely to be the case that large-scale poverty reduction is going to be achieved by making more and more people dependent on entrepreneurial activities. In most economies, the transformation toward a higher income and low poverty economy has been achieved by an increase in wage employment, via higher labor demand, with farmers becoming employees in industry and the service sector. It is likely that in the long run, the highest returns are to be obtained from health, education, and skills, not least in African economies that are resource rich or whose potential is in manufacturing exports. Of course, the transition is likely to be helped by allowing some people to take advantage of entrepreneurial opportunities in agriculture and the non-farm sector, but it is unlikely to be a large-scale successful route.

There is a dilemma here as well: as long as growth is not taking off, the poor are likely to be helped more by encouraging entrepreneurial activities, as labor demand is not picking up. In most African contexts, returns to education are convex: low for primary and, possibly, high for higher levels of education (Söderbom and others 2006), so that investing in education under the circumstances of the 1980s and 1990s,
with low wage-labor demand, is unlikely to offer rapid routes out of poverty; though if growth were to pick up in a sustained way, this could change. Failing that, microcredit schemes could offer a solution for many of the poor, but without growth they do not offer real scope for large-scale poverty reduction. In any case, and at best, microcredit schemes with more flexibility, that do not try to tie people to particular entrepreneurial activities but that respond to the general financing needs of families, may be more effective (Karlan and Mullainathan 2007).

Insurance Market Failures and Risk-induced Poverty Traps

Another serious market failure impacting disproportionately on the poor is the lack of insurance and protection of the poor in the face of risk. The existence of complete insurance markets (or, to be technically more accurate, complete state-contingent markets) is another assumption for perfect markets that tends to be violated in practice. Problems with asymmetric information and enforcement issues, not dissimilar to those causing credit market failures, are again typically responsible for the limited spread of insurance mechanisms in developing countries. Even if they wanted to, the poor could not get any insurance for most of the risks they face.

Uninsured risk causes considerable hardship to the poor. Developing countries are still characterized by a high incidence of natural disasters, drought, conflict, and insecurity, as well as economic shocks, such as commodity price shocks and currency shocks. Health problems are widespread, as are pests in agriculture. It is commonplace to view these as "transitory" problems, requiring temporary solutions, such as some form of safety net, after which one should get back to the bigger issues of development.
For the policymaker it also often means that it is just a social issue that should not distract the key (macroeconomic) policymakers from the bigger issue of how to stimulate growth in the economy and alleviate widespread poverty.

However, this is misleading. There is increasing evidence that risk and shocks are a cause of lower growth, which results especially in the lower growth of the incomes of the poor and possibly in poverty traps. Focusing attention on the poor could then again be contributing to both growth and equity: in any case, it could be instrumental in ensuring that the poor can benefit from emerging growth.

Households in developing countries have developed sophisticated mechanisms to cope with risk. Typically, one could consider two types of responses: risk-management strategies and risk-coping strategies. Risk-management strategies involve trying to shape the risks faced by entering into activity portfolios that are more favorable in terms of risks, for example by entering into low-risk activities or diversifying into portfolios of activities with differing risk profiles: growing more drought-resistant crops, entering into petty trading or firewood collection, seasonal migration, and so on. Risk-coping strategies involve activities for coping with consequences of risk in income. Two types are commonly observed: self-insurance using savings, often in the form of cattle or small ruminants, to be sold off when the need arises; and informal mutual support mechanisms, where members of a group or community provide transfers to each other in times of need, typically on a reciprocal basis (Fafchamps 1992).

These strategies are not without cost: income risk-management strategies reduce the variability of income at the cost of a lower mean income, while adjusting asset portfolios to cope with risk typically involves investing in liquid assets with lower returns, rather than in productive illiquid investment.
This affects their long-term income and their ability to move out of poverty. Indeed, there is growing evidence that these strategies imply substantial efficiency loss for the poor, which the rich, typically better protected via insurance, asset, and credit markets, do not have to endure (Dercon 2002). Morduch (1995) documented how more profitable technologies are not adopted because they are too risky in a particular setting in India. The same farmers have been found to hold livestock as a precaution against risk even when more productive investment opportunities exist (Rosenzweig and Wolpin 1993). Rosenzweig and Binswanger (1993) found that the loss in efficiency between the richest and poorest quintiles in their sample from India was more than 25 percent, attributable to portfolio adjustments in assets and activities due to risk exposure. In Ethiopia, Dercon and Christiaensen (2007) have found that modern input adoption is lowered due to the downside risk related to input loans, whose repayment is strictly enforced, even when rains and harvests fail. Over time, this results in substantial efficiency losses, which affect the poor disproportionately.

These risk-management strategies may trap the poor in poverty: to avoid further destitution, they are forced to forgo profitable but risky opportunities, and with it the opportunity to move out of poverty.15 Even so, they cannot fully protect themselves: there is much evidence that although the strategies contribute to less variability in consumption and nutritional levels, they are still not able to cope with some serious, repeated shocks, not least those affecting whole communities, regions, or countries (Morduch 1999; Dercon 2002). These uninsured shocks typically wipe out assets, pushing the households down the asset distribution. They could be pushed below some critical threshold, trapping them into poverty from then on, for example due to the risk strategies they then need to follow to avoid further destitution,
or due to other processes.

There is growing evidence that these processes are an important cause of poverty persistence and possibly permanent traps in developing countries. Jalan and Ravallion (2003) investigated the presence of poverty traps using data from China, and, although they did not find a pure poverty trap, they found that households took several years to recover from a single income shock, and that the recovery was much slower for the poor. There is also related evidence from Africa. Dercon (2004), using panel data from rural Ethiopia, found signs of poverty persistence linked to uninsured shocks, with the impact of rainfall for up to the previous four years affecting current growth rates, and the extent to which households had suffered in the famine of 1984-85 still an explanatory factor in growth rates in the 1990s. Furthermore, it took on average 10 years for livestock holdings, a key form of savings in rural Ethiopia, to recover to the levels seen before the 1984-85 famine.

In a careful study, Elbers and others (2007) used simulation-based econometric methods to calibrate a growth model that explicitly accounts for risk and risk responses applied to panel data from rural Zimbabwe. They found that risk substantially reduces growth, reducing the capital stock (in the steady state) by more than 40 percent. Two-thirds of this loss is due to ex ante strategies by which households try to minimize the impact of risk. Barrett (2005) has found suggestive evidence on poverty traps by looking at the livestock holdings of pastoralists in Kenya.

There is also increasing evidence of the long-term implications of uninsured shocks, focusing on health and education. For example, the permanent impact of drought on children is well documented: lower adult height, poor education outcomes, and therefore lower lifetime earnings.
The impact of drought and war in rural Zimbabwe in the early 1980s on a particularly vulnerable cohort of children, for instance, was estimated at 7-12 percent of lifetime earnings or more (on this and other evidence, see Dercon and Hoddinott 2003).

All this evidence points to the important consequences of lack of insurance and protection in rural settings in developing countries, particularly as they affect the poor. Given that the root cause is again a market failure, exacerbated by poverty, there is a clear case for interventions that potentially both reduce poverty and stimulate efficiency and growth; in any case, they could ensure that the poor can more effectively take part in growth processes. In industrialized countries, not least in Europe, the failings of the insurance markets are largely resolved via some form of universal social insurance and substantial direct means-tested transfers. For developing countries, this is not likely to be cost effective, involving high administrative costs and high informational requirements. To put it simply, the means for such systems are unlikely to be available. Many responses can be considered, such as reducing the risks faced by rural households (for example by preventative health services, or better water management for agriculture), strengthening existing responses (such as investing in a better provision of savings products, or better functioning asset markets, such as that for live animals), and improving forms of insurance and broader social protection in the form of safety nets. While each has its problems and advantages, in recent years a number of particularly interesting initiatives have been taken with respect to insurance, even if their potential benefits are not quite substantiated yet.16

Spatial Effects

Another common cause of market failure is the presence of spatial externalities.
Externalities are said to be present if economic or other interactions create social gains or costs beyond those taken into account by those involved in the interaction. The standard example is environmental damage from production involving pollution not accounted for by the buyers and sellers of the commodity produced. A more general phenomenon in developing countries that can be best understood in terms of externalities involves geographically defined areas that appear to stay behind: poor neighborhoods, or even poor regions or countries. If one looks at the performance of the developing world, it has been striking over recent decades that some countries, largely in Africa, appear to have become increasingly marginalized, with low economic growth, persistent population growth, and, generally, persistent poverty. Less studied but at least as important is that even in countries where growth is high, there appear to be areas that systematically stay behind and do not benefit from the overall economic growth in terms of income growth and poverty reduction. Certain regions in China and India may well fit this bill. Much less documented but no less true, the geographical disparity in growth and poverty-reduction performance within African countries is similarly present.17

Such disparity may well be explained in terms of theories emphasizing agglomeration or location effects, predicting that firms will exploit increasing returns resulting from the presence of externalities to locating in the same geographical areas, implying that firms would locate in clusters (Fujita and others 1999). The corollary is that some less attractive locations may have missed the boat: not only would they not get the required investment, any capital present may well move out to capture the higher returns elsewhere. For those areas that missed the boat, there is a negative externality from the success of other areas.
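The threshold logic behind such location effects can be illustrated with a minimal simulation. This is a stylized sketch and not a model taken from the studies cited here: the function name `simulate_capital`, the threshold, the inflow and outflow rates, and the size of the one-off investment are all hypothetical values chosen only to show the mechanism. An area whose capital stock is at or above the threshold attracts capital; an area below it loses capital to higher returns elsewhere.

```python
def simulate_capital(k0, periods=20, threshold=10.0,
                     inflow_rate=0.06, outflow_rate=0.04,
                     push_period=None, push_amount=0.0):
    """Capital path for one area under a stylized threshold rule.

    All parameter values are hypothetical and purely illustrative.
    """
    k = k0
    path = [k]
    for t in range(1, periods + 1):
        if t == push_period:
            k += push_amount      # one-off external investment in the area
        if k >= threshold:
            k *= 1 + inflow_rate  # area attracts capital and accumulates
        else:
            k *= 1 - outflow_rate  # capital moves out to higher returns elsewhere
        path.append(k)
    return path


# Two areas that start out almost identical, on either side of the threshold.
leading = simulate_capital(k0=10.5)
lagging = simulate_capital(k0=9.5)

# The lagging area with a small and with a large one-off investment in period 5.
small_push = simulate_capital(k0=9.5, push_period=5, push_amount=1.0)
big_push = simulate_capital(k0=9.5, push_period=5, push_amount=4.0)

for name, path in [("leading", leading), ("lagging", lagging),
                   ("small push", small_push), ("big push", big_push)]:
    print(f"{name:>10}: start={path[0]:.1f}  end={path[-1]:.1f}")
```

In this sketch the two areas diverge although they start out nearly alike, a small investment leaves the lagging area below the threshold, and only the larger investment lifts it over. An area that has fallen below the point needed to attract or retain capital therefore stays behind unless something large changes.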
Clearly, this is a form of poverty trap: although initially these areas may not have been very different, once they missed the boat they can only escape by a serious exogenous shock or massive effort. They face a substantial threshold that they need to overcome to attract or retain capital for accumulation.

Other explanations similarly emphasize externalities related to the specific local context, for example low local endowments in terms of public goods, common property resources, and private asset holdings. If growth processes require a certain threshold of local endowments to take off, then poorly endowed areas may well find it hard to escape poverty. Proving this is difficult. There is some evidence for China and a few other countries.18 For Africa, the role and effects of remoteness on growth and poverty are well discussed in Christiaensen and others (2005), with evidence from a number of countries.

Recall that these externalities are again market failures that specifically affect those at the lower end of the asset distribution, this time with assets broadly defined to include public and environmental goods. Given that poverty traps are identified, this empirical evidence would justify "poor areas" programs: massive investment programs in particular deprived areas to build up locational and community capital. However, these empirical studies lack sufficient detail and a clear narrative about how these externalities come about. More evidence would be needed to guide and prioritize the type of interventions that would be most beneficial.

For example, most rural "poor areas" are typically characterized by remoteness, often linked to the lack of roads and communications infrastructure. One of the most common donor-policy responses is to build roads into poor areas. While undoubtedly bringing some benefits to remote communities,
it is not necessarily the case that this is what is needed to unlock the growth potential of an area. In some countries, there is evidence that this may well be an appropriate response.19 Still, historically much road building in developing countries has been in response to local economic growth or at least in recognition of some growth potential (such as cash crops or mining), and it was not the main cause of growth. Alternatives, such as irrigation, health, or educational schemes, may be more important for unlocking their potential. In any case, just doing a little is not going to help: in order for such areas to catch up, substantial levels of investment would be needed to lift them over the threshold. It may well be the case that creating opportunities for migration is a superior policy. This, however, also involves costs, and so may be difficult too. If these thresholds are substantial, then growth opportunities elsewhere may simply bypass many rural poor. But it is nevertheless the prime example of a potential rural poverty trap for which solutions have to be firmly considered in the context of rural-urban and other linkages, and a narrow focus on rural areas may be ineffective.

Conclusions

Contrary to most of the rest of the developing world, poverty levels have remained stagnant in Sub-Saharan Africa in the last few decades, in a context of slow growth. As most of the poor are living in rural areas, it may be concluded that agricultural growth and rural development policies have to be at the core of growth and poverty-reduction policies. In this paper, I have used a "macro-" (intersectoral) and microperspective to discuss some of the key concerns and issues involved in an African context. Using a framework based on Eswaran and Kotwal (1993b), and evidence on growth opportunities in Africa based on Ndulu and others (2008),
I conclude that agricultural growth is likely to be essential for landlocked and resource-poor economies in Africa, even though growth via agriculture is likely to be difficult. In other economies, in particular those with good locations for engagement in manufacturing exports, or resource-rich economies, agriculture is not the crucial constraint. Even if rural development policies are likely to be crucial for allowing the gradual transformation of the economy, the pressure for agricultural growth to be an engine of growth is not present.

This analysis is, however, based on well-functioning factor and product markets, allowing specifically the poor to take advantage of opportunities irrespective of their endowments. Perfect markets in rural settings are not the appropriate assumption. There is considerable evidence that this implies that the poor may remain excluded from profitable opportunities to grow out of poverty, even if growth has picked up in the economy, potentially leading to poverty traps. Rural development policies targeted at the poor, and possibly including the stimulation of agricultural production by poor farmers, may then be part of the effort to make growth more inclusive. The mechanisms by which this exclusion may happen are well understood, and I have discussed credit and insurance market failures, as well as spatial effects in the form of "poor areas." The evidence on the appropriate responses is still only emerging, and clearly more experimentation and research are needed.

Notes

Stefan Dercon is Professor of Development Economics at Oxford University and affiliated to the European Development Network (EUDN), the Bureau of Economic Analysis and Research on Development (BREAD) and the Centre for Economic Policy Research (CEPR); email address: stefan.dercon@economics.ox.ac.uk.
This paper was completed as part of research funded by the UK Department for International Development (DFID) in the context of a research program on improving institutions for pro-poor growth. An earlier version was presented at the World Bank's workshop entitled "Frontiers in Practice: Reducing Poverty through Better Diagnostics" in 2006. DFID and the World Bank are not responsible for the views expressed. I am grateful to Mark Koyama, Andrew Zeitlin, the editor, and three anonymous referees for very helpful comments on an earlier version of this paper.

1. This estimate is based on a poverty line of $1.08 a day in 1993 PPP, applied with a rural-urban correction based on cost-of-living differences between rural and urban areas.
2. As the definition of what constitutes an urban area is not standardized across the world's statistical offices, some caution is needed with these statements.
3. East Asia includes here the Pacific. Data from the World Development Indicators for 1990-2000 suggest 8.5 percent overall growth in East Asia and Pacific and 5.6 percent growth in South Asia, with respectively 3.4 and 3.1 percent growth in agricultural GDP. Growth rates post-2000 are not dissimilar, although overall growth in Sub-Saharan Africa appears to have picked up somewhat.
4. In the analysis of the World Development Report, the authors refer to agriculture-based economies as those with high shares of GDP and labor in agriculture. The geographical composition of this group in their report is such that 82 percent of the Sub-Saharan African rural population is in this group and that the share of Africans in the total population of this group is about 90 percent. In short, their analysis and prescriptions for agriculture-based countries are effectively about Africa.
5. The World Development Report 2008 argues in this respect that rural investment did take place in Korea, but much earlier, in the first part of the 20th century.
6.
Further 'classic' treatments are in Ranis and Fei (1964) and Jorgenson (1961).
7. For an excellent exposition of the Lewis model, see Ray (1998).
8. This assumption concerning perfect factor markets is crucial and will be challenged in the next section, qualifying some of the results in this section. Although offering a rather different model from the one discussed here, Jorgenson (1961) can be seen as a predecessor, as he also brought in neoclassical market clearing assumptions and issues related to food consumption as a way of characterizing his 'dual economy' model. Ranis and Fei (1964) can be credited with offering some of the general equilibrium analysis presented in EW as well, including the impact of trade, although they used a dual economy setting rather than competitive factor markets.
9. This assumption can be questioned. However, as one purpose of this framework is to explain why in some contexts it may not be crucial to focus on agriculture to get growth and poverty reduction, this assumption actually biases our arguments in favor of a focus on agriculture, as surplus labor implies that labor can be taken out of agriculture into the urban sector without any impact on total food supply.
10. The current high agricultural commodity prices across the world offer such an opportunity for agriculture, although the long-run level of incentives offered is harder to predict.
11. Of course, high-value agricultural activities, such as flowers, fruits, or vegetables in Kenya, are also effective means for taking advantage of locational and other advantages. Air transport is a (possibly increasingly) expensive means of transporting exports, making location sufficiently near to ports a continuing necessity for a trade-oriented growth model, which is not a straightforward option for landlocked countries such as Ethiopia.
12.
This type of research is discussed carefully and in detail in Bardhan and Udry (1999) and Mookherjee and Ray (2001).
13. Ray (1998) provides an excellent entry point to this literature.
14. In this discussion, I use the issue of collateral as a heuristic device to show the differential impact on the poor of credit market failures, without arguing that this is necessarily the key failure. Most other failures in credit markets can be shown to result in a specific disadvantage for poorer households. A useful discussion of different market failures in rural credit markets (and what to do about them) is in Besley (1994) and in Ray (1998).
15. See Banerjee (2003) for a formal poverty-trap model building on this idea.
16. For a helpful review of targeted transfers and safety nets, see Ravallion (2003). For a review of a broad set of means of offering "insurance" to the poor, see Dercon (2003) and the contributions therein. Of particular interest are initiatives to offer rainfall insurance, including in Africa, where even initial evaluations are yielding surprising results (Gine and Yang 2007).
17. For a review, see Kanbur and Venables (2005). Lack of convergence between rural and urban areas in a number of wealth indicators across 12 African countries is found in Sahn and Stifel (2003).
18. Jalan and Ravallion (2002) identified geographic poverty traps in rural China during the 1980s, finding that community characteristics affect the income growth performance of otherwise identical individuals, controlling for latent heterogeneity. These results show that in some areas living standards were falling while elsewhere otherwise identical households were enjoying rising living standards, an effect entirely due to externalities from the initial community characteristics.
19. Jalan and Ravallion (2002) presented evidence for China that roads are relevant for growth. Microdata from Ethiopia using a much smaller sample suggest similar effects,
that is, growth effects from levels of infrastructure (Dercon 2004).

References

Aghion, P., and P. Bolton. 1997. "A Trickle-Down Theory of Growth and Development with Debt Overhang." Review of Economic Studies 64(2):151-62.
Allen, R.C. 1999. "Tracking the Agricultural Revolution in England." Economic History Review 52(2):209-35.
Amsden, A.H. 1989. Asia's Next Giant: South Korea and Late Industrialization. New York: Oxford University Press.
Armendariz de Aghion, B., and J. Morduch. 2005. The Economics of Microfinance. Cambridge, Mass.: MIT Press.
Banerjee, A. 2003. "The Two Poverties." In S. Dercon, ed., Insurance against Poverty. Oxford: Oxford University Press.
Banerjee, A., and A. Newman. 1993. "Occupational Choice and the Process of Development." Journal of Political Economy 101(2):274-98.
Bardhan, P., and C. Udry. 1999. Development Microeconomics. Oxford: Oxford University Press.
Barrett, C. 2005. "Rural Poverty Dynamics: Development Policy Implications." Agricultural Economics 32(1):45-60.
Barrett, C.B., M. Bezuneh, D.C. Clay, and T. Reardon. 2005. "Heterogeneous Constraints, Incentives and Income Diversification Strategies in Rural Africa." Quarterly Journal of International Agriculture 44(1):37-60.
Benabou, R. 1996. "Inequality and Growth." In B. Bernanke and J. Rotemberg, eds., National Bureau of Economic Research Macroeconomics Annual. Cambridge: MIT Press, pp. 11-74.
Besley, T. 1994. "How Do Market Failures Justify Intervention in Rural Credit Markets?" World Bank Research Observer 9(1):27-47.
Binswanger, H., K. Deininger, and G. Feder. 1995. "Power, Distortions, Revolt and Reform in Agricultural and Land Relations." In J. Behrman and T.N. Srinivasan, eds., Handbook of Development Economics, vol. 3. Amsterdam: North-Holland.
Christiaensen, L., L. Demery, and S. Paternostro. 2005. "Reforms, Remoteness and Risk in Africa: Understanding Inequality and Poverty During the 1990s." In R. Kanbur and A.J.
Venables, eds., Spatial Inequality and Development. Oxford: Oxford University Press.
Clark, G. 2002. "The Agricultural Revolution and the Industrial Revolution, England 1500-1912." Mimeo, June. U.C. Davis Department of Economics.
Collier, P. 2007. The Bottom Billion: Why the Poorest Countries are Failing and What Can Be Done About It. New York: Oxford University Press.
Collier, P., and A.J. Venables. 2007. "Rethinking Trade Preferences: How Africa Can Diversify its Exports." The World Economy 30(8):1326-45.
Crafts, N. 1985. British Economic Growth During the Industrial Revolution. Oxford: Clarendon Press.
Datt, G., and M. Ravallion. 1998. "Why Have Some Indian States Done Better Than Others at Reducing Rural Poverty?" Economica 65(257):17-38.
___. 2002. "Is India's Economic Growth Leaving the Poor Behind?" Journal of Economic Perspectives 16(3):89-108.
Dercon, S. 1998. "Wealth, Risk and Activity Choice: Cattle in Western Tanzania." Journal of Development Economics 55(1):1-42.
___. 2002. "Income Risk, Coping Strategies and Safety Nets." World Bank Research Observer 17:141-66.
___. 2003. Insurance against Poverty. Oxford: Oxford University Press and WIDER.
___. 2004. "Growth and Shocks: Evidence from Rural Ethiopia." Journal of Development Economics 74(2):309-29.
Dercon, S., and L. Christiaensen. 2007. "Consumption Risk, Technology Adoption and Poverty Traps: Evidence from Ethiopia." World Bank Policy Research Working Paper 4257.
Dercon, S., and J. Hoddinott. 2003. "Health, Shocks and Poverty Persistence." In S. Dercon, ed., Insurance against Poverty. Oxford: Oxford University Press.
Dorosh, P., and S. Haggblade. 2003. "Growth Linkages, Price Effects and Income Distribution in Sub-Saharan Africa." Journal of African Economies 12(2):207-35.
Elbers, C., J.W. Gunning, and B. Kinsey. 2007. "Growth and Risk: Methodology and Micro Evidence." The World Bank Economic Review 21(1):1-20.
Eswaran, M., and A.
Kotwal. 1986. "Access to Capital and Agrarian Production Organisation." Economic Journal 96:482-98.
___. 1993a. "Export Led Development: Primary vs. Industrial Exports." Journal of Development Economics 41, July:163-72.
___. 1993b. "A Theory of Real Wage Growth in LDCs." Journal of Development Economics 42, December:243-69.
___. 1994. Why Poverty Persists in India. New Delhi: Oxford University Press.
___. 2002. "The Role of Service Sector in the Process of Industrialization." Journal of Development Economics 68:401-20.
Fafchamps, M. 1992. "Solidarity Networks in Preindustrial Societies: Rational Peasants with a Moral Economy." Economic Development and Cultural Change 41 (October):147-74.
Foster, A., and M.D. Rosenzweig. 2004. "Agricultural Productivity Growth, Rural Economic Diversity, and Economic Reforms: India, 1970-2000." Economic Development and Cultural Change 52(3):509-42.
Fujita, M., P. Krugman, and A.J. Venables. 1999. The Spatial Economy: Cities, Regions, and International Trade. Cambridge, Mass.: MIT Press.
Galor, O., and J. Zeira. 1993. "Income Distribution and Macroeconomics." Review of Economic Studies 60:35-52.
Grantham, G. 1989. "Agricultural Supply During the Industrial Revolution: French Evidence and European Implications." The Journal of Economic History 49(1):43-72.
Gine, X., and D. Yang. 2007. "Insurance, Credit, and Technology Adoption: Field Experimental Evidence from Malawi." World Bank Policy Research Working Paper 4425.
Jalan, J., and M. Ravallion. 2002. "Geographic Poverty Traps? A Micro Model of Consumption Growth in Rural China." Journal of Applied Econometrics 17:329-46.
___. 2003. "Household Income Dynamics in Rural China." In S. Dercon, ed., Insurance against Poverty. Oxford: Oxford University Press.
Janvry, A. de, E. Sadoulet, and R. Murgai. 2002. "Rural Development and Rural Policy." In B. Gardner and G. Rausser, eds., Handbook of Agricultural Economics, vol. 2A. Amsterdam: North Holland: 1593-658.
Johnston, B.F., and J.W.
Mellor. 1961. "The Role of Agriculture in Economic Development." American Economic Review 51(4):566-93.
Jorgenson, D.W. 1961. "The Development of a Dual Economy." Economic Journal 71(282):309-34.
Kanbur, R., and A.J. Venables, eds. 2005. Spatial Inequality and Development. Oxford: Oxford University Press.
Karlan, D., and N. Goldberg. 2006. "The Impact of Microfinance: A Review of Methodological Issues." Mimeo. Poverty Action Lab and Yale University.
Karlan, D., and S. Mullainathan. 2007. "Is Microfinance Too Rigid?" Mimeo. Poverty Action and Yale University.
Lewis, A.W. 1954. "Economic Development with Unlimited Supplies of Labor." Manchester School 139-91.
Mellor, J. 1999. "Faster, More Equitable Growth: The Relation between Growth in Agriculture and Poverty Reduction." Agricultural Policy Development Project Research Report 4. Abt Associates Inc., Cambridge, Mass.
Mookherjee, D., and D. Ray. 2001. Readings in the Theory of Economic Development. Oxford: Blackwell.
Morduch, J. 1995. "Income Smoothing and Consumption Smoothing." Journal of Economic Perspectives 9 (Summer):103-14.
___. 1999. "Between the State and the Market: Can Informal Insurance Patch the Safety Net?" World Bank Research Observer 14(2):187-207.
Ndulu, B.J., S.A. O'Connell, R.H. Bates, P. Collier, and C.C. Soludo, eds. 2008. The Political Economy of Economic Growth in Africa, 1960-2000. Cambridge: Cambridge University Press.
Ranis, G., and J.C.H. Fei. 1964. Development of the Surplus Economy: Theory and Policy. Homewood, Ill.: R.D. Irwin for the Economic Growth Center, Yale University.
Ravallion, M. 2003. "Targeted Transfers in Poor Countries: Revisiting the Trade-Offs and Policy Options." World Bank Policy Research Working Paper 3048.
Ravallion, M., and S. Chen. 2007. "China's (Uneven) Progress Against Poverty." Journal of Development Economics 82(1):1-42.
Ravallion, M., and G. Datt. 1996. "How Important to India's Poor is the Sectoral Composition of Economic Growth?"
World Bank Economic Review 10:1-26.
___. 2002. "Why Has Economic Growth Been More Pro-poor in Some States of India than Others?" Journal of Development Economics 68(2002):381-400.
Ravallion, M., S. Chen, and P. Sangraula. 2007. "New Evidence on the Urbanization of Global Poverty." World Bank Policy Research Paper 4199.
Ray, D. 1998. Development Economics. Princeton: Princeton University Press.
Rosenzweig, M., and H. Binswanger. 1993. "Wealth, Weather Risk and the Composition and Profitability of Agricultural Investments." Economic Journal 103:56-78.
Rosenzweig, M., and K. Wolpin. 1993. "Credit Market Constraints, Consumption Smoothing, and the Accumulation of Durable Production Assets in Low-income Countries: Investment in Bullocks in India." Journal of Political Economy 101(2):223-44.
Sachs, J. 2005. The End of Poverty: Economic Possibilities for Our Time. New York: Penguin Press.
Sahn, D., and D. Stifel. 2003. "Urban-Rural Inequality in Living Standards in Africa." Journal of African Economies 12(1):564-97.
Soderbom, M., F. Teal, A. Wambugu, and G. Kahyarara. 2006. "The Dynamics of Returns to Education in Kenyan and Tanzanian Manufacturing." Oxford Bulletin of Economics and Statistics 68(3):261-88.
Staatz, J., and N.N. Dembele. 2007. "Agriculture for Development in Sub-Saharan Africa." Background paper for the World Development Report, World Bank.
Timmer, P.C. 2002. "Agriculture and Economic Development." In B. Gardner and G. Rausser, eds., Handbook of Agricultural Economics, vol. 2A. Amsterdam: North Holland.
___. 2007. "Agriculture and Economic Growth." In D.A. Clark, ed., The Elgar Companion to Development Studies. Cheltenham: Edward Elgar.
World Bank. 2005a. World Development Report 2006: Equity and Development. Washington D.C.: Oxford University Press for the World Bank.
___. 2005b. World Development Indicators. Washington D.C.: Oxford University Press for the World Bank.
___. 2007. World Development Report 2008: Agriculture for Development.
Washington D.C.: Oxford University Press for the World Bank.

Evaluation in the Practice of Development

Martin Ravallion

Standard methods of impact evaluation often leave significant gaps between what we know about development effectiveness and what we want to know, gaps that stem from distortions in the market for knowledge. The author discusses how evaluations might better address these knowledge gaps and so be more relevant to the needs of practitioners. It is argued that more attention needs to be given to identifying policy-relevant questions (including the case for intervention), that a broader approach should be taken to the problems of internal validity (including heterogeneity and spillover effects), and that the problems of external validity (including scaling up) merit more attention by researchers. JEL codes: H43, O22

Anyone who doubts the potential benefits to development practitioners from evaluation should study China's economic reforms. In 1978, the Communist Party's 11th Congress broke with its ideology-based view of policymaking in favor of a pragmatic approach, which Deng Xiaoping famously dubbed "feeling our way across the river." At its core was the idea that public action should be based on evaluations of experiences with different policies: "the intellectual approach of seeking truth from facts" (Du 2006, p. 2). In looking for facts, a high weight was put on demonstrable success in actual policy experiments on the ground. The first major application was to rural reform. While there had been much dissatisfaction with collectivized farming, there were competing ideas as to what needed to be done. The evidence from local experiments was eventually instrumental in persuading even the old guard of the Party's leadership (many of whom still favored collectivized farming) that household contracts could deliver higher food output. The evidence had to be credible.
A new research group did field work studying the local experiments (though they were certainly not randomized experiments) in using contracts with individual farmers. The evidence might not be conclusive by today's scientific standards, but it helped to convince skeptical policymakers (many still imbued with Maoist ideology) of the merits of scaling up the local initiatives (Luo 2007). The rural reforms that were then implemented nationally helped achieve what was probably the most dramatic reduction in the extent of poverty the world has yet seen.

The Author 2009. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org. doi:10.1093/wbro/lkp002. Advance Access publication March 25, 2009. 24:29-53

Unfortunately we still have a long way to go before we will be able to say that this story from China is typical of development policy making elsewhere. (And China still has much that it could do to enhance the credibility of its own efforts at evidence-based policymaking.) In this paper I argue that we underinvest in rigorous evaluations of development interventions and that the evaluations that are done currently are not as useful as they could be. Distortions in the "market for knowledge" about development effectiveness leave persistent gaps between what we know and what we want to know, and the learning process is often too weak to guide practice reliably. The outcome is almost certainly one of less overall impact on poverty.

I first try to understand how the gaps in our knowledge about development effectiveness come to exist and persist. I then identify a number of things that need to change in current approaches to evaluation if the potential to inform development practice is to be fulfilled. Examples are given from recent research, although a number of these issues remain under-researched.
I hope that the discussion will help to change that.

Why Might We Underinvest in Rigorous Evaluations?

The focus of this paper is the problem of assessing the impact of a development project, where "impact" is measured against explicit counterfactual outcomes (such as in the absence of the project); the essential characteristic of a rigorous evaluation is that it includes a credible strategy for identifying the counterfactual. The topic embraces both ex ante and ex post evaluation (and possibly both for the same project). Ex ante evaluation is a key input to project appraisal. Ex post evaluation can sometimes provide useful insights into how a project might be modified along the way, and is certainly a key input to the accumulation of knowledge about development effectiveness, which guides future policymaking.

There are good reasons why not everything that is done in the name of development gets evaluated. Rigorous evaluations are rarely easy. Practical and logistical difficulties abound. Special-purpose data collection and close supervision are typically required. The analytic and computational demands for valid inferences can also be daunting and require specialized skills.

Knowledge-Market Failures

However, there are also reasons to doubt that the market for knowledge about development effectiveness works well. The outcome of these market failures is almost certainly that we underinvest in rigorous impact evaluations.

Suppliers and demanders of knowledge about development effectiveness do not typically have the same information about the quality of the evaluation, giving an example of what economists call "asymmetric information," which is a well-known source of market failure.1 In the present context, development practitioners cannot easily assess the quality and expected benefits of an impact evaluation in order to weigh them against the costs.
Short-cut methods promise quick results at low cost, though rarely are users well informed of the inferential dangers.2 Since it is often hard for practitioners to know whether research is of good quality or not, there is a real risk that rigorous evaluations are driven out by nonrigorous ones of doubtful veracity.

Another important feature of this market is the degree of control that individual project "managers" (including staff in both aid agencies and governments) have over what gets evaluated and how much is spent on evaluation. This can be thought of as a noncompetitive feature of the market for knowledge about development effectiveness, in that the project manager more or less has the power to block the supply of knowledge. The decision about whether resources should be invested in data and research on a specific project or policy is often made by (or heavily influenced by) the individual practitioners involved, or by political stakeholders whose incentives need not be well-aligned with knowledge demands. The portfolio of evaluations is almost certainly biased towards programs that work well; managers of weak programs try to avoid rigorous evaluation, which threatens to expose the program's weaknesses. Lighter "evaluations" are often easier to manipulate for the purpose of showing seemingly positive results.

Decentralized decision-making about evaluation generates another source of market failure: the benefits from the rigorous evaluation of a development project spill over to other projects, which typically do not share in the cost of doing that evaluation. Development is a learning process, in which future practitioners benefit from current research. The individual project manager will typically not take account of these external benefits when deciding how much to spend on evaluation. This is what economists call an "externality."
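The underinvestment that follows from this externality can be stated in a stylized sketch. The notation is an assumption introduced here for illustration and is not the author's: $e$ is spending on evaluating a project, $B(e)$ the benefit to that project, $S(e)$ the spillover benefit to other projects, and $C(e)$ the cost, with $B$ and $S$ increasing and concave, $C$ increasing and convex, and interior solutions assumed (the block uses the amsmath `align*` environment).

```latex
\begin{align*}
\text{Project manager:}\quad & \max_{e \ge 0}\; B(e) - C(e)
  && \Rightarrow\quad B'(e^{p}) = C'(e^{p}) \\
\text{All projects together:}\quad & \max_{e \ge 0}\; B(e) + S(e) - C(e)
  && \Rightarrow\quad B'(e^{s}) + S'(e^{s}) = C'(e^{s}) \\
\text{Since } S'(e) > 0:\quad & B'(e^{p}) + S'(e^{p}) - C'(e^{p}) = S'(e^{p}) > 0
  && \Rightarrow\quad e^{p} < e^{s}
\end{align*}
```

Under these assumptions the manager stops spending at $e^{p}$, where the combined marginal benefit still exceeds the marginal cost, so less is spent on evaluation than the benefits to all projects would warrant. Raising the manager's own marginal benefit or lowering the marginal cost moves $e^{p}$ toward $e^{s}$.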
An implication of the externalities in the market for knowledge about development effectiveness is that we tend to underinvest in research that can draw useful lessons for other projects and settings besides that of the specific evaluation.

Certain types of evaluations are likely to be more prone to these sources of market failure. It is typically far easier to evaluate an intervention that yields all its likely impacts within 1 year (say) than an intervention that takes many years. It can be no surprise that credible evaluations of the longer term impacts of (for example) infrastructure projects are rare. Similarly, we know very little about the long-term impacts of development projects that do deliver short-term gains; for example, we know much more about the short-term impacts of transfer payments on the current nutritional status of children in recipient families than about the possible gains in their longer term productivity from better nutrition in childhood. So future practitioners are often poorly informed about what works and what does not. There is a "myopia bias" in our knowledge, favoring development projects that yield quick results.

We probably also underinvest in evaluations of types of interventions that tend to have diffused, widespread benefits. Impacts for such interventions are often harder to identify than for cleanly assigned programs with well-defined beneficiaries, since one typically does not have the informational advantage of being able to observe nonparticipants (as the basis for inferring the counterfactual). It may also be hard to fund evaluations for such interventions, since they often lack a well-defined constituency of political support.

The implication of all this is that, without strong institutional support and encouragement, there will probably be too few evaluations, particularly of the long-term impacts of development interventions and of broader sectoral or economy-wide reforms.
And the evaluations that do get done will focus too much on internal validity (whether valid inferences are drawn about the impact of that specific project in its specific setting) relative to external validity (whether valid inferences are drawn for other projects, either as scaled-up versions of that project in the same setting or as similar projects in different settings). The fact that long-term evaluations are so rare (though it is widely agreed that development does not happen rapidly) and that we clearly know too little about external validity suggests that the available support is currently insufficient or misallocated.

Rising Support for Evaluations

Increasingly, evaluations do receive support beyond what is demanded by the immediate practitioners. There has been substantial growth in donor support for impact evaluations in recent years. Donor governments are increasingly being pressed by their citizens to show the impact of development aid, which has generated extra resources for financing impact evaluations. Unfortunately, the resources available are not always used for making rigorous evaluations. And it is not clear that the extra resources are having as much impact as they could on the incentives facing project managers and governments. Donor support needs to focus on increasing marginal private benefits from evaluation or reducing marginal costs.

Nonetheless, there is now a broader awareness of the problems faced when trying to do evaluations, including the age-old problem of identifying "causal" impacts. This has helped make donors less willing to fund weak proposals for evaluations that are unlikely to yield reliable knowledge about development effectiveness.

What does get evaluated, however, is still only a modest fraction of what gets done on the ground in the name of development. That may always be the case, given the costs of evaluation.
But what is more worrying is that this fraction is a decidedly nonrandom one. Typically, a self-selected sample of practitioners approaches the funding sources, often with researchers already in tow. This process is likely to favor projects and policies that are expected by their advocates to have benefits.

All this makes it very important that new efforts by the development community to support impact evaluations of development policies (to address the market failures discussed above) should start from those knowledge gaps, not from a researcher's prior preference for one sort of data or method. That is not always the case. For example, while the recent enthusiasm for Randomized Control Trials (RCTs), also called social experiments (see, for example, Banerjee 2007 and Duflo and Kremer 2005), has generated some interesting new research, it is not based on any clear strategic assessment of how this particular method would fill the knowledge gaps of highest priority. Nor is there any obvious reason why doing more social experiments would help correct for the distortions that generated those knowledge gaps. Randomization is clearly only feasible for a nonrandom subset of policies and settings; for example, it is rarely feasible to randomize the location of infrastructure projects and related programs, which are core activities in almost any poor country's development strategy. And even for the types of programs for which randomization is an option, it will be adopted more readily in some settings than others, given that social experiments raise ethical and political concerns, stemming from the fact that some of those to which a program is randomly assigned will almost certainly not need it, while some in the control group will. A better idea would be to randomize what gets evaluated rigorously and then choose a method appropriate to each sampled intervention, with randomization as one option.
The rest of this article will explore how we might assure that future work on impact evaluation is more relevant to the needs of development practitioners. While better approaches to evaluation will not, on their own, solve all the problems in the market for knowledge discussed above, recognizing those problems is the logical starting point for thinking about what constitutes better evaluation.

How Can We Do Better in Filling Key Knowledge Gaps?

The archetypal formulation of the evaluation problem aims to estimate the average impact on those to which a specific program is assigned (the participants) by attempting to infer the counterfactual from those to which it is not assigned (nonparticipants). While this is an undeniably important and challenging problem, solving it is not sufficient for assuring that evaluation is relevant to development practice.

Questions for Evaluations

Evaluations should not take the intervention as predetermined, but must begin by probing the problem that a policy or project is addressing. Why is the intervention needed? How does it relate to overall development goals, such as poverty reduction? What are the market, or governmental, failures it addresses? What are its distributional goals? What are the trade-offs with alternative (including existing) policies or programs? As Devarajan and others (1997) argue, researchers can often play an important role in addressing these questions. This involves more precise identification of the policy objectives (properly weighing gains across different subgroups of a population and different generations); the relevant constraints, which include resources, information, incentives, and political economy constraints; and the causal links through which the specific intervention yields its expected outcomes.
This role in conceptualizing the case for intervention can be especially important when the capacity for development policymaking is weak or when it is captured by lobby groups advocating narrow sectoral interests. The ex ante evaluative role for research can also be crucial when practitioners have overly strong prior beliefs about what needs to be done. Over time, some practitioners become experts at specific types of intervention, and some may even lobby for those interventions. The key questions about whether the intervention is appropriate in the specific setting may not even get asked.

Evaluators themselves can also become lobbyists for their favorite methods. Too often it is not the question that is driving the evaluation agenda but a preference for certain types of data or certain methods; the question is then found that fits the methodology, not the other way around.

Starting with the question, not the method, often points the evaluator toward types of data and methods outside the domain traditionally favored by his or her own disciplinary background. For example, some of the World Bank's research economists trying to understand persistent poverty and the impacts of antipoverty programs have been drawn to the theories and methods favored in other social sciences, such as anthropology, sociology, and social psychology; see, for example, the collection of papers in Rao and Walton (2004). Good researchers, like good detectives, assemble and interpret diverse forms of evidence in testing empirical claims.

As already noted, rigorous impact evaluations require credible strategies for identifying the counterfactual, taking proper account of the likely sources of bias, such as when outcomes are only compared over time for program participants,
or when participants and nonparticipants are compared at only one date; see Ravallion (2008) for a survey of the (experimental and nonexperimental) methods available for this task. This is all about internal validity, which has been the main focus of researchers working on evaluations. In this discussion I will flag some issues that have received less attention yet matter greatly to the impact of an evaluation.

The choice of counterfactual is one such issue. The classic evaluation focuses on counterfactual outcomes in the absence of the program. This counterfactual may fall well short of addressing the concerns of policymakers. The alternative of doing nothing is rarely of interest to policymakers, who prefer instead to spend the same resources on some other program (possibly a different version of the same program). A specific program may appear to perform well against the option of doing nothing, yet still perform poorly against some feasible alternative. For example, in an impact evaluation of a workfare program in India, Ravallion and Datt (1995) showed that the program substantially reduced poverty among the participants relative to the counterfactual of "no program," but that once the costs of the program were factored in (including the foregone income of workfare participants) the alternative counterfactual of a uniform (untargeted) allocation of the same budget outlay would have had more impact on poverty. Formally, the evaluation problem is essentially no different if some alternative program is the counterfactual; in principle we can repeat the analysis relative to the "do nothing" counterfactual for each possible alternative and compare them. But this is rare in practice.

Nor is it evident that the classic formulation of the impact evaluation problem yields the most relevant impact parameters. For example, there is often an interest in better understanding the horizontal impacts of a program,
that is, the differences in impacts at a given level of counterfactual outcomes, as revealed by the joint distribution of outcomes under treatment and outcomes under the counterfactual. We cannot know this from a standard impact evaluation, which only reveals net counterfactual mean outcomes for those treated. Instead of focusing solely on the net gains to the poor (say), we may ask how many losers there are among the poor, and how many gainers.

Counterfactual analysis of the joint distribution of outcomes over time is useful for understanding impacts on poverty dynamics. This approach is developed in Ravallion and others (1995) for the purpose of measuring the impacts of changes in social spending on the intertemporal joint distribution of income. Instead of only measuring the impact on poverty (the marginal distribution of income), the authors exploit panel data to distinguish impacts on the number of people who escape poverty over time (the "promotion" role of a safety net) from impacts on the number who fall into poverty (the "protection" role). (This is only possible if one can identify how impacts vary with household characteristics; the discussion will return to this issue in discussing impact heterogeneity below.) Ravallion and others apply this approach to an assessment of the impact on poverty transitions of reforms in Hungary's social safety net.

Spillover Effects

A further way in which the classic impact evaluation problem often needs to be adapted to the needs of practitioners concerns its assumption that impacts for direct participants do not spill over to nonparticipants. Only under this assumption can we infer the counterfactual from an appropriate sample of the nonparticipants.
Spillover effects are recognized as a concern in evaluating large public programs for which contamination of the control group can be hard to avoid due to the responses of markets and governments; spillovers are also relevant in drawing lessons for scaling up based on an RCT. For further discussion, see Moffitt (2003, 2006).

An example of spillover effects can be found in the Miguel and Kremer (2004) study of treatments for intestinal worms in children. The authors argue that an evaluation design in which some children are treated and some are retained as controls would seriously underestimate the gains from treatment by ignoring the externalities between treated and "control" children. The design for the authors' own evaluation avoided this problem by using mass treatment at the school level instead of individual treatment (using control schools at sufficient distance from treatment schools).

Spillover effects can also arise from the way markets respond to an intervention. Consider the example of an Employment Guarantee Scheme (EGS) in which the government commits to give work to anyone who wants it at a stipulated wage rate. This was the aim of the famous EGS in the Indian state of Maharashtra, and in 2006 the Government of India implemented a national version of this scheme. The attractions of an EGS as a safety net stem from the fact that access to the program is universal (anyone who wants help can get it) but that all participants must work to obtain benefits, and at a wage rate that is considered low in the specific context. The universality of access means that the scheme can provide effective insurance against risk. The work requirement at a low wage rate is taken by proponents to imply that the scheme will be self-targeted to the income poor.

The EGS is an assigned program in that there are well-defined "participants" and "nonparticipants."
And at first glance it might seem appropriate to collect data on both groups and compare their outcomes, either using random assignment or after cleaning out observable heterogeneity. However, this classic evaluation design could give a severely biased result. The gains from such a program are very likely to spill over into the private labor market. If the employment guarantee is effective then the scheme will establish a firm lower bound to the entire wage distribution, assuming that no able-bodied worker would accept non-EGS work at any wage rate below the EGS wage. So even if one picks a perfect comparison group, one will conclude that the scheme has no impact, since wages will be the same for participants and nonparticipants. But that would entirely miss the impact, which could be large for both groups.

Spillover effects can also arise from the behavior of governments. Chen and others (2009) find evidence of such spillover effects in their evaluation of a World Bank-supported poor-area development program in rural China. When the program selected certain villages to participate, the local government withdrew some of its own spending on development projects in those villages, in favor of nonprogram villages, the same set of villages from which the comparison group was drawn. Ignoring these spillover effects generated a nonnegligible underestimation of the impact of the program. Chen and others show how, under certain assumptions, one can estimate the maximum bias due to the specific type of spillover effects that arises from local government spending responses to external development aid. In the case of the poor-area program in China that Chen and others study, their results suggest that the spending responses of local governments to the external aid entail that the standard "difference-in-difference" method may well capture only two-thirds of the true impact.
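
The way a contaminated comparison group biases a difference-in-difference estimate can be illustrated with a small calculation. The sketch below is only illustrative: the function name and all of the numbers are hypothetical, chosen so that the estimate captures two-thirds of the assumed true impact, and they are not taken from Chen and others (2009) or any other study.

```python
def did_estimate(treat_pre, treat_post, comp_pre, comp_post):
    """Difference-in-difference: the change over time for program villages
    minus the change over time for comparison villages."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Hypothetical mean outcomes in arbitrary units (assumptions for illustration).
baseline = 100.0        # both groups start at the same level
secular_trend = 10.0    # change that would have happened without any program
true_impact = 30.0      # gain in program villages relative to no program
spillover = 10.0        # gain in comparison villages from reallocated local spending

treat_post = baseline + secular_trend + true_impact
comp_post = baseline + secular_trend + spillover   # contaminated comparison group

estimate = did_estimate(baseline, treat_post, baseline, comp_post)
share_captured = estimate / true_impact

print(f"DiD estimate: {estimate:.1f}")
print(f"Share of true impact captured: {share_captured:.2f}")
```

Because the comparison villages also gain from the spending response, their change over time overstates the counterfactual change, and the estimate equals the true impact minus the spillover.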
Heterogeneity

Practitioners should never be happy with an evaluation that assumes common (homogeneous) impact. The impact of an assigned intervention can vary across those receiving it. Even with a constant benefit level, eligibility criteria entail differential costs to participants. For example, the foregone labor earnings incurred by participants in workfare or conditional cash transfer schemes (via the loss of earnings from child labor) will vary according to skills and local labor market conditions.

By recognizing the scope for heterogeneity in impacts and the role of contextual factors, one can make evaluative research more relevant to good policymaking. For example, in the aforementioned evaluation of a poor-area development program in rural China, Chen and others (2009) find low overall impact but considerable heterogeneity, in that some types of households benefited more than others, with the relatively better educated amongst the poor achieving the highest returns to the project's investments. The policy implication is that choosing different beneficiaries would have greatly increased the project's overall impact; indeed, the study estimated that an alternative process of beneficiary selection that better exploited the heterogeneity in impacts could have led to a four-fold increase in the project's overall rate of return. By developing a deeper understanding of such heterogeneity, evaluations can help develop better projects.

Heterogeneity of impacts in terms of observables is readily allowed for by adding interaction effects between the intervention and observables to one's model of outcomes. However, not all sources of heterogeneity are observable, and participants and stakeholders often react to factors unobserved by the researcher, confounding efforts to identify true impacts using standard methods, including experiments; this is what Heckman and others (2006) refer to as "essential heterogeneity."
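
As a minimal sketch of the interaction-effects idea for observable heterogeneity, consider a single binary observable (here, a hypothetical "better educated" indicator) and a randomly assigned intervention. In a regression of the outcome on treatment, the indicator, and their product, the coefficient on treatment equals the mean difference among the less educated, and the interaction coefficient equals the extra mean difference among the better educated. The simulated data, variable names, and effect sizes below are all assumptions made for illustration, not results from any evaluation.

```python
import random
from statistics import mean

random.seed(0)


def simulate(n=4000):
    """Hypothetical data: impact is 1.0 for the less educated, 3.0 for the better educated."""
    rows = []
    for _ in range(n):
        treated = random.random() < 0.5      # random assignment
        educated = random.random() < 0.4     # observable household characteristic
        impact = 1.0 + (2.0 if educated else 0.0)
        outcome = (10.0
                   + (3.0 if educated else 0.0)
                   + (impact if treated else 0.0)
                   + random.gauss(0, 1))
        rows.append((treated, educated, outcome))
    return rows


def mean_difference(rows, educated):
    """Treated-minus-untreated mean outcome within one education group."""
    treated_y = [y for d, e, y in rows if d and e == educated]
    control_y = [y for d, e, y in rows if not d and e == educated]
    return mean(treated_y) - mean(control_y)


rows = simulate()
impact_less_educated = mean_difference(rows, False)   # coefficient on treatment
impact_more_educated = mean_difference(rows, True)
interaction = impact_more_educated - impact_less_educated  # coefficient on the product term

print(f"Impact, less educated: {impact_less_educated:.2f}")
print(f"Impact, better educated: {impact_more_educated:.2f}")
print(f"Interaction effect: {interaction:.2f}")
```

With a continuous observable or several observables one would estimate the interaction terms by regression rather than by subgroup means; the subgroup version is used here only because it needs no external libraries.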
With some extra effort, one can also allow for latent heterogeneity in the impacts of an intervention (using a random coefficients estimator in which the impact estimate contains a stochastic component). Applying this approach to the evaluation data for PROGRESA (a conditional cash transfer program in Mexico), Djebbari and Smith (2008) found that they could convincingly reject the assumption of common (homogeneous) effects made by past evaluations of that program.

When there is such heterogeneity, it can be of interest to policymakers to distinguish marginal impacts (from small program expansions or contractions) from the average impacts that have received the bulk of attention. Following Bjorklund and Moffitt (1987), the marginal treatment effect can be defined as the mean gain to units that are indifferent between participating and not participating. This requires that we model explicitly the choice problem facing participants (Bjorklund and Moffitt 1987; Heckman and Navarro-Lozano 2004). We may also want to estimate the joint distribution of outcomes under treatment and outcomes under the counterfactual, and a method for doing so is outlined in Heckman and others (1997).

External Validity

Arguably the most important thing to learn from any evaluation relates to its lessons for future policies (including reforms to the interventions being evaluated). External validity is highly desirable, but it can be hard to achieve.

We naturally want research findings to have a degree of generalizability, so they can provide useful knowledge to guide practice in other settings. Thus empirical researchers need to focus on why a policy or program has an impact, a question to which I will return. However, too often impact evaluations are a "black box"; under certain assumptions, they reveal average impacts among those who receive a program, but say little or nothing about the economic and social processes leading to that impact.
And only by understanding those processes can we draw valid lessons for scaling up, or for taking the same project to other settings. Research that tests the theories that underlie the rationales for intervention can thus be useful in practice.

When the policy issue is whether to expand a given program at the margin, the classic estimator of mean impact is actually of rather limited interest. For example, we may want to know the marginal impact of a greater duration of exposure to the program. An example can be found in the study by Ravallion and others (2005) of the impacts on workfare participants of leaving the program relative to staying (recognizing that this entails a nonrandom selection process). Another example can be found in the study by Behrman and others (2004) of the impacts on children's cognitive skills and health status of longer exposure to a preschool program in Bolivia. The authors provided an estimate of the marginal impact of longer program duration by comparing the cumulative effects of different durations using a matching estimator. In such cases, selection into the program is not an issue, and we do not even need data on units who never participated.

Relatedly, one must recognize the importance of context, since this can be key to drawing valid lessons for other settings. Relevant contextual factors may include the circumstances of participants; the economic, cultural, and political environment; and the administrative context. Unless we understand how such factors influence the outcomes of an intervention, the evaluation will have weak external validity. The next section returns to this issue.

Given that we can expect in general that any intervention will have heterogeneous impacts (some participants gain more than others), serious concerns can arise about the external validity of RCTs.
The people who are normally attracted to a program, taking account of the expected benefits and costs to them personally, may differ systematically from the random sample of people who were included in the trial.³ The RCT may well have evaluated a very different program to the one that is actually implemented on the basis of that RCT.

External validity concerns about impact evaluations can also arise when certain institutions need to be present to even facilitate the evaluations. For example, when randomized trials are tied to the activities of specific non-governmental organizations (NGOs) as the facilitators, there is a concern that the same intervention at the national scale may have a very different impact in places where the NGO is not present. Making sure that the control group areas also have the NGO can help, but even then we cannot rule out interaction effects between the NGO's activities and the intervention. In other words, the effect of the NGO may not be "additive" but "multiplicative," such that the difference between measured outcomes for the treatment and control groups does not reveal the impact in the absence of the NGO. Furthermore, the very nature of the intervention may change when it is implemented by a government rather than an NGO. This may happen because of unavoidable differences in (among other things) the quality of supervision, the incentives facing service providers, and administrative capacity.

A further external validity concern is that, while partial equilibrium assumptions may be fine for a pilot, general equilibrium effects (sometimes called "feedback" or "macro" effects) can be important when the pilot is scaled up nationally. For example, an estimate of the impact on schooling of a tuition subsidy based on a randomized trial may be deceptive when scaled up, given that the structure of returns to schooling will alter.
Heckman and others (1998) demonstrated that partial equilibrium analysis can greatly overestimate the impact of a tuition subsidy once relative wages adjust, although Lee (2005) found a much smaller difference between the general and partial equilibrium effects of a tuition subsidy in a slightly different model.

A special case of the general problem of external validity is scaling up. There are many things that can change when a pilot program is scaled up: the inputs to the intervention can change, the outcomes can change, and the intervention can change; Moffitt (2006) gave examples in the context of education programs. The realized impacts on scaling up can differ from the trial results (whether randomized or not) because the socio-economic composition of program participation varies with scale. Ravallion (2004) discussed how this can happen in theory and presented the results from a series of country case studies, all of which suggest that the incidence of program benefits becomes more pro-poor with scaling up. Trial results could over- or underestimate impacts on scaling up. Larger projects may be more susceptible to rent seeking or corruption (as Deaton [2006] suggests); alternatively, the political economy may entail that the initial benefits tend to be captured more by the nonpoor (as shown by Lanjouw and Ravallion 1999, using data for India).

Evaluative research should regularly test the assumptions made in operational work. Even field-hardened practitioners do what they do on the basis of some implicit model of how the world works, which rationalizes what they do, and how their development project is expected to have an impact. Existing methods of rapid ex ante impact assessment evidently also rely heavily on the models held by practitioners. Researchers can perform a valuable role in helping to make those models explicit and (where possible) helping to assess their veracity.
A case in point is the questionable assumption, routinely made by both project staff and evaluators, that the donor's money is actually financing what recipients claim it is financing. Research has pointed to a degree of fungibility in development aid, whereby the marginal use of public funds is unlikely to be the specific project that is being evaluated. Yet an assessment of "aid effectiveness" is (presumably) just that: an evaluation of the impact of the aid, not the project per se. These are different evaluation problems.

Assessments of aid effectiveness need to take a broader view of public spending, as advocated by Devarajan and others (1997). How broad it needs to be is unclear. There is some evidence that external aid sticks to its sector (quaintly called a "flypaper effect" in economics); on this see van de Walle and Mu (2007). The existence of fungibility and flypaper effects points to the need for a sectoral approach in efforts to evaluate the impacts of development aid.

What Determines Impact?

The above discussion points to the need to supplement standard evaluations with information that can throw light on the factors influencing measured outcomes. That can be crucial for drawing useful policy lessons, including redesigning a program and scaling up. The relevant factors relate to both the participants (such as understanding program take-up decisions and how the outcomes are influenced by participants' characteristics) and program context (such as understanding how the quantity/quality of service provision affects outcomes and how the role of local institutions influences outcomes). This section elaborates some of the ways that we might learn more about how a program does, or does not, have an impact, so as to better address the issues raised above.
An obvious approach to understanding which factors influence a program's performance is to repeat it across different types of participants and in different contexts. Duflo and Kremer (2005) and Banerjee (2007) have argued that repeated RCTs across varying contexts and scales should be used to decide what works and what does not in development aid. Even putting aside the aforementioned problems encountered in social experiments, the feasibility of doing a sufficient number of trials (sufficient to span the relevant domain of variation found in reality for a given program, as well as across the range of policy options) is far from clear. The number of RCTs needed to test even one large national program could well be prohibitive. It is questionable whether this is a sound strategy for filling the existing gaps in our knowledge about development effectiveness.

Nonetheless, even if one cannot go as far as Banerjee (2007) would like, it can be agreed that evaluation designs should plan for contextual variation. Important clues can often be found in the geographic differences in impacts. These can stem from geographic differences in relevant population characteristics or from deeper location effects, such as agro-climatic differences and differences in local institutions (such as local "social capital" or the effectiveness of local public agencies). An example can be found in the study by Galasso and Ravallion (2005) in which the targeting performance of Bangladesh's Food-for-Education program was assessed across each of 100 villages in Bangladesh, with the results being correlated with the characteristics of those villages. The authors found that the revealed differences in performance were partly explicable in terms of observable village characteristics, such as the extent of intravillage inequality (with more unequal villages being less effective in reaching their poor through the program).
Failure to allow for such location differences has been identified as a serious weakness in past evaluations; see, for example, the comments by Moffitt (2003) on trials of welfare reforms in the United States.

The literature suggests that location is a key dimension of context. An implication is that it is less problematic to scale up from a pilot within the same geographic setting (with a given set of relevant institutions) than to extrapolate the trial to a different setting. In one of the few attempts to test how well evaluation results from one location can be extrapolated to another location, Attanasio and others (2003) divided the seven states of Mexico in which the PROGRESA evaluation was done into two groups. They found that results from one group had poor predictive power for assessing likely impacts in the other group.

Useful clues for understanding impacts can sometimes be found by studying impacts on what can be called "intermediate" or "structural" measures. The typical evaluation design identifies a small number of "final outcome" indicators, and it aims to assess the program's impact on those indicators. Instead of using only final outcome indicators, one may choose to also study impacts on certain intermediate indicators of behavior deemed relevant on theoretical grounds.

For example, the intertemporal behavioral responses of participants in antipoverty programs are of obvious relevance to understanding their impacts. An impact evaluation of a program of compensatory cash transfers to Mexican farmers found that the transfers were partly invested, with second-round effects on future incomes (Sadoulet, de Janvry, and Davis 2001). Similarly, Ravallion and Chen (2005) found that participants in a poor-area development program in China saved a large share of the income gains from the program.
Identifying responses through savings and investment provides a clue to understanding the current impacts on living standards and the possible future welfare gains beyond the project's current life span. Instead of focusing solely on the agreed welfare indicator relevant to the program's goals, one collects and analyzes data on a potentially wide range of intermediate indicators relevant to understanding the processes determining impacts.

This also illustrates a common concern in evaluation studies, given behavioral responses, namely that the study period is rarely much longer than the period of the program's disbursements. However, a share of the impact on people's living standards will usually occur beyond the disbursement period. This does not necessarily mean that credible evaluations will need to track welfare impacts over much longer periods than is typically the case, which would raise concerns about feasibility. But it does suggest that evaluations need to look carefully at impacts on partial intermediate indicators of longer term impacts even when good measures of the welfare objective are available within the project cycle. The choice of such indicators will need to be informed by an understanding of participants' behavioral responses to the program. That understanding will be informed by both theory and data.

In learning from an evaluation, one often needs to draw on information external to the evaluation. Qualitative research (intensive interviews with participants and administrators) can be a useful source of information on the underlying processes determining outcomes; see the discussion on "mixed methods" in Rao and Woolcock (2003). One approach is to use such methods to test the assumptions made by an intervention; this has been called "theory-based evaluation," although that is hardly an ideal term given that identification strategies for mean impacts are often theory based.
Weiss (2001) illustrated this approach in the abstract in the context of evaluating the impacts of community-based antipoverty programs. An example is found in a World Bank evaluation of social funds (SFs), as summarized in Carvalho and White (2004). While the overall aim of an SF is typically to reduce poverty, the study was interested in seeing whether SFs worked as intended by their designers. For example, did local communities participate? Who participated? Was there "capture" of the SF by local elites (as some critics have argued)? Building on Weiss (2001), the evaluation identified a series of key hypothesized links connecting the intervention to outcomes and tested whether each one worked. For example, in one of the country studies, Rao and Ibanez (2005) tested the assumption that an SF works by local communities collectively proposing the subprojects that they want; for an SF in Jamaica, the authors found that the process was often dominated by local elites.

In practice, it is very unlikely that all the relevant assumptions are testable (including alternative assumptions made by different theories that might yield similar impacts). Nor is it clear that the process determining the impact of a program can always be decomposed into a neat series of testable links within a unique causal chain; there may be more complex forms of interaction and simultaneity that do not lend themselves to this type of analysis. For these reasons, theory-based evaluation cannot be considered an alternative to assessing impacts on final outcomes by credible (experimental or nonexperimental) methods, although it can still be a useful complement to such evaluations for better understanding measured impacts.

Project monitoring databases are an important, underutilized source of information for understanding how a program works. Too often, however, the project monitoring data collected and the information system used have negligible evaluative content.
This is not inevitably the case. For example, Ravallion's (2000) method of combining spending maps with poverty maps can allow rapid assessments of the targeting performance of a decentralized antipoverty program. This illustrates how, at modest cost, standard monitoring data can be made more useful for providing information on how the program is working, and in a way that provides sufficiently rapid feedback to a project to allow corrections along the way.

The Proempleo experiment in Argentina provides an example of how information external to the evaluation can carry important insights. Proempleo was a pilot wage subsidy and training program for unemployed workers. The RCT by Galasso and others (2004) randomly assigned vouchers for a wage subsidy across (typically poor) people currently in a workfare program and tracked their subsequent success in getting regular work. A randomized control group located the counterfactual. The results indicated a significant impact of the wage-subsidy voucher on employment. But when cross-checks were made against central administrative data, supplemented by informal interviews with the hiring firms, it was found that there was very low take-up of the wage subsidy by firms. The scheme was highly cost effective: the government saved 5 percent of its workfare wage bill for an outlay on subsidies that represented only 10 percent of that saving. However, the cross-checks against these other data revealed that Proempleo did not work the way its design had intended. The bulk of the gain in employment for participants was not through higher demand for their labor induced by the wage subsidy. Rather the impact arose from supply-side effects: the voucher appeared to have had credential value to workers; it acted like a "letter of introduction" that few people had (and how it was allocated was a secret locally). This could not be revealed by the evaluation, but required supplementary data.
The extra insight obtained about how Proempleo actually worked in the context of its trial setting also carried implications for scaling up, which put emphasis on providing better information for poor workers about how to get a job rather than providing wage subsidies.

Spillover effects also point to the importance of a deeper understanding of how a program operates. Indirect (or "second-round") impacts on nonparticipants are common. A workfare program may lead to higher earnings for nonparticipants; or a road improvement project in one area might improve accessibility elsewhere. Depending on how important these indirect effects are thought to be in the specific application, the "program" may need to be redefined to embrace the spillover effects. Or one might need to combine the type of evaluation discussed here with other tools, such as a model of the labor market, to pick up other benefits.

An extreme form of a spillover effect is an economy-wide program. The classic evaluation tools for assigned programs have little obvious role for economy-wide programs in which no explicit assignment process is evident, or, if it is, the spillover effects are likely to be pervasive. When some countries get the economy-wide program but some do not, cross-country comparative work (such as growth regressions) can reveal impacts. That identification task is often difficult, because there are typically latent factors at country level that simultaneously influence outcomes and whether a country adopts the policy in question. And even when the identification strategy is accepted, carrying the generalized lessons from cross-country regressions to inform policymaking in any one country can be highly problematic.
There are also a number of promising examples of how simulation tools for economy-wide policies, such as Computable General Equilibrium models, can be combined with household-level survey data to assess impacts on poverty and inequality.4 These simulation methods make it far easier to attribute impacts to the policy change, although this advantage comes at the cost of the need to make many more assumptions about how the economy works.

In both assessing impacts and understanding the reasons for those impacts, there is often scope for a "meso" level analysis in which theory is used to inform empirical analysis of what would appear to be the key mechanisms linking an intervention to its outcomes, and this is done in a way that identifies key structural parameters that can be taken as fixed when estimating counterfactual outcomes. This type of approach can provide deeper insights into the factors determining outcomes in ex post evaluations and can also help in simulating the likely impacts of changes in program or policy design ex ante.

Naturally, simulations require many more assumptions about how an economy works.5 As far as possible one would like to see those assumptions anchored to past knowledge built up from rigorous ex post evaluations. For example, by combining a randomized evaluation design with a structural model of education choices and exploiting the randomized design for identification, one can greatly expand the set of policy-relevant questions about the design of a program that a conventional evaluation can answer; examples using the PROGRESA evaluation data can be found in Todd and Wolpin (2002), Attanasio and others (2004), and de Janvry and Sadoulet (2006). This strand of the literature has revealed that a budget-neutral switch of the enrolment subsidy in PROGRESA from primary to secondary school would have delivered a net gain in school attainments, by increasing the proportion of children who continue on to secondary school.
While PROGRESA had an impact on schooling, it could have had greater impact. However, it should be recalled that this type of program has two objectives: increasing schooling (reducing future poverty) and reducing current poverty, through the targeted transfers. To the extent that refocusing the subsidies on secondary schooling would reduce the impact on current income poverty (by increasing the forgone income from children's employment), the case for this change in the program's design would need further analysis.

Many of these observations point to the important role played by theory in understanding why a program may or may not have an impact. However, the theoretical models found in the evaluation literature are not always the most relevant to developing country settings. The models have stemmed mainly from the literature on evaluating training and other programs in developed countries, in which selection is seen largely as a matter of individual choice amongst those eligible. This approach does not sit easily with what we know about many antipoverty programs in developing countries, in which the choices made by politicians and administrators appear to be at least as important to the selection process as the choices made by those eligible to participate. We often need a richer theoretical characterization of the selection problem to assure relevance.

An example of one effort in this direction can be found in the Galasso and Ravallion (2005) model of a decentralized antipoverty program; their model focuses on the public-choice problem facing the government and the local collective action problem facing communities, with individual participation choices treated as a trivial subproblem. Such models can also point to instrumental variables for identifying impacts and studying their heterogeneity.

An example of the use of a more structural approach to assessing an economy-wide reform can be found in Ravallion and van de Walle (2008).
Here the policy being studied was the decollectivization of agriculture in Vietnam and the subsequent efforts to develop a private market in land-use rights. These were huge reforms, affecting the livelihoods of the vast majority of the Vietnamese people. Ravallion and van de Walle developed models to explain how farmland was allocated to individual farmers at the time of decollectivization, how those allocations affected living standards, and how the subsequent reallocations of land amongst farmers (that were permitted by the subsequent market-oriented agrarian reforms) responded to the inefficiencies left by the initial administrative assignment of land at the time of decollectivization. Naturally, many more assumptions need to be made about how the economy works, essentially to make up for the fact that one cannot observe nonparticipants in these reforms as a clue to the counterfactual. Not all of those assumptions are testable. However, the principle of evaluation is the same, namely to infer the impacts of these reforms relative to explicit counterfactuals. For example, Ravallion and van de Walle assessed the welfare impacts of the privatization of land-use rights against both an efficiency counterfactual (the simulated competitive market allocation) and an equity counterfactual (an equal allocation of quality-adjusted land within communes). This type of approach can also throw light on the heterogeneity of the welfare impacts of large reforms; in the Vietnam case, the authors were able to assess both the overall impacts on poverty and identify the presence of both losers and gainers, including among the poor.

Does Published Knowledge Reliably Guide Development Practice?

The benefits from evaluations depend in part on their publication, which is the main way they feed into development knowledge. Development policymaking draws on accumulated knowledge built up in large part from published findings. At the same time, publishing in refereed journals is important to a researcher's credibility and career prospects. Thus publication processes (notably the incentives facing journal editors and reviewers, researchers, and those who fund research) are relevant to our success in achieving development goals.

There are reasons for questioning how well the publication process performs in helping to realize the social benefits from rigorous evaluations. Three issues stand out. First, the cost of completing the publication stage in the cycle of research can be significant, and it is hard to reduce these costs; writing the paper the right way, documenting everything that was done, addressing the concerns of referees and editors, all take time. Practitioners are often unwilling to fund these costs, and they even question the need for publication. Again a large share of the benefits is external, to which individual project staff naturally attach low weight.

Second, received wisdom develops its own inertia through the publication process, with the result that it is often harder to publish a paper that reports unexpected or ambiguous impacts when judged against current theories, past evidence, or both. Reviewers and editors are likely to apply different standards according to whether they believe the results hold on a priori grounds. In the context of evaluating development projects, the prior belief is often that the project will have positive impacts, for that is presumably the main reason why the project was funded in the first place. Then a preference for confirming prior beliefs will tend to bias our knowledge in favor of finding positive impacts. Negative or nonimpacts will not get reported as easily. When there is a history of research on a type of intervention, the results of the early studies will set the prior beliefs against which later work is judged.
An initial bad draw from the true distribution of impacts may then distort knowledge for some time after.

A third source of bias is that the review process in scientific publishing (at least in economics) tends to put greater emphasis on the internal validity of an evaluative research paper than on its external validity. The bulk of the effort goes into establishing that valid inferences are being drawn about causal impacts within the sample of program participants. The authors may offer some concluding (and possibly highly cautious) thoughts on the broader implications for scaling up the program well beyond that sample. However, these claims will rarely be established with comparable rigor to the efforts put into establishing internal validity, and the claims are rarely challenged by reviewers.

These imperfections in the research publication industry undoubtedly have feedback effects on the production of evaluations. Researchers will tend to work harder to obtain positive findings, or at least results consistent with received wisdom, so as to improve their chances of getting their work published. No doubt, extreme biases (in either direction) will be eventually exposed. But this takes time.

Researchers have no shortage of instruments at their disposal to respond to the (often distorted) incentives generated by professional publication processes. Key decisions on what to report, and indeed the topic of the research paper, naturally lie with the individual researcher, who must write the paper and get it published. In the case of impact evaluations of development projects, the survey data (often collected for the purpose of the evaluation) will typically include multiple indicators of "outcomes." If one collects 20 indicators (say) then there is a good chance that at least one of them shows statistically significant impacts of the project even when it had no impact in reality.
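The "good chance" in the 20-indicator example can be made concrete with a small calculation. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: that the indicators are statistically independent, and that each is tested at the conventional 5 percent significance level. The function name is hypothetical.

```python
# Illustrative sketch. Assumptions (not from the article): the outcome
# indicators are independent and each is tested at the 5 percent level.

def prob_any_false_positive(n_indicators: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests comes out
    significant at level alpha when the true impact is zero everywhere."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all n tests avoid one with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_indicators


# With 20 indicators, the chance of at least one spurious "impact":
print(round(prob_any_false_positive(20), 2))  # prints 0.64
```

Under these assumptions the chance is roughly two in three. With positively correlated indicators, as is likely for outcomes measured on the same households, the figure would be lower, though still well above 5 percent.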
A researcher keen to get published might be tempted to report results solely for the significant indicator. (Journal reviewers and editors rarely ask what other data were collected.) The dangers to knowledge generation are plain.

The threat of replication by another researcher can help assure better behavior. But in economics, replication studies tend to have low status and are actually quite rare. Thus, as Rodrik (2009) points out, there will be little or no incentive for researchers to carry out the great many repetitions that would probably be called for in the agenda for the mass RCTs proposed by Banerjee (2007) and Duflo and Kremer (2005), given that professional journals would have little interest in such replications of the same intervention and method in different settings. Nor do researchers have a strong incentive to make their data publicly available for replication purposes. Some professional economics journals have adopted a policy that the datasets used in accepted papers should be made available this way, although enforcement is not uniformly strong.

In choosing how to respond to this environment, the individual researcher faces a trade-off between publishability and relevance. Thankfully, the fact of being policy relevant is not in itself an impediment to publishability in most journals, though any research paper that lacks originality, rigor, or depth will have a hard time getting published. It is by maintaining the highest standards that we assure that relevant research is publishable, as well as being credible when carried to policy dialogues. However, it must be acknowledged that the set of research questions that are most relevant to development policy overlap only partially with the set of questions that are seen to be in vogue by the editors of the professional journals at any given time. The dominance of academia in the respected publishing outlets is understandable,
but it can sometimes make it harder for researchers doing work more relevant to development practitioners, even when that work meets academic standards. Academic research draws its motivation from academic concerns that overlap imperfectly with the issues that matter to development practitioners. Provided that scholarly rigor is maintained, the cost to a researcher's published output of doing policy relevant research might not be high, but it would be naive to think that the cost is zero.

Communication and dissemination of the published findings on development effectiveness can also be deficient. Researchers sometimes lack the skills or personalities needed for effective communication with nontechnical audiences. Having worked very hard to assure that the data and analysis are sound, and so pass muster by accepted scientific criteria, it does not come easily for all researchers to translate the results into just a few key policy messages, which do not seem to do justice to all the work involved. The externality problem can also arise here, whereby social returns from outreach exceed private returns. A research institution will often need to support its researchers with specialized staff who possess strong communication skills.

Conclusions

We underinvest in some of the most important tools for enhancing development effectiveness. Weak incentives facing key decision-makers (stemming from knowledge externalities, asymmetric information, and noncompetitive features of the market for knowledge) entail that too few rigorous impact evaluations of development interventions get done. This problem appears to be particularly severe for evaluations of projects that yield benefits over long periods and for efforts in rigorously understanding the lessons that can be drawn for other projects and settings.
While donor support for evaluation is helping redress these problems, there is still a long way to go; greater support is needed, but existing support could also be made more effective if it were aimed at changing private incentives to evaluate, by either raising the marginal benefits or lowering the marginal costs facing project managers. The process of knowledge generation through evaluations is probably also affected by biases on the publication side, which distort the incentives facing individual researchers in doing evaluations.

None of this is helped by the fact that even the most rigorous methods found in practice often fall well short of delivering credible answers to the questions posed by practitioners. Those questions start at the outset of the project cycle and even embrace the rationale for the intervention. They include understanding why the intervention might have greater impact for some participants, and in some settings, than others. They include the lessons for both the intervention under study and (importantly) future interventions. The classic estimate of the mean impact on those treated is of strictly limited utility for addressing these issues.

Nor is the task helped by the fact that researchers have at times overstated what their favorite method can deliver for practitioners, and that they have often chosen what they evaluate according to whether their favorite method is feasible, rather than whether the question is important to development. Interventions are even being chosen, or designed, to fit certain preferred evaluation methods. At the same time, exaggerated claims are sometimes made by nonresearchers about what can be learnt about development effectiveness in a short time with little or no credible data.
Looking forward, greater effort is needed to develop approaches to evaluation that can throw more useful light on the external validity of findings on specific projects (including implications for scaling up) and that can provide a deeper understanding of what determines why an intervention does, or does not, have an impact. Fungibility and flypaper effects also point to the need for a broader sectoral approach to assessing aid effectiveness. There is still much to do if we want to realize the potential for evaluative research to inform development policy by "seeking truth from facts."

Notes

Martin Ravallion is Director, Development Research Group, World Bank; his email address is Mravallion@worldbank.org. For helpful comments on an earlier version of this article, and related discussions on this topic, the author is grateful to François Bourguignon, Asli Demirgüç-Kunt, Gershon Feder, Jed Friedman, Emanuela Galasso, Markus Goldstein, Bernard Hoekman, Beth King, Danny Leipziger, David McKenzie, Luis Serven, Lyn Squire, Dominique van de Walle, Michael Woolcock, and the journal's reviewers. These are the views of the author and should not be attributed to the World Bank or any affiliated organization.

1. The classic account of this problem is given in Akerlof (1970).
2. For example, OECD (2007) outlines an approach to "ex ante poverty impact assessment" that claims to assess the "poverty outcomes and impacts" of a project in just 2-3 weeks at a cost of $10,000-40,000, which, as the authors point out, is appreciably less than standard impact evaluations. The OECD paper proposes that a consultant fills in a series of tables giving the project's "short-term and long-term outcomes" across a range of (economic and noneconomic) dimensions for each of the various groups of identified "stakeholders," as well as the project's "transmission channels," through induced changes in prices, employment, transfers, and so on.
Many readers (including many practitioners) would not know just how hard it is to make such assessments in a credible way, and the paper offers no guidance to readers on what degree of confidence one can have in the results of such an exercise.
3. This is sometimes called "randomization bias"; see Heckman and Smith (1995). See also the discussion in Moffitt (2004).
4. See, for example, Bourguignon and Ferreira (2003) and Chen and Ravallion (2004).
5. For a useful overview of ex ante methods, see Bourguignon and Ferreira (2003).

References

Akerlof, George. 1970. "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism." Quarterly Journal of Economics 84:488-500.
Attanasio, Orazio, Costas Meghir, and Ana Santiago. 2004. Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate PROGRESA. Working Paper EWP04/04. London: Institute of Fiscal Studies.
Attanasio, Orazio, Costas Meghir, and Miguel Szekely. 2003. Using Randomized Experiments and Structural Models for Scaling Up: Evidence from the PROGRESA Evaluation. Working Paper EWP04/03. London: Institute of Fiscal Studies.
Banerjee, Abhijit. 2007. Making Aid Work. Cambridge, MA: MIT Press.
Behrman, Jere, Yingmei Cheng, and Petra Todd. 2004. "Evaluating Preschool Programs When Length of Exposure to the Program Varies: A Nonparametric Approach." Review of Economics and Statistics 86(1):108-32.
Bjorklund, Anders, and Robert Moffitt. 1987. "The Estimation of Wage Gains and Welfare Gains in Self-Selection." Review of Economics and Statistics 69(1):42-9.
Bourguignon, François, and Francisco Ferreira. 2003. "Ex-ante Evaluation of Policy Reforms Using Behavioural Models." In François F. Bourguignon, and Luiz Pereira da Silva, eds., The Impact of Economic Policies on Poverty and Income Distribution. New York: Oxford University Press.
Carvalho, Soniya, and Howard White. 2004. "Theory-Based Evaluation: The Case of Social Funds." American Journal of Evaluation 25(2):141-60.
Chen, Shaohua, and Martin Ravallion. 2004. "Welfare Impacts of China's Accession to the World Trade Organization." World Bank Economic Review 18(1):29-58.
Chen, Shaohua, Ren Mu, and Martin Ravallion. 2009. "Are there Lasting Impacts of Aid to Poor Areas?" Journal of Public Economics 93(3-4):512-528.
Deaton, Angus. 2006. "Evidence-based Aid Must not Become the Latest in a Long String of Development Fads." Boston Review July. (http://bostonreview.net/BR31.4/deaton.html).
Devarajan, Shantayanan, Lyn Squire, and Sethaput Suthiwart-Narueput. 1997. "Beyond Rate of Return: Reorienting Project Appraisal." World Bank Research Observer 12(1):35-46.
Djebbari, Habiba, and Jeffrey Smith. 2008. "Heterogeneous Program Impacts of PROGRESA." Journal of Econometrics 145(1-2):64-80.
Du, Runsheng. 2006. The Course of China's Rural Reform. Washington, DC: International Food Policy Research Institute.
Duflo, Esther, and Michael Kremer. 2005. "Use of Randomization in the Evaluation of Development Effectiveness." In George Pitman, Osvaldo Feinstein, and Gregory Ingram, eds., Evaluating Development Effectiveness. New Brunswick, NJ: Transaction Publishers.
Galasso, Emanuela, and Martin Ravallion. 2005. "Decentralized Targeting of an Anti-Poverty Program." Journal of Public Economics 89(4):705-27.
Galasso, Emanuela, Martin Ravallion, and Agustin Salvia. 2004. "Assisting the Transition from Workfare to Work: Argentina's Proempleo Experiment." Industrial and Labor Relations Review 57(5):128-42.
Heckman, James, and Salvador Navarro-Lozano. 2004. "Using Matching, Instrumental Variables and Control Functions to Estimate Economic Choice Models." Review of Economics and Statistics 86(1):30-57.
Heckman, James, and Jeffrey Smith. 1995. "Assessing the Case for Social Experiments." Journal of Economic Perspectives 9(2):85-110.
Heckman, James, L. Lochner, and C. Taber. 1998. "General Equilibrium Treatment Effects." American Economic Review Papers and Proceedings 88(2):381-6.
Heckman, James, Jeffrey Smith, and Nancy Clements. 1997. "Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts." Review of Economic Studies 64(4):487-535.
Heckman, James, Sergio Urzua, and Edward Vytlacil. 2006. "Understanding Instrumental Variables in Models with Essential Heterogeneity." Review of Economics and Statistics 88(3):389-432.
de Janvry, Alain, and Elisabeth Sadoulet. 2006. "Making Conditional Cash Transfer Programs More Efficient: Designing for Maximum Effect of the Conditionality." World Bank Economic Review 20(1):1-29.
Lanjouw, Peter, and Martin Ravallion. 1999. "Benefit Incidence and the Timing of Program Capture." World Bank Economic Review 13(2):257-74.
Lee, Donghoon. 2005. "An Estimable Dynamic General Equilibrium Model of Work, Schooling, and Occupational Choice." International Economic Review 46(1):1-34.
Luo, Xiaopeng. 2007. "Collective Learning Capacity and the Choice of Reform Path." Paper presented at the IFPRI/Government of China Conference: Taking Action for the World's Poor and Hungry People, Beijing.
Miguel, Edward, and Michael Kremer. 2004. "Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities." Econometrica 72(1):159-217.
Moffitt, Robert. 2003. The Role of Randomized Field Trials in Social Science Research: A Perspective from Evaluations of Reforms of Social Welfare Programs. Cemmap Working Paper CWP23/02. Department of Economics, University College London.
___. 2004. "The Role of Randomized Field Trials in Social Science Research." American Behavioral Scientist 47(5):506-40.
___. 2006. "Forecasting the Effects of Scaling Up Social Programs: An Economics Perspective." In Barbara Schneider, and Sarah-Kathryn McDonald, eds., Scale-Up in Education: Ideas in Principle. Lanham: Rowman and Littlefield.
OECD (Organisation for Economic Co-operation and Development). 2007. A Practical Guide to Ex Ante Poverty Impact Assessment. Paris: Development Assistance Committee Guidelines and Reference Series, OECD.
Rao, Vijayendra, and Ana Maria Ibanez. 2005. "The Social Impact of Social Funds in Jamaica: A Mixed Methods Analysis of Participation, Targeting and Collective Action in Community Driven Development." Journal of Development Studies 41(5):788-838.
Rao, Vijayendra, and Michael Walton (eds.). 2004. Culture and Public Action. Stanford: Stanford University Press.
Rao, Vijayendra, and Michael Woolcock. 2003. "Integrating Qualitative and Quantitative Approaches in Program Evaluation." In François J. Bourguignon, and Luiz Pereira da Silva, eds., The Impact of Economic Policies on Poverty and Income Distribution: Evaluation Techniques and Tools, 165-90. New York: Oxford University Press.
Ravallion, Martin. 2000. "Monitoring Targeting Performance when Decentralized Allocations to the Poor are Unobserved." World Bank Economic Review 14(2):331-45.
___. 2004. "Who is Protected from Budget Cuts?" Journal of Policy Reform 7(2):109-22.
___. 2008. "Evaluating Anti-Poverty Programs." In Paul Schultz, and John Strauss, eds., Handbook of Development Economics, vol. 4. Amsterdam: North-Holland.
Ravallion, Martin, and Gaurav Datt. 1995. "Is Targeting through a Work Requirement Efficient? Some Evidence for Rural India." In Dominique van de Walle, and Kimberly Nead, eds., Public Spending and the Poor: Theory and Evidence. Baltimore: Johns Hopkins University Press.
Ravallion, Martin, and Shaohua Chen. 2005. "Hidden Impact: Household Saving in Response to a Poor-Area Development Project." Journal of Public Economics 89(11-12):2183-204.
Ravallion, Martin, and Dominique van de Walle. 2008. Land in Transition: Reform and Poverty in Rural Vietnam. Basingstoke: Palgrave Macmillan and World Bank.
Ravallion, Martin, Dominique van de Walle, and Madhur Gautam. 1995. "Testing a Social Safety Net."
Journal of Public Economics 57(2):175-99.
Ravallion, Martin, Emanuela Galasso, Teodoro Lazo, and Ernesto Philipp. 2005. "What Can Ex-Participants Reveal about a Program's Impact?" Journal of Human Resources 40(1):208-30.
The World Bank Research Observer, vol. 24, no. 1 (February 2009)
Rodrik, Dani. 2009. "The New Development Economics: We Shall Experiment, but How Shall We Learn?" In Jessica Cohen and William Easterly, eds., What Works in Development? Thinking Big and Thinking Small. Washington: Brookings Institution Press.
Sadoulet, Elisabeth, Alain de Janvry, and Benjamin Davis. 2001. "Cash Transfer Programs with Income Multipliers: PROCAMPO in Mexico." World Development 29(6):1043-56.
Todd, Petra, and Kenneth Wolpin. 2002. Using a Social Experiment to Validate a Dynamic Behavioral Model of Child Schooling and Fertility: Assessing the Impact of a School Subsidy Program in Mexico. Penn Institute for Economic Research Working Paper 03-022. Department of Economics, University of Pennsylvania.
Van de Walle, Dominique, and Ren Mu. 2007. "Fungibility and the Flypaper Effect of Project Aid: Micro-Evidence for Vietnam." Journal of Development Economics 84(2):667-85.
Weiss, Carol. 2001. "Theory-Based Evaluation: Theories of Change for Poverty Reduction Programs." In Osvaldo Feinstein and Robert Picciotto, eds., Evaluation and Poverty Reduction. New Brunswick, NJ: Transaction Publications.

Timing and Duration of Exposure in Evaluations of Social Programs

Elizabeth M. King, Jere R. Behrman

Impact evaluations aim to measure the outcomes that can be attributed to a specific policy or intervention. While there have been excellent reviews of the different methods for estimating impact, insufficient attention has been paid to questions related to timing: How long after a program has begun should it be evaluated? For how long should treatment groups be exposed to a program before they benefit from it? Are there time patterns in a program's impact?
This paper examines the evaluation issues related to timing, and discusses the sources of variation in the duration of exposure within programs and their implications for impact estimates. It reviews the evidence from careful evaluations of programs (with a focus on developing countries) on the ways that duration affects impacts.

© The Author 2009. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org. doi:10.1093/wbro/lkn009. Advance Access publication February 23, 2009. 24:55-82

A critical risk that faces all development aid is that it will not pay off as expected, or that it will not be perceived as effective, in reaching development targets. Despite the billions of dollars spent on improving health, nutrition, learning, and household welfare, we know surprisingly little about the impact of many social programs in developing countries. One reason for this is that governments and the development community tend to expand programs quickly even in the absence of credible evidence, which reflects an extreme impatience towards adequately piloting and assessing new programs first. This impatience is understandable given the urgency of the problems being addressed, but it can result in costly but avoidable mistakes and failures; it can also result in really promising new programs being terminated too soon when a rapid assessment shows negative or no impact.

However, recent promises of substantially more aid from rich countries and large private foundations have intensified interest in assessing aid effectiveness. This interest is reflected in a call for more evaluations of the impact of donor-funded programs in order to understand what type of intervention works and what doesn't.¹ Researchers are responding enthusiastically to this call. There have been important developments in evaluation methods as they apply to social programs, especially on the question of how best to identify a group with which to compare intended program beneficiaries, that is, a group of people who would have had the same outcomes as the program group without the program.²

The timing question in evaluations, however, is arguably as important but relatively understudied. This question has many dimensions. For how long after a program has been launched should one wait before evaluating it? How long should treatment groups be exposed to a program before they can be expected to benefit from it, either partially or fully? How should one take account of the heterogeneity in impact that is related to the duration of exposure? This timing issue is relevant for all evaluations, but particularly so for the evaluation of social programs that require changes in the behaviors of both service providers and service users in order to bring about measurable outcomes. If one evaluates too early, there is a risk of finding only partial or no impact; too late, and there is a risk that the program might lose donor and public support or that a badly designed program might be expanded. Figure 1 illustrates this point by showing that the true impact of a program may not be immediate or constant over time, for reasons that we discuss in this paper. Comparing two hypothetical programs whose impact differs over time, we see that an evaluation undertaken at time t1 indicates that the case in the bottom panel has a higher impact than the case in the top panel, while an evaluation at time t3 suggests the opposite result.

[Figure 1. The Timing of Evaluations Can Affect Impact Estimates. Horizontal axis: time unit, with evaluation points t1, t2, and t3.]

This paper discusses key issues related to the timing of programs and the time path of their impact, and how these have been addressed in evaluations.³ Many evaluations treat interventions as if they were instantaneous, predictable changes in conditions and equal across treatment groups. Many evaluations also implicitly assume that the effect on individuals is dichotomous (that is, that individuals are either exposed or not), as might be the case in a one-shot vaccination program that provides permanent immunization. There is no consideration of the possibility that the effects vary according to differences in program exposure.⁴ Whether the treatment involves immunization or a more process-oriented program such as community organization, the unstated assumptions are often that the treatment occurs at a specified inception date, and that it is implemented completely and in precisely the same way across treatment groups.

There are several reasons why implementation is neither immediate nor perfect, why the duration of exposure to a treatment differs not only across program areas but also across ultimate beneficiaries, and why varying lengths of exposure might lead to different estimates of program impact. This paper discusses three broad sources of variation in duration of exposure, and reviews the literature related to those sources (see Appendix Table A-1 for a list of the studies reviewed). One source pertains to organizational factors that affect the leads and lags in program implementation, and to timing issues related to program design and the objectives of an evaluation. A second source refers to spillover effects, including variation that arises from the learning and adoption by beneficiaries and possible contamination of the control groups. Spillover effects are external (to the program) sources of variation in the treatment; while these may pertain more to compliance than timing, they can appear and intensify with time, and so affect estimates of program impact. A third source pertains to heterogeneous responses to treatment. Although there can be different sources of heterogeneity in impact,
the focus here is on those associated with age or cohort, especially as these cohort effects interact with how long a program has been running.

Organizational Factors and Variation in Program Exposure

Program Design and the Timing of Evaluations

How long one should wait to evaluate a program depends on the nature of the intervention itself and the purpose of the evaluation. For example, in the case of HIV/AIDS or tuberculosis treatment programs, adherence to the treatment regime over a period of time is necessary for the drugs to be effective. While drug effectiveness in treating the disease is likely to be the outcome of interest, an evaluation of the program might also consider adherence rates as an intermediate outcome of the program, and so the evaluation need not take place only at the end of the program but during the implementation itself. In the case of worker training programs, workers must first enroll for the training, and then some time passes during which the training occurs. If the training program has a specific duration, the evaluation should take place after the completion of the training program.

However, timing may not be so easy to pin down if the timing of the intervention itself is the product of a stochastic process. For example, a market downturn may cause workers to be unemployed, triggering their eligibility for worker training, or a market upturn may cause trainees to leave the program to start a job, as Ravallion and others (2005) observe in Argentina's Trabajar workfare program. In cases where the timing of entry into (or exit from) a program itself differs across potential beneficiaries, the outcomes of interest depend on an individual selection process and on the passage of time. An evaluation of these programs should consider selection bias. Randomized evaluations of trials with well-defined start and end dates do not address this issue.

In fact the timing of a program may be used for identification purposes.
For example, some programs are implemented in phases. If the phasing is applied randomly, the random variation in duration can be used for identification purposes in estimating program impact (Rosenzweig and Wolpin 1986 is a seminal article on this point). One instance is Mexico's PROGRESA (Programa de Educación, Salud y Alimentación), which was targeted at the poorest rural communities when it began. Using administrative and census data on measures of poverty, the program identified the potential beneficiaries. Of the 506 communities chosen for the evaluation sample, about two-thirds were randomly selected to receive the program activities during the first two years of the program, starting in mid-1998, while the remaining one-third received the program in the third year, starting in the fall of 2000. The group that received the intervention later has been used as a control group in evaluations of PROGRESA (see, for example, Schultz 2004; Behrman, Sengupta, and Todd 2005).

One way to regard duration effects is that, given constant dosage or intensity of a treatment, lengthening duration of exposure is akin to increasing intensity, and thus the likelihood of greater impact. Two cases show that impact is likely to be underestimated if the evaluation coverage is too short. First, skill development programs are an obvious example of the importance of the duration of program exposure: beneficiaries who attend only part of a training course are less likely to benefit from the course and attain the program goals than those who complete the course. In evaluating the impact of a training course attended by students, Rouse and Krueger (2004) distinguish between students who completed the computer instruction offered through the Fast ForWord program and those who did not. The authors define completion as a function of the amount of training attended and the actual progress of students toward the next stage of the program, as reflected in the percentage of exercises at the current level mastered at a prespecified level of proficiency.⁵ The authors find that, among students who received more comprehensive treatment (as reflected by the total number of completed days of training and the level of achievement of the completion criteria), performance improved more quickly on one of the reading tests (but not all) that the authors use.

Banerjee and others (2007) evaluate two randomly assigned programs in urban India: a remedial training program that hired young women to teach children with low literacy and numeracy skills, and a computer-assisted learning program. Illustrating the point that a longer duration of exposure intensifies treatment, the remedial program raised average test scores by 0.14 of a standard deviation in the first year and 0.28 of a standard deviation in the second year of the program, while computer-assisted learning increased math scores by 0.35 of a standard deviation in the first year and 0.47 of a standard deviation in the second year. The authors interpret the larger estimate in the second year as an indication that the first year laid the foundation for the program to help the children benefit from its second year.

Lags in Implementation

One assumption that impact evaluations often make is that, once a program starts, its implementation occurs at a specific and knowable time that is usually determined at a central program office. Program documents, such as World Bank project loan documents, typically contain official project launch dates, but these dates often differ from the date of actual implementation in a project area. When a program actually begins depends on supply- and demand-related realities in the field. For example,
a program requiring material inputs (such as textbooks or medicines) relies on the arrival of those inputs in the program areas; the timing of the procurement of the inputs by the central program office may not indicate accurately when those inputs arrive at their intended destinations.⁶ In a large early childhood development program in the Philippines, administrative data indicate that the timing of the implementation differed substantially across program areas: because of lags in central procurement, three years after project launch not all providers in the program areas had received the required training (Armecin and others 2006). Besides supply lags, snags in information flows and project finances can also delay implementation. In conditional cash transfer programs in Mexico and Ecuador, delays in providing the information about intended household beneficiaries prevented program operators in some sites from making punctual transfers to households (Rawlings and Rubio 2005; Schady and Araujo 2008).⁷ In Argentina poor municipalities found it more difficult to raise the cofinancing required for the subprojects of the country's Trabajar program, which weakened the program's targeting performance (Ravallion 2002).

It is possible to address the problem of implementation lags in part if careful and complete administrative data on timing are available for the program; cross-referencing such data with information from public officials or community leaders in treatment areas could reveal the institutional reasons for variation in implementation. For example, if there is an average gap of one year between program launch and actual implementation, then it is reasonable for the evaluation to make an allowance of one year after program launch before estimating program impact.⁸ However, reliable information on dates is often not readily available, so studies have tended to allot an arbitrary grace period to account for lags.
Assuming a constant allowance for delays, moreover, may not be an adequate solution if there is wide variation in the timing of implementation across treatment areas. This is likely to be the case if the program involves a large number of geographical regions or a large number of components and actors. In programs that cover several states or provinces, region or state fixed effects might control for duration differences if the differences are homogeneous within a region or state. If the delays are not independent of unobservable characteristics in the program areas, that may also influence program impact.

An evaluation of Madagascar's SEECALINE program provides an example of how to define area-specific starting dates. It defined the start of the program in each treatment site as the date of the first child-weighing session in that site. The area-specific date takes into account the program's approach of gradual and sequential expansion, and the expected delays between the signing of the contract with the implementing NGO and the point when a treatment site is actually open and operational (Galasso and Yau 2006). This method requires detailed program-monitoring data.

If a program has many components, the solution may hinge on the evaluator's understanding of the technical production function and thus on identifying the elements that must be present for the program to be effective. For example, in a school improvement program that requires additional teacher training and instructional materials, the materials might arrive in schools at about the same time, but the additional teacher training might be achieved only over a period of several months, perhaps because of differences in teacher availability. The evaluator, when considering the timing of the evaluation, must decide whether the effective program start should be defined according to the date when the materials arrive in schools or when all (or most?) of the teachers have completed their training.
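The choice among start-date definitions can be made concrete with a small calculation. The Python sketch below is only an illustration: the two sites, all dates, and the three rule names are hypothetical and are not taken from any program discussed here. It computes months of exposure at a single evaluation date under three candidate definitions of program start (official launch, arrival of materials, and the date by which both materials and training are in place).

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole months elapsed from start to end (0 if end is not after start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


def exposure_months(site: dict, evaluation_date: date, rule: str) -> int:
    """Months of exposure at evaluation_date under a chosen start-date rule."""
    if rule == "official_launch":
        start = site["official_launch"]
    elif rule == "materials_arrive":
        start = site["materials_arrive"]
    elif rule == "all_components":
        # Program counts as started only once every component is in place.
        start = max(site["materials_arrive"], site["training_complete"])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return months_between(start, evaluation_date)


# Hypothetical sites and dates, invented purely for illustration.
sites = {
    "site_A": {
        "official_launch": date(2001, 1, 1),
        "materials_arrive": date(2001, 3, 1),
        "training_complete": date(2001, 5, 1),
    },
    "site_B": {
        "official_launch": date(2001, 1, 1),
        "materials_arrive": date(2001, 3, 15),
        "training_complete": date(2001, 11, 15),
    },
}
evaluation_date = date(2002, 1, 1)

for name, site in sites.items():
    for rule in ("official_launch", "materials_arrive", "all_components"):
        print(name, rule, exposure_months(site, evaluation_date, rule))
```

In this invented example the two sites look identical when exposure is counted from the official launch date, but differ sharply once the start is defined by the completion of all components, which is the kind of variation a single grace period would hide.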
In the Madagascar example above, although the program has several components (for example, growth monitoring, micronutrient supplementation, deworming), the inception date of each site was fixed according to a growth-monitoring activity, that of the first weighing session (Galasso and Yau 2006).

Although the primary objective of evaluations is usually to measure the impact of programs, often they also monitor progress during the course of implementation and thus help to identify problems that need correction. An evaluation of the Bolivia Social Investment Fund illustrates this point clearly (Newman and others 2002). One of the program components was to improve the drinking water supply through investments in small-scale water systems. However, the first laboratory analysis of water quality showed little improvement in program areas. Interviews with local beneficiaries explained why: contrary to plan, people designated to maintain water quality lacked training; inappropriate materials were used for tubes and the water tanks; and the lack of water meters made it difficult to collect fees needed to finance maintenance work. After training was provided in all the program communities, a second analysis of water supply indicated significantly less fecal contamination in the water in those areas.

How are estimates of impact affected by variation in program start and by lags in implementation? Variation in program exposure that is not incorporated into program evaluation is very likely to bias downward the intent-to-treat (ITT) estimates of the program's impact, especially if such impact increases with the exposure of the beneficiaries who are actually treated. But the size of this underestimation, for a given average lag across communities, depends on the nature of the lags.
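The downward bias from unrecorded variation in exposure can be illustrated for one simple case. The Python sketch below is a simulation under stated assumptions, not a reconstruction of any evaluation cited here: true exposure (in months) varies across sites, the exposure recorded by the evaluator equals true exposure plus a mean-zero random error that is independent of true exposure (the textbook classical measurement-error setup), and the outcome rises linearly with true exposure. All parameter values and function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(lag_sd, n=20000, beta=1.0, seed=0):
    """Estimated effect per month of exposure when exposure is mismeasured.

    lag_sd is the standard deviation of the random error in recorded
    exposure (an assumed stand-in for random implementation lags).
    """
    rng = random.Random(seed)
    true_exposure = [rng.gauss(12.0, 3.0) for _ in range(n)]
    recorded_exposure = [x + rng.gauss(0.0, lag_sd) for x in true_exposure]
    outcome = [beta * x + rng.gauss(0.0, 1.0) for x in true_exposure]
    # The evaluator only observes recorded exposure, not true exposure.
    return ols_slope(recorded_exposure, outcome)


for lag_sd in (0.0, 3.0, 6.0):
    print(lag_sd, round(simulate_slope(lag_sd), 2))
```

Under these assumptions the estimated slope shrinks toward zero as the error variance grows, roughly in proportion to var(true exposure) / (var(true exposure) + var(error)), so a wider spread of random lags implies a larger understatement of the effect.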
If the program implementation delays are not random, it matters if they are inversely or directly correlated with unobserved attributes of the treated groups that may positively affect program success. If the implementation lags are directly correlated with unobserved local attributes, then the true ITT effects are underestimated to a larger extent; for example, central administrators may put less effort into starting the programs in areas that have worse unobserved determinants of the outcomes of interest, such as a weaker management capability of local officials to implement a program in these areas. If implementation delays are instead inversely associated with the unobserved local attributes (that is, the central administrators put more effort into starting the program in those same areas), then the ITT effects are underestimated to a lesser extent. If instead the program delays are random, the extent of the underestimation depends on the variance in the implementation lags (still given the same mean lag). All else being equal, greater random variance in the lags results in greater underestimation of the ITT effects. This is because a larger classical random measurement error in a right-side variable biases the estimated coefficient more towards zero.

If the start of the treatment for individual beneficiaries has been identified correctly, implementation delays in themselves do not necessarily affect estimates of treatment-on-the-treated (TOT) effects. In some cases this date of entry can be relatively easy to identify: for example, the dates on which beneficiaries enroll in a program may be established through a household or facility survey or administrative records (for example, school enrollment rosters or clinic logbooks). In other cases, however, the identification may be more difficult: for example, beneficiaries may be unable to distinguish among alternative,
contemporaneous programs or to recall their enrollment dates, or the facility or central program office may not monitor beneficiary program enrollments.⁹ Nonetheless, even if the variation in treatment dates within program areas is handled adequately and enrollment dates are identified fairly accurately at the beneficiary level, nonrandom implementation delays bias TOT estimates. Even a well-specified facility or household survey may be adversely affected by unobservables that may be related to the direction and size of the program impact. The duration of exposure, like program take-up, has to be treated as endogenous. The problem of selection bias motivates the choice of random assignment to estimate treatment effects in social programs.

Learning by Providers

A different implementation lag is associated with the fact that program operators (or providers of services) themselves face a learning curve that depends on time in training and on-the-job experience. This most likely produces some variation in the quality of program implementation that is independent of whether there has been a lag in the procurement of the training. This too is an aspect of program operation that is often not captured in impact evaluations. Although the evaluation of Madagascar's SEECALINE program allotted a grace period of two to four months for the training of service providers, it is likely that much of the learning by providers happened on the job after the formal training.

While the learning process of program operators may delay full program effectiveness, another effect could be working in the opposite direction. The "pioneering effect" means that implementers may exhibit extra dedication, enthusiasm, and effort during the first stages, because the program may represent an innovative endeavor to attain an especially important goal. (A simplistic diagram of this effect is shown in Figure 1, bottom panel.)
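The opposing pulls of provider learning and the pioneering effect can be sketched numerically. The Python sketch below uses two hypothetical impact paths: a logistic curve for impact that builds up as providers learn, and an exponential decay for impact that starts high and fades as early enthusiasm wanes. The functional forms, parameter values, and evaluation times are illustrative assumptions only, not estimates from any study reviewed here.

```python
import math


def rising_impact(t, ceiling=1.0, rate=1.5, midpoint=2.0):
    """Hypothetical impact path that builds up slowly (logistic curve)."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def fading_impact(t, initial=0.8, decay=0.5):
    """Hypothetical impact path that starts high and fades (exponential decay)."""
    return initial * math.exp(-decay * t)


# Assumed evaluation times, in arbitrary time units.
evaluation_times = {"t1": 1.0, "t2": 2.0, "t3": 3.0}

for label, t in evaluation_times.items():
    rising = rising_impact(t)
    fading = fading_impact(t)
    leader = "fading path" if fading > rising else "rising path"
    print(label, round(rising, 3), round(fading, 3), "larger:", leader)
```

With these assumed parameters, the fading path shows the larger impact at t1 and the rising path shows the larger impact at t3, so the apparent ranking of the two programs depends entirely on when the evaluation takes place.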
Jimenez and Sawada (1999) find that newer EDUCO schools in El Salvador had better outcomes than older schools (with school characteristics held constant). They interpret this as evidence of a Hawthorne effect, that is, newer schools were more motivated and willing to undertake reforms than were the older schools. If such a phenomenon exists, it would exert an opposite pull on the estimated impacts and, if sufficiently strong, might offset the learning effect, at least in the early phases of a new program. Over time, however, this extra dedication, enthusiasm, and effort are likely to wane.¹⁰ If there are heterogeneities in this unobserved pioneering effect across program sites that are correlated with observed characteristics (for example, schooling of program staff), the result will be biased estimates of the impact of such characteristics on initial program success.

Spillover Effects

The observable gains from a social program during its entire existence, much less after only a few years of implementation, may be an underestimate of its full potential impact for several reasons that are external to the program design. First, evaluations are typically designed to measure outcomes at the completion of a program, and yet the program might yield additional and unintended outcomes in the longer run. Second, while the assignment of individuals or groups of individuals to a treatment can be defined, program beneficiaries may not actually take up an intervention, or may not do so until after they have learned more about the program. Third, with time, control groups or groups other than the intended beneficiaries might find a way of obtaining the treatment, or they may be affected simply by learning about the existence of the program, possibly because of expectations that the program will be expanded to their area.
If noncompliance is correlated with the outcome of interest, then the difference in the average outcomes between the treatment and the control groups is a biased estimate of the average effect of the intervention. We discuss these three examples below.

Short-Run and Long-Run Outcomes

Programs that invest in cumulative processes, such as a child's physiological growth and accumulation of knowledge, require the passage of time. This implies that longer program exposure would yield greater gains, though probably with diminishing marginal returns. Also, such cumulative processes could lead to outcomes beyond those originally intended, and possibly beyond those of immediate interest to policymakers. Early childhood development (ECD) programs are an excellent example of short-run outcomes that could lead to long-run outcomes beyond those envisioned by the original design. These programs aim to mitigate the multiple risks facing very young children, and to promote their physical and mental development by improving nutritional intake and/or cognitive stimulation. The literature review by Grantham-McGregor and others (2007) identifies studies that use longitudinal data from Brazil, Guatemala, Jamaica, the Philippines, and South Africa that establish causality between preschool cognitive development and subsequent schooling outcomes. The studies suggest that a one standard deviation increase in early cognitive development predicts substantially improved school outcomes in adolescence, as measured by test scores, grades attained, and dropout behavior (for example, 0.71 additional grade by age 18 in Brazil).

Looking beyond childhood, Garces, Thomas, and Currie (2002) find evidence from the U.S. Head Start program that links preschool attendance not only to higher educational attainment but also to higher earnings and better adult social outcomes.
Using longitudinal data from the Panel Study of Income Dynamics and controlling for the participants' disadvantaged background, they conclude that exposure to Head Start for whites is associated in the short run with significantly lower dropout rates, and in the long run with 30 percent greater probability of high school completion, 28 percent higher likelihood of attending college, and higher earnings in their early twenties. For African-Americans, participation in Head Start is associated with a 12-percentage-point lower probability of being booked for or charged with a crime.

Another example of an unintended long-run outcome is provided by an evaluation (Angrist, Bettinger, and Kremer 2004) of Colombia's school voucher program at the secondary level (PACES, or Programa de Ampliación de Cobertura de la Educación Secundaria). This finds longer-run outcomes beyond the original program goal of increasing the secondary school enrollment rate of the poorest youths in urban areas. Using administrative records, the follow-up study finds that the program increased high-school graduation rates of voucher students in Bogota by 5-7 percentage points, which is consistent with the earlier outcome of a 10-percentage-point increase in eighth-grade completion rates (Angrist and others 2002). Correcting for the greater percentage of lottery winners taking college admissions tests, the program increased test scores by two-tenths of a standard deviation in the distribution of potential test scores.

In their evaluation of a rural roads project in Vietnam, Mu and van de Walle (2007) find that, because of developments external to the program, rural road construction and rehabilitation produced larger gains as more time elapsed after project completion. The impacts of roads depend on people using them, so for the benefits of the project to be apparent, more bicycles or motorized vehicles must be made available to rural populations connected by the roads.
But the impacts of the new roads also include other developments that arose more slowly, such as a switch from agriculture to non-agricultural income-earning activities, and an increase in secondary schooling following a rise in primary school completion. These impacts grew at an increasing rate as more months passed, taking two years more on average to emerge.

In the long run, however, impacts can also vanish. Short-term estimates are not likely to be informative about such issues as the extent of diminishing marginal returns to exposure, which would be an important part of the information basis of policies. In Vietnam the impact of the rural roads project on the availability of foods and on employment opportunities for unskilled jobs emerged quite rapidly; it then waned as the control areas caught up with the program areas, an effect we return to below (Mu and van de Walle 2007). In Jamaica a nutritional supplementation-cum-psychological-stimulation program for children under two yielded mixed effects on cognition and education years later (Walker and others 2005). While the interventions benefited child development (even at age 11, stunted children who received stimulation continued to show cognition benefits), small improvements from supplementation noted at age 7 were no longer present at age 11.

In fact, impact can vanish much sooner after a treatment ends. In the example of two randomized trials in India, although impact rose in the second year of the program, one year after the programs had ended, impact dropped. For the remedial program, the gain fell to 0.1 of a standard deviation and was no longer statistically significant; for the computer learning program, the gain dropped to 0.09 of a standard deviation, though it was still significant (Banerjee and others 2007).

Chen, Mu,
and Ravallion (2008) point to how longer-term effects might be invisible to evaluators of the long-term impact of the Southwest China Project, which gave selected poor villages in three provinces funding for a range of infrastructure investments and social services. The authors find only small and statistically insignificant average income gains in the project villages four years after the disbursement period. They attribute this partly to significant displacement effects caused by the government cutting the funding for nonproject activities in the project villages and reallocating resources to the nonproject villages. Because of these displacement effects, the estimated impacts of the project are likely to be underestimated. To estimate an upper bound on the size of this bias, the increase in spending in the comparison villages is assumed to be equal to the displaced spending in the project villages. Under this assumption, the upper bound of the bias could be as high as 50 percent, and it could be even larger if the project actually has positive long-term benefits.

Long-term benefits, however, are often not a powerful incentive to support a program or policy. The impatience of many policymakers with a pilot-evaluate-learn approach to policymaking and action is usually coupled with a high discount rate. This results in little appetite to invest in programs for which benefits are mostly enjoyed in the future. Even aid agencies exhibit this impatience, and yet programs that are expected to have long-run benefits would be just the sort of intervention that development aid agencies should support because local politicians are likely to dismiss them.

Learning and Adoption by Beneficiaries

Programs do not necessarily attain full steady-state effectiveness after implementation commences.
Learning by providers and beneficiaries may take time, a necessary transformation of accountability relationships may not happen immediately, or the behavioral responses of providers and consumers may be slow in becoming apparent.

The success of a new child-immunization or nutrition program depends on parents learning about the program and bringing their children to the providers, and the providers giving treatment. In Mexico's PROGRESA the interventions were randomly assigned at the community level. If program uptake were perfect, a simple comparison between eligible children in the control and treatment localities would have been sufficient to estimate the program TOT effect (Behrman and Hoddinott 2005). However, not all potential beneficiaries sought services: only 61-64 percent of the eligible children aged 4 to 24 months and only half of those aged 2 to 4 years actually received the program's nutritional supplements. The evaluation found no significant ITT effects, but did find that the TOT effects were significant, despite individual and household controls.

In Colombia's secondary-education voucher program too, information played a role at both the local government level and the student level (King, Orazem, and Wohlgemuth 1999). Since the program was cofunded by the central and municipal governments, information given to the municipal governments was critical to securing their collaboration. At the beginning of the program, the central government met with the heads of the departmental governments to announce the program and solicit their participation; in turn the departmental governors invited municipal governments to participate.
Dissemination of information to families was particularly important, because participation was voluntary and the program targeted only certain students (specifically those living in neighborhoods classified among the two lowest socioeconomic strata in the country) on the basis of specific eligibility criteria. Some local governments used newspapers to disseminate information about the program.

In decentralization reforms, the learning and adoption processes are arguably more complex because the decision to participate and the success of implementation depend on many more actors. Even the simplest form of this type of change in governance entails a shift in the accountability relationships between levels of government and between governments and providers, for example, the transfer of the supervision and funding of public hospitals from the national government to a subnational government. In Nicaragua's autonomous schools program in the 1990s, for example, the date a school signed the contract with the government was considered to be the date the school officially became autonomous. In fact, the signing of the contract was merely the first step toward school autonomy: it would have been followed by training activities, the election of the school management council, the development of a school improvement plan, and so on. Hence, the reform's full impact on outcomes would have been felt only after a period of time, and the size of this impact might have increased gradually as the elements of the reform were put in place. However, it is not easy to determine the length of the learning period. Among teachers, school directors, and parents in the so-called autonomous schools, the evaluation finds a lack of agreement on whether their schools had become autonomous and the extent to which this had been achieved (King and Ozler 1998). An in-depth qualitative analysis in a dozen randomly selected schools confirms that school personnel had different interpretations of what had been achieved (Rivarola and Fuller 1999).

Studies of the diffusion of the Green Revolution in Asia in the mid-1960s highlight the role of social learning among beneficiaries. Before adopting the new technology, individuals seem to have learned about it from the experiences of their neighbors (their previous decisions and outcomes). This wait-and-see process accounted for some of the observed lags in the use of high-yielding seed varieties in India at the time (Foster and Rosenzweig 1995; Munshi 2004). In rice villages the proportion of farmers who adopted the new seed varieties rose from 26 percent in the first year following the introduction of the technology to 31 percent in the third year; in wheat villages, the proportion of adopters increased from 29 percent to 49 percent. Farmers who did not have neighbors with comparable attributes (such as farm size or characteristics unobserved in available data such as soil quality) may have had to carry out more of their own experimentation. This would probably have been a more costly form of learning because the farmers bore all the risk of the choices they made (Munshi 2004).

The learning process at work during the Green Revolution is similar to that described by Miguel and Kremer (2003, 2004) about the importance of social networks in the adoption of new health technology, in this case deworming drugs. Survey data on individual social networks of the treatment group in rural Kenya reveal that social links provided nontreatment groups better information about the deworming drugs, and thus led to higher program take-up. Two years after the start of the deworming program, school absenteeism among the treatment group had fallen by about one-quarter on average. There were significant gains in several measures of health status, including reductions in worm infection, child growth stunting,
and anemia, and gains in self-reported health. But children whose parents had more social links to early treatment schools were significantly less likely to take deworming drugs. The authors speculate that this disappointing finding could be due to overly optimistic expectations about the impact of the drugs, or to the fact that the health gains from deworming take time to be realized, while the side effects of the drugs are immediately felt.

Providing information about a program, however, is no guarantee of higher program uptake. One striking example of this is given by a program in Uttar Pradesh, India, which aimed to strengthen community participation in public schools by providing information to village members (Banerjee and others 2008). More information apparently did not lead to higher participation by the Village Education Committee (VEC), by parents, or by teachers. The evaluators attribute this poor result to more deep-seated information blockages: village members were unaware of the roles and responsibilities of the VEC, despite the existence of these committees since 2001, and a large proportion of the VEC members were not even aware of their membership.

The nutritional component in PROGRESA was undersubscribed (because parents lacked information about the program and its benefits), and the community mobilization in Uttar Pradesh was found wanting (because basic information about the roles and powers of village organizations is difficult to convey). Impact evaluations that do not take information diffusion and learning by beneficiaries into account obtain downward-biased ITT and TOT impact estimates. The learning process might be implicit, for example when program information diffuses to potential beneficiaries during the course of implementation, perhaps primarily by word of mouth, or it could be explicit, for example when a program aims an information campaign at potential beneficiaries during a well-defined time period.
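
To make the link between incomplete take-up and the ITT estimate concrete, the following is a minimal Python sketch, not the method of any study cited here. It assumes the standard textbook relation that, when eligible nonparticipants are unaffected and the control group has no access to the program, the ITT effect equals the take-up rate times the TOT effect. The function names, the TOT value of 0.10, and the earlier take-up rates are hypothetical; only the 61 percent figure echoes the PROGRESA take-up range quoted earlier.

```python
def itt_from_tot(tot: float, take_up: float) -> float:
    """ITT effect implied by a TOT effect and a take-up rate.

    Assumption (not from the article): nonparticipants are unaffected and
    controls have no access, so ITT = take_up * TOT.
    """
    if not 0.0 <= take_up <= 1.0:
        raise ValueError("take_up must lie between 0 and 1")
    return tot * take_up


def tot_from_itt(itt: float, take_up: float) -> float:
    """Rescale an ITT estimate by the take-up rate (same assumption as above)."""
    if not 0.0 < take_up <= 1.0:
        raise ValueError("take_up must be positive and at most 1")
    return itt / take_up


# Hypothetical illustration: the effect on the treated stays fixed while
# take-up rises as beneficiaries learn about the program.
HYPOTHETICAL_TOT = 0.10             # assumed effect on those actually treated
take_up_path = [0.20, 0.40, 0.61]   # first two rates are invented
itt_path = [itt_from_tot(HYPOTHETICAL_TOT, rate) for rate in take_up_path]

if __name__ == "__main__":
    for rate, itt in zip(take_up_path, itt_path):
        print(f"take-up {rate:.2f} -> implied ITT {itt:.3f}")
```

Under these assumptions, an evaluation fielded while take-up is still at 20 percent would report an ITT roughly one-third the size of one fielded at 61 percent, even though the effect on those actually treated is unchanged.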
Two points are worth noting about the role of learning in impact evaluation. One is the simple point discussed above that learning takes time. A steady-state level of effective demand among potential beneficiaries (effective in the sense that the beneficiaries actually act to enroll in or use program services) is related to the process of expanding effective demand for a program (note 11). This implies that ITT estimates of program impact are biased downward if the estimates are based on data obtained prior to the attainment of this steady-state effective demand. The extent of the bias depends on whether learning (or the expansion of effective demand) is correlated with unobserved program attributes; specifically, there is less downward bias if this correlation is positive. There may be heterogeneity in this learning process: those programs that have better unobserved management capabilities may promote more rapid learning, while those that have worse management capabilities may face slower learning. Heterogeneity in learning would affect the extent to which the ITT and TOT impacts that are estimated before a program has approached effectiveness are downward-biased, but to a lesser degree if the heterogeneity in learning is random.

The second point is that the learning process itself may be a program component, and thus an outcome of interest in an impact evaluation. How beneficiaries learn and decide to participate is often external to a program, since the typical assumption is that beneficiaries will take up a program if the program exists. In fact, the exposure of beneficiaries to specific communication interventions about a program may be necessary to encourage uptake. There is a large literature, for example, that shows a strong association between exposure to mass media information campaigns and the use of contraceptive methods and family planning services. The aims of such campaigns have been to make potential beneficiaries aware of these services,
and to break down sociocultural resistance to them (Cleland and others 2006). This "social marketing" approach has been used also to stimulate the demand for insecticide-treated mosquito nets for malaria control, and has increased demand, especially among the poorest and most remote households (Rowland and others 2002; Kikumbih and others 2005).

To understand how learning takes place is to begin to understand the "black box" that lies between program design and outcomes, and if this learning were promoted in a random fashion, it could serve as an exogenous instrument for the estimation of program impact.

Peer Effects

The longer a program has been in operation, the more likely it is that specific interventions will spill over to populations beyond the treatment group and thus affect impact estimates. Peer effects increase impact, as in the case of the Head Start example already mentioned. Garces, Thomas, and Currie (2002) find strong spillover effects within the family: higher birth-order children (that is, younger siblings) seem to benefit more than their older siblings, especially among African-Americans, because older siblings are able to teach younger ones. Hence, expanding the definition of impact to include peer effects adds to impact estimates.

Peer effects also arise when specific program messages (either directly from communications interventions or from observing treatment groups) diffuse to control groups and alter their behavior in the same direction as in the treatment group. While this contagion is probably desirable from the point of view of policy makers, it likely depresses impact estimates since differences between the control and treatment groups are diminished.

Another form of leakage that grows with time may not be so harmless from the point of view of program objectives.
For programs that target only specific populations, time allows political pressure to build for the program to be more inclusive and even for nontargeted groups to find ways of obtaining treatment (for example through migration into program sites). Because of the demand-driven nature of the Bolivia Social Investment Fund, for instance, not all communities selected for active promotion applied for and received a SIF-financed education project, but some communities not selected for active promotion nevertheless applied for promotion and obtained an education project (Newman and others 2002).

Heterogeneity of impact

An examination of how program impact varies according to the observable characteristics of the beneficiaries can teach us important lessons on policy and program design. Our focus here is on occasions when duration or timing differences interact with the sources of heterogeneity in impact. One important source of heterogeneity in some programs is cohort membership.

Cohort Effects

The age of beneficiaries may be one reason why duration of exposure to a program matters, and the estimates of ITT and TOT impacts can be affected substantially by whether the timing is targeted toward critical age ranges. Take the case of ECD programs, such as infant feeding and preschool education, which target children for just a few years after birth. This age targeting is based on the evidence that a significant portion of a child's physical and cognitive development occurs at a very young age, and that the returns to improvements in the living or learning conditions of the child are highest at those ages. The epidemiological and nutritional literatures emphasize that children under three years of age are especially vulnerable to malnutrition and neglect (see Engle and others 2007 for a review). Finding that a nutritional supplementation program in Jamaica did not produce long-term benefits for children,
Walker and others (2005) suggest that prolonging the supplementation (or supplementing at an earlier age, during pregnancy, and soon after birth) might have benefited later cognition. It might have been more effective than the attempt to reverse the effects of undernutrition through supplementation at an older age. Applying evaluation methods to drought shocks, Hoddinott and Kinsey (2001) also conclude that in rural Zimbabwe children in the age range of 12 to 24 months are the most vulnerable to such events: these children lose 1.5-2 centimeters of physical growth, while older children 2 to 5 years of age do not seem to experience a slowdown in growth (note 12). In a follow-up study Alderman, Hoddinott, and Kinsey (2006) conclude that the longer the exposure of young children to civil war and drought, the larger the negative effect of these shocks on child height; moreover, older children suffer less than younger children in terms of growth (note 13).

Interaction of Cohort Effects and Duration of Exposure

As discussed above, the impacts of some programs crucially depend on whether or not an intended beneficiary is exposed to an intervention at a particularly critical age range, such as during the first few years of life. Other studies illustrate that the duration of exposure during the critical age range also matters. The evaluation by Frankenberg, Suriastini, and Thomas (2005) of Indonesia's Midwife in the Village program shows just this. The program was intended to expand the availability of health services to mothers and thus improve children's health outcomes. By exploiting the timing of the (nonrandom) introduction of a midwife to a community, the authors distinguish between the children, living in the same community, who were exposed and those who were not exposed to a midwife. The authors group the sample of children into three birth cohorts.
For each group, the extent of exposure to a village midwife during the vulnerable period of early childhood varied as a function of whether the village had a midwife and, if so, when she had arrived. In communities that had a midwife from 1993 onward, children in the younger cohort had been fully exposed to the program when data were collected, whereas children in the middle cohort had been only partially exposed. The authors conclude that partial exposure to the village midwife program conferred no benefits in improved child nutrition, while full exposure from birth yielded an increase in the height-for-age z-score of 0.35 to 0.44 of a standard deviation among children aged 1 to 4 years.

Three other studies test the extent to which ECD program impacts are sensitive to the duration of program exposure and the ages of the children during the program. Behrman, Cheng, and Todd (2004) evaluated the impact of a preschool program in Bolivia, the Proyecto Integral de Desarrollo Infantil. Their analysis explicitly takes into account the dates of program enrollment of individual children. In their comparison of treated and untreated children, they find evidence of positive program impacts on motor skills, psychosocial skills, and language acquisition that are concentrated among children 37 months of age and older at the time of the evaluation. When they disaggregated their results by the duration of program exposure, the effects were most clearly observed among children who had been involved in the program for more than a year.

Like the Bolivia evaluation, the evaluation of the early childhood development program in the Philippines mentioned above finds that the program impacts vary according to the duration of exposure of children, although this variation is not as dramatic as the variation associated with children's ages (Armecin and others 2006).
Administrative delays and the different ages of children at the start of the program resulted in the length of exposure of eligible children varying from 0 to 30 months, with a mean duration of 14 months and a substantial standard deviation of 6 months. Duration of exposure varied widely, even when a child's age was controlled for. The study finds that, for motor and language development, two- and three-year-old children exposed to the program had z-scores 0.5 to 1.8 standard deviations higher, depending on length of exposure, than children in the control areas, and that these gains were much lower among older children.

Gertler (2004) also estimates how duration of exposure to health interventions in Mexico's PROGRESA affected the probability of child illness, using two models: one assumes that program impact is independent of duration, and the other allows impact to vary according to the length of exposure. The interventions required that children under 2 years be immunized, visit nutrition monitoring clinics, and obtain nutritional supplements, and that their parents receive training on nutrition, health, and hygiene; children between 2 and 5 years of age were expected to have been immunized already, but were to obtain the other services. Gertler finds no program impact after a mere 6 months of program exposure for children under 3 years of age, but with 24 months of program exposure the illness rate among the treatment group was about 40 percent lower than the rate among the control group, a difference that is significant at the 1 percent level.

The interaction of age effects and the duration of exposure has been examined also by Pitt, Rosenzweig, and Gibbons (1993) and by Duflo (2001) in Indonesia and by Chin (2005) in India in their evaluations of schooling programs.
These studies use information on the region and year of birth of children, combined with administrative data on the year and placement of programs, to measure duration of program exposure. Duflo (2001), for example, estimates the impact of a massive school construction program on subsequent schooling attainment and on the wages of the birth cohorts affected by the program in Indonesia. From 1973 to 1978 more than 61,000 primary schools were built throughout the country, and the enrollment rate among children aged 12 rose from 69 percent to 83 percent. By linking district-level data on the number of new schools by year and matching these data with intercensal survey data on men born between 1950 and 1972, Duflo defines how long an individual was exposed to the program. The impact estimates indicate that each new school per 1,000 children increased years of education by 0.12-0.19 percent among the first cohort fully exposed to the program.

Chin (2005) uses a similar approach in estimating the impact of India's Operation Blackboard. Taking grades 1-5 as the primary school grades, ages 6-10 as the corresponding primary school ages, and 1988 as the first year that schools would have received program resources, Chin supposes that only students born in 1978 or later would have been of primary school age for at least one year in the program regime, and therefore were potentially exposed to the program for most of their schooling. The evaluation compares two birth cohorts: a younger cohort born between 1978 and 1983, and therefore potentially exposed to the program, and an older cohort. The impact estimates suggest that accounting for duration somewhat lowers the impact as measured, but it remains statistically significant, though only for girls.

Conclusions

This paper has focused on the dimensions of timing and duration of exposure that relate to program or policy implementation.
Impact evaluations of social programs or policies typically ignore these dimensions; they assume that interventions occur at a specified date and produce intended or predictable changes in conditions among the beneficiary groups. This is perhaps a reasonable assumption when the intervention itself occurs within a very short time period and has an immediate effect, such as some immunization programs, or is completely under the direction and control of the evaluator, as in small pilot programs. In the examples we have cited (India's Green Revolution, Mexico's PROGRESA conditional cash transfer program, Madagascar's child nutrition SEECALINE program, and an early childhood development program in the Philippines, among others), this is far from true. Indeed, initial operational fits and starts in most programs, and a learning process for program operators and beneficiaries, can delay full program effectiveness; also, there are many reasons why these delays are not likely to be the same across program sites.

We have catalogued sources of the variation in the duration of program exposure across treatment areas and beneficiaries, including program design features that have built-in waiting periods, lags in implementation due to administrative or bureaucratic procedures, spillover effects, and the interaction between sources of heterogeneity in impact and duration of exposure. Some evaluations demonstrate that accounting for these variations in length of program exposure alters impact estimates significantly, so ignoring these variations can generate misleading conclusions about an intervention. Appendix Table A-1 indicates that a number of impact evaluation studies do incorporate one or more of these timing and duration effects. The most commonly addressed source of duration effects is cohort affiliation. This is not surprising,
since many interventions, such as education and nutrition programs, are allocated on the basis of age, in terms of both timing of entry into and exit from the program. On the other hand, implementation lags are recognized but often not explicitly addressed.

What can be done to capture timing and the variation in length of program exposure? First, the quality of program data should be improved. Such data could come from administrative records on the design and implementation details of a program, in combination with survey data on program take-up by beneficiaries. Program data on the timing of implementation are likely to be available from program management units, but these data may not be available at the desired level of disaggregation; this might be the district, community, providers, or individual, depending on where the variation in timing is thought to be the greatest. Compiling such data on large programs that decentralize to numerous local offices could be costly. There is obviously a difference in the primary concern of the high-level program manager and of the evaluator. The program manager's concern is the disbursement of project funds and the procurement of major expenditure items, whereas the evaluator's concern would be to ascertain when the funds and inputs reach treatment areas or beneficiaries.

Second, the timing of the evaluation should take into account the time path of program impacts. Figure 1 illustrates that program impact, however measured, can change over time, for various reasons discussed in the paper, so there are risks of not finding significant impact when a program is evaluated too early or too late. The learning process by program operators or by beneficiaries could produce a curve showing increasing impact over time, while a pioneering effect could show a very early steep rise in program impact that is not sustainable.
Figure 1 thus suggests that early rapid assessments to judge the success of a program could be misleading, and also that repeated observations may be necessary to estimate true impact. Several studies that we reviewed measured their outcomes of interest more than once after the start of the treatment, and some compared short-run and long-run effects to examine whether the short-run impact had persisted. Possible changes in impact over time imply that evaluations should not be a once-off activity for any long-lasting program or policy. In fact, as discussed above, examining long-term impacts could point to valuable lessons about the diffusion of good practices over time (Foster and Rosenzweig 1995) or, sadly, how governments can reduce impact by implementing other policies that (perhaps unintentionally) disadvantage the program areas (Chen, Mu, and Ravallion 2008).

Third, the appropriate evaluation method applied should take into account the source of variation in duration of program exposure. Impact estimates are affected by the length of program exposure, depending on whether or not the source of variation in duration is common within a treatment area and whether or not this source is a random phenomenon. Some pointers are: If the length of implementation lags is about equal across treatment sites, then a simple comparison between the beneficiaries in the treatment and control areas would be sufficient to estimate the average impact of the program or the ITT effects under many conditions, though not if there are significant learning or pioneering effects that differ across them. If the delays vary across treatment areas but not within those areas, and if the variation is random or independent of unobservable characteristics in the program areas that may also affect program effectiveness, then it is also possible to estimate the ITT effects with appropriate controls for the area, or with fixed effects for different exposure categories.
In cases where the intervention and its evaluation are designed together, such as pilot programs, it is possible and desirable to explore the time path of program impact by allocating treatment groups to different lengths of exposure in a randomized way. This treatment allocation on the basis of duration differences can yield useful operational lessons about program design, so it deserves more experimentation in the future.

Appendix

Table A-1. Examples of Evaluations That Consider Timing Issues and Duration of Program Exposure in Estimating Program Impact

Columns under "Sources of variation in timing and duration of program exposure": Implementation lags; Short-run and long-run outcomes; Learning by beneficiaries; Learning and use by beneficiaries; Cohort effects; Cohort interacted with duration of exposure. The x marks for each row are reproduced in order; their alignment under these columns is not recoverable from this copy.

Studies | Country | Intervention | Sources marked
Angrist and others (2002); Angrist, Bettinger, and Kremer (2004) | Colombia | School voucher program for secondary level | x x
Armecin and others (2006) | Philippines | Comprehensive early childhood development program | x x x
Banerjee and others (2007) | India | Balsakhi school remedial and computer-assisted learning programs | x x
Behrman and Hoddinott (2005) | Mexico | PROGRESA nutrition intervention | x
Behrman, Cheng, and Todd (2004) | Bolivia | PIDI preschool program | x x
Behrman, Sengupta, and Todd (2005); Schultz (2004) | Mexico | PROGRESA education intervention | x
Chin (2005) | India | Operation Blackboard: additional teachers per school | x x x
Duflo (2001) | Indonesia | School construction program | x x x
Foster and Rosenzweig (1995) | India | Green Revolution: new seed varieties | x
Frankenberg, Suriastini, and Thomas (2005) | Indonesia | Midwife in the Village program | x x
Galasso and Yau (2006) | Madagascar | SEECALINE child nutrition program | x x x x
Garces, Thomas, and Currie (2002) | United States | Head Start program: ECD | x
Hoddinott and Kinsey (2001); Alderman, Hoddinott, and Kinsey (2006) | Zimbabwe | Drought shocks; civil war | x x x
Jimenez and Sawada (1999) | El Salvador | EDUCO schools: community participation | x
Gertler (2004) | Mexico | PROGRESA health and nutrition services | x x
King and Ozler (1998); Rivarola and Fuller (1999) | Nicaragua | School autonomy reform | x
Miguel and Kremer (2003, 2004) | Kenya | School-based deworming program | x
Mu and van de Walle (2007) | Vietnam | Rural roads rehabilitation project | x x
Munshi (2004) | India | Green Revolution: new seed varieties | x
Rouse and Krueger (2004) | United States | Fast ForWord program: computer-assisted learning | x
Walker and others (2005) | Jamaica | Nutrition supplementation | x x

Note: Review articles on early childhood development programs, for example Engle and others (2007) and Grantham-McGregor and others (2007), cover a long list of studies that we mention in the text but are not listed in this table; many of those studies examine age-specific effects, and some examine short- and long-run impacts.

Notes

Elizabeth M. King (corresponding author) is Research Manager, Development Research Group, at the World Bank; her address for correspondence is eking@worldbank.org. Jere R. Behrman is Professor, Department of Economics, at the University of Pennsylvania. The authors are grateful to Laura Chioda and to three anonymous referees for helpful comments on a previous draft. All remaining errors are ours.

1. For instance,
the International Initiative for Impact Evaluation (3IE) has been set up by governments of several countries, donor agencies, and private foundations to address the desire of the development community to build up systematically more evidence about effective interventions.

2. There have been excellent reviews of the choice of methods as applied to social programs. See, for example, Grossman (1994), Heckman and Smith (1995), Ravallion (2001), Cobb-Clark and Crossley (2003), and Duflo (2004).

3. To keep the discussion focused on the timing issue and the duration of exposure, we avoid discussing the specific evaluation method (or methods) used by the empirical studies that we cite. However, we restrict our selection of studies to review to those that have a sound evaluation design, whether experimental or using econometric techniques. Nor do we discuss estimation issues such as sample attrition bias, which is one of the ways in which a duration issue has been taken into account in the evaluation literature.

4. See Heckman, Lalonde, and Smith (1999) for a review.

5. Because Rouse and Krueger (2004) define the treatment group more stringently, however, the counterfactual treatment received by the control students becomes more mixed, and a share of these students is contaminated by partial participation in the program.

6. In their assessment of the returns to World Bank investment projects, Pohl and Mihaljek (1992) cite construction delays among the risks that account for a wedge between ex ante (appraisal) estimates and ex post estimates of rates of returns. They estimate that, on average, projects take considerably more time to implement than expected at appraisal: six years rather than four years.

7. In Mexico's well-known PROGRESA program,
payment records from an evaluation sample showed that 27 percent of the eligible population had not received benefits after almost two years of program operation, possibly as a result of delays in setting up the program's management information system (Rawlings and Rubio 2005). In Ecuador's Bono de Desarrollo Humano, the lists of the beneficiaries who had been allocated the transfer through a lottery did not reach program operators, and so about 30 percent of them did not take up the program (Schady and Araujo 2008).
8. Chin (2005) makes a one-year adjustment in her evaluation of Operation Blackboard (OB) in India. Although the Indian government allocated and disbursed funds for the program for the first time in fiscal 1987, not all schools received program resources until the following school year. In addition to the delay in implementation, Chin also finds that only between one-quarter and one-half of the project teachers were sent to one-teacher schools, while the remaining project teachers were used in ways the central government had not intended. Apparently, the state and local governments had exercised their discretion in the use of the OB teachers.
9. In two programs that we know, administrative records at the individual level were maintained at local program offices, not at a central program office, and local record-keeping varied in quality and form (for example, some records were computerized and some were not), so that a major effort was required to collect and check records during the evaluations.
10. Leonard (2008) provides an example of such an effect. He uses the presence of a Hawthorne effect, produced by the unexpected arrival of a research team to observe a physician, in order to measure an exogenous, short-term change in the quality of service provided by a physician. Indeed, there was a significant jump in quality upon the arrival of observers, but quality returned to previsit levels after some time.
11.
Information campaigns for programs that attempt to improve primary-school quality or to enhance child nutrition through primary-school feeding programs in a context in which virtually all primary-school-age children are already enrolled would seem less relevant than such campaigns as part of a new program to improve preschool child development where there had previously been no preschool programs.
12. The authors estimate the impact of atypically low rainfall levels by including a year's delay because the food shortages would be apparent only one year after the drought, but before the next harvest was ready.
13. To estimate these longer-run impacts, Alderman, Hoddinott, and Kinsey (2006) combine data on children's ages with information on the duration of the civil war and the episodes of drought used in their analysis. They undertook a new household survey to trace children measured in earlier surveys.

References

Alderman, Harold, John Hoddinott, and William Kinsey. 2006. "Long Term Consequences of Early Childhood Malnutrition." Oxford Economic Papers 58(3):450-74.
Angrist, Joshua, Eric Bettinger, and Michael Kremer. 2004. "Long-Term Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia." National Bureau of Economic Research Working Paper No. 10713, August.
Angrist, Joshua D., Eric Bettinger, Erik Bloom, Elizabeth M. King, and Michael Kremer. 2002. "Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment." American Economic Review 92(5):1535-58.
Armecin, Graeme, Jere R. Behrman, Paulita Duazo, Sharon Ghuman, Socorro Gultiano, Elizabeth M. King, and Nannette Lee. 2006. Early Childhood Development through an Integrated Program: Evidence from the Philippines. Policy Research Working Paper 3922. Washington, DC: World Bank.
Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden. 2007.
"Remedying Education: Evidence from Two Randomized Experiments in India." Quarterly Journal of Economics 122(3):1235-64.
Banerjee, Abhijit V., Rukmini Banerji, Esther Duflo, Rachel Glennerster, and Stuti Khemani. 2008. Pitfalls of Participatory Programs: Evidence from a Randomized Evaluation in Education in India. Policy Research Working Paper 4584. Washington, DC: World Bank.
Behrman, Jere R., and John Hoddinott. 2005. "Programme Evaluation with Unobserved Heterogeneity and Selective Implementation: The Mexican 'Progresa' Impact on Child Nutrition." Oxford Bulletin of Economics and Statistics 67(4):547-69.
Behrman, Jere R., Yingmei Cheng, and Petra E. Todd. 2004. "Evaluating Preschool Programs When Length of Exposure to the Program Varies: A Nonparametric Approach." Review of Economics and Statistics 86(1):108-32.
Behrman, Jere R., Piyali Sengupta, and Petra Todd. 2005. "Progressing through PROGRESA: An Impact Assessment of Mexico's School Subsidy Experiment." Economic Development and Cultural Change 54(1):237-75.
Chen, Shaohua, Ren Mu, and Martin Ravallion. 2008. Are There Lasting Impacts of Aid to Poor Areas? Policy Research Working Paper 4084. Washington, DC: World Bank.
Chin, Aimee. 2005. "Can Redistributing Teachers across Schools Raise Educational Attainment? Evidence from Operation Blackboard in India." Journal of Development Economics 78(2):384-405.
Cleland, John, Stan Bernstein, Alex Ezeh, Anibal Faundes, Anna Glasier, and Jolene Innis. 2006. "Family Planning: The Unfinished Agenda." Lancet 368(November):1810-27.
Cobb-Clark, Deborah A., and Thomas Crossley. 2003. "Econometrics for Evaluations: An Introduction to Recent Developments." Economic Record 79(247):491-511.
Duflo, Esther. 2001. "Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment." American Economic Review 91(4):795-813.
___. 2004. "Scaling Up and Evaluation." In F. Bourguignon and B. Pleskovic, eds.,
Annual World Bank Conference on Development Economics: Accelerating Development. Washington, DC: World Bank.
Engle, Patrice L., Maureen M. Black, Jere R. Behrman, Meena Cabral de Mello, Paul J. Gertler, Lydia Kapiriri, Reynaldo Martorell, Mary Eming Young, and the International Child Development Steering Group. 2007. "Strategies to Avoid the Loss of Developmental Potential in More Than 200 Million Children in the Developing World." Lancet 369(January):229-42.
Foster, Andrew D., and Mark R. Rosenzweig. 1995. "Learning by Doing and Learning from Others: Human Capital and Technical Change in Agriculture." Journal of Political Economy 103(6):1176-1209.
Frankenberg, Elizabeth, Wayan Suriastini, and Duncan Thomas. 2005. "Can Expanding Access to Basic Health Care Improve Children's Health Status? Lessons from Indonesia's 'Midwife in the Village' Programme." Population Studies 59(1):5-19.
Galasso, Emanuela, and Jeffrey Yau. 2006. Learning through Monitoring: Lessons from a Large-Scale Nutrition Program in Madagascar. Policy Research Working Paper 4058. Washington, DC: World Bank.
Garces, Eliana, Duncan Thomas, and Janet Currie. 2002. "Longer-Term Effects of Head Start." American Economic Review 92(4):999-1012.
Gertler, Paul J. 2004. "Do Conditional Cash Transfers Improve Child Health? Evidence from Progresa's Control Randomized Experiment." American Economic Review 94(2):336-41.
Grantham-McGregor, Sally, Yin Bun Cheung, Santiago Cueto, Paul Glewwe, Linda Richter, Barbara Strupp, and the International Child Development Steering Group. 2007. "Developmental Potential in the First 5 Years for Children in Developing Countries." Lancet 369(9555):60-70.
Grossman, Jean Baldwin. 1994. "Evaluating Social Policies: Principles and U.S. Experience." World Bank Research Observer 9(2):159-80.
Heckman, James J., and Jeffrey A. Smith. 1995. "Assessing the Case for Social Experiments." Journal of Economic Perspectives 9(2):85-110.
Heckman, James J., R. J.
Lalonde, and Jeffrey A. Smith. 1999. "The Economics and Econometrics of Active Labor Market Programs." In Orley Ashenfelter and David Card, eds., Handbook of Labor Economics. Vol. 1. Amsterdam: North-Holland.
Hoddinott, John, and William Kinsey. 2001. "Child Growth in the Time of Drought." Oxford Bulletin of Economics and Statistics 63(4):409-36.
Jimenez, Emmanuel, and Yasuyuki Sawada. 1999. "Do Community-Managed Schools Work? An Evaluation of El Salvador's EDUCO Program." World Bank Economic Review 13(3):415-41.
Kikumbih, Nassor, Kara Hanson, Anne Mills, Hadji Mponda, and Joanna Armstrong Schellenberg. 2005. "The Economics of Social Marketing: The Case of Mosquito Nets in Tanzania." Social Science and Medicine 60(2):369-81.
King, Elizabeth M., and Berk Ozler. 1998. What's Decentralization Got to Do with Learning? The Case of Nicaragua's School Autonomy Reform. Working Paper on Impact Evaluation of Education Reforms 9 (June). Washington, DC: Development Research Group, World Bank.
King, Elizabeth M., Peter F. Orazem, and Darin Wohlgemuth. 1999. "Central Mandates and Local Incentives: Colombia's Targeted Voucher Program." World Bank Economic Review 13(3):467-91.
Leonard, Kenneth L. 2008. "Is Patient Satisfaction Sensitive to Changes in the Quality of Care? An Exploitation of the Hawthorne Effect." Journal of Health Economics 27(2):444-59.
Miguel, Edward, and Michael Kremer. 2003. "Social Networks and Learning about Health in Kenya." !l