Impact Assessment Framework: SME Finance
October 2012

Prepared by the World Bank on behalf of the G20 Global Partnership for Financial Inclusion (GPFI) SME Finance Sub-Group. Available online at: www.gpfi.org, www.worldbank.org/financialinclusion

Acknowledgments

This report was commissioned by the GPFI SME Finance Sub-Group, and prepared by Claudia Ruiz (lead author) and Inessa Love, under the guidance of Douglas Pearce. The report benefited from input and review by: Giorgio Albareto (Banca d’Italia), Aidan Coville (World Bank), Elizabeth Davidson (World Bank), Susanne Dorasil (BMZ), Felipe Alexander Dunsch (World Bank), Aurora Ferrari (World Bank), Matthew Gamser (IFC), Randall Kempner (Aspen Network of Development Entrepreneurs), Leora Klapper (World Bank), Miriam Koreen (OECD), David McKenzie (World Bank), and Riccardo Settimo (Banca d’Italia). The report was completed under the leadership of the co-chairs of the G20 SME Finance Sub-Group: Susanne Dorasil (Federal Ministry for Economic Cooperation and Development, Germany), Aysen Kulakoglu (Treasury of Turkey), and Jonathan Rose (U.S. Treasury).

© 2012 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW, Washington DC 20433
Telephone: 202-473-1000; Internet: www.worldbank.org

This work is a product of the staff of The World Bank with external contributions. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.
Rights and Permissions

The material in this work is subject to copyright. Because The World Bank encourages dissemination of its knowledge, this work may be reproduced, in whole or in part, for noncommercial purposes as long as full attribution to this work is given. Any queries on rights and licenses, including subsidiary rights, should be addressed to the Office of the Publisher, The World Bank, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2422; e-mail: pubrights@worldbank.org.

Table of Contents

Executive Summary .... 4
I. Introduction and Overview .... 7
II. Why Are Impact Evaluations Relevant for SME Policies? .... 9
III. Menu of SME Finance Policies .... 10
IV. Implementing an Impact Evaluation .... 11
  Operational Aspects of an Impact Evaluation .... 13
  Budget Considerations .... 13
  Time Considerations .... 15
  Selecting an Impact Evaluation Method for an SME Finance Policy .... 15
  Steps in the Impact Evaluation Process .... 18
V. Impact Evaluation Methods—The Experimental Approach .... 20
VI. Impact Evaluation Methods—Non-experimental Approaches .... 23
  Difference-in-Difference .... 23
  Instrumental Variables .... 25
  Regression Discontinuity .... 27
  Propensity Score Matching .... 29
VII. Minimal Standard Monitoring .... 32
VIII. Conclusions .... 33
References .... 34
Appendix 1. General Concerns .... 36
Appendix 2. Size and Power of RCT .... 39
Appendix 3. Examples of Impact Evaluations .... 41
Appendix 4. Assumptions, Strengths, and Limitations of Different Approaches .... 44

Figure 1. Evaluation Approaches .... 12
Figure 2. Suggested Designs for Evaluations Planned Ahead .... 16
Figure 3. Suggested Designs for Evaluations Not Planned Ahead .... 17
Figure 4. The DD Effect .... 23
Figure 5. Regression Discontinuity .... 27

Table 1. Examples of SME Finance Policies .... 10
Table 2. Steps in the Impact Evaluation Process .... 18

Box 1. Types of Random Assignment .... 12
Box 2. Public Sector Intervention Evaluation: Business Training in Bosnia and Herzegovina .... 20
Box 3. Encouragement Design .... 21
Box 4. Regulatory Reform Evaluation: Business Registration in Mexico .... 24
Box 5. Public Intervention Evaluation: Thailand Microfinance Fund .... 26
Box 6. Financial Infrastructure Evaluation—Role of Angel Funds in U.S. Start-up Firms .... 28
Box 7. Public Intervention Evaluation: Chile’s Supplier Development Program .... 31
Box 8. Changes in Behavior in Response to Program Assignment Experiment .... 36
Box 9. Scaling Up Small Interventions .... 38

Executive Summary

Small and medium enterprises (SMEs) are a policy priority for many countries, given their significance in terms of employment and economic activity. Many new policies, legal reforms, programs, and funds from both the public sector and donors focus on access to financial services and investment for SMEs. It is therefore important to assess and understand the impacts of these interventions to support SME finance so that they can be designed and implemented to most effectively meet their goals in a particular market or country.

Impact evaluation is an empirical assessment of whether a program or policy has achieved its desired objectives. Impact evaluations help policy makers to quantify the effects of different policies, design the most effective interventions (that is, programs, policies, and regulations), improve targeting, refine policies to better fit objectives, optimize the use of scarce resources, and understand the underlying mechanisms. Tracking the impact of a policy, regulation, or program during its implementation (real-time impact evaluation) allows modifications to be made that can ensure the intended results are achieved.

Surprisingly, the cost of more rigorous impact evaluations is not much higher than the cost of minimal-standard monitoring. The most expensive part of both monitoring and assessing impact is collecting new data. If data are already available, the difference in cost between the two methods is not substantial. For instance, in cases where administrative data can be used, the budget to design and implement an impact evaluation is significantly reduced.

This Impact Assessment Framework discusses the importance of rigorous impact evaluation and provides an overview of the relevance, application, strengths, and limitations of impact evaluation techniques. Relevant operational information regarding budget and timing issues is also presented in the Framework. The Framework covers experimental and non-experimental approaches that can be used to evaluate a broad set of SME policies and programs and provides examples of actual impact evaluations for each of the components of the SME Finance Policy Guide (GPFI 2011).

The experimental approaches discussed in this Framework include basic randomized control trials (RCTs), oversubscription, randomized phase-in, and encouragement design. All of these approaches rely on a randomization device that allows the evaluator to isolate the impact of a policy:

• Basic RCTs refer to classic random assignments that take a baseline survey and randomly select some SMEs to receive the intervention. This approach can prove useful for interventions that are not implemented at the national level, such as local/regional interventions.

• In the oversubscription design, a subset of firms is randomly assigned to receive a program from the set of eligible firms that apply to it. This approach is useful for evaluating interventions in which a lack of funds necessitates limiting the number of firms that can participate.

• The randomized phase-in approach randomizes the timing or sequence in which a project is rolled out. As its name suggests, this methodology is well suited to evaluating policies that are implemented in stages.

• In the encouragement design, certain firms are randomly promoted (for instance, through financial incentives or marketing campaigns) to participate in the program, although the program is available to the rest of the population. This approach can be used to evaluate policies that are implemented at the national level and were not rolled out differentially.

The experimental approach allows for credible identification of the intervention’s impact and can be used to plan impact evaluations of different types in advance. However, experimental evaluations need to be set up before the policy or program is put in place, and their findings might not hold in different contexts (an issue commonly referred to as external validity).

The non-experimental methodologies covered in this Framework include difference-in-differences, instrumental variables, regression discontinuity, and propensity score matching. Unlike experiments, non-experimental evaluations do not include an exogenous device planned in advance to isolate the impact of a policy. Thus, these methods rely on identifying a control group and then using statistical techniques to ensure the impact estimate is properly measured. These approaches are commonly used to evaluate policies when an evaluation was not planned in advance.

• The difference-in-difference approach uses a comparable group of firms that was not exposed to the policy of interest as its control group. The approach then compares the outcomes over time of SMEs exposed to the policy relative to the firms in the control group. As long as a control group can be identified, this approach can be used to evaluate a variety of policies, including national-level interventions targeting SMEs and interventions at the regional level, among others.

• The instrumental variables approach relies on instruments to isolate the impact of a policy. Instruments are strong predictors of participation in the intervention but should not be associated with the outcome variable for reasons other than participation in the intervention. For instance, if a lending project took place in municipalities governed by a particular political party, the presence of this political party would strongly predict SME exposure to the lending project, but any change in SME outcomes should be due to the project itself and not to other channels associated with the party in charge.

• Regression discontinuity is used to evaluate interventions in which a defined cutoff determines eligibility (such as policies available only to SMEs with fewer than a specific number of workers in the year before the intervention). By comparing the outcomes of firms that just passed the cutoff with firms that just missed it, evaluators can measure the intervention’s effect.

• The propensity score matching (PSM) methodology can be used to evaluate an SME intervention in which the institutional arrangements that defined selection into the project are observed and known, but a control group was not maintained. A control group can be made up of firms not participating in the program, and the impact of the intervention determined by comparing the evolution of outcomes over time between the two sets of firms.

While the lack of a randomization device makes it more challenging for non-experimental methodologies to isolate the impact of an intervention, when done properly these approaches provide robust estimates of the effects of interventions.

The Framework also discusses minimal standard monitoring, which consists of monitoring outcomes over time for the subjects receiving the intervention. The main difference from other impact evaluation methods is that minimal standard monitoring does not follow a control group to identify the effect of a policy, which makes the results less rigorous and credible.

This Framework provides insights and criteria on the basis of which a suitable approach can be selected to evaluate an SME finance policy, regulation, or program, including:

• Basic RCTs are well suited to evaluating SME interventions that have a clear distinction between those who participate in the program and those who do not (for example, public programs providing financial training to SMEs).

• Approaches that randomize the rollout of the implementation through randomized phase-in or encouragement design can be more suitable for evaluating interventions where the distinction of who participates is not clear, such as broad SME finance policies or regulatory reforms.

• To evaluate policies such as bank lending to SMEs, where institutions follow certain criteria to select eligible firms, both oversubscription and regression discontinuity might be suitable approaches. Oversubscription is particularly relevant when resources or implementation capacity are limited and demand for a program or service exceeds supply.

• Where the evaluation takes place after the policy has already been implemented, the evaluation approach is mainly determined by the characteristics of the intervention and how it was implemented. For instance, the difference-in-difference approach might be well suited to evaluating policies aimed at improving opportunities for female-led SMEs (since the evaluator can compare the evolution of female-led relative to male-led SMEs) or SME interventions that were rolled out sequentially across regions for political or logistical reasons, such as financial infrastructure projects.

• Alternatively, policies with a cutoff that determined who was eligible for the intervention are well suited to the regression discontinuity approach, such as a factoring project for SMEs employing fewer than 50 workers at the time of registration.

The Framework offers the following overall guidance:

• To isolate a policy’s effect, it is important to conduct a rigorous impact evaluation instead of relying on before–after comparisons, which tend to generate flawed results.

• Impact evaluations planned ahead of the intervention offer more evaluation method options than evaluations conducted after the program or policy has been rolled out. Thus, it pays to plan evaluations before the intervention has started.

• There is no “one size fits all” approach to impact evaluation, and the most appropriate approach to evaluate an intervention will depend on the operational characteristics of the policy being evaluated.

• Rigorous impact evaluations can be complemented by qualitative assessments to provide a better understanding of the functioning, limitations, and strengths of the evaluated policy.

• Data collection is typically the most costly component of an evaluation. Evaluations that rely on existing data and ongoing or already-planned surveys can save on this cost component.

• Real-time impact evaluation during implementation allows modifications to be made to help ensure that the intended impacts are achieved. Rigorous impact evaluation can improve the design, implementation, and impact of policies, regulations, and programs to support SME finance.

I. Introduction and Overview

SMEs play a key role in economic development and make an important contribution to employment. Financial access is critical for SME growth and development, and the availability of external finance is positively associated with productivity and growth. However, access to financial services remains a key constraint to SME growth and development, especially in emerging economies (GPFI 2011).

This Framework was prepared as a resource for regulators and policy makers to provide an overview of methodologies used to evaluate the impact of various SME finance policies, interventions, and regulations. The Framework provides a comprehensive set of impact evaluation techniques; their key assumptions, strengths, and limitations; and
examples of their implementation in SME finance policy contexts.1 The techniques described in this Framework can be applied to real-time impact assessment that feeds back into policy implementation. Operational aspects of impact evaluation, such as budget and timing issues, are also discussed in the Framework. As detailed in the Framework, impact evaluation approaches can then be selected to suit different policy contexts and priorities.

Policy makers and regulators have a wide menu of tools at their disposal to support increased access to financial services, as demonstrated in the comprehensive GPFI SME Finance Policy Guide (2011). Financial access for SMEs can be expanded by promoting a favorable legal and regulatory environment, complemented by a sound financial infrastructure and targeted public interventions. It is important to assess the impacts of various policies in order to prioritize, tailor, and sequence reforms to be most effective in addressing constraints to financial access in a particular market or country.

Impact evaluations assess whether a program or policy has achieved the desired objectives. These evaluations are usually systematic empirical studies, most often using actual data and statistical methods to measure outcomes and quantify the impact of the program or policy. Impact evaluations are a key ingredient for policy analysis and for understanding what works—that is, what are the most effective policies to achieve desired objectives, such as alleviating poverty, increasing access to finance, or enhancing growth and development in certain contexts. Thus, it is important to include impact evaluation in the design of policy and legal reforms and interventions.

The first part of the Framework introduces the various impact evaluation approaches, discusses budget and time considerations for planning an evaluation, presents an outline of all necessary steps in the impact evaluation process, and maps evaluation approaches to different types of SME finance policies. The role of qualitative assessments as a complement to impact evaluation is also discussed. The second part of the Framework addresses the different methods in more detail. Section V describes the experimental approach. Section VI covers non-experimental methodologies, which range from difference-in-difference and instrumental variables to regression discontinuity and propensity score matching. Section VII describes minimal standard monitoring, discusses its advantages and disadvantages, and contrasts this method with more rigorous impact evaluation techniques. Appendices 1 and 2 present technical considerations regarding estimation approaches. Appendix 3 compiles some examples of impact evaluations of SME finance interventions. Finally, Appendix 4 summarizes the key assumptions, strengths, and limitations of each evaluation approach examined in the Framework.

1 The intention of the Framework is to provide an overview of impact evaluation methods and how they can be applied, rather than to present an exhaustive survey of all existing or ongoing evaluations.

Several recent surveys on the topic of impact evaluation are relevant for this paper. McKenzie (2010) offers a survey of impact evaluations in the broader area of finance and private sector development, and makes a strong case for impact evaluations in this area. This paper complements his work, as it offers a systematic review of various evaluation methods relevant to SME finance, with the pros and cons of each method and examples of their application to SME finance policies. Gertler et al. (2011) offer a comprehensive impact evaluation guideline with detailed information on operational and technical issues. Bauchet et al. (2011) provide an excellent survey of randomized evaluations of microfinance. Winters, Salazar, and Maffioli (2010) provide a thorough survey of impact evaluations of agricultural projects. While the objective of this Framework is also to review different impact evaluation approaches, our focus is on SME finance policies.

II. Why Are Impact Evaluations Relevant for SME Policies?

SME interventions can benefit from impact evaluations in various ways. These evaluations can:

• Clarify the effect that interventions have on firms’ outcomes and whether that impact achieved the expected objectives;

• Help to improve existing programs by comparing alternative design choices (for instance, comparing the performance of loan contracts with weekly versus monthly payments);

• Improve program targeting by identifying which firms benefit the most, or what barriers prevent others from gaining from interventions;

• Help prioritize resources by identifying the most cost-effective policies; and

• Make it possible to trace the different stages of an intervention so that evaluators are able to distinguish which key step in the program is not working as expected.

Unlike minimal-standard monitoring or simple before-and-after comparisons, impact evaluations isolate the effect of an intervention from all other factors that might alter the outcome of interest.

III.
Menu of SME Finance Policies

While the impact evaluation methods presented in this Framework can serve to evaluate a broad set of SME and financial inclusion interventions, the Framework’s main focus is SME finance policies. More concretely, the SME finance reforms and interventions that the Framework covers are those examined in the GPFI SME Finance Policy Guide, which are classified in three groups: (1) regulatory and supervisory frameworks; (2) financial infrastructure; and (3) public interventions. Table 1 provides examples of these policies by type of intervention.

Table 1. Examples of SME Finance Policies

1. Regulatory and supervisory frameworks
   • Frameworks to promote competition: regulations enabling entry of new banks; regulatory framework for licensing requirements

2. Financial infrastructure
   • Insolvency regime: bankruptcy reforms
   • Credit information systems: introduction of credit bureaus, credit registries
   • Equity investment: reforms encouraging venture capital, angel funds
   • Accounting and auditing standards for SMEs: reforms facilitating business registration procedures

3. Public interventions
   • Public credit guarantee (PCG) schemes: funds for guarantees to SMEs
   • Lending by state-owned financial institutions: micro and SME finance programs
   • Apexes and other wholesale funding facilities: direct lending in the form of grants
   • SME capacity, creditworthiness: business/financial literacy training for entrepreneurs; value-chain organization projects; subsidies to promote technology transfer to SMEs

IV. Implementing an Impact Evaluation

The key challenge in evaluating the impact of any program is to ensure that observed outcomes are a direct result of the program itself and would not have occurred without it. Without credibly addressing this, the impact evaluation may attribute the outcome to the program when in reality it could have occurred without it.

To see the issue at stake clearly, suppose a program affects some SMEs but not others. In essence, two questions must be addressed: (1) How would the SMEs that participated in the program have done without it? and (2) How would those that did not participate have fared if they had? These questions are referred to as counterfactual because neither scenario occurred in reality, and thus both are unobservable.

Observing the same SME over time will not, in most cases, give a reliable estimate of the impact the program had on it, because many other things may have changed at the same time the program was introduced. The solution to this problem is to estimate the average impact of the program rather than the impact on each firm. One way to do that is to compare the average outcome of the group that participated in the program (also known as the “treatment” group) with that of a similar group that did not (the “comparison” or “control” group).

The challenge is to ensure that this comparison group is identical to the treatment group in all ways except participation in the program. For example, to evaluate the impact of access to finance on SME productivity, it is not sufficient to compare firms with a loan to firms without one, because SMEs that obtain a loan may be fundamentally different from those that do not. While controlling for observables (such as size, age, and industry) may reduce these differences, some of the important differences are more difficult to observe—such as the entrepreneurial talent of the owners, their risk preferences, or their social support networks. Observed differences in performance between the two groups may be attributable to these latent (that is, unobservable) differences in characteristics rather than to access to finance.

Impact evaluation techniques deal with these issues by identifying a proper counterfactual group to compare with the group of SMEs affected by the policy and, in this way, estimate as cleanly as possible the effect of the policy of interest. In general, impact evaluation approaches can be classified into two broad groups: experimental and non-experimental (see Figure 1).2 Experimental methodologies randomly assign the intervention between the group that participates in the program or policy (the treatment group) and the group that does not (the control group) to ensure that any difference between these groups can be attributed to the intervention. There are different ways to conduct randomized assignment; the most common randomized approaches discussed in the Framework are described in Box 1. Non-experimental approaches—such as difference-in-difference, instrumental variables, regression discontinuity, and propensity score matching—identify a control group and then use statistical techniques to ensure that the impact estimate is properly measured. Sections V and VI of the Framework describe each of these methods in detail, along with their assumptions, advantages, and disadvantages, providing examples of specific SME finance policies that were evaluated with them. Appendix 4 summarizes the main assumptions and characteristics of the evaluation methods discussed in this paper.

2 While minimal standard monitoring is not considered a rigorous impact evaluation method, it is a widely used approach to monitor the effect of policies on the targeted subjects. Section VII describes this method in more detail.

While discussing qualitative assessments in detail is beyond the focus of this Framework, it is worth mentioning that these types of analysis are an important complement to the findings reached through a rigorous impact evaluation. Qualitative assessments are commonly based on the opinions of program participants and stakeholders about the policy, its success, and its limitations. Through surveys, interviews, focus groups, and/or case studies, qualitative evaluations collect additional information that sheds light on the satisfaction of participants, on the mechanisms responsible for the impact of the intervention, and on general feedback to adjust and improve the operation of the policy or intervention. The OECD framework for the evaluation of SME policies by Storey and Potter (2007) provides an in-depth review of these assessments.

Figure 1. Evaluation Approaches

Box 1. Types of Random Assignment

Basic assignment. The classic model for random assignment is to take a baseline survey and randomly assign some participants to the project. This can be done at the level of individuals, firms, schools, or villages.

Oversubscription design. In this design, all eligible candidates are allowed to apply to the program, and a subset of all applicants is randomly assigned to receive the program (via a lottery system, for example). This design is useful when resources are limited and demand for a program or service exceeds supply. It can also be useful in randomizing among marginal loan applicants, as in Karlan and Zinman (2010).

Randomized phase-in. Because of resource constraints, some units (individuals or geographic areas) subject to the program cannot receive the treatment at the same time. In such cases, randomizing who receives the program first is a fair way to allocate resources and also allows for an impact evaluation of the program’s effectiveness.

Encouragement design. In this design, some individuals or firms are randomly “encouraged” (via financial incentives or marketing materials) to participate in the program, even though the program is available to the rest of the population.
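The four schemes in Box 1 differ only in which margin is randomized: the assignment itself, the lottery among applicants, the rollout order, or the encouragement to participate. The following sketch illustrates how each assignment could be drawn in practice; the firm IDs, sample sizes, and variable names are hypothetical and not taken from the Framework.

```python
import random

random.seed(7)
firms = [f"SME-{i:03d}" for i in range(1, 201)]  # 200 hypothetical firms

# Basic assignment: split all firms into treatment and control at random.
shuffled = random.sample(firms, len(firms))
treatment, control = shuffled[:100], shuffled[100:]

# Oversubscription design: a lottery among applicants; losing applicants
# form a natural control group because they also wanted the program.
applicants = random.sample(firms, 80)          # suppose 80 firms apply
winners = set(random.sample(applicants, 40))   # but only 40 slots exist
losers = [f for f in applicants if f not in winners]

# Randomized phase-in: every firm is treated eventually; only the order
# is random, so later waves serve as controls for earlier ones.
rollout = random.sample(firms, len(firms))
wave = {firm: 1 + i // 50 for i, firm in enumerate(rollout)}  # 4 waves of 50

# Encouragement design: the program is open to everyone; only the
# marketing "nudge" is randomized, and it predicts who takes the program up.
encouraged = set(random.sample(firms, 100))

print(len(treatment), len(winners), len(losers), max(wave.values()), len(encouraged))
```

In each case, comparing average outcomes across the randomized groups (treatment versus control, lottery winners versus losers, early versus late waves, encouraged versus non-encouraged firms) isolates the program's effect, because randomization makes the groups statistically comparable.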
Operational Aspects of an Impact Evaluation

Budget Considerations

The overall cost of implementing an impact evaluation usually represents a small fraction of the total cost of the intervention. While the cost of an impact evaluation varies, it is possible to generate reasonable estimates up-front based on understanding the main cost drivers. These costs can be broadly categorized into "technical assistance" and "data collection," with data collection being the most important cost driver, generally constituting approximately 60 to 80 percent of the cost of an impact evaluation. For instance, while the average World Bank impact evaluation costs $500k to $900k (Gertler et al. 2011) when data collection is required, the cost declines to $50k to $200k when administrative data can be used.

Administrative data consists of information collected for some official purpose, such as reporting to government agencies or maintaining records of program participants. While this data is not designed to perform evaluations, if the available indicators fit with the objectives of the evaluation, administrative data is a valid option to consider.3

Data Collection Costs

It is difficult to determine the costs of data collection precisely, since these will depend on different variables such as the sample size needed, the type of data to be collected (household, individual, administrative), the length of the surveys, the frequency with which the data will be gathered, and the labor costs of each country. Yet the Alliance for Financial Inclusion (AFI) provides estimated survey costs for different types of surveys with different sample sizes. According to AFI, a nationally representative cross-sectional survey could range from $100,000 to $700,000 depending on the sample size (from 1,000 to 7,000 observations) and the country where the survey was conducted. Information from the Living Standards Measurement Study of the World Bank indicates that survey costs range from $150 to $300 per household, with a usual sample size between 3,000 and 5,000 households.

The possibility of using administrative data can thus greatly reduce impact evaluation costs, but assessing the availability of relevant data is critical and needs to consider the following factors:

1. Impact evaluations require data before and after the intervention in both control and treatment groups. Administrative data would need to be available over these time periods for these population groups.

2. The more time points available, the more accurate the results: data available at regular intervals for the indicators of interest improve precision, preferably available before, during, and after the intervention.

3. Access and confidentiality can be challenging: While administrative data may exist, accessing these data may be difficult for security reasons. In addition, the time required to access the data in a workable format needs to be factored into the process.

4. Available indicators dictate the questions that can be asked: The types of outcomes that can be monitored are restricted to the types of indicators collected in the administrative data. Evaluators must make sure that the available indicators allow them to monitor the main outcomes of interest for the evaluation.

5. Data format and quality: Administrative data are usually collected for purposes other than statistical analysis. As such, data may not necessarily be in a format that can be directly analyzed, requiring effort to clean and reframe for analysis purposes. The quality of these data also needs to be scrutinized if the evaluation team has not been involved in its collection.

In many cases, administrative data are not available in exactly the right format needed for the impact evaluation for any of the reasons described above. However, it is often possible to work with the office responsible for collecting the administrative data to adapt the data collection activities (for example, by adding specific questions to the larger survey or by including control group data collection). Ex ante evaluations allow the administrative data to be adapted to suit the needs of the evaluation, which is not possible when relying on historical data for an ex post evaluation.

Impact evaluation methods will only affect costs in as much as they influence the data requirements. For instance, approaches such as propensity score matching or regression discontinuity require information on a large set of subjects. Non-experimental methods such as difference-in-difference also require baseline data to ensure that the control and treatment groups are comparable. Though more limited in its precision, an RCT is the only method that does not specifically require a baseline to be conducted, since, by definition, the control and treatment groups will be comparable. However, it is generally good practice to collect baseline data for any evaluation method used.

Additional costs of impact evaluations not necessarily associated with minimal-standard monitoring include monitoring costs of the evaluation (if planned ahead) and researchers' time, but these are usually a small part of the overall budget. In addition, minimal-standard monitoring does not need to collect data on the control groups. It does, however, need data on the periods before the intervention started and after it was rolled out.

Technical Assistance Costs

In addition to data collection costs, impact evaluation work requires budgeting staff time, travel arrangements, and dissemination costs. Contributions from researchers are not needed throughout the whole project timeframe; they mainly contribute work for the impact evaluation design and sampling, as well as data cleaning and analysis. However, most impact evaluations that include data collection need a constant presence in the field, such as a field coordinator, to monitor data collection efforts. Still, these costs are usually a smaller part of the overall budget compared to data collection.

In summary, the main budget items of an impact evaluation are:

- Data collection. The team should identify all primary and secondary data collection requirements and provide a budget for completion (minimum baseline and follow-up data), including qualitative and/or cost-analysis data collection requirements where applicable.

- Impact evaluation team. The budget should include all staff and consultant time for managing the impact evaluation, including design, implementation, and analysis.

- Travel. All necessary travel costs for required project supervision must be factored in, including airfare, accommodations, and food.

- Specialists. The budget should include any additional consultant time and travel for technical assistance (such as survey instrument development, data quality control, and data entry program development).

3 For example, the Italian tax authority conducts a "Sector Studies" survey to collect information on SMEs' activities, economic outcomes, and other variables with the objective of computing how much SMEs pay in taxes. These administrative data have been used in different evaluation projects. In Chile, the Suppliers Development Program, which seeks to strengthen the commercial linkages between small- and medium-sized local suppliers and their large firm customers, keeps records of all participating firms. These records were used in an evaluation to understand the effect of the program on SME productivity (Arraiz, Henriquez, and Stucchi 2011).
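The cost figures above support quick back-of-the-envelope checks. The sketch below applies the cited per-household LSMS range and the 60-to-80-percent data collection share; the function names and the 70 percent midpoint are our own illustrative assumptions, not figures from the report:

```python
def survey_cost_range(n_households, per_household=(150, 300)):
    """Fieldwork cost band using the cited LSMS per-household cost range."""
    low_cost, high_cost = per_household
    return n_households * low_cost, n_households * high_cost

def implied_total_budget(data_collection_cost, data_share=0.7):
    """If data collection is roughly 60-80 percent of the budget (70 percent
    used here as a midpoint), back out the implied total evaluation cost."""
    return data_collection_cost / data_share

low, high = survey_cost_range(3000)   # $450,000 to $900,000 for 3,000 households
total = implied_total_budget(high)    # roughly $1.29 million implied total
```

Estimates like these are only a starting point for the detailed line-item budget described below.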
- Dissemination plan. Any costs associated with travel or logistics must be taken into account for at least one field-based presentation at baseline and one at follow-up, as well as any costs associated with producing written materials.

- Miscellaneous. The budget should include any additional costs related to the impact evaluation, such as payments for institutional review of the research protocol.

Time Considerations

Ideally, impact evaluations should be planned prior to the rollout of the program. Doing so allows the team to collect meaningful pre-intervention baseline data, permits organizing the project implementation for an eventual RCT (allocation of treatment and control groups), and helps stakeholders reach consensus on the program objectives.

To identify the impact of any intervention, evaluators then need to allow sufficient time for the impacts to manifest. Both short-term and long-term impacts can be considered, depending on the intervention, its objectives, and the theory of change backing the project design. The following factors need to be weighed to determine when to collect follow-up data (Gertler et al. 2011):

- Program cycle (including program duration), time of implementation, and potential delays.

- Expected time needed for the program to affect outcomes, as well as the nature of the outcomes of interest.

- Policymaking cycles.

Performing an evaluation too soon after the intervention may miss important long-term consequences. Also, the evaluation timeline must adapt to the timeline of the project, rather than the evaluation driving the project's timeline. Evaluators therefore need to be flexible regarding timing. A strong monitoring system can help track the progress of the actual implementation.

When sufficient budget is available, it is advisable to conduct multiple surveys (midline and endline), which allow the evaluators to draw short-term and long-term conclusions. In addition, tracking the progress of the intervention with a midline survey may help to realign the program to improve the overall project outcomes. Follow-up surveys that measure long-term impacts after the program implementation often produce the most convincing evidence regarding program effectiveness.

The timing of an evaluation must also account for when certain information is needed to inform decision making, and must synchronize evaluation and data collection activities to key decision-making points. The production of results should be timed to inform budgets, program expansion, or other policy decisions.

Selecting an Impact Evaluation Method for an SME Finance Policy

The operational characteristics of the policy should guide the selection of the impact evaluation method. More concretely, two components of the policy matter when selecting an evaluation approach: i) who is eligible for the program, and ii) how eligible subjects are selected to participate in or receive the program. There is no "one size fits all" impact evaluation approach; the best approach will differ with the situation and the policy's characteristics. An additional factor to consider is whether the evaluation was planned ex ante (before the program has started) or is occurring ex post (during or after program rollout).

SME finance impact evaluations that are planned in advance offer more options for evaluation methods than those conducted after the program or policy has been rolled out. Planning ahead has several advantages. For instance, the evaluator can carry out baseline analysis to establish appropriate comparison groups. Evaluators can also decide whether they need to collect specific data not covered in other sources. Under some circumstances, evaluators can introduce a randomization device to increase comparability of control and treatment groups and thus strengthen the evaluation results.
Figure 2. Suggested Designs for Evaluations Planned Ahead

Planned evaluations can be used even in interventions with no obvious control group. For instance, national interventions that were implemented at the same time everywhere can still be evaluated with a rigorous method. Think of a nationwide intervention in which firms apply to participate in a program. Evaluators might plan ahead for an encouragement device to evaluate the intervention (such as reducing the cost of applying for randomly selected firms). Now think of this same intervention but with the additional constraint of limited fund availability, reducing the number of firms the program can accommodate. Evaluators can use an oversubscription design in which firms from the pool of applicants are randomly assigned to the program while the others are not.

Other very common interventions are those that take place simultaneously and at the national level. Evaluators can still find methods to evaluate the impact of these types of interventions. Think, for instance, of interventions trying to reduce the regulatory costs that SMEs face. We might expect these interventions to have a substantially higher effect on SMEs than on larger firms. If this is the case, then evaluators can plan ahead a difference-in-difference evaluation by comparing the performance of SMEs before and after the intervention with that of larger firms.

Finally, evaluators might be creative and utilize the lack of information among SMEs about new nationwide interventions. Suppose that a regulation easing the requirements to open a business was implemented but not marketed to the public. Evaluators might then plan an encouragement design evaluation in which they randomly provide detailed information on the new regulation only to a subset of firms.

Figure 2 presents a method for selecting the most appropriate evaluation approach when the evaluation is planned ahead of the program implementation. While there is no unique mapping of evaluation approaches to interventions, in general, interventions that clearly distinguish participants from non-participants are good candidates for RCTs. Several public interventions might fall into this category, such as programs providing training or grants to SMEs. In other interventions, such as regulatory reforms, who receives the benefits and who does not might not be as clear. These types of interventions might be more suitable for approaches that randomize the rollout of the implementation sequentially throughout regions or that randomly provide an incentive to some groups to participate in the program.

Figure 3 presents a method to help evaluators select an approach for evaluations that were not planned before the intervention. If, for instance, a credit bureau was established in different regions over time, a difference-in-difference approach can evaluate its impact by comparing the outcomes over time in regions where the credit bureau started (the treatment group) with comparable regions where the credit bureau was not yet implemented (the control group).

Sections V and VI discuss in more detail the main features of each of the impact evaluation methods, providing examples of interventions evaluated using each approach and discussing their main assumptions, advantages, and disadvantages. Appendix 3 discusses several examples of impact evaluations performed for various SME finance policies.
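The branching logic behind these decision aids can be condensed into a toy helper. This is only a sketch of the suggested mapping from a policy's operational characteristics to a candidate design; the function and argument names are ours, and real evaluations require case-by-case judgment:

```python
def suggest_method(planned_ahead, clear_participants=False, oversubscribed=False,
                   staggered_rollout=False, national_simultaneous=False):
    """Toy mapping from policy characteristics to a candidate evaluation design."""
    if planned_ahead:
        if clear_participants:
            # Participants are clearly distinguishable from non-participants.
            return "oversubscription lottery (RCT)" if oversubscribed else "RCT"
        if staggered_rollout:
            return "randomized phase-in"
        if national_simultaneous:
            return "encouragement design, or difference-in-difference against larger firms"
        return "encouragement design"
    # Evaluation planned ex post: randomization is no longer available.
    if staggered_rollout:
        return "difference-in-difference"
    return "propensity score matching or regression discontinuity, depending on the data"

suggest_method(planned_ahead=True, clear_participants=True, oversubscribed=True)
```

For example, a training program with more applicants than slots maps to an oversubscription lottery, while a credit bureau rolled out region by region and evaluated after the fact maps to difference-in-difference.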
Figure 3. Suggested Designs for Evaluations Not Planned Ahead

Steps in the Impact Evaluation Process

This section summarizes the recommended steps that an impact evaluation should follow.4 We classified the main steps into four groups: pre-evaluation assessment, evaluation design, data collection, and analysis of results.

During the pre-evaluation assessment, the team must have a clear understanding of the intervention that will be evaluated. It is important to know its main operational characteristics, such as eligibility criteria for the program and how the eligible SMEs are selected for participation. This information is crucial since these characteristics will be the main factors influencing the selection of the proper impact evaluation method.

At this stage, the team should also identify the objectives for which the policy was designed. Was the policy intended to increase employment of SME workers? Was it planned to raise productivity of rural SMEs? Having clearly defined policy objectives will guide the evaluation team to decide which indicators to monitor throughout the evaluation. For instance, if the policy was intended to increase employment of SME workers, then a natural indicator to evaluate is the number of jobs. Evaluators should identify which indicators they plan to use, keeping in mind the data available to perform the evaluation.

During the evaluation design stage, the team must review whether the indicators to monitor can be retrieved from data already available or whether new data collection is needed. Since collecting data is the most expensive part of an impact evaluation, an effective way to maintain a tight budget is by using preexisting data whenever possible. Based on the intervention's characteristics and the type of data to be used, evaluators must decide on the most suitable impact evaluation approach (mainly, identify which subjects will constitute the treated and control groups). In the next section, we provide some guidelines on how to select the appropriate method.

Data collection is the third step, and it will apply in cases in which evaluators plan to collect new data. This includes the entire process from survey design, to piloting the questionnaires, conducting fieldwork, and validating the data.

In the final stage, evaluators analyze the outcomes in the treatment and the control groups and produce the results. At this stage, the evaluators can determine the impacts of the intervention and present them to the appropriate policy makers. Table 2 outlines the main activities to follow at each step of the impact evaluation process.

Table 2. Steps in the Impact Evaluation Process

I. Pre-evaluation assessment
- Have a clear understanding of the characteristics of the intervention
- Identify objectives of the intervention
- Identify the outcomes/indicators to evaluate

II. Evaluation design
- Review data available to perform evaluation and determine whether new data is needed
- Select an impact evaluation method

III. Data collection (if needed)
- Design survey
- Pilot questionnaires
- Conduct fieldwork
- Process and validate data

IV. Analysis of results
- Produce findings of the evaluation

4 Gertler et al. (2011) provide an in-depth description of a roadmap for impact evaluations.

V. Impact Evaluation Methods—The Experimental Approach

In recent years, randomized experiments, also known as randomized control trials (RCTs), have increasingly become the preferred method of evaluation for many development economists (Duflo and Kremer 2006). The essence of the RCT design lies in randomly assigning some units (individuals or firms) to receive the "treatment" (that is, participation in the program) and others to serve as a control group. Such random assignment allows for a credible attribution of the outcomes observed to the program investigated.

The key reason the RCT methodology has gained so much popularity lies in its ability to address the identification problem—ensuring the outcome of the program or policy would not have occurred in such program or policy's absence.

Box 2 describes an RCT that evaluated a business training program targeted at entrepreneurs in Bosnia and Herzegovina. For a discussion of several prominent examples of RCT evaluations relevant for SME finance policies, see Appendix 3.

Key Assumptions

The key assumption of the RCT evaluation is the random assignment of subjects (such as SMEs) to participate in the program.

Box 2. Public Sector Intervention Evaluation: Business Training in Bosnia and Herzegovina

While access to finance has long been thought of as a constraint on SME growth, another set of constraints has recently emerged—business skills, or "managerial capital," which is thought to be lacking in many entrepreneurs. Thus, business training programs and managerial education have become an important focus for policy makers. Business training programs are a good example of interventions that can be evaluated using RCT because they can be randomly administered to a subset of the SMEs to create a clear control group.

A randomized evaluation of a comprehensive business and financial literacy training program for entrepreneurs ages 18 to 30 was conducted in Bosnia and Herzegovina (Bruhn and Zia 2011). The sample included small businesses with an average of two employees. The course covered basic business concepts and accounting skills, as well as investment and growth strategies, with a particular emphasis on the importance of up-front capital investment. The researchers randomly selected treatment and control groups, and performed baseline surveys in both groups.
Similar to many other RCT studies, this study had a relatively low take-up rate: only 39 percent of those in the treatment group actually attended the business training course; others cited lack of time as the reason for nonattendance. The authors found that the training program led to better business practices, such as separation of business and personal accounts and more favorable loan terms, greater investment, and some improvements in sales and profits (but only among a subsample of entrepreneurs with higher financial literacy). However, the program had no effect on firm survival or business start-up, or on loan default rates. The type of information generated by such studies would enable policy makers to design effective financial literacy training programs and target the subsets of SMEs for which such training programs would be the most effective.

While such assignment is random by design, it must be assumed that SMEs cannot manipulate the program assignment (for example, by moving into or out of the affected areas). In addition, all those assigned to the control group must be credibly excluded from receiving any benefits from the intervention.

Strengths

Clear Comparison Group

The random assignment to participate in the program by design creates a valid comparison group, since individuals or firms are randomly placed in the treatment group or the control group. Hence, placement does not depend on any preexisting characteristics that may influence the outcome of the program. In this case, one can be reasonably well assured that program participation is the only reason different average outcomes are observed in the two groups. In other words, when a randomized evaluation is correctly designed and implemented, it provides an unbiased estimate of the impact of the program in the study sample.

Baseline Data Not Necessary

RCT evaluations can be performed without detailed baseline data, which can save on the costs of data collection. Nevertheless, baseline data are often helpful to verify the assignment and also to study how impacts differ for different subsamples, such as men and women.

Limitations

Not All Policies Are Suitable for RCT

For an RCT to work, there must be a clear distinction between the treatment and control groups. The best candidates for RCTs are programs that are targeted to individuals, firms, or local communities. For example, Ravallion (2009) argues that randomization is not suitable for a large subset of policies important for development economics because most often these policies apply to the whole country, the whole population, or all firms. Investigating such a policy using RCT is unlikely to be feasible because no group can be randomly selected not to receive the "treatment." Examples of such policies within the SME finance framework include most policies affecting legal, regulatory, and supervisory frameworks, as those policies most often are implemented on an economy-wide scale. However, such policies can often be evaluated using encouragement design, a type of RCT (see Box 3), or nonrandomized methods.

Sometimes policies that are intended to affect the whole economy may be designed to allow for randomization or for ex post program evaluation if the rollout happens in stages.

Box 3. Encouragement Design

Encouragement design is likely to be applicable for a wide variety of evaluations of SME-related policies.
This method can be very useful for evaluating policies and interventions that are implemented at the country level, such as most changes in regulatory and supervisory frameworks. Such policies can be evaluated in a semi-randomized fashion. In this method, some units (such as firms or households) selected at random receive incentives to participate in a program that is available to all. Such encouragement can be in the form of information, marketing materials, or financial stimulus. An example of an encouragement design mechanism can consist of reducing the cost of applications for a random subset of SMEs to a guarantee program. If firms receiving the encouragement are more likely to apply to the program, this mechanism will predict program participation. Moreover, as this program is assigned randomly, it will not be correlated with firms’ access to credit, so the incentives can be used to evaluate the impact of the intervention. Global Partnership for Financial Inclusion 22 Impact Assessment Framework: SME Finance enterprise registration reform was rolled out in Power of the Design stages in different municipalities in Mexico (Bruhn One important issue with experimental design is 2008). While the sequence of these events can be the power to detect the program effect. The power credibly seen as exogenous to the outcomes of of the design is the probability that a statistically interest, it was not done randomly. Nevertheless, significant result will be obtained. In other words, Duflo and Kremer (2005) argue that randomly the power is the assurance that the result observed determining the order of phase-in may be a fair way is unlikely due to pure chance. One way to address to introduce a program and also will allow for RCT the issue of power is to ensure a sufficiently large evaluation.5 sample size. Appendix 2 offers more details on the An important limitation of RCT is that it cannot be issues of power and take-up. 
An important limitation of RCTs is that they cannot be used to randomly select the recipients of a loan, as financial institutions need to ensure that their recipients are creditworthy and that the loans will be repaid. The allocation of credit should not alter the risk-assessment process of the bank, because doing so could undermine the viability of the SME finance program. An example of a design that takes this issue into account is Karlan and Zinman (2008). In their study, consumers first applied for loans, and then the pool of marginally rejected candidates was randomly assigned to receive a loan. Such studies may also help banks better refine their credit-scoring methodologies.

Another common issue with evaluating programs using randomized methods is that some individuals or firms must be restricted from access to the program. There may be political opposition to delaying program access to some people or firms, or there may be ethical considerations.

Take-up

Related to the problem of power is the take-up of the program, or the proportion of those affected by a policy or a program—whether individuals, households, or SMEs—that will actually use the program. Any program or intervention's impact will significantly depend on the take-up. For example, not all enterprises will choose to register formally or to obtain a loan even if they are assigned to the "treatment" group that offers a particular intervention. A program that increases the availability of finance may not have the desired impact if SMEs do not actually need more access (but perhaps suffer from high costs of access).

The first challenge with low take-up is that it increases the sample size needed to generate statistically significant differences. The second challenge is one of interpreting program impact (see Appendix 1 for a discussion of technical details): program effects must be carefully interpreted to decide whether the parameter estimated is, in fact, one of policy interest.

Finally, for an RCT evaluation to be feasible, evaluators need to obtain data on a sufficient number of treated versus untreated "units." If the units are individuals or firms, it is most likely that sufficient numbers can be found for a statistically valid comparison. But if, for example, the unit of analysis is financial institutions in a highly concentrated financial sector, then there might not be enough of them to compare one group to the other.

5 However, randomized phase-in may become problematic when the comparison group is affected by the expectation of future treatment. For example, in the case of a phased-in microcredit program, individuals in the comparison group may delay investing in anticipation of cheaper credit once they have access to the program. In this case, the comparison group does not provide a valid counterfactual.

VI. Impact Evaluation Methods—Non-experimental Approaches

Difference-in-Difference

The difference-in-difference (DD) approach is one of the most popular methodologies used in impact evaluation, including assessments of SME finance policies. This methodology compares outcomes before and after an intervention took place, and between the group that received the intervention (the treated group) and a control group. The function of the control group is to take into account changes over time that might also affect the treatment group's outcomes. Thus, by comparing the outcomes of the control group to the outcomes of the treated group, any factors affecting both groups in the same manner are canceled out. As with RCTs, the control group is used to infer what would have happened to the treated group if the intervention had not taken place.

To evaluate an intervention using DD, data on the outcomes of interest for the treatment and control groups are needed from periods before and after the intervention. Figure 4 illustrates the DD effect.6

The DD approach is well suited to evaluate SME interventions in which the implementation of the program took place at different stages (for example, a program that was rolled out across municipalities over time) or in which the implementation was targeted to some groups and not others (for example, a project targeting particular municipalities). The evaluator must understand the reasons for targeting specific groups; if the treatment group was selected to maximize the performance of the intervention, then DD estimates could produce biased results (see Box 4 for a DD impact evaluation example).

Figure 4. The DD Effect (the outcome of interest over time for the treatment and control groups, before and after the intervention)

6 The DD effect is computed through two subtractions. First, changes in the outcomes from periods before and after the policy was implemented are computed separately for both groups. Then, to net out any aggregate trend confounding the impact of the intervention, the change in the control group's outcomes is subtracted from the gains of the treated group.

Box 4. Regulatory Reform Evaluation: Business Registration in Mexico

In 2002, the Mexican Federal Commission for Improving Regulation (COFEMER) implemented a new system that substantially reduced the number of procedures and days required to register a business. The objective of this system was to simplify business registration procedures in Mexico. Due to staff constraints, the system could not be implemented in all municipalities at the same time.
While the system was launched in some municipalities in 2002, others were still in the process of setting it up in 2006. Interestingly, the timing of the implementation across municipalities followed no particular pattern. Bruhn (2011) used this exogenous variation in the timing of implementation across municipalities to evaluate the impact of the business registration reform on economic outcomes. Using a difference-in-difference approach, she classified the municipalities that set up the system early as the treatment group. The control group consisted of municipalities with similar characteristics to those in the treatment group but where the system had not yet been implemented. This approach to examining the impact of the reform is valid as long as changes in economic outcomes over time would have been similar in the absence of the reform. To make sure this was the case, Bruhn examined whether the control municipalities could be used as a proper counterfactual by first establishing that these municipalities were comparable to the treated ones. Using data from periods before the reform, she showed that there were no statistically significant differences in the output data, which diminished concerns about selection bias between control and treatment municipalities. She also verified that both early and late adopters were geographically dispersed throughout Mexico, reducing the contagion issue by which firms from control municipalities could be benefiting from the reform. Her findings suggest that the reform increased the number of registered businesses by 5 percent and employment in these industries by 2.8 percent. By increasing competition, the reform benefited consumers and hurt incumbent businesses: after the reform, the price level fell by 0.6 percent and the income of incumbent registered businesses declined by 3.2 percent.
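The two-subtraction arithmetic described in footnote 6 can be sketched in a few lines. This is a hypothetical illustration: the function name and all group means are made up for exposition, not taken from the Mexican reform.

```python
# Illustrative difference-in-difference computation (hypothetical numbers).
# The DD effect is the change for the treated group minus the change for
# the control group, which nets out any common trend.

def dd_effect(treated_before, treated_after, control_before, control_after):
    """Return the DD estimate from the four group-level average outcomes."""
    treated_change = treated_after - treated_before   # gain of treated group
    control_change = control_after - control_before   # aggregate trend
    return treated_change - control_change            # net program effect

# Hypothetical average outcomes (e.g., firm sales) for each group and period.
effect = dd_effect(treated_before=100.0, treated_after=130.0,
                   control_before=95.0, control_after=105.0)
print(effect)  # 30 - 10 = 20.0, the DD estimate of the intervention's impact
```

Note that the estimate is unbiased only under the equal-trends assumption discussed in the next section; the arithmetic itself cannot detect a violation.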
Key Assumptions

The fundamental assumption of the DD estimator is that the control group trend is identical to the trend that the treated group would have had in the absence of treatment. While this assumption is not testable, its validity should always be carefully examined to ensure that the DD properly estimates the impact of the program. If data are available for several years preceding the treatment, then one straightforward way to assess the validity of this assumption is to analyze whether pretreatment trends were equal between groups. While this does not formally prove the identification assumption (which, as mentioned, is not testable), the equality of pretreatment trends suggests that the treated and control groups are, indeed, comparable and thus reinforces the credibility of the estimates.

However, this assumption might be violated when evaluating interventions in which firms self-select into the program. Take, for instance, an evaluation of a new state bank providing loans to SMEs in which firms have to apply for the loan. Using as a control group those firms that decided not to apply for the loan and as treatment those firms that did apply will very likely produce biased results. Firms that select into SME interventions do so because they expect some gains from their participation, while firms that decide not to participate are likely to expect no substantial gains from it. In this case, the control group is not a good representation of the treatment group. In contrast, if the state bank entered some municipalities and not others for logistical or political reasons, then a more robust comparison would be to use SMEs from municipalities without the state bank as the control group and SMEs from municipalities where the bank entered as the treatment group.

Strengths

DD Controls for Factors that Do Not Vary over Time

One benefit of this approach is that DD estimates control for all differences (observable and not) between control and treated groups that do not change over time, minimizing potential biases in impact estimates.

Limitations

The Key Assumption Is Not Testable

One of the main issues of this methodology is that its underlying assumption (of equal trends that the treatment and control groups would have had without the intervention) is not testable, and if it fails to hold, then the DD impact will be biased.

Targeted Interventions

The estimates could be biased if the intervention targeted groups that are expected to experience higher gains. For instance, if a microcredit intervention was implemented in villages with inherently high demand for credit, then the effect that the program would have in treated villages is potentially different from the effect that it would have in the control group, since the demand for loans in this group is lower. Therefore, it is important to understand the motives behind the intervention's implementation and the choice of the treated group.

Other Changes that Affect One Group and Not the Other

Another issue to consider is that this approach will fail to identify the impact of a policy if any change other than the intervention occurs over time affecting one group and not the other. When using DD, one must be confident that such changes did not occur.

Instrumental Variables

The instrumental variables (IV) approach can be used to evaluate SME interventions in which firms, based on unobserved information, can select whether to participate in the program. Very often entrepreneurs self-select into SME finance projects. For example, an intervention providing public credit guarantees with the objective of increasing firms' access to credit may require that entrepreneurs apply for the guarantee. Firms that expect to benefit from having a public guarantee will apply, while firms that expect little or no benefit from the program will not.

To evaluate interventions of this type, an instrument or set of instruments is required. A valid instrument must be a strong predictor of participation in the intervention and must not be correlated with the outcome variable for reasons other than participation in the intervention (that is, it must be exogenous). In this example, an instrument must predict firms' choice to participate in the public guarantee program but must not influence firms' access to credit for reasons other than participation in the guarantee program.

Once an instrument is identified, the impact of an intervention is computed in two steps. In the first step, the instrument is used to predict program participation. In the second step, the predicted participation (which is independent of the outcome variable) is used to evaluate the intervention's impact.

Box 5 discusses an example of an IV impact evaluation that analyzed the effect of a microcredit program in Thailand.

Key Assumptions

The IV estimates are valid if the instrument:

- Is a strong predictor of participation in the intervention. In the microfinance evaluation example, the evaluators were interested in understanding the impact of credit on economic outcomes of Thai villages. Since the amount of credit injected in all villages through the program was the same, smaller villages ended up receiving a more intense credit injection than larger ones. The evaluators' instrument (interactions between the number of households in a village and the program years) is a good predictor of the intensity of credit received in each village because the number of households determined the intensity of the credit injection.

- Is not correlated with the outcomes evaluated. In the example above, the instrument used for the evaluation (number of households in each village during the program years) must not influence the consumption of Thai households, their investments, and overall asset and income growth except through the effect of the program.7

If these assumptions do not hold, the impact estimates will be biased.

Strengths

IV Controls for Unobserved Information

One benefit of the IV approach is that it controls for unobserved differences between participating and nonparticipating subjects. IV estimates isolate the effect of the intervention from unobserved information that influences self-selection into the program.

Baseline Data Are Not Needed

To estimate the IV impact, baseline data are not needed.

Limitations

Unplanned IV Evaluations Are Rare

Evaluations of an intervention in which an IV design was not planned ex ante are rare, because finding a valid exogenous instrument that predicts participation is extremely challenging.

Box 5. Public Intervention Evaluation: Thailand Microfinance Fund

During 2001 and 2002, a substantial microfinance initiative was implemented in Thailand: Thailand's Million Baht Village Fund Program. This public intervention consisted of injecting funds into all 77,000 Thai villages. The initial funds distributed were significant, corresponding to about 1.5 percent of Thai GDP in 2001. Each transfer was used to form an independent village bank for lending within the village. Importantly, every village, regardless of its characteristics, was eligible to receive the program. This program is among the largest government microfinance initiatives of its kind. Kaboski and Townsend (forthcoming) evaluated the impact that Thailand's Million Baht Village Fund Program had on the economic outcomes of Thai villages using the IV approach.
As each village received the same amount of money regardless of its population, smaller villages received a relatively more intense injection of credit. Due to the nature of the intervention, the expansion of credit in villages by the Thai Fund Program would therefore be correlated with the number of households in a village during the program years. Using these interactions of the number of households and the program years as instruments for the amount of credit received, the authors assessed the impact of the program. Their findings suggest that the Million Baht Village Fund injection of microcredit in villages did increase the overall credit in the economy. Households borrowed more, consumed more, and increased their earnings. A short-term effect of increasing future incomes and making business and market labor more important sources of income was also found. The increased borrowing and short-lived consumption response, despite no decline in interest rates, point to a relaxation of credit constraints. The increased labor income and especially wage rates indicate important spillover effects that may have also affected non-borrowers.

IV Estimates Only Local Effects

A second limitation of the IV approach is that it estimates only local average treatment effects (LATE). This means that the IV estimates measure only the impact that the intervention had on those subjects that were affected by the instrument (Angrist and Kreuger 2001).8 In many cases, these local effects are not necessarily the most important for national policy makers.

Regression Discontinuity

Regression discontinuity (RD) is a non-experimental approach used to evaluate interventions that have a defined cutoff for participation. For instance, a business training project aimed at increasing firms' productivity may be provided only to firms that employed more than 20 workers in the year before the intervention. This exogenous cutoff provides a design that allows the identification of the intervention's impact, since firms at the margin of the threshold would not differ substantially: there would be no reason to believe that a firm with 19 workers is different from a firm with 20 workers.

The assumption of this method is that at the margin of the cutoff, the assignment to the treatment and control groups is close to random. By comparing the outcomes of treated firms (firms with 20 workers) with control firms (firms with 19 workers), evaluators can measure the intervention's effect (see Box 6 for an example).

Graphically, the outcome variable (that is, firms' productivity) should show a discontinuity at the cutoff value (that is, at 20 workers). Figure 5 illustrates this example.

One way to validate the RD estimates is to use pre-intervention data on the treatment and control groups and to analyze whether a discontinuity exists between these two groups at the cutoff (Angrist and Pischke 2009). If no discontinuity is found for pre-intervention periods, then the evidence supports that the discontinuity was generated by the intervention.

Figure 5. Regression Discontinuity (Y = productivity plotted against X = number of workers; the control group lies below the 20-worker cutoff, the treated group above it, with a jump in productivity at the cutoff)

7 The instrument would violate this assumption if, even in the absence of credit, larger Thai villages might have experienced different trends in economic activity or business growth than smaller villages.

8 See also Appendix 1 for a discussion of related issues that arise with RCTs.
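The comparison of firms just above and just below the threshold can be sketched as follows. This is a hypothetical illustration: the function name, firm data, and bandwidth are made up for exposition, and a real RD evaluation would typically fit local regressions on each side of the cutoff rather than compare raw means.

```python
# A minimal regression discontinuity comparison (hypothetical data).
# Firms with 20 or more workers received training; we compare mean outcomes
# of firms just above and just below the cutoff within a narrow bandwidth.

def rd_effect(firms, cutoff=20, bandwidth=2):
    """Difference in mean outcomes between firms just above and just below
    the cutoff; `firms` is a list of (workers, outcome) pairs."""
    above = [y for w, y in firms if cutoff <= w < cutoff + bandwidth]
    below = [y for w, y in firms if cutoff - bandwidth <= w < cutoff]
    return sum(above) / len(above) - sum(below) / len(below)

# Hypothetical firms: (number of workers, productivity). Firms far from the
# cutoff (15 and 25 workers) fall outside the bandwidth and are ignored.
firms = [(18, 50.0), (19, 52.0), (20, 60.0), (21, 62.0), (25, 70.0), (15, 45.0)]
print(rd_effect(firms))  # 61.0 - 51.0 = 10.0, the local effect at the cutoff
```

As the text notes, this estimate is local: it says little about the effect of training on firms far from the 20-worker threshold.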
Box 6. Financial Infrastructure Evaluation—Role of Angel Funds in U.S. Start-up Firms

Most equity funding of SMEs around the world comes from two sources: retained earnings and capital provided by personal savings, friends and family, and other "angel" investors.1 Similar to venture capitalists, angel funds are investors in high-potential start-ups, commonly structured as semiformal networks of high-net-worth individuals who decide to invest in projects of aspiring entrepreneurs based on their own assessments. To evaluate the impact of angel funds on U.S. start-up firms, Kerr, Lerner, and Schoar (2010) obtained information on prospective ventures from a large angel investment group. Using a regression discontinuity approach to evaluate the effect of angel funding on the performance of high-growth start-up firms, the authors compared firms that fell just above and just below the funding criteria of the angel group. The evaluation found a strong, positive effect of angel funding on the survival and growth of ventures.

1 World Bank Enterprise Surveys: http://www.enterprisesurveys.org/.

Key Assumptions

The key assumption behind the RD approach is that the potential outcome (that is, firms' productivity) may be associated with the cutoff variable (that is, number of workers), but in a smooth manner. In other words, in the absence of the intervention, this association should have been smooth at the cutoff. In this way, any discontinuity in the potential outcome at the cutoff is interpreted as a causal effect of the intervention. This is known as the continuity assumption (Van der Klaauw 2008).

Strengths

Baseline Data Are Not Needed

One benefit of using an RD design is that baseline data are not needed to estimate the impact. However, data from pre-intervention periods are strongly recommended to perform robustness checks on the validity of the discontinuity.

RD Estimates Are Comparable to Randomized Estimates

A second advantage of the RD approach is that, from a methodological point of view, a solid RD design is comparable in internal validity to a randomized experiment.

Limitations

Independence of Threshold

The most important issue to consider when implementing an RD evaluation is the validity of the cutoff. If the cutoff was assigned with the objective of maximizing the intervention's impact, then conclusions from the RD will be biased. The cutoff selected must be independent of the expected outcomes of the intervention. Suppose that, in the example of the business training project, firms with at least 20 workers are concentrated in the most developed region of the country. Firms in this region are more likely to have access to finance. Thus, the effects of providing business training are likely to be higher if the cutoff is 20 workers than if it were 15 or 10 workers, since firms above the 20-worker cutoff will also have access to better terms of credit, likely increasing their productivity.

Manipulation of the Assignment

Moreover, RD inferences will be invalid if firms are able to manipulate assignment into the program. For instance, if the cutoff is a specific number of employees, then firms can easily hire one more employee to participate in the intervention, prompting selection issues that contaminate the RD impact. As long as firms are unable to manipulate their eligibility for the program, the RD estimates are valid. In this respect the RD design is more flexible than the IV approach, since the IV methodology requires that the instrument be exogenous to the outcomes, whereas RD requires only that firms be unable to manipulate the assignment (Lee and Lemieux forthcoming).

Sufficient Observations Close to the Cutoff

A second issue with the RD approach is that, in order to measure impact estimates, sufficient observations in close proximity to the cutoff must be available. In the business training example, sufficient firms with 18 to 22 workers (numbers close to the cutoff of 20) would be needed to evaluate the RD effect.

Estimated Parameters Might Not Be the Most Important Ones

As in the case of the IV methodology, RD can only estimate the average treatment effect for observations close to the cutoff (that is, the local treatment effect). This implies that it might be difficult to draw conclusions about the impact of the intervention for firms away from the cutoff of 20 workers.

Propensity Score Matching

Propensity score matching (PSM) is a non-experimental approach that can be used to analyze the impact of an SME intervention in which (1) the institutional arrangements that defined selection into the project are known by the evaluator and (2) a control group was not maintained. Under these circumstances, the PSM approach can identify a control group from the group of firms not participating in the program.

The intuition of this method is to find a control group whose observable characteristics are similar to those of the treated group but that did not participate in the intervention. The impact of the intervention is then measured as the difference in outcomes between the treated group (that is, firms participating in the program) and the control group (comparable firms not participating in the program). The approach matches treated firms to non-treated ones using propensity scores that summarize all observable information used to assign treatment (or eligibility for the program). Thus, PSM can be used to identify a control group that is statistically equivalent to the treatment group. As in all other approaches, the control group is used to infer what would have happened to intervention participants without it.

To compute the propensity score, one must estimate the conditional probability of participating in the intervention as a function of the observed characteristics.9 These characteristics are then aggregated into the score. Once a control group is identified, the impact of an intervention is measured by the difference in outcomes between the treated and control groups (see Box 7 for an example).

Key Assumptions

The assumption underlying the PSM estimates is known as the conditional independence assumption. This assumption implies that, after controlling for observable differences between the treated and control groups, the outcome in the absence of the intervention would be the same in both cases. Thus, conditional on the score, any differences between the treated and control groups are attributed to the effect of the intervention.

In other words, this assumption implies that using observed information from SMEs is enough to identify a statistically equivalent control group. This assumption is unlikely to hold in SME interventions in which firms self-select to participate based on factors that are difficult to observe in the data, such as entrepreneurial attitudes, managers' skills, or risk aversion. If these unobserved factors are driving firms' participation in the program, then the PSM approach will fail to identify a proper control group.

9 The conditional probability can be estimated through a probit or a logit model in which the dependent variable is an indicator variable equal to 1 if the subject participated in the intervention, and 0 otherwise. The independent variables are the observed characteristics that determined participation in the intervention.
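The score-and-match logic described above can be sketched as follows. This is a hypothetical illustration: it assumes the propensity scores have already been estimated (for example, with a logit model as in footnote 9), the function name and all numbers are made up, and it uses simple one-to-one nearest-neighbor matching rather than the more refined matching schemes used in practice.

```python
# A sketch of propensity score matching (hypothetical data). Each treated
# firm is matched to the untreated firm with the nearest propensity score,
# and the impact estimate is the mean outcome difference across matches.

def psm_effect(treated, untreated):
    """treated/untreated: lists of (propensity_score, outcome) pairs."""
    diffs = []
    for score, outcome in treated:
        # nearest-neighbor match on the propensity score
        _, match_outcome = min(untreated, key=lambda u: abs(u[0] - score))
        diffs.append(outcome - match_outcome)
    return sum(diffs) / len(diffs)

# Hypothetical firms: (estimated propensity score, outcome such as sales).
treated = [(0.8, 120.0), (0.6, 100.0)]
untreated = [(0.78, 110.0), (0.61, 95.0), (0.2, 70.0)]
print(psm_effect(treated, untreated))  # ((120-110) + (100-95)) / 2 = 7.5
```

The estimate is only as good as the conditional independence assumption: if unobserved factors drive participation, the matched control group is not a valid counterfactual, which motivates combining PSM with DD as discussed below.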
Strengths

PSM Makes It Possible to Identify a Control Group When the Eligibility Criteria Are Known and Observed

The overall advantage of the PSM approach is that a control group can be identified when the selection process is known and observed. The PSM approach is especially useful when several characteristics influence eligibility for an intervention, since it provides a natural weighting scheme (the score) that yields unbiased estimates of the intervention effect (Dehejia and Wahba 2002).

Limitations

PSM Is Data Intensive

Data on sufficient firms, with detailed information on their characteristics, are needed to identify a control group that is statistically identical to the treated group.

PSM Does Not Control for Unobserved Self-selection

If unobservable characteristics also influence participation in the intervention and outcomes (self-selection issues such as the ones discussed in the example), then PSM by itself is not an appropriate method. This could be the case when participating entrepreneurs or firms self-select into the intervention for reasons that also influence their performance. Evaluations using PSM in these situations tend to at least combine PSM with an alternative approach, such as DD, in order to remove the bias due to time-invariant unobservable characteristics (such as motivation, skills, or risk aversion).

Eligibility Criteria Must Not Be Affected by the Intervention

Another issue to take into consideration when using PSM is that information from the institutional arrangements of the intervention is needed to identify the participant selection characteristics (Caliendo and Kopening 2008). For valid PSM estimates, these variables must not be affected by participation in the intervention.

Box 7.
Public Intervention Evaluation: Chile's Supplier Development Program

In Chile, the Suppliers Development Program encouraged large firms to invest in the training of their SME suppliers, strengthening the linkage between large (potentially exporting) firms and SMEs. Large firms participating in the program were expected to provide professional advice, personnel training, technical assistance, or technology transfer to their SME partners. The program would then subsidize the cost of these activities. Each project participating in the program consisted of one large firm that sponsored the knowledge transfer and at least 20 SMEs in the agriculture and forestry sector, or at least 10 SMEs in other economic activity sectors. An evaluation of the program was done by Arraiz, Henriquez, and Stucchi (2011). Administrative data allowed the evaluators to follow beneficiary and non-beneficiary firms for several years before and after the program was in place. To identify a control group, the evaluators estimated the propensity score of participating in the program using firms' information from 2002, the year before the beneficiaries started participating in the program. The score helped the evaluators determine a control group composed of firms that did not take part in the program but that had similar probabilities of participating. A concern of the evaluators was that unobserved characteristics of firms (such as managers' skills or motivation) could have influenced their participation in the program and their success in it. In such cases, the PSM approach should be combined with other evaluation methods that control for unobserved information that might influence self-selection. The evaluators therefore combined PSM with the DD approach, since DD estimates control for all unobserved differences between the treated and control groups that do not change over time.
After identifying their control group through PSM, the evaluators estimated the DD effect of the program. The evaluation found that both local SME suppliers and large firms benefited from participating in it. Local SMEs that participated in the program increased sales and employment. Large firms increased their sales and their likelihood of becoming exporters.

VII. Minimal Standard Monitoring

Minimal standard monitoring typically refers to before-and-after comparisons that monitor over time the performance of the subjects affected by an intervention. The main distinction between minimal standard monitoring and an impact evaluation approach is that minimal standard monitoring does not follow a control group to learn what would have happened to the treatment group in the absence of the intervention.

Suppose, for instance, that evaluators are interested in analyzing the impact that a public credit program has on the profits of SMEs. To do minimal standard monitoring, the only data needed would be information on the profits of participating SMEs, collected before and after the program. The before-and-after effect is then measured by the difference in the average profits before and after the program.

An advantage of this approach is that evaluators only need information on the subjects of interest before and after the reform took place. Compared to rigorous impact evaluations, this approach demands the least amount of data.

A second advantage regards budget. While the cost of rigorous impact assessments and before-and-after comparisons should not differ substantially if data collection is not needed, impact assessments still need to reserve budget for the monitoring costs of the evaluation and researchers' time, whereas in minimal standard monitoring, if these costs exist, they should be lower.

The drawback of using before-and-after comparisons is that there is no control group that allows us to know what would have happened if firms had not received the intervention. With this method, the odds of falsely attributing an effect are large. This method can only identify how subjects change over time. Part of these changes might be attributed to the intervention, but any other factor changing over time in parallel to the intervention (such as economic growth or changing macroeconomic conditions) will contaminate the evaluation. Therefore, we are not able to confidently measure and isolate the impact of the intervention.

VIII. Conclusions

As stated in the SME Finance Policy Guide (GPFI 2011), further work is needed on impact assessment techniques for SME finance policies and interventions. Only a handful of rigorous studies exist. More studies are needed on a wider range of policies in a number of different institutional settings to learn what works, where, and why. To identify good practice models, it is important to examine whether the results of certain policies can be repeated in other environments.

This Framework is intended as a resource for policy makers and regulators to select adequate approaches to evaluate SME finance policies and interventions. While the focus of the Framework is on SME finance policies, the methods described can be applied to evaluate a broader set of SME interventions. The paper reviews a variety of impact evaluation methods—randomized experiments, difference-in-difference, propensity score matching, and regression discontinuity designs—and provides recommendations on how to map the various techniques to interventions spanning regulatory and supervisory frameworks, financial infrastructure programs, and public interventions.

It is important to understand and consider all possible evaluation options and not focus on any single approach, such as randomization. While randomization has many advantages, it is not necessarily the optimal choice in all situations, and it has its own limitations that need to be addressed in carefully planned and implemented studies. Impact evaluation studies should be driven by important policy questions rather than by methods of evaluation.

McKenzie (2010) argues that the SME sector is one area that is particularly full of unexploited possibilities for impact evaluations: "SME focused policies are typically carried out by governments and international financial institutions (IFIs) rather than NGOs, and are too expensive usually for researchers to fund the program on offer themselves. As a result, there is a real knowledge gap—and an opportunity to be grasped. If governments and operations staff at IFIs can work with researchers in evaluating the many projects being implemented, it should be possible to evaluate rigorously many of the policies being carried out for SMEs and to learn where modifications of existing strategies are needed."

In summary, more work is needed to evaluate the wide variety of SME finance policies, and international organizations are well suited to fill in these knowledge gaps. As Duflo and Kremer (2005, p. 342) state, "The benefits of knowing which programs work and which do not extend far beyond any program or agency, and credible impact evaluations are global public goods in the sense that they can offer reliable guidance to international organizations, governments, donors, and NGOs beyond national borders."
References

Angrist, Joshua D., and Guido Imbens. 1994. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62 (2): 467–75.

Angrist, Joshua D., and Alan B. Kreuger. 2001. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments." Journal of Economic Perspectives 15 (Fall): 69–85.

Angrist, J. D., and J. S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.

Arraiz, I., F. Henriquez, and R. Stucchi. 2011. "Impact of the Chilean Supplier Development Program on the Performance of SME and Their Large Firm Customers." Working Paper, Inter-American Development Bank, Washington, DC.

Ashraf, Nava, Dean Karlan, and Wesley Yin. 2006. "Household Decision Making and Savings Impacts: Further Evidence from a Commitment Savings Product in the Philippines." Working Paper 939, Economic Growth Center, Yale University, New Haven.

Bauchet, Jonathan, C. Marshall, L. Starita, J. Thomas, and A. Yalouris. 2011. "Latest Findings from Randomized Evaluations of Microfinance." Consultative Group to Assist the Poor Report No. 2, Washington, DC, December. http://www.cgap.org/gm/document-1.9.55766/FORUM2.pdf.

Bruhn, Miriam. 2011. "License to Sell: The Effect of Business Registration Reform on Entrepreneurial Activity in Mexico." Review of Economics and Statistics 93 (1): 382–86.

———. 2008. "License to Sell: The Effect of Business Registration Reform on Entrepreneurial Activity in Mexico." Policy Research Working Paper 4538, World Bank, Washington, DC.

Bruhn, Miriam, and I. Love. 2009. "The Economic Impact of Banking the Unbanked: Evidence from Mexico." Policy Research Working Paper 4981, World Bank, Washington, DC.

Bruhn, Miriam, and Bilal Zia. 2011. "Stimulating Managerial Capital in Emerging Markets—The Impact of Business and Financial Literacy for Young Entrepreneurs." Policy Research Working Paper 5642, World Bank, Washington, DC.

Burgess, R., and R. Pande. 2005. "Can Rural Banks Reduce Poverty? Evidence from the Indian Social Banking Experiment." American Economic Review 95 (3): 780–95.

Caliendo, M., and S. Kopening. 2008. "Some Practical Guidance for the Implementation of Propensity Score Matching." Journal of Economic Surveys 22: 31–72.

Cole, Shawn, T. Sampson, and B. Zia. 2011. "Prices or Knowledge? What Drives Demand for Financial Services in Emerging Markets?" Journal of Finance 66 (6): 1933–67.

———. 2009. "Financial Literacy, Financial Decisions, and the Demand for Financial Services: Evidence from India and Indonesia." Working Paper 09-117, Harvard Business School, Cambridge, MA.

Dehejia, R., and S. Wahba. 2002. "Propensity Score Matching Methods for Non-Experimental Causal Studies." Review of Economics and Statistics 84 (1): 151–61.

De Janvry, Alain, C. McIntosh, and E. Sadoulet. 2008. "The Supply- and Demand-Side Impacts of Credit Market Information." University of California–San Diego, San Diego, unpublished.

De Mel, Suresh, D. McKenzie, and C. Woodruff. 2008a. "Are Women More Credit Constrained? Experimental Evidence on Gender and Microenterprise Returns." American Economic Journal: Applied Economics 1 (3): 1–32.

———. 2008b. "Returns to Capital: Results from a Randomized Experiment." Quarterly Journal of Economics 123 (4): 1329–72.

Duflo, Esther, and Michael Kremer. 2005. "Use of Randomization in the Evaluation of Development Effectiveness." In Evaluating Development Effectiveness, ed. Osvaldo Feinstein, Gregory K. Ingram, and George K. Pitman, 205–32. New Brunswick, NJ: Transaction Publishers.

Duflo, Esther, and Emmanuel Saez. 2003. "The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment." Quarterly Journal of Economics 118 (3): 815–42.

Gertler, Paul J., Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch. 2011. "Impact Evaluation in Practice." World Bank, Washington, DC.

Global Partnership for Financial Inclusion (GPFI). 2011. "SME Finance Policy Guide." Paper on behalf of the Global Partnership for Financial Inclusion. IFC, Washington, DC.

Imbens, Guido W., and Jeffrey M. Wooldridge. 2009. "Recent Developments in the Econometrics of Program Evaluation." Journal of Economic Literature 47 (1): 5–86.

Kaboski, Joseph P., and Robert M. Townsend. Forthcoming. "A Structural Evaluation of a Large-Scale Quasi-Experimental Microfinance Initiative." Econometrica.

Karlan, Dean, and Jonathan Zinman. 2010. "Expanding Credit Access: Using Randomized Supply Decisions to Estimate the Impacts." Review of Financial Studies 23 (1): 433–64.

———. 2009. "Observing Unobservables: Identifying Information Asymmetries With a Consumer Credit Field Experiment." Econometrica 77 (6): 1993–2008.

———. 2008. "Credit Elasticities in Less-Developed Economies: Implications for Microfinance." American Economic Review 98 (3): 1040–68.

Kerr, W. R., J. Lerner, and A. Schoar. 2010. "The Consequences of Entrepreneurial Finance: A Regression Discontinuity Analysis." Working Paper 10-086, Harvard Business School, Cambridge, MA.

Lee, D. S., and T. Lemieux. Forthcoming. "Regression Discontinuity Designs in Economics." Journal of Economic Literature.

McKenzie, David. 2010. "Impact Assessments in Finance and Private Sector Development: What Have We Learned and What Should We Learn?" World Bank Research Observer 25 (2): 209–33.

McKenzie, David, and Christopher Woodruff. 2008. "Experimental Evidence on Returns to Capital and Access to Finance in Mexico." World Bank Economic Review 22 (3): 457–82.

Ndovie. 2010. "Malawi Business Environment Strengthening Technical Assistance Project (BESTAP) Impact Evaluation." Presentation, Dakar.

Ravallion, Martin. 2009. "Should the Randomistas Rule?" The Economists' Voice (February). www.bepress.com/ev.

Storey, D. J., and J. Potter. 2007. "OECD Framework for the Evaluation of SME and Entrepreneurship Policies and Programme." Organisation for Economic Co-operation and Development (OECD), Paris.

Todd, Petra E., and Kenneth I. Wolpin. Forthcoming. "Structural Estimation and Policy Evaluation in Developing Countries." Annual Review of Economics.

Van der Klaauw, W. 2008. "Regression Discontinuity Analysis: A Survey of Recent Developments in Economics." Labour 22 (2): 219–45.

Winters, P., L. Salazar, and A. Maffioli. 2010. "Designing Impact Evaluations for Agricultural Projects." Impact Evaluation Guidelines, Strategy Development Division, Technical Notes IDB-TN-198, Inter-American Development Bank, Washington, DC.

World Bank. 2012. "Impact Evaluation Toolkit." World Bank, Washington, DC.

Appendix 1. General Concerns

As discussed throughout the Framework, each method has its limitations; however, a number of concerns apply to all impact evaluation methods. In this section we review such general concerns.

Biases—Selection, Attrition, and Spillovers

All impact evaluations face selection bias and need to have a credible way of addressing it. RCTs are best for addressing selection bias because they randomly assign units to be treated. However, several other sources of bias may still crop up in an RCT and may also be an issue in other types of impact evaluation.

One common problem is that the mere fact of being assigned to participate in a program (whether or not such assignment is done randomly) may cause

Box 8. Changes in Behavior in Response to Program Assignment

One common concern with impact evaluations is that they can change the behavior of treatment and control groups.
For example, if the treatment group receives a loan or a training program while the control group does not, then the treatment group may see this as a positive boost to entrepreneurs' morale, which may have an effect on their effort. This would contaminate the pure impact of the loan because the impact may be due to a short-term boost in morale and increased effort, and not to the additional finance or training content.

On the other hand, individuals or firms in the control group may change their behavior in response to not being assigned into the program. For example, if some areas are affected and others are not, then individuals may move across the border into (or out of) the affected areas. In a delayed phase-in situation, when one area receives an intervention while another expects to receive it in the future, the prospect that the intervention is coming is likely to affect behavior in the control group. Another example would be a program that involves collecting accounting data on firms as part of the baseline analysis. Here, the firms that are not in the treatment group may still change their behavior because their accounting data are collected and observed by the evaluators. Thus, even if randomized methods have been employed and the intended allocation of the program was random, the differences in behavior may contaminate this random assignment and produce biased results. Other approaches may also be subject to such sources of bias.

One advantage of experiments is that they can explicitly address any possible changes in behavior. For example, in Ashraf, Karlan, and Yin's 2006 study of a commitment savings account, the change in behavior for those who received information about the new account could come simply because of the reminder about the importance of savings.
To deal with this possibility, the researchers introduced another treatment group that received marketing on the existing savings product, which also served as a reminder about the usefulness of savings. Thus, the possibility that the outcome for the new type of account was simply due to the change in savings behavior could be eliminated by adding this third group.10

10 However, adding another group affects the issues of power, discussed above. This may explain why Ashraf, Karlan, and Yin (2006) find insignificant estimates for the coefficients on the third group.

the treatment or comparison group to change its behavior, which may contaminate the results of the experiment (see Box 8).

In addition, there may be spillover effects from those participating in a program in comparison to those that do not. For example, a program designed to enhance the financial literacy of entrepreneurs may have spillover effects on those not receiving the program, so that their literacy increases as well. This can easily happen if both treated and non-treated entrepreneurs belong to the same business association or have other social connections. Spillovers may also come from redistribution of resources by the government. For example, if some villages are positively affected by the experiment but others are not, then the local government may find other ways to channel resources to unaffected villages (Ravallion 2009).

If the spillover effects on non-treated individuals are generally positive, then the impact estimates will be smaller than they would have been without spillovers. This problem affects both randomized and nonrandomized evaluations. In some cases the experiments can be designed to directly measure the spillovers. For example, in their study of information and 401(k) participation, Duflo and Saez (2003) randomized the offer of an incentive to attend an information session at two levels. First, a set of university departments was randomly chosen for treatment, and then a random set of individuals within treatment departments was offered the prize. This allowed the authors to explore both the direct effect on attendance and plan enrollment of being offered an incentive and the spillover effect of being in a department in which others had been offered incentives.

Finally, there could be differences in attrition rates (that is, dropout) between treatment and control groups, which may also affect the results.11

Scaling Up and Systematic Effects

Many program evaluations, especially RCTs, are often of small-scale interventions and might have a different impact if implemented on a large scale.12 For example, capital grants or directed loan programs for SMEs offered by governmental financial institutions may crowd out private sector loans. In the long run, capital grants may skew the incentives of microentrepreneurs, who will be waiting for grants rather than efficiently running their businesses. Such effects may be particularly important for assessing the welfare implications of scaling up a program. Scaling up programs raises several other issues (see Box 9).

Another example would be a small-scale training program that improves participants' chances to obtain a job. However, scaling up such a program may not necessarily raise aggregate employment because in a world with a fixed number of jobs, a training program could only redistribute the jobs (see Imbens and Wooldridge 2009).

11 Attrition refers to a situation in which individuals or firms leave the sample observed by researchers. This could be due to closures for firms, a move for individuals or firms, or simply refusing to participate in subsequent surveys.
If there are systematic differences in the attrition rates in the two groups, then the results may be biased in either direction. For example, if improving access to finance allows the weakest firms to survive, then the differences in attrition will make the group with access look weaker because it has a higher proportion of the weakest firms.

12 In technical terms, RCTs estimate what are known as partial equilibrium treatment effects, which may differ from general equilibrium treatment effects (Duflo and Kremer 2005).

Box 9. Scaling Up Small Interventions

Scaling up a small program raises several additional issues.

Incentives. Most of the RCTs have been implemented by nongovernmental organizations (NGOs) or researchers, who are highly motivated to achieve the best possible outcome of the experiment. In addition, researchers often select the best NGOs to work with and test some of the products highly relevant to NGOs' work and image. Thus, experiments are often done under a set of ideal conditions, which may not be possible to replicate or scale up. The outcomes might be significantly different when the same program is implemented by government officials with a very different set of incentives (Ravallion 2009).

Allocation of resources. It is plausible that significantly more resources are allocated to the program during an experimental phase than would have been under a more realistic situation or in a less favorable context. Alternatively, such bias could go the other way if the first phase of an experiment does not produce significant results because of ineffective implementation. However, the knowledge generated from the first phase would make subsequent phases more effective. Thus, it is important to understand the institutional and implementation factors that may make the same program successful in one place but not another.

Different outcomes.
In an experimental setting, some firms with potentially low impact are mixed in with firms with potentially high impact from the same program because of the random assignment. If the program is scaled up, then the most likely takers will be firms with potentially high impact. Thus, the outcomes of a national program can be fundamentally different from those of an experiment because of the different types of individuals or firms participating (Ravallion 2009).

External Validity

In impact evaluation discussions, it is common to see references to the internal and external validity of the evaluation. Internal validity refers to ensuring that the measured impact is indeed caused by the intervention being tested, while external validity refers to the confidence that the impact measured in a specific study would carry over to other samples or populations.

RCTs in general have a good track record for ensuring internal validity (aside from the issues discussed above, which often can be addressed). However, RCTs are often criticized on the basis of their external validity (that is, the transferability of the results to other situations, such as different samples of firms, or variations in policies or countries). For example, a specific program that was found effective for one type of firm in one country may not be effective for different types of firms in the same country or for the same type of firm in other countries. Alternatively, a program with some minor variation from the one being tested may or may not be effective in the exact same situation as the one tested. While issues of external validity arise with other evaluation techniques, they more often appear in the context of RCTs.

One way to address external validity concerns is to replicate the evaluations in various settings. It is important to test how robust different programs are in different settings to produce valuable implementation knowledge. However, extensive replication is expensive and time-consuming.13

Another way to alleviate external validity concerns is to couple experiments with the theory of why the program is expected to work (see Duflo and Kremer 2005).

13 In addition, researchers are unlikely to be interested in running the same program in different settings because the lack of novelty will greatly reduce the chances of publication.

Appendix 2. Size and Power of RCT

Sample sizes, as well as other design choices, will affect the power of an experiment. For example, if there are too few units in the treatment or control groups, then the comparison of averages may not produce statistically significant results simply due to the small sample. This can lead to erroneous conclusions: the program may be deemed to have a significant effect when it actually does not, or the program may be deemed ineffective when it actually is effective.

The issue of power in RCTs can be addressed by ensuring a sufficient number of observations in each group and optimally dividing the proportion of individuals in treated and control groups based on the relative costs of treatment versus data collection. The larger the expected difference between treatment and control groups (that is, the effect size), the smaller the sample size needed for equal power.

Larger sample sizes are needed when there are several treatment groups and the researcher is interested in detecting the differences between various treatments in addition to detecting differences between treatment and control groups. Moreover, if researchers are interested in the effect of the program on a subgroup—for example, the impact on female entrepreneurs relative to males—then the experiment must have enough power for this subgroup. This is nontrivial, especially in samples where female entrepreneurship is significantly less likely, which is not uncommon. Stratification methods can be used to ensure a sufficient number of female entrepreneurs in the sample.

In some situations, the evaluation design concerns groups of individuals or firms (for example, by randomly selecting villages and treating all individuals in a village), as the errors are likely to be correlated within the group. The larger the groups that are randomized, the larger the total sample size needed to achieve a given power.

Low take-up exacerbates the issues of power because it reduces the number of units on which to base the statistical analysis. For example, consider a program such as a new loan product or a business training that aims to raise the profits of the microenterprises undertaking the program by 25 percent. A randomized experiment that offered the program to half the firms and used a single follow-up survey to estimate its impact would require a sample size of 670 firms if take-up was 100 percent, but would need a sample size of 2,700 with 50 percent take-up and 67,000 with 10 percent take-up.14

Thus, one solution to the problem of low take-up is to employ a very large sample so that the resulting sample will still contain enough firms or households to enable the researchers to detect a program impact of a given size. An example of a randomized experiment with sample sizes of this magnitude is seen in Karlan and Zinman (2009), where 58,000 direct-mail offers were randomly sent by a South African lender, with 8.7 percent of those contacted applying for a loan. However, the downside is that this solution can be very expensive and therefore not feasible in many situations.
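The way these sample-size requirements scale with take-up can be sketched with a standard two-sample power formula. The calculation below is illustrative only: it assumes a standardized outcome, equally sized arms, and an intention-to-treat effect diluted in proportion to take-up. The 670/2,700/67,000 figures in the text reflect the Framework's own variance and design assumptions, but the quadratic penalty for low take-up is the same.

```python
from statistics import NormalDist

def total_sample_size(effect_size, take_up=1.0, alpha=0.05, power=0.80):
    """Total N for a two-arm trial with an equal split, comparing means of a
    standardized outcome. With partial take-up, the intention-to-treat effect
    is diluted to take_up * effect_size, so the required N grows as
    1 / take_up**2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    diluted = effect_size * take_up     # effect actually visible in the data
    n_per_arm = 2 * (z_alpha + z_beta) ** 2 / diluted ** 2
    return 2 * n_per_arm

full = total_sample_size(0.25)               # effect of 0.25 SD, full take-up
half = total_sample_size(0.25, take_up=0.5)  # 4 times as many units needed
ten = total_sample_size(0.25, take_up=0.1)   # 100 times as many units needed
```

Stratification and clustering change the formula (clustered designs multiply N by a design effect), but the take-up penalty is always quadratic, which is why restricting a study to units likely to take up the program is so effective at restoring power.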
The second solution to the low-power problem is to restrict the study to a group of units for which take-up would be much higher. For example, a business training program could be advertised to all eligible firms, and then the number of slots available in the program could be randomly allocated among the group of interested firms. Presumably, the take-up would be higher if the firms have already expressed interest in the training. An example of such a design is seen in Karlan and Zinman (2008), in which consumers first apply for loans and then the pool of marginally rejected candidates (all of whom wanted a loan) is randomly assigned to receive a loan.

The advantage of the second approach is that it requires much smaller samples to detect a treatment impact. The downside is that the program impact estimated will apply only to the self-selected group of individuals or firms that expressed interest in the program, not to the general population. For example, policy makers might be interested in the effect of the loan program on all firms or on firms interested in taking up credit. But an evaluation such as Karlan and Zinman (2008), based on the marginal applicants, only informs researchers of the impact on those firms that fall within a narrow band in terms of their creditworthiness according to the specific credit-scoring model used by the bank. Such firms may be different in important ways from the general population of firms. Thus, this experiment cannot be used to evaluate the impact of credit on all firms that desire credit or on the poorest segments of the population.

Appendix 3.
Examples of Impact Evaluations

A discussion of several evaluation approaches by type of intervention is provided below.

Regulatory and Supervisory Frameworks

Entry of a New Bank in Mexico (DD Evaluation)

Bruhn and Love (2009) evaluate the impact on economic activity of the opening of a major bank in Mexico. In 2002, Banco Azteca opened more than 800 branches across the country. Branches were opened on the same day inside all of the preexisting stores of its parent company, Grupo Elektra.

Since Azteca entered only in municipalities with a preexisting Elektra store, these municipalities were used as the treatment group, and municipalities with similar characteristics but no Elektra store were used as the control group. Employing a difference-in-difference approach, the authors analyze the effect that Azteca had by comparing outcomes before and after it opened in both treatment and control municipalities. The gains from the opening of Banco Azteca are then the difference between the changes over time in treated municipalities and control municipalities.

The authors find that this bank had a significant impact on the economic activity of individuals belonging to the informal sector. Its opening increased the proportion of informal business owners by 7.6 percent and led to a higher proportion of women working as wage earners. Additionally, Azteca's opening increased income by about 9 percent for women and by about 5 percent for men.
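The difference-in-difference arithmetic behind an evaluation like this is simple enough to write out directly: the estimate is the change over time in the treated group minus the change over time in the control group. The sketch below uses invented numbers, not data from Bruhn and Love (2009).

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Each argument is a list of municipality-level outcome averages.
    Returns (treated change over time) - (control change over time)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )

# Hypothetical shares of informal business owners, before and after entry:
effect = diff_in_diff(
    treated_pre=[0.20, 0.22, 0.18],
    treated_post=[0.27, 0.30, 0.24],
    control_pre=[0.21, 0.19, 0.20],
    control_post=[0.23, 0.21, 0.22],
)
# Both groups improved over time, but the treated group improved by more;
# the DD estimate attributes the extra improvement to the intervention.
```

In practice the same estimate is usually obtained from a regression of the outcome on a treatment dummy, a post-period dummy, and their interaction, which also yields standard errors and allows additional controls.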
Bank Branching Regulation in India (IV Evaluation)

Between 1977 and 1990, the Reserve Bank of India mandated that in order to open a branch in a location that already had bank branches, Indian banks had to open four branches in locations without banks. This policy expanded the presence of banks in the rural areas of Indian states.

Burgess and Pande (2005) used an instrumental variables approach to evaluate the impact of this policy on poverty outcomes. The instruments were the policy-induced trend reversals in the relationship between a state's initial financial development and its rural branch growth. In other words, states that were less financially developed in 1961 were less likely to receive bank branches in the periods outside the reform and substantially more likely to receive them during the years of the reform. As these trend reversals were significant in the years of the reform and had no direct impact on poverty outcomes, these instruments proved to be valid.

The evaluation concluded that rural branch expansion in India significantly reduced rural poverty. The reductions in rural poverty were linked to increased savings and credit provision in rural areas. By promoting the expansion of financial services into rural areas, this intervention allowed rural households to rely on more efficient mechanisms to accumulate capital and to obtain loans for longer-term productive investments.

Financial Infrastructure

Credit Information in Guatemala (RCT Evaluation)

The availability of information to evaluate SME creditworthiness is among the key institutional constraints limiting the expansion of SME finance. Credit registries and bureaus can be an effective way to generate such information, as they contain historical information on repayment rates and current information on obligations. The establishment of credit bureaus is one of the policies that is likely to have economy-wide impact and is thus difficult to evaluate using an RCT.

De Janvry et al. (2008) used an encouragement design to examine the impact of the introduction of a credit bureau in Guatemala (see also Boxes 3 and 5). They found that awareness of the existence of a credit bureau was very low in surveys conducted soon after its implementation. They therefore randomly informed a subset of 5,000 microfinance borrowers about the existence of the bureau and how it worked. They found that awareness of the bureau led to a modest and temporary increase in repayment rates and to microfinance groups ejecting their worst-performing members.

Public Sector Interventions

Financial Support to Microenterprises in Sri Lanka and Mexico (RCT Evaluations)

Financing support for SMEs—whether through lines of credit, directed credit, cofinancing, equity financing, or other forms of direct financial assistance—is a popular form of intervention. Such interventions are based on the premise that a lack of finance hampers entrepreneurs, that market failures prevent them from obtaining necessary capital, and that an injection of finance can therefore put them on a path of increasing returns. However, credibly evaluating such programs requires distinguishing those that received the financial injection from those that did not, which is difficult because of self-selection issues (that is, enterprises that end up receiving a loan or a grant differ on many parameters, often unobservable, from those that do not receive such assistance).

Two recent studies use RCTs to evaluate the effectiveness of grants to enterprises. De Mel, McKenzie, and Woodruff (2008b) study microenterprises in Sri Lanka, and McKenzie and Woodruff (2008) replicate the same experiment in Mexico. Grants between US$100 and US$200 were given to a randomly selected subset of microenterprises in each country. The authors find that the grants substantially raise incomes for the average firm receiving a grant and estimate real returns to capital of 5.7 percent per month in Sri Lanka and 20 percent per month in Mexico, much higher than market interest rates in both countries. In addition, the returns are highest for high-ability, credit-constrained firm owners, which is consistent with the view that credit market failures prevent talented owners from getting their firms to an optimal size. Interestingly, these studies find that the impact was similar whether the grants were given in cash or in the form of equipment or raw materials. On the flip side, the studies found that while one-time grants succeed in raising the incomes of poor business owners, they do not lead to significant job creation. Another surprising result of these studies is that grants did not raise the incomes of self-employed women; subsequent research has attempted to understand the reason for this result (De Mel, McKenzie, and Woodruff 2008a).

Studies like these can help policy makers design more effective interventions; however, more evidence may be needed before recommending that policy makers implement grant programs on a wide scale. Specifically, replicating similar experiments in other countries and with a variety of populations would show whether such policies would prove beneficial in other environments. In addition, while a small-scale intervention may be very helpful to those receiving the grants, the general equilibrium effects of implementing such policies on a wider scale need to be properly understood and investigated.

Financial Literacy Programs in Indonesia and the Dominican Republic (RCT Evaluations)

Financial literacy has come to play an increasingly prominent role in financial reform in both developed and developing countries, and is portrayed in global policy circles as a solution for many recent crisis-related financial problems. Many countries have set up financial literacy panels that are charged with developing financial literacy programs.

A recent study in Indonesia was designed to evaluate the causal relationship between financial literacy and demand for financial services (Cole, Sampson, and Zia 2011). The authors offered seminars to randomly selected groups and educated participants on the benefits of and the procedure for opening savings accounts. The authors found a negligible average effect of such programs on the opening of new accounts; however, among uneducated and financially illiterate households, there was a significant increase in the opening of new accounts. Moreover, they found small incentive payments to have a much larger effect on getting individuals to open bank accounts and to be three times as cost-effective as financial education. This study suggests a need for more research on the most effective ways to encourage households and microenterprises to save.

Drexler, Fischer, and Schoar (2011) report on two randomized trials testing the impact of financial training on firm-level and individual outcomes for microentrepreneurs in the Dominican Republic. They found no significant effect from a standard, fundamentals-based accounting training; however, a simplified, rule-of-thumb training produced significant and economically meaningful improvements in business practices and outcomes.

Partial Credit Guarantees in Italy (DD Approach)

In 1996, to promote lending to small firms, the Italian government established the Fund for Guarantee to SME, or SGS, with the generic mandate of providing direct guarantees to lending banks, co-guarantees together with other guarantor institutions, and guarantees of last resort to mutual guarantee institutions. To apply for a guarantee, an SME did not need to assess its degree of financial need. Instead, the SME needed to comply with a number of eligibility criteria, such as belonging to a specific sector and having sound economic and financial conditions. These criteria were then summarized in a scoring system that the SGS used to order applications according to their guarantee merit. Importantly, the eligibility criteria limited the percentage of applications that were rejected on merit grounds.

This paper used a difference-in-difference approach to test the fund's role in widening credit access for SMEs and lessening their borrowing costs. Using data from the fund's books, the authors compared outcomes of guaranteed SMEs with those of nonguaranteed SMEs before and after the SGS was launched. Specifically, the authors examined whether borrowing costs and access to credit, measured as the value of bank debt, were substantially different for SMEs that participated in the program than for those that did not. The difference-in-difference effect can be interpreted as a causal impact as long as the average outcomes for the participating SMEs and the other firms would have followed parallel paths over time in the absence of the program. While this assumption is impossible to test, an exercise was performed to compare how different these two groups were before the program. The results from this exercise found no significant differences between the control and treatment groups, validating the control group as a proper counterfactual. The difference-in-difference results from the paper suggest that Italy's scheme reduced participating SMEs' borrowing costs by 16 to 20 percent. Moreover, SMEs' bank debt increased by 12.41 percent once the scheme was available.

Appendix 4.
Assumptions, Strengths, and Limitations of Different Approaches

Comparison of Impact Evaluation Approaches

Randomized control trials (RCTs)
- Key assumptions: Subjects cannot manipulate assignment into the program. Subjects in the control group must be credibly excluded from receiving benefits from the intervention or program.
- Strengths: Clear comparison group, which allows for credible identification of the impact.
- Limitations: Not all policies are suitable for RCTs. Local effects measured by RCTs might be different from systematic effects when a program is scaled up. External validity.

Difference-in-difference (DD)
- Key assumptions: The trend of the treated group must be identical to the trend of the control group in the absence of the intervention.
- Strengths: DD controls for factors (observed and unobserved) that do not vary over time. Cost-effective impact evaluation method.
- Limitations: DD estimates are invalid if changes over time occurred to one group but not the other, or if the two groups had different trends before the intervention.

Instrumental variables (IV)
- Key assumptions: The instrument must be strongly associated with participation in the policy and must not be associated with the outcomes evaluated.
- Strengths: IV estimates control for unobserved information that may influence self-selection into the program.
- Limitations: If not planned ahead, IV evaluations are difficult to do. IV results estimate only local effects.

Regression discontinuity (RD)
- Key assumptions: In the absence of the intervention, the cutoff variable should be associated with the outcome variable in a continuous manner.
- Strengths: Baseline data not needed. Solid RD estimates are comparable to RCT estimates.
- Limitations: RD effects will be biased if the cutoff was assigned to maximize the impact of the intervention, or if firms are able to manipulate their eligibility for the program. Sufficient observations surrounding the cutoff are needed. RD results estimate only local effects.

Propensity score matching (PSM)
- Key assumptions: After controlling for observed differences, outcomes of the treated group are identical to outcomes of the control group in the absence of the intervention.
- Strengths: PSM allows identification of a control group when the eligibility criteria depend on multiple variables.
- Limitations: PSM is data intensive. The PSM estimator is not robust against bias caused by unobserved information associated with participation in the program (in this case, PSM can be combined with other approaches). PSM should be used only in cases where the evaluator has a clear and detailed understanding of the eligibility criteria of an intervention.

Minimal standard monitoring
- Key assumptions: Outcomes of treated subjects are not being affected by any factor other than the policy of interest.
- Strengths: Relatively easy method to implement—does not require significant technical capacity from the evaluating team or the principal investigator.
- Limitations: Results should be treated with caution since factors other than the policy may be contaminating the results.
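To make the propensity score matching entry above concrete, here is a minimal nearest-neighbor matching sketch. The selection model and all numbers are invented for illustration; a real evaluation would estimate the score from observed covariates (for example, with a logit), enforce common support, and check covariate balance before and after matching.

```python
import math

def propensity(x):
    # Assumed, purely illustrative selection model: larger firms are
    # more likely to participate in the program.
    return 1 / (1 + math.exp(-(x - 5)))

def att_nearest_neighbor(treated, controls):
    """treated, controls: lists of (covariate, outcome) pairs. Match each
    treated unit to the control with the closest propensity score (with
    replacement) and average the outcome gaps: an estimate of the average
    treatment effect on the treated (ATT)."""
    gaps = []
    for x_t, y_t in treated:
        p_t = propensity(x_t)
        _, y_c = min(controls, key=lambda c: abs(propensity(c[0]) - p_t))
        gaps.append(y_t - y_c)
    return sum(gaps) / len(gaps)

treated = [(6, 12.0), (7, 14.5), (8, 16.0)]     # (firm size, outcome)
controls = [(6, 10.0), (7, 12.5), (8, 14.0), (3, 5.0)]
effect = att_nearest_neighbor(treated, controls)
```

Note how the poorly matched control (the small firm) never gets used: matching on the score discards controls that are unlike any treated unit, which is exactly the table's point that PSM identifies a comparable control group when eligibility depends on multiple variables.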