This paper is a product of the Poverty and Equity Global Practice Group. It is part of a larger effort by the World Bank to provide open access to its research and contribute to development policy discussions around the world. The authors may be contacted at eskoufias@worldbank.org. The Poverty & Equity Global Practice Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

- Poverty & Equity Global Practice Knowledge Management & Learning Team

This paper is co-published with the World Bank Policy Research Working Papers.

Estimating Poverty Rates in Target Populations: An Assessment of the Simple Poverty Scorecard and Alternative Approaches

Alexis Diamond, International Finance Corporation
Michael Gill, Harvard University
Miguel Rebolledo Dellepiane, International Finance Corporation
Emmanuel Skoufias*, The World Bank Group
Katja Vinha, The World Bank Group
Yiqing Xu, Massachusetts Institute of Technology

JEL Classification: I31, I32, I38

Keywords: Simple Poverty Scorecard, PPI, Headcount poverty rate, Training and Test data sets.

*Corresponding author: Emmanuel Skoufias, The World Bank (Mail Stop: I4-405), 1818 H Street NW, Washington DC 20433-USA. tel: (202) 458-7539. fax: (202) 522-3134. e-mail: eskoufias@worldbank.org.
Acknowledgements: We are grateful to Nobuo Yoshida and Phillippe Leite for inputs at the early stages of this study and to Mark Schreiner for discussions and clarifications provided through the process of this study.

Contents

I. Introduction
II. Data
III. Methods and Notation
A. Household-level Poverty Probabilities: Several Approaches
B. Regression-based Alternatives to the SPS
C. Group-Level Poverty Rates
D. Measures of Uncertainty via Bootstrap
IV. Results
A. Estimating National Poverty Rates
B. Stratum-Specific Poverty Rates
C. Testing Estimator Resilience over Time
V. Concluding Remarks and the Way Forward
References
Appendix 1: A Detailed Summary of the Poverty Scorecard Methodology
Appendix 2: SPS Survey Questions and Lookup Tables
Appendix 3: Additional Figures from other Countries
Appendix 4: Stratum Specific Poverty Rates

I. Introduction

The World Bank Group (WBG)'s twin goals of eliminating extreme poverty and boosting shared prosperity have intensified the interest in measuring poverty rates in specific populations targeted by development programs worldwide. Private-sector firms and financial institutions (especially micro-finance institutions, NGOs, agriculture and agribusiness enterprises) are also increasingly selling products and services to the poor and buying from them, and they seek to estimate poverty levels in specific market segments to inform their business strategies and operations.

The most rigorous poverty estimation methodologies are, however, not necessarily practical for development practitioners or private sector firms. The best sources of poverty data are government-run, large-scale, nationally representative household surveys that collect highly detailed socioeconomic household information, cost millions of dollars, and take years to design and implement (Benin and Randriamomonjy, 2008). However useful these survey instruments and data may be for many applications, they do not enable direct estimation of poverty rates in targeted idiosyncratic populations.
Even if the relevant national data happen to be available, the alternative approach of using small area estimation or survey-to-survey imputation methods is very data intensive and requires a level of technical sophistication that makes it impractical for practitioners (Elbers et al., 2003; Christiaensen et al., 2012; Tarozzi and Deaton, 2009).

The most popular solution for project-specific poverty estimation is the Simple Poverty Scorecard (SPS) described in Schreiner (2014a), which is implemented with a 10-question survey and (in its basic form) a statistical look-up table.1 The SPS is based on a logistic regression, but it departs from established econometric approaches. Despite the SPS's widespread use, there is little published academic literature assessing its performance when applied to subnational populations across countries and time periods, as it is actually utilized by researchers.2 We seek to fill this gap in the literature by evaluating the SPS, performing several thousand statistical experiments across diverse populations and strata in nine separate countries, using a collection of surveys that are representative at the subnational level and span nearly a decade. These experiments compare the performance of the SPS, as calibrated at the national level, with the performance of established regression-based estimators.3 We benchmark all estimates against observed data on poverty status derived from government-run, sub-nationally representative socioeconomic household surveys.

We find that established regression-based models like ordinary least squares, logistic regression, and lasso regression (trained on "training set" data and tested on "test set" data) outperform the SPS in terms of bias and variance. In many of these experiments, our regression-based estimators perform better than the SPS because they are informed by additional, target-population-specific information, reflecting a basic but important point: the SPS does not take advantage of any information beyond its ten survey questions, even though researchers almost always have additional data at their disposal. Using all relevant available information is especially important when targeting groups that differ significantly from the national populations upon which SPSs are based, because the modeled relationship between household characteristics and poverty status in a given national population may not hold true for subpopulations in the country.

We advise researchers attempting to estimate poverty rates in particular samples to adopt an approach based on the established regression-based techniques explored in our experiments. Our recommended approach is more accurate than the SPS, applicable to any population of interest, and able to exploit information beyond the ten SPS survey questions. Because we utilize national household surveys that are statistically representative for specific subnational strata, our approach can be used to derive reliable poverty rate estimates for these subnational groups.

1 See Appendix 1 for a summary of the methodology.

2 As Schreiner (2015) states, “Like all predictive models, the scorecard here is constructed from a single sample and so misses the mark to some unknown extent when applied to a validation sample. Furthermore, it is biased when applied (in practice) to a different population….. (because the relationships between indicators and poverty change over time).”

3 It is important to clarify that this report evaluates the performance of the SPS in estimating poverty for a target population and not the performance of the SPS in predicting the poverty status of an individual for targeting based on poverty status or for program eligibility. The SPS can be and has been used for targeting based on individual poverty status, but the evaluation of its targeting performance is outside the scope of this report.

This paper is organized as follows: Section II describes the data.
Section III explains the methods and models under examination, including the SPS, and Section IV assesses their relative performance. Section V summarizes lessons learned and describes paths for future work.

II. Data

Our analysis draws from 14 national socioeconomic household surveys across 9 countries (see Table 1), which were implemented by the statistical agencies of the various governments. The surveys cover subnationally representative samples ranging in size from 3,579 households (in Sierra Leone, 2003) to 293,715 households (in Indonesia, 2010). Each survey contains a core questionnaire which consists of a household roster listing the sex, age, marital status, educational attainment, household income and/or expenditure information, and labor force experience of all household members. We use the expenditure or income data along with national poverty lines to determine the poverty status of each household. We then use the poverty status to calculate our benchmark poverty rates, referred to below as "observed" poverty rates, for the various populations of interest.

Hundreds of questions are asked in each survey (e.g., 612 variables in total across 21 data sets associated with the Bangladesh 2010 Household Income and Expenditure Survey), and the time and cost of implementation are a major reason that so many researchers have adopted the SPS approach. The SPS and the other models we examine below have been derived from these data sets, and are thus based upon the statistical relationships between observed poverty status and household characteristics. We also use these large-scale household surveys to test the models, assessing which models come closest to the observed poverty rates.
Table 1: Representative strata and national poverty rates in SPS sample

| Country | Year | Survey | Representative strata | Number of strata | N | Poverty rate (weighted) | Poverty rate (unweighted) |
|---|---|---|---|---|---|---|---|
| Bangladesh | 2010 | HIES | Region | 7 | 12,209 | 0.2848 | 0.2881 |
| Indonesia | 2010 | SUSENAS | Districts | 498 | 293,715 | 0.1000 | 0.1047 |
| Jordan | 2006 | HEIS | Urban/Rural | 2 | 11,639 | 0.0942 | 0.0953 |
| Jordan | 2008 | HEIS | Urban/Rural | 2 | 10,961 | 0.1533 | 0.1548 |
| Jordan | 2010 | HEIS | Urban/Rural | 2 | 11,223 | 0.1202 | 0.1241 |
| Nepal | 2010 | NLSS | Analytical domains | 12 | 5,988 | 0.2000 | 0.1852 |
| Paraguay | 2011 | EPH | Department (modified) | 7 | 4,893 | 0.2626 | 0.2669 |
| Peru | 2010 | ENAHO | Department | 24 | 20,048 | 0.2673 | 0.3064 |
| Peru | 2011 | ENAHO | Department | 24 | 22,978 | 0.2433 | 0.2752 |
| Peru | 2012 | ENAHO | Department | 24 | 23,349 | 0.2254 | 0.2551 |
| Sierra Leone | 2003 | SLIHS | Region | 4 | 3,579 | 0.6106 | 0.6767 |
| Sierra Leone | 2011 | SLIHS | District | 14 | 6,693 | 0.4572 | 0.4715 |
| Thailand | 2011 | SES | Region, Urban/Rural | 9 | 42,083 | 0.1339 | 0.1205 |
| Uganda | 2009 | UNPS | Region | 4 | 6,755 | 0.2436 | 0.2364 |

Notes: N indicates the sample size of the data set used to calibrate the SPS. The weighted poverty rate uses population survey weights and the unweighted does not.

III. Methods and Notation

For each data set in our sample, we estimate the national poverty rate and stratum-specific poverty rates through various approaches. Here we present the notation and a general overview of the approaches used in our analysis. There is a set of households $N = \{1, \ldots, n\}$. Let $y_i$ denote household i's observed poverty status (i.e., in poverty or not, 1 or 0) as determined by its expenditure (or income) in reference to the national poverty line. While there are 10 questions in each poverty scorecard, some of these questions have categorical responses with more than two categories. Some care is taken to preprocess the explanatory data for analysis.
Let $c_q$ denote the number of possible categorical responses for each survey question q = 1, 2, …, 10; for example, if question q = 1 had possible responses A, B, and C, there are three response categories for that question. If a respondent were asked all 10 questions, it follows that the total number of binary indicators needed to account for all survey responses is $p = \sum_{q=1}^{10} c_q$. Hence, in our reformatted data, $x_{ij}$ is a binary indicator (1 or 0) of whether household i provided response j, for each j = 1, …, p.

A. Household-level Poverty Probabilities: Several Approaches

Simple Poverty Scorecard (SPS)

The SPS is a poverty estimation methodology developed by Mark Schreiner of Microfinance Risk Management, L.L.C.4 Each SPS is designed for a particular country and a particular year. Design begins with a nationally representative household survey, which is taken as the benchmark for poverty in that country at the time of the survey. For a given survey, half the data are used as a "training set" to develop (or train) the model that the SPS will use to estimate poverty rates, and the other half (the "test set") are used to validate the accuracy of the constructed model. To train the model, the SPS developer repeatedly analyzes the training set, attempting to identify 10 questions from the national household survey that can reliably predict household-level poverty status. This is an iterative model-selection process that relies on both statistical methods (logistic regression) and professional judgment, in an effort to identify variables with high predictive power that can be easily collected and verified by surveyors in the field. Details on the development and calibration of the SPS for particular data sets in our sample can be found in Schreiner (2010, 2011a, b, 2012a, b, c, 2013a, b, 2014b).5 SPSs have been estimated for at least 63 countries, and in many countries they have been periodically updated when new household surveys have become available.
Once an SPS has been developed, actually calculating poverty rates for a given data set is straightforward. All that is needed is basic arithmetic, pencil and paper, and a lookup table that converts each household's survey result (the "poverty score") into an estimated probability that the household is below a specific poverty line. After converting the results into probabilities for all respondents, the average probability is adjusted by an additive bias-adjustment factor (typically a fraction of a percentage point) and the result is the poverty-rate estimate. Technology applications to do this are also widely available and used by the private sector.

One limitation of the SPS is that its look-up table (which is derived from logistic regression results) is calibrated to deliver certain discrete estimated probabilities for each household. In the case of the Bangladesh 2010 SPS, the look-up table converts any survey result into one of 18 discrete poverty rate estimates. This SPS can estimate a household-level probability of 40.9%, or 50.4%, but nothing in between (see Figure 1).

4 More details available online at http://www.microfinance.com/#Poverty_Scoring.

5 Or see Appendix 1 for a summary of the methodology.

Figure 1. Comparing the SPS and Fitted Values with a Simple Logit Model (Bangladesh, 2010). This plot compares the household-level poverty probabilities from a logistic regression model (the horizontal axis) against those obtained from the SPS (the vertical axis) using data from the Bangladesh 2010 HIES. As a result of data coarsening in the SPS lookup table, the number of unique probabilities generated by the SPS is considerably lower than the number generated by the logit. While there is a strong positive association between the results from both approaches, the figure demonstrates that information is sacrificed in the calculation of SPS probabilities.
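The scoring arithmetic described above (score, lookup table, average, additive bias adjustment) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score bands and probabilities in `LOOKUP`, the example scores, and the bias-adjustment value are all hypothetical and are not taken from any actual scorecard, and the function names are ours.

```python
from bisect import bisect_right

# Hypothetical lookup table: (lowest score in band, poverty probability).
# Illustrative numbers only; NOT taken from any real scorecard.
LOOKUP = [(0, 0.90), (20, 0.70), (40, 0.45), (60, 0.20), (80, 0.05)]


def score_to_probability(score, lookup=LOOKUP):
    """Map a 0-100 poverty score to the probability of its score band."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie between 0 and 100")
    lower_bounds = [low for low, _ in lookup]
    band = bisect_right(lower_bounds, score) - 1  # last band starting <= score
    return lookup[band][1]


def sps_poverty_rate(scores, bias_adjustment=0.0, lookup=LOOKUP):
    """Average the household probabilities, then add the bias adjustment."""
    probabilities = [score_to_probability(s, lookup) for s in scores]
    return sum(probabilities) / len(probabilities) + bias_adjustment


# Six hypothetical households and a hypothetical adjustment of -0.5 points.
scores = [12, 35, 35, 58, 77, 91]
rate = sps_poverty_rate(scores, bias_adjustment=-0.005)
```

Because every score within a band maps to the same probability, the sketch also makes the coarsening visible: only as many distinct household-level probabilities can occur as there are bands in the table.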
SPS authors claim good properties for their estimator when it is applied to data that mirror the national household samples that formed the original SPS training sets. Indeed, the SPS's claims to reliability are, almost without exception,6 limited to the SPS's application to the corresponding national population data. The problem with this claim is that the need to use an SPS to estimate poverty rates in these target populations is not at all obvious, given that the best and easiest approach would be to derive the numbers directly from the original national data sets (or to read the summary statistics published by government agencies).

In practice, the SPS is typically used to estimate poverty rates in specific subnational groups. For example, microfinance institutions and other private sector actors frequently use the SPS for this purpose. Similarly, when the SPS is used by development practitioners to assess poverty rates in project areas, or to compare poverty rates before versus after projects or in treatment versus control samples, the target population is typically a specific subnational group of project participants. The SPS is sometimes used to estimate poverty rates in groups that are significantly poorer than the national average (e.g., female-headed agricultural households in a particular region of a country). The SPS authors are cognizant of the difference between the ideal conditions for SPS use and the way it is actually used by practitioners, and SPS documentation always includes a caveat to this effect, like the one quoted in footnote 2.

B. Regression-based Alternatives to the SPS

Ordinary Least Squares (OLS): Using observed survey responses for each of the ten poverty scorecard questions, we estimate the following linear probability model

$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \qquad \text{(OLS)}$$

where $x_{ij}$ is the binary response indicator defined above, $\beta_j$ is its coefficient, and $\varepsilon_i$ is an error term. In this context, each observation in our training sample is given equal weight toward the model's estimation.
As is well known, the regression coefficients $\hat{\beta}_j$ are estimated so as to minimize the Sum of Squared Errors (SSE), i.e.,

$$\mathrm{SSE} = \sum_{i=1}^{n} \hat{\varepsilon}_i^{2} = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \Big)^{2}.$$

In the event the empirical distribution of the training sample is reflective of the researcher's population of interest, the OLS estimator may provide unbiased and consistent estimates of group-level poverty rates despite the fact that household-level poverty estimates may fall outside of the unit interval.

Weighted Least Squares (WLS): In contrast to OLS, WLS does not treat all observations in the training set as equally influential to the estimation procedure. This feature may be desirable the more a national survey is stratified, or if the empirical distribution of survey respondents does not closely approximate the proportion of households in the researcher's population of interest, or if the residuals of the classic least squares models are heteroskedastic (Greene, 2003). In this report, we weight the influence of each observation according to its estimated inverse-probability weight $w_i$, which is obtained from the national surveys. As in Equation (OLS), the WLS regression model assumes a household's probability of being impoverished is a linear function of observed covariates (i.e., survey responses). However, the WLS estimator minimizes the weighted sum of squared errors (WSSE), such that

$$\mathrm{WSSE} = \sum_{i=1}^{n} w_i \hat{\varepsilon}_i^{2} = \sum_{i=1}^{n} w_i \Big( y_i - \sum_{j=1}^{p} \hat{\beta}_j x_{ij} \Big)^{2}.$$

6 One exception is Mark Schreiner's paper "Is One Simple Poverty Scorecard Enough for India?" http://www.microfinance.com/English/Papers/Scoring_Poverty_India_Segments.pdf.

Logistic Regression (Logit): In addition to OLS and WLS, we implement a simple logistic regression model to compare against poverty rates generated by the poverty scorecard.
We model poverty with the well-known functional form

$$\Pr(y_i = 1 \mid x_i) = \frac{\exp\big(\sum_{j=1}^{p} \beta_j x_{ij}\big)}{1 + \exp\big(\sum_{j=1}^{p} \beta_j x_{ij}\big)}. \qquad \text{(Logit)}$$

Lasso Penalization: The lasso is a form of penalized regression, similar to ridge regression, whereby regression coefficients are multiplied by "shrinkage factors" so that they are shrunk towards zero (Tibshirani, 1994; Hastie et al., 2009; James et al., 2013). The lasso is commonly used for feature selection in high-dimensional learning problems to decrease the variance of a particular classifier. For our ordinary least squares estimators, we apply the lasso at the training-set level such that we solve the following problem:

$$\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^{2} \quad \text{s.t.} \quad \sum_{j=1}^{p} |\beta_j| \le s, \qquad \text{(OLS Lasso)}$$

where s is a coefficient shrinkage factor, and $\beta_j$ is a linear estimate of the marginal influence of a survey response on poverty. Philosophically, this procedure is similar to an ordinary least squares regression in which the best model is the one that minimizes the in-sample sum of squared residuals, except that regression coefficients are penalized according to a prior rule (i.e., the shrinkage factor) on the minimum coefficient size a variable must have to be included in the final classification model. The conventional penalty used in the lasso is the $\ell_1$ penalty, which is defined by $\lVert \beta \rVert_1 = \sum_{j=1}^{p} |\beta_j|$. In Equation (OLS Lasso), the coefficient shrinkage factor is therefore defined as $s = \lambda \cdot \lVert \beta \rVert_1$, where $\lambda \in [0, 1]$ is a tuning factor. The optimal level of $\lambda$ is chosen through 10-fold cross validation.

Training and Test Sets: For each national survey analyzed in this report, we divide the full survey sample into a training set and a test set (a random 50 percent of the data in each set), just as the SPS's developers do. The regression-based models (OLS, logit, etc.) described in Section B are trained (i.e., coefficients are estimated) exclusively on the training data set. To evaluate the performance of these models, we project the estimated model parameters (and their equivalent, the SPS's poverty scores) onto the test set.
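As a rough illustration of the train-and-project procedure described above, the sketch below fits the linear probability model (OLS, or WLS when weights are supplied) on a random half of a synthetic data set and averages the fitted values over the other half. Everything in it is an assumption made for illustration rather than the authors' code: the two hypothetical questions (instead of ten), the simulated responses, the helper names `one_hot`, `solve`, `fit_ols`, and `predict`, and the choice to drop one reference category per question and add an intercept so that the indicators are not collinear.

```python
import random


def one_hot(responses, n_categories):
    """Intercept plus binary indicators, dropping category 0 of each question."""
    row = [1.0]
    for answer, k in zip(responses, n_categories):
        row.extend(1.0 if answer == c else 0.0 for c in range(1, k))
    return row


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(X, y, w=None):
    """Least squares via the normal equations; passing weights w gives WLS."""
    if w is None:
        w = [1.0] * len(y)
    p = len(X[0])
    xtx = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
           for j in range(p)]
    xty = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, y)) for j in range(p)]
    return solve(xtx, xty)


def predict(X, beta):
    """Fitted household-level values (may fall outside the unit interval)."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


# Synthetic households: two hypothetical questions with 3 and 2 categories.
rng = random.Random(0)
n_categories = [3, 2]
households = []
for _ in range(400):
    answers = [rng.randrange(k) for k in n_categories]
    p_poor = 0.6 - 0.2 * answers[0] - 0.15 * answers[1]
    households.append((answers, 1.0 if rng.random() < p_poor else 0.0))

# Random 50/50 split into training and test sets.
rng.shuffle(households)
train, test = households[:200], households[200:]
X_train = [one_hot(a, n_categories) for a, _ in train]
y_train = [y for _, y in train]
beta = fit_ols(X_train, y_train)

# "Project" the training-set coefficients onto the test set.
X_test = [one_hot(a, n_categories) for a, _ in test]
estimated_rate = sum(predict(X_test, beta)) / len(test)
observed_rate = sum(y for _, y in test) / len(test)
```

Comparing `estimated_rate` with `observed_rate` mirrors the evaluation on held-out data; a logit or lasso fit would replace `fit_ols` and `predict` while leaving the split and the averaging step unchanged.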
By "project" we mean that we use the coefficients estimated on the training data sets and the values in the test sets to derive the predicted poverty rate in the test data sets. It is important to keep in mind that all the regression-based models we employ at the national and at the stratum level use, on purpose, the same 10 variables used by the SPS model.7 All results presented in Section IV, including the stratum-specific analyses, use only the test set data.

C. Group-Level Poverty Rates

The objective of the analysis is to estimate poverty rates at various levels. Here we distinguish the poverty rate of the survey sample from the poverty rate of the national population, and we distinguish both of these quantities from the poverty rate of a particular stratum of interest (i.e., a particular subset of the national population). The utility in estimating any of these quantities will depend precisely on how closely they map to the researcher's target population of interest.

Table 2: Calculation of Poverty Rates for Different Populations of Interest. The weighted-national estimate of the poverty rate is the classical Horvitz-Thompson estimator (Horvitz and Thompson, 1952). For stratum-specific poverty rates, S indicates the set of households in a particular stratum of interest, where $S \subset N$, and $\lvert S \rvert$ denotes the number of households in the stratum. Readers should note we do not directly estimate the weights $w_i$ in this technical report, so the quality of our results that rely on these weights will depend on the quality of the weights derived by prior analysis and made available in these data sets.

| Poverty Rate of Interest | Observed Rate (in Data) | Estimated Rate (Fitted) |
|---|---|---|
| Sample | $\frac{1}{n}\sum_{i=1}^{n} y_i$ | $\frac{1}{n}\sum_{i=1}^{n} \hat{y}_i$ |
| National (Weighted) | $\frac{\sum_{i=1}^{n} w_i \cdot y_i}{\sum_{i=1}^{n} w_i}$ | $\frac{\sum_{i=1}^{n} w_i \cdot \hat{y}_i}{\sum_{i=1}^{n} w_i}$ |
| Stratum | $\frac{1}{\lvert S \rvert}\sum_{i \in S} y_i$ | $\frac{1}{\lvert S \rvert}\sum_{i \in S} \hat{y}_i$ |
| Stratum (Weighted) | $\frac{\sum_{i \in S} w_i \cdot y_i}{\sum_{i \in S} w_i}$ | $\frac{\sum_{i \in S} w_i \cdot \hat{y}_i}{\sum_{i \in S} w_i}$ |

D. Measures of Uncertainty via Bootstrap

For each of the surveys, we obtain both the observed rates of poverty based on the raw national surveys ($y_i$) and the estimated poverty rates ($\hat{y}_i$) obtained through the approaches described in Section III.
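The group-level rates of Table 2 and a bootstrap interval around them can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and toy data are hypothetical, the weighted rates are normalized by the sum of the weights, and the interval is a normal approximation built from the bootstrap standard error of the mean (one simple way to turn bootstrapped standard errors into a confidence interval).

```python
import random
from statistics import stdev


def poverty_rate(values, weights=None):
    """Mean of y_i (or of fitted values); weighted by w_i when weights are given."""
    if weights is None:
        return sum(values) / len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def stratum_rate(values, strata, stratum, weights=None):
    """Poverty rate restricted to the households i in stratum S."""
    keep = [i for i, s in enumerate(strata) if s == stratum]
    sub_w = None if weights is None else [weights[i] for i in keep]
    return poverty_rate([values[i] for i in keep], sub_w)


def bootstrap_ci(values, weights=None, n_boot=5000, z=1.96, seed=0):
    """Interval: point estimate +/- z times the bootstrap standard error."""
    rng = random.Random(seed)
    n = len(values)
    draws = []
    for _ in range(n_boot):
        # Resample households with replacement, keeping each one's weight.
        idx = [rng.randrange(n) for _ in range(n)]
        w = None if weights is None else [weights[i] for i in idx]
        draws.append(poverty_rate([values[i] for i in idx], w))
    point = poverty_rate(values, weights)
    se = stdev(draws)
    return point - z * se, point + z * se


# Toy data: poverty status, survey weights, and stratum labels (hypothetical).
y = [1, 0, 0, 1, 1, 0]
w = [2, 1, 1, 4, 1, 1]
strata = ["A", "A", "A", "B", "B", "B"]

sample_rate = poverty_rate(y)                        # unweighted sample rate
national_rate = poverty_rate(y, w)                   # weighted national rate
stratum_b = stratum_rate(y, strata, "B")             # unweighted stratum rate
stratum_a_weighted = stratum_rate(y, strata, "A", w)
low, high = bootstrap_ci(y, n_boot=500)
```

Passing fitted values instead of `y` gives the "Estimated Rate" column, so the same functions serve both the observed and the estimated rates.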
For all confidence intervals presented in this analysis, bootstrapped standard errors of the mean (using 5,000 bootstrapped samples) are calculated for the estimated poverty rates of interest (Efron and Tibshirani, 1998). Bootstrapped confidence intervals are similarly generated for the observed levels of poverty in the data, given that the observed rate of poverty is itself based on a sample and is therefore an estimate of the unobserved population rate.

7 It is quite plausible that not all of these 10 variables will be the best predictors of poverty in each different stratum. In fact it is quite likely that the best predictors of poverty vary from stratum to stratum, which implies that the composition of the set of the 10 best poverty predictors is likely to vary across strata. For the purpose of keeping the analysis simple and fair to the SPS, we have decided to stick with the same 10 variables used by the SPS to predict poverty. One exception is the lasso, which is analogous to selecting prediction variables among the 10 variables used in the model (see James et al., 2013).

IV. Results

In this section we assess the accuracy of the SPS by running multiple trials across different "test" sets, including national samples and subpopulations, to see how well the SPS poverty estimates can recover the observed poverty rates. We then subject the regression-based models to the same trials and compare SPS and regression results. We consider whether the SPS or other approaches are preferable for estimating poverty rates.

A. Estimating National Poverty Rates

We begin by exploring how effectively the SPS and other estimators can recover the observed poverty rate in a random sample drawn from national household survey data. This is a logical place to start because it is what the SPS is expressly designed to do.
We run 14 tests in as many country-year data sets, with and without the observation-specific weights accompanying the national household surveys.8 We show results for trials with the weights because they produce statistically representative national samples, but we also show results for trials run without weights (the "raw sample") because in practice the SPS is often applied to idiosyncratic samples that do not generally have observation weights.

Figure 2 is illustrative of the findings from the exercise described above. The upper panel shows poverty estimates and bootstrapped 95% confidence intervals (Indonesia on the left, Peru on the right) applied to raw (unweighted) national "test sets" or validation sets (i.e., random samples from the national survey data that were not used to fit the regression-based models). The regression-based methods clearly dominate the SPS in the upper panel; of these, weighted least squares (WLS), which utilizes observation weights in the training set but not the test set, does the worst but still dominates the SPS. The lower panel shows how these results change when observation weights are applied to the test sets (creating nationally representative test sets). Here we see that the SPS performs better than it did in the upper panel and is no longer dominated by every regression-based approach (although it is clearly dominated by WLS, which performs the best). Additionally, a general feature of these results is that applying the SPS's bias correction factor actually increases discrepancies, moving estimates away from the observed poverty rate.

8 Figures from Paraguay, Jordan, Uganda, and Thailand are presented in Appendix 3.

Figure 2. Regression-Based Methods Improve Upon Simple Poverty Scorecard for National Poverty Rate Estimates.
The upper panel shows poverty estimates and bootstrapped 90% and 95% confidence intervals (Indonesia on the left, Peru on the right) applied to raw (unweighted) national "test sets" or validation sets (i.e., random samples from the national survey data that were not used to fit the models). Regression-based methods clearly dominate the SPS in the upper panel; of these, weighted least squares (WLS), which utilizes observation weights in the training set but not in the test set, does the worst but still dominates the SPS. The lower panel shows how these results change when observation weights are applied to the test sets (creating nationally representative test sets). Here the SPS performs better than in the upper panel and is no longer dominated by every regression-based approach (although it is clearly dominated by WLS, which performs the best). Additionally, a general feature of these results is that applying the SPS's bias correction factor increases the discrepancy, moving estimates away from the observed poverty rate.

B. Stratum-Specific Poverty Rates

Results in the previous section show the SPS to be fairly reliable in percentage-point terms (though not generally the most accurate estimator) for estimating poverty rates in samples mirroring the national household samples that formed the original SPS training sets. However, it is essential to test the SPS and the other methods under realistic conditions, as they would actually be used by project leaders and researchers. To this end, we assess the performance of all estimators applied to subnational groups.

We first compare results for the SPS versus the logit-based estimator across geographic strata. In this exercise the SPS and the logit models are both trained (or estimated) on national-level data. Figure 3 indicates how nationally representative poverty estimators, such as the SPS, perform relative to logistic regression models trained on the national data.
Even though the Indonesia 2010 SUSENAS survey is only representative at the district level, we split the data into 934 strata such that each of the 498 districts is partitioned into urban and rural areas, and we measure the discrepancies between estimated poverty rates and actual poverty rates for each of these strata. Each of the small circles corresponds to the discrepancy for a given stratum (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue and national-level logit results in gold. Locally weighted scatterplot smoothing (Cleveland and Devlin, 1988) defines best-fit curves drawn through the points. The green vertical line shows the national poverty rate. We implement bootstrapped Kolmogorov-Smirnov tests for the equality of the poverty scorecard estimates against the stratum specific regressions (Præstgaard, 1995), and find that the results from the two estimators are statistically distinguishable. However, both estimators perform about equally poorly: they overestimate poverty rates in the richer regions and underestimate poverty in the poorer regions. In the poorest strata, average SPS discrepancies are as high as 15–25 percentage points.

Figure 3. "One Size Fits All" National Models Perform Poorly When Applied to Subnational Strata, SPS versus Logit (Indonesia 2010). We split the Indonesia 2010 data into 934 strata such that each of the 498 districts is partitioned into urban and rural components and measure the discrepancies between the estimated poverty rates and the observed poverty rates. Each of the small circles corresponds to the discrepancy for a given stratum (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue and national-level logit results in gold.
Each estimator produces results that are statistically distinguishable from the other (see the statistically significant Kolmogorov‐Smirnov p‐values), but both estimators perform about equally poorly, overestimating poverty rates in the richer regions and underestimating poverty in the poorer regions. In the poorest strata, average SPS discrepancies are as high as 15–25 percentage points.

The next set of figures (Figures 4‐7) compares the SPS against regression‐based estimators trained or estimated separately using data from each geographic stratum for which the household survey is designed to yield representative estimates. As Table 1 indicates, many household surveys are designed to be representative for different geographic strata. The surveys for Peru and Bangladesh, for example, are representative at the regional level, with Peru having 24 departments (regions) and Bangladesh 7 regions. The Indonesian survey, on the other hand, following the fiscal decentralization that took place in 2001, is designed to be representative for each of the country’s 498 districts.

Regression‐based approaches perform much better when trained on data specific to each geographic stratum. The upper subplot in Figure 4 shows that for the SUSENAS 2010 data set from Indonesia (which has the largest number of representative subnational groupings of all our data sets), no matter what the quantile, the magnitudes of discrepancies for regression‐based estimators are a fraction of what they are for the SPS. For each estimator, we observe the district‐level estimates of the poverty rate. We compare the relative absolute error (i.e., the absolute value of the estimated value minus the observed poverty rate in the test set, benchmarked against the error rate of the SPS) at each percentile of the absolute error rate.
The lower half of this figure restricts the sample to the poorer districts (i.e., the districts with a poverty rate above the median district poverty rate), and reflects how applying the SPS to the poorest subgroups (as is often done in practice) may compare to other methods.

Figure 5 shows the overlap of the 95% confidence interval of poverty rate estimates using SPS and the 95% confidence intervals of the true poverty rate estimate based on household consumption at the district level.9 There tends to be spatial correlation among districts where the SPS overestimates the poverty rate and where it underestimates the poverty rate. In about 10% of the districts, the SPS underestimates the poverty rate and in about 26% of the districts it overestimates the poverty rate. In comparison, the strata‐specific logit estimator underestimates the poverty rate in about 2% of the districts and overestimates the poverty rate in about 2% of the districts.

9 The maps are based on the 2007 district boundaries. We were unable to acquire the boundary files with the most recent divisions. If a district has since split, the map reflects the poverty rate comparison only for the “mother” district. That is, if a district was created after 2007, the map does not necessarily reflect the poverty comparison in this district, but in the district from which it split. The maps reflect the estimates using the weighted testing set.

Figure 4

Figure 5. Mapping Discrepancies Across Districts: Strata‐specific Logit District Dominates the SPS for Indonesia 2010. The upper panel shows the overlap of the 95% confidence interval of poverty rate estimates using SPS and the 95% confidence intervals of the true poverty rate estimate based on household consumption at the district level. There tends to be spatial correlation among districts where the SPS overestimates the poverty rate and where it underestimates the poverty rate.
In about 10% of the districts, the SPS underestimates the poverty rate and in about 26% of the districts it overestimates the poverty rate. In comparison, the strata‐specific logit estimator (lower panel) underestimates the poverty rate in about 2% of the districts and overestimates the poverty rate in about 2% of the districts.

In Figure 6 we split the Peru 2010 data into 24 regional strata (departments) and measure the discrepancies between the poverty rates estimated by the stratum‐specific logit and the national SPS, on the one hand, and the observed poverty rates, on the other. Each of the small circles corresponds to the discrepancy for a given stratum. Raw discrepancies are in the upper panel and absolute discrepancies are in the lower panel, with SPS results in blue, strata‐specific logit in red, and the green vertical line marking the average national poverty rate. Stratum‐specific logit shows strong and consistent performance, with low average discrepancies across the domain of observed stratum poverty rates. Notice that, as before, the SPS overestimates poverty rates in the regions that are least poor. In the strata with the lowest poverty rates, the discrepancies are highest, averaging about 15 percentage points. Kolmogorov‐Smirnov p‐values show statistically significant differences between SPS and strata‐specific logit results. The map in Figure 7 shows that the SPS overestimates the poverty rate for 6 regions (upper panel), whereas region‐specific logit produces estimates that are within the 95% confidence interval of the true estimate in all cases (lower panel).

Figure 6. Regional Strata Poverty Rate Estimation: Strata‐specific Logit Dominates the SPS for Peru 2010. We split the Peru 2010 data into 24 regional strata (departments), and measure the discrepancies between estimated poverty rates and observed poverty rates.
Each of the small circles corresponds to the discrepancy for a given stratum (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue, strata‐specific logit in red, and the green vertical line marking the average national poverty rate. Stratum‐specific logit shows strong and consistent performance, with low average discrepancies across the domain of observed stratum poverty rates. SPS overestimates poverty rates in the regions that are least poor. In the strata with the lowest poverty rates, the discrepancies are highest, averaging about 15 percentage points. Kolmogorov‐Smirnov p‐values show statistically significant differences between SPS and strata‐specific logit results.

Figure 7. Mapping the Discrepancies Across Regional Strata: Strata‐specific Logit Dominates the SPS for Peru 2010. The region‐specific logit produces estimates within the 95% confidence interval of the true estimate in all cases, whereas the SPS overestimates the poverty rate for 6 regions.

The next set of figures (Figures 8‐12) drills deeper than the lowest geographic level for which a survey may be representative, by comparing the performance of the national SPS and the stratum‐specific logit model with the stratum now defined by the intersection of the region or district identifier and another key socio‐economic variable, such as an identifier for whether the household head is male or female or whether the household head is in agriculture or not. One important caveat associated with these comparisons is that the “true” poverty rate estimate with which the SPS and the stratum‐specific logit models are compared is likely to have a high variance, since the survey is designed to yield reliable estimates for the region or the district as a whole but not for any specific socio‐economic group within these geographic areas.
We will describe each of these analyses in order.10

The map in Figure 8 also shows the differences between the SPS estimates and the 14 district‐specific logit estimates for Sierra Leone. The Sierra Leone Integrated Household Survey (SLIHS) in 2003 is only representative for the 4 regions of Sierra Leone, whereas the 2013 survey is representative for all 14 districts of the country. In this exercise we do not take into consideration the geographic representativity of the available survey, and we use the district identifiers to create the district strata in 2003 even though the survey is representative at the region level and not at the district level. The district‐specific logit produces estimates within the 95% confidence interval of the poverty estimates in all districts,11 whereas the SPS overestimates the poverty rate for one district (the area near Freetown, the capital) and underestimates the poverty rate in three districts.

Figure 9 shows the results for Bangladesh 2010, split into 14 strata defined by the intersection of the “agricultural head of household” and the 7 “region” dummy variables. Again, the stratum‐specific logit shows strong and consistent performance, with low average discrepancies across the domain of observed stratum poverty rates. Furthermore, the SPS consistently overestimates the lower poverty rates and consistently underestimates the higher poverty rates, as in Indonesia and in Peru in the earlier set of estimates. In the poorest and richest strata, average SPS discrepancies may be as high as 15–20 percentage points. Kolmogorov‐Smirnov p‐values indicate that differences between SPS and strata‐specific logit results are statistically significant.
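The stratum‐level comparison described above (strata defined by the intersection of two identifiers, with a raw and an absolute discrepancy per stratum) can be sketched in a few lines. This is only an illustrative sketch: the field names (`region`, `agri_head`, `poor`, `p_hat`, `weight`) and the toy records are hypothetical and do not come from the surveys used in the paper. The raw discrepancy is taken as estimated minus observed, so a positive value means the estimator overestimates poverty.

```python
from collections import defaultdict

def stratum_discrepancies(households):
    """Weighted observed and estimated poverty rates per stratum.

    Each household is a dict with hypothetical keys:
      'region', 'agri_head' : identifiers whose intersection defines the stratum
      'poor'                : observed poverty status (0 or 1)
      'p_hat'               : estimated poverty likelihood from some model
      'weight'              : survey observation weight
    Returns {stratum: (observed, estimated, raw, absolute)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # weight, weighted poor, weighted p_hat
    for h in households:
        key = (h["region"], h["agri_head"])
        acc = sums[key]
        acc[0] += h["weight"]
        acc[1] += h["weight"] * h["poor"]
        acc[2] += h["weight"] * h["p_hat"]
    out = {}
    for key, (w, poor, p_hat) in sums.items():
        observed = poor / w
        estimated = p_hat / w
        raw = estimated - observed  # > 0 means poverty is overestimated
        out[key] = (observed, estimated, raw, abs(raw))
    return out

# Toy data, invented for illustration only.
households = [
    {"region": "A", "agri_head": 1, "poor": 1, "p_hat": 0.6, "weight": 1.0},
    {"region": "A", "agri_head": 1, "poor": 1, "p_hat": 0.4, "weight": 1.0},
    {"region": "A", "agri_head": 0, "poor": 0, "p_hat": 0.3, "weight": 2.0},
    {"region": "A", "agri_head": 0, "poor": 1, "p_hat": 0.5, "weight": 2.0},
]
result = stratum_discrepancies(households)
for stratum, (obs, est, raw, absolute) in sorted(result.items()):
    print(stratum, round(obs, 3), round(est, 3), round(raw, 3), round(absolute, 3))
```

Setting every weight to 1 gives the unweighted comparison; plotting the raw and absolute discrepancies against the observed stratum rate would give scatterplots of the kind summarized in the figures.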
Figure 10 shows results for Sierra Leone 2003 data, divided into 28 strata defined by the intersection of the district/region and the “agricultural head of household” dummy variables. The upper panel reveals that SPS is reliable (discrepancies are near zero) only if the strata poverty rates are in the immediate neighborhood of the national poverty rate; once again, the SPS overestimates poverty rates in the richer districts and severely underestimates poverty in the poorer districts, with the worst performance in the poorest districts. The Kolmogorov‐Smirnov p‐values in the upper panel show that strata‐specific logit and SPS are not distinguishable when comparing performance based on raw discrepancies. In the lower panel, however, strata‐specific logit dominates SPS (though not as much as in previous strata‐specific comparisons); in both weighted and unweighted comparisons, the red line (strata‐specific logit) has less absolute discrepancy than SPS for the entire range of observed stratum poverty rates. SPS discrepancies here are some of the worst across all data sets, averaging as much as 30 percentage points in the poorest regions.

10 Figures from Paraguay, Thailand, and Uganda may be found in Appendix 4.

11 Note that since the 2003 survey is only representative at the regional level and not at the district level, we refrain from referring to these estimates as the “true” estimates of poverty in the district.

Figure 11 shows results for Nepal 2010, using 28 strata defined by the intersection of the “female head‐of‐household” dummy variable and the 14 administrative zones of the country. Stratum‐specific logit dominates SPS, which tends to underestimate the poverty rate across the entire domain of observed stratum‐specific poverty rates. When the strata are idiosyncratic and deviate from the national sample, average SPS discrepancies may be as high as 10–20 percentage points.
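The Kolmogorov‐Smirnov comparisons reported with these figures test whether two sets of stratum discrepancies could come from the same distribution. The sketch below shows one resampling version of such a test, using random permutations of the pooled discrepancies. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the two samples are treated as independent even though discrepancies from two estimators on the same strata are paired.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def permutation_ks_test(a, b, n_perm=2000, seed=0):
    """Permutation p-value for the KS statistic (samples assumed independent)."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms keep the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Invented discrepancies (in poverty-rate units) for two hypothetical estimators.
sps_disc = [0.15, 0.12, 0.18, 0.10, 0.20, 0.14, 0.16, 0.11, 0.19, 0.13]
logit_disc = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02, -0.03, 0.01, 0.00, 0.02]
stat, p_value = permutation_ks_test(sps_disc, logit_disc, n_perm=500)
print(stat, p_value)
```

A small p‐value indicates that the two discrepancy distributions differ, which, as the Sierra Leone raw‐discrepancy result shows, is a separate question from which estimator has the smaller absolute discrepancy.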
Finally, Figure 12 shows results for the Indonesia 2010 data, divided into 934 strata defined by the intersection of “urban/rural” and the district identifier. Stratum‐specific logit shows strong and consistent performance, with low average discrepancies across the domain of observed stratum poverty rates. SPS is reliable (discrepancies are near zero) only if the strata poverty rates are in the immediate neighborhood of the national poverty rate; once again, the SPS overestimates poverty rates in the richer districts and severely underestimates poverty in the poorer districts, with the worst performance in the poorest districts. When the strata are idiosyncratic and deviate from the national sample, average SPS discrepancies may be as high as 15–25 percentage points. Kolmogorov‐Smirnov test results are statistically significant here as well.

Figure 8. Mapping Discrepancies Across District and Agricultural/Non‐Agricultural Strata: Strata‐specific Logit Dominates the SPS for Sierra Leone 2011. The district‐specific logit produces estimates within the 95% confidence interval of the true estimate in all cases, whereas the SPS overestimates the poverty rate for one district and underestimates the poverty rate for three districts.

Figure 9. Agricultural/Non‐Agricultural Household and Regional Strata: Strata‐specific Logit Dominates SPS for Bangladesh 2010. We split the Bangladesh 2010 data into 14 strata, and measure the discrepancies between the estimated poverty rates and the observed poverty rates. Each of the small circles corresponds to the discrepancy for a given stratum (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue, strata‐specific logit in red, and the green vertical line marking the average national poverty rate. Stratum‐specific logit shows strong and consistent performance, with low average discrepancies across the domain of observed stratum poverty rates.
SPS consistently overestimates the lower poverty rates and consistently underestimates the higher poverty rates. In the poorest and richest strata, average SPS discrepancies may be as high as 15–20 percentage points. Kolmogorov‐Smirnov p‐values indicate that differences between SPS and strata‐specific logit results are statistically significant.

Figure 10. Agricultural/Non‐Agricultural Household and District Strata: Strata‐specific Logit Dominates SPS, Sierra Leone 2003. We split the Sierra Leone 2003 data into 28 strata defined by the intersection of the district and the “agricultural head of household” dummy variables, and measure the discrepancies between the estimated poverty rates and the observed poverty rates. Each of the small circles corresponds to the discrepancy between observed and estimated poverty rates (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue, strata‐specific logit in red, and the green vertical line marking the average national poverty rate. The upper panel reveals that SPS is reliable (discrepancies are near zero) only if the strata poverty rates are in the immediate neighborhood of the national poverty rate; once again, the SPS overestimates poverty rates in the richer districts and severely underestimates poverty in the poorer regions, with the worst performance in the poorest districts. The Kolmogorov‐Smirnov p‐values in the upper panel show that strata‐specific logit and SPS are not distinguishable when comparing performance based on raw discrepancies. In the lower panel, however, strata‐specific logit dominates SPS (though not as much as in previous strata‐specific comparisons). In both weighted and unweighted comparisons, the red line (strata‐specific logit) has less absolute discrepancy than SPS for the entire range of observed stratum poverty rates.
SPS discrepancies here are some of the worst across all data sets, averaging as much as 30 percentage points in the poorest regions.

Figure 11. Female Head‐of‐Household/Regional Strata Poverty Rate Estimation: Strata‐specific Logit Dominates SPS for Nepal 2010. We split the Nepal 2010 data into 28 strata defined by the intersection of the “female head‐of‐household” and the administrative zone dummy variables, and measure the discrepancies between the estimated poverty rates and the observed poverty rates. Each of the small circles corresponds to the discrepancy for a given stratum (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue, strata‐specific logit in red, and the green vertical line marking the average national poverty rate. Stratum‐specific logit dominates SPS, which tends to underestimate the poverty rate across the entire domain of observed stratum‐specific poverty rates. When the strata are idiosyncratic and deviate from the national sample, average SPS discrepancies may be as high as 10–20 percentage points.

Figure 12. Strata‐specific Logit Dominates SPS, Indonesia 2010. We split the Indonesia 2010 data into 934 strata defined by the intersection of “urban/rural” and “district” dummy variables, and measure the discrepancies between the estimated poverty rates and the observed poverty rates. Each of the small circles corresponds to the discrepancy for a given stratum (raw discrepancies in the upper panel and absolute discrepancies in the lower panel), with SPS results in blue, strata‐specific logit in red, and the green vertical line marking the average national poverty rate. Stratum‐specific logit shows strong and consistent performance, with low average discrepancies across the domain of observed stratum poverty rates. SPS is reliable if and only if the strata poverty rates are in the immediate neighborhood of the national poverty rate.
When the strata are idiosyncratic and deviate from the national sample, average SPS discrepancies may be as high as 15–25 percentage points.

C. Testing Estimator Resilience over Time

National household data sets are not published every year and SPSs are not developed for every national household survey, so it is not uncommon for there to be a mismatch in time between the vintage of one’s sample and the vintage of the available SPS. Of the 63 countries for which SPSs are currently available, 48 countries offer only one SPS, ten countries offer SPSs for two different years, and five countries offer SPSs for three different years. What can happen when, for example, someone attempts to estimate the poverty rate in a sample collected in 2012 and applies the SPS when the most recent SPS in the country was calibrated to a 2010 national household survey?12

To explore this question, we took SPS and regression‐based models trained on the Peru 2010 national data and tested those models using Peru 2011 and Peru 2012 test data. The upper panel in Figure 13 shows poverty estimates and bootstrapped 95% confidence intervals based on models derived from Peru 2010 data (for Peru 2011 data on the left and Peru 2012 data on the right) applied to raw (unweighted) national “test sets” or validation sets (i.e., random samples from the national survey data that were not used to fit the models). Regression‐based methods clearly dominate the SPS; weighted least squares (WLS), which utilizes observation weights in the training set but not the test set, does the worst but still dominates the SPS. The lower panel shows how these results change when observation weights are applied to the test sets (creating nationally representative test sets). As with the concurrent SPS case, a common feature of these results is that applying the SPS “Bias Correction Factor” appears to increase the bias, moving estimates away from the observed poverty rate.
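The bootstrapped confidence intervals referred to above, for unweighted and weighted test sets, can be illustrated with a simple percentile bootstrap. The sketch below is a hedged illustration: it resamples households with replacement and ignores the survey design (clustering and stratification), which the paper's own procedure may handle differently. The function names and the synthetic poverty likelihoods and weights are assumptions made for the example.

```python
import random

def weighted_rate(values, weights):
    """Weighted mean of household poverty likelihoods (or 0/1 poverty indicators)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_ci(values, weights, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the weighted poverty rate.

    Households are resampled with replacement, each keeping its own weight.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(weighted_rate([values[i] for i in idx],
                                   [weights[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lo, hi

# Synthetic test set, invented for illustration: 200 poverty likelihoods and weights.
values = [(i % 10) / 10 for i in range(200)]
weights = [1 + (i % 3) for i in range(200)]
point = weighted_rate(values, weights)
lo, hi = bootstrap_ci(values, weights)
print(round(point, 3), round(lo, 3), round(hi, 3))
```

Passing a list of ones as `weights` reproduces the raw (unweighted) case shown in the upper panels, while survey weights give the nationally representative case of the lower panels.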
12 As noted above, in footnote 4, SPS documentation includes warnings about this practice.

Figure 13. Testing the Resilience of Poverty Estimates Over Time: Regression Outperforms SPS, Peru 2010 Estimators Applied to Peru 2011 and Peru 2012 Data. This figure illustrates the discrepancies that may occur when the poverty estimator models are not contemporaneous with the targeted sample. The upper panel shows poverty estimates and bootstrapped 95% confidence intervals based on models derived from Peru 2010 data (for Peru 2011 data on the left and Peru 2012 data on the right) applied to raw (unweighted) national “test sets” or validation sets (i.e., random samples from the national survey data that were not used to fit the models). Regression‐based methods clearly dominate the SPS; weighted least squares (WLS), which utilizes observation weights in the training set but not the test set, does the worst but still dominates the SPS. The lower panel shows how these results change when observation weights are applied to the test sets (creating nationally representative test sets). A common feature of these results is that applying the SPS “Bias Correction Factor” appears to increase the bias, moving estimates away from the observed poverty rate.

We do not claim that the results in Figure 13 generalize across countries and time periods; they are merely indicative of what can happen when there is a mismatch in time between a training set and a test set. It is worth noting that mismatches in time can favor or hinder any estimator, depending on whether economic trends cancel out or amplify its discrepancies.

V. Concluding Remarks and the Way Forward

The accurate and efficient estimation of poverty rates is a concern for development practitioners and researchers alike. In this paper we demonstrate that an increasingly popular method for estimating poverty rates, the simple poverty scorecard, performs best when applied to the estimation of national poverty rates with nationally representative samples.
However, SPS‐like procedures (by their very nature and emphasis on simple operational implementation) ignore information that is commonly available to surveyors in most applied settings. Analysts generally have rich household‐level covariates, such as occupation and geographic or regional information, that can provide additional information and allow researchers and practitioners to estimate poverty rates more precisely in target populations of interest. We demonstrate that both SPS‐type procedures and national‐level regressions “perform well” in practice (in a training and test set paradigm) when applied to targeted strata with poverty rates near the national poverty rate. But as the populations of interest become more granular (e.g., regional) or more extreme on the income distribution, SPS‐type procedures perform measurably worse than prosaic statistical models tuned at the stratum level. These results are also in accordance with the growing academic literature on small area estimation and poverty mapping that advocates the estimation of region‐ or district‐specific consumption (or income) models as long as the household survey is representative at that level (e.g., Elbers et al., 2003; Tarozzi and Deaton, 2009; Tarozzi, 2011).

The findings in this report have important implications for practitioners in the field who want an estimate of the poverty rate in their target population. To begin with, it is important to understand that there is a fundamental tradeoff between simplicity of use and accuracy. Simple tools, like the SPS in its current form, are designed in favor of simplicity by estimating poverty for any possible target population using underlying parameters derived from the full sample of households in the national household survey.
For example, suppose that one’s goal is to estimate poverty among female‐headed household program participants, in one region of a country, in a year in which a national household survey was administered and an SPS has been estimated and made publicly available. The practitioner could then collect data for the 10 questions required by the SPS from the target female population in the specific region and apply the SPS from the same year to estimate the poverty rate for the target female population. The poverty rates for the target population would then be based on the parameters that have been estimated using national data for male‐ and female‐headed households from all the different regions in the country. This implies that if male‐headed households are much more prevalent in the national household survey, as is usually the case, or if other regions have a larger population than the region of interest, then the poverty rate estimated by the SPS for the target female population may be biased, in the sense that it would not closely approximate the true poverty rate among females in the region of interest. The analysis in this report implies that the prediction of poverty could be improved if the underlying parameters of the model used to predict poverty or to assign poverty scores were estimated based on the sub‐sample of female‐headed households in that region (extracted from the full national household survey).
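The contrast between nationally estimated and stratum‐specific parameters can be made concrete with a deliberately simplified, one‐indicator example. The sketch below rests on strong assumptions that are not in the paper: a single binary indicator (a hypothetical "owns a phone" question) and invented household counts. For a logit with only an intercept and one dummy regressor, the fitted probabilities equal the observed share of poor households in each cell, so cell shares stand in for the fitted model here.

```python
def expand(n, k, x):
    """n households with indicator value x, of which k are poor: list of (x, poor)."""
    return [(x, 1)] * k + [(x, 0)] * (n - k)

def fit_cell_rates(train):
    """Share of poor households for each indicator value.

    Equivalent to the fitted probabilities of a logit with an intercept
    and a single dummy regressor.
    """
    counts = {}
    for x, poor in train:
        n, k = counts.get(x, (0, 0))
        counts[x] = (n + 1, k + poor)
    return {x: k / n for x, (n, k) in counts.items()}

def predict_rate(model, sample_x):
    """Estimated poverty rate: average predicted likelihood over the target sample."""
    return sum(model[x] for x in sample_x) / len(sample_x)

# Invented counts. Target stratum: female-headed households in one region.
stratum_train = expand(20, 8, 1) + expand(30, 24, 0)      # 40% and 80% poor
rest_of_country = expand(600, 60, 1) + expand(350, 175, 0)  # 10% and 50% poor
national_train = rest_of_country + stratum_train

# Hypothetical field sample from the target population: 4 with a phone, 6 without.
target_sample = [1] * 4 + [0] * 6

national_est = predict_rate(fit_cell_rates(national_train), target_sample)
stratum_est = predict_rate(fit_cell_rates(stratum_train), target_sample)
print(round(national_est, 3), round(stratum_est, 3))
```

In this toy setup the observed poverty rate in the stratum is 32/50 = 0.64, which the stratum‐specific parameters reproduce, while the national parameters give roughly 0.36 because they are dominated by households outside the target group. A real application would use the full set of indicators and would have to check that the stratum sub‐sample is large enough to estimate the parameters reliably.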
However, it is important to acknowledge upfront that practitioners face various challenges in implementing the methods employed here just for the sake of improving the accuracy of poverty predictions in the target population: (i) potential users (researchers, practitioners, or both) would require access to sub‐national representative household‐level data, which are neither easily accessible nor ready for processing for the purposes of predicting poverty;13 (ii) specialized statistical background and econometric expertise would be required; (iii) even if (i) and (ii) are feasible, the sample size of the specific population of interest in the national survey may be insufficient; (iv) there may not be sufficient business appetite to individually allocate the resources needed to attain such improvements in analytical precision.

Assuming there is sufficient appetite, for practitioners in the field relying on the SPS or SPS‐like methods to estimate poverty in the population of their interest, with the data available and currently used by the SPS, there are relatively simple and low‐cost ways of improving the predictions of poverty in target populations. One practical option is that the SPS method and its surrounding infrastructure be updated by considering: (i) the use of regression‐based methods such as those used in this report; and (ii) the incorporation of the intermediate and more practical step of estimating regression‐based models separately for the geographic strata for which the national survey is designed to be representative. Poverty estimates for target populations based on strata‐specific estimates of regression‐based models certainly improve upon poverty estimates based on the nationally estimated SPS. It is quite likely that region‐ or district‐specific estimates of the SPS, depending on the country, will improve the accuracy of the poverty estimates currently based on the nationally estimated SPS.14 Therefore, we suggest that the international development and donor community take a lead in developing, refining, packaging and making available such models in a toolkit format that would be available to current users of the SPS.

13 For example, the household survey itself may be available without the consumption (or income) aggregate used for the estimation of the official poverty rate in the country.

14 Another option, available at this point only to World Bank staff, is to rely on tools designed for customized analysis for the target population in the country of interest. The SWIFT tool currently under development in the World Bank is one such option, providing target population‐specific poverty estimates upon request with primary data collected through a short survey applied to the target population in the same spirit as the SPS. The difference between SWIFT and the SPS is that the underlying statistical model and methods used in SWIFT to derive the parameters needed for the prediction of poverty are different and more flexible, in the sense that a menu of options is available. SWIFT, for example, is equipped to estimate region‐specific consumption models in the national household survey that can be used to predict consumption in the target population. In addition to the headcount poverty rate, the availability of predicted consumption allows the estimation of the poverty gap and the severity of poverty, two poverty measures that are not possible to predict with the SPS in its current form.

References

Benin, S., and Randriamomonjy, J. (2008). “Estimating Household Income to Monitor and Evaluate Public Investment Programs in Sub‐Saharan Africa.” IFPRI Discussion Paper 00771, pp. 1–24.
Christiaensen, L., Lanjouw, P., Luoto, J., and Stifel, D. (2012). “Small Area Estimation‐based Prediction Methods to Track Poverty: Validation and Applications.” Journal of Economic Inequality 10:267–297.
Cleveland, W. S., and Devlin, S. J. (1988).
“Locally‐Weighted Regression: An Approach to Regression Analysis by Local Fitting.” Journal of the American Statistical Association 83(403):596–610.
Efron, B., and Tibshirani, R. (1998). An Introduction to the Bootstrap. Chapman & Hall/CRC.
Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2003). “Micro‐level estimation of poverty and inequality.” Econometrica 71(1):355–364.
Greene, W. H. (2003). Econometric Analysis. 5th ed. NJ: Prentice Hall.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer.
Horvitz, D. G., and Thompson, D. J. (1952). “A Generalization of Sampling Without Replacement From a Finite Universe.” Journal of the American Statistical Association 47(260):663–685.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
Præstgaard, J. T. (1995). “Permutation and Bootstrap Kolmogorov‐Smirnov Tests for the Equality of Two Distributions.” Scandinavian Journal of Statistics 22(3):305–322.
Rosenbaum, P., and Rubin, D. B. (1983). “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70(1):41–55.
Schreiner, M. (2010). “A Simple Poverty Scorecard for Jordan.” Mimeo.
________________(2011a). “A Simple Poverty Scorecard for Sierra Leone.” Mimeo.
________________(2011b). “A Simple Poverty Scorecard for Uganda.” Mimeo.
________________(2012a). “A Simple Poverty Scorecard for Indonesia.” Mimeo.
________________(2012b). “A Simple Poverty Scorecard for Paraguay.” Mimeo.
________________(2012c). “A Simple Poverty Scorecard for Peru.” Mimeo.
________________(2013a). “A Simple Poverty Scorecard for Bangladesh.” Mimeo.
________________(2013b). “A Simple Poverty Scorecard for Nepal.” Mimeo.
________________(2014a). “The Process of Poverty‐Scoring Analysis.” Mimeo.
________________(2014b). “A Simple Poverty Scorecard for Thailand.” Mimeo.
Tarozzi, A. (2011).
“Can census data alone signal heterogeneity in the estimation of poverty maps?” Journal of Development Economics 95(2):170–185. Tarozzi, A., and Deaton, A. (2009). “Using Census and Survey Data to Estimate Poverty and Inequality for Small Areas.” Review of Economics and Statistics 91(4):773–792. Tibshirani, R. (1994). “Regression Shrinkage and Selection Via the Lasso.” Journal of the Royal Statistical Society, Series B 58:267–288. 32 Appendix 1: A Detailed Summary of the Poverty Scorecard Methodology15 The poverty scorecards are based on survey data with half of the data used for to construct and calibrate the card and the other half of the data is used to validate the accuracy of the constructed card. Scorecard construction for a 10‐indicator variable card 1. Choose a large number (about 120 in Bangladesh) of candidate indicators from a representative population survey describing: family composition, education, housing characteristics, ownership of durable assets, employment, and agriculture. (Here already some decisions are made as some variables are aggregate groups. For example the variable “how many household members 12 or younger” is grouped into 0, 1, 2, or 3+.) 2. Choose a poverty line on which the scorecard is built. (For Bangladesh it was $1.25/ day 2005 PPP poverty line.) 3. Use Logit to build one scorecard for each candidate indicator using the construction/calibration sub‐ sample. 4. Order the candidate indicators based on how strongly (by themselves) they are correlated with poverty using entropy based “uncertainty coefficient.” (Figure 3 in the Bangladesh document.) 5. Choose one of the one‐indicator scorecards based on its: accuracy (in predicting poverty), likelihood of acceptance by users, sensitivity to change in poverty, applicability across regions, relevance for distinguishing among households at the poorer end of the expenditure distribution and verifiability. 
(For Bangladesh the first 10 indicators based on uncertainty coefficient are: number of mobile phones; number of fans; receipt of charity, gifts, royalties, help, Zagat, Fitra; tv’s with dvdetc; tv; consumption on Qurbani; landline connection or mobile phone; highest grade of male spouse; anyone worked with daily pay; main job was with daily pay. For example, of these only one of the measures of telephones (number of mobiles) was chosen. And, for example, receipt of charity was not included, possibly as it is difficult to verify.) 6. Then build a series of two‐indicator scorecards adding another indicator variable (from the set of 120) to the first one chosen in step 5. 7. Compare the two‐indicator scorecards and choose one based on the uncertainty coefficients and judgment on the characteristics of the second indicator. The second indicator should be evaluated based on the same criteria as the first one and, in addition, variety among indicators should be considered. 8. Build a series of three‐indicator scorecards (adding a third variable to 2‐variable scorecard chosen in 7) and again choose the third indicator based on the same statistical and non‐statistical criteria as before. 9. Repeat exercise until have a scorecard with 10 measures. 10. Transform the logit coefficients from the 10‐indicator scorecard into non‐negative integers with total score range from 0 (most likely below a poverty line) to 100 (least likely to be below a poverty line). Scorecard and poverty likelihood correspondence 11. Determine the poverty score for each household in the construction/calibration subsample. 12. For each calibration subsample household determine whether it is below/above a poverty line. (e.g. in the Bangladesh document, the status with respect to national lower, national upper, 150% national upper, 200% national upper, USAID “Extreme”, $1.25 PPP, $1.75 PPP, $2.00PPP, $2.50 PPP is determined for each household.) 13. 
For each range of poverty scores (0‐4; 5‐9; 10‐14; …), determine the percentage of households (in the calibration subsample) who are below the particular poverty line for which the conversion table is being built.

¹⁵ Based on Schreiner (2014a).

Accuracy of SPS based poverty likelihoods

14. Determine the score for each of the households in the validation subsample.
15. Draw a bootstrap sample of n households with replacement from the validation sample. (In the Bangladesh study n = 16,384.)
16. Calculate the true poverty likelihood in the bootstrap sample, that is, the share of households below a poverty line. (This needs to be calculated separately for each poverty line considered.)
17. For each score, compare this true poverty likelihood with the estimated poverty likelihood determined in step 13 (Scorecard and poverty likelihood correspondence). Record the difference.
18. Repeat 1,000 times, recording the difference between the true and estimated likelihoods for each score.
19. For each score, report the two‐sided intervals containing the central 900, 950, and 990 differences between estimated and true poverty likelihoods (to get confidence intervals) to see how accurate the measure is for different poverty scores.

Accuracy of SPS based poverty rate

20. To determine the poverty rate for a particular group, average the estimated poverty likelihood (from the scorecards) of all individuals in the group.
21. Calculate the true poverty rate in each of the 1,000 bootstrap samples of n = 16,384.
22. Calculate the difference between the estimated poverty rate and the true poverty rate for each of the 1,000 repetitions.
23. The average difference between the estimated and the true poverty rates is the “bias correction factor.”
24. The poverty rates then need to be adjusted by this “bias correction” to get the unbiased estimates. There is a unique bias correction factor for each poverty line. (In Bangladesh they range from +0.5 to ‐0.9 percentage points.)
25.
Use the distribution of the true poverty rate estimates from the bootstrap samples to determine standard errors/confidence intervals. (I.e., the interval containing the central 900 poverty rate estimates is the 90% confidence interval.)

Determination of standard errors for estimated samples

26. To determine the standard errors for the scorecard‐based poverty rates, the direct‐measurement standard error formula needs to be adjusted for the fact that the scorecard is not a direct measure of poverty. The correction factor is the ratio of the standard error derived empirically from the bootstrap samples to the standard error from the mathematical formula for the direct‐measurement case. (A value less than one implies that confidence intervals for the poverty scoring method are narrower than those from direct measurement, i.e., they are more precise; a value greater than one implies that they are less precise.) The correction factor is derived by using bootstrap samples of various sizes to get empirical estimates of the confidence interval and comparing them to the analytical standard errors corresponding to the same sample size. The correction factor is the average of these ratios. (In the Bangladesh case, the exercise is done for 7 different sample sizes ranging from n = 256 to n = 16,384.)
27. The standard error for point‐in‐time estimates of poverty rates via the SPS is
$$\alpha \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} \cdot \sqrt{\frac{N-n}{N-1}},$$
where $\alpha$ is the correction factor, $\hat{p}$ is the estimated poverty rate, $N$ is the population size, and $n$ is the sample size.

Estimate of change in poverty rates over time

28. A similar methodology can be used to derive estimates of bias and precision when the (2010) SPS is applied in other years. As above, the (2010) validation sample and the full sample from another year are used to generate bootstrap samples, yielding mean differences and standard errors between survey samples.
29. Note that change does not imply impact.

Targeting

30.
Targeting accuracy can also be assessed by comparing the true poverty status with inclusion/exclusion of a pro‐poor program with different scores used as the cut‐off level.

Appendix 2: SPS Survey Questions and Lookup Tables

Figure 14a: SPS for Bangladesh, 2010
Figure 14b: SPS Look-up Table for Bangladesh, 2010
Figure 15: SPS for Indonesia, 2010
Figure 16: SPS for Jordan, 2006
Figure 17: SPS for Nepal, 2010
Figure 18: SPS for Paraguay, 2011
Figure 19: SPS for Peru, 2010
Figure 20: SPS for Sierra Leone, 2003
Figure 21: SPS for Thailand, 2011
Figure 22: SPS for Uganda, 2009

Appendix 3: Additional Figures from other Countries

Appendix 4: Stratum Specific Poverty Rates

Poverty & Equity Global Practice Working Papers (Since July 2014)

80 Estimating poverty rates in target populations: an assessment of the simple poverty scorecard and alternative approaches. Vinha, K., Rebolledo Dellepiane, M. A., Skoufias, E., Diamond, A., Gill, M., Xu, Y., August 2016
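
Illustrative sketch of the Appendix 1 calibration and accuracy steps

The calibration and accuracy steps summarized in Appendix 1 (steps 13 to 27) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used by Schreiner (2014a) or by the authors. The function names and the treatment of a score of 100 as part of the top band are hypothetical choices made here, and the step 27 standard error is read as the usual binomial standard error with a finite‐population correction, scaled by the correction factor.

```python
import math
import random

BAND_WIDTH = 5  # score bands 0-4, 5-9, 10-14, ... as in step 13


def band_start(score, width=BAND_WIDTH):
    """Lower edge of the band holding `score`; a score of 100 joins the top band (assumption)."""
    return min(score // width, 100 // width - 1) * width


def build_lookup(scores, poor, width=BAND_WIDTH):
    """Step 13: share of calibration households below the poverty line in each score band."""
    totals, below = {}, {}
    for s, p in zip(scores, poor):
        b = band_start(s, width)
        totals[b] = totals.get(b, 0) + 1
        below[b] = below.get(b, 0) + p
    return {b: below[b] / totals[b] for b in totals}


def estimated_rate(scores, lookup, width=BAND_WIDTH):
    """Step 20: a group's poverty rate is the mean of its members' poverty likelihoods.

    Raises KeyError if a score falls in a band that was absent from the calibration data.
    """
    return sum(lookup[band_start(s, width)] for s in scores) / len(scores)


def bootstrap_differences(scores, poor, lookup, n, reps, rng):
    """Steps 15-22: estimated minus true poverty rate in each bootstrap sample."""
    diffs = []
    for _ in range(reps):
        idx = rng.choices(range(len(scores)), k=n)  # draw n households with replacement
        true_rate = sum(poor[i] for i in idx) / n
        est_rate = estimated_rate([scores[i] for i in idx], lookup)
        diffs.append(est_rate - true_rate)
    return diffs


def central_interval(values, share):
    """Steps 19 and 25: bounds of the central `share` of the bootstrap values."""
    ordered = sorted(values)
    keep = round(share * len(ordered))
    k = (len(ordered) - keep) // 2  # number of values trimmed from each tail
    return ordered[k], ordered[len(ordered) - 1 - k]


def scorecard_se(p_hat, n, N, alpha):
    """Step 27: direct-measurement standard error scaled by the correction factor alpha."""
    return alpha * math.sqrt(p_hat * (1 - p_hat) / n) * math.sqrt((N - n) / (N - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only: poverty is more likely at low scores.
    scores = [rng.randrange(0, 101) for _ in range(2000)]
    poor = [1 if rng.random() < 1 - s / 100 else 0 for s in scores]
    lookup = build_lookup(scores[:1000], poor[:1000])           # calibration half
    diffs = bootstrap_differences(scores[1000:], poor[1000:],   # validation half
                                  lookup, n=1000, reps=1000, rng=rng)
    bias = sum(diffs) / len(diffs)                              # step 23
    print("bias:", bias, "90% interval:", central_interval(diffs, 0.90))
```

In this sketch the mean of the bootstrap differences plays the role of the bias correction factor of step 23, and subtracting it from a scorecard‐based poverty rate would give the adjusted estimate of step 24. With 1,000 repetitions, `central_interval(diffs, 0.90)` returns the bounds of the central 900 differences.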