Policy Research Working Paper 9256

Pull Your Small Area Estimates up by the Bootstraps

Paul Corral, Isabel Molina, Minh Nguyen

Poverty and Equity Global Practice
May 2020

Abstract

After almost two decades of poverty maps produced by the World Bank and multiple advances in the literature, this paper presents a methodological update to the World Bank's toolkit for small area estimation. The paper reviews the computational procedures of the current methods used by the World Bank: the traditional approach by Elbers, Lanjouw and Lanjouw (2003) and the Empirical Best/Bayes (EB) addition introduced by Van der Weide (2014). The addition extends the EB procedure of Molina and Rao (2010) by considering heteroscedasticity and includes survey weights, but uses a different bootstrap approach, here referred to as clustered bootstrap. Simulation experiments comparing these methods to the original EB approach of Molina and Rao (2010) provide empirical evidence of the shortcomings of the clustered bootstrap approach, which yields biased point estimates. The main contributions of this paper are then two: 1) to adapt the original Monte Carlo simulation procedure of Molina and Rao (2010) for the approximation of the extended EB estimators that include heteroscedasticity and survey weights as in Van der Weide (2014); and 2) to adapt the parametric bootstrap approach for mean squared error (MSE) estimation considered by Molina and Rao (2010), and proposed originally by González-Manteiga et al. (2008), to these extended EB estimators. Simulation experiments illustrate that the revised Monte Carlo simulation method yields estimators that are considerably less biased and more efficient in terms of MSE than those obtained from the clustered bootstrap approach, and that the parametric bootstrap MSE estimators are in line with the true MSEs under realistic scenarios.

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at pcorralrodas@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Pull Your Small Area Estimates up by the Bootstraps

Paul Corral,* Isabel Molina† and Minh Nguyen‡ §

Key words: Small area estimation, ELL, Poverty mapping, Poverty map, Empirical best, Parametric bootstrap
JEL classification: C55, C87, C15

* The World Bank Group - Human Development Group Chief Economist's Office
† Universidad Carlos III de Madrid - Department of Statistics
‡ The World Bank Group - Poverty and Inequality Global Practice
§ The authors acknowledge financial support from the World Bank.
We thank Samuel Freije-Rodriguez, Roy van der Weide, Alexandru Cojocaru, and David Newhouse for comments on an earlier draft. We also thank Kristen Himelein and Carlos Rodriguez for comments, suggestions and overall guidance. Additionally, we thank Carolina Sanchez for providing support and space to work on this. Finally, we thank the Global Solutions Group on Welfare Measurement and Statistical Capacity, as well as all attendants of the Summer University courses on Small Area Estimation. Any error or omission is the authors' responsibility alone.

1 Introduction

It has been almost two decades since the publication of Elbers, Lanjouw and Lanjouw's (2003 - ELL henceforth) seminal paper on small area estimation (SAE). The methodology proposed by these authors has been the de facto methodology used by the World Bank to obtain small area estimates of poverty and inequality indicators and perhaps constitutes the most applied SAE method across the globe. The World Bank, in an effort to make the implementation of the method as simple as possible, created a free software package with a simple point and click interface that could be easily used by any practitioner. The PovMap software (Zhao, 2006) is of great value to World Bank staff and to many statistical agencies, and its ease of use has allowed the ELL approach to be adopted worldwide. It has also permitted the World Bank to provide training and spread this knowledge to many statistical agencies.

After the work from Van der Weide (2014), several important modifications and additions were made to the PovMap software, and thus to the toolkit used by the World Bank. The first change was to add a new method for estimating the variances of the model errors, called Henderson's Method III (H3 - Henderson, 1953), which does not require normality because it is based on the method of moments. The second improvement involved adjusting the original Generalized Least Squares estimators of the regression parameters considered in Elbers, Lanjouw and Lanjouw (2003) by including the survey weights following Huang and Hidiroglou (2003). Another important modification involved the addition of Empirical Best/Bayes (EB) prediction assuming normality. The modifications came about due to advances in the SAE literature (Molina and Rao, 2010 - MR) and criticism of the ELL methodology (Haslett et al., 2010 and MR, 2010). The final important change was to consider a different bootstrap approach for obtaining the EB estimators and their corresponding error measures.1

To make the application of the World Bank's method friendlier to more advanced practitioners and to those who seek more flexibility, a World Bank team produced a Stata version of PovMap (Nguyen et al., 2018), which is available under the name "sae". The development of the tool has made running multiple simulations and tests of the software much more straightforward. Since the creation of the "sae" Stata suite of commands, the World Bank has shifted towards training statistical agencies using this approach, which can facilitate replicability. Additionally, the code in the Stata package is open, so that curious practitioners can see in detail how the methods have been implemented.

Small area estimation is a broad branch of statistics that focuses on improving the precision of estimates when household surveys are not large enough to achieve a desired level of precision.
Within small area estimation methods, model-based techniques "borrow strength" from a larger data set or from auxiliary data across areas through models linking the areas (regression-type techniques), which yield indirect estimators (MR, 2010). Most of the model-based techniques fall into two groups: methods based on unit-level models and those based on area-level ones. The former are commonly used when data on units (e.g. households) are available; when only area-level data are available (e.g. area means), the latter are used, see Fay and Herriot (1979). In the context of poverty, methods based on unit-level models typically rely on estimating the distribution of the household's welfare given a set of auxiliary variables or correlates. The model parameters are then used to simulate multiple welfare vectors from the fitted distribution for every single household in the census, which commonly lacks a welfare measure for poverty measurement (Elbers et al., 2007). Using the simulated census vectors, it is possible to obtain poverty rates, or any other welfare indicator, for every area (including the non-sampled ones). Perhaps the two most popular approaches for unit-level small area estimation of poverty indicators are the traditional ELL (2003) method used by the World Bank, and the EB approach by MR (2010).2

Footnote 1: The bootstrap is not discussed in Van der Weide's paper.

This paper focuses on unit-level models for small area estimation; in particular, on the traditional ELL (2003) approach, the latest additions to the World Bank toolkit by Van der Weide (2014), and the original EB approach introduced by MR (2010). It sheds light on the nuances of the traditional approach by ELL (2003) and the EB addition by Van der Weide (2014), and it updates these methods in line with the original EB approach by MR (2010). The paper goes in depth into the differences between the estimation of the noise in the ELL approach (as implemented in PovMap and the sae Stata command), and the estimation of the MSE according to MR (2010). Then, the paper proposes: 1) an adaptation of the Monte Carlo simulation procedure by MR (2010) for calculation of the extended EB estimators that incorporate heteroscedasticity and survey weights as proposed by Van der Weide (2014); and 2) an adaptation of the parametric bootstrap method considered by MR (2010), which comes from González-Manteiga et al. (2008), for estimation of the corresponding MSE of the extended EB estimators. According to our simulation results, the adapted procedures for calculation of extended EB point estimates and corresponding MSEs represent a substantial improvement over the current approaches.

The paper reviews the current methods in chronological order. It begins describing the traditional ELL method in Section 2, then moves on to the EB method proposed by MR (2010) in Section 3 and finishes with the additions by Van der Weide (2014) in Section 4. It discusses these methods in Section 5. Then, in Section 6 it describes the proposed computational approaches for calculation of the extended EB estimators and for estimation of their MSEs. In Section 7, several simulation exercises, imitating those performed by MR (2010) but extending them to more realistic scenarios, are conducted to compare the different methods. Finally, conclusions are presented in Section 8.

2 Traditional ELL approach

The Elbers, Lanjouw and Lanjouw (2002, 2003) methodology is the de facto small area estimation approach utilized by the World Bank.
The methodology has been widely applied across the globe to produce poverty maps for the institution.3 The popularity of the methodology can be attributed, to some degree, to the availability of the PovMap software (Zhao, 2006),4 which was programmed in C and offers users a simple point and click interface. The PovMap software is also incredibly computationally efficient and fast, allowing users, even with limited computing power, to work with census data without facing memory limitations. In 2018, a Stata version of the PovMap software was released (Nguyen et al., 2018). The Stata command "sae" replicates most of the procedures and methods of the original PovMap software, and has become popular because it allows for expansion of the methods available to users.5

Footnote 2: A factor that likely contributes to these approaches being well known is the availability of software that can be easily used to apply them: PovMap (Zhao, 2006) for ELL and R's sae package (Molina and Marhuenda, 2015) for EB.
Footnote 3: Poverty mapping is the common name within the World Bank for SAE methodology, where the obtained estimates are mapped for illustrative purposes.
Footnote 4: Downloadable from: http://iresearch.worldbank.org/PovMap/PovMap2/setup.zip
Footnote 5: Users should be aware that results from Stata and PovMap differ slightly due to the use of different random number generators.

The ELL method assumes that the welfare $y_{ch}$ for each household $h$ within each location $c$ in the population is linearly related to a $1 \times K$ vector of characteristics (or correlates) $x_{ch}$ for that household, according to the nested error model:

\[ \ln(y_{ch}) = x_{ch}\beta + \eta_c + e_{ch}, \quad h = 1, \ldots, N_c, \quad c = 1, \ldots, C, \tag{1} \]

where $\eta_c$ and $e_{ch}$ are respectively location and household-specific idiosyncratic errors, assumed to be independent from each other, following

\[ \eta_c \overset{iid}{\sim} N(0, \sigma_\eta^2), \qquad e_{ch} \overset{iid}{\sim} N(0, \sigma_e^2). \]

Here, $C$ is the number of locations in which the population is divided and $N_c$ is the number of households in location $c$, for $c = 1, \ldots, C$. Finally, $\beta$ is the $K \times 1$ vector of coefficients.

Under the original ELL methodology, the locations indexed with $c$ are supposed to be the clusters, or primary sampling units (PSUs), of the sampling design and do not necessarily correspond to the level at which the estimates need to be produced. In fact, clusters are typically nested within the areas of interest (e.g. census tracts). Presenting estimates at a higher aggregation level than the clusters (for which random effects are included in the model) may not be appropriate in cases of considerable between-area variability, and may underestimate the estimator's standard errors (Das and Chambers, 2017). A way to alleviate this is to include covariates that explain sufficiently well the between-area heterogeneity in the model (ibid). In this regard, ELL (2002) suggests the inclusion of area-level covariates as a way to improve precision, because they can explain the between-area variation in welfare and hence might reduce the contribution of area-specific residuals. Marhuenda et al. (2017) recommend putting the location effects at the same aggregation level where estimation is desired. According to this, in this paper we consider that the clusters $c$ are equal to the areas where estimation is desired.

A key feature of the implemented ELL approach is that it considers heteroscedasticity and fits a preliminary model for it, called the alpha model. Very little research has gone into this aspect of the method.
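As an illustration of the data-generating process behind model (1), the following minimal sketch simulates welfare for a synthetic census under homoscedasticity. It is not part of the World Bank toolkit; the function name `simulate_census` and all inputs are hypothetical, and the design matrix is assumed to already contain any intercept or area-level covariates.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_census(X, area_ids, beta, sigma_eta, sigma_e):
    """Simulate welfare from the nested error model (1):
    ln(y_ch) = x_ch * beta + eta_c + e_ch, with homoscedastic errors."""
    areas, inverse = np.unique(area_ids, return_inverse=True)
    eta = rng.normal(0.0, sigma_eta, size=len(areas))  # location effects eta_c
    e = rng.normal(0.0, sigma_e, size=X.shape[0])      # household errors e_ch
    return np.exp(X @ beta + eta[inverse] + e)

# toy example: 3 areas of 100 households, an intercept and 2 binary correlates
X = np.column_stack([np.ones(300),
                     rng.binomial(1, 0.4, 300),
                     rng.binomial(1, 0.2, 300)])
area_ids = np.repeat([1, 2, 3], 100)
y = simulate_census(X, area_ids, beta=np.array([3.0, 0.03, -0.04]),
                    sigma_eta=0.15, sigma_e=0.5)
```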
In most applications, the alpha model usually has a small adjusted $R^2$ (which may not reach 0.05), yet it tends to play a considerable role in the obtained point estimates, especially for parameters beyond the poverty rate such as the Gini index, Theil index, poverty gap, etc. Heteroscedasticity is included in the description of the EB approach given in Rao and Molina (2015), but it is not implemented in the sae R package (Molina and Marhuenda, 2015). For simplicity, below we spell out the steps of the traditional ELL approach only in the case of homoscedasticity.

The nested error model in (1) was originally fitted by Generalized Least Squares (GLS) accounting for heteroscedasticity, but Van der Weide (2014) updated this original GLS method to properly account for survey weights.6 Currently, the implemented method fits the model by Feasible Generalized Least Squares (FGLS). In this procedure, a simple linear regression obtained by considering as model errors $u_{ch} = \eta_c + e_{ch}$ is fit using ordinary least squares (OLS), and afterwards the appropriate covariance matrix of the regression parameter estimators is estimated.7 The fact that the first stage of the process is an OLS fit is an important aspect, because most of the model testing and validation done for the ELL approach is in practice undertaken with the OLS fit. Nevertheless, the parameter estimates used for generating welfare in the census (see the steps below) come from the GLS fit accounting for heteroscedasticity and including the survey weights.

The variances of the model parameter estimators are required in the considered ELL bootstrap approach. To estimate these variances, ELL (2002) proposed to use the delta method. The actual implementation of the delta method is through a computationally intensive simulated numerical gradient. An estimate of the gradient vector is obtained by making perturbations of the parameters, and this is used to estimate the variances of the estimators via the delta method. ELL (2002) also mentions the possibility of drawing from the (parametric) sampling distribution as a way to incorporate the estimation error into the total prediction error. This latter method has become the de facto approach used under the ELL methodology.8

Footnote 6: Haslett et al. (2010) presents the problems in the original GLS implementation of ELL.
Footnote 7: For a detailed look into the ELL approach, interested readers should refer to the original ELL papers (ELL, 2003; 2002) and to Section 3 of Nguyen et al. (2018), which presents the current GLS estimator from Van der Weide (2014).
Footnote 8: In a comparison of the simulation methods proposed by ELL (2002), Demombynes et al. (2008) show that the delta method from ELL and the parametric drawing of the parameters provide similar results. In tests with pseudo surveys, the delta method seems to provide wider standard errors than the parametric approach (Demombynes et al., 2008), suggesting that perhaps the parametric estimates are too optimistic when compared to the delta method.

The first implementation of the World Bank poverty mapping software was actually done in SAS (Demombynes, 2002). Poverty indicators and their corresponding standard errors were obtained by a bootstrap procedure relying on simulation. This simulation approach is very reminiscent of multiple imputation methods, where every single relevant parameter necessary for simulating vectors of welfare is drawn from its corresponding estimated asymptotic distribution (or an approximation to it). This approach is the one currently used in the World Bank's tools for implementing the ELL (2002) approach: Stata's sae package and PovMap. Specifically, the steps of the implemented bootstrap procedure designed to obtain the traditional ELL point estimates and their estimated noise are:

1. Fit the nested error model (1) to the survey data.
This yields the vector of initial parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$, where the 0 subscript is used hereafter to indicate that the estimates come from the original household survey. In the implemented version, the model is fit via FGLS as specified in Nguyen et al. (2018).

2. Draw new model parameters as follows. First, regression coefficients are drawn from

\[ \beta^* \sim MVN\left(\hat{\beta}_0, \widehat{\mathrm{vcov}}(\hat{\beta}_0)\right). \]

The variance of the household-level errors is drawn, according to Gelman et al. (2004, pp. 364-365), from

\[ \sigma_e^{2*} \sim \hat{\sigma}^2_{e0}\,\frac{(n-K)}{\chi^{2*}_{n-K}}, \]

where $\chi^{2*}_{n-K}$ denotes a random number from a chi-squared distribution with $n - K$ degrees of freedom. Here, $n$ is the number of observations in the survey data used to fit the model and $K$ is the number of correlates used in the model. The variance of the cluster effects $\sigma_\eta^{2*}$ is drawn, according to Demombynes (2008) and Demombynes et al. (2002), from

\[ \sigma_\eta^{2*} \sim \mathrm{Gamma}\left(\hat{\sigma}^2_{\eta 0}, \mathrm{var}(\hat{\sigma}^2_{\eta 0})\right). \]

3. Using the simulated model parameters in step 2, calculate the welfare $y^*_{ch}$ for every household in the census from the model as

\[ \ln(y^*_{ch}) = x_{ch}\beta^* + \eta^*_c + e^*_{ch}, \]

where the household-specific errors are generated as $e^*_{ch} \overset{iid}{\sim} N(0, \sigma_e^{2*})$ and the location effects are generated as $\eta^*_c \overset{iid}{\sim} N(0, \sigma_\eta^{2*})$. Then construct the vector containing the $N_c$ simulated welfares for the households in location $c$, denoted $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$, where $N_c$ is the number of census households in location $c$.

4. With the vectors $y^*_c$, $c = 1, \ldots, C$, of simulated census welfares, indicators can be produced for all the locations. Let $\tau^*_c = f(y^*_c)$ be the indicator of interest calculated based on the simulated vector for location $c$; for example, for the Foster, Greer and Thorbecke (1984 - FGT) class of decomposable poverty measures, the function $f(y^*_c)$ is defined as

\[ f_\alpha(y^*_c) = \sum_{h=1}^{N_c} \frac{p_{ch}}{p_c}\, I(y^*_{ch} < z)\left(1 - \frac{y^*_{ch}}{z}\right)^{\alpha}, \quad \alpha \geq 0, \]

where $z$ is the poverty line, $p_{ch}$ is the size of household $h$ from location $c$ in the census, and $I(y^*_{ch} < z) = 1$ if $y^*_{ch} < z$ and is equal to 0 otherwise.

5. Repeat steps 2 to 4 M times. Although traditionally M = 100, a larger number of replicates is recommended. Let $\tau_c^{*(m)}$ be the indicator of interest obtained in the $m$th replicate of the bootstrap. The ELL estimator is then given by the average across the M bootstrap replicates,

\[ \hat{\tau}_c^{ELL} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}, \]

and the ELL estimated variance of the ELL estimator is given by

\[ \mathrm{var}_{ELL}\left(\hat{\tau}_c^{ELL}\right) = \frac{1}{M-1}\sum_{m=1}^{M}\left(\tau_c^{*(m)} - \hat{\tau}_c^{ELL}\right)^2. \]

As opposed to the EB method of MR (2010), which uses a Monte Carlo simulation procedure and a separate bootstrap procedure for the MSE (see Section 3), in the above ELL procedure a single computational algorithm tries to capture the noise of the initial model parameter estimates in the ELL standard error by varying the model parameters across simulations, which is aligned with Rubin's rules (Rubin, 2004). This means that, in each replicate of the bootstrap, different values of the model parameters $\beta^*$, $\sigma_\eta^{2*}$ and $\sigma_e^{2*}$ are used to generate the welfare data. However, this step might entail an increase of noise in the final ELL estimators.
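To make the sequence of draws concrete, the sketch below mirrors steps 2-5 for an FGT indicator under homoscedasticity. It is a schematic reading of the procedure, not the sae or PovMap code itself: all function names are hypothetical, the survey-based inputs (`beta_hat`, `vcov_hat`, `sig2_e`, `sig2_eta`, `var_sig2_eta`, `n`, `K`) are assumed to come from the step 1 fit, and the Gamma draw is parameterized so that its mean and variance match $\hat{\sigma}^2_{\eta 0}$ and $\mathrm{var}(\hat{\sigma}^2_{\eta 0})$.

```python
import numpy as np

rng = np.random.default_rng(0)

def fgt(y, z, p, alpha=0):
    """FGT indicator of order alpha with household sizes p as weights."""
    poor = y < z
    return np.sum(p * poor * (1 - y / z) ** alpha) / np.sum(p)

def ell_bootstrap(X_cen, area_cen, p_cen, beta_hat, vcov_hat,
                  sig2_e, sig2_eta, var_sig2_eta, n, K, z, M=100):
    """Steps 2-5 of the implemented ELL procedure (homoscedastic case)."""
    areas = np.unique(area_cen)
    sims = {a: [] for a in areas}
    for _ in range(M):
        # step 2: draw model parameters from their (approximate) distributions
        beta_s = rng.multivariate_normal(beta_hat, vcov_hat)
        sig2_e_s = sig2_e * (n - K) / rng.chisquare(n - K)
        shape = sig2_eta ** 2 / var_sig2_eta   # Gamma chosen to have mean sig2_eta
        scale = var_sig2_eta / sig2_eta        # and variance var_sig2_eta
        sig2_eta_s = rng.gamma(shape, scale)
        # step 3: simulate census welfare from the drawn parameters
        eta = {a: rng.normal(0, np.sqrt(sig2_eta_s)) for a in areas}
        e = rng.normal(0, np.sqrt(sig2_e_s), size=X_cen.shape[0])
        y = np.exp(X_cen @ beta_s + np.array([eta[a] for a in area_cen]) + e)
        # step 4: compute the indicator in every location
        for a in areas:
            idx = area_cen == a
            sims[a].append(fgt(y[idx], z, p_cen[idx]))
    # step 5: average and variance across the M replicates
    est = {a: np.mean(sims[a]) for a in areas}
    var = {a: np.var(sims[a], ddof=1) for a in areas}
    return est, var
```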
Note that, if the model parameters were kept fixed to the initial survey estimates from step 1 (skipping ELL step 2), the above ELL estimate $\hat{\tau}_c^{ELL}$ would be a basic Monte Carlo approximation to the marginal mean of the indicator, $E(\tau_c; \hat{\theta}_0)$, with respect to the distribution of the welfare (given the correlates) induced by the nested error model (1), with $\theta$ replaced by the initial estimate $\hat{\theta}_0$. Note that $E(\tau_c; \theta)$, evaluated at the true $\theta = (\beta, \sigma^2_\eta, \sigma^2_e)$, is unbiased by definition since it equals the expectation of the indicator under the true model.9 According to this, the prediction error of the ELL estimator can be decomposed as described in ELL (2003),

\[ \hat{\tau}_c^{ELL} - \tau_c = \underbrace{\left[\hat{\tau}_c^{ELL} - E(\tau_c; \hat{\theta}_0)\right]}_{\text{computation error}} + \underbrace{\left[E(\tau_c; \hat{\theta}_0) - E(\tau_c; \theta)\right]}_{\text{model error}} + \underbrace{\left[E(\tau_c; \theta) - \tau_c\right]}_{\text{idiosyncratic error}}, \]

where these sources of error are described as follows:

1. The idiosyncratic error is related to how the actual value of the expenditure for a given location, $c$, deviates from its expected value due to unobserved aspects of the expenditure for the location (ELL, 2002). For locations with smaller population sizes, the underlying distribution is less likely to be well approximated when errors are drawn. This, however, is related to the explanatory power of the independent variables in the model: the smaller the unexplained portion of the model, the smaller the idiosyncratic error. With poorly fitting models, estimates for locations with smaller populations are likely to suffer from more variability across simulations. Therefore, in order to minimize this error, one of the goals of the modeling stage is to obtain the highest possible $R^2$.

2. The model error is related to the properties of the model parameters and is unrelated to the size of the target population (ELL, 2002). The magnitude of the error depends on the precision of the $\beta$ coefficients of the welfare model and the sensitivity of the indicator to deviations in welfare (ibid). Consequently, in order to minimize this source of error in the final estimates, it is recommended to remove all non-significant independent variables in the modeling stage.

3. The final source of error is the computation error, which is not related to the other two sources of error. It is related to the simulation and can be made as small as possible, as computational resources allow, by running a larger number of simulations.

Consider the case where model parameter estimates were kept fixed across simulations and equal to the initial survey estimates from step 1, $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. In that case, consistency of these estimators as the total sample size $n$ tends to infinity,10 and of Monte Carlo averages to the actual expectations under the ELL model as the number of bootstrap replicates M tends to infinity, would ensure that $\hat{\tau}_c^{ELL}$ approximates correctly the "theoretical" ELL estimator, $E(\tau_c; \theta)$, which is unbiased by definition. However, there is an additional source of error due to the generation of welfare in each simulation from a different set of parameter values in step 3, instead of using the initial estimates from step 1.

Footnote 9: Note that under a model-based approach, the true value of the indicator $\tau_c$ is random. Under this setup, an estimator/predictor $\hat{\tau}_c$ of $\tau_c$ is said to be unbiased when $E(\hat{\tau}_c - \tau_c) = 0$ or, accordingly, when $E(\hat{\tau}_c) = E(\tau_c)$.
Footnote 10: Note that the total survey sample size $n$ is typically large.
Moreover, if our target indicator is the mean welfare in location $c$, $\tau_c = N_c^{-1}\sum_{h=1}^{N_c} y_{ch}$, and the welfare is not transformed with the log, then $E(\tau_c; \theta) = E(\bar{x}_c\beta + \eta_c + \bar{e}_c) = \bar{x}_c\beta$, where $\bar{x}_c = N_c^{-1}\sum_{h=1}^{N_c} x_{ch}$ is the area mean of the correlates and $\bar{e}_c = N_c^{-1}\sum_{h=1}^{N_c} e_{ch}$ is the area mean of the household-level errors. This means that the area effect $\eta_c$ vanishes in the final ELL estimator, which becomes basically the regression estimator $\bar{x}_c\beta$. This estimator is called "synthetic" in the SAE literature because it does not account for between-area heterogeneity (or idiosyncrasy of the areas) beyond that explained by the correlates. Note that, if the regression parameter $\beta$ were actually known (best case), $\bar{x}_c\beta$ would not be using the actual survey data at all, only the census auxiliary information. Synthetic estimators rely very strongly on the assumed regression model and might be misleading if the model is incorrectly specified. In fact, standard errors are obtained assuming that the model actually holds and do not account for model misspecification. Hence, under model failure, they might still be small, giving a lot of credibility to the (misleading) estimates. As will be shown, even if the model is correctly specified, the MSE of the ELL estimators can be substantial, especially if the prediction power of the model is weak.

3 Molina and Rao's EB and Census EB estimators

Molina and Rao (2010) consider the nested error model (1) as in the ELL procedure, but the location effects are originally defined for the areas of interest. A more recent paper by Guadarrama, Molina and Rao (2018) further extends the EB method of MR (2010) to complex sampling designs, but so far it has not been implemented in any software package as a library or command, unlike the traditional ELL (2002) with the subsequent updates by Van der Weide (2014). For simplicity, this section describes the original EB approach proposed by MR (2010).

The main difference with the ELL approach is that the EB method of MR (2010) conditions on the survey sample data and thus makes a more efficient use of the survey data, which contains the only available (and hence precious) information on actual welfare. Nevertheless, conditioning on the survey data requires areas and households across survey and census to be matched. Thus, if $P_c$ denotes the area's population of size $N_c$, then the survey sample, $s_c$, is a sample of size $n_c \leq N_c$ drawn from $P_c$. The complement of the sampled population is referred to as $r_c$, which is just $P_c - s_c$. Thus, the census welfare vector for any area $c$ is defined as $y_c = (y_{c,s}^T, y_{c,r}^T)^T$, which contains the welfare for sampled and non-sampled observations in area $c$, denoted $y_{c,s}$ and $y_{c,r}$ respectively. Typically, the non-sampled population (of size $N_c - n_c$) is much larger than the sampled population (of size $n_c$). It may also happen that $n_c = 0$ for some areas, meaning that the area is not sampled in the survey. The empirical best (EB) predictor of $\tau_c$ for an area $c$ that is sampled ($n_c > 0$) is defined as the conditional expectation $E(\tau_c | y_{c,s}; \hat{\theta})$. For an out-of-sample area ($n_c = 0$), the empirical best (EB) predictor of $\tau_c$ is $E(\tau_c; \hat{\theta})$, which is similar to the ELL estimator. Here, $\hat{\theta}$ is a consistent estimator of $\theta$ as the overall sample size $n$ tends to infinity. In contrast with this procedure, the traditional ELL approach does not require linking the census and survey observations.
When the expectation $E(\tau_c | y_{c,s}; \hat{\theta})$ has no explicit form, the empirical best (EB) predictor of $\tau_c$ can be approximated using a Monte Carlo (MC) simulation method. The MC approximation to the EB predictor $\hat{\tau}_c^{EB} = E(\tau_c | y_{c,s}; \hat{\theta})$, as described in Rao and Molina (2015), is obtained as follows:

1. Using the survey data, fit model (1) via any method providing consistent estimators. This yields the vector of parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. Usual fitting methods under this approach are maximum likelihood (ML) or restricted maximum likelihood (REML), both based on the normal likelihood, and the H3 method, which does not require specifying a distribution. In the implemented sae R package, REML is used.

2. Use the parameter estimates obtained in step 1 as true values to simulate a vector of welfare in the census. First, the welfare for each out-of-sample household is generated as

\[ \ln(y^*_{ch,r}) = x_{ch,r}\hat{\beta}_0 + \eta^*_c + e^*_{ch,r}, \]

where we add the subscript $r$ to indicate that the household is not sampled. Here, for an area $c$ that is included in the sample, the area effect $\eta^*_c$ is generated as

\[ \eta^*_c \sim N\left(\hat{\eta}_{c0},\, \hat{\sigma}^2_{\eta 0}(1 - \hat{\gamma}_c)\right), \]

where $\hat{\eta}_{c0}$ and $\hat{\gamma}_c$ are respectively given by

\[ \hat{\eta}_{c0} = \hat{\gamma}_c\left(\bar{y}_{c,s} - \bar{x}_{c,s}\hat{\beta}_0\right), \qquad \hat{\gamma}_c = \frac{\hat{\sigma}^2_{\eta 0}}{\hat{\sigma}^2_{\eta 0} + \hat{\sigma}^2_{e0}/n_c}. \]

Here, $\bar{y}_{c,s}$ and $\bar{x}_{c,s}$ are respectively the sample means of the welfare variable and of the correlates in area $c$. If area $c$ is not sampled, the area effect is generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$. Finally, the household error $e^*_{ch,r}$ is generated as $e^*_{ch,r} \sim N(0, \hat{\sigma}^2_{e0})$.
For an area $c$ that is sampled in the survey, the generated non-sample vector $y^*_{c,r}$ is then augmented with the survey data $y_{c,s}$. Therefore, the final vector for the whole census in area $c$ is $y^*_c = (y_{c,s}^T, y_{c,r}^{*T})^T$, which is made up of the generated out-of-sample welfares and the survey ones.

3. The previous step yields a census vector of welfare $y^*_c$, simulated using the fitted model parameters. This census vector is then used to calculate the indicator for each area $c = 1, \ldots, C$, as $\tau^*_c = f(y^*_c)$, where $f(\cdot)$ can be any indicator function such as the FGT poverty indicators.

4. Repeat steps 2 and 3 a large number of times M. If $y_c^{*(m)}$ denotes the $m$th replicate of the census vector $y^*_c = (y_{c,s}^T, y_{c,r}^{*T})^T$, then $\tau_c^{*(m)} = f(y_c^{*(m)})$ is the corresponding indicator, $m = 1, \ldots, M$. The final MC approximation to the EB estimator of $\tau_c$ is just the average across the M simulations of these indicators,

\[ \hat{\tau}_c^{EB} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}. \]

Molina (2019) comments that the effect of adding the survey data is negligible when the sample is small relative to the census population of the area. Thus, a slight variation of the EB estimator, called Census EB (see e.g. Molina, 2019), avoids appending the survey data in step 2 by generating all the census welfares (including the sampled ones) in the same way as the non-sampled ones are generated. Thus, the welfare for each household $h$ in the census is generated from the model

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \]

with $\eta^*_c$ and $e^*_{ch}$ obtained exactly as in step 2. The result is the full census vector for area $c$ given by $y^*_c = (y^*_{c1}, \ldots, y^*_{c,N_c})^T$, which is used to compute the target indicator $\tau^*_c = f(y^*_c)$. Unfortunately, the Census EB estimator is not implemented in the sae R package.
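The key difference from the ELL generation step is that, for a sampled area, $\eta^*_c$ is drawn from its conditional distribution given the survey data. The sketch below illustrates this for a single sampled area and an FGT indicator, under homoscedasticity and without survey weights; it is a simplified illustration rather than the sae R implementation, and all function and argument names (`census_eb_area`, `ybar_s`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def census_eb_area(X_cen_c, ybar_s, xbar_s, n_c, beta0, sig2_eta, sig2_e,
                   z, p_cen_c, M=50, alpha=0):
    """MC approximation of the Census EB estimator of an FGT indicator for a
    sampled area c (homoscedastic case, no survey weights).

    ybar_s, xbar_s: sample means of the model response (here, log welfare)
    and of the correlates in area c; beta0, sig2_eta, sig2_e: survey estimates."""
    gamma_c = sig2_eta / (sig2_eta + sig2_e / n_c)
    eta_hat = gamma_c * (ybar_s - xbar_s @ beta0)  # conditional mean of eta_c
    cond_sd = np.sqrt(sig2_eta * (1.0 - gamma_c))  # conditional sd of eta_c
    sims = []
    for _ in range(M):
        eta_star = rng.normal(eta_hat, cond_sd)
        e_star = rng.normal(0.0, np.sqrt(sig2_e), size=X_cen_c.shape[0])
        y_star = np.exp(X_cen_c @ beta0 + eta_star + e_star)
        poor = y_star < z
        sims.append(np.sum(p_cen_c * poor * (1 - y_star / z) ** alpha)
                    / np.sum(p_cen_c))
    return float(np.mean(sims))
```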
One possible approach for applying this software when survey units cannot be identified in the census would be to append the survey data to the available census data, which would lead to a different approximation of the EB estimator. A drawback of this approach is that, in this case, the total size of the area becomes $N_c + n_c \geq N_c$.

The EB estimator $\hat{\tau}_c^{EB}$ approximates the so-called "best" predictor of $\tau_c$, which is the optimal predictor in the sense of minimizing the MSE under the model. The MSE of an estimator $\hat{\tau}_c$ is defined as the expectation of the squared prediction error under the considered model, $\mathrm{MSE}(\hat{\tau}_c) = E(\hat{\tau}_c - \tau_c)^2$. The best predictor is given by the conditional expectation $\hat{\tau}_c^B = E(\tau_c | y_{c,s}; \theta)$, where the expectation is taken with respect to the conditional distribution of the non-sampled data $y_{c,r}$ given the sample data $y_{c,s}$, determined by the nested error model. Under normality of the random terms in the model, this conditional distribution is also normal. In step 2 of the above procedure, welfare values for non-sampled households are generated from that conditional distribution, but with estimated model parameters. Hence, the above MC simulation procedure actually approximates the EB estimator $\hat{\tau}_c^{EB} = E(\tau_c | y_{c,s}; \hat{\theta})$, which is the best predictor with the model parameters replaced by the corresponding estimates based on the original sample survey data, $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. Note that, unlike in the ELL bootstrap procedure, here these estimates are kept constant across simulations because there is a unique DGP. Hence, this simulation procedure delivers a simple MC approximation of the EB predictor $\hat{\tau}_c^{EB} = E(\tau_c | y_{c,s}; \hat{\theta})$ of $\tau_c$, where the expectation is replaced by the corresponding MC average.

If the target parameter is just the area mean $\tau_c = N_c^{-1}\sum_{h=1}^{N_c} y_{ch}$, and there is no log-transformation of welfare, then the best predictor reduces to

\[ E(\tau_c | y_{c,s}; \theta) = \bar{x}_c\beta + \gamma_c\left(\bar{y}_{c,s} - \bar{x}_{c,s}\beta\right), \]

which corrects the regression synthetic estimator $\bar{x}_c\beta$ by accounting for the area effect using the survey data. Moreover, the size of the correction depends on $\gamma_c$, which measures the share of the between-area heterogeneity $\sigma^2_\eta$ in the total variance for the area, $\sigma^2_\eta + \sigma^2_e/n_c$. Thus, the correction is stronger in the case of highly heterogeneous areas and it is weak otherwise, becoming null only when all the existing area heterogeneity is completely explained by the available auxiliary variables. In that case, the EB estimator reduces to the "theoretical" ELL estimator $E(\tau_c; \theta) = \bar{x}_c\beta$. Hence, the EB procedure makes more efficient use of the survey data for areas that are sampled and, for those areas, it does not rely so strongly on the assumed linear regression (which could be misspecified).

Concerning error measures of the estimators, the ELL and EB approaches consider different quantities. In the EB approach of MR (2010), the noise is measured by the actual MSE of the EB estimator under the nested error model, $\mathrm{MSE}(\hat{\tau}_c^{EB}) = E(\hat{\tau}_c^{EB} - \tau_c)^2$. Under the model-based framework, the target indicators $\tau_c$ are regarded as random quantities (they are generated from the model) and hence incorporate uncertainty as well.
Even if an estimator (or predictor) $\hat{\tau}_c$ is unbiased with respect to the model, in the sense that the expectation of its prediction error is zero, $E(\hat{\tau}_c - \tau_c) = 0$, there still remains the variance of the prediction error, $\mathrm{var}(\hat{\tau}_c - \tau_c)$, which is not exactly equal to the variance of the predictor, $\mathrm{var}(\hat{\tau}_c)$, because of the randomness of the target indicator $\tau_c$.

The MSE of the EB estimator is estimated using a parametric bootstrap procedure for finite populations introduced by González-Manteiga et al. (2008), which is computationally more intensive than the ELL bootstrap approach. This is because, in every bootstrap replicate, all the steps for obtaining the EB estimator are reproduced, including model fitting and the MC procedure for approximation of the conditional expectation defining the EB estimator. This means that the number of simulated census vectors is the product of the number of MC simulations and the number of bootstrap replicates, except when the target indicator has an explicit EB estimator (such as the poverty rate). In this latter case, the MC simulation procedure can be replaced by an explicit formula calculated without simulation, and then the number of simulated censuses will be only the number of bootstrap replicates. The steps for the parametric bootstrap estimation of the MSE are as follows (MR, 2015):

1. Using the survey data, fit model (1) using a method providing consistent estimators. This yields the set of parameter estimates from the observed sample: $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. In the implemented sae R package, the REML method is used.

2. Use the estimates in $\hat{\theta}_0$ to simulate census welfares11 from the fitted model as follows:

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \]

where the area effects are generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$ and the household-specific errors are generated as $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e0})$.

3. From the simulated census vector $y^*_c$, calculate the true value of the indicator of interest for each area $c$: $\tau^*_c = f(y^*_c)$, where $f(y^*_c)$ can be any indicator function such as the FGT indicators.

4. Since the survey is regarded as a subset of the census, the survey sample $s_c$ is now extracted and, using the corresponding sample welfares $y^*_{c,s}$ (newly generated in step 2), model (1) is fit again. This yields bootstrap estimates of the model parameters, $\hat{\theta}^* = (\hat{\beta}^*, \hat{\sigma}^{2*}_{\eta}, \hat{\sigma}^{2*}_{e})$.

5. With the bootstrap vectors of sample welfare $y^*_{c,s}$, $c = 1, \ldots, C$, obtain the EB estimators $\hat{\tau}_c^{EB*}$, using MC simulation if needed. The estimation procedure is exactly the same EB procedure based on the original sample, but using the bootstrap sample data $y^*_{c,s}$, $c = 1, \ldots, C$, and the corresponding bootstrap model parameter estimates $\hat{\theta}^*$ from step 4.

6. Repeat steps 2 to 5 a sufficiently large number of times B. In each bootstrap replicate $b$, $\tau_c^{*(b)}$ is the true value of the indicator obtained from the $b$th simulated census and $\hat{\tau}_c^{EB*(b)}$ is the corresponding EB estimator obtained from the extracted sample. The parametric bootstrap approximation of the MSE is then given by:

\[ \mathrm{mse}_B\left(\hat{\tau}_c^{EB}\right) = \frac{1}{B}\sum_{b=1}^{B}\left(\hat{\tau}_c^{EB*(b)} - \tau_c^{*(b)}\right)^2. \]

Footnote 11: Note that here welfares are generated for all households in the census, sampled and non-sampled.
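The structure of this parametric bootstrap can be summarized in a few lines. The sketch below is schematic: the callables are placeholders for the corresponding steps (census generation from $\hat{\theta}_0$, sample extraction, model refitting, EB estimation, and the indicator of interest), and none of the names correspond to an existing package.

```python
import numpy as np

def parametric_bootstrap_mse(gen_census, extract_sample, fit_model,
                             eb_estimate, indicator, theta0, B=200):
    """Parametric bootstrap MSE for EB-type estimators (steps 2-6 above).

    gen_census(theta)          -> dict {area: census welfare vector}
    extract_sample(census)     -> survey-style sample drawn from that census
    fit_model(sample)          -> bootstrap parameter estimates theta*
    eb_estimate(sample, theta) -> dict {area: EB estimate of the indicator}
    indicator(y)               -> true value of the indicator, e.g. an FGT."""
    sq_err = {}
    for _ in range(B):
        census_b = gen_census(theta0)                             # step 2
        true_b = {c: indicator(y) for c, y in census_b.items()}   # step 3
        sample_b = extract_sample(census_b)                       # step 4
        theta_b = fit_model(sample_b)                             # step 4
        est_b = eb_estimate(sample_b, theta_b)                    # step 5
        for c, t in true_b.items():                               # step 6
            sq_err.setdefault(c, []).append((est_b[c] - t) ** 2)
    return {c: float(np.mean(v)) for c, v in sq_err.items()}
```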
4 Van der Weide's update to PovMap software

The addition of EB prediction developed by Van der Weide (2014) represented a landmark update to the PovMap project of the World Bank and its poverty mapping agenda. This section will highlight that this extension is not exactly the same as the EB estimator of MR (2010). In fact, it resembles the traditional ELL approach in the manner in which point estimates and standard errors are computed, according to the implementation in PovMap as well as in Stata's sae package. EB prediction is implemented by generating censuses using the predicted location effects instead of generating them from their theoretical distribution.

Let $e_{c,s}$ be the vector with the marginal residuals $e_{ch} = \ln(y_{ch}) - x_{ch}\beta$ for the households in the survey. The predictor of the location effect $\eta_c$ proposed by Van der Weide (2014) is an extension of the best predictor $E[\eta_c | e_{c,s}]$ for the heteroscedastic nested error model with normality, incorporating also the survey weights to account for complex sampling designs. This predictor is given by

\[ \hat{\eta}_c = \hat{\gamma}_c\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,\hat{e}_{ch}, \tag{2} \]

where $w_{ch}$ is the survey weight for household $h$ in location $c$, $\hat{\sigma}^2_{e,ch}$ is the estimated household-specific error variance using the alpha model as in ELL (2002),

\[ \hat{\gamma}_c = \frac{\hat{\sigma}^2_\eta}{\hat{\sigma}^2_\eta + \left(\sum_h w_{ch}\right)^{-1}\sum_h \frac{w_{ch}}{\sum_h w_{ch}}\,\hat{\sigma}^2_{e,ch}}, \]

and $\hat{\sigma}^2_\eta$ is an estimator of $\sigma^2_\eta$. For this, Van der Weide (2014) considers the extended version of Henderson's method III (Henderson, 1953) obtained by accounting for survey weights as proposed in Huang and Hidiroglou (2003). Finally, $\hat{e}_{ch} = \ln(y_{ch}) - x_{ch}\hat{\beta}$ is the estimated marginal residual from the GLS estimator $\hat{\beta}$ that accounts for heteroscedasticity and survey weights, given by

\[ \hat{\beta} = \left[\sum_c\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,x_{ch}^T x_{ch} - \gamma_c\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,\bar{x}_{c,w}^T\bar{x}_{c,w}\right)\right]^{-1}\sum_c\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,x_{ch}^T y_{ch} - \gamma_c\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,\bar{x}_{c,w}^T\bar{y}_{c,w}\right), \]

where all the sums are over the survey data, and $\bar{y}_{c,w}$ and $\bar{x}_{c,w}$ are the weighted sample means,

\[ \bar{y}_{c,w} = \left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,y_{ch}, \qquad \bar{x}_{c,w} = \left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,x_{ch}. \]

Van der Weide (2014) also proposed the following estimator of the variance of $\hat{\eta}_c$:

\[ \mathrm{var}[\hat{\eta}_c] = \hat{\sigma}^2_\eta - \hat{\gamma}^2_c\left[\hat{\sigma}^2_\eta + \sum_h\left(\frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^2\hat{\sigma}^2_{e,ch}\right]. \tag{3} \]

Under homoscedasticity, the predicted location effects $\hat{\eta}_c$ are equal to those of the Pseudo EB in Guadarrama et al. (2018). They reduce to the estimated area effects $\hat{\eta}_c$ obtained from MR (2010) if, additionally, no survey weights are considered ($w_{ch} = 1$ for all $h$ and $c$), as noted in Van der Weide (2014).

As mentioned before, for sampled clusters, EB prediction makes use of the estimated cluster effects from the survey data to improve the point estimates of the poverty indicators and their errors. A crucial assumption under EB prediction is that the model errors $e_{ch}$ and $\eta_c$ are normally distributed. This yields normality for $\eta_c$ given $e_{c,s}$, which leads to the proposed predicted cluster effect $\hat{\eta}_c$.

The EB addition implemented in the PovMap software uses an extended version of Henderson's method III (Henderson, 1953) to obtain estimators $\hat{\sigma}^2_\eta$ and $\hat{\sigma}^2_e$ of the variance components $\sigma^2_\eta$ and $\sigma^2_e$ that incorporate the sampling weights.12 In step 2 of Section 2 for the ELL approach, $\sigma^{2*}_\eta$ is drawn from a Gamma distribution; however, to avoid this assumption, and because $\mathrm{var}[\hat{\eta}_c]$ is unknown,13 it is necessary to rely on bootstrap re-sampling to obtain the noise of the point estimates, in a similar fashion to ELL.14 Even if the estimator is based on the EB method of MR (2010), the bootstrap procedure used to generate the census data is completely different. This new bootstrap algorithm tries to avoid the distributional assumptions by generating, in the first step, samples of clusters from the survey data set.

Footnote 12: Interested readers should refer to Van der Weide (2014) and/or Nguyen et al. (2018) for an in-depth look at how these are obtained under ELL and Henderson's method III.
Footnote 13: In the traditional ELL method of Section 2, $\hat{\sigma}^2_\eta$ is assumed to follow a Gamma distribution, which does not hold in this case.
Footnote 14: Actually, Van der Weide (2014) does not offer a method for the estimation of the standard errors, but the approach described below is the one implemented in PovMap and consequently also in the Stata sae package from Nguyen et al. (2018).

The full procedure for obtaining the estimator of $\tau_c$, called here clustered bootstrap EB (CB-EB), is as follows:
1. Take a bootstrap sample of clusters (PSUs) with replacement from the survey data.15

2. Fit model (1) to the bootstrapped survey data from step 1. This yields the set of parameter estimates from the bootstrap sample, $(\hat{\beta}^*, \hat{\sigma}^{2*}_\eta, \hat{\sigma}^{2*}_e)$, along with predicted effects for the clusters that were sampled in step 1, $\hat{\eta}^*_c$, and their corresponding estimated variance $\mathrm{var}^*[\hat{\eta}^*_c]$. In the implemented version, the model is fit by FGLS as specified in Nguyen et al. (2018).

3. Use the model parameter estimates from step 2 to simulate welfares $y^*_{ch}$ in the census as16

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}^* + \eta^*_c + e^*_{ch}, \]

where, for a cluster $c$ that is sampled in step 1, the cluster effect is generated as

\[ \eta^*_c \sim N\left(\hat{\eta}^*_c,\, \mathrm{var}^*[\hat{\eta}^*_c]\right), \]

where $\hat{\eta}^*_c$ and $\mathrm{var}^*[\hat{\eta}^*_c]$ are the bootstrap versions of $\hat{\eta}_c$ and its corresponding variance, given in (2) and (3) respectively. For a cluster that is not sampled in step 1, generate $\eta^*_c \sim N(0, \hat{\sigma}^{2*}_\eta)$. Under homoscedasticity, the household-specific errors are generated as $e^*_{ch} \sim N(0, \hat{\sigma}^{2*}_e)$ and, in the case of heteroscedasticity, as $e^*_{ch} \sim N(0, \hat{\sigma}^{2*}_{e,ch})$. Construct the vector $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$ of simulated welfare for every household within location $c$ of size $N_c$, for all the locations $c = 1, \ldots, C$ in the census.

4. With the simulated welfare vector $y^*_c$, calculate the indicator of interest as $\tau^*_c = f(y^*_c)$, such as the FGT indicator of order $\alpha \geq 0$, which is calculated as

\[ f_\alpha(y^*_c) = \sum_{h=1}^{N_c} \frac{p_{ch}}{p_c}\, I(y^*_{ch} < z)\left(1 - \frac{y^*_{ch}}{z}\right)^{\alpha}. \]

5. Repeat steps 1 to 4 M times. Although traditionally M = 100, a larger number of replicates is recommended. Let $\tau_c^{*(m)}$ be the indicator obtained in the $m$th replicate, $m = 1, \ldots, M$. The CB-EB estimator $\hat{\tau}_c^{CB\text{-}EB}$ is then obtained as the average across the M bootstrap replicates,

\[ \hat{\tau}_c^{CB\text{-}EB} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}, \]

and the variance of the estimator is approximated as

\[ \mathrm{var}_{CB\text{-}EB}\left(\hat{\tau}_c^{CB\text{-}EB}\right) = \frac{1}{M-1}\sum_{m=1}^{M}\left(\tau_c^{*(m)} - \hat{\tau}_c^{CB\text{-}EB}\right)^2. \]

Footnote 15: Note that not all clusters are expected to appear in a given bootstrap sample.
Footnote 16: Note that, in each bootstrap replicate, the census is generated from a different model.

Unlike the traditional ELL approach, this procedure requires matching the location codes in the survey and the census. However, in the application of the traditional ELL approach, the authors recommended the inclusion of contextual variables at the location level, and thus linking the two data sources was still necessary. As in the previous computational procedures, the final estimator of the poverty indicator for a location is obtained as an average across bootstrap replicates. By the Monte Carlo principle, this average approximates an expectation. However, here it is not clear with respect to which DGP the expectation is taken, since the bootstrap procedure involves several sources of variation: In step 1, the sample of clusters varies, and this would define an expectation with respect to the sampling design, but considering only the first stage of the design.
In step 2, model parameters are newly generated in each simulation. Finally, in step 3, location effects are generated from their conditional distribution given the sample residuals (determined by the nested error model under normality), and household-specific errors are generated from their distribution under the assumed nested error model with normality.

Even if the bootstrap procedure is not the same, standard errors are obtained similarly as in the traditional ELL approach and are thus different from MSEs. Note that the traditional ELL approach was initially based on the multiple imputation literature. According to Rubin (1996, and as noted in StataCorp, 2019), the objective of multiple imputation is not to re-create the missing data as closely as possible, but to handle them in a manner that allows for statistical inference.

5 Comparison of CB-EB, traditional ELL and EB methods

Because EB prediction uses the survey data to estimate the location effect, its benefits are evident only in locations present both in the census and the survey. In the traditional ELL approach, the location effect was originally at the cluster level and, in most surveys from developing countries where the World Bank focuses its efforts, the sampled clusters represent a small percentage of all the clusters in the country. This may also be the case for locations above clusters. For example, in a recently completed exercise in Moldova, the survey only contained 129 comunas out of a total of 901. Consequently, only a small share of these comunas benefit from EB prediction (Corral and Cojocaru, 2019). Nevertheless, differences also arise due to the computational procedures used to obtain the point estimates and their corresponding measures of noise.

MR (2010) presented evidence that their EB approach is superior to ELL in terms of a smaller MSE. This finding has met some criticism, mostly because the simulation experiment in MR (2010) was conducted under scenarios where ELL is seldom applied. First, the population size in MR (2010) was 20,000, uniformly spread across 80 areas. Second, all the areas were sampled and the sample within each area represented 20 percent of the population, something hardly seen in real-world applications. ELL (2002) illustrates how the noise of the estimator falls as the size of the population increases, and advises against estimating for populations below 100 households. In practice, the level of disaggregation of ELL estimates depends on the model quality, and even with an $R^2$ not smaller than 0.6, it is seldom advised to go below 1,000 households (Elbers et al., 2007). Another point of departure is that the model considered in the simulation experiments of MR (2010) was really poor, yielding an adjusted $R^2$ of less than 0.01. In the usual applications of the ELL approach, the number of auxiliary variables is much greater, most of the time reaching an explanatory power, as measured by adjusted $R^2$, of over 0.5. Moreover, to improve precision, ELL (2002; 2003) also advocated for the use of contextual area-level variables17 on the right-hand side, since these explain (and thus minimize) the variation across locations. These were not used in the simulation experiment conducted in MR (2010). In such a setup, the much noisier ELL estimates obtained in their simulation experiments are not surprising. A final important difference is the fact that the error measures of the estimators and the bootstrap procedures for obtaining them are considerably different in the three procedures.

Footnote 17: These may come from the census or administrative records (ELL, 2003, p. 356).
Currently, the error measures of the traditional ELL and of the more recent CB-EB procedure of Van der Weide (2014) are obtained under a very similar framework to that of multiple imputation. As an error measure, these consider the standard deviation of all the simulated point estimates of the indicators in the M bootstrap replicates, and so every simulated indicator is compared to the average of the simulated indicators. Moreover, in the bootstrap approaches used for that average of the simulated indicators, the initial model parameter estimates $\hat{\beta}_0$, $\hat{\sigma}^2_{\eta 0}$ and $\hat{\sigma}^2_{e0}$ obtained using the original survey sample are not those used to generate the welfare vectors in the population. In fact, in each bootstrap replicate, model parameters $\beta$, $\sigma^2_\eta$, $\sigma^2_e$ are simulated from the estimated distribution of the initial estimators (or an approximation to it). In contrast, in MR's (2010) parametric bootstrap procedure, a simple MC simulation procedure is used to approximate the actual MSE under the nested error model with the average, across bootstrap simulations, of the squared prediction errors. In this bootstrap procedure, the parameter estimates used to generate the bootstrap censuses of welfare are kept fixed to the original survey estimates. Consistency of these initial survey estimators to their corresponding true values as the total sample size $n$ tends to infinity ensures that the bootstrap MSE approximates the true MSE (under the model with the true values of the parameters).

6 Extended Census EB estimators

The EB estimator in MR (2010) requires linking the survey and census households, which may not be possible in real applications. In fact, the survey sample is likely not a subset of the available census. Hence, here we consider the Census EB (CEB) estimator, which avoids this step. In fact, in most cases, the number of sample households for a given area is much smaller than the number of census households and, in such cases, the CEB estimator is expected to perform very much like the original EB.

CEB small area estimators are obtained with a similar MC simulation approach as the one used by MR (2010) and described in Section 3, but simulating the vectors of welfare for all the census households (instead of the non-sampled ones only) and hence not appending the welfare for the survey units. The CEB is extended by accounting for the sampling design and heteroscedasticity, similarly as proposed by Van der Weide (2014). This procedure is implemented by also incorporating household expansion factors (household sizes) taken from the census. The proposed MC simulation procedure for the approximation of the extended CEB estimator is:

1. Fit model (1) to the survey data. This yields the set of parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. The currently implemented procedure fits the model either via FGLS, decomposing residuals as in the traditional ELL procedure, or using the H3 method.

2. Use the model parameter estimates obtained in step 1 to simulate a vector of welfare for the $N_c$ census households in area $c$, $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$, where each $y^*_{ch}$ is obtained as

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \]

where, if area $c$ is in the sample, its location effect is generated as

\[ \eta^*_c \sim N\left(\hat{\eta}_{c0},\, \mathrm{var}[\hat{\eta}_{c0}]\right), \]

with $\hat{\eta}_{c0}$ given by

\[ \hat{\eta}_{c0} = \hat{\gamma}_{c0}\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch0}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch0}}\,\hat{e}_{ch0}, \]

for $\hat{e}_{ch0} = \ln(y_{ch}) - x_{ch}\hat{\beta}_0$, and with
\[ \hat{\gamma}_{c0} = \frac{\hat{\sigma}^2_{\eta 0}}{\hat{\sigma}^2_{\eta 0} + \left(\sum_h w_{ch}\right)^{-1}\sum_h \frac{w_{ch}}{\sum_h w_{ch}}\,\hat{\sigma}^2_{e,ch0}} \]

and

\[ \mathrm{var}[\hat{\eta}_{c0}] = \hat{\sigma}^2_{\eta 0} - \hat{\gamma}^2_{c0}\left[\hat{\sigma}^2_{\eta 0} + \sum_h\left(\frac{w_{ch}}{\hat{\sigma}^2_{e,ch0}}\right)^2\hat{\sigma}^2_{e,ch0}\right]. \]

If area $c$ is not included in the sample, then its effect is generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$. In the absence of heteroscedasticity, the household-specific residuals come from $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e0})$ and, in the case of heteroscedasticity, from $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e,ch0})$.

3. The previous step yields a simulated vector of census welfare for each area, $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$, which makes use of the fitted model parameters. With the census $y^*_c$, the indicator in area $c$ is calculated as $\tau^*_c = f(y^*_c)$, where, for the FGT indicator of order $\alpha \geq 0$, we have

\[ f_\alpha(y^*_c) = \sum_{h=1}^{N_c} \frac{p_{ch}}{p_c}\, I(y^*_{ch} < z)\left(1 - \frac{y^*_{ch}}{z}\right)^{\alpha}. \]

4. Repeat steps 2 and 3 a large number of times M. Let $\tau_c^{*(m)}$ be the indicator obtained in the $m$th replicate, for $m = 1, \ldots, M$. The extended Census EB estimate is just the average of the M indicators,

\[ \hat{\tau}_c^{CEB} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}. \]

In contrast with the traditional ELL and CB-EB approaches, in the above MC procedure the model parameter estimates $\hat{\beta}_0$, $\hat{\sigma}^2_{\eta 0}$ and $\hat{\sigma}^2_{e0}$, as well as $\hat{\eta}_{c0}$ and $\mathrm{var}[\hat{\eta}_{c0}]$, are kept fixed across the whole procedure; that is, there is a single DGP.

As a measure of noise of the extended CEB estimators, here the actual MSE under the nested error model is considered. To estimate this MSE, we propose to apply a parametric bootstrap procedure similar to the one in MR (2010), but adapted to the case where the survey is not necessarily a subset of the census. This procedure is also described in Molina (2019) for the CEB predictor, but here we apply it to the extended CEB predictor that includes heteroscedasticity and sampling weights:

1. Fit model (1) to the survey data. This yields the set of parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. In the implemented version, the model is fit either via FGLS, decomposing residuals as in the traditional ELL fitting approach, or by H3.

2. Using the estimates obtained in step 1 as true values of the model parameters, simulate a vector of census welfare $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$ as follows:

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \quad h = 1, \ldots, N_c, \]

where the location effect is generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$. In the homoscedastic case, household-specific errors are generated as $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e0})$ and, in the case of heteroscedasticity, as $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e,ch0})$.

3. With the simulated census of welfare $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$ for each area $c = 1, \ldots, C$, produce the indicators of interest, $\tau^*_c = f(y^*_c)$, where $f(y^*_c)$ can be any indicator function, such as one of the FGT poverty indicators.

4. Use the model parameter estimates obtained in step 1 to obtain new sample welfares, $y^*_{ch,s}$, for every location $c$ as follows:

\[ \ln(y^*_{ch,s}) = x_{ch,s}\hat{\beta}_0 + \eta^*_c + e^*_{ch,s}. \]

(a) The estimates $\hat{\beta}_0$ come from step 1.
(b) $\eta^*_c$ is the same one generated in step 2. Specifically, all locations present in the survey are matched to the census locations, and the same value of $\eta^*_c$ that was simulated for a location in step 2 is applied to the survey households within that location.
(c) The household-specific errors are simulated as $e^*_{ch,s} \sim N(0, \hat{\sigma}^2_{e0})$ in the homoscedastic case. When there is heteroscedasticity, they are simulated as $e^*_{ch,s} \sim N(0, \hat{\sigma}^2_{e,ch0})$.
Unlike the original parametric bootstrap procedure of MR (2010) described in Section 3, here the sample errors $e^*_{ch,s}$ are not a subset of the census errors simulated in step 2, although they are generated from exactly the same model (with the same variance).

5. Fit model (1) to the newly simulated survey data from step 4. This yields a bootstrap vector of estimated model parameters: $\hat{\theta}^* = (\hat{\beta}^*, \hat{\sigma}^{2*}_\eta, \hat{\sigma}^{2*}_e)$.

6. Obtain the bootstrap extended CEB estimator $\hat{\tau}_c^{CEB*(b)}$ through MC simulation using the bootstrap parameter estimates $\hat{\theta}^*$ from step 5, by the MC approach described above in this section.

7. Repeat steps 2 to 6 a sufficiently large number of times, B. Let $\tau_c^{*(b)}$ be the true value and $\hat{\tau}_c^{CEB*(b)}$ be the CEB estimator obtained in the $b$th replicate of the bootstrap procedure. A parametric bootstrap estimator of the MSE is given by:

\[ \mathrm{mse}_B\left(\hat{\tau}_c^{CEB}\right) = \frac{1}{B}\sum_{b=1}^{B}\left(\hat{\tau}_c^{CEB*(b)} - \tau_c^{*(b)}\right)^2. \]

7 Simulation experiments for comparison of methods

7.1 Simulation experiment with poor model

Here we compare the procedures described in the previous sections for the estimation of poverty rates and gaps (FGT indicators with $\alpha = 0, 1$, respectively) and their corresponding error measures. To this aim, we perform simulation experiments under the same setup as in MR (2010), but extended to consider also more realistic scenarios.

Thus, as in MR (2010), a census data set of N = 20,000 observations is created, where all observations are uniformly spread among C = 80 areas, labeled from 1 to 80. This means that every area consists of $N_c = 250$ observations. Location effects are simulated as $\eta_c \overset{iid}{\sim} N(0, 0.15^2)$; note that every observation within a given area will have the same simulated effect. Then, values of two right-hand side binary variables are simulated. The first one, $x_1$, takes the value 1 if a generated random uniform value between 0 and 1 is less than or equal to $0.3 + 0.5\frac{c}{80}$. This means that observations in areas with a higher label are more likely to get the value 1. The next one, $x_2$, is not tied to the area's label. This variable takes the value 1 if a simulated random uniform value between 0 and 1 is less than or equal to 0.2. The census welfare vectors $y_c = (y_{c,1}, \ldots, y_{c,N_c})^T$ for each area $c = 1, \ldots, C$ are then created as follows:

\[ \ln(y_{ch}) = 3 + 0.03\,x_{1,ch} - 0.04\,x_{2,ch} + \eta_c + e_{ch}, \]

where household-level errors are generated under the homoscedastic setup as $e_{ch} \overset{iid}{\sim} N(0, 0.5^2)$. The poverty line is fixed at $z = 12$, which is roughly 60 percent of the median welfare of a generated population.

From the created "census", in each of the areas, 20 percent of the observations are sampled using simple random sampling without replacement;18 this yields our "survey" data. Thus, in this experiment, survey weights are all equal. This generation process is repeated L = 10,000 times. In each simulation, the following quantities are computed for the poverty rates and gaps in each area:

1. True poverty indicators $\tau_c$, using the "census".

2. Direct estimators $\hat{\tau}_c^{DIR}$ using the "survey", defined as the sample versions of $\tau_c$.

3. Original EB estimators $\hat{\tau}_c^{EB}$ by MR (2010) with REML fitting as implemented in the sae R package, with M = 50 MC replicates.

4. Traditional ELL estimators $\hat{\tau}_c^{ELL}$ and their estimated standard errors, without location means and also including the location means of the considered auxiliary variables, where M = 50.
In each simulation, the following quantities are computed for the poverty rates and gaps in each area:

1. True poverty indicators τ_c, using the "census".

2. Direct estimators τ̂_c^DIR using the "survey", defined as the sample versions of τ_c.

3. Original EB estimators τ̂_c^EB of MR (2010) with REML fitting as implemented in the sae R package, with M = 50 MC replicates.

4. Traditional ELL estimators τ̂_c^ELL and their estimated standard errors, without location means and also including the location means of the considered auxiliary variables, where M = 50.

5. Census EB (CEB) estimators τ̂_c^CEB, obtained as in the MC procedure of MR (2010), but generating full census vectors from the conditional distribution and without appending the survey welfares, using REML fitting and with M = 50.

6. Extended CEB estimators using the MC and parametric bootstrap methods described in Section 6, using either H3 or ELL fitting methods with M = 50. Since in this simulation experiment the model errors are homoscedastic and sampling weights are constant, the extended CEB estimators differ from the CEB estimators above only in the model fitting method.

7. Estimators based on the CB-EB approach from Section 4, τ̂_c^{CB-EB}, using either H3 or ELL fitting methods.19 Since clusters are originally supposed to be nested within the areas and the clustered bootstrap method is supposed to sample clusters (PSUs) instead of areas in step 1, we computed additional CB-EB estimators, where every area is divided into 25 PSUs (with 10 households each) and 5 PSUs are selected within every area (instead of selecting areas). Under this scenario M = 50 as well.

Model bias and MSE are approximated empirically as in MR (2010), as the averages across the L = 10,000 simulations of the prediction errors τ̂_c^j − τ_c and of the squared prediction errors, respectively, where j stands for one of the methods: DIR, ELL, CEB or CB-EB;20 a brief computational sketch of this approximation follows the single-population comparison below.

Before getting into the results of the full simulation experiment, the methods are initially compared using just a single generated population along with its corresponding sample. For this one population, the different methods are run by setting M = 1,000 to get point estimates and B = 1,000 for the MSE.21 For expediency, results that do not conform to the main message of the paper are relegated to the appendix. However, from the figures in the appendix, we can draw the following conclusions: 1) The differences between the EB and Census EB point estimates are negligible even if the area sampling fractions f_c = n_c/N_c are not so small in this experiment (20 percent), see Figure 19. Although differences appear in their estimated MSEs (see Figure 20), these differences become closer to zero as the sampling fractions f_c decrease. 2) Census EB estimates using the H3 method as implemented in the sae Stata package are practically identical to the Census EB ones with REML fitting (Figure 21). Similarly, when comparing the Census EB estimators obtained with the proposed MC procedure using either H3 or ELL estimation methods, we can see that they are also aligned (Figure 22). This indicates that the actual method used to fit the model is not an important factor. In fact, in real applications the overall survey sample is typically very large, so any fitting method, as long as it provides consistent estimators, is expected to deliver estimates that will be very close to the corresponding true values.

18 The same units are sampled in every simulation and the values of x1 and x2 for all the census units are also kept fixed; this means that the x1 and x2 values for the sample units are always the same across simulations.
19 For a detailed description of the difference between the methods, readers should refer to Nguyen et al. (2018).
20 That is, E[τ̂_c^j − τ_c] for the bias and E[(τ̂_c^j − τ_c)²] for the MSE.
21 For ELL from Section 2 and H3-CBEB from Section 4, M = 1,000, since the process relies on a single computational algorithm.
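As referenced above, the empirical bias and MSE are simply per-area averages of the prediction errors and squared prediction errors over the L simulations. The toy arrays in the following sketch stand in for the estimates and true values that, in the paper's setting, would come from running the estimators on each simulated population.

```python
import numpy as np

def empirical_bias_mse(tau_hat, tau_true):
    """Per-area empirical bias and MSE from (L x C) arrays of estimated and true
    indicators, i.e. averages of the prediction errors and squared errors."""
    err = tau_hat - tau_true
    return err.mean(axis=0), (err ** 2).mean(axis=0)

# toy stand-in: 100 "simulations" over 80 areas; real values would come from the
# simulation loop (draw a census, compute true FGTs, fit the model, run Census EB)
rng = np.random.default_rng(1)
tau_true = rng.uniform(0.05, 0.30, size=(100, 80))
tau_hat = tau_true + rng.normal(0.0, 0.01, size=(100, 80))
bias_c, mse_c = empirical_bias_mse(tau_hat, tau_true)
```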
The above conclusions support the use of the Census EB, using either REML, H3 or ELL estimation methods, instead of the original EB procedure in real-world applications, where it is seldom possible to link the survey and census data. Moreover, since the large sampling fractions of this experiment favor the original EB method, in the following we show only results for the Census EB instead of the original EB; that is, welfares are generated in the MC procedure for all the census units and the survey welfares are not appended.

Figure 1 compares, for the poverty rates, the Census EB estimators obtained from the proposed MC procedure of Section 6 using the H3 estimation method (labeled "H3-CensusEB") with the analogues obtained using the CB-EB procedure of Section 4 (labeled "H3-CBEB (old bootstrap)"). We also include the results from the CB-EB method with selection of PSUs within the areas instead of selection of areas in step 1 (labeled "H3-CBEB-old with PSU"). Note that the CB-EB estimates from Section 4 differ considerably from the true poverty rates, whereas the Census EB ones track the true values approximately. In fact, the sum of absolute differences of the CB-EB estimates from the true values is almost three times that of the Census EB estimates. The main reason for this difference in behavior, even though both estimators are supposed to implement EB prediction, is that, under the CB-EB approach of Section 4 with areas equal to clusters, a sample of areas is drawn in each simulation. This means that a given area benefits from EB prediction of the area effect only in a portion of the simulations, since it is likely that an area is not selected in every single bootstrap sample. This leads to CB-EB point estimates that are far off from the Census EB estimates obtained with the proposed MC procedure of Section 6. Another reason might be that the parameters used to simulate the census vectors in the CB-EB approach are not kept fixed across simulations, making convergence to the conditional expectation that defines the estimator difficult. When using the CB-EB approach with PSUs, there is still a clear difference with the Census EB estimates obtained with the new bootstrap procedure of Section 6.

Turning now to the traditional ELL approach, Figure 2 shows ELL estimates for the poverty rates, based on the model with only x1 and x2 (labeled "ELL") and the same but including the area means of x1 and x2 in the model (labeled "ELL with context"), together with the true poverty rates and the Census EB estimates obtained using the original MC procedure of MR (2010), without appending the survey data. In this plot, one finds that the traditional ELL estimates face the same troubles as the CB-EB ones. This is in line with what MR (2010) find in their simulations. Given the present simulation setup with a poor model fit, ELL performs much worse than MR's Census EB method in terms of point estimates. Also note the rather flat nature of ELL, which is mostly due to the model providing very little information: given the limited explanatory power of the correlates, the ELL estimates fall close to the average poverty rate of 16 percent for all the areas.22 What is somewhat concerning is that the addition of contextual variables to the model for ELL, in the form of the area means, does little to improve the point estimates.

22 The ELL estimates are not really flat if sorted from smallest to largest.
Thus, even if the survey data are not preserved in the Census EB estimation method, the benefit of EB is quite considerable, and caution should be taken when doing out-of-sample predictions in case little explanatory power is brought in by the auxiliary variables.

Figure 1: True poverty rates, proposed Census EB estimates of Section 6 using H3 method (labeled "H3-CensusEB"), CB-EB analogues from Section 4 (labeled "H3-CBEB (old bootstrap)") and CB-EB ones with selection of PSUs within the areas (labeled "H3-CBEB-old with PSU") in one simulated population and sample.

Figure 2: True poverty rates, ELL estimates (labeled "ELL"), ELL estimates including the area mean of x1 in the model (labeled "ELL with context") and Census EB estimates obtained from MR (2010) without appending the survey data, in one simulated population and sample.

The complete simulation experiment with L = 10,000 simulated populations is now discussed. Results in this case focus on the poverty gap, although the observed tendencies are the same for the poverty rates. Figures 3 and 4 show, respectively, the empirical bias and empirical MSE across simulated populations of Census EB from Section 6 using the H3 estimation method (labeled "H3-CensusEB"), traditional ELL estimators (labeled "ELL"), estimators based on the CB-EB approach with H3 method (labeled "H3-CBEB (old bootstrap)"), the same with PSUs within areas selected in step 1 (labeled "H3-CBEB (old bootstrap with PSU)") and direct estimators (labeled "Direct").

Figure 3: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled "H3-CensusEB"), traditional ELL (labeled "ELL"), CB-EB with H3 method (labeled "H3-CBEB (old bootstrap)"), CB-EB with PSUs (labeled "H3-CBEB (old bootstrap with PSU)") and Direct estimators (labeled "Direct") of the poverty gaps.

Figure 4: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled "H3-CensusEB"), traditional ELL (labeled "ELL"), CB-EB with H3 method (labeled "H3-CBEB (old bootstrap)"), CB-EB with PSUs (labeled "H3-CBEB (old bootstrap with PSU)") and Direct estimators (labeled "Direct") of the poverty gaps.

These full results are quite sobering. While the bias for the traditional ELL estimators is not remarkable, the bias of the estimators obtained by the CB-EB method from Section 4 is substantial, showing upward bias for all areas. In contrast, the bias of the Census EB estimators obtained from Section 6 seems negligible, ranging between -0.025 and 0.027, similar to the bias range of the original EB estimators of MR (2010).23 Figure 3 thus shows a considerable bias reduction of the proposed MC procedure for obtaining the Census EB estimators with respect to the CB-EB method.

23 Results for the Census EB estimators obtained from the original procedure of MR (2010) with REML estimation are not shown since they are aligned with the Census EB estimators from Section 6.

In terms of MSE, results are aligned with those in MR (2010), see Figure 4. The traditional ELL yields an empirical MSE above that of the direct estimator, which uses no model. Even if ELL estimators do not show a large bias when averaging across all the simulated populations (the ELL bootstrap procedure is supposed to approximate the marginal mean E(τ_c; θ), for which it is exactly unbiased), as shown in Figure 2, ELL does not capture the area effect of the true poverty rates for a single population. Note that, under the model-based setup, welfares are generated from the model and, in each simulation, new location effects are generated.
This means that the true poverty rates vary from one population to another. In fact, a given area might have a positive location effect in one generated population and a negative effect in another. Since the location effects have an expected value of zero, the ELL method is tracking the average zero effect, which is the same for all the areas, but it does not capture the area effects in a given population. What is even more concerning is that the CB-EB method applied in Nguyen et al. (2018) and PovMap, described in Section 4, yields an MSE that is more than 3 times greater than that of the traditional ELL. This happens because a large part of the MSE is the squared bias, which has been shown to be substantial for the CB-EB estimator. The proposed MC method for Census EB estimation presented here therefore represents a massive improvement over the traditional ELL and the CB-EB approaches under the setup of this simulation experiment based on a poor model. The next section describes results when increasing the prediction power of the model.

7.2 Simulation experiment with improved model

This experiment addresses one of the concerns of the previous simulation experiment: the poor explanatory power of the model. Hence, in this section a more informed model is considered, yielding an adjusted R² of about 0.42, which is much closer to that of real-world applications of ELL. The model includes six covariates. The first two, x1 and x2, are generated exactly as in the previous section. Four additional covariates are included, with values generated as follows:

1. x3 is a binary variable, taking value 1 when a random uniform number between 0 and 1 is less than or equal to 0.1 + 0.2c/80.

2. x4 is a binary variable, taking value 1 when a random uniform value between 0 and 1 is less than or equal to 0.5 + 0.3c/80.

3. x5 is a discrete variable, simulated as the rounded integer value of the maximum of 1 and a random Poisson variable with mean λ = 3(1 − 0.1c/80).

4. x6 is a binary variable, taking value 1 when a random uniform value between 0 and 1 is less than or equal to 0.4. Note that the values of x6 are not related to the area's label.

The welfare vector for each area is created from the model with these covariates as follows:

\ln(y_{ch}) = 3 + 0.09\, x_{1,ch} - 0.04\, x_{2,ch} - 0.09\, x_{3,ch} + 0.4\, x_{4,ch} - 0.25\, x_{5,ch} + 0.1\, x_{6,ch} + \eta_c + e_{ch},

where the e_ch are iid N(0, 0.5²) and the η_c are iid N(0, 0.15²). The poverty line in this scenario is fixed at z = 10.2. All other steps are the same as in the previous simulation.24 A code sketch of this extended data-generating process is given below, after the single-population comparison.

Even if the performance of the CB-EB estimators improves when considering a better model, they are still clearly worse than the Census EB estimators obtained with the proposed MC approach of Section 6, see Figure 5. This conclusion remains true even after decreasing the location effect, by simulating the populations with η_c iid N(0, 0.07²).25 Similarly, despite the improved explanatory power of the model, Figure 6 still portrays a rather flat ELL hovering around the national average poverty rate. Under a single simulated population from the improved model, the traditional ELL method still performs poorly when compared with the Census EB of MR (2010).
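As referenced above, here is a minimal sketch of the additional covariates and the welfare model of the improved-model experiment, again in illustrative Python/NumPy with an arbitrary seed; the helper u and the variable names are conveniences of the sketch, not of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

C, N_c = 80, 250
N = C * N_c
area = np.repeat(np.arange(C), N_c)      # area code (0, ..., 79) of every census household

# x1 and x2 as in the poor-model sketch; x3-x5 also depend on the area label, x6 does not
u = lambda: rng.uniform(size=N)
x1 = (u() <= 0.3 + 0.5 * (area + 1) / 80).astype(float)
x2 = (u() <= 0.2).astype(float)
x3 = (u() <= 0.1 + 0.2 * (area + 1) / 80).astype(float)
x4 = (u() <= 0.5 + 0.3 * (area + 1) / 80).astype(float)
x5 = np.maximum(1, rng.poisson(3 * (1 - 0.1 * (area + 1) / 80))).astype(float)
x6 = (u() <= 0.4).astype(float)

def draw_population_improved(rng, sd_eta=0.15, sd_e=0.5):
    """One simulated census under the improved model of Section 7.2; sd_eta = 0.07
    reproduces the reduced-location-effect variant mentioned in footnote 24."""
    eta = rng.normal(0.0, sd_eta, C)     # location effects
    e = rng.normal(0.0, sd_e, N)         # household-level errors
    lny = (3 + 0.09 * x1 - 0.04 * x2 - 0.09 * x3 + 0.4 * x4
           - 0.25 * x5 + 0.1 * x6 + eta[area] + e)
    return np.exp(lny)

y_census = draw_population_improved(rng)
z = 10.2                                 # poverty line used in the improved-model scenarios
```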
Results from the full simulation with L = 10, 000 populations for the poverty gap also resemble those from the previous section, see Figures 7 and 8 showing empirical bias and empirical MSE respectively.26 The CB-EB method implemented as described in Nguyen et al. (2018), which replicates the methods of PovMap, still presents considerable bias in this new simulation exercise. In comparison, the proposed Census EB procedure seems to be nearly unbiased, although the obtained empirical biases actually range between -0.04 and 0.05. Moreover, even under a better regression model, the MSE of the proposed Census EB estimator is still much smaller. What is concerning is that the MSEs of the direct estimators are smaller than those of the traditional ELL estimators and of the CB-EB estimators. Another criticism of the EB approach of MR (2010), which extends to the Census EB, is its dependence on normality. In order to study the sensitivity of the procedures to the normality assumption, an additional simulation experiment is run under the same setup of this section, but generating the household-specific errors from a Student’s t distribution with 5 degrees of freedom and scaled by 0.5. This distribution is symmetric but has heavier tails than the normal distribution, which may represent the existence of outlying welfares in the data, often encountered in real applications. Empirical bias and MSE across the L = 10, 000 simulations are shown in Figures 9 and 10 respectively, for the proposed Census EB, traditional ELL and direct estimators.27 Note that the ELL method takes household-specific residuals from their empirical distribution, so its performance is not supposed to rely so strongly on normality. However, results suggest that, even under lack of normality, ELL’s performance is still worse than that of the Census EB update. In terms of bias, some areas under the traditional ELL show quite considerable bias, yet on average across areas the bias can be near zero. This suggests that, even if the bias across all areas of ELL can be small, it is not an accurate estimator in each area.28 In fact, the empirical MSEs of the traditional ELL estimators for many areas are still much greater than those of the proposed Census EB ones in this experiment. Even after the better performance of the Census EB procedure compared with the traditional ELL method under normality departure in the form of higher tails, model checking and diagnostics are ex- tremely important, as in any model-based procedure. In fact, good performance of EB and Census EB 24 Another iid slight modification made in separate simulations is that the location effect is simulated as ηc ∼ N 0, 0.072 . 25 Results available upon request. Although it must be noted that under this scenario under many populations σ ˆη <0, and thus many of the populations need to be re-simulated. 26 For other FGT indicators see figures: 24, 25, 26, and 27. 27 For other FGT indicators see figures: 28, 29, 30, and 31. 28 Note that the purpose in SAE is not obtaining estimators performing well on average across areas since, in that case, the overall sample mean at the population level would be sufficient. 22 Figure 5: True poverty rates, proposed Census EB estimates of Section 6 using H3 method (labeled “H3- CensusEB”), CB-EB analogues from Section 4 (labeled “H3-CBEB (old bootstrap)”) and CB-EB ones with selection of PSUs within the areas (labeled “H3-CBEB-old with PSU”) in one simulated population and sample, with improved model fit. 
Figure 6: True poverty rates, ELL estimates (labeled “ELL”), ELL estimates including the area means of x1 in the model (labeled “ELL with context”) and Census EB estimates obtained from MR (2010) without appending the survey data, in one simulated population and sample, with improved model fit. 23 Figure 7: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of the poverty gaps, with improved model. Figure 8: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of the poverty gaps, under improved model. 24 estimators in terms of bias and efficiency also relies on normality. Note that the sample size of house- hold surveys is typically very large and therefore normality tests are all expected to reject the simple null hypothesis of normality due to the strong evidence against it (which is never exactly true in real applications). Hence, usual residual plots for checking the normality assumption using both predicted area effects and conditional unit-level residuals should be an important part of any application with real data. In fact, log transformation of welfares is indeed conventionally taken because practically all welfare variables show strong right-skewness and heteroscedasticity. First of all, it is important to note that, even after log transformation, residuals in some applications still show a skewed distribution, where the skewness is transferred to the left tail. This occurs often in the presence of very small welfare values, which the log function actually shifts to minus infinity. A simple way to solve this left skewness problem caused by the log function is to add a shift c > 0 to the original survey welfares, shifting them to values further apart from zero, where the slope of the log function is less pronounced. As Molina, Nandram and Rao (2014) recommends, this shift c can be obtained by fitting the nested error model to a grid of values of c in the range of the welfare values, and selecting the value for which the Fisher skewness of unit level residuals is closest to zero. Another possibility is to select a different transformation from the Box-Cox or power families of transformations (which contain the log), as implemented in the sae R package of Molina and Marhuenda (2015). In fact, Box and Cox (1964) included also the shift in their original family of transformations. If one wishes to avoid finding the correct transformation to achieve normality29 or if residuals still show skewness, even after selecting the best possible transformation of welfare, approaches for EB estimation have been recently developed for skewed distributions, see Graf, Marín and Molina (2019) for EB under general skewed distributions and Diallo and Rao (2018) considering skew-normal errors. A simulation experiment was done by increasing the size of the population to N = 100, 000, but the sample size is kept as nc = 50 households per area. In this case, each area is made up of Nc = 1, 250 households, which means that sampling fractions are now 4 percent (much smaller than the previous 20 percent). 
Everything else in the simulation experiment is kept the same as in the simulation experiment with the improved model and assuming normality. Under this scenario, ELL still performs worse than the proposed Census EB, see Figures 11 and 12. However, it now shows improved performance in certain areas, where it achieves a smaller MSE than the direct estimators (Figure 12). This bodes well for real world scenarios, where the sampling fractions hardly reach 0.5 percent and thus ELL will likely be preferable to direct estimators.30 Under this same (more realistic) scenario, we analyze the performance of the MSE estimators obtained from each method. Figure 13 shows respectively the true MSEs of the traditional ELL estimators (approximated empirically with the L = 10, 000 simulations), and the corresponding ELL estimated variances (section2). We can see that estimated ELL variances are severely understating the true MSEs. This severe underestimation also affects the CB-EB procedure (see appendix Figure 23). On the other hand, according to Figure 14, showing true MSEs of Census EB estimators and their corresponding parametric bootstrap MSEs, where Census EB and bootstrap MSE estimators are obtained from the algorithms of Section 6, the parametric bootstrap MSE estimators are tracking adequately the true MSEs.31 29 Note that for a random variable v , the transformation that leads to normality is given by Φ−1 (F (v )), where F (·) is the true cumulative distribution function (cdf) of v and Φ(·) is the standard normal cdf. The problem is that the true cdf F (·) is typically unknown. 30 For other FGT indicators see figures: 32, 33, 34, and 35 31 Similar conclusions were obtained for other indicators, such as the poverty severity (FGT indicator with α = 2), although results are not shown for brevity. 25 Figure 9: Empirical bias across simulated populations of proposed Census EB (labeled “H3-CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of the poverty gaps, under non-normal errors. Figure 10: Empirical MSE across simulated populations of proposed Census EB (labeled “H3- CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of the poverty gaps, under non-normal errors. 26 Figure 11: Empirical bias across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. Figure 12: Empirical MSE across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. 27 Figure 13: Empirical MSEs of ELL estimators of poverty gaps (labeled “True ELL MSE”) and corre- sponding ELL variance estimates (labeled “Estimated ELL Variance”), with population size of 100K. Figure 14: Empirical MSEs of Census EB estimators of poverty gaps that use H3 estimation method (labeled “True MSE”) and corresponding parametric bootstrap MSE estimates (labeled “Estimated H3- CensusEB MSE”), with population size of 100K. 
Figure 15: Empirical bias across simulated populations of Census EB estimators using H3 method (labeled "H3-CensusEB"), analogue CB-EB estimators sampling areas (labeled "H3-CBEB (old bootstrap)") and ELL with location means (labeled "ELL with context"), with population size of 100K and sample of 10 per area.

Figure 16: Empirical MSE across simulated populations of Census EB estimators using H3 method (labeled "H3-CensusEB"), analogue CB-EB estimators sampling areas (labeled "H3-CBEB (old bootstrap)") and ELL with location means (labeled "ELL with context"), with population size of 100K and sample of 10 per area.

Table 1: Aggregate results across areas (FGT1)

Scenario              1             2             3             4             5             6
Model R2            <0.01         ~0.42         ~0.42         ~0.42         ~0.42         ~0.46
K                       2             6             6             6             6             6
σ_ηc           N(0, 0.15²)   N(0, 0.15²)   N(0, 0.07²)   N(0, 0.15²)   N(0, 0.15²)   N(0, 0.15²)
σ_ech           N(0, 0.5²)    N(0, 0.5²)    N(0, 0.5²)   Student's t*   N(0, 0.5²)    N(0, 0.5²)
Pov. threshold         12          10.2          10.2          10.2          10.2          10.2
nc                     50            50            50            50            50            10
Nc                 20,000        20,000        20,000        20,000       100,000       100,000

Direct estimates
  AAB (x100)        0.031         1.180         1.183         1.721         1.556         3.744
  AARB (x100)      33.000        17.446        17.212        16.538        20.101        43.347
  ARMSE (x100)      1.269         2.417         2.404         2.386         2.835         5.143
  ARRMSE (x100)    43.114        21.661        21.116        18.466        24.823        47.335
ELL
  AAB (x100)        0.155         0.450         0.175         2.363       Not run       Not run
  AARB (x100)      76.528        27.755        13.663        25.288       Not run       Not run
  ARMSE (x100)      2.042         3.627         1.906         3.608       Not run       Not run
  ARRMSE (x100)   510.106        39.350        17.824        28.903       Not run       Not run
ELL - with context
  AAB (x100)        0.169         0.413         0.169         2.264         0.408         1.936
  AARB (x100)      77.412        27.496        13.723        25.077        26.040        25.372
  ARMSE (x100)      2.036         3.591         2.098         3.566         3.529         2.929
  ARRMSE (x100)   448.674        38.957        19.679        28.715        36.171        28.378
H3-CBEB (with PSU)
  AAB (x100)        3.742         3.125         3.205       Not run         3.120       Not run
  AARB (x100)     157.127        32.002        30.199       Not run        30.554       Not run
  ARMSE (x100)      3.888         3.503         3.481       Not run         3.484       Not run
  ARRMSE (x100)   363.459        38.417        34.072       Not run        36.025       Not run
H3-CBEB
  AAB (x100)        3.737         3.123         3.205       Not run       Not run         3.383
  AARB (x100)     176.202        35.250        30.726       Not run       Not run        36.295
  ARMSE (x100)      3.879         3.707         3.507       Not run       Not run         3.840
  ARRMSE (x100)   577.523        45.157        35.216       Not run       Not run        40.608
H3-CensusEB
  AAB (x100)        0.007         0.014         0.012         1.272         0.013         1.590
  AARB (x100)      26.387        11.170         9.755        12.959        10.550        19.432
  ARMSE (x100)      0.932         1.560         1.372         1.739         1.542         2.319
  ARRMSE (x100)    63.121        14.668        12.586        14.300        13.590        21.818

AAB: Average absolute bias; AARB: Average absolute relative bias; ARMSE: Average root MSE; ARRMSE: Average relative root MSE.
*Student's t distribution with 5 degrees of freedom and scaled by 0.5.

One additional simulation is executed with the population of N = 100,000, but the sample size is now kept at nc = 10 households per area, which makes the total sample size n = 800. Each area is made up of Nc = 1,250 households, which means that sampling fractions are now 0.8 percent. The purpose of this simulation is to assess the performance of the estimators under a scenario more aligned with ELL: ELL (2002) proposed that the location effect be specified at the cluster level, and in some applications this is a cluster of 10 observations. However, in this case the total survey sample size may not be so realistic, since in household surveys from many countries the overall sample size is in the several thousands. As expected, under this scenario, the differences in bias among the different methods are less stark.
Note that with zero area sample size (nc = 0), Census EB and traditional ELL become approximately the same, so, as we approach this scenario, estimators come together (Figure 15). Because of the limited sample, all methods now show considerably more bias and a higher MSE. Concurrently, H3-CensusEB now appears to be more aligned to the ELL and H3-CBEB. However, H3-Census EB still outperforms the older methods in terms of bias and MSE. Note also, that usually ELL is done by setting the location effect at the cluster level but results are presented at higher levels.32 However, aggregation to higher levels may underestimate the true MSE as noted by Das and Chambers (2017). The inclusion of contextual level variables may help (ibid ) although this requires being able to link clusters in the survey sample to the census which in practice is not always feasible. Aggregate results across areas are presented in Table 1 for the simulations discussed.33 Results from column 1 are those that replicate the simulation experiment of Molina and Rao (2010). Note that the large values of the average relative root mean squared error (ARRMSE) may arise from true values of FGT2 close to 0, but the true values are common to all estimators. Regardless, the better performance of the H3-Census EB update is quite clear in terms of all the performance measures and under all the simulations run. Moreover, the results in the tables corroborate what is observed in the figures. 7.3 Examining the bias of the CB-EB PovMap update The bias of the CB-EB estimator observed in our simulation experiments is a reason for concern. In the CB-EB method with clusters equal to the areas, one could argue that the bias comes from the fact that not all areas are equally likely to be included in the sample that is selected in step 1 of this procedure. However, in our simulation experiments, bias also appears for the CB-EB method with PSUs selected within the areas in step 1. Thus, the question remains: what is the source of this bias? Investigating whether part of the bias comes from the regression coefficients used to draw the welfares βˆ∗ , which vary across the bootstrap replicates, we have plotted a histogram of the simulated regression intercepts βˆ∗ obtained in the M replicates of the CB-EB bootstrap procedure, with M = 50, 100, 300, for a single population. Figure 17 shows the results for M = 50 and M = 300 bootstrap replicates. This ∗ figure shows that the distribution of the simulated regression intercepts β1 for small M is not actually centered around the true value β1 = 3; in fact, it is right-skewed with a mean clearly exceeding the true value. While increasing the bootstrap replicates M yields better results (right plot), the regression intercepts are still not centered around the true intercept. ∗ A second plausible source of bias is through the simulated location effects, ηc . Since the method selects PSUs with replacement, some areas might be selected more than once and others might not be selected. ηc ] might be underestimated in an area selected more than once because there As a consequence, var [ˆ will artificially be more observations in that area. On the other hand, if an area is not selected, then 32 For other FGT indicators see figures: 36, 37, 38, and 39. 33 See tables 2 and 3 for FGT0 and FGT2, respectively. 31 Figure 17: Histogram of the regression intercepts β ∗(m) , m = 1, . . . , M obtained in the M replicates of the CB-EB bootstrap method, for M = 50 (left) and M = 300 (right). 
∗ ηc ∼ N 0, σ ˆη , and because xch β ∗ is likely biased according to Figure 17, then the resulting estimate 2 34 for this area will deviate considerably from the truth. Figure 18: Empirical bias of the average across the census households of exp(xch β ∗ ) for each area c = 1, . . . , C , as an estimator of the corresponding average of the true values exp(xch β ) (left) and of the average of exp(xch β ∗ + ηc ∗ ) as estimator of the true average based on exp(xch β + ηc ), for each area c. Figure 18 shows how the bias actually builds up in the predicted welfares.35 The left plot in Figure 18 shows the empirical bias of the average across the census households of exp(xch β ∗ ) for each area as an ∗ estimator of the corresponding average of the true values exp(xch β ). The right plot includes ηc and thus ∗ shows the bias of the average of exp(xch β ∗ + ηc ) as estimator of the true average based on exp(xch β + ηc ), for each area c. We can see that the fixed part of the regression already shows a considerable upward bias, which agrees with the bias observed for the regression intercept in Figure 17. Once the location effect is added, things get much worse, with bias becoming substantial for most areas. 34 Note that σ ˆη2 is not the one estimated from the sample, it comes from a bootstrap sampling of the data. 35 This simulation is executed under the same scenario as that of Subsection 7.2, except that L = 5, 000 instead of L = 10, 000 32 8 Conclusions Since the turn of the 21st century, the World Bank has been obtaining small area estimates of poverty and inequality indicators using the ELL approach. One of the reasons for the method’s popularity was the existence of software that made applying the method to real data relatively easy. This began with an implementation in SAS by Demombynes (2002), which was followed by PovMap implemented by Zhao (2006), and finally a Stata version (“sae” package by Nguyen et al., 2018). Along those lines, the research effected in this paper also represents a considerable update to the “sae” package in Stata.36 Countries often set out to obtain small area estimates because more precise poverty estimates at a more granular level allow for improved allocation of resources, see Elbers et al. (2007). This research comes just in time for the 2020 round of population census and should provide an improved tool for the operationalization of the SDGs at a sub-national level. An important aspect of the small area estimation procedures is how parameter estimates from an assumed population model are applied to census data to obtain the small area estimates. As noted in this document, the traditional ELL procedure is aligned to the approach used in multiple imputation, where the main aim is not prediction. Under multiple imputation, the method that yields the lowest MSE yields invalid statistical inference (Van Buuren, 2018). On the other hand, the goal of small area estimation is to improve precision. This paper presents a substantial revision to the traditional methods by ELL (2002, 2003) and the updates by Van der Weide (2014) for small area estimation.37 Our results show substantial gains of the proposed Census EB method with respect to the previous methods in all the considered scenarios (varying area population sizes and sampling fractions, model prediction power and distributional assumptions) and also of the considered parametric bootstrap MSE estimators. 
Thus, the updated methodology provides a considerable improvement in the quality of the SAE estimates obtained under the World Bank’s agenda. We show that, for a single population, the original ELL method tends to align with the national poverty estimates and does not capture the area heterogeneity, although its performance improves when averaging across populations. Nevertheless, caution should be taken when doing out of sample prediction even under EB methods, since in that case estimates are very similar to the traditional ELL ones and can thus deviate considerably from the truth. This is particularly relevant in cases where the explanatory power of the chosen correlates is low. An additional finding of this research is the considerable bias of the CB-EB update by Van der Weide (2014) and as implemented by Nguyen et al. (2018) to the World Bank’s toolkit. The revised Census EB method is aligned with MR’s (2010) method, which shows a level of bias that is several orders of magnitude smaller than the previous CB-EB method. Furthermore, the MSE for the Census EB method is also considerably smaller. This research and its accompanying software implementation represent a massive improvement to the World Bank’s current toolkit and overall poverty mapping agenda. This note serves as a background piece for a guidance note on small area estimation for the World Bank. The simulations executed here are done under a controlled scenario, and thus a natural question is how the findings would differ under a real-world application. Now that the initial limitations of the methods have been identified, the revisions can be put to the test on real-world data. 36 An update to Nguyen et al. (2018) is in progress, but all Stata codes and commands used in this document are available at https://github.com/pcorralrodas/SAE-Stata-Package 37 The revision is also accompanied by an update to the 2018 Stata “sae” package by Nguyen et al. (2018) 33 References Box, G. and Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26:211–252. Corral, P. and Cojocaru, A. (2019). Moldova poverty map: an application of small area estimation. mimeo. Das, S. and Chambers, R. (2017). Robust mean-squared error estimation for poverty estimates based on the method of elbers, lanjouw and lanjouw. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4):1137–1161. Demombynes, G. (2002). A manual for the poverty and inequality mapper module. University of California, Berkeley and Development Research Group, the World Bank. Demombynes, G., Elbers, C., Lanjouw, J., Lanjouw, P., Mistiaen, J., and Ozler, B. (2002). Produc- ing an Improved Geographic Profile of Poverty: Methodology and Evidence from Three Developing Countries. WIDER Working Paper Series 039, World Institute for Development Economic Research (UNU-WIDER). Demombynes, G., Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2008). How good is a map? putting small area estimation to the test. Rivista Internazionale di Scienze Sociali, pages 465–494. Diallo, M. and Rao, J. (2018). Small area estimation of complex parameters under unit-level models with skew-normal errors. Journal of the Royal Statistical Society, Series A, 45:1092–1116. Elbers, C., Fujii, T., Lanjouw, P., Ozler, B., Yin, W., et al. (2007). Poverty alleviation through geographic targeting: How much does disaggregation help? Journal of Development Economics, 83(1):198–213. Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2002). 
Micro-level estimation of welfare. World Bank Policy Research Working Paper, (2911). Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2003). Micro–level estimation of poverty and inequality. Econometrica, 71(1):355–364. Fay III, R. E. and Herriot, R. A. (1979). Estimates of income for small places: an application of james- stein procedures to census data. Journal of the American Statistical Association, 74(366a):269–277. Foster, J., Greer, J., and Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica: Journal of the Econometric Society, pages 761–766. Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2004). Bayesian Data Analysis., volume 2nd ed of Texts in Statistical Science. CRC Press [CAM]. González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., and Santamaría, L. (2008). Boot- strap mean squared error of a small-area eblup. Journal of Statistical Computation and Simulation, 78(5):443–462. Graf, M., Marín, J. M., and Molina, I. (2019). A generalized mixed model for skewed distributions applied to small area estimation. TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, 28(2):565–597. Guadarrama, M., Molina, I., and Rao, J. (2018). Small area estimation of general parameters under complex sampling designs. Computational Statistics & Data Analysis, 121:20–40. 34 Haslett, S., Isidro, M., and Jones, G. (2010). Comparison of survey regression techniques in the context of small area estimation of poverty. Survey Methodology, 36(2):157–170. Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics, 9(2):226–252. Huang, R. and Hidiroglou, M. (2003). Design consistent estimators for a mixed linear model on survey data. Proceedings of the Survey Research Methods Section, American Statistical Association (2003), pages 1897–1904. Marhuenda, Y., Molina, I., Morales, D., and Rao, J. N. K. (2017). Poverty mapping in small areas under a twofold nested error regression model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4):1111–1136. Molina, I. (2019). Desagregación de datos en encuestas de hogares: metodologías de estimación en áreas pequeñas. Molina, I. and Marhuenda, Y. (2015). sae: An R package for small area estimation. The R Journal, 7(1):81–98. Molina, I., Nandram, B., Rao, J., et al. (2014). Small area estimation of general parameters with application to poverty indicators: A hierarchical bayes approach. The Annals of Applied Statistics, 8(2):852–885. Molina, I. and Rao, J. (2010). Small area estimation of poverty indicators. Canadian Journal of Statistics, 38(3):369–385. Nguyen, M. C., Corral, P., Azevedo, J. P., and Zhao, Q. (2018). sae: A stata package for unit level small area estimation. World Bank Policy Research Working Paper, (8630). Rao, J. N. and Molina, I. (2015). Small area estimation. John Wiley & Sons, 2nd edition. Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American statistical Association, 91(434):473–489. Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys, volume 81. John Wiley & Sons. StataCorp, L. (2019). Stata mi reference manual. Van Buuren, S. (2018). Flexible imputation of missing data. Chapman and Hall/CRC. Van der Weide, R. (2014). Gls estimation and empirical bayes prediction for linear mixed models with heteroskedasticity and sampling weights: a background study for the povmap project. World Bank Policy Research Working Paper, (7028). Zhao, Q. (2006). 
User manual for povmap. World Bank. http://siteresources. worldbank. org/INTPGI/Resources/342674-1092157888460/Zhao_ ManualPovMap. pdf. 35 Appendix: Additional simulation results Figure 19: True poverty rates, EB and Census EB estimates obtained according to MR ((2010)) with REML fitting, in one population and sample, with poor model fit. 36 Figure 20: Difference in estimated MSE between EB and Census EB obtained according to MR (2010) with REML fitting, in one population and sample, with poor model fit. Figure 21: Census EB estimates with H3 estimation method and with REML fitting according to MR (2010), in one population and sample, with poor model fit. 37 Figure 22: Empirical bias across simulated populations of Census EB with H3 or ELL estimation methods, compared with direct estimators. Figure 23: Empirical MSEs of H3-CBEB estimators of poverty gaps (labeled “True ELL MSE”) and corresponding H3-CBEB variance estimates (labeled “Estimated H3-CBEB variance”), with population size of 100K. 38 Appendix: Results for other indicators (FGT0 & FGT2) 39 Figure 24: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT0, with improved model. Figure 25: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT0, under improved model. 40 Figure 26: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT2, with improved model. Figure 27: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT2, under improved model. 41 Figure 28: Empirical bias across simulated populations of proposed Census EB (labeled “H3-CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT0, under non-normal errors. Figure 29: Empirical MSE across simulated populations of proposed Census EB (labeled “H3- CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT0, under non-normal errors. 42 Figure 30: Empirical bias across simulated populations of proposed Census EB (labeled “H3-CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT2, under non-normal errors. Figure 31: Empirical MSE across simulated populations of proposed Census EB (labeled “H3- CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT2, under non-normal errors. 
43 Figure 32: Empirical bias across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. Figure 33: Empirical MSE across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. 44 Figure 34: Empirical bias across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. Figure 35: Empirical MSE across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. 45 Figure 36: Empirical bias across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. Figure 37: Empirical MSE across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. 46 Figure 38: Empirical bias across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. Figure 39: Empirical MSE across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. 47 Table 2: Aggregate results across areas (FGT0) 1 2 3 4 5 6 Model R2 <0.01 ~0.42 ~0.42 ~0.42 ~0.42 ~0.46 K 2 6 6 6 6 6 σ ηc N (0, 0.152 ) N (0, 0.152 ) N (0, 0.072 ) N (0, 0.152 ) N (0, 0.152 ) N (0, 0.152 ) σech N (0, 0.52 ) N (0, 0.52 ) N (0, 0.52 ) Student’s t* N (0, 0.52 ) N (0, 0.52 ) Pov. 
threshold 12 10.2 10.2 10.2 10.2 10.2 nc 50 50 50 50 50 10 Nc 250 250 250 250 1,250 1,250 Direct estimates AAB (x100) 0.112 2.610 2.651 4.050 3.089 9.748 AARB (x100) 25.943 14.181 14.000 13.693 15.626 35.794 ARMSE (x100) 4.524 5.808 5.849 5.282 6.504 13.088 ARRMSE (x100) 34.143 17.781 17.332 15.127 19.546 40.127 ELL AAB (x100) 0.530 1.003 0.382 4.795 Not run Not run AARB (x100) 52.871 21.077 10.785 19.722 Not run Not run ARMSE (x100) 7.474 8.282 4.510 7.642 Not run Not run ARRMSE (x100) 98.481 29.126 13.937 22.376 Not run Not run ELL - with context AAB (x100) 0.613 0.909 0.391 4.499 0.915 4.333 AARB (x100) 53.709 20.862 10.783 19.530 19.686 19.004 ARMSE (x100) 7.447 8.200 4.546 7.512 7.957 6.629 ARRMSE (x100) 99.706 28.816 14.070 22.131 26.790 21.025 H3-CBEB (with PSU) AAB (x100) 7.622 2.969 3.114 Not run 2.868 Not run AARB (x100) 71.075 13.925 11.972 Not run 12.733 Not run ARMSE (x100) 8.391 4.949 4.539 Not run 4.680 Not run ARRMSE (x100) 98.071 18.724 15.191 Not run 16.811 Not run H3-CBEB AAB (x100) 7.614 2.967 3.114 Not run Not run 4.343 AARB (x100) 79.167 16.706 12.488 Not run Not run 17.384 ARMSE (x100) 8.797 5.852 4.709 Not run Not run 5.837 ARRMSE (x100) 119.737 23.238 16.037 Not run Not run 19.983 H3-CensusEB AAB (x100) 0.027 0.029 0.028 2.321 0.027 3.638 AARB (x100) 20.049 8.914 7.874 9.674 8.217 14.722 ARMSE (x100) 3.341 3.655 3.306 3.625 3.477 5.281 ARRMSE (x100) 30.074 11.648 10.105 10.780 10.573 16.566 AAB: Average absolute bias; AARB: Average absolute relative bias; ARMSE: Average root MSE; ARRMSE: Average relative root MSE. *Student’s t distribution with 5 degrees of freedom and scaled by 0.5 48 Table 3: Aggregate results across areas (FGT2) 1 2 3 4 5 6 Model R2 <0.01 ~0.42 ~0.42 ~0.42 ~0.42 ~0.46 K 2 6 6 6 6 6 σ ηc N (0, 0.152 ) N (0, 0.152 ) N (0, 0.072 ) N (0, 0.152 ) N (0, 0.152 ) N (0, 0.152 ) σech N (0, 0.52 ) N (0, 0.52 ) N (0, 0.52 ) Student’s t* N (0, 0.52 ) N (0, 0.52 ) Pov. threshold 12 10.2 10.2 10.2 10.2 10.2 nc 50 50 50 50 50 10 Nc 250 250 250 250 1,250 1,250 Direct estimates AAB (x100) 0.012 0.683 0.679 1.072 0.933 2.106 AARB (x100) 43.706 22.195 21.887 20.838 25.986 54.216 ARMSE (x100) 0.568 1.460 1.437 1.516 1.751 3.013 ARRMSE (x100) 56.676 27.534 26.831 23.059 32.142 59.398 ELL AAB (x100) 0.062 0.252 0.102 1.383 Not run Not run AARB (x100) 3381.026 33.477 16.578 28.985 Not run Not run ARMSE (x100) 0.798 2.014 1.072 2.106 Not run Not run ARRMSE (x100) 328071.1 49.030 21.944 33.038 Not run Not run ELL - with context AAB (x100) 0.066 0.237 0.097 1.344 0.227 1.067 AARB (x100) 2736.935 33.172 16.753 28.990 30.893 30.141 ARMSE (x100) 0.796 1.996 1.430 2.106 1.953 1.607 ARRMSE (x100) 263427.9 48.521 29.322 33.166 43.758 33.977 H3-CBEB (with PSU) AAB (x100) 2.018 2.380 2.417 Not run 2.396 Not run AARB (x100) 1834.462 50.899 48.558 Not run 48.464 Not run ARMSE (x100) 2.105 2.554 2.547 Not run 2.555 Not run ARRMSE (x100) 156534.7 59.238 53.586 Not run 54.890 Not run H3-CBEB AAB (x100) 2.015 2.378 2.416 Not run Not run 2.486 AARB (x100) 3852.208 55.248 49.299 Not run Not run 55.997 ARMSE (x100) 2.062 2.608 2.549 Not run Not run 2.694 ARRMSE (x100) 354728.8 68.541 55.098 Not run Not run 61.308 H3-CensusEB AAB (x100) 0.003 0.009 0.008 0.774 0.008 0.865 AARB (x100) 151.842 13.837 12.206 15.255 12.392 22.871 ARMSE (x100) 0.390 0.908 0.798 1.059 0.866 1.269 ARRMSE (x100) 11633.9 18.454 15.922 16.761 16.042 25.717 AAB: Average absolute bias; AARB: Average absolute relative bias; ARMSE: Average root MSE; ARRMSE: Average relative root MSE. 
*Student’s t distribution with 5 degrees of freedom and scaled by 0.5 49