Policy Research Working Paper 9256

Pull Your Small Area Estimates up by the Bootstraps

Paul Corral, Isabel Molina, Minh Nguyen

Poverty and Equity Global Practice
May 2020

Abstract

After almost two decades of poverty maps produced by the World Bank and multiple advances in the literature, this paper presents a methodological update to the World Bank's toolkit for small area estimation. The paper reviews the computational procedures of the current methods used by the World Bank: the traditional approach by Elbers, Lanjouw and Lanjouw (2003) and the Empirical Best/Bayes (EB) addition introduced by Van der Weide (2014). The addition extends the EB procedure of Molina and Rao (2010) by considering heteroscedasticity and includes survey weights, but uses a different bootstrap approach, here referred to as clustered bootstrap. Simulation experiments comparing these methods to the original EB approach of Molina and Rao (2010) provide empirical evidence of the shortcomings of the clustered bootstrap approach, which yields biased point estimates. The main contributions of this paper are then two: 1) to adapt the original Monte Carlo simulation procedure of Molina and Rao (2010) for the approximation of the extended EB estimators that include heteroscedasticity and survey weights as in Van der Weide (2014); and 2) to adapt the parametric bootstrap approach for mean squared error (MSE) estimation considered by Molina and Rao (2010), and proposed originally by González-Manteiga et al. (2008), to these extended EB estimators. Simulation experiments illustrate that the revised Monte Carlo simulation method yields estimators that are considerably less biased and more efficient in terms of MSE than those obtained from the clustered bootstrap approach, and that the parametric bootstrap MSE estimators are in line with the true MSEs under realistic scenarios.

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at pcorralrodas@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Pull Your Small Area Estimates up by the Bootstraps

Paul Corral,* Isabel Molina† and Minh Nguyen‡ §

Key words: Small area estimation, ELL, Poverty mapping, Poverty map, Empirical best, Parametric bootstrap
JEL classification: C55, C87, C15

* The World Bank Group - Human Development Group Chief Economist's Office
† Universidad Carlos III de Madrid - Department of Statistics
‡ The World Bank Group - Poverty and Inequality Global Practice
§ The authors acknowledge financial support from the World Bank.
We thank Samuel Freije-Rodriguez, Roy van der Weide, Alexandru Cojocaru, and David Newhouse for comments on an earlier draft. We also thank Kristen Himelein and Carlos Rodriguez for comments, suggestions and overall guidance. Additionally, we thank Carolina Sanchez for providing support and space to work on this. Finally, we thank the Global Solutions Group on Welfare Measurement and Statistical Capacity, as well as all attendants of the Summer University courses on Small Area Estimation. Any error or omission is the authors' responsibility alone.

1 Introduction

It has been almost two decades since the publication of Elbers, Lanjouw and Lanjouw's (2003 - ELL henceforth) seminal paper on small area estimation (SAE). The methodology proposed by these authors has been the de facto methodology used by the World Bank to obtain small area estimates of poverty and inequality indicators and perhaps constitutes the most applied SAE method across the globe. The World Bank, in an effort to make the implementation of the method as simple as possible, created a free software package with a simple point and click interface that could be easily used by any practitioner. The PovMap software (Zhao, 2006) is of great value to World Bank staff and to many statistical agencies, and its ease of use has allowed the ELL approach to be adopted worldwide. It has also permitted the World Bank to provide training and spread this knowledge to many statistical agencies.

After the work from Van der Weide (2014), several important modifications and additions were made to the PovMap software, and thus to the toolkit used by the World Bank. The first change was to add a new method for estimating the variances of the model errors, called Henderson's Method III (H3 - Henderson, 1953), which does not require normality because it is based on the method of moments. The second improvement involved adjusting the original Generalized Least Squares estimators of the regression parameters considered in Elbers, Lanjouw and Lanjouw (2003) by including the survey weights following Huang and Hidiroglou (2003). Another important modification involved the addition of Empirical Best/Bayes (EB) prediction assuming normality. The modifications came about due to advances in the SAE literature (Molina and Rao, 2010 - MR) and criticism of the ELL methodology (Haslett et al., 2010 and MR, 2010). The final important change was to consider a different bootstrap approach for obtaining the EB estimators and their corresponding error measures.1

To make the application of the World Bank's method friendlier to more advanced practitioners and to those who seek more flexibility, a World Bank team produced a Stata version of PovMap (Nguyen et al., 2018), which is available under the name "sae". The development of the tool has made running multiple simulations and tests of the software much more straightforward. Since the creation of the "sae" Stata suite of commands, the World Bank has shifted towards training statistical agencies using this approach, which can facilitate replicability. Additionally, the code in the Stata package is open, so that curious practitioners can see in detail how the methods have been implemented.

Small area estimation is a broad branch of statistics that focuses on improving the precision of estimates when household surveys are not large enough to achieve a desired level of precision.
Within small area estimation methods, model-based techniques "borrow strength" from a larger data set or from auxiliary data across areas through models linking the areas (regression-type techniques), which yield indirect estimators (MR, 2010). Most of the model-based techniques fall into two groups: methods based on unit-level models and those based on area-level ones. The former are commonly used when data on units (e.g. households) are available; when only area-level data are available (e.g. area means), the latter are used, see Fay and Herriot (1979). In the context of poverty, methods based on unit-level models typically rely on estimating the distribution of the household's welfare given a set of auxiliary variables or correlates. The model parameters are then used to simulate multiple welfare vectors from the fitted distribution for every single household in the census, which commonly lacks a welfare measure for poverty measurement (Elbers et al., 2007). Using the simulated census vectors, it is possible to obtain poverty rates, or any other welfare indicator, for every area (including the non-sampled ones). Perhaps the two most popular approaches for unit-level small area estimation of poverty indicators are the traditional ELL (2003) method used by the World Bank, and the EB approach by MR (2010).2

Footnote 1: The bootstrap is not discussed in Van der Weide's paper.

This paper focuses on unit-level models for small area estimation; in particular, on the traditional ELL (2003) approach, the latest additions to the World Bank toolkit by Van der Weide (2014), and the original EB approach introduced by MR (2010). It sheds light on the nuances of the traditional approach by ELL (2003) and the EB addition by Van der Weide (2014), and it updates these methods in line with the original EB approach by MR (2010). The paper goes in depth into the differences between the estimation of the noise in the ELL approach (as implemented in PovMap and the sae Stata command), and the estimation of the MSE according to MR (2010). Then, the paper proposes: 1) an adaptation of the Monte Carlo simulation procedure by MR (2010) for calculation of the extended EB estimators that incorporate heteroscedasticity and survey weights as proposed by Van der Weide (2014); and 2) an adaptation of the parametric bootstrap method considered by MR (2010), which comes from González-Manteiga et al. (2008), for estimation of the corresponding MSE of the extended EB estimators. According to our simulation results, the adapted procedures for calculation of extended EB point estimates and corresponding MSEs represent a substantial improvement over the current approaches.

The paper reviews the current methods in chronological order. It begins describing the traditional ELL method in Section 2, then moves on to the EB method proposed by MR (2010) in Section 3 and finishes with the additions by Van der Weide (2014) in Section 4. It discusses these methods in Section 5. Then, in Section 6 it describes the proposed computational approaches for calculation of the extended EB estimators and for estimation of their MSEs. In Section 7, several simulation exercises, imitating those performed by MR (2010) but extending them to more realistic scenarios, are conducted to compare the different methods. Finally, conclusions are presented in Section 8.

2 Traditional ELL approach

The Elbers, Lanjouw and Lanjouw (2002, 2003) methodology is the de facto small area estimation approach utilized by the World Bank.
The methodology has been widely applied across the globe to produce poverty maps for the institution.3 The popularity of the methodology can be attributed, to some degree, to the availability of the PovMap software (Zhao, 2006),4 which was programmed in C and offers users a simple point and click interface. The PovMap software is also incredibly computationally efficient and fast, allowing users, even with limited computing power, to work with census data without facing memory limitations. In 2018, a Stata version of the PovMap software was released (Nguyen et al., 2018). The Stata command "sae" replicates most of the procedures and methods of the original PovMap software, and has become popular because it allows for expansion of the methods available to users.5

Footnote 2: A factor that likely contributes to these approaches being well known is the availability of software that can be easily used to apply them: PovMap (Zhao, 2006) for ELL and R's sae package (Molina and Marhuenda, 2015) for EB.
Footnote 3: Poverty mapping is the common name within the World Bank for SAE methodology, where the obtained estimates are mapped for illustrative purposes.
Footnote 4: Downloadable from: http://iresearch.worldbank.org/PovMap/PovMap2/setup.zip
Footnote 5: Users should be aware that results from Stata and PovMap differ slightly due to the use of different random number generators.

The ELL method assumes that the welfare $y_{ch}$ for each household $h$ within each location $c$ in the population is linearly related to a $1 \times K$ vector of characteristics (or correlates) $x_{ch}$ for that household, according to the nested error model:

\[ \ln(y_{ch}) = x_{ch}\beta + \eta_c + e_{ch}, \quad h = 1, \ldots, N_c, \quad c = 1, \ldots, C, \tag{1} \]

where $\eta_c$ and $e_{ch}$ are respectively location and household-specific idiosyncratic errors, assumed to be independent from each other, following

\[ \eta_c \overset{iid}{\sim} N(0, \sigma_\eta^2), \qquad e_{ch} \overset{iid}{\sim} N(0, \sigma_e^2). \]

Here, $C$ is the number of locations in which the population is divided and $N_c$ is the number of households in location $c$, for $c = 1, \ldots, C$. Finally, $\beta$ is the $K \times 1$ vector of coefficients.

Under the original ELL methodology, the locations indexed with $c$ are supposed to be the clusters, or primary sampling units (PSUs), of the sampling design and do not necessarily correspond to the level at which the estimates need to be produced. In fact, clusters are typically nested within the areas of interest (e.g. census tracts). Presenting estimates at a higher aggregation level than the clusters (for which random effects are included in the model) may not be appropriate in cases of considerable between-area variability, and may underestimate the estimator's standard errors (Das and Chambers, 2017). A way to alleviate this is to include covariates that explain sufficiently well the between-area heterogeneity in the model (ibid). In this regard, ELL (2002) suggests the inclusion of area-level covariates as a way to improve precision, because they can explain the between-area variation in welfare and hence might reduce the contribution of area-specific residuals. Marhuenda et al. (2017) recommend putting the location effects at the same aggregation level where estimation is desired. According to this, in this paper we consider that the clusters $c$ are equal to the areas where estimation is desired.

A key feature of the implemented ELL approach is that it considers heteroscedasticity and fits a preliminary model for it, called the alpha model. Very little research has gone into this aspect of the method.
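As an illustration of the data-generating process behind model (1), the following minimal sketch simulates welfare for a synthetic census under homoscedasticity. It is not part of the World Bank toolkit; the function name `simulate_census` and all inputs are hypothetical, and the design matrix is assumed to already contain any intercept or area-level covariates.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_census(X, area_ids, beta, sigma_eta, sigma_e):
    """Simulate welfare from the nested error model (1):
    ln(y_ch) = x_ch * beta + eta_c + e_ch, with homoscedastic errors."""
    areas, inverse = np.unique(area_ids, return_inverse=True)
    eta = rng.normal(0.0, sigma_eta, size=len(areas))  # location effects eta_c
    e = rng.normal(0.0, sigma_e, size=X.shape[0])      # household errors e_ch
    return np.exp(X @ beta + eta[inverse] + e)

# toy example: 3 areas of 100 households, an intercept and 2 binary correlates
X = np.column_stack([np.ones(300),
                     rng.binomial(1, 0.4, 300),
                     rng.binomial(1, 0.2, 300)])
area_ids = np.repeat([1, 2, 3], 100)
y = simulate_census(X, area_ids, beta=np.array([3.0, 0.03, -0.04]),
                    sigma_eta=0.15, sigma_e=0.5)
```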
In most applications, the alpha model usually has a small adjusted $R^2$ (which may not reach 0.05), yet it tends to play a considerable role in the obtained point estimates, especially for parameters beyond the poverty rate such as the Gini index, Theil index, poverty gap, etc. Heteroscedasticity is included in the description of the EB approach given in Rao and Molina (2015), but it is not implemented in the sae R package (Molina and Marhuenda, 2015). For simplicity, below we spell out the steps of the traditional ELL approach only in the case of homoscedasticity.

The nested error model in (1) was originally fitted by Generalized Least Squares (GLS) accounting for heteroscedasticity, but Van der Weide (2014) updated this original GLS method to properly account for survey weights.6 Currently, the implemented method fits the model by Feasible Generalized Least Squares (FGLS). In this procedure, a simple linear regression obtained by considering as model errors $u_{ch} = \eta_c + e_{ch}$ is fit using ordinary least squares (OLS), and afterwards the appropriate covariance matrix of the regression parameter estimators is estimated.7 The fact that the first stage of the process is an OLS fit is an important aspect, because most of the model testing and validation done for the ELL approach is in practice undertaken with the OLS fit. Nevertheless, the parameter estimates used for generating welfare in the census (see the steps below) come from the GLS fit accounting for heteroscedasticity and including the survey weights.

The variances of the model parameter estimators are required in the considered ELL bootstrap approach. To estimate these variances, ELL (2002) proposed to use the delta method. The actual implementation of the delta method is through a computationally intensive simulated numerical gradient. An estimate of the gradient vector is obtained by making perturbations of the parameters, and this is used to estimate the variances of the estimators via the delta method. ELL (2002) also mentions the possibility of drawing from the (parametric) sampling distribution as a way to incorporate the estimation error into the total prediction error. This latter method has become the de facto approach used under the ELL methodology.8

Footnote 6: Haslett et al. (2010) presents the problems in the original GLS implementation of ELL.
Footnote 7: For a detailed look into the ELL approach, interested readers should refer to the original ELL papers (ELL, 2003; 2002) and to Section 3 of Nguyen et al. (2018), which presents the current GLS estimator from Van der Weide (2014).
Footnote 8: In a comparison of the simulation methods proposed by ELL (2002), Demombynes et al. (2008) show that the delta method from ELL and the parametric drawing of the parameters provide similar results. In tests with pseudo surveys, the delta method seems to provide wider standard errors than the parametric approach (Demombynes et al., 2008), suggesting that perhaps the parametric estimates are too optimistic when compared to the delta method.

The first implementation of the World Bank poverty mapping software was actually done in SAS (Demombynes, 2002). Poverty indicators and their corresponding standard errors were obtained by a bootstrap procedure relying on simulation. This simulation approach is very reminiscent of multiple imputation methods, where every single relevant parameter necessary for simulating vectors of welfare is drawn from its corresponding estimated asymptotic distribution (or an approximation to it). This approach is the one currently used in the World Bank's tools for implementing the ELL (2002) approach: Stata's sae package and PovMap. Specifically, the steps of the implemented bootstrap procedure designed to obtain the traditional ELL point estimates and their estimated noise are:

1. Fit the nested error model (1) to the survey data.
This yields the vector of initial parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$, where the 0 subscript is used hereafter to indicate that the estimates come from the original household survey. In the implemented version, the model is fit via FGLS as specified in Nguyen et al. (2018).

2. Draw new model parameters as follows. First, regression coefficients are drawn from

\[ \beta^* \sim MVN\left(\hat{\beta}_0, \widehat{\mathrm{vcov}}(\hat{\beta}_0)\right). \]

The variance of the household-level errors is drawn, according to Gelman et al. (2004, pp. 364-365), from

\[ \sigma_e^{2*} \sim \hat{\sigma}^2_{e0}\,\frac{(n-K)}{\chi^{2*}_{n-K}}, \]

where $\chi^{2*}_{n-K}$ denotes a random number from a chi-squared distribution with $n - K$ degrees of freedom. Here, $n$ is the number of observations in the survey data used to fit the model and $K$ is the number of correlates used in the model. The variance of the cluster effects $\sigma_\eta^{2*}$ is drawn, according to Demombynes (2008) and Demombynes et al. (2002), from

\[ \sigma_\eta^{2*} \sim \mathrm{Gamma}\left(\hat{\sigma}^2_{\eta 0}, \mathrm{var}(\hat{\sigma}^2_{\eta 0})\right). \]

3. Using the simulated model parameters in step 2, calculate the welfare $y^*_{ch}$ for every household in the census from the model as

\[ \ln(y^*_{ch}) = x_{ch}\beta^* + \eta^*_c + e^*_{ch}, \]

where the household-specific errors are generated as $e^*_{ch} \overset{iid}{\sim} N(0, \sigma_e^{2*})$ and the location effects are generated as $\eta^*_c \overset{iid}{\sim} N(0, \sigma_\eta^{2*})$. Then construct the vector containing the $N_c$ simulated welfares for the households in location $c$, denoted $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$, where $N_c$ is the number of census households in location $c$.

4. With the vectors $y^*_c$, $c = 1, \ldots, C$, of simulated census welfares, indicators can be produced for all the locations. Let $\tau^*_c = f(y^*_c)$ be the indicator of interest calculated based on the simulated vector for location $c$; for example, for the Foster, Greer and Thorbecke (1984 - FGT) class of decomposable poverty measures, the function $f(y^*_c)$ is defined as

\[ f_\alpha(y^*_c) = \sum_{h=1}^{N_c} \frac{p_{ch}}{p_c}\, I(y^*_{ch} < z)\left(1 - \frac{y^*_{ch}}{z}\right)^{\alpha}, \quad \alpha \geq 0, \]

where $z$ is the poverty line, $p_{ch}$ is the size of household $h$ from location $c$ in the census, and $I(y^*_{ch} < z) = 1$ if $y^*_{ch} < z$ and is equal to 0 otherwise.

5. Repeat steps 2 to 4 M times. Although traditionally M = 100, a larger number of replicates is recommended. Let $\tau_c^{*(m)}$ be the indicator of interest obtained in the $m$th replicate of the bootstrap. The ELL estimator is then given by the average across the M bootstrap replicates,

\[ \hat{\tau}_c^{ELL} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}, \]

and the ELL estimated variance of the ELL estimator is given by

\[ \mathrm{var}_{ELL}\left(\hat{\tau}_c^{ELL}\right) = \frac{1}{M-1}\sum_{m=1}^{M}\left(\tau_c^{*(m)} - \hat{\tau}_c^{ELL}\right)^2. \]

As opposed to the EB method of MR (2010), which uses a Monte Carlo simulation procedure and a separate bootstrap procedure for the MSE (see Section 3), in the above ELL procedure a single computational algorithm tries to capture the noise of the initial model parameter estimates in the ELL standard error by varying the model parameters across simulations, which is aligned with Rubin's rules (Rubin, 2004). This means that, in each replicate of the bootstrap, different values of the model parameters $\beta^*$, $\sigma_\eta^{2*}$ and $\sigma_e^{2*}$ are used to generate the welfare data. However, this step might entail an increase of noise in the final ELL estimators.
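To make the sequence of draws concrete, the sketch below mirrors steps 2-5 for an FGT indicator under homoscedasticity. It is a schematic reading of the procedure, not the sae or PovMap code itself: all function names are hypothetical, the survey-based inputs (`beta_hat`, `vcov_hat`, `sig2_e`, `sig2_eta`, `var_sig2_eta`, `n`, `K`) are assumed to come from the step 1 fit, and the Gamma draw is parameterized so that its mean and variance match $\hat{\sigma}^2_{\eta 0}$ and $\mathrm{var}(\hat{\sigma}^2_{\eta 0})$.

```python
import numpy as np

rng = np.random.default_rng(0)

def fgt(y, z, p, alpha=0):
    """FGT indicator of order alpha with household sizes p as weights."""
    poor = y < z
    return np.sum(p * poor * (1 - y / z) ** alpha) / np.sum(p)

def ell_bootstrap(X_cen, area_cen, p_cen, beta_hat, vcov_hat,
                  sig2_e, sig2_eta, var_sig2_eta, n, K, z, M=100):
    """Steps 2-5 of the implemented ELL procedure (homoscedastic case)."""
    areas = np.unique(area_cen)
    sims = {a: [] for a in areas}
    for _ in range(M):
        # step 2: draw model parameters from their (approximate) distributions
        beta_s = rng.multivariate_normal(beta_hat, vcov_hat)
        sig2_e_s = sig2_e * (n - K) / rng.chisquare(n - K)
        shape = sig2_eta ** 2 / var_sig2_eta   # Gamma chosen to have mean sig2_eta
        scale = var_sig2_eta / sig2_eta        # and variance var_sig2_eta
        sig2_eta_s = rng.gamma(shape, scale)
        # step 3: simulate census welfare from the drawn parameters
        eta = {a: rng.normal(0, np.sqrt(sig2_eta_s)) for a in areas}
        e = rng.normal(0, np.sqrt(sig2_e_s), size=X_cen.shape[0])
        y = np.exp(X_cen @ beta_s + np.array([eta[a] for a in area_cen]) + e)
        # step 4: compute the indicator in every location
        for a in areas:
            idx = area_cen == a
            sims[a].append(fgt(y[idx], z, p_cen[idx]))
    # step 5: average and variance across the M replicates
    est = {a: np.mean(sims[a]) for a in areas}
    var = {a: np.var(sims[a], ddof=1) for a in areas}
    return est, var
```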
Note that, if the model parameters were kept fixed to the initial survey estimates from step 1 (skipping ELL step 2), the above ELL estimate $\hat{\tau}_c^{ELL}$ would be a basic Monte Carlo approximation to the marginal mean of the indicator, $E(\tau_c; \hat{\theta}_0)$, with respect to the distribution of the welfare (given the correlates) induced by the nested error model (1), with $\theta$ replaced by the initial estimate $\hat{\theta}_0$. Note that $E(\tau_c; \theta)$, evaluated at the true $\theta = (\beta, \sigma^2_\eta, \sigma^2_e)$, is unbiased by definition since it equals the expectation of the indicator under the true model.9 According to this, the prediction error of the ELL estimator can be decomposed as described in ELL (2003),

\[ \hat{\tau}_c^{ELL} - \tau_c = \underbrace{\left[\hat{\tau}_c^{ELL} - E(\tau_c; \hat{\theta}_0)\right]}_{\text{computation error}} + \underbrace{\left[E(\tau_c; \hat{\theta}_0) - E(\tau_c; \theta)\right]}_{\text{model error}} + \underbrace{\left[E(\tau_c; \theta) - \tau_c\right]}_{\text{idiosyncratic error}}, \]

where these sources of error are described as follows:

1. The idiosyncratic error is related to how the actual value of the expenditure for a given location, $c$, deviates from its expected value due to unobserved aspects of the expenditure for the location (ELL, 2002). For locations with smaller population sizes, the underlying distribution is less likely to be well approximated when errors are drawn. This, however, is related to the explanatory power of the independent variables in the model: the smaller the unexplained portion of the model, the smaller the idiosyncratic error. With poorly fitting models, estimates for locations with smaller populations are likely to suffer from more variability across simulations. Therefore, in order to minimize this error, one of the goals of the modeling stage is to obtain the highest possible $R^2$.

2. The model error is related to the properties of the model parameters and is unrelated to the size of the target population (ELL, 2002). The magnitude of the error depends on the precision of the $\beta$ coefficients of the welfare model and the sensitivity of the indicator to deviations in welfare (ibid). Consequently, in order to minimize this source of error in the final estimates, it is recommended to remove all non-significant independent variables in the modeling stage.

3. The final source of error is the computation error, which is not related to the other two sources of error. It is related to the simulation and can be made as small as possible, as computational resources allow, by running a larger number of simulations.

Consider the case where model parameter estimates were kept fixed across simulations and equal to the initial survey estimates from step 1, $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. In that case, consistency of these estimators as the total sample size $n$ tends to infinity,10 and of Monte Carlo averages to the actual expectations under the ELL model as the number of bootstrap replicates M tends to infinity, would ensure that $\hat{\tau}_c^{ELL}$ approximates correctly the "theoretical" ELL estimator, $E(\tau_c; \theta)$, which is unbiased by definition. However, there is an additional source of error due to the generation of welfare in each simulation from a different set of parameter values in step 3, instead of using the initial estimates from step 1.

Footnote 9: Note that under a model-based approach, the true value of the indicator $\tau_c$ is random. Under this setup, an estimator/predictor $\hat{\tau}_c$ of $\tau_c$ is said to be unbiased when $E(\hat{\tau}_c - \tau_c) = 0$ or, accordingly, when $E(\hat{\tau}_c) = E(\tau_c)$.
Footnote 10: Note that the total survey sample size $n$ is typically large.
Moreover, if our target indicator is the mean welfare in location $c$, $\tau_c = N_c^{-1}\sum_{h=1}^{N_c} y_{ch}$, and the welfare is not transformed with the log, then $E(\tau_c; \theta) = E(\bar{x}_c\beta + \eta_c + \bar{e}_c) = \bar{x}_c\beta$, where $\bar{x}_c = N_c^{-1}\sum_{h=1}^{N_c} x_{ch}$ is the area mean of the correlates and $\bar{e}_c = N_c^{-1}\sum_{h=1}^{N_c} e_{ch}$ is the area mean of the household-level errors. This means that the area effect $\eta_c$ vanishes in the final ELL estimator, which becomes basically the regression estimator $\bar{x}_c\beta$. This estimator is called "synthetic" in the SAE literature because it does not account for between-area heterogeneity (or idiosyncrasy of the areas) beyond that explained by the correlates. Note that, if the regression parameter $\beta$ were actually known (best case), $\bar{x}_c\beta$ would not be using the actual survey data at all, only the census auxiliary information. Synthetic estimators rely very strongly on the assumed regression model and might be misleading if the model is incorrectly specified. In fact, standard errors are obtained assuming that the model actually holds and do not account for model misspecification. Hence, under model failure, they might still be small, giving a lot of credibility to the (misleading) estimates. As will be shown, even if the model is correctly specified, the MSE of the ELL estimators can be substantial, especially if the prediction power of the model is weak.

3 Molina and Rao's EB and Census EB estimators

Molina and Rao (2010) consider the nested error model (1) as in the ELL procedure, but the location effects are originally defined for the areas of interest. A more recent paper by Guadarrama, Molina and Rao (2018) further extends the EB method of MR (2010) to complex sampling designs, but so far it has not been implemented in any software package as a library or command, unlike the traditional ELL (2002) with the subsequent updates by Van der Weide (2014). For simplicity, this section describes the original EB approach proposed by MR (2010).

The main difference with the ELL approach is that the EB method of MR (2010) conditions on the survey sample data and thus makes a more efficient use of the survey data, which contains the only available (and hence precious) information on actual welfare. Nevertheless, conditioning on the survey data requires areas and households across survey and census to be matched. Thus, if $P_c$ denotes the area's population of size $N_c$, then the survey sample, $s_c$, is a sample of size $n_c \leq N_c$ drawn from $P_c$. The complement of the sampled population is referred to as $r_c$, which is just $P_c - s_c$. Thus, the census welfare vector for any area $c$ is defined as $y_c = (y_{c,s}^T, y_{c,r}^T)^T$, which contains the welfare for sampled and non-sampled observations in area $c$, denoted $y_{c,s}$ and $y_{c,r}$ respectively. Typically, the non-sampled population (of size $N_c - n_c$) is much larger than the sampled population (of size $n_c$). It may also happen that $n_c = 0$ for some areas, meaning that the area is not sampled in the survey. The empirical best (EB) predictor of $\tau_c$ for an area $c$ that is sampled ($n_c > 0$) is defined as the conditional expectation $E(\tau_c | y_{c,s}; \hat{\theta})$. For an out-of-sample area ($n_c = 0$), the empirical best (EB) predictor of $\tau_c$ is $E(\tau_c; \hat{\theta})$, which is similar to the ELL estimator. Here, $\hat{\theta}$ is a consistent estimator of $\theta$ as the overall sample size $n$ tends to infinity. In contrast with this procedure, the traditional ELL approach does not require linking the census and survey observations.
When the expectation $E(\tau_c | y_{c,s}; \hat{\theta})$ has no explicit form, the empirical best (EB) predictor of $\tau_c$ can be approximated using a Monte Carlo (MC) simulation method. The MC approximation to the EB predictor $\hat{\tau}_c^{EB} = E(\tau_c | y_{c,s}; \hat{\theta})$, as described in Rao and Molina (2015), is obtained as follows:

1. Using the survey data, fit model (1) via any method providing consistent estimators. This yields the vector of parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. Usual fitting methods under this approach are maximum likelihood (ML) or restricted maximum likelihood (REML), both based on the normal likelihood, and the H3 method, which does not require specifying a distribution. In the implemented sae R package, REML is used.

2. Use the parameter estimates obtained in step 1 as true values to simulate a vector of welfare in the census. First, the welfare for each out-of-sample household is generated as

\[ \ln(y^*_{ch,r}) = x_{ch,r}\hat{\beta}_0 + \eta^*_c + e^*_{ch,r}, \]

where we add the subscript $r$ to indicate that the household is not sampled. Here, for an area $c$ that is included in the sample, the area effect $\eta^*_c$ is generated as

\[ \eta^*_c \sim N\left(\hat{\eta}_{c0},\, \hat{\sigma}^2_{\eta 0}(1 - \hat{\gamma}_c)\right), \]

where $\hat{\eta}_{c0}$ and $\hat{\gamma}_c$ are respectively given by

\[ \hat{\eta}_{c0} = \hat{\gamma}_c\left(\bar{y}_{c,s} - \bar{x}_{c,s}\hat{\beta}_0\right), \qquad \hat{\gamma}_c = \frac{\hat{\sigma}^2_{\eta 0}}{\hat{\sigma}^2_{\eta 0} + \hat{\sigma}^2_{e0}/n_c}. \]

Here, $\bar{y}_{c,s}$ and $\bar{x}_{c,s}$ are respectively the sample means of the welfare variable and of the correlates in area $c$. If area $c$ is not sampled, the area effect is generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$. Finally, the household error $e^*_{ch,r}$ is generated as $e^*_{ch,r} \sim N(0, \hat{\sigma}^2_{e0})$.
For an area $c$ that is sampled in the survey, the generated non-sample vector $y^*_{c,r}$ is then augmented with the survey data $y_{c,s}$. Therefore, the final vector for the whole census in area $c$ is $y^*_c = (y_{c,s}^T, y_{c,r}^{*T})^T$, which is made up of the generated out-of-sample welfares and the survey ones.

3. The previous step yields a census vector of welfare $y^*_c$, simulated using the fitted model parameters. This census vector is then used to calculate the indicator for each area $c = 1, \ldots, C$, as $\tau^*_c = f(y^*_c)$, where $f(\cdot)$ can be any indicator function such as the FGT poverty indicators.

4. Repeat steps 2 and 3 a large number of times M. If $y_c^{*(m)}$ denotes the $m$th replicate of the census vector $y^*_c = (y_{c,s}^T, y_{c,r}^{*T})^T$, then $\tau_c^{*(m)} = f(y_c^{*(m)})$ is the corresponding indicator, $m = 1, \ldots, M$. The final MC approximation to the EB estimator of $\tau_c$ is just the average across the M simulations of these indicators,

\[ \hat{\tau}_c^{EB} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}. \]

Molina (2019) comments that the effect of adding the survey data is negligible when the sample is small relative to the census population of the area. Thus, a slight variation of the EB estimator, called Census EB (see e.g. Molina, 2019), avoids appending the survey data in step 2 by generating all the census welfares (including the sampled ones) in the same way as the non-sampled ones are generated. Thus, the welfare for each household $h$ in the census is generated from the model

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \]

with $\eta^*_c$ and $e^*_{ch}$ obtained exactly as in step 2. The result is the full census vector for area $c$ given by $y^*_c = (y^*_{c1}, \ldots, y^*_{c,N_c})^T$, which is used to compute the target indicator $\tau^*_c = f(y^*_c)$. Unfortunately, the Census EB estimator is not implemented in the sae R package.
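The key difference from the ELL generation step is that, for a sampled area, $\eta^*_c$ is drawn from its conditional distribution given the survey data. The sketch below illustrates this for a single sampled area and an FGT indicator, under homoscedasticity and without survey weights; it is a simplified illustration rather than the sae R implementation, and all function and argument names (`census_eb_area`, `ybar_s`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def census_eb_area(X_cen_c, ybar_s, xbar_s, n_c, beta0, sig2_eta, sig2_e,
                   z, p_cen_c, M=50, alpha=0):
    """MC approximation of the Census EB estimator of an FGT indicator for a
    sampled area c (homoscedastic case, no survey weights).

    ybar_s, xbar_s: sample means of the model response (here, log welfare)
    and of the correlates in area c; beta0, sig2_eta, sig2_e: survey estimates."""
    gamma_c = sig2_eta / (sig2_eta + sig2_e / n_c)
    eta_hat = gamma_c * (ybar_s - xbar_s @ beta0)  # conditional mean of eta_c
    cond_sd = np.sqrt(sig2_eta * (1.0 - gamma_c))  # conditional sd of eta_c
    sims = []
    for _ in range(M):
        eta_star = rng.normal(eta_hat, cond_sd)
        e_star = rng.normal(0.0, np.sqrt(sig2_e), size=X_cen_c.shape[0])
        y_star = np.exp(X_cen_c @ beta0 + eta_star + e_star)
        poor = y_star < z
        sims.append(np.sum(p_cen_c * poor * (1 - y_star / z) ** alpha)
                    / np.sum(p_cen_c))
    return float(np.mean(sims))
```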
One possible approach for applying this software when survey units cannot be identified in the census would be to append the survey data to the available census data, which would lead to a different approximation of the EB estimator. A drawback of this approach is that, in this case, the total size of the area becomes $N_c + n_c \geq N_c$.

The EB estimator $\hat{\tau}_c^{EB}$ approximates the so-called "best" predictor of $\tau_c$, which is the optimal predictor in the sense of minimizing the MSE under the model. The MSE of an estimator $\hat{\tau}_c$ is defined as the expectation of the squared prediction error under the considered model, $\mathrm{MSE}(\hat{\tau}_c) = E(\hat{\tau}_c - \tau_c)^2$. The best predictor is given by the conditional expectation $\hat{\tau}_c^B = E(\tau_c | y_{c,s}; \theta)$, where the expectation is taken with respect to the conditional distribution of the non-sampled data $y_{c,r}$ given the sample data $y_{c,s}$, determined by the nested error model. Under normality of the random terms in the model, this conditional distribution is also normal. In step 2 of the above procedure, welfare values for non-sampled households are generated from that conditional distribution, but with estimated model parameters. Hence, the above MC simulation procedure actually approximates the EB estimator $\hat{\tau}_c^{EB} = E(\tau_c | y_{c,s}; \hat{\theta})$, which is the best predictor with the model parameters replaced by the corresponding estimates based on the original sample survey data, $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. Note that, unlike in the ELL bootstrap procedure, here these estimates are kept constant across simulations because there is a unique DGP. Hence, this simulation procedure delivers a simple MC approximation of the EB predictor $\hat{\tau}_c^{EB} = E(\tau_c | y_{c,s}; \hat{\theta})$ of $\tau_c$, where the expectation is replaced by the corresponding MC average.

If the target parameter is just the area mean $\tau_c = N_c^{-1}\sum_{h=1}^{N_c} y_{ch}$, and there is no log-transformation of welfare, then the best predictor reduces to

\[ E(\tau_c | y_{c,s}; \theta) = \bar{x}_c\beta + \gamma_c\left(\bar{y}_{c,s} - \bar{x}_{c,s}\beta\right), \]

which corrects the regression synthetic estimator $\bar{x}_c\beta$ by accounting for the area effect using the survey data. Moreover, the size of the correction depends on $\gamma_c$, which measures the share of the between-area heterogeneity $\sigma^2_\eta$ in the total variance for the area, $\sigma^2_\eta + \sigma^2_e/n_c$. Thus, the correction is stronger in the case of highly heterogeneous areas and it is weak otherwise, becoming null only when all the existing area heterogeneity is completely explained by the available auxiliary variables. In that case, the EB estimator reduces to the "theoretical" ELL estimator $E(\tau_c; \theta) = \bar{x}_c\beta$. Hence, the EB procedure makes more efficient use of the survey data for areas that are sampled and, for those areas, it does not rely so strongly on the assumed linear regression (which could be misspecified).

Concerning error measures of the estimators, the ELL and EB approaches consider different quantities. In the EB approach of MR (2010), the noise is measured by the actual MSE of the EB estimator under the nested error model, $\mathrm{MSE}(\hat{\tau}_c^{EB}) = E(\hat{\tau}_c^{EB} - \tau_c)^2$. Under the model-based framework, the target indicators $\tau_c$ are regarded as random quantities (they are generated from the model) and hence incorporate uncertainty as well.
Even if an estimator (or predictor) $\hat{\tau}_c$ is unbiased with respect to the model, in the sense that the expectation of its prediction error is zero, $E(\hat{\tau}_c - \tau_c) = 0$, there still remains the variance of the prediction error, $\mathrm{var}(\hat{\tau}_c - \tau_c)$, which is not exactly equal to the variance of the predictor, $\mathrm{var}(\hat{\tau}_c)$, because of the randomness of the target indicator $\tau_c$.

The MSE of the EB estimator is estimated using a parametric bootstrap procedure for finite populations introduced by González-Manteiga et al. (2008), which is computationally more intensive than the ELL bootstrap approach. This is because, in every bootstrap replicate, all the steps for obtaining the EB estimator are reproduced, including model fitting and the MC procedure for approximation of the conditional expectation defining the EB estimator. This means that the number of simulated census vectors is the product of the number of MC simulations and the number of bootstrap replicates, except when the target indicator has an explicit EB estimator (such as the poverty rate). In this latter case, the MC simulation procedure can be replaced by an explicit formula calculated without simulation, and then the number of simulated censuses will be only the number of bootstrap replicates. The steps for the parametric bootstrap estimation of the MSE are as follows (MR, 2015):

1. Using the survey data, fit model (1) using a method providing consistent estimators. This yields the set of parameter estimates from the observed sample: $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. In the implemented sae R package, the REML method is used.

2. Use the estimates in $\hat{\theta}_0$ to simulate census welfares11 from the fitted model as follows:

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \]

where the area effects are generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$ and the household-specific errors are generated as $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e0})$.

3. From the simulated census vector $y^*_c$, calculate the true value of the indicator of interest for each area $c$: $\tau^*_c = f(y^*_c)$, where $f(y^*_c)$ can be any indicator function such as the FGT indicators.

4. Since the survey is regarded as a subset of the census, the survey sample $s_c$ is now extracted and, using the corresponding sample welfares $y^*_{c,s}$ (newly generated in step 2), model (1) is fit again. This yields bootstrap estimates of the model parameters, $\hat{\theta}^* = (\hat{\beta}^*, \hat{\sigma}^{2*}_{\eta}, \hat{\sigma}^{2*}_{e})$.

5. With the bootstrap vectors of sample welfare $y^*_{c,s}$, $c = 1, \ldots, C$, obtain the EB estimators $\hat{\tau}_c^{EB*}$, using MC simulation if needed. The estimation procedure is exactly the same EB procedure based on the original sample, but using the bootstrap sample data $y^*_{c,s}$, $c = 1, \ldots, C$, and the corresponding bootstrap model parameter estimates $\hat{\theta}^*$ from step 4.

6. Repeat steps 2 to 5 a sufficiently large number of times B. In each bootstrap replicate $b$, $\tau_c^{*(b)}$ is the true value of the indicator obtained from the $b$th simulated census and $\hat{\tau}_c^{EB*(b)}$ is the corresponding EB estimator obtained from the extracted sample. The parametric bootstrap approximation of the MSE is then given by:

\[ \mathrm{mse}_B\left(\hat{\tau}_c^{EB}\right) = \frac{1}{B}\sum_{b=1}^{B}\left(\hat{\tau}_c^{EB*(b)} - \tau_c^{*(b)}\right)^2. \]

Footnote 11: Note that here welfares are generated for all households in the census, sampled and non-sampled.
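The structure of this parametric bootstrap can be summarized in a few lines. The sketch below is schematic: the callables are placeholders for the corresponding steps (census generation from $\hat{\theta}_0$, sample extraction, model refitting, EB estimation, and the indicator of interest), and none of the names correspond to an existing package.

```python
import numpy as np

def parametric_bootstrap_mse(gen_census, extract_sample, fit_model,
                             eb_estimate, indicator, theta0, B=200):
    """Parametric bootstrap MSE for EB-type estimators (steps 2-6 above).

    gen_census(theta)          -> dict {area: census welfare vector}
    extract_sample(census)     -> survey-style sample drawn from that census
    fit_model(sample)          -> bootstrap parameter estimates theta*
    eb_estimate(sample, theta) -> dict {area: EB estimate of the indicator}
    indicator(y)               -> true value of the indicator, e.g. an FGT."""
    sq_err = {}
    for _ in range(B):
        census_b = gen_census(theta0)                             # step 2
        true_b = {c: indicator(y) for c, y in census_b.items()}   # step 3
        sample_b = extract_sample(census_b)                       # step 4
        theta_b = fit_model(sample_b)                             # step 4
        est_b = eb_estimate(sample_b, theta_b)                    # step 5
        for c, t in true_b.items():                               # step 6
            sq_err.setdefault(c, []).append((est_b[c] - t) ** 2)
    return {c: float(np.mean(v)) for c, v in sq_err.items()}
```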
4 Van der Weide's update to PovMap software

The addition of EB prediction developed by Van der Weide (2014) represented a landmark update to the PovMap project of the World Bank and its poverty mapping agenda. This section will highlight that this extension is not exactly the same as the EB estimator of MR (2010). In fact, it resembles the traditional ELL approach in the manner in which point estimates and standard errors are computed, according to the implementation in PovMap as well as in Stata's sae package. EB prediction is implemented by generating censuses using the predicted location effects instead of generating them from their theoretical distribution.

Let $e_{c,s}$ be the vector with the marginal residuals $e_{ch} = \ln(y_{ch}) - x_{ch}\beta$ for the households in the survey. The predictor of the location effect $\eta_c$ proposed by Van der Weide (2014) is an extension of the best predictor $E[\eta_c | e_{c,s}]$ for the heteroscedastic nested error model with normality, incorporating also the survey weights to account for complex sampling designs. This predictor is given by

\[ \hat{\eta}_c = \hat{\gamma}_c\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,\hat{e}_{ch}, \tag{2} \]

where $w_{ch}$ is the survey weight for household $h$ in location $c$, $\hat{\sigma}^2_{e,ch}$ is the estimated household-specific error variance using the alpha model as in ELL (2002),

\[ \hat{\gamma}_c = \frac{\hat{\sigma}^2_\eta}{\hat{\sigma}^2_\eta + \left(\sum_h w_{ch}\right)^{-1}\sum_h \frac{w_{ch}}{\sum_h w_{ch}}\,\hat{\sigma}^2_{e,ch}}, \]

and $\hat{\sigma}^2_\eta$ is an estimator of $\sigma^2_\eta$. For this, Van der Weide (2014) considers the extended version of Henderson's method III (Henderson, 1953) obtained by accounting for survey weights as proposed in Huang and Hidiroglou (2003). Finally, $\hat{e}_{ch} = \ln(y_{ch}) - x_{ch}\hat{\beta}$ is the estimated marginal residual from the GLS estimator $\hat{\beta}$ that accounts for heteroscedasticity and survey weights, given by

\[ \hat{\beta} = \left[\sum_c\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,x_{ch}^T x_{ch} - \gamma_c\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,\bar{x}_{c,w}^T\bar{x}_{c,w}\right)\right]^{-1}\sum_c\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,x_{ch}^T y_{ch} - \gamma_c\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,\bar{x}_{c,w}^T\bar{y}_{c,w}\right), \]

where all the sums are over the survey data, and $\bar{y}_{c,w}$ and $\bar{x}_{c,w}$ are the weighted sample means,

\[ \bar{y}_{c,w} = \left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,y_{ch}, \qquad \bar{x}_{c,w} = \left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\,x_{ch}. \]

Van der Weide (2014) also proposed the following estimator of the variance of $\hat{\eta}_c$:

\[ \mathrm{var}[\hat{\eta}_c] = \hat{\sigma}^2_\eta - \hat{\gamma}^2_c\left[\hat{\sigma}^2_\eta + \sum_h\left(\frac{w_{ch}}{\hat{\sigma}^2_{e,ch}}\right)^2\hat{\sigma}^2_{e,ch}\right]. \tag{3} \]

Under homoscedasticity, the predicted location effects $\hat{\eta}_c$ are equal to those of the Pseudo EB in Guadarrama et al. (2018). They reduce to the estimated area effects $\hat{\eta}_c$ obtained from MR (2010) if, additionally, no survey weights are considered ($w_{ch} = 1$ for all $h$ and $c$), as noted in Van der Weide (2014).

As mentioned before, for sampled clusters, EB prediction makes use of the estimated cluster effects from the survey data to improve the point estimates of the poverty indicators and their errors. A crucial assumption under EB prediction is that the model errors $e_{ch}$ and $\eta_c$ are normally distributed. This yields normality for $\eta_c$ given $e_{c,s}$, which leads to the proposed predicted cluster effect $\hat{\eta}_c$.

The EB addition implemented in the PovMap software uses an extended version of Henderson's method III (Henderson, 1953) to obtain estimators $\hat{\sigma}^2_\eta$ and $\hat{\sigma}^2_e$ of the variance components $\sigma^2_\eta$ and $\sigma^2_e$ that incorporate the sampling weights.12 In step 2 of Section 2 for the ELL approach, $\sigma^{2*}_\eta$ is drawn from a Gamma distribution; however, to avoid this assumption, and because $\mathrm{var}[\hat{\eta}_c]$ is unknown,13 it is necessary to rely on bootstrap re-sampling to obtain the noise of the point estimates, in a similar fashion to ELL.14 Even if the estimator is based on the EB method of MR (2010), the bootstrap procedure used to generate the census data is completely different. This new bootstrap algorithm tries to avoid the distributional assumptions by generating, in the first step, samples of clusters from the survey data set.

Footnote 12: Interested readers should refer to Van der Weide (2014) and/or Nguyen et al. (2018) for an in-depth look at how these are obtained under ELL and Henderson's method III.
Footnote 13: In the traditional ELL method of Section 2, $\hat{\sigma}^2_\eta$ is assumed to follow a Gamma distribution, which does not hold in this case.
Footnote 14: Actually, Van der Weide (2014) does not offer a method for the estimation of the standard errors, but the approach described below is the one implemented in PovMap and consequently also in the Stata sae package from Nguyen et al. (2018).

The full procedure for obtaining the estimator of $\tau_c$, called here clustered bootstrap EB (CB-EB), is as follows:
1. Take a bootstrap sample of clusters (PSUs) with replacement from the survey data.15

2. Fit model (1) to the bootstrapped survey data from step 1. This yields the set of parameter estimates from the bootstrap sample, $(\hat{\beta}^*, \hat{\sigma}^{2*}_\eta, \hat{\sigma}^{2*}_e)$, along with predicted effects for the clusters that were sampled in step 1, $\hat{\eta}^*_c$, and their corresponding estimated variance $\mathrm{var}^*[\hat{\eta}^*_c]$. In the implemented version, the model is fit by FGLS as specified in Nguyen et al. (2018).

3. Use the model parameter estimates from step 2 to simulate welfares $y^*_{ch}$ in the census as16

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}^* + \eta^*_c + e^*_{ch}, \]

where, for a cluster $c$ that is sampled in step 1, the cluster effect is generated as

\[ \eta^*_c \sim N\left(\hat{\eta}^*_c,\, \mathrm{var}^*[\hat{\eta}^*_c]\right), \]

where $\hat{\eta}^*_c$ and $\mathrm{var}^*[\hat{\eta}^*_c]$ are the bootstrap versions of $\hat{\eta}_c$ and its corresponding variance, given in (2) and (3) respectively. For a cluster that is not sampled in step 1, generate $\eta^*_c \sim N(0, \hat{\sigma}^{2*}_\eta)$. Under homoscedasticity, the household-specific errors are generated as $e^*_{ch} \sim N(0, \hat{\sigma}^{2*}_e)$ and, in the case of heteroscedasticity, as $e^*_{ch} \sim N(0, \hat{\sigma}^{2*}_{e,ch})$. Construct the vector $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$ of simulated welfare for every household within location $c$ of size $N_c$, for all the locations $c = 1, \ldots, C$ in the census.

4. With the simulated welfare vector $y^*_c$, calculate the indicator of interest as $\tau^*_c = f(y^*_c)$, such as the FGT indicator of order $\alpha \geq 0$, which is calculated as

\[ f_\alpha(y^*_c) = \sum_{h=1}^{N_c} \frac{p_{ch}}{p_c}\, I(y^*_{ch} < z)\left(1 - \frac{y^*_{ch}}{z}\right)^{\alpha}. \]

5. Repeat steps 1 to 4 M times. Although traditionally M = 100, a larger number of replicates is recommended. Let $\tau_c^{*(m)}$ be the indicator obtained in the $m$th replicate, $m = 1, \ldots, M$. The CB-EB estimator $\hat{\tau}_c^{CB\text{-}EB}$ is then obtained as the average across the M bootstrap replicates,

\[ \hat{\tau}_c^{CB\text{-}EB} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}, \]

and the variance of the estimator is approximated as

\[ \mathrm{var}_{CB\text{-}EB}\left(\hat{\tau}_c^{CB\text{-}EB}\right) = \frac{1}{M-1}\sum_{m=1}^{M}\left(\tau_c^{*(m)} - \hat{\tau}_c^{CB\text{-}EB}\right)^2. \]

Footnote 15: Note that not all clusters are expected to appear in a given bootstrap sample.
Footnote 16: Note that, in each bootstrap replicate, the census is generated from a different model.

Unlike the traditional ELL approach, this procedure requires matching the location codes in the survey and the census. However, in the application of the traditional ELL approach, the authors recommended the inclusion of contextual variables at the location level, and thus linking the two data sources was still necessary. As in the previous computational procedures, the final estimator of the poverty indicator for a location is obtained as an average across bootstrap replicates. By the Monte Carlo principle, this average approximates an expectation. However, here it is not clear with respect to which DGP the expectation is taken, since the bootstrap procedure involves several sources of variation: In step 1, the sample of clusters varies, and this would define an expectation with respect to the sampling design, but considering only the first stage of the design.
In step 2, model parameters are newly generated in each simulation. Finally, in step 3, location effects are generated from their conditional distribution given the sample residuals (determined by the nested error model under normality), and household-specific errors are generated from their distribution under the assumed nested error model with normality.

Even if the bootstrap procedure is not the same, standard errors are obtained similarly as in the traditional ELL approach and are thus different from MSEs. Note that the traditional ELL approach was initially based on the multiple imputation literature. According to Rubin (1996, and as noted in StataCorp, 2019), the objective of multiple imputation is not to re-create the missing data as closely as possible, but to handle them in a manner that allows for statistical inference.

5 Comparison of CB-EB, traditional ELL and EB methods

Because EB prediction uses the survey data to estimate the location effect, its benefits are evident only in locations present both in the census and the survey. In the traditional ELL approach, the location effect was originally at the cluster level and, in most surveys from developing countries where the World Bank focuses its efforts, the sampled clusters represent a small percentage of all the clusters in the country. This may also be the case for locations above clusters. For example, in a recently completed exercise in Moldova, the survey only contained 129 comunas out of a total of 901. Consequently, only a small share of these comunas benefit from EB prediction (Corral and Cojocaru, 2019). Nevertheless, differences also arise due to the computational procedures used to obtain the point estimates and their corresponding measures of noise.

MR (2010) presented evidence that their EB approach is superior to ELL in terms of a smaller MSE. This finding has met some criticism, mostly because the simulation experiment in MR (2010) was conducted under scenarios where ELL is seldom applied. First, the population size in MR (2010) was 20,000, uniformly spread across 80 areas. Second, all the areas were sampled and the sample within each area represented 20 percent of the population, something hardly seen in real-world applications. ELL (2002) illustrates how the noise of the estimator falls as the size of the population increases, and advises against estimating for populations below 100 households. In practice, the level of disaggregation of ELL estimates depends on the model quality, and even with an $R^2$ not smaller than 0.6, it is seldom advised to go below 1,000 households (Elbers et al., 2007). Another point of departure is that the model considered in the simulation experiments of MR (2010) was really poor, yielding an adjusted $R^2$ of less than 0.01. In the usual applications of the ELL approach, the number of auxiliary variables is much greater, most of the time reaching an explanatory power, as measured by adjusted $R^2$, of over 0.5. Moreover, to improve precision, ELL (2002; 2003) also advocated for the use of contextual area-level variables17 on the right-hand side, since these explain (and thus minimize) the variation across locations. These were not used in the simulation experiment conducted in MR (2010). In such a setup, the much noisier ELL estimates obtained in their simulation experiments are not surprising. A final important difference is the fact that the error measures of the estimators and the bootstrap procedures for obtaining them are considerably different in the three procedures.

Footnote 17: These may come from the census or administrative records (ELL, 2003, p. 356).
Currently, the error measures of the traditional ELL and of the more recent CB-EB procedure of Van der Weide (2014) are obtained under a very similar framework to that of multiple imputation. As an error measure, these consider the standard deviation of all the simulated point estimates of the indicators in the M bootstrap replicates, and so every simulated indicator is compared to the average of the simulated indicators. Moreover, in the bootstrap approaches used for that average of the simulated indicators, the initial model parameter estimates $\hat{\beta}_0$, $\hat{\sigma}^2_{\eta 0}$ and $\hat{\sigma}^2_{e0}$ obtained using the original survey sample are not those used to generate the welfare vectors in the population. In fact, in each bootstrap replicate, model parameters $\beta$, $\sigma^2_\eta$, $\sigma^2_e$ are simulated from the estimated distribution of the initial estimators (or an approximation to it). In contrast, in MR's (2010) parametric bootstrap procedure, a simple MC simulation procedure is used to approximate the actual MSE under the nested error model with the average, across bootstrap simulations, of the squared prediction errors. In this bootstrap procedure, the parameter estimates used to generate the bootstrap censuses of welfare are kept fixed to the original survey estimates. Consistency of these initial survey estimators to their corresponding true values as the total sample size $n$ tends to infinity ensures that the bootstrap MSE approximates the true MSE (under the model with the true values of the parameters).

6 Extended Census EB estimators

The EB estimator in MR (2010) requires linking the survey and census households, which may not be possible in real applications. In fact, the survey sample is likely not a subset of the available census. Hence, here we consider the Census EB (CEB) estimator, which avoids this step. In fact, in most cases, the number of sample households for a given area is much smaller than the number of census households and, in such cases, the CEB estimator is expected to perform very much like the original EB.

CEB small area estimators are obtained with a similar MC simulation approach as the one used by MR (2010) and described in Section 3, but simulating the vectors of welfare for all the census households (instead of the non-sampled ones only) and hence not appending the welfare for the survey units. The CEB is extended by accounting for the sampling design and heteroscedasticity, similarly as proposed by Van der Weide (2014). This procedure is implemented by also incorporating household expansion factors (household sizes) taken from the census. The proposed MC simulation procedure for the approximation of the extended CEB estimator is:

1. Fit model (1) to the survey data. This yields the set of parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. The currently implemented procedure fits the model either via FGLS, decomposing residuals as in the traditional ELL procedure, or using the H3 method.

2. Use the model parameter estimates obtained in step 1 to simulate a vector of welfare for the $N_c$ census households in area $c$, $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$, where each $y^*_{ch}$ is obtained as

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \]

where, if area $c$ is in the sample, its location effect is generated as

\[ \eta^*_c \sim N\left(\hat{\eta}_{c0},\, \mathrm{var}[\hat{\eta}_{c0}]\right), \]

with $\hat{\eta}_{c0}$ given by

\[ \hat{\eta}_{c0} = \hat{\gamma}_{c0}\left(\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch0}}\right)^{-1}\sum_h \frac{w_{ch}}{\hat{\sigma}^2_{e,ch0}}\,\hat{e}_{ch0}, \]

for $\hat{e}_{ch0} = \ln(y_{ch}) - x_{ch}\hat{\beta}_0$, and with
\[ \hat{\gamma}_{c0} = \frac{\hat{\sigma}^2_{\eta 0}}{\hat{\sigma}^2_{\eta 0} + \left(\sum_h w_{ch}\right)^{-1}\sum_h \frac{w_{ch}}{\sum_h w_{ch}}\,\hat{\sigma}^2_{e,ch0}} \]

and

\[ \mathrm{var}[\hat{\eta}_{c0}] = \hat{\sigma}^2_{\eta 0} - \hat{\gamma}^2_{c0}\left[\hat{\sigma}^2_{\eta 0} + \sum_h\left(\frac{w_{ch}}{\hat{\sigma}^2_{e,ch0}}\right)^2\hat{\sigma}^2_{e,ch0}\right]. \]

If area $c$ is not included in the sample, then its effect is generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$. In the absence of heteroscedasticity, the household-specific residuals come from $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e0})$ and, in the case of heteroscedasticity, from $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e,ch0})$.

3. The previous step yields a simulated vector of census welfare for each area, $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$, which makes use of the fitted model parameters. With the census $y^*_c$, the indicator in area $c$ is calculated as $\tau^*_c = f(y^*_c)$, where, for the FGT indicator of order $\alpha \geq 0$, we have

\[ f_\alpha(y^*_c) = \sum_{h=1}^{N_c} \frac{p_{ch}}{p_c}\, I(y^*_{ch} < z)\left(1 - \frac{y^*_{ch}}{z}\right)^{\alpha}. \]

4. Repeat steps 2 and 3 a large number of times M. Let $\tau_c^{*(m)}$ be the indicator obtained in the $m$th replicate, for $m = 1, \ldots, M$. The extended Census EB estimate is just the average of the M indicators,

\[ \hat{\tau}_c^{CEB} = \frac{1}{M}\sum_{m=1}^{M}\tau_c^{*(m)}. \]

In contrast with the traditional ELL and CB-EB approaches, in the above MC procedure the model parameter estimates $\hat{\beta}_0$, $\hat{\sigma}^2_{\eta 0}$ and $\hat{\sigma}^2_{e0}$, as well as $\hat{\eta}_{c0}$ and $\mathrm{var}[\hat{\eta}_{c0}]$, are kept fixed across the whole procedure; that is, there is a single DGP.

As a measure of noise of the extended CEB estimators, here the actual MSE under the nested error model is considered. To estimate this MSE, we propose to apply a parametric bootstrap procedure similar to the one in MR (2010), but adapted to the case where the survey is not necessarily a subset of the census. This procedure is also described in Molina (2019) for the CEB predictor, but here we apply it to the extended CEB predictor that includes heteroscedasticity and sampling weights:

1. Fit model (1) to the survey data. This yields the set of parameter estimates $\hat{\theta}_0 = (\hat{\beta}_0, \hat{\sigma}^2_{\eta 0}, \hat{\sigma}^2_{e0})$. In the implemented version, the model is fit either via FGLS, decomposing residuals as in the traditional ELL fitting approach, or by H3.

2. Using the estimates obtained in step 1 as true values of the model parameters, simulate a vector of census welfare $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$ as follows:

\[ \ln(y^*_{ch}) = x_{ch}\hat{\beta}_0 + \eta^*_c + e^*_{ch}, \quad h = 1, \ldots, N_c, \]

where the location effect is generated as $\eta^*_c \sim N(0, \hat{\sigma}^2_{\eta 0})$. In the homoscedastic case, household-specific errors are generated as $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e0})$ and, in the case of heteroscedasticity, as $e^*_{ch} \sim N(0, \hat{\sigma}^2_{e,ch0})$.

3. With the simulated census of welfare $y^*_c = (y^*_{c1}, \ldots, y^*_{cN_c})^T$ for each area $c = 1, \ldots, C$, produce the indicators of interest, $\tau^*_c = f(y^*_c)$, where $f(y^*_c)$ can be any indicator function, such as one of the FGT poverty indicators.

4. Use the model parameter estimates obtained in step 1 to obtain new sample welfares, $y^*_{ch,s}$, for every location $c$ as follows:

\[ \ln(y^*_{ch,s}) = x_{ch,s}\hat{\beta}_0 + \eta^*_c + e^*_{ch,s}. \]

(a) The estimates $\hat{\beta}_0$ come from step 1.
(b) $\eta^*_c$ is the same one generated in step 2. Specifically, all locations present in the survey are matched to the census locations, and the same value of $\eta^*_c$ that was simulated for a location in step 2 is applied to the survey households within that location.
(c) The household-specific errors are simulated as $e^*_{ch,s} \sim N(0, \hat{\sigma}^2_{e0})$ in the homoscedastic case. When there is heteroscedasticity, they are simulated as $e^*_{ch,s} \sim N(0, \hat{\sigma}^2_{e,ch0})$.
Unlike the original parametric bootstrap procedure of MR (2010) described in Section 3, here the sample errors $e^*_{ch,s}$ are not a subset of the census errors simulated in step 2, although they are generated from exactly the same model (with the same variance).

5. Fit model (1) to the newly simulated survey data from step 4. This yields a bootstrap vector of estimated model parameters: $\hat{\theta}^* = (\hat{\beta}^*, \hat{\sigma}^{2*}_\eta, \hat{\sigma}^{2*}_e)$.

6. Obtain the bootstrap extended CEB estimator $\hat{\tau}_c^{CEB*(b)}$ through MC simulation using the bootstrap parameter estimates $\hat{\theta}^*$ from step 5, by the MC approach described above in this section.

7. Repeat steps 2 to 6 a sufficiently large number of times, B. Let $\tau_c^{*(b)}$ be the true value and $\hat{\tau}_c^{CEB*(b)}$ be the CEB estimator obtained in the $b$th replicate of the bootstrap procedure. A parametric bootstrap estimator of the MSE is given by:

\[ \mathrm{mse}_B\left(\hat{\tau}_c^{CEB}\right) = \frac{1}{B}\sum_{b=1}^{B}\left(\hat{\tau}_c^{CEB*(b)} - \tau_c^{*(b)}\right)^2. \]

7 Simulation experiments for comparison of methods

7.1 Simulation experiment with poor model

Here we compare the procedures described in the previous sections for the estimation of poverty rates and gaps (FGT indicators with $\alpha = 0, 1$, respectively) and their corresponding error measures. To this aim, we perform simulation experiments under the same setup as in MR (2010), but extended to consider also more realistic scenarios.

Thus, as in MR (2010), a census data set of N = 20,000 observations is created, where all observations are uniformly spread among C = 80 areas, labeled from 1 to 80. This means that every area consists of $N_c = 250$ observations. Location effects are simulated as $\eta_c \overset{iid}{\sim} N(0, 0.15^2)$; note that every observation within a given area will have the same simulated effect. Then, values of two right-hand side binary variables are simulated. The first one, $x_1$, takes the value 1 if a generated random uniform value between 0 and 1 is less than or equal to $0.3 + 0.5\frac{c}{80}$. This means that observations in areas with a higher label are more likely to get the value 1. The next one, $x_2$, is not tied to the area's label. This variable takes the value 1 if a simulated random uniform value between 0 and 1 is less than or equal to 0.2. The census welfare vectors $y_c = (y_{c,1}, \ldots, y_{c,N_c})^T$ for each area $c = 1, \ldots, C$ are then created as follows:

\[ \ln(y_{ch}) = 3 + 0.03\,x_{1,ch} - 0.04\,x_{2,ch} + \eta_c + e_{ch}, \]

where household-level errors are generated under the homoscedastic setup as $e_{ch} \overset{iid}{\sim} N(0, 0.5^2)$. The poverty line is fixed at $z = 12$, which is roughly 60 percent of the median welfare of a generated population.

From the created "census", in each of the areas, 20 percent of the observations are sampled using simple random sampling without replacement;18 this yields our "survey" data. Thus, in this experiment, survey weights are all equal. This generation process is repeated L = 10,000 times. In each simulation, the following quantities are computed for the poverty rates and gaps in each area:

1. True poverty indicators $\tau_c$, using the "census".

2. Direct estimators $\hat{\tau}_c^{DIR}$ using the "survey", defined as the sample versions of $\tau_c$.

3. Original EB estimators $\hat{\tau}_c^{EB}$ by MR (2010) with REML fitting as implemented in the sae R package, with M = 50 MC replicates.

4. Traditional ELL estimators $\hat{\tau}_c^{ELL}$ and their estimated standard errors, without location means and also including the location means of the considered auxiliary variables, where M = 50.
In each simulation, the following quantities are computed for the poverty rates and gaps in each area:

1. True poverty indicators τ_c, using the "census".

2. Direct estimators τ̂_c^DIR using the "survey", defined as the sample versions of τ_c.

3. Original EB estimators τ̂_c^EB of MR (2010) with REML fitting as implemented in the sae R package, with M = 50 MC replicates.

4. Traditional ELL estimators τ̂_c^ELL and their estimated standard errors, without location means and also including the location means of the considered auxiliary variables, where M = 50.

5. Census EB (CEB) estimators τ̂_c^CEB, obtained as in the MC procedure of MR (2010), but generating full census vectors from the conditional distribution and without appending the survey welfares, using REML fitting and with M = 50.

6. Extended CEB estimators using the MC and parametric bootstrap methods described in Section 6, using either H3 or ELL fitting methods with M = 50. Since in this simulation experiment the model errors are homoscedastic and sampling weights are constant, the extended CEB estimators differ from the CEB estimators above only in the model fitting method.

7. Estimators based on the CB-EB approach from Section 4, τ̂_c^{CB-EB}, using either H3 or ELL fitting methods.19 Since clusters are originally supposed to be nested within the areas and the clustered bootstrap method is supposed to sample clusters (PSUs) instead of areas in step 1, we computed additional CB-EB estimators, where every area is divided into 25 PSUs (with 10 households each) and 5 PSUs are selected within every area (instead of selecting areas). Under this scenario M = 50 as well.

Model bias and MSE are approximated empirically as in MR (2010), as the averages across the L = 10,000 simulations of the prediction errors τ̂_c^j − τ_c and of the squared prediction errors, respectively, where j stands for one of the methods: DIR, ELL, CEB or CB-EB;20 a brief computational sketch of this approximation follows the single-population comparison below.

Before getting into the results of the full simulation experiment, the methods are initially compared using just a single generated population along with its corresponding sample. For this one population, the different methods are run by setting M = 1,000 to get point estimates and B = 1,000 for the MSE.21 For expediency, results that do not conform to the main message of the paper are relegated to the appendix. However, from the figures in the appendix, we can draw the following conclusions: 1) The differences between the EB and Census EB point estimates are negligible even if the area sampling fractions f_c = n_c/N_c are not so small in this experiment (20 percent), see Figure 19. Although differences appear in their estimated MSEs (see Figure 20), these differences become closer to zero as the sampling fractions f_c decrease. 2) Census EB estimates using the H3 method as implemented in the sae Stata package are practically identical to the Census EB ones with REML fitting (Figure 21). Similarly, when comparing the Census EB estimators obtained with the proposed MC procedure using either H3 or ELL estimation methods, we can see that they are also aligned (Figure 22). This indicates that the actual method used to fit the model is not an important factor. In fact, in real applications the overall survey sample is typically very large, so any fitting method, as long as it provides consistent estimators, is expected to deliver estimates that will be very close to the corresponding true values.

18 The same units are sampled in every simulation and the values of x1 and x2 for all the census units are also kept fixed; this means that the x1 and x2 values for the sample units are always the same across simulations.
19 For a detailed description of the difference between the methods, readers should refer to Nguyen et al. (2018).
20 That is, E[τ̂_c^j − τ_c] for the bias and E[(τ̂_c^j − τ_c)²] for the MSE.
21 For ELL from Section 2 and H3-CBEB from Section 4, M = 1,000, since the process relies on a single computational algorithm.
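As referenced above, the empirical bias and MSE are simply per-area averages of the prediction errors and squared prediction errors over the L simulations. The toy arrays in the following sketch stand in for the estimates and true values that, in the paper's setting, would come from running the estimators on each simulated population.

```python
import numpy as np

def empirical_bias_mse(tau_hat, tau_true):
    """Per-area empirical bias and MSE from (L x C) arrays of estimated and true
    indicators, i.e. averages of the prediction errors and squared errors."""
    err = tau_hat - tau_true
    return err.mean(axis=0), (err ** 2).mean(axis=0)

# toy stand-in: 100 "simulations" over 80 areas; real values would come from the
# simulation loop (draw a census, compute true FGTs, fit the model, run Census EB)
rng = np.random.default_rng(1)
tau_true = rng.uniform(0.05, 0.30, size=(100, 80))
tau_hat = tau_true + rng.normal(0.0, 0.01, size=(100, 80))
bias_c, mse_c = empirical_bias_mse(tau_hat, tau_true)
```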
The above conclusions support the use of the Census EB, using either REML, H3 or ELL estimation methods, instead of the original EB procedure in real-world applications, where it is seldom possible to link the survey and census data. Moreover, since the large sampling fractions of this experiment favor the original EB method, in the following we show only results for the Census EB instead of the original EB; that is, welfares are generated in the MC procedure for all the census units and the survey welfares are not appended.

Figure 1 compares, for the poverty rates, the Census EB estimators obtained from the proposed MC procedure of Section 6 using the H3 estimation method (labeled "H3-CensusEB") with the analogues obtained using the CB-EB procedure of Section 4 (labeled "H3-CBEB (old bootstrap)"). We also include the results from the CB-EB method with selection of PSUs within the areas instead of selection of areas in step 1 (labeled "H3-CBEB-old with PSU"). Note that the CB-EB estimates from Section 4 differ considerably from the true poverty rates, whereas the Census EB ones track the true values approximately. In fact, the sum of absolute differences of the CB-EB estimates from the true values is almost three times that of the Census EB estimates. The main reason for this difference in behavior, even though both estimators are supposed to implement EB prediction, is that, under the CB-EB approach of Section 4 with areas equal to clusters, a sample of areas is drawn in each simulation. This means that a given area benefits from EB prediction of the area effect only in a portion of the simulations, since it is likely that an area is not selected in every single bootstrap sample. This leads to CB-EB point estimates that are far off from the Census EB estimates obtained with the proposed MC procedure of Section 6. Another reason might be that the parameters used to simulate the census vectors in the CB-EB approach are not kept fixed across simulations, making convergence to the conditional expectation that defines the estimator difficult. When using the CB-EB approach with PSUs, there is still a clear difference with the Census EB estimates obtained with the new bootstrap procedure of Section 6.

Turning now to the traditional ELL approach, Figure 2 shows ELL estimates for the poverty rates, based on the model with only x1 and x2 (labeled "ELL") and the same but including the area means of x1 and x2 in the model (labeled "ELL with context"), together with the true poverty rates and the Census EB estimates obtained using the original MC procedure of MR (2010), without appending the survey data. In this plot, one finds that the traditional ELL estimates face the same troubles as the CB-EB ones. This is in line with what MR (2010) find in their simulations. Given the present simulation setup with a poor model fit, ELL performs much worse than MR's Census EB method in terms of point estimates. Also note the rather flat nature of ELL, which is mostly due to the model providing very little information: given the limited explanatory power of the correlates, the ELL estimates fall close to the average poverty rate of 16 percent for all the areas.22 What is somewhat concerning is that the addition of contextual variables to the model for ELL, in the form of the area means, does little to improve the point estimates.

22 The ELL estimates are not really flat if sorted from smallest to largest.
Thus, even if the survey data are not preserved in the Census EB estimation method, the benefit of EB is quite considerable, and caution should be taken when doing out-of-sample predictions in case little explanatory power is brought in by the auxiliary variables.

Figure 1: True poverty rates, proposed Census EB estimates of Section 6 using H3 method (labeled "H3-CensusEB"), CB-EB analogues from Section 4 (labeled "H3-CBEB (old bootstrap)") and CB-EB ones with selection of PSUs within the areas (labeled "H3-CBEB-old with PSU") in one simulated population and sample.

Figure 2: True poverty rates, ELL estimates (labeled "ELL"), ELL estimates including the area mean of x1 in the model (labeled "ELL with context") and Census EB estimates obtained from MR (2010) without appending the survey data, in one simulated population and sample.

The complete simulation experiment with L = 10,000 simulated populations is now discussed. Results in this case focus on the poverty gap, although the observed tendencies are the same for the poverty rates. Figures 3 and 4 show, respectively, the empirical bias and empirical MSE across simulated populations of Census EB from Section 6 using the H3 estimation method (labeled "H3-CensusEB"), traditional ELL estimators (labeled "ELL"), estimators based on the CB-EB approach with H3 method (labeled "H3-CBEB (old bootstrap)"), the same with PSUs within areas selected in step 1 (labeled "H3-CBEB (old bootstrap with PSU)") and direct estimators (labeled "Direct").

Figure 3: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled "H3-CensusEB"), traditional ELL (labeled "ELL"), CB-EB with H3 method (labeled "H3-CBEB (old bootstrap)"), CB-EB with PSUs (labeled "H3-CBEB (old bootstrap with PSU)") and Direct estimators (labeled "Direct") of the poverty gaps.

Figure 4: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled "H3-CensusEB"), traditional ELL (labeled "ELL"), CB-EB with H3 method (labeled "H3-CBEB (old bootstrap)"), CB-EB with PSUs (labeled "H3-CBEB (old bootstrap with PSU)") and Direct estimators (labeled "Direct") of the poverty gaps.

These full results are quite sobering. While the bias for the traditional ELL estimators is not remarkable, the bias of the estimators obtained by the CB-EB method from Section 4 is substantial, showing upward bias for all areas. In contrast, the bias of the Census EB estimators obtained from Section 6 seems negligible, ranging between -0.025 and 0.027, similar to the bias range of the original EB estimators of MR (2010).23 Figure 3 thus shows a considerable bias reduction of the proposed MC procedure for obtaining the Census EB estimators with respect to the CB-EB method.

23 Results for the Census EB estimators obtained from the original procedure of MR (2010) with REML estimation are not shown since they are aligned with the Census EB estimators from Section 6.

In terms of MSE, results are aligned with those in MR (2010), see Figure 4. The traditional ELL yields an empirical MSE above that of the direct estimator, which uses no model. Even if ELL estimators do not show a large bias when averaging across all the simulated populations (the ELL bootstrap procedure is supposed to approximate the marginal mean E(τ_c; θ), for which it is exactly unbiased), as shown in Figure 2, ELL does not capture the area effect of the true poverty rates for a single population. Note that, under the model-based setup, welfares are generated from the model and, in each simulation, new location effects are generated.
This means that the true poverty rates vary from one population to another. In fact, a given area might have a positive location effect in one generated population and a negative effect in another. Since the location effects have an expected value of zero, the ELL method is tracking the average zero effect, which is the same for all the areas, but it does not capture the area effects in a given population. What is even more concerning is that the CB-EB method applied in Nguyen et al. (2018) and PovMap, described in Section 4, yields an MSE that is more than 3 times greater than that of the traditional ELL. This happens because a large part of the MSE is the squared bias, which has been shown to be substantial for the CB-EB estimator. The proposed MC method for Census EB estimation presented here therefore represents a massive improvement over the traditional ELL and the CB-EB approaches under the setup of this simulation experiment based on a poor model. The next section describes results when increasing the prediction power of the model.

7.2 Simulation experiment with improved model

This experiment addresses one of the concerns of the previous simulation experiment: the poor explanatory power of the model. Hence, in this section a more informed model is considered, yielding an adjusted R² of about 0.42, which is much closer to that of real-world applications of ELL. The model includes six covariates. The first two, x1 and x2, are generated exactly as in the previous section. Four additional covariates are included, with values generated as follows:

1. x3 is a binary variable, taking value 1 when a random uniform number between 0 and 1 is less than or equal to 0.1 + 0.2c/80.

2. x4 is a binary variable, taking value 1 when a random uniform value between 0 and 1 is less than or equal to 0.5 + 0.3c/80.

3. x5 is a discrete variable, simulated as the rounded integer value of the maximum of 1 and a random Poisson variable with mean λ = 3(1 − 0.1c/80).

4. x6 is a binary variable, taking value 1 when a random uniform value between 0 and 1 is less than or equal to 0.4. Note that the values of x6 are not related to the area's label.

The welfare vector for each area is created from the model with these covariates as follows:

\ln(y_{ch}) = 3 + 0.09\, x_{1,ch} - 0.04\, x_{2,ch} - 0.09\, x_{3,ch} + 0.4\, x_{4,ch} - 0.25\, x_{5,ch} + 0.1\, x_{6,ch} + \eta_c + e_{ch},

where the e_ch are iid N(0, 0.5²) and the η_c are iid N(0, 0.15²). The poverty line in this scenario is fixed at z = 10.2. All other steps are the same as in the previous simulation.24 A code sketch of this extended data-generating process is given below, after the single-population comparison.

Even if the performance of the CB-EB estimators improves when considering a better model, they are still clearly worse than the Census EB estimators obtained with the proposed MC approach of Section 6, see Figure 5. This conclusion remains true even after decreasing the location effect, by simulating the populations with η_c iid N(0, 0.07²).25 Similarly, despite the improved explanatory power of the model, Figure 6 still portrays a rather flat ELL hovering around the national average poverty rate. Under a single simulated population from the improved model, the traditional ELL method still performs poorly when compared with the Census EB of MR (2010).
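As referenced above, here is a minimal sketch of the additional covariates and the welfare model of the improved-model experiment, again in illustrative Python/NumPy with an arbitrary seed; the helper u and the variable names are conveniences of the sketch, not of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

C, N_c = 80, 250
N = C * N_c
area = np.repeat(np.arange(C), N_c)      # area code (0, ..., 79) of every census household

# x1 and x2 as in the poor-model sketch; x3-x5 also depend on the area label, x6 does not
u = lambda: rng.uniform(size=N)
x1 = (u() <= 0.3 + 0.5 * (area + 1) / 80).astype(float)
x2 = (u() <= 0.2).astype(float)
x3 = (u() <= 0.1 + 0.2 * (area + 1) / 80).astype(float)
x4 = (u() <= 0.5 + 0.3 * (area + 1) / 80).astype(float)
x5 = np.maximum(1, rng.poisson(3 * (1 - 0.1 * (area + 1) / 80))).astype(float)
x6 = (u() <= 0.4).astype(float)

def draw_population_improved(rng, sd_eta=0.15, sd_e=0.5):
    """One simulated census under the improved model of Section 7.2; sd_eta = 0.07
    reproduces the reduced-location-effect variant mentioned in footnote 24."""
    eta = rng.normal(0.0, sd_eta, C)     # location effects
    e = rng.normal(0.0, sd_e, N)         # household-level errors
    lny = (3 + 0.09 * x1 - 0.04 * x2 - 0.09 * x3 + 0.4 * x4
           - 0.25 * x5 + 0.1 * x6 + eta[area] + e)
    return np.exp(lny)

y_census = draw_population_improved(rng)
z = 10.2                                 # poverty line used in the improved-model scenarios
```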
Results from the full simulation with L = 10, 000 populations for the poverty gap also resemble those from the previous section, see Figures 7 and 8 showing empirical bias and empirical MSE respectively.26 The CB-EB method implemented as described in Nguyen et al. (2018), which replicates the methods of PovMap, still presents considerable bias in this new simulation exercise. In comparison, the proposed Census EB procedure seems to be nearly unbiased, although the obtained empirical biases actually range between -0.04 and 0.05. Moreover, even under a better regression model, the MSE of the proposed Census EB estimator is still much smaller. What is concerning is that the MSEs of the direct estimators are smaller than those of the traditional ELL estimators and of the CB-EB estimators. Another criticism of the EB approach of MR (2010), which extends to the Census EB, is its dependence on normality. In order to study the sensitivity of the procedures to the normality assumption, an additional simulation experiment is run under the same setup of this section, but generating the household-specific errors from a Student’s t distribution with 5 degrees of freedom and scaled by 0.5. This distribution is symmetric but has heavier tails than the normal distribution, which may represent the existence of outlying welfares in the data, often encountered in real applications. Empirical bias and MSE across the L = 10, 000 simulations are shown in Figures 9 and 10 respectively, for the proposed Census EB, traditional ELL and direct estimators.27 Note that the ELL method takes household-specific residuals from their empirical distribution, so its performance is not supposed to rely so strongly on normality. However, results suggest that, even under lack of normality, ELL’s performance is still worse than that of the Census EB update. In terms of bias, some areas under the traditional ELL show quite considerable bias, yet on average across areas the bias can be near zero. This suggests that, even if the bias across all areas of ELL can be small, it is not an accurate estimator in each area.28 In fact, the empirical MSEs of the traditional ELL estimators for many areas are still much greater than those of the proposed Census EB ones in this experiment. Even after the better performance of the Census EB procedure compared with the traditional ELL method under normality departure in the form of higher tails, model checking and diagnostics are ex- tremely important, as in any model-based procedure. In fact, good performance of EB and Census EB 24 Another iid slight modification made in separate simulations is that the location effect is simulated as ηc ∼ N 0, 0.072 . 25 Results available upon request. Although it must be noted that under this scenario under many populations σ ˆη <0, and thus many of the populations need to be re-simulated. 26 For other FGT indicators see figures: 24, 25, 26, and 27. 27 For other FGT indicators see figures: 28, 29, 30, and 31. 28 Note that the purpose in SAE is not obtaining estimators performing well on average across areas since, in that case, the overall sample mean at the population level would be sufficient. 22 Figure 5: True poverty rates, proposed Census EB estimates of Section 6 using H3 method (labeled “H3- CensusEB”), CB-EB analogues from Section 4 (labeled “H3-CBEB (old bootstrap)”) and CB-EB ones with selection of PSUs within the areas (labeled “H3-CBEB-old with PSU”) in one simulated population and sample, with improved model fit. 
Figure 6: True poverty rates, ELL estimates (labeled “ELL”), ELL estimates including the area means of x1 in the model (labeled “ELL with context”) and Census EB estimates obtained from MR (2010) without appending the survey data, in one simulated population and sample, with improved model fit. 23 Figure 7: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of the poverty gaps, with improved model. Figure 8: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of the poverty gaps, under improved model. 24 estimators in terms of bias and efficiency also relies on normality. Note that the sample size of house- hold surveys is typically very large and therefore normality tests are all expected to reject the simple null hypothesis of normality due to the strong evidence against it (which is never exactly true in real applications). Hence, usual residual plots for checking the normality assumption using both predicted area effects and conditional unit-level residuals should be an important part of any application with real data. In fact, log transformation of welfares is indeed conventionally taken because practically all welfare variables show strong right-skewness and heteroscedasticity. First of all, it is important to note that, even after log transformation, residuals in some applications still show a skewed distribution, where the skewness is transferred to the left tail. This occurs often in the presence of very small welfare values, which the log function actually shifts to minus infinity. A simple way to solve this left skewness problem caused by the log function is to add a shift c > 0 to the original survey welfares, shifting them to values further apart from zero, where the slope of the log function is less pronounced. As Molina, Nandram and Rao (2014) recommends, this shift c can be obtained by fitting the nested error model to a grid of values of c in the range of the welfare values, and selecting the value for which the Fisher skewness of unit level residuals is closest to zero. Another possibility is to select a different transformation from the Box-Cox or power families of transformations (which contain the log), as implemented in the sae R package of Molina and Marhuenda (2015). In fact, Box and Cox (1964) included also the shift in their original family of transformations. If one wishes to avoid finding the correct transformation to achieve normality29 or if residuals still show skewness, even after selecting the best possible transformation of welfare, approaches for EB estimation have been recently developed for skewed distributions, see Graf, Marín and Molina (2019) for EB under general skewed distributions and Diallo and Rao (2018) considering skew-normal errors. A simulation experiment was done by increasing the size of the population to N = 100, 000, but the sample size is kept as nc = 50 households per area. In this case, each area is made up of Nc = 1, 250 households, which means that sampling fractions are now 4 percent (much smaller than the previous 20 percent). 
Everything else in the simulation experiment is kept the same as in the simulation experiment with the improved model and assuming normality. Under this scenario, ELL still performs worse than the proposed Census EB, see Figures 11 and 12. However, it now shows improved performance in certain areas, where it achieves a smaller MSE than the direct estimators (Figure 12). This bodes well for real world scenarios, where the sampling fractions hardly reach 0.5 percent and thus ELL will likely be preferable to direct estimators.30 Under this same (more realistic) scenario, we analyze the performance of the MSE estimators obtained from each method. Figure 13 shows respectively the true MSEs of the traditional ELL estimators (approximated empirically with the L = 10, 000 simulations), and the corresponding ELL estimated variances (section2). We can see that estimated ELL variances are severely understating the true MSEs. This severe underestimation also affects the CB-EB procedure (see appendix Figure 23). On the other hand, according to Figure 14, showing true MSEs of Census EB estimators and their corresponding parametric bootstrap MSEs, where Census EB and bootstrap MSE estimators are obtained from the algorithms of Section 6, the parametric bootstrap MSE estimators are tracking adequately the true MSEs.31 29 Note that for a random variable v , the transformation that leads to normality is given by Φ−1 (F (v )), where F (·) is the true cumulative distribution function (cdf) of v and Φ(·) is the standard normal cdf. The problem is that the true cdf F (·) is typically unknown. 30 For other FGT indicators see figures: 32, 33, 34, and 35 31 Similar conclusions were obtained for other indicators, such as the poverty severity (FGT indicator with α = 2), although results are not shown for brevity. 25 Figure 9: Empirical bias across simulated populations of proposed Census EB (labeled “H3-CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of the poverty gaps, under non-normal errors. Figure 10: Empirical MSE across simulated populations of proposed Census EB (labeled “H3- CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of the poverty gaps, under non-normal errors. 26 Figure 11: Empirical bias across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. Figure 12: Empirical MSE across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. 27 Figure 13: Empirical MSEs of ELL estimators of poverty gaps (labeled “True ELL MSE”) and corre- sponding ELL variance estimates (labeled “Estimated ELL Variance”), with population size of 100K. Figure 14: Empirical MSEs of Census EB estimators of poverty gaps that use H3 estimation method (labeled “True MSE”) and corresponding parametric bootstrap MSE estimates (labeled “Estimated H3- CensusEB MSE”), with population size of 100K. 
Figure 15: Empirical bias across simulated populations of Census EB estimators using H3 method (labeled "H3-CensusEB"), analogue CB-EB estimators sampling areas (labeled "H3-CBEB (old bootstrap)") and ELL with location means (labeled "ELL with context"), with population size of 100K and sample of 10 per area.

Figure 16: Empirical MSE across simulated populations of Census EB estimators using H3 method (labeled "H3-CensusEB"), analogue CB-EB estimators sampling areas (labeled "H3-CBEB (old bootstrap)") and ELL with location means (labeled "ELL with context"), with population size of 100K and sample of 10 per area.

Table 1: Aggregate results across areas (FGT1)

Scenario              1             2             3             4             5             6
Model R2            <0.01         ~0.42         ~0.42         ~0.42         ~0.42         ~0.46
K                       2             6             6             6             6             6
σ_ηc           N(0, 0.15²)   N(0, 0.15²)   N(0, 0.07²)   N(0, 0.15²)   N(0, 0.15²)   N(0, 0.15²)
σ_ech           N(0, 0.5²)    N(0, 0.5²)    N(0, 0.5²)   Student's t*   N(0, 0.5²)    N(0, 0.5²)
Pov. threshold         12          10.2          10.2          10.2          10.2          10.2
nc                     50            50            50            50            50            10
Nc                 20,000        20,000        20,000        20,000       100,000       100,000

Direct estimates
  AAB (x100)        0.031         1.180         1.183         1.721         1.556         3.744
  AARB (x100)      33.000        17.446        17.212        16.538        20.101        43.347
  ARMSE (x100)      1.269         2.417         2.404         2.386         2.835         5.143
  ARRMSE (x100)    43.114        21.661        21.116        18.466        24.823        47.335
ELL
  AAB (x100)        0.155         0.450         0.175         2.363       Not run       Not run
  AARB (x100)      76.528        27.755        13.663        25.288       Not run       Not run
  ARMSE (x100)      2.042         3.627         1.906         3.608       Not run       Not run
  ARRMSE (x100)   510.106        39.350        17.824        28.903       Not run       Not run
ELL - with context
  AAB (x100)        0.169         0.413         0.169         2.264         0.408         1.936
  AARB (x100)      77.412        27.496        13.723        25.077        26.040        25.372
  ARMSE (x100)      2.036         3.591         2.098         3.566         3.529         2.929
  ARRMSE (x100)   448.674        38.957        19.679        28.715        36.171        28.378
H3-CBEB (with PSU)
  AAB (x100)        3.742         3.125         3.205       Not run         3.120       Not run
  AARB (x100)     157.127        32.002        30.199       Not run        30.554       Not run
  ARMSE (x100)      3.888         3.503         3.481       Not run         3.484       Not run
  ARRMSE (x100)   363.459        38.417        34.072       Not run        36.025       Not run
H3-CBEB
  AAB (x100)        3.737         3.123         3.205       Not run       Not run         3.383
  AARB (x100)     176.202        35.250        30.726       Not run       Not run        36.295
  ARMSE (x100)      3.879         3.707         3.507       Not run       Not run         3.840
  ARRMSE (x100)   577.523        45.157        35.216       Not run       Not run        40.608
H3-CensusEB
  AAB (x100)        0.007         0.014         0.012         1.272         0.013         1.590
  AARB (x100)      26.387        11.170         9.755        12.959        10.550        19.432
  ARMSE (x100)      0.932         1.560         1.372         1.739         1.542         2.319
  ARRMSE (x100)    63.121        14.668        12.586        14.300        13.590        21.818

AAB: Average absolute bias; AARB: Average absolute relative bias; ARMSE: Average root MSE; ARRMSE: Average relative root MSE.
*Student's t distribution with 5 degrees of freedom and scaled by 0.5.

One additional simulation is executed with the population of N = 100,000, but the sample size is now kept at nc = 10 households per area, which makes the total sample size n = 800. Each area is made up of Nc = 1,250 households, which means that sampling fractions are now 0.8 percent. The purpose of this simulation is to assess the performance of the estimators under a scenario more aligned with ELL: ELL (2002) proposed that the location effect be specified at the cluster level, and in some applications this is a cluster of 10 observations. However, in this case the total survey sample size may not be so realistic, since in household surveys from many countries the overall sample size is in the several thousands. As expected, under this scenario, the differences in bias among the different methods are less stark.
Note that with zero area sample size (nc = 0), Census EB and traditional ELL become approximately the same, so, as we approach this scenario, estimators come together (Figure 15). Because of the limited sample, all methods now show considerably more bias and a higher MSE. Concurrently, H3-CensusEB now appears to be more aligned to the ELL and H3-CBEB. However, H3-Census EB still outperforms the older methods in terms of bias and MSE. Note also, that usually ELL is done by setting the location effect at the cluster level but results are presented at higher levels.32 However, aggregation to higher levels may underestimate the true MSE as noted by Das and Chambers (2017). The inclusion of contextual level variables may help (ibid ) although this requires being able to link clusters in the survey sample to the census which in practice is not always feasible. Aggregate results across areas are presented in Table 1 for the simulations discussed.33 Results from column 1 are those that replicate the simulation experiment of Molina and Rao (2010). Note that the large values of the average relative root mean squared error (ARRMSE) may arise from true values of FGT2 close to 0, but the true values are common to all estimators. Regardless, the better performance of the H3-Census EB update is quite clear in terms of all the performance measures and under all the simulations run. Moreover, the results in the tables corroborate what is observed in the figures. 7.3 Examining the bias of the CB-EB PovMap update The bias of the CB-EB estimator observed in our simulation experiments is a reason for concern. In the CB-EB method with clusters equal to the areas, one could argue that the bias comes from the fact that not all areas are equally likely to be included in the sample that is selected in step 1 of this procedure. However, in our simulation experiments, bias also appears for the CB-EB method with PSUs selected within the areas in step 1. Thus, the question remains: what is the source of this bias? Investigating whether part of the bias comes from the regression coefficients used to draw the welfares βˆ∗ , which vary across the bootstrap replicates, we have plotted a histogram of the simulated regression intercepts βˆ∗ obtained in the M replicates of the CB-EB bootstrap procedure, with M = 50, 100, 300, for a single population. Figure 17 shows the results for M = 50 and M = 300 bootstrap replicates. This ∗ figure shows that the distribution of the simulated regression intercepts β1 for small M is not actually centered around the true value β1 = 3; in fact, it is right-skewed with a mean clearly exceeding the true value. While increasing the bootstrap replicates M yields better results (right plot), the regression intercepts are still not centered around the true intercept. ∗ A second plausible source of bias is through the simulated location effects, ηc . Since the method selects PSUs with replacement, some areas might be selected more than once and others might not be selected. ηc ] might be underestimated in an area selected more than once because there As a consequence, var [ˆ will artificially be more observations in that area. On the other hand, if an area is not selected, then 32 For other FGT indicators see figures: 36, 37, 38, and 39. 33 See tables 2 and 3 for FGT0 and FGT2, respectively. 31 Figure 17: Histogram of the regression intercepts β ∗(m) , m = 1, . . . , M obtained in the M replicates of the CB-EB bootstrap method, for M = 50 (left) and M = 300 (right). 
∗ ηc ∼ N 0, σ ˆη , and because xch β ∗ is likely biased according to Figure 17, then the resulting estimate 2 34 for this area will deviate considerably from the truth. Figure 18: Empirical bias of the average across the census households of exp(xch β ∗ ) for each area c = 1, . . . , C , as an estimator of the corresponding average of the true values exp(xch β ) (left) and of the average of exp(xch β ∗ + ηc ∗ ) as estimator of the true average based on exp(xch β + ηc ), for each area c. Figure 18 shows how the bias actually builds up in the predicted welfares.35 The left plot in Figure 18 shows the empirical bias of the average across the census households of exp(xch β ∗ ) for each area as an ∗ estimator of the corresponding average of the true values exp(xch β ). The right plot includes ηc and thus ∗ shows the bias of the average of exp(xch β ∗ + ηc ) as estimator of the true average based on exp(xch β + ηc ), for each area c. We can see that the fixed part of the regression already shows a considerable upward bias, which agrees with the bias observed for the regression intercept in Figure 17. Once the location effect is added, things get much worse, with bias becoming substantial for most areas. 34 Note that σ ˆη2 is not the one estimated from the sample, it comes from a bootstrap sampling of the data. 35 This simulation is executed under the same scenario as that of Subsection 7.2, except that L = 5, 000 instead of L = 10, 000 32 8 Conclusions Since the turn of the 21st century, the World Bank has been obtaining small area estimates of poverty and inequality indicators using the ELL approach. One of the reasons for the method’s popularity was the existence of software that made applying the method to real data relatively easy. This began with an implementation in SAS by Demombynes (2002), which was followed by PovMap implemented by Zhao (2006), and finally a Stata version (“sae” package by Nguyen et al., 2018). Along those lines, the research effected in this paper also represents a considerable update to the “sae” package in Stata.36 Countries often set out to obtain small area estimates because more precise poverty estimates at a more granular level allow for improved allocation of resources, see Elbers et al. (2007). This research comes just in time for the 2020 round of population census and should provide an improved tool for the operationalization of the SDGs at a sub-national level. An important aspect of the small area estimation procedures is how parameter estimates from an assumed population model are applied to census data to obtain the small area estimates. As noted in this document, the traditional ELL procedure is aligned to the approach used in multiple imputation, where the main aim is not prediction. Under multiple imputation, the method that yields the lowest MSE yields invalid statistical inference (Van Buuren, 2018). On the other hand, the goal of small area estimation is to improve precision. This paper presents a substantial revision to the traditional methods by ELL (2002, 2003) and the updates by Van der Weide (2014) for small area estimation.37 Our results show substantial gains of the proposed Census EB method with respect to the previous methods in all the considered scenarios (varying area population sizes and sampling fractions, model prediction power and distributional assumptions) and also of the considered parametric bootstrap MSE estimators. 
Thus, the updated methodology provides a considerable improvement in the quality of the SAE estimates obtained under the World Bank’s agenda. We show that, for a single population, the original ELL method tends to align with the national poverty estimates and does not capture the area heterogeneity, although its performance improves when averaging across populations. Nevertheless, caution should be taken when doing out of sample prediction even under EB methods, since in that case estimates are very similar to the traditional ELL ones and can thus deviate considerably from the truth. This is particularly relevant in cases where the explanatory power of the chosen correlates is low. An additional finding of this research is the considerable bias of the CB-EB update by Van der Weide (2014) and as implemented by Nguyen et al. (2018) to the World Bank’s toolkit. The revised Census EB method is aligned with MR’s (2010) method, which shows a level of bias that is several orders of magnitude smaller than the previous CB-EB method. Furthermore, the MSE for the Census EB method is also considerably smaller. This research and its accompanying software implementation represent a massive improvement to the World Bank’s current toolkit and overall poverty mapping agenda. This note serves as a background piece for a guidance note on small area estimation for the World Bank. The simulations executed here are done under a controlled scenario, and thus a natural question is how the findings would differ under a real-world application. Now that the initial limitations of the methods have been identified, the revisions can be put to the test on real-world data. 36 An update to Nguyen et al. (2018) is in progress, but all Stata codes and commands used in this document are available at https://github.com/pcorralrodas/SAE-Stata-Package 37 The revision is also accompanied by an update to the 2018 Stata “sae” package by Nguyen et al. (2018) 33 References Box, G. and Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26:211–252. Corral, P. and Cojocaru, A. (2019). Moldova poverty map: an application of small area estimation. mimeo. Das, S. and Chambers, R. (2017). Robust mean-squared error estimation for poverty estimates based on the method of elbers, lanjouw and lanjouw. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4):1137–1161. Demombynes, G. (2002). A manual for the poverty and inequality mapper module. University of California, Berkeley and Development Research Group, the World Bank. Demombynes, G., Elbers, C., Lanjouw, J., Lanjouw, P., Mistiaen, J., and Ozler, B. (2002). Produc- ing an Improved Geographic Profile of Poverty: Methodology and Evidence from Three Developing Countries. WIDER Working Paper Series 039, World Institute for Development Economic Research (UNU-WIDER). Demombynes, G., Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2008). How good is a map? putting small area estimation to the test. Rivista Internazionale di Scienze Sociali, pages 465–494. Diallo, M. and Rao, J. (2018). Small area estimation of complex parameters under unit-level models with skew-normal errors. Journal of the Royal Statistical Society, Series A, 45:1092–1116. Elbers, C., Fujii, T., Lanjouw, P., Ozler, B., Yin, W., et al. (2007). Poverty alleviation through geographic targeting: How much does disaggregation help? Journal of Development Economics, 83(1):198–213. Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2002). 
Micro-level estimation of welfare. World Bank Policy Research Working Paper, (2911). Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2003). Micro–level estimation of poverty and inequality. Econometrica, 71(1):355–364. Fay III, R. E. and Herriot, R. A. (1979). Estimates of income for small places: an application of james- stein procedures to census data. Journal of the American Statistical Association, 74(366a):269–277. Foster, J., Greer, J., and Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica: Journal of the Econometric Society, pages 761–766. Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2004). Bayesian Data Analysis., volume 2nd ed of Texts in Statistical Science. CRC Press [CAM]. González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., and Santamaría, L. (2008). Boot- strap mean squared error of a small-area eblup. Journal of Statistical Computation and Simulation, 78(5):443–462. Graf, M., Marín, J. M., and Molina, I. (2019). A generalized mixed model for skewed distributions applied to small area estimation. TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, 28(2):565–597. Guadarrama, M., Molina, I., and Rao, J. (2018). Small area estimation of general parameters under complex sampling designs. Computational Statistics & Data Analysis, 121:20–40. 34 Haslett, S., Isidro, M., and Jones, G. (2010). Comparison of survey regression techniques in the context of small area estimation of poverty. Survey Methodology, 36(2):157–170. Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics, 9(2):226–252. Huang, R. and Hidiroglou, M. (2003). Design consistent estimators for a mixed linear model on survey data. Proceedings of the Survey Research Methods Section, American Statistical Association (2003), pages 1897–1904. Marhuenda, Y., Molina, I., Morales, D., and Rao, J. N. K. (2017). Poverty mapping in small areas under a twofold nested error regression model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4):1111–1136. Molina, I. (2019). Desagregación de datos en encuestas de hogares: metodologías de estimación en áreas pequeñas. Molina, I. and Marhuenda, Y. (2015). sae: An R package for small area estimation. The R Journal, 7(1):81–98. Molina, I., Nandram, B., Rao, J., et al. (2014). Small area estimation of general parameters with application to poverty indicators: A hierarchical bayes approach. The Annals of Applied Statistics, 8(2):852–885. Molina, I. and Rao, J. (2010). Small area estimation of poverty indicators. Canadian Journal of Statistics, 38(3):369–385. Nguyen, M. C., Corral, P., Azevedo, J. P., and Zhao, Q. (2018). sae: A stata package for unit level small area estimation. World Bank Policy Research Working Paper, (8630). Rao, J. N. and Molina, I. (2015). Small area estimation. John Wiley & Sons, 2nd edition. Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American statistical Association, 91(434):473–489. Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys, volume 81. John Wiley & Sons. StataCorp, L. (2019). Stata mi reference manual. Van Buuren, S. (2018). Flexible imputation of missing data. Chapman and Hall/CRC. Van der Weide, R. (2014). Gls estimation and empirical bayes prediction for linear mixed models with heteroskedasticity and sampling weights: a background study for the povmap project. World Bank Policy Research Working Paper, (7028). Zhao, Q. (2006). 
User manual for povmap. World Bank. http://siteresources. worldbank. org/INTPGI/Resources/342674-1092157888460/Zhao_ ManualPovMap. pdf. 35 Appendix: Additional simulation results Figure 19: True poverty rates, EB and Census EB estimates obtained according to MR ((2010)) with REML fitting, in one population and sample, with poor model fit. 36 Figure 20: Difference in estimated MSE between EB and Census EB obtained according to MR (2010) with REML fitting, in one population and sample, with poor model fit. Figure 21: Census EB estimates with H3 estimation method and with REML fitting according to MR (2010), in one population and sample, with poor model fit. 37 Figure 22: Empirical bias across simulated populations of Census EB with H3 or ELL estimation methods, compared with direct estimators. Figure 23: Empirical MSEs of H3-CBEB estimators of poverty gaps (labeled “True ELL MSE”) and corresponding H3-CBEB variance estimates (labeled “Estimated H3-CBEB variance”), with population size of 100K. 38 Appendix: Results for other indicators (FGT0 & FGT2) 39 Figure 24: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT0, with improved model. Figure 25: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT0, under improved model. 40 Figure 26: Empirical bias across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT2, with improved model. Figure 27: Empirical MSE across simulated populations of Census EB from Section 6 using H3 method (labeled “H3-CensusEB”), traditional ELL (labeled “ELL”), CB-EB with H3 method (labeled “H3- CBEB (old bootstrap)”), CB-EB with PSUs (labeled “H3-CBEB (old bootstrap with PSU)”) and Direct estimators (labeled “Direct”) of FGT2, under improved model. 41 Figure 28: Empirical bias across simulated populations of proposed Census EB (labeled “H3-CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT0, under non-normal errors. Figure 29: Empirical MSE across simulated populations of proposed Census EB (labeled “H3- CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT0, under non-normal errors. 42 Figure 30: Empirical bias across simulated populations of proposed Census EB (labeled “H3-CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT2, under non-normal errors. Figure 31: Empirical MSE across simulated populations of proposed Census EB (labeled “H3- CensusEB”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”) of FGT2, under non-normal errors. 
43 Figure 32: Empirical bias across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. Figure 33: Empirical MSE across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. 44 Figure 34: Empirical bias across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. Figure 35: Empirical MSE across simulated populations of Census EB estimators using H3 method (la- beled “H3-CensusEB”), analogue CB-EB estimators sampling PSUs (labeled “H3-CBEB (old bootstrap with PSU)”), ELL with location means (labeled “ELL with context”) and direct estimators (labeled “Direct”), with population size of 100K. 45 Figure 36: Empirical bias across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. Figure 37: Empirical MSE across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. 46 Figure 38: Empirical bias across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. Figure 39: Empirical MSE across simulated populations of Census EB estimators using H3 method (labeled “H3-CensusEB”), analogue CB-EB estimators sampling areas (labeled “H3-CBEB (old boot- strap)”), ELL with location means (labeled “ELL with context”) with population size of 100K and sample of 10 per area. 47 Table 2: Aggregate results across areas (FGT0) 1 2 3 4 5 6 Model R2 <0.01 ~0.42 ~0.42 ~0.42 ~0.42 ~0.46 K 2 6 6 6 6 6 σ ηc N (0, 0.152 ) N (0, 0.152 ) N (0, 0.072 ) N (0, 0.152 ) N (0, 0.152 ) N (0, 0.152 ) σech N (0, 0.52 ) N (0, 0.52 ) N (0, 0.52 ) Student’s t* N (0, 0.52 ) N (0, 0.52 ) Pov. 
threshold 12 10.2 10.2 10.2 10.2 10.2 nc 50 50 50 50 50 10 Nc 250 250 250 250 1,250 1,250 Direct estimates AAB (x100) 0.112 2.610 2.651 4.050 3.089 9.748 AARB (x100) 25.943 14.181 14.000 13.693 15.626 35.794 ARMSE (x100) 4.524 5.808 5.849 5.282 6.504 13.088 ARRMSE (x100) 34.143 17.781 17.332 15.127 19.546 40.127 ELL AAB (x100) 0.530 1.003 0.382 4.795 Not run Not run AARB (x100) 52.871 21.077 10.785 19.722 Not run Not run ARMSE (x100) 7.474 8.282 4.510 7.642 Not run Not run ARRMSE (x100) 98.481 29.126 13.937 22.376 Not run Not run ELL - with context AAB (x100) 0.613 0.909 0.391 4.499 0.915 4.333 AARB (x100) 53.709 20.862 10.783 19.530 19.686 19.004 ARMSE (x100) 7.447 8.200 4.546 7.512 7.957 6.629 ARRMSE (x100) 99.706 28.816 14.070 22.131 26.790 21.025 H3-CBEB (with PSU) AAB (x100) 7.622 2.969 3.114 Not run 2.868 Not run AARB (x100) 71.075 13.925 11.972 Not run 12.733 Not run ARMSE (x100) 8.391 4.949 4.539 Not run 4.680 Not run ARRMSE (x100) 98.071 18.724 15.191 Not run 16.811 Not run H3-CBEB AAB (x100) 7.614 2.967 3.114 Not run Not run 4.343 AARB (x100) 79.167 16.706 12.488 Not run Not run 17.384 ARMSE (x100) 8.797 5.852 4.709 Not run Not run 5.837 ARRMSE (x100) 119.737 23.238 16.037 Not run Not run 19.983 H3-CensusEB AAB (x100) 0.027 0.029 0.028 2.321 0.027 3.638 AARB (x100) 20.049 8.914 7.874 9.674 8.217 14.722 ARMSE (x100) 3.341 3.655 3.306 3.625 3.477 5.281 ARRMSE (x100) 30.074 11.648 10.105 10.780 10.573 16.566 AAB: Average absolute bias; AARB: Average absolute relative bias; ARMSE: Average root MSE; ARRMSE: Average relative root MSE. *Student’s t distribution with 5 degrees of freedom and scaled by 0.5 48 Table 3: Aggregate results across areas (FGT2) 1 2 3 4 5 6 Model R2 <0.01 ~0.42 ~0.42 ~0.42 ~0.42 ~0.46 K 2 6 6 6 6 6 σ ηc N (0, 0.152 ) N (0, 0.152 ) N (0, 0.072 ) N (0, 0.152 ) N (0, 0.152 ) N (0, 0.152 ) σech N (0, 0.52 ) N (0, 0.52 ) N (0, 0.52 ) Student’s t* N (0, 0.52 ) N (0, 0.52 ) Pov. threshold 12 10.2 10.2 10.2 10.2 10.2 nc 50 50 50 50 50 10 Nc 250 250 250 250 1,250 1,250 Direct estimates AAB (x100) 0.012 0.683 0.679 1.072 0.933 2.106 AARB (x100) 43.706 22.195 21.887 20.838 25.986 54.216 ARMSE (x100) 0.568 1.460 1.437 1.516 1.751 3.013 ARRMSE (x100) 56.676 27.534 26.831 23.059 32.142 59.398 ELL AAB (x100) 0.062 0.252 0.102 1.383 Not run Not run AARB (x100) 3381.026 33.477 16.578 28.985 Not run Not run ARMSE (x100) 0.798 2.014 1.072 2.106 Not run Not run ARRMSE (x100) 328071.1 49.030 21.944 33.038 Not run Not run ELL - with context AAB (x100) 0.066 0.237 0.097 1.344 0.227 1.067 AARB (x100) 2736.935 33.172 16.753 28.990 30.893 30.141 ARMSE (x100) 0.796 1.996 1.430 2.106 1.953 1.607 ARRMSE (x100) 263427.9 48.521 29.322 33.166 43.758 33.977 H3-CBEB (with PSU) AAB (x100) 2.018 2.380 2.417 Not run 2.396 Not run AARB (x100) 1834.462 50.899 48.558 Not run 48.464 Not run ARMSE (x100) 2.105 2.554 2.547 Not run 2.555 Not run ARRMSE (x100) 156534.7 59.238 53.586 Not run 54.890 Not run H3-CBEB AAB (x100) 2.015 2.378 2.416 Not run Not run 2.486 AARB (x100) 3852.208 55.248 49.299 Not run Not run 55.997 ARMSE (x100) 2.062 2.608 2.549 Not run Not run 2.694 ARRMSE (x100) 354728.8 68.541 55.098 Not run Not run 61.308 H3-CensusEB AAB (x100) 0.003 0.009 0.008 0.774 0.008 0.865 AARB (x100) 151.842 13.837 12.206 15.255 12.392 22.871 ARMSE (x100) 0.390 0.908 0.798 1.059 0.866 1.269 ARRMSE (x100) 11633.9 18.454 15.922 16.761 16.042 25.717 AAB: Average absolute bias; AARB: Average absolute relative bias; ARMSE: Average root MSE; ARRMSE: Average relative root MSE. 
*Student’s t distribution with 5 degrees of freedom and scaled by 0.5 49