WPS7043

Policy Research Working Paper 7043

Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country

Hai-Anh H. Dang
Peter F. Lanjouw
Umar Serajuddin

Development Research Group
Poverty and Inequality Team
September 2014

Abstract

Obtaining consistent estimates of poverty over time, as well as monitoring poverty trends on a timely basis, is a priority concern for policy makers. However, these objectives are not readily achieved in practice when household consumption data are neither frequently collected nor constructed using consistent and transparent criteria. This paper develops a formal framework for survey-to-survey poverty imputation in an attempt to overcome these obstacles, and to elevate the discussion of these methods beyond the largely ad hoc efforts in the existing literature. The framework introduced here imposes few restrictive assumptions, works with simple variance formulas, provides guidance on the selection of control variables for model building, and can be generally applied to imputation either from one survey to another survey with the same design, or to another survey with a different design. Empirical results analyzing the Household Expenditure and Income Survey and the Employment-Unemployment Survey in Jordan are quite encouraging, with imputation-based poverty estimates closely tracking the direct estimates of poverty.

This paper is a product of the Poverty and Inequality Team, Development Research Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at hdang@worldbank.org and planjouw@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country

Hai-Anh H. Dang, Peter F. Lanjouw, and Umar Serajuddin*

Key words: poverty, imputation, consumption, household survey, labor force survey, Jordan
JEL codes: C15, I32, O15

* Dang (hdang@worldbank.org) and Lanjouw (planjouw@worldbank.org) are, respectively, Economist and Research Manager in the Poverty and Inequality Unit, Development Research Group. Serajuddin (userajuddin@worldbank.org) is Senior Economist in the Development Data Group. All three are at the World Bank. We would like to thank Eric le Borgne, Jose Antonio Cuesta, Kristen Himelein, Dean Jolliffe, Nora Lustig, Yusuf Mansur, David Newhouse, Dominique van de Walle, Paolo Verme, and participants at the 17th World Congress of the International Economic Association (Dead Sea, Jordan) for helpful discussions on earlier versions. We are thankful to Orouba Al-Sabbagh, Mukhallad Omari, and Zein Soufan at the Ministry of Planning and International Cooperation, and to Mohammad Al Jundi and Ghaida Khasawneh at the Department of Statistics of Jordan, for their support.
We would like to thank Yichen Tu and Yoon Jung Lee for capable research assistance. We are grateful to the UK Department for International Development for funding assistance through its Strategic Research Program.

I. Introduction

Building on the success of the Millennium Development Goal, which saw the global poverty rate halve from its 1990 level before 2015, the international community has redoubled its efforts to reduce poverty further. For example, the World Bank recently proposed an ambitious goal of reducing the global extreme poverty rate to no more than 3 percent by 2030. In this connection, measuring poverty serves as an instrumental tool for poverty eradication. Reliable estimates can help us understand which policies work and which do not, and how efficient they are.

Estimation of poverty is, however, a rather involved process. It typically imposes significant demands on financial resources and draws on specialized technical expertise. The process often confronts practical challenges that can undermine efforts to track poverty trends for timely policy interventions. For instance, if poverty estimates are to be compared over time, a crucial requirement is that both the consumption aggregates and the poverty lines be consistently constructed across survey rounds and be strictly comparable. However, studies document that this seemingly undemanding condition is satisfied less often than one might think. A well-known example is the vibrant debate in India in the early 2000s. Among other factors, changes in questionnaire design resulted in considerable controversy over the magnitude and direction of the change in poverty during the 1990s. According to official estimates, the headcount poverty rate decreased by 10 percentage points (equivalent to 60 million people escaping poverty) between 1993/1994 and 1999/2000.
In contrast, independent researchers produced conflicting estimates of the rate of decline. These ranged from slightly slower than the official estimates (Deaton and Drèze, 2002; Kijima and Lanjouw, 2003; Tarozzi, 2007) to a mere three-percentage-point decline in poverty (Sen and Himanshu, 2005). This latter estimate implied that the absolute number of people living in poverty remained unchanged during the 1990s.¹

Another issue that commonly hinders the tracking of poverty over time is that consumption surveys are typically conducted only occasionally, particularly in developing countries. As a result, poverty estimates are not available for the intervening years in which no survey was implemented. Yet another issue is that collecting, cleaning, and preparing data for analysis can be a protracted process. At times it spans multiple years from the start of fieldwork to the point when the data are ready for analysis. In all these cases, the challenge can be broadly regarded as one of missing data: consumption data are available in one period, but in the next period(s) they are either not available or not comparable.

The topic of imputing missing consumption data from one survey to another (i.e., survey-to-survey imputation) has received some attention in the statistics literature, but relatively little in the economics literature. With a handful of exceptions, the estimation framework used by most current economic studies of poverty comparisons appears to be largely based on earlier work by Elbers, Lanjouw, and Lanjouw (2003), which explored the feasibility of survey-to-census imputation. This survey-to-census imputation model provides a related, but not perfectly transferable, econometric model for survey-to-survey imputation.² It can be contrasted with the multiple imputation (MI) approach discussed in the statistics literature, which has grown rapidly since Donald Rubin first introduced it in the late 1970s (Rubin, 1978). Indeed, most current statistical software packages offer a wide variety of missing data imputation procedures. This abundance can make it difficult for the analyst to identify the best method to use, and especially to assess which estimation technique best suits the specific economic question, assumptions, and data requirements at hand.

¹ See Deaton and Kozel (2005) for further discussion of this poverty debate in India. See also Christiaensen et al. (2012) and World Bank (2012a) for similar issues compromising the comparability of poverty estimates in Russia and Vietnam, respectively.

² Significant differences exist between survey-to-census and survey-to-survey imputation methods. In particular, the former focuses on intratemporal (i.e., same point in time) imputation, producing poverty estimates at lower administrative levels than a survey would reasonably allow. The latter focuses on intertemporal imputation, producing poverty estimates for more aggregated population groups. These differences clearly raise distinct econometric issues for each method. We discuss the relevant studies in the literature review in the next section.

In this paper we make new contributions on both the theoretical and the empirical front.³ On the theoretical front, we provide a formal framework for survey-to-survey poverty imputation. It has several original features, covering assumption testing, model building, and estimation variance. First, we explicitly discuss the different assumptions required for the appropriate application of our poverty imputation method, which existing studies often consider only implicitly.
In particular, we show that the key, traditionally made assumption of constant parameters in the household consumption model is both unduly restrictive and unlikely to hold in practice, and we offer a less restrictive assumption instead. Existing studies commonly invoke the assumption of constant parameters, but to our knowledge none provides a direct test of it. We thus propose formal tests for our general assumption and for this traditional but more restrictive assumption. We also discuss what can be done when these assumptions are relaxed. Second, our proposed formula for the variance of the estimated poverty rate is simple and accords with the one commonly used in the statistics literature. Our framework also offers more insight into the selection of control variables for model building, a topic that has received relatively cursory treatment in the literature. A better understanding of this model selection process, coupled with certain additional assumptions, enables us to offer bound estimates even when data constraints are so severe that only very few control variables are available.

³ We focus in this paper on predicting household consumption in cross-sectional rather than panel data. For predicting poverty mobility based on synthetic (pseudo) panel data, see Dang, Lanjouw, Luoto, and McKenzie (2014), and Dang and Lanjouw (2013). We also focus on survey-to-survey imputation. For survey-to-census imputation, see, e.g., Elbers, Lanjouw, and Lanjouw (2003) and Tarozzi and Deaton (2009) for economic studies, and Rao (2003) for statistical studies. For a related literature on partial identification with different samples, see, e.g., Manski (2003); see also Ridder and Moffitt (2007) for a recent review of the econometrics of data combination.
Our paper thus aims to provide a systematic and comprehensive treatment of survey-to-survey poverty imputation methods, which most of the existing economics literature appears to implement on a somewhat ad hoc basis.

Third, we show that, given some standard assumptions, our framework can be generally applied to imputation from one survey either to another survey with the same design or to another survey with a different design. The former case is relevant when consumption data in a more recent survey round are not consistent with those in an earlier round (say, owing to measurement errors or poorly constructed consumption aggregates). It is also relevant when no reliable consumer price index (CPI) data exist to update the poverty line over time. Imputation from one survey to another of a different design, on the other hand, is pertinent when one survey collects consumption data but is implemented less frequently (e.g., household expenditure or budget surveys), while the other survey is conducted more frequently but does not collect consumption data (e.g., labor force surveys). Using surveys of different designs can considerably widen the range of application of imputation methods. The inevitable tradeoff is that sample statistics estimated from surveys of different designs are likely to differ for various reasons, which would in turn render imputation-based estimates incomparable. We propose rather straightforward standardization procedures to harmonize the different surveys. We show that employing these procedures can produce estimates that are statistically indistinguishable from the actual poverty rates, in sharp contrast to the severely biased estimates obtained from non-standardized data.

Finally, in constructing our framework, we offer a critical review of the economics literature and of related studies on data imputation in statistics.
Our paper thus also represents an early attempt to distinguish the methods currently available in statistics and economics, and to incorporate advances from the former into the latter. This is consistent with similar ongoing efforts in other disciplines that build on the multiple imputation method in statistics to better address their own disciplinary needs.⁴

⁴ See, for example, King et al. (2001) and Honaker and King (2010) for adaptations of multiple imputation methods in political science.

Empirically, we illustrate our method with an application to Jordan, a particularly interesting case for analysis. Little is known about poverty trends since Jordan’s Department of Statistics (DOS) last conducted its Household Expenditure and Income Survey (HEIS) in 2010. In the meantime, the country’s economy has experienced several major events. These include new poverty-reduction policies introduced by the government (e.g., in accordance with its recent Poverty Reduction Strategy), economic reforms (e.g., reducing petroleum subsidies and implementing a targeted cash transfer), and shocks from higher energy prices. Socio-political change and unrest in neighboring Syria and Egypt add further uncertainty to the economy. Given this fast-evolving context, policy makers are keenly interested in tracking poverty trends on a more frequent and timely basis.

In contrast with the HEIS, which was last conducted in 2010, DOS administers the Employment-Unemployment Survey on a quarterly basis. This is a labor force survey (LFS) with wide geographical coverage. The LFS does not collect consumption data and has a different design from the HEIS. We exploit it to address the problem of missing poverty data in Jordan for the years in which the HEIS is absent. Consumption data are available for two years, 2008 and 2010. For these years we validate our imputation-based estimates of poverty against those obtained from the actual consumption data (the design-based estimates), before imputing estimates for the other years, in which consumption data are not available.⁵ We offer two types of validation. The first compares imputation-based estimates for 2010 against the true rate in that year, using only the HEIS. The second compares imputation-based estimates for 2008 and 2010 against the true rates, combining the HEIS and the LFS. Validation results show that our imputed poverty estimates are close to the true rates based on the actual consumption data, with the former falling within the 95 percent confidence intervals of the latter. Indeed, in quite a few cases, our estimates are within one standard error of the true rates. We combine the true rates for the two years with consumption data and the imputation-based estimates for the remaining years. The results point to a steadily decreasing trend in poverty in Jordan during the period 2008-2013.

⁵ The more general and widely used statistical term “model-based” can encompass the term “imputation-based”. We prefer the latter because it emphasizes the more specific imputation nature of our estimates. We also use the terms “imputation” and “prediction” interchangeably in this paper.

This paper consists of five sections. The next section reviews recent studies in economics and statistics. Section III presents the theoretical framework, estimation procedures, and empirical application for imputation using surveys of the same design. Section IV extends this framework to imputation for surveys of different designs and then provides empirical illustrations. Section V concludes.

II. Review of Missing Data Imputation Methods in Recent Studies

The idea of imputing missing household consumption has existed in various forms in the economic literature, but interest surged in the 2000s. The exceptions are the survey-to-survey imputations for India by Deaton and Drèze (2002) and Tarozzi (2007). Apart from these, earlier work on poverty based on imputation largely focuses on survey-to-census imputation. It includes a study on Ecuador by Hentschel et al. (2000), which was followed by a formalization of the approach in Elbers, Lanjouw, and Lanjouw (ELL) (2003).⁶ A consumption survey collects consumption data, but its limited sample size means that the survey is representative only at highly aggregated administrative levels. The population census has exactly the opposite strength and weakness: it is representative at far more disaggregated administrative levels but offers no consumption data. ELL estimate a consumption model from a household expenditure survey and apply its parameters to the variables that overlap with the census, which lets them predict consumption for census households. These data can then be disaggregated to estimate poverty at lower administrative levels than is possible using the household survey alone. This method is sometimes called the “poverty-mapping” approach because poverty estimates are often presented extensively in cartographic format. Kijima and Lanjouw (2003) then apply this method to provide survey-to-survey imputation-based poverty estimates for India. Building on this approach, Stifel and Christiaensen (2007) combine household expenditure survey data with more recent rounds of the Demographic and Health Survey (DHS) in Kenya to impute household consumption into the latter. A more recent paper by Christiaensen et al. (2012) covers several countries. For each, it predicts consumption in the second round of a consumption survey using the estimated model parameters from the first round of the same survey. These predicted second-round consumption data are more consistent with the first-round data. The study thus indicates that imputation methods can obviate the need to update expenditure data over time with problematic deflators. Using seven rounds of household survey data from Uganda, Mathiassen (2013) also finds that imputation-based poverty estimates accurately track the true poverty rates in most cases. In the same spirit, another approach combines a household expenditure survey with a more recent labor force survey, imputing consumption into the latter and then estimating poverty. Mathiassen (2009) implements this approach for Mozambique. Douidich, Ezzrari, van der Weide, and Verme (2013) similarly exploit an almost identical design between the household expenditure survey and the LFSs in Morocco. They impute poverty rates into the latter and find very encouraging results. Among all these studies, however, only the three most recent ones validate their estimates against the true poverty rates before extending their analysis to years without consumption data: Christiaensen et al. (2012), Mathiassen (2013), and Douidich et al. (2013).

⁶ An earlier study by Ravallion (1996) proposes using time series data on aggregated agricultural wages and outputs to forecast poverty rates in India. Another method of tracking poverty over time constructs an index of household wealth based on household assets (Sahn and Stifel, 2000). This method’s greatest strength is perhaps that it is straightforward to implement in most contexts where information on household assets is available. However, the non-monetary nature of asset indices makes the resulting poverty estimates more difficult to interpret. Another branch of the statistics and economics literatures instead constructs weights to adjust estimates in the presence of missing data; for studies that follow this approach, see, e.g., Tarozzi (2007) and Bethlehem, Cobben, and Schouten (2011).
It is worth noting that all these validation studies restrict their analysis to surveys of the same design, yet none of them explicitly discusses this assumption on which their results rely.⁷

⁷ A recent study by Newhouse et al. (2014), which uses the ELL approach for poverty imputation in Sri Lanka, is an exception. It finds that differences in sampling design can undermine the accuracy of survey-to-survey predictions. Another study by Dabalen et al. (2014) imputes poverty estimates from one household survey round to another for Liberia. It does not provide validation, however, because consumption data are missing in the latter round.

Missing data imputation, however, is not a concern of economists alone. The few existing studies in economics appear to have been developed independently of a much more established literature on missing data imputation in statistics. Since Rubin’s seminal work in the late 1970s (Rubin, 1977, 1978), imputation methods have steadily become one of the main tools of the professional statistician. Government agencies such as the U.S. Census Bureau regularly use imputation to fill in important missing data in various income statistics (Census Bureau, 2014a) and labor statistics (Census Bureau, 2014b).

The imputation methods used in statistics share common features with those used in economics. However, because the two disciplines have different focuses, the methods also differ in important ways. Table 1 summarizes the similarities and differences between the imputation methods employed in several recently published studies. For economics, these are ELL (2003), Stifel and Christiaensen (2007), Christiaensen et al. (2012), and Mathiassen (2013). For statistics, they are Rubin (1987), Little and Rubin (2002), Schafer and Graham (2002), van Buuren (2012), and Carpenter and Kenward (2013).
These studies do not represent all the existing studies in their respective literatures, but they are indicative of the “typical” approach used within each field.⁸ Table 1 broadly classifies the features common to, and differing between, economic and statistical studies along several dimensions. These are the target population, the type and proportion of missing data, the mechanism underlying missing data, and timing and modeling issues. Several findings emerge from Table 1. There is much commonality between the imputation methods used in economics and statistics, although statistical imputation methods are more general than economic ones. For example, economic studies mostly focus on a single missing variable, usually household consumption. Statistical studies, by contrast, consider missing variables that can be either outcome or explanatory variables (rows 1.1 and 1.2, Table 1). Economic studies mostly investigate the missing data mechanism that statisticians call missing at random (MAR) (row 2), and they employ parametric and semi-parametric estimation techniques (row 3.3). Statistical studies, however, consider other missing data mechanisms and estimation techniques as well.

⁸ Also see, e.g., Davey, Shanahan, and Schafer (2001) and Jenkins et al. (2011) for studies that apply statistical missing data imputation techniques to economic issues.

The differences between economic and statistical studies stem largely from their different disciplinary focuses. The cited economic studies are mostly interested in predicting consumption in a new survey (census) round. The statistical studies pay more attention to filling in missing data in an existing data set. Consequently, economists usually impute missing consumption data (row 5) from one survey into another (row 4.1), where the second survey is implemented either at the same time or more recently (row 6).
In contrast, statisticians often impute missing data within the same survey, where usually less than half of the data are missing. Another difference is that economists appear to use economic theory alongside statistical theory for model selection, although existing studies discuss this process little in formal terms (row 3.4). Finally, all the reviewed economic and statistical studies rely on a key assumption. The (distributions of the) parameters estimated from the first survey (for economics) or from the observed complete data (for statistics) must be identical for the missing data (row 3.1). This assumption is practically a prerequisite for any existing work on data imputation. Another implicit and seldom-discussed assumption is that the two surveys (or the complete-data and missing-data sources) have comparable designs. However, hardly any economic studies explicitly discuss the assumption of comparable survey design, and none tests the assumption of identical parameters. This latter assumption in fact constitutes the major divergence between intratemporal survey-to-census imputation and intertemporal survey-to-survey imputation. Within our imputation framework, we discuss these assumptions in more detail, what should be done when they are relaxed, and other modeling issues.

III. Imputation Using Surveys of the Same Design

III.1. Estimation Framework

Let $x_j$ be a vector of characteristics observed in both surveys. The index $j$ indicates the survey, which can be either the same household expenditure survey or another survey.⁹ Subject to data availability, these characteristics can include household variables such as the household head’s age, sex, education, ethnicity, religion, language, and occupation. They can also include household assets or incomes, and other community or regional variables.
Occupation-related characteristics can include whether household heads work, the share of household members who work, and the type of work that household members do. They can also include context-specific variables such as the share of female household members who participate in the labor force. Regional characteristics related to macroeconomic trends, such as (un)employment rates or commodity prices, can also be included if such data are available. As discussed below, these variables play a critical role in capturing the changes in estimated poverty rates.

Household consumption (or income) data exist in one survey but are missing in the other. Without loss of generality, let survey 1 be the survey with household consumption data and survey 2 the survey without, and let $y_1$ represent household consumption in survey 1. More generally, the two surveys can be conducted either in the same period or in different periods. In this section we focus on the latter case. The next section discusses the more complicated cases of combining surveys of different designs, both in the same period and in different periods.¹⁰

⁹ More generally, $j$ can indicate any type of survey that collects household data sufficiently relevant for imputation purposes, such as labor force surveys, demographic and health surveys, or youth surveys. To make notation less cluttered, we suppress the household subscript in the following equations.

¹⁰ In theory, the change in poverty estimates from imputing between two surveys in the same time period is trivial: it is zero by construction. In practice, however, this imputation exercise is useful for validation when the two surveys have different designs. We return to this point later.
To further operationalize our estimation, we assume that, for survey 1, the linear projection of household consumption on household and other characteristics ($x$) is given by a cluster random-effects model:

$$y_1 = \beta_1' x_1 + \mu_1 + \varepsilon_1 \tag{1}$$

Were the household consumption data $y_2$ available in survey 2, we assume the same linear projection of household consumption on household characteristics:¹¹

$$y_2 = \beta_2' x_2 + \mu_2 + \varepsilon_2 \tag{2}$$

Conditional on household characteristics, the cluster random effects and the error terms are assumed to be uncorrelated with each other and normally distributed: $\mu_j \mid x_j \sim N(0, \sigma^2_{\mu_j})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$. Equation (1) is thus a linear random-effects model that most available statistical packages can estimate straightforwardly.

We are most interested in the poverty estimates for survey 2, where the consumption data are missing. Let $z_2$ be the poverty line in period 2. If $y_2$ existed, the poverty rate $P_2$ in this period could be estimated by

$$P(y_2 \le z_2) \tag{3}$$

where $P(\cdot)$ is the probability (or poverty) function that gives the percentage of the population below the poverty line $z_2$ in survey 2. $P(\cdot)$ is thus non-increasing in household consumption. We make the following further assumptions, which underlie the theoretical framework.

Assumption 1: Let $x_{jt}$ denote the values of the variables observed in survey $j$ at time $t$, for $j = 1, 2$ and $t = 1, \ldots, T$, and let $X_t$ denote the corresponding measurements in the population. Then $x_{jt} = X_t$ for all $j$ and $t$.

¹¹ This assumption implies that equation (2) captures the returns to the characteristics $x_j$. It precludes the (perhaps exceptionally) rare situations in which unexpected upheavals in the economy or calamitous disasters leave no correlation between these characteristics and household consumption. Sudden changes to the economic structure (e.g., overnight regime change) may also introduce noise into the comparability of the parameters in equation (2).

Assumption 1 is crucial for imputation. It ensures that the sampled data in survey 1 and survey 2 are representative of the population in each respective time period. For two contemporaneous surveys (i.e., surveys implemented in the same time period), it implies that the estimates based on the characteristics $x$ are identical, since both equal the population values. For two non-contemporaneous surveys, it implies that estimates based on the same characteristics $x$ are consistent and comparable over time. Surveys of the same design (and sample frame) are more likely to be comparable and thus to satisfy Assumption 1. Still, there is no a priori guarantee that such surveys will provide comparable estimates across two different time periods, or even the same estimates in the same time period. Assumption 1 may be violated, for example, when national statistical agencies change the questionnaire for the same survey over time, as with the NSS for India discussed earlier. It may also fail when one considers different surveys that focus on different population groups. For instance, average household size may differ between a household survey and a labor force survey, depending on the definition used. Violation of Assumption 1 rules out the straightforward application of survey-to-survey imputation. It would require additional assumptions about the relevance of the parameters estimated from one survey to the other. To make notation less cluttered, we suppress the time subscript $t$ in subsequent expressions.
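Assumption 1 can be probed informally when two surveys cover the same period. In that case, the sample means of a commonly observed variable should agree up to sampling error. The sketch below is a hypothetical diagnostic, not the authors' test, and the function names are illustrative. It also treats each survey as a simple random sample, so sampling weights, clustering, and stratification are ignored.

```python
import math
import statistics


def mean_diff_z(x_survey1, x_survey2):
    """Difference in sample means (survey 1 minus survey 2) and its z-statistic.

    Hypothetical diagnostic for Assumption 1: both inputs hold values of the
    same variable (e.g., household size) from two contemporaneous surveys.
    Each survey is treated as a simple random sample (weights and clustering
    are ignored), which is a simplifying assumption.
    """
    m1 = statistics.fmean(x_survey1)
    m2 = statistics.fmean(x_survey2)
    # Squared standard error of each mean: sample variance over sample size.
    var1 = statistics.variance(x_survey1) / len(x_survey1)
    var2 = statistics.variance(x_survey2) / len(x_survey2)
    diff = m1 - m2
    se = math.sqrt(var1 + var2)
    if se == 0.0:
        z = 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    else:
        z = diff / se
    return diff, z


def consistent_with_assumption1(x_survey1, x_survey2, critical=1.96):
    """True when the mean difference is not significant at roughly the 5% level."""
    _, z = mean_diff_z(x_survey1, x_survey2)
    return abs(z) < critical


# Usage with made-up household sizes from two hypothetical surveys.
hh_size_survey1 = [4, 5, 6, 5, 7, 3, 5, 6]
hh_size_survey2 = [5, 5, 6, 4, 6, 4, 5, 7]
print(mean_diff_z(hh_size_survey1, hh_size_survey2))
print(consistent_with_assumption1(hh_size_survey1, hh_size_survey2))
```

A non-significant difference is consistent with Assumption 1 but does not prove it. In practice, one would repeat the check for each variable in $x$ and use design-based standard errors.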
Assumption 2: Let $\Delta P$ and $\Delta x$ represent, respectively, the changes over time in poverty rates and in the explanatory variables $x$. Let $\Theta_j$ be the set of parameters $(\beta_j, \mu_j, \varepsilon_j)$ that map the variables $x$ into the household consumption space in period $j$, where the consumption data are available. Then $\Delta P = P(\Delta x \mid \Theta_j)$, where $P(\cdot)$ is the given poverty function.

Assumption 2 implies that, given the consumption parameters estimated from survey 1, the changes in the distributions of the explanatory variables $x$ between the two periods can capture the change in the poverty rate in the next period. Given the variables observed in both surveys, this assumption allows the missing household consumption for survey 2 to be imputed. In practical terms, it implies that the change in poverty rates over time is attributable to changes in the explanatory variables $x$. It is not attributable to changes in the returns to characteristics (or economic structure), represented by $\beta_1$, or in the unexplained characteristics (or random shocks), represented by $\varepsilon_1$. In other words, households with the same observed characteristics $x$ would face the same level of poverty regardless of when the data were collected. While this assumption may seem counterintuitive, it may be especially relevant to economies where the returns to characteristics do not change, or change little, over time. Clearly, the assumption is testable if household consumption is available for both periods under consideration.

As discussed earlier, previous studies commonly assume that the distributions of the household consumption parameters $\beta_1$, $\mu_1$, and $\varepsilon_1$ in equations (1) and (2), estimated from the data in survey (or period) 1, remain the same for the data in survey (period) 2. Assumption 2 is less restrictive. It allows the estimated parameters to change over time, as long as the changes in the distribution of the variables $x$ alone can correctly capture the change in the poverty rate.
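To make Assumption 2 concrete, the sketch below holds a set of period-1 parameters fixed and lets only the distribution of $x$ move. It assumes a hypothetical one-regressor version of equation (1), $y = \alpha + \beta x + \mu + \varepsilon$, with normal $\mu$ and $\varepsilon$. The model-implied probability that a household is poor is then $\Phi((z - \alpha - \beta x)/\sigma)$, where $\sigma^2 = \sigma_\mu^2 + \sigma_\varepsilon^2$. The intercept $\alpha$, the parameter values, and the function names are illustrative assumptions. Consumption $y$ and the line $z$ are taken to be on the same (e.g., log) scale.

```python
from statistics import NormalDist, fmean


def expected_headcount(xs, alpha, beta, sigma_mu, sigma_eps, z):
    """Model-implied share of households with consumption y <= z.

    Hypothetical one-regressor consumption model y = alpha + beta*x + mu + eps,
    with mu and eps independent normals, so y | x ~ N(alpha + beta*x, sigma^2)
    where sigma^2 = sigma_mu^2 + sigma_eps^2.
    """
    sigma = (sigma_mu ** 2 + sigma_eps ** 2) ** 0.5
    phi = NormalDist()  # standard normal distribution
    return fmean([phi.cdf((z - (alpha + beta * x)) / sigma) for x in xs])


def poverty_change_from_x(xs_period1, xs_period2, theta1, z):
    """Change in poverty driven only by the shift in x, holding theta1 fixed.

    theta1 = (alpha, beta, sigma_mu, sigma_eps), estimated from period-1 data.
    """
    p1 = expected_headcount(xs_period1, *theta1, z)
    p2 = expected_headcount(xs_period2, *theta1, z)
    return p2 - p1


# Usage: household heads' years of schooling rise between the two periods.
theta1 = (1.0, 0.1, 0.2, 0.4)  # illustrative values, not estimates from the paper
schooling_t1 = [2, 4, 6, 8, 10]
schooling_t2 = [4, 6, 8, 10, 12]
print(poverty_change_from_x(schooling_t1, schooling_t2, theta1, z=1.5))  # prints a negative number
```

Because $\beta > 0$ and every household's $x$ rises, each household's probability of being poor falls, so $\Delta P$ is negative even though $\Theta_1$ never changes. This is exactly the channel Assumption 2 relies on.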
Technically speaking, Assumption 2 only requires that the parts of the consumption distributions below the poverty line in the two periods (to the extent that they can be explained by the changes in x in our model) be equal overall. Existing studies implicitly assume more, namely that all percentiles along the consumption distributions are equal. This result is formally stated in Corollary 1.2 below.¹²

¹² Assumption 2 is also more general in the sense that it allows the estimated parameters to change, even in different directions, as long as the changes in the x variables capture the net change in poverty given the parameters estimated in period 1. A second difference between Assumption 2 and the stricter assumption of constant parameters relates to model checking. Backward imputation need not yield the same results as forward imputation; backward imputation uses the coefficients estimated from the later survey round to impute backwards into the data of the earlier round. The difference in prediction accuracy between the two would depend on the changes in these estimated coefficients, in addition to the changes in the x characteristics over time. The former type of change is set to zero under the stricter assumption but is allowed under our more general assumption.

Given these two assumptions, we propose the following proposition, which lays out the estimation framework.¹³

Proposition 1: Imputation framework
Given Assumptions 1 and 2, the poverty rate based on data in survey 2 can be predicted using data in survey 1. In particular, let P(.) be the poverty function and define $y_2^1 = \beta_1' x_2 + \mu_1 + \varepsilon_1$. Then

$$P(y_2) = P(y_2^1) \tag{4}$$

Corollary 1.1
Let $\hat\beta_1$, $\hat\mu_1$, and $\hat\varepsilon_1$ represent the parameters estimated from equation (1), and let $\hat y_{2,s}^1 = \hat\beta_1' x_2 + \tilde\mu_{1,s} + \tilde\varepsilon_{1,s}$. Here $\tilde\mu_{1,s}$ and $\tilde\varepsilon_{1,s}$ represent the s-th random draw from their estimated distributions. The poverty rate $P_2$ in period 2 and its variance can be estimated as

i) $$\hat P_2 = \frac{1}{S}\sum_{s=1}^{S} P\left(\hat y_{2,s}^1 \le z_1\right) \tag{5}$$

ii) $$\hat V(\hat P_2) = \frac{1}{S}\sum_{s=1}^{S} V\left(\hat P_{2,s} \mid x_2\right) + V\left(\frac{1}{S}\sum_{s=1}^{S} \hat P_{2,s} \,\middle|\, x_2\right) \tag{6}$$

Corollary 1.2
Instead of Assumption 2, assume the traditional but more restrictive assumption that the consumption model parameters in equation (1) remain the same in period 2 (that is, $\beta_1 \equiv \beta_2$, $\mu_1 \equiv \mu_2$, and $\varepsilon_1 \equiv \varepsilon_2$). Given Assumption 1 and this stricter assumption, we have

$$W(y_2) = W(y_2^1) \tag{7}$$

where W(.) is a general one-to-one mapping welfare function, which includes the poverty function P(.) as a special case.

Proof. Appendix 1.

¹³ In some situations Assumption 1 fails, for example when one survey is representative of the whole population while the other specifically targets a population segment such as the elderly. Survey imputation may still be feasible in such cases, provided that Assumption 2 holds. Assumption 2 then essentially implies that the parameters estimated for equation (1) apply to the population group targeted by the other survey, after appropriate adjustments (say, including dummy variables for different population groups).

Some remarks about Proposition 1 and its corollaries may be useful. First, simulating the error terms for households in survey 2 is mandatory rather than a matter of choice. We are working with two cross sections, which by definition precludes linking households in survey 1 to those in survey 2. Second, equation (5) uses the poverty line of period 1 rather than that of period 2. This keeps the line consistent with the estimated parameters, which are also obtained from the period-1 data. More generally, the poverty line used should come from the same time period as the estimated parameters.
This consistency between the poverty line and the estimated parameters holds by construction. It can in fact yield more comparable poverty estimates where there is reason to believe that the poverty line (and/or the consumption aggregates) is not consistently updated across the two periods. Third, the variance of the estimated poverty rate in (6) consists of two components. The first term on the right-hand side of (6) is the variance of the estimated poverty rate conditional on household characteristics, averaged over the S simulations. The second term is the variance of the average predicted poverty rate. This is related to Rubin's (1987) variance formula, except that we exclude the component of his formula that is due to simulation error.¹⁴ The reason is simple: if the number of simulations is large enough, this component is negligible. We thus recommend using a large number of simulations (e.g., at least 1,000) in the estimation procedures proposed in the next section.¹⁵ Furthermore, the first term in (6) corresponds to the variance resulting from the survey design (sampling error). The second term corresponds to the variance resulting from the fit of the regression model (modelling error). Suppose the regression model fits well and the complex survey design of most surveys (with cluster sampling and stratification) is taken into account. The first term in (6) would then likely dominate the variance.¹⁶

¹⁴ Rubin's variance formula is in turn based on the standard variance decomposition, which gives the unconditional variance as the sum of the mean of the conditional variance and the variance of the conditional mean.

¹⁵ Given ever-increasing computer speed, this number of simulations should not be a cause for concern. For example, with samples of around 11,000 households in each of two survey rounds, one model run takes around one minute. That run produces the poverty estimate and its variance from 1,000 simulations on a Dell Inspiron laptop with a third-generation Intel i7 chip. A Stata program for our procedures is available upon request.

Finally, the assumption of constant parameters employed by most, if not all, existing studies is overly restrictive and much more demanding than our Assumption 2. As Corollary 1.2 implies, this assumption leads to very general results. For instance, any imputed quantity, including mean consumption or any percentile of the consumption distribution, would approximate the one based on the true data. These results are sweepingly broad and thus unlikely to hold in most contexts. We return to the validity of this assumption in the next section on empirical results.

In practice, few variables may be observed in both survey rounds. These common variables may then be unable to capture the intertemporal change in poverty well. Put differently, Assumption 2 may fail because of a limited set of overlapping variables, which would in turn invalidate our imputation framework. However, we can still obtain bound estimates of poverty in such cases, as stated in Proposition 2 below. This requires knowing the trend in the unobserved variables across the survey rounds and the direction of their correlation with household consumption (or being able to infer both from previous survey rounds).
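The following is a minimal Python sketch of equations (5) and (6) from Corollary 1.1, illustrating the variance remarks above. It is not the authors' Stata program, and its details are assumptions:

- The function name `impute_poverty` and its arguments are hypothetical.
- Consumption and the poverty line are assumed to be on the same (e.g., log) scale.
- The first term of (6) uses the simple-random-sampling variance p(1 - p)/n. A real application would use a design-based variance that accounts for clustering, stratification, and weights.
- We read the second term of (6) as the between-simulation variance of the S headcounts. This follows the spirit of Rubin's (1987) formula without the simulation-error component.

```python
import random
import statistics


def impute_poverty(x2, beta1, sigma_mu, sigma_eps, clusters, z1, S=1000, seed=0):
    """Hypothetical sketch of equations (5)-(6).

    x2        : covariate rows for survey-2 households (include 1.0 for the intercept)
    beta1     : coefficients estimated on survey 1
    sigma_mu  : estimated std. dev. of the cluster effect mu
    sigma_eps : estimated std. dev. of the household error epsilon
    clusters  : cluster (PSU) id of each survey-2 household
    z1        : period-1 poverty line, on the same scale as consumption
    Returns (estimated poverty rate, estimated variance).
    """
    rng = random.Random(seed)
    n = len(x2)
    # beta1' x2 does not change across simulations.
    xb = [sum(b * v for b, v in zip(beta1, row)) for row in x2]
    cluster_ids = sorted(set(clusters))
    rates, within = [], []
    for _ in range(S):
        # s-th draw: one normal mu per cluster, one normal epsilon per household.
        mu = {c: rng.gauss(0.0, sigma_mu) for c in cluster_ids}
        poor = sum(
            xb_i + mu[c] + rng.gauss(0.0, sigma_eps) <= z1
            for xb_i, c in zip(xb, clusters)
        )
        p_s = poor / n
        rates.append(p_s)
        # V(P_2,s | x2) under simple random sampling (a simplification of the design-based variance).
        within.append(p_s * (1.0 - p_s) / n)
    p_hat = statistics.fmean(rates)                          # equation (5)
    between = statistics.variance(rates) if S > 1 else 0.0   # second term of (6), read as between-draw variance
    return p_hat, statistics.fmean(within) + between         # equation (6)


# Usage with made-up inputs: 400 households in clusters of 8.
data_rng = random.Random(7)
x2_demo = [[1.0, data_rng.gauss(0.0, 1.0)] for _ in range(400)]
clusters_demo = [i // 8 for i in range(400)]
p2, v2 = impute_poverty(x2_demo, [1.0, 0.5], 0.3, 0.4, clusters_demo, z1=0.8, S=200, seed=3)
print(f"poverty rate {p2:.3f}, standard error {v2 ** 0.5:.4f}")
```

In an application, `beta1`, `sigma_mu`, and `sigma_eps` would come from estimating equation (1) on survey 1. S would be set to 1,000 or more, as recommended above.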
Proposition 2: Bound estimates
Given Assumptions 1 and 2, suppose the set of observed overlapping variables between the survey rounds does not fully capture the change in poverty over time. If the general trend of the changes in the unobserved variables and the direction of their correlation with household consumption are known, we can still obtain bound estimates of the poverty rate in period 2. In particular, assume without loss of generality that these unobserved variables are positively correlated with household consumption. If their trend is positive, we obtain an upper-bound estimate of poverty. If their trend is negative, we obtain a lower-bound estimate.

¹⁶ One implication is that the standard error of the imputation-based poverty estimate can be smaller than that of the true rate. This can happen if the sample size in survey 2 is much larger than in survey 1 and the model fits well. See, e.g., Matloff (1981) for further discussion.

Proposition 2 appears to require much additional information. It is nonetheless relevant in cases where, for instance, no data on household assets are available. Assets are positively correlated with household consumption (see, e.g., Filmer and Pritchett (2001)). Additional knowledge about the trend in asset ownership over time (say, from macroeconomic data or qualitative surveys) can therefore help determine the direction of the bias of the estimates.

III.2. Validation in the Jordanian Context

We turn in this section to poverty imputation using the 2008 and 2010 rounds of the HEIS. Since we have actual consumption data for 2010, we can validate our imputation method. We impute from 2008 into 2010, pretending that consumption data did not exist in the latter year. We then compare these imputation-based poverty estimates with the design-based (true) estimates based on the actual consumption data.
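The validation logic just described, and the bound in Proposition 2, can be previewed with simulated data. In the hypothetical Python sketch below, every number is invented for illustration, including the coefficients, the asset index, and the poverty line. Log consumption depends on an observed variable x and on an asset index. The asset index is positively correlated with consumption and rises between the two periods. Imputing from period 1 into period 2 with a model that omits the asset index should therefore overstate period-2 poverty, giving an upper bound.

To keep the sketch short, the imputed headcount is the household average of $\Phi((z - \hat\alpha - \hat\beta x)/\hat\sigma)$. Under normal errors, this is the value equation (5) converges to as S grows.

```python
import random
import statistics
from statistics import NormalDist


def simulate_period(n, asset_mean, rng):
    """Hypothetical log consumption: y = 1 + 0.5*x + 0.5*asset + e."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    assets = [rng.gauss(asset_mean, 1.0) for _ in range(n)]
    ys = [1.0 + 0.5 * x + 0.5 * a + rng.gauss(0.0, 0.5) for x, a in zip(xs, assets)]
    return xs, ys


def fit_ols_one_regressor(xs, ys):
    """Intercept, slope, and residual std. dev. of an OLS fit of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    return intercept, slope, statistics.stdev(resid)


def headcount(ys, z):
    """Share of households at or below the poverty line z."""
    return sum(y <= z for y in ys) / len(ys)


rng = random.Random(42)
z = 0.5                                        # hypothetical log poverty line
x1, y1 = simulate_period(20_000, 0.0, rng)     # period 1
x2, y2 = simulate_period(20_000, 1.0, rng)     # period 2: asset ownership rises

# Impute with a model that omits assets (assumed unobserved in the period-2 survey).
alpha_hat, beta_hat, sigma_hat = fit_ols_one_regressor(x1, y1)
phi = NormalDist().cdf
imputed = statistics.fmean(phi((z - alpha_hat - beta_hat * x) / sigma_hat) for x in x2)
true_rate = headcount(y2, z)
print(f"imputed {imputed:.3f} vs true {true_rate:.3f}")
```

Because x has the same distribution in both periods, the imputed rate stays close to period-1 poverty. The true rate falls with rising assets, so the imputation acts as an upper bound.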
We provide an overview of the country background and a description of the data before discussing the estimation results.

III.2.1. Country Background: Poverty in Jordan

The official poverty line in Jordan is constructed using a "cost of basic needs" approach with a common food and non-food basket for all households. Food consumption is anchored to a national caloric requirement of 2,347 calories per capita per day. Since the consumption habits of rich and poor households may differ greatly, the poverty line was based on the revealed consumption patterns of the bottom 30 percent of the population (regarded as poor or near-poor). These patterns are drawn from the 2010 HEIS (World Bank, 2012b). The national annual poverty line for 2010 is thus set at 813.7 JD per individual,¹⁷ yielding an official poverty rate of 14.4 percent for that year. This poverty line, fixed for 2010, is then adjusted for changes in the cost of living using official CPI deflators. This yields a comparable poverty line for 2008, with an associated poverty rate of 19.5 percent.

¹⁷ This line is equivalent to 3.42 US dollars per day in 2005 PPP terms.

Macroeconomic trends shown in Figure 1 appear to corroborate the poverty decline shown by the household consumption data: the downward-sloping poverty trend is consistent with growth in real GDP per capita. The period between 2002 and 2007 saw rapid growth, which slowed in the subsequent period between 2008 and 2010. In this latter period, real GDP per capita grew by 3 percent and poverty was estimated to fall by about 5 percentage points. Poverty could be tracked between 2008 and 2010 with the consumption data from the HEIS. After 2010, however, no consumption data exist that can be used to monitor poverty trends. Projections show per capita GDP growth to be weak, but this alone does not say much about poverty trends.
The recent subsidy reforms and the associated cash transfers could well affect poverty. So could various economic stresses, including a continued weak labor market, increased energy prices, and a large influx of war refugees from Syria.¹⁸ Consumption data are collected infrequently, and the country's economic environment is uncertain. Together, these give policy makers an even stronger impetus to track poverty with alternative methods such as imputation-based estimates.

¹⁸ According to the UNHCR (http://data.unhcr.org/syrianrefugees/country.php?id=107), there were about 29,000 registered Syrian refugees in Jordan in July 2012. A year later the number had risen to about 115,000, and by August 2014 it had further increased to slightly more than 600,000, roughly a tenth of Jordan's population.

III.2.2. Data Description for the HEIS

We use the two most recent rounds of Jordan's Household Expenditure and Income Survey (HEIS), from 2008 and 2010. These are the nationally representative surveys used to produce official poverty statistics. The HEIS has been implemented nine times since 1966, and every other year between 2006 and 2010. In addition to household expenditures, it collects data on other household characteristics, including demographics, employment, assets, and incomes. The survey's sampling frame comes from the 2004 Population and Housing Census and is divided into 89 strata (or sub-districts). The survey is typically administered over a 12-month period. It follows a two-stage sampling design in which census enumeration areas serve as primary sampling units (PSUs). For the 2010 survey round, 1,736 PSUs were selected in the first stage out of a total of 13,027 PSUs for the whole country. Selection used systematic sampling with probability proportional to size (PPS). Within each selected PSU (or cluster), 8 households were randomly selected in the second stage.
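The first-stage selection described above can be illustrated with a textbook version of systematic PPS sampling. This Python sketch is an illustration under assumptions, not the Department of Statistics' procedure:

- The function names are hypothetical.
- The measure of size is taken to be each enumeration area's census household count.
- Stratification is ignored. In the HEIS the selection would be carried out within each of the 89 strata.

The second stage draws a fixed number of households per selected PSU by simple random sampling.

```python
import random


def systematic_pps(sizes, n, start=None, rng=None):
    """First stage: systematic PPS selection of n PSUs.

    sizes : measure of size of each PSU (e.g., census household count), in frame order
    start : random start in [0, interval); drawn from rng when not supplied
    Returns indices of selected PSUs; a PSU larger than the interval can be hit twice.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = (rng or random.Random()).random() * interval
    points = [start + k * interval for k in range(n)]
    selected, cumulative, k = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # Every selection point in [cumulative - size, cumulative) picks this PSU.
        while k < n and points[k] < cumulative:
            selected.append(idx)
            k += 1
    return selected


def two_stage_sample(frame, n_psu, m, rng):
    """frame: list of (psu_id, household_ids). Second stage: m households by SRS per PSU.

    A PSU selected twice in the first stage appears only once in the returned dict.
    """
    chosen = systematic_pps([len(hh) for _, hh in frame], n_psu, rng=rng)
    return {frame[i][0]: rng.sample(frame[i][1], min(m, len(frame[i][1]))) for i in chosen}
```

The selection points are spaced one sampling interval apart along the cumulated sizes, so a PSU's chance of selection is proportional to its size.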
The 2008 and 2010 rounds of the HEIS collected consumption data for 10,961 and 11,223 households, respectively. The questionnaire design of these two survey rounds remains essentially the same.

III.2.3. Estimation Results

We first check Assumptions 1 and 2 before discussing the estimation results. The 2008 and 2010 rounds of the HEIS share the same sampling frame, based on the 2004 Population and Housing Census, and their questionnaire designs are almost identical. Assumption 1 is therefore satisfied. Assumption 2 is usually assumed, and it can only be checked if consumption data for both survey rounds are available. Since we are validating estimates against the actual consumption data, we can check this assumption using the data from both rounds.

We propose an explicit test for this assumption. Specifically, we use a decomposition similar in spirit to the Oaxaca-Blinder framework (Oaxaca, 1973; Blinder, 1973). It breaks the change in poverty between the survey rounds down into two components. The first is due to the changes in the estimated coefficients (the first term in square brackets in equation (8) below). The second is due to the changes in the x characteristics (the second term in square brackets). Assumption 2 would be satisfied if the poverty change is mostly explained by the latter component. This can be expressed as

$$\begin{aligned} P(y_2) - P(y_1) &= \left[P(y_2) - P(y_2^1)\right] + \left[P(y_2^1) - P(y_1)\right] \\ &= \left[P(\beta_2' x_2 + \eta_2) - P(\beta_1' x_2 + \eta_1)\right] + \left[P(\beta_1' x_2 + \eta_1) - P(\beta_1' x_1 + \eta_1)\right] \end{aligned} \tag{8}$$

where $\eta_j = \mu_j + \varepsilon_j$, j = 1, 2, for less cluttered notation.¹⁹

Decomposition results are provided in Table 2, where seven different models are used. These models are built on a cumulative basis, with later models sequentially adding more variables to earlier models.
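The decomposition in equation (8) can be sketched under the normal-error model of Proposition 1. The following Python version rests on several assumptions:

- The function names are ours.
- $\mu$ and $\varepsilon$ are collapsed into a single normal error $\eta$ with standard deviation `sigma`.
- All three headcounts are model-implied, so the two components add up exactly to the total.
- Each headcount is averaged over households rather than evaluated at the mean of x.

```python
from statistics import NormalDist, fmean

PHI = NormalDist().cdf


def model_headcount(xs, beta, sigma_eta, z):
    """Average over households of Pr(beta'x + eta <= z), with eta ~ N(0, sigma_eta^2)."""
    return fmean(PHI((z - sum(b * v for b, v in zip(beta, x))) / sigma_eta) for x in xs)


def decompose_poverty_change(x1, x2, beta1, beta2, sigma1, sigma2, z):
    """Two-part decomposition of P(y_2) - P(y_1) following equation (8)."""
    p1 = model_headcount(x1, beta1, sigma1, z)     # P(y_1)
    p2 = model_headcount(x2, beta2, sigma2, z)     # P(y_2)
    p21 = model_headcount(x2, beta1, sigma1, z)    # P(y_2^1): period-2 x, period-1 parameters
    total = p2 - p1
    coefficients = p2 - p21       # first bracket: changes in parameters
    characteristics = p21 - p1    # second bracket: changes in x
    share_x = characteristics / total if total != 0 else float("nan")
    return total, coefficients, characteristics, share_x
```

`share_x` is the fraction of the poverty change attributed to the x characteristics. Assumption 2 is supported when this share is large, that is, when most of the change is explained by x.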
We use several models because few common variables may exist between survey rounds in other settings, especially with surveys of different designs, as discussed in the next section. Models with different sets of control variables thus provide a useful illustration.

- Model 1 is the most parsimonious model. It consists of household size; the household head's age, age squared, gender, and highest completed years of schooling; a dummy variable indicating whether the head is Jordanian; and a dummy variable indicating urban residence.
- Model 2 adds household demographics to Model 1, namely the shares of household members aged 0-14, 15-24, and 25-59. The reference group is members aged 60 and older.
- Model 3 adds employment variables to Model 2. These are dummy variables indicating whether the head worked in the past week, whether at least one female member worked in the past week, whether a member works as an employer, and whether a member is self-employed. Most household surveys collect these employment variables. They provide a richer model than Model 1 while keeping the model relatively parsimonious for most applications.

¹⁹ We estimate equation (8) for all households and then take population averages, rather than estimating this quantity at the means of x. As with marginal effects in a probit model, evaluating at the means may capture only a small fraction of the population (Wooldridge, 2010). Such estimates may therefore not be representative of the data.

- Model 4 adds some asset variables to Model 3: the number of rooms in the house, the construction material of the outside wall of the building, and the source of drinking water.²⁰ It also includes whether the household owns a car, computer, television set, desk (landline) phone, cell phone, air conditioner, microwave, or water filter, and whether it has internet.
- Model 5 adds a more detailed list of asset variables: the physical characteristics of the house, the energy source for cooking, and whether the household has a satellite dish/cable, video player, radio, camera, fax machine, fridge, freezer, oven, gas-operated oven, dishwasher, washing machine, vacuum cleaner, solar boiler, or sewing machine.
- Model 6 adds the log of per capita income to the basic variables in Model 3, as an alternative to adding all these asset variables.
- Model 7 adds the log of per capita income to Model 5.

Full model specifications are provided in Appendix 2, Tables 2.1 and 2.2.

²⁰ These variables are categorical. We slightly recode them so that higher values (in parentheses) indicate more favorable outcomes:
i) outside wall of the building: clean stones (6), clean stones with fortified cement (5), fortified cement (4), cement building blocks (3), clay building blocks (2), asbestos, zinc, tin (1);
ii) source of drinking water: spring water (6), mineral water (5), water tank (4), tube well (3), rain water (2), general water network (1).
These orderings are the same as in the wording of the questionnaires. We also experimented with separate dummy variables instead of the (recoded) original categorical variables, but estimation results are more accurate with the latter. It is also generally inadvisable to include assets whose correlation with consumption changes dramatically between the two periods because of other factors such as technology. In certain developing countries, for example, cell phones could quickly become mass-produced, and their prices could fall until they are no longer a luxury good in the second period.

Unsurprisingly, the estimation results suggest that the share of the change in poverty explained by the x characteristics grows as the list of control variables becomes richer. This component increases from around 70 percent in Models 1 to 3 to more than 80 percent in Models 4 and 5. It then reaches more than 100 percent in Models 6 and 7.²¹ This indicates that Assumption 2 is satisfied with Models 6 and 7, is likely satisfied with Models 4 and 5, and is less likely to be satisfied with the remaining models.

²¹ The component of the change in consumption/poverty explained by the changes in the estimated coefficients switches from positive to negative. This further highlights the flexibility of Assumption 2 compared with the commonly made assumption of constant parameters. However, specifications in which the changes in the explanatory variables x explain much more than 100 percent of the changes in consumption may also indicate overfitting. We also use backward imputation from 2010 to 2008 as an indirect test of this stricter assumption. Estimated poverty rates then range from 17 percent under Model 1 to 22.3 percent under Model 7. Only the estimated rate under Model 6 (20.5 percent) falls within the 95 percent confidence interval of the true rate.

As an additional check, we also present decomposition results for the changes in poverty using a more restrictive probit model (see, e.g., Yun (2004)).²² These results (Panel B, Probit model) are qualitatively similar. They even suggest that Models 4 and 5, in addition to Models 6 and 7, satisfy Assumption 2. For comparison purposes, we also provide a Wald (Chow) test for the assumption of constant parameters traditionally made in existing studies.
The test procedure is straightforward and involves four steps: i) pool the data for both years; ii) generate a dummy variable for the second year and interaction terms between this dummy and all the control variables; iii) regress household consumption on the usual control variables plus the year dummy and all its interaction terms; and iv) test the joint significance of the coefficients on the year dummy and its interaction terms. The resulting test (Panel C, Wald test) overwhelmingly rejects the assumption of constant parameters for all seven models. This further suggests that our less restrictive Assumption 2 is more appropriate.

Table 3 provides the estimated poverty rates. Consistent with our test of Assumption 2, estimates from Models 4 to 7 are within the 95 percent confidence interval of the true poverty rate. Estimates from Models 1 to 3 are just outside this interval. Adding a richer list of variables thus significantly improves the accuracy of the estimates, moving the point estimates from outside to inside the 95 percent confidence interval of the true rate. There is, however, little practical difference among the estimates from the latter four models.

²² The probit model is more restrictive than our estimation framework for two reasons. It converts the continuous household consumption variable into a binary poverty-status dependent variable. It also imposes a standard normal distribution, where $\sigma^2_{\varepsilon_j}$ is assumed to equal 1.

These validation results provide rather encouraging support for applying prediction-based methods to obtain poverty estimates in the absence of consumption data.
Put differently, suppose consumption data for the 2010 survey round were not available. We could still provide reasonable estimates by combining the consumption data from the 2008 round with the household characteristics from the 2010 round. Another interesting result concerns Proposition 2. Household assets are known to be positively correlated with household consumption, as the regression results in Appendix 2, Tables 2.1 and 2.2 also indicate empirically. Asset ownership rates are also generally increasing over time (Appendix 2, Table 2.3). Proposition 2 therefore tells us that models omitting assets would provide upward-biased estimates of the true poverty rate. Indeed, the estimates from Models 1 to 3 are around 1.5 percentage points higher than the true rate of 14.4 percent. With additional knowledge of the trend in asset ownership over time, Proposition 2 thus offers a practical way to obtain a bound estimate of poverty when Assumption 2 is not satisfied.

III.2.4. Alternative Imputation Methods

Table 4 provides other modelling options for our imputation framework. The imputation framework behind the estimates in Table 3 assumes that the error terms $\mu_j$ and $\varepsilon_j$, j = 1, 2, are normally distributed. Is this a valid assumption? As a robustness check, we impose no functional form on these error terms and use their empirical distributions instead. This yields accurate estimates only for Models 5 to 7 (Table 4, row 1) and higher poverty rates for the remaining models. The normality assumption therefore appears reasonable and helps improve our estimates. As another check, we use the more restrictive probit model to estimate the poverty rate directly.²³ The results are accurate for Models 4 to 6 but not for Model 7. They similarly show higher poverty rates for the remaining models.
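The robustness check that replaces normal errors with their empirical distribution can be sketched as follows. This hypothetical Python illustration collapses $\mu$ and $\varepsilon$ into a single residual, and `xb` stands for the linear predictions $\hat\beta_1' x_2$ for survey-2 households. There are two options:

- The empirical option resamples survey-1 residuals with replacement (`random.choices`).
- The normal option draws from a normal distribution with the residuals' standard deviation.

```python
import random
import statistics


def simulated_headcount(xb, residuals, z, S=200, empirical=True, seed=1):
    """Average headcount over S draws of y = xb + e.

    xb        : linear predictions beta1'x2 for survey-2 households (hypothetical input)
    residuals : fitted survey-1 residuals (mu and epsilon collapsed into one error)
    empirical : True resamples residuals with replacement; False draws N(0, sd^2)
    """
    rng = random.Random(seed)
    sd = statistics.pstdev(residuals)
    rates = []
    for _ in range(S):
        if empirical:
            errors = rng.choices(residuals, k=len(xb))
        else:
            errors = [rng.gauss(0.0, sd) for _ in xb]
        rates.append(sum(m + e <= z for m, e in zip(xb, errors)) / len(xb))
    return statistics.fmean(rates)


# Skewed residuals (mean zero, long right tail) make the two options diverge.
skewed = [-1.0] * 9 + [9.0]
emp = simulated_headcount([0.0] * 1000, skewed, 0.0)
norm = simulated_headcount([0.0] * 1000, skewed, 0.0, empirical=False)
```

When residuals are close to normal, the two options should give similar headcounts. A large gap signals that the normality assumption matters for the estimate.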
As discussed earlier in the literature review, MI methods are commonly used in statistics. Row 3 provides estimates from the MI method equivalent to our imputation framework (the normal linear regression model). Row 4 provides estimates from a version that uses predictive mean matching. For each household in 2010, this method finds the household in 2008 whose predicted consumption is closest to the 2010 household's own predicted consumption. It then uses the 2008 household's actual consumption as the imputed value (see Little (1988)). Under the MI normal linear regression model, only Models 5 and 6 yield accurate results. Under MI predictive mean matching, all four of Models 4 to 7 perform well. Of all these alternative modelling options, MI predictive mean matching therefore performs best. Notably, for the misspecified Models 1 to 3, all these alternatives provide even more upward-biased estimates, between one and two percentage points higher than those from our imputation framework.

²³ The difference is that we use a random effects probit model to estimate equations (1) and (2) instead of the linear random effects model. The estimating equation is $P(y_j) = \Phi(\beta_j' x_j + \mu_j + \varepsilon_j)$, j = 1, 2, where $\Phi(\cdot)$ is the standard normal cumulative distribution function.

III.3. Estimation Procedures

We thus propose the following estimation procedures to predict the poverty rate in period 2, where consumption data are missing but the relevant characteristics x are available.

Step 1: Check that Assumption 1 is satisfied. This involves verifying that key features of the two surveys, such as the sampling frames and the questionnaires, are (essentially) the same. If data from earlier survey rounds are available, use them to check that the regression model used for imputation satisfies Assumption 2.
Step 2: Using the data in survey round 1, estimate equation (1) and obtain the distributions of the estimated parameters $\hat\beta_1$, $\hat\mu_1$, and $\hat\varepsilon_1$.

Step 3: Take a random draw from the estimated normal distributions of $\hat\mu_1$ and $\hat\varepsilon_1$ obtained in Step 2, and denote these draws by $\tilde\mu_1$ and $\tilde\varepsilon_1$. Then use these draws, $\hat\beta_1$, and the data in survey round 2 to estimate the consumption level of each household in round 2 as

$$\hat y_2^1 = \hat\beta_1' x_2 + \tilde\mu_1 + \tilde\varepsilon_1 \tag{9}$$

Step 4: Estimate the quantity in (5) and the first term on the right-hand side of (6), i.e., $V(\hat P_2 \mid x_2)$. Use the poverty line $z_1$ of survey round 1 and the $\hat y_2^1$ obtained in Steps 2 and 3 above.

Step 5: Repeat Steps 3 and 4 S times and save the data from all S simulations. Average the estimated quantity in (5) over the S simulations to obtain the estimate of the poverty rate in survey round 2. (We use S = 1,000 in our simulations below.)

Step 6: Average the estimated quantity $V(\hat P_2 \mid x_2)$ over the S simulations to obtain the estimate of the first term on the right-hand side of (6). Use the simulated dataset to estimate the second term on the right-hand side of (6). Add the two terms to obtain the estimate of the variance of the poverty rate in survey round 2.

Step 7 (recommended but optional): Provide additional robustness checks using the empirical distributions of the error terms or the other modelling options discussed above.

IV. Imputation Combining Surveys of Different Designs to Update Poverty Estimates

Assumption 1 requires that the two survey rounds under consideration be representative of the population and consequently produce comparable estimates. While this may not seem unreasonable, it is restrictive.
Much more common, for a variety of reasons, are contexts where one survey has consumption data while the other does not, and the two surveys do not produce the same statistics. In this section we relax Assumption 1 to apply poverty imputation in such contexts, and we analyze the LFS. Jordan's Department of Statistics is responsible for implementing both the HEIS and the LFS. However, different departments within this agency are in charge of each survey and conduct them independently according to their different mandates. The HEIS collects consumption data and is implemented biennially. The LFS does not collect consumption data, but it collects labor statistics more frequently, on a quarterly basis.

IV.1. Making the Different Survey Designs More Comparable

The violation of Assumption 1 implies that the estimated distributions of the common variables in the two surveys in the same period may differ. They may also fail to represent the same underlying population. Proposition 3 below therefore "standardizes" the distributions of the variables in survey 2 by those in survey 1.²⁴

Proposition 3: Standardizing common variables in surveys of different designs
Assume the following:
- survey 2 has the same design over time and is collected more frequently than survey 1;
- the periods for which data from survey 2 are available include the periods for which data from survey 1 are available;
- the overlapping variables in the two surveys are normally distributed, with $x_{1t} \sim N(\mu_{1t}, \sigma_{1t}^2)$ and $x_{2t} \sim N(\mu_{2t}, \sigma_{2t}^2)$, for t = 1, …, T.

We can then standardize the variables in survey 2 according to survey 1, for both the overlapping periods and other periods, as follows.
i) For an overlapping period t, the standardized variables $x_{2\to1,t}$ in survey 2 are given by

$$x_{2\to1,t} = \frac{\sigma_{1t}}{\sigma_{2t}}\left(x_{2t} - \mu_{2t}\right) + \mu_{1t} \tag{10}$$

ii) Consider a period t' in which only data from survey 2 are available. Assume further that:
- the standardized changes between time t and time t' are the same for the variables x in survey 1 and survey 2, that is, $\frac{\sigma_{1t}}{\sigma_{2t}}(\mu_{2t'} - \mu_{2t}) = \mu_{1t'} - \mu_{1t}$;
- the variances of the variables x are the same across rounds of the same survey, that is, $\sigma_{jt}/\sigma_{jt'} = 1$ for j = 1, 2.

The standardized variables $x_{2\to1,t'}$ in survey 2 are then given by

$$x_{2\to1,t'} = \frac{\sigma_{1t}}{\sigma_{2t}}\left(x_{2t'} - \mu_{2t}\right) + \mu_{1t} \tag{11}$$

Proof. Appendix 1.

²⁴ These standardization procedures only require identifying the consumption survey as the benchmark survey for producing poverty estimates. They do not require that all the sample statistics from this survey be considered more "correct," or that they replace those from the other survey.

The intuition behind Proposition 3 is as follows. For the overlapping period t, the distribution of the variables in survey 2 can be standardized against that in survey 1 in the usual way. The standardized distributions in period t can then serve as a benchmark for other periods in which only data from survey 2 exist. In equation (11), the term in parentheses on the right-hand side, $(x_{2t'} - \mu_{2t})$, tracks the changes over time in the variables in survey 2. This term is rescaled by the ratio of the variables' standard deviations in survey 1 and survey 2. It is then made comparable to the distributions of the variables in survey 1 by anchoring to their means. In practice, the x variables in different rounds of the same survey, particularly adjacent ones, typically have roughly equal variances. We can therefore assume that the within-survey scaling factor $\sigma_{2t'}/\sigma_{2t}$ equals one.
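Equations (10) and (11) translate directly into code. The Python sketch below handles one continuous variable and makes several assumptions:

- The function name is hypothetical.
- The variable is approximately normal in both surveys.
- The within-survey scaling factor $\sigma_{2t'}/\sigma_{2t}$ is set to one, as in the text.
- Unweighted sample moments stand in for the survey-weighted means and standard deviations that an application would use.

```python
import statistics


def standardize_to_survey1(x1_t, x2_t, x2_target=None):
    """Equations (10)-(11) for one variable.

    x1_t, x2_t : values of the variable in survey 1 and survey 2 in the overlapping period t
    x2_target  : survey-2 values in a later period t' (defaults to period t, i.e., equation (10))
    Assumes sigma_2t'/sigma_2t = 1 and uses unweighted moments for simplicity.
    """
    mu1, mu2 = statistics.fmean(x1_t), statistics.fmean(x2_t)
    scale = statistics.pstdev(x1_t) / statistics.pstdev(x2_t)   # sigma_1t / sigma_2t
    values = x2_t if x2_target is None else x2_target
    # Rescale deviations from the period-t survey-2 mean, then anchor on the survey-1 mean.
    return [scale * (v - mu2) + mu1 for v in values]


x1_t = [2.0, 4.0, 6.0, 8.0]       # survey 1, period t
x2_t = [1.0, 2.0, 3.0, 4.0]       # survey 2, period t
later = standardize_to_survey1(x1_t, x2_t, [2.0, 3.0, 4.0, 5.0])   # survey 2, period t'
```

With t' = t the function reproduces equation (10). For a later period, a shift in the survey-2 variable is converted into survey-1 units through $\sigma_{1t}/\sigma_{2t}$ (here a shift of one unit becomes a shift of two).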
Note that equation (11) generalizes equation (10): the two are identical when t' = t. We can then adapt the estimation procedures in Section III.3 to two surveys of different designs. The step of checking Assumption 1 is replaced by two steps: i) standardize the distributions of the control variables in survey 2 according to those in survey 1 using Proposition 3 (if necessary), and ii) check that imputation using these standardized variables provides estimates that are not statistically different from the true rate for the same year. For better estimation results, it may also help to transform (some) variables in both surveys to normality before standardizing them. We return to this point in the next section.

IV.2. Updating Poverty Estimates with Different Survey Sources

IV.2.1. Data Description for the LFS

The Employment and Unemployment Survey (LFS) is the official source of employment and unemployment data in Jordan. It shares certain features with the HEIS, such as a two-stage stratified cluster sampling design and a common sample frame based on the 2004 Population and Housing Census. Its design nonetheless differs in several respects. Between 660 and 680 PSUs (depending on the year) were selected in the first stage out of a total of 1,336 PSUs for the whole country. Within each selected PSU, 10 households were randomly selected in the second stage. Twelve governorates are divided into 24 rural and urban strata. The six major cities with more than 100,000 inhabitants form strata on their own, for a total of 30 strata. The LFS collects data on employment status, occupation, and economic activities for between 11,000 and 12,500 households each quarter. These data are representative of the population for each quarter. The LFS questionnaire remained practically the same during the period under study.
We analyze all 24 quarterly rounds of the LFS from 2008 to 2013.25 The LFS does not collect data on assets, but, like the HEIS, it collects demographic and employment variables (those used in Model 3, Table 2). The LFS also collects data on each worker's wage income in the past month, recorded in five income groups: less than 100 JD, 100 to 199 JD, 200 to 299 JD, 300 to 499 JD, and 500 JD or more. Since a considerable share (around 38 percent) of household heads did not work and thus had no income in the past month, we assign zero wage income to these individuals in order to make use of all the data. To match this categorical income variable in the LFS, we convert the continuous per capita income variable in the HEIS into a categorical variable with the same income categories.

25 Half of the sample households in the LFS are designed to be retained across two consecutive years and for two consecutive quarters within a year. However, DOS does not maintain any identifying information that would allow panel households or individuals to be constructed over time, and the data provided to us contain no non-Jordanians in three quarters of 2011 and 2012. For these reasons, we analyze each quarter of the LFS separately and later average the four quarters within each year to obtain yearly estimates.

Tables 5 and 6 compare the distributions of the common variables across the two surveys for the overlapping years 2008 and 2010, and test for their differences taking into account the complex survey design.26 Given the different survey designs, it is unsurprising that the means of most variables in the LFS are statistically different from those in the HEIS.
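The income recoding described above can be sketched as follows. The 0 to 5 coding is our assumption, not stated in the text: 0 stands for no income in the past month and 1 to 5 for the five LFS income groups. In the HEIS the input would be continuous per capita income; in the LFS it would be the reported group, or zero for heads who did not work. The function and constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (JD per month) of income groups 2-5; group 1 is "less than 100 JD".
INCOME_GROUP_LOWER_BOUNDS_JD = (100, 200, 300, 500)


def income_group(income_jd):
    """Map a monthly income in JD to a hypothetical 0-5 code.

    0: no income (missing, zero or negative, e.g. a head who did not work)
    1: < 100 JD, 2: 100-199 JD, 3: 200-299 JD, 4: 300-499 JD, 5: 500 JD or more
    """
    if income_jd is None or income_jd <= 0:
        return 0
    # bisect_right counts how many lower bounds are <= income_jd
    return bisect_right(INCOME_GROUP_LOWER_BOUNDS_JD, income_jd) + 1


# Continuous HEIS-style values mapped onto the LFS income groups
print([income_group(v) for v in (None, 0, 85.0, 100, 250.5, 499.9, 750)])
# [0, 0, 1, 2, 3, 4, 5]
```

Using `bisect_right` places each boundary value (100, 200, 300, 500 JD) in the higher group, which matches the "100 to 199 JD" style of the LFS categories.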
For example, households in the LFS generally have younger but more educated heads, smaller shares of young members (ages 0-14 and 15-24) but larger shares of working-age (ages 25-59) and older (age 60 or above) members, are less likely to have self-employed members, and are more likely to live in urban areas. These differences underline the need to benchmark the variables in the LFS against those in the HEIS.

26 We implement this test by pooling data from the two surveys, declaring the complex sampling design, and running a (complex survey adjusted) regression of the variable of interest on a dummy variable indicating the survey.

IV.2.2. Estimation Results

Using Proposition 3(i), we first transform some positive variables, including household size and age, in the HEIS and the LFS to normality using the Box-Cox method, and then standardize the variables in the LFS against the distributions of the corresponding variables in the HEIS for 2008 and 2010 respectively. T-tests (not shown) indicate that the distributions of the standardized LFS variables are not statistically different from those in the HEIS, which satisfies Assumption 1. To ensure that Assumption 2 is satisfied, we use the closest version of Model 6 in all subsequent estimation, with the income variable in the categorical format discussed earlier.

Before presenting poverty estimates for the years in which the HEIS is not available, we provide two further validation tests, a within-year and an across-year validation; if the imputation-based poverty estimates are not statistically different from the true rates in both tests, this provides supporting evidence for the imputation model. For the within-year validation test, we impute from the HEIS into the standardized LFS for the same year and show the estimation results in Table 7, Panel B. The results are quite encouraging overall.
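The transform-then-standardize step described at the start of this subsection can be sketched as follows. Several details are our assumptions rather than the paper's: each survey gets its own Box-Cox lambda, lambda is chosen by a grid-search maximum-likelihood estimate over [-2, 2], survey weights are ignored, and the household sizes are invented toy data. All function names are hypothetical.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_lambda(values, grid=None):
    """Grid-search the lambda maximizing the (unweighted) Box-Cox profile log-likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # lambda in [-2, 2]
    n = len(values)
    sum_log = sum(math.log(v) for v in values)
    best_lam, best_ll = None, -math.inf
    for lam in grid:
        y = boxcox(values, lam)
        mean = sum(y) / n
        var = sum((v - mean) ** 2 for v in y) / n
        if var <= 0:
            continue
        # profile log-likelihood up to a constant: -n/2 log(var) + (lambda - 1) sum(log x)
        ll = -0.5 * n * math.log(var) + (lam - 1.0) * sum_log
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    if best_lam is None:
        raise ValueError("Box-Cox needs non-constant, strictly positive values")
    return best_lam


def mean_sd(values):
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def transform_then_standardize(heis_values, lfs_values):
    """Box-Cox each survey with its own lambda, then rescale the LFS as in equation (10)."""
    heis_t = boxcox(heis_values, boxcox_lambda(heis_values))
    lfs_t = boxcox(lfs_values, boxcox_lambda(lfs_values))
    mu1, sd1 = mean_sd(heis_t)
    mu2, sd2 = mean_sd(lfs_t)
    return [sd1 / sd2 * (x - mu2) + mu1 for x in lfs_t], heis_t


heis_sizes = [2, 3, 4, 5, 5, 6, 6, 7, 8, 9, 11, 14]  # toy household sizes
lfs_sizes = [1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 10]
std_lfs, heis_t = transform_then_standardize(heis_sizes, lfs_sizes)
print(mean_sd(std_lfs), mean_sd(heis_t))  # the two (mean, sd) pairs agree up to rounding
```

In the paper's setting, the consumption model estimated on the HEIS would use the transformed variables, and the standardized LFS values would then be plugged into that model.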
Estimates for each quarter in both 2008 and 2010 fall roughly within the 95 percent confidence interval of the true rates, and the estimates for 2008 even fall within one standard error of the true rate. Furthermore, the results are stable, with almost identical estimates for each quarter of the same year. For comparison, Panel A provides poverty estimates based on the original, non-standardized variables in the LFS, which show a large downward bias. For the across-year validation, we impute from the 2008 HEIS into the standardized variables in the 2010 LFS and find that the estimates (not shown) are even better, falling within one standard error of the true rate.

Table 8 then provides poverty estimates for the other years, in which only the non-consumption LFS data are available. Since the LFS data are nationally representative on a quarterly basis, to make the standardization more consistent we benchmark the LFS variables on a quarter-to-quarter basis: each quarter in 2009 is benchmarked against the HEIS-standardized variables in the corresponding quarter of 2008, and each quarter in 2011-2013 against the corresponding quarter of 2010. For example, for quarter 1 of 2011, the means and standard deviations $(\mu_{2t}, \sigma_{2t})$ in equation (11) are taken from quarter 1 of 2010, while $(\mu_{1t}, \sigma_{1t})$ are taken from the 2010 HEIS for all years from 2010 onwards. The estimates in Table 8 indicate a decreasing trend in poverty rates over time. Since the data for each quarter are representative of the population, we average the estimates for all four quarters of each year to obtain yearly estimates, which are illustrated in Figure 2. The decline in poverty is steady, although it is less steep in 2010-2013 than in 2008-2010, perhaps due to the various events taking place in the economy during this period, as discussed earlier.
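The quarter-to-quarter benchmarking rule and the annual averaging can be sketched in a few lines of Python. This is a hypothetical helper, not the authors' code: it encodes the benchmarking rule described above and takes a simple unweighted mean of the four quarterly rates reported in Table 8. The text does not say whether Figure 2 applies any further weighting, and the sketch computes no standard error for the annual average because the LFS rotation may make estimates from adjacent quarters dependent.

```python
from statistics import fmean

# Quarterly imputation-based poverty rates (percent) reported in Table 8
TABLE_8 = {
    2009: [17.5, 18.8, 19.0, 19.1],
    2011: [14.2, 15.5, 14.5, 13.9],
    2012: [13.3, 13.6, 13.3, 13.2],
    2013: [13.1, 13.4, 12.8, 12.8],
}


def benchmark_sources(year, quarter):
    """Return ((LFS base year, quarter), HEIS year) used to standardize one LFS quarter.

    Quarters in 2008-2009 are benchmarked on 2008 and quarters in 2010-2013 on 2010,
    always against the same quarter of the base-year LFS.
    """
    if not 2008 <= year <= 2013 or quarter not in (1, 2, 3, 4):
        raise ValueError("LFS rounds analyzed here run from 2008Q1 to 2013Q4")
    base_year = 2008 if year < 2010 else 2010
    return (base_year, quarter), base_year


def yearly_rate(quarterly_rates):
    """Simple (unweighted) average of the four quarterly estimates."""
    if len(quarterly_rates) != 4:
        raise ValueError("expected four quarterly estimates")
    return fmean(quarterly_rates)


for year, rates in TABLE_8.items():
    (lfs_year, _), heis_year = benchmark_sources(year, 1)
    print(year, f"{yearly_rate(rates):.3f}", f"benchmark: LFS {lfs_year}, HEIS {heis_year}")
```

The simple averages are 18.6, 14.525, 13.35 and 13.025 percent for 2009, 2011, 2012 and 2013 respectively.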
Notably, estimated poverty rates based on the non-standardized variables (not shown) show a qualitatively similar decreasing trend over time.27

27 We also experimented with imputation from the HEIS into the DHS. However, one major issue is that the two most recent rounds of the DHS, in 2009 and 2012, do not overlap with the HEIS, which makes it difficult to benchmark the DHS. We tried benchmarking both rounds of the DHS using the 2010 HEIS and found a qualitatively similar decreasing trend in poverty across these two survey rounds.

The estimated poverty rates at the national level are encouraging. To investigate whether this result also holds at more disaggregated levels, we estimate poverty rates separately for urban and rural areas. The results (not shown) are encouraging for urban areas, with estimated poverty rates falling within the 95 percent confidence interval of the true rates in both years. The same is true for rural areas in 2008 but not in 2010. One possible reason is that the Jordanian population is predominantly urban (83 percent, Table 6), so it can be harder to predict poverty rates in rural areas, which account for a smaller share of the population.28

28 It is also more demanding to make the distributions of the explanatory variables comparable for smaller population groups (e.g., groups disaggregated by region or by other distributional characteristics such as quintiles) in surveys of different designs. We leave this extension for further research.

V. Conclusion

In this paper we develop a formal and generalized framework for survey-to-survey poverty imputation, which has typically been handled on an ad hoc basis in the existing literature. We offer less restrictive assumptions and formal tests of these assumptions where data are available, provide more insight into the selection of control variables for model building, and offer simpler variance formulae. Our framework can be applied generally to imputation either from one survey to another survey with the same design, or to another survey with a different design. We also provide a critical review of recent studies in the economics and statistics literatures that use imputation. Our estimation results, both from combining HEIS rounds and from combining the HEIS with the LFS, are quite encouraging, with imputation-based poverty estimates showing no statistically significant differences from the true poverty rates. We also provide step-by-step estimation procedures that can facilitate the implementation of our proposed methods.

Even though we illustrate the method with data from a middle-income country, Jordan, it is more general and can be applied in other contexts where household consumption surveys are not frequently or consistently collected but other surveys that can be benchmarked to them exist. We thus lend support to the growing view that survey-to-survey imputation methods can be a valuable tool for tracking poverty in developing countries, where financial and technical constraints on fielding (expensive, high-quality) consumption surveys can be particularly binding.

References

Bethlehem, Jelke, Fannie Cobben, and Barry Schouten. (2011). Handbook of Nonresponse in Household Surveys. New Jersey: John Wiley & Sons.
Blinder, A. S. (1973). “Wage Discrimination: Reduced Form and Structural Estimates”. Journal of Human Resources, 8: 436-455.
van Buuren, Stef. (2012). Flexible Imputation of Missing Data. Boca Raton, Florida: CRC Press.
Casella, George and Roger L. Berger. (2002). Statistical Inference, 2nd Edition. California: Duxbury Press.
Census Bureau. (2014a). Survey of Income and Program Participation, Data Editing and Imputation. Accessed on the Internet on June 15, 2014 at http://www.census.gov/programs-surveys/sipp/methodology/data-editing-and-imputation.html
---. (2014b). Current Population Survey, Imputation of Unreported Data Items. Accessed on the Internet on June 15, 2014 at http://www.census.gov/cps/methodology/unreported.html
Christiaensen, Luc, Peter Lanjouw, Jill Luoto, and David Stifel. (2012). “Small Area Estimation-based Prediction Models to Track Poverty: Validation and Applications.” Journal of Economic Inequality, 10(2): 267-297.
Dabalen, Andrew, Errol Graham, Kristen Himelein, and Rose Mungai. (2014). “Estimating Poverty in the Absence of Consumption Data: The Case of Liberia”. Policy Research Working Paper No. 7024. Washington DC: The World Bank.
Dang, Hai-Anh and Peter Lanjouw. (2013). “Measuring Poverty Dynamics with Synthetic Panels Based on Cross Sections”. Policy Research Working Paper No. 6504. Washington DC: The World Bank.
Dang, Hai-Anh, Peter Lanjouw, Jill Luoto, and David McKenzie. (2014). “Using Repeated Cross-Sections to Explore Movements in and out of Poverty”. Journal of Development Economics, 107: 112-128.
Davey, Adam, Michael J. Shanahan, and Joseph L. Schafer. (2001). “Correcting for Selective Nonresponse in the National Longitudinal Survey of Youth Using Multiple Imputation.” Journal of Human Resources, 36: 500-519.
Deaton, Angus and Jean Dreze. (2002). “Poverty and Inequality in India: A Re-Examination”. Economic and Political Weekly, 37(36): 3729-3748.
Deaton, Angus and Valerie Kozel. (2005). The Great Indian Poverty Debate. New Delhi: Macmillan.
DeGroot, Morris H. and Mark H. Schervish. (2012). Probability and Statistics. Boston: Addison-Wesley.
Douidich, Mohamed, Abdeljaouad Ezzrari, Roy van der Weide, and Paolo Verme. (2013). “Estimating Quarterly Poverty Rates Using Labor Force Surveys: A Primer.” Policy Research Working Paper No. 6466. Washington DC: The World Bank.
Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. (2003). “Micro-Level Estimation of Poverty and Inequality.” Econometrica, 71(1): 355-364.
Filmer, Deon and Lant Pritchett. (2001). “Estimating Wealth Effects without Expenditure Data—or Tears: An Application to Educational Enrollments in States of India”. Demography, 38(1): 115-132.
Gourieroux, Christian and Alain Monfort. (1997). Simulation-based Econometric Methods. Oxford: Oxford University Press.
Hentschel, Jesko, Jean Olson Lanjouw, Peter Lanjouw, and Javier Poggi. (2000). “Combining Census and Survey Data to Trace the Spatial Dimensions of Poverty: A Case Study of Ecuador”. World Bank Economic Review, 14(1): 147-165.
Honaker, James and Gary King. (2010). “What to Do about Missing Values in Time-series Cross-section Data”. American Journal of Political Science, 54: 561-581.
International Monetary Fund. (2014). Online World Economic Outlook Database.
Jenkins, Stephen P., Richard V. Burkhauser, Shuaizhang Feng, and Jeff Larrimore. (2011). “Measuring Inequality Using Censored Data: A Multiple-imputation Approach to Estimation and Inference.” Journal of the Royal Statistical Society: Series A, 174(1): 63-81.
Kijima, Yoko and Peter Lanjouw. (2003). “Poverty in India during the 1990s: A Regional Perspective.” Policy Research Working Paper No. 3141. Washington DC: The World Bank.
King, Gary, James Honaker, Anne Joseph, and Kenneth Scheve. (2001). “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation”. American Political Science Review, 95: 49-69.
Little, Roderick J. A. (1988). “Missing-data Adjustments in Large Surveys”. Journal of Business and Economic Statistics, 6: 287-296.
Little, Roderick J. A. and Donald B. Rubin. (2002). Statistical Analysis with Missing Data. 2nd Edition. New Jersey: Wiley.
Manski, Charles F. (2003). Partial Identification of Probability Distributions. New York: Springer.
Mathiassen, Astrid. (2009). “A Model Based Approach for Predicting Annual Poverty Rates without Expenditure Data”. Journal of Economic Inequality, 7: 117-135.
---. (2013). “Testing Prediction Performance of Poverty Models: Empirical Evidence from Uganda”. Review of Income and Wealth, 59(1): 91-112.
Matloff, Norman S. (1981). “Use of Regression Functions for Improved Estimation of Means”. Biometrika, 68(3): 685-689.
Newhouse, D., S. Shivakumaran, S. Takamatsu, and N. Yoshida. (2014). “How Survey-to-Survey Imputation Can Fail”. Policy Research Working Paper No. 6961. Washington DC: The World Bank.
Oaxaca, Ronald. (1973). “Male-female Wage Differentials in Urban Labor Markets”. International Economic Review, 14: 693-709.
Powers, Daniel A., Hirotoshi Yoshioka, and Myeong-Su Yun. (2011). “mvdcmp: Multivariate Decomposition for Nonlinear Response Models”. Stata Journal, 11(4): 556-576.
Rao, J. N. K. (2003). Small Area Estimation. New Jersey: Wiley.
Ravallion, Martin. (1996). “How Well Can Method Substitute for Data? Five Experiments in Poverty Analysis”. World Bank Research Observer, 11(2): 199-221.
Ridder, Geert and Robert Moffitt. (2007). “The Econometrics of Data Combination”. In Heckman and Leamer (Eds.), Handbook of Econometrics, Volume 6B. The Netherlands: Elsevier.
Rubin, Donald B. (1977). “Formalizing Subjective Notions about the Effect of Nonrespondents in Sample Surveys”. Journal of the American Statistical Association, 72: 538-543.
---. (1978). “Multiple Imputations in Sample Surveys: A Phenomenological Bayesian Approach to Nonresponse.” Proceedings of the Survey Research Methods Section, American Statistical Association, 1978: 20-34.
---. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Sahn, David E. and David C. Stifel. (2000). “Poverty Comparison over Time and across Countries in Africa”. World Development, 28(12): 2123-2155.
Schafer, Joseph L. and John W. Graham. (2002). “Missing Data: Our View of the State of the Art”. Psychological Methods, 7(2): 147-177.
Sen, Abhijit and Himanshu. (2005). “Poverty and Inequality in India”. In Angus Deaton and Valerie Kozel (Eds.), The Great Indian Poverty Debate. New Delhi: Macmillan.
Stifel, David and Luc Christiaensen. (2007). “Tracking Poverty Over Time in the Absence of Comparable Consumption Data”. World Bank Economic Review, 21(2): 317-341.
Tarozzi, Alessandro. (2007). “Calculating Comparable Statistics from Incomparable Surveys, With an Application to Poverty in India”. Journal of Business and Economic Statistics, 25(3): 314-336.
Tarozzi, Alessandro and Angus Deaton. (2009). “Using Census and Survey Data to Estimate Poverty and Inequality for Small Areas”. Review of Economics and Statistics, 91(4): 773-792.
Wooldridge, Jeffrey M. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass.: MIT Press.
World Bank. (2012a). “Well Begun, Not Yet Done: Vietnam’s Remarkable Progress on Poverty Reduction and the Emerging Challenges”. Vietnam Poverty Assessment Report 2012. Hanoi: World Bank.
---. (2012b). “The Hashemite Kingdom of Jordan: A Note on Updating Poverty Measurement Methodology.” Background paper for the Jordan Poverty Reduction Strategy.
Yun, Myeong-Su. (2004). “Decomposing Differences in the First Moments”. Economics Letters, 82: 275-280.
Table 1: Comparison of Major Features of Missing Data Imputation Methods Used in Recent Studies in Economics and Statistics

No. | Feature | Economics | Statistics
1.1 | Type of missing data | One single variable, usually household consumption (i.e., univariate missing data) | One or more variables of interest (i.e., univariate and multivariate missing data)
1.2 | Type of missing data | Missing variable is the outcome variable | Missing variables can include both outcome and explanatory variables
2 | Underlying missing data mechanism | Outcome variable, usually household consumption, is not collected in the other survey (i.e., data missing at random or MAR) | More general, and can include missing at random (MAR) or not at random (MNAR)
3.1 | Modelling: common | Assumption of constant estimated parameters | Assumption of constant estimated parameters
3.2 | Modelling: different | Non-Bayesian approach | Generally follow a Bayesian approach that updates estimates iteratively with estimated posterior distributions
3.3 | Modelling: different | Semi-parametric and parametric methods | Include non-parametric, semi-parametric or fully parametric methods
3.4 | Modelling: different | Informally select control variables based on a mix of economic and statistical theory | Select control variables based on statistical theory
4 | Target population | Prediction from one survey to another | Imputation within a survey
5 | Proportion of missing data | A whole new survey round, or 100% | Part of the existing survey, usually less than 50%
6 | Time | Imputation for a single point in time or from one period to another | Imputation for a single point in time

Notes: Studies for economics include ELL (2003), Stifel and Christiaensen (2007), Christiaensen et al. (2012), and Mathiassen (2013). Studies for statistics include Rubin (1987), Little and Rubin (2002), Schafer and Graham (2002), van Buuren (2012), and Carpenter and Kenward (2013). We only consider imputation for cross sections in this table.
Table 2: Decomposition of Changes in Household Welfare between 2008 and 2010, Jordan (percentage)

Component | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7
A. Our method
Due to characteristics | 70.0 | 68.9 | 72.8 | 84.2 | 85.3 | 114.2 | 114.0
Due to coefficients | 30.0 | 31.1 | 27.2 | 15.8 | 14.7 | -14.2 | -14.0
Total | 100 | 100 | 100 | 100 | 100 | 100 | 100
B. Probit model
Due to characteristics | 45.2 | 44.5 | 52.2 | 108.9 | 113.3 | 104.5 | 135.4
Due to coefficients | 54.8 | 55.5 | 47.8 | -8.9 | -13.3 | -4.5 | -35.4
Total | 100 | 100 | 100 | 100 | 100 | 100 | 100
C. Wald test for constant parameters
χ2 value | 445.4 | 516.4 | 504.6 | 236.9 | 298.7 | 219.2 | 240.3
Degrees of freedom | 8 | 11 | 16 | 28 | 44 | 17 | 45
p-value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Control variables
Parsimonious | Y | Y | Y | Y | Y | Y | Y
Demographics | N | Y | Y | Y | Y | Y | Y
Work sector | N | N | Y | Y | Y | Y | Y
Household assets | N | N | N | Y | Y | N | Y
Expanded list of household assets | N | N | N | N | Y | N | Y
Income per capita | N | N | N | N | N | Y | Y
N | 22,132 | 22,132 | 22,078 | 22,078 | 22,078 | 21,699 | 21,699

Note: The decomposition in Panel A is implemented using equation (8), and Panel C reports the Wald test, both as discussed in the text. The decomposition in Panel B uses the user-written Stata routine "mvdcmp" (Powers, Yoshioka, and Yun, 2011). All estimates adjust for complex survey design with cluster sampling and stratification. The full model specification is provided in Appendix 2, Table 2.1.
Table 3: Predicted Poverty Rates Based on Imputation from the 2008 HEIS into the 2010 HEIS, Jordan (percentage)

Estimated rate | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | True rate
Estimated rate in 2010 | 15.9 (0.6) | 16.0 (0.5) | 15.8 (0.5) | 15.2 (0.5) | 15.1 (0.5) | 13.7 (0.5) | 13.7 (0.5) | 14.4 (0.5)
Control variables
Parsimonious | Y | Y | Y | Y | Y | Y | Y
Demographics | N | Y | Y | Y | Y | Y | Y
Work sector | N | N | Y | Y | Y | Y | Y
Household assets | N | N | N | Y | Y | N | Y
Expanded list of household assets | N | N | N | N | Y | N | Y
Income per capita | N | N | N | N | N | Y | Y
σe | 0.35 | 0.33 | 0.33 | 0.28 | 0.29 | 0.29 | 0.26
σu | 0.18 | 0.17 | 0.15 | 0.11 | 0.10 | 0.11 | 0.09
ρ | 0.20 | 0.21 | 0.18 | 0.12 | 0.11 | 0.13 | 0.10
R2 | 0.49 | 0.54 | 0.56 | 0.72 | 0.73 | 0.70 | 0.76
N | 11,176 | 11,176 | 11,142 | 11,142 | 11,142 | 10,908 | 10,908 | 11,223

Note: Standard errors are in parentheses. We use 1,000 simulations for the error terms. All estimates adjust for complex survey design with cluster sampling and stratification. The underlying regression results are provided in Appendix 2, Table 2.1. The true poverty rate is the same for the estimation samples used in Models 1 to 7.

Table 4: Predicted Poverty Rates Based on Alternative Imputation Methods, Jordan 2008-2010 (percentage)

Method | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7
1) Using empirical distribution of the error terms | 16.7 (0.6) | 16.8 (0.6) | 16.6 (0.6) | 15.2 (0.6) | 15.2 (0.5) | 14.1 (0.5) | 13.7 (0.5)
2) Direct estimation of poverty rate using probit model | 16.9 (0.3) | 17.0 (0.3) | 16.6 (0.3) | 14.2 (0.4) | 14.1 (0.4) | 13.9 (0.3) | 12.9 (0.4)
3) Using MI method, normal linear regression model | 17.8 (0.8) | 18.2 (0.8) | 17.8 (0.8) | 15.5 (0.8) | 15.5 (0.8) | 14.7 (0.7) | 14.0 (0.7)
4) Using MI method, predictive mean matching model | 17.8 (0.8) | 17.3 (0.8) | 16.8 (0.8) | 14.8 (0.7) | 14.8 (0.7) | 14.1 (0.7) | 13.5 (0.7)

Note: Standard errors are in parentheses.
We use 1,000 simulations for the error terms when simulating with the empirical distribution of the error terms (row 1) or with the probit model (row 2), and 50 simulations for the MI methods (rows 3 and 4). The true poverty rate for 2010 is 14.4 percent, with a standard error of 0.5 percent. The model specification is the same as in Table 3 and is provided in more detail in Appendix 2, Table 2.1. All estimates adjust for complex survey design with cluster sampling and stratification.

Table 5: Comparison of Common Variables between the HEIS and the LFS, Jordan 2008

Variable | HEIS | LFS Q1 | diff. Q1 | LFS Q2 | diff. Q2 | LFS Q3 | diff. Q3 | LFS Q4 | diff. Q4
Household size | 6.66 (2.44) | 5.30 (2.38) | -1.37*** (0.04) | 5.32 (2.38) | -1.35*** (0.04) | 5.18 (2.35) | -1.48*** (0.04) | 5.09 (2.29) | -1.58*** (0.04)
Head's age | 47.86 (12.81) | 47.18 (14.45) | -0.68*** (0.22) | 47.35 (14.49) | -0.51** (0.23) | 46.88 (14.29) | -0.98*** (0.21) | 46.65 (14.26) | -1.21*** (0.21)
Head's age squared | 2454.50 (1320.48) | 2435.19 (1502.27) | -19.32 (22.39) | 2452.21 (1510.73) | -2.30 (23.43) | 2402.18 (1473.33) | -52.33** (21.91) | 2379.60 (1467.12) | -74.90*** (21.99)
Head is male | 0.92 (0.27) | 0.88 (0.33) | -0.04*** (0.00) | 0.87 (0.34) | -0.05*** (0.01) | 0.86 (0.34) | -0.06*** (0.00) | 0.87 (0.33) | -0.05*** (0.00)
Head is Jordanian | 0.96 (0.20) | 0.94 (0.24) | -0.02*** (0.00) | 0.94 (0.24) | -0.02*** (0.00) | 0.94 (0.23) | -0.02*** (0.00) | 0.94 (0.24) | -0.02*** (0.00)
Head's highest years of schooling completed | 9.54 (4.85) | 9.96 (4.98) | 0.41*** (0.08) | 9.99 (5.03) | 0.45*** (0.08) | 9.93 (4.95) | 0.39*** (0.08) | 9.88 (4.86) | 0.34*** (0.08)
Head worked in past 7 days | 0.62 (0.48) | 0.61 (0.49) | -0.02* (0.01) | 0.62 (0.49) | -0.01 (0.01) | 0.60 (0.49) | -0.02** (0.01) | 0.61 (0.49) | -0.01 (0.01)
Share of household members age 0-14 | 0.35 (0.25) | 0.32 (0.26) | -0.03*** (0.00) | 0.32 (0.26) | -0.03*** (0.00) | 0.32 (0.26) | -0.04*** (0.00) | 0.32 (0.26) | -0.03*** (0.00)
Share of household members age 15-24 | 0.22 (0.21) | 0.18 (0.21) | -0.04*** (0.00) | 0.19 (0.22) | -0.04*** (0.00) | 0.19 (0.22) | -0.03*** (0.00) | 0.18 (0.22) | -0.04*** (0.00)
Share of household members age 25-59 | 0.35 (0.16) | 0.39 (0.21) | 0.04*** (0.00) | 0.38 (0.21) | 0.03*** (0.00) | 0.39 (0.21) | 0.04*** (0.00) | 0.39 (0.22) | 0.04*** (0.00)
Share of household members age 60 or older | 0.07 (0.16) | 0.11 (0.23) | 0.04*** (0.00) | 0.11 (0.24) | 0.04*** (0.00) | 0.11 (0.24) | 0.04*** (0.00) | 0.11 (0.24) | 0.04*** (0.00)
Share of household members working in past 7 days | 0.21 (0.16) | 0.23 (0.19) | 0.02*** (0.00) | 0.24 (0.19) | 0.03*** (0.00) | 0.23 (0.19) | 0.02*** (0.00) | 0.23 (0.19) | 0.02*** (0.00)
Household has at least one female member working in past 7 days | 0.16 (0.36) | 0.17 (0.38) | 0.01** (0.01) | 0.19 (0.39) | 0.03*** (0.01) | 0.15 (0.36) | -0.01 (0.01) | 0.17 (0.37) | 0.01 (0.01)
Household has at least one member working as employer | 0.07 (0.25) | 0.07 (0.26) | 0.01 (0.00) | 0.08 (0.27) | 0.01*** (0.00) | 0.07 (0.26) | 0.01 (0.00) | 0.07 (0.26) | 0.00 (0.00)
Household has at least one member self-employed | 0.11 (0.32) | 0.08 (0.27) | -0.03*** (0.01) | 0.09 (0.29) | -0.02*** (0.01) | 0.09 (0.28) | -0.03*** (0.01) | 0.10 (0.30) | -0.01** (0.01)
Urban | 0.82 (0.39) | 0.85 (0.36) | 0.03*** (0.01) | 0.85 (0.36) | 0.03*** (0.01) | 0.84 (0.36) | 0.03*** (0.01) | 0.85 (0.36) | 0.03*** (0.01)
Employment income | 2.11 (1.84) | 1.89 (1.69) | 0.23*** (0.04) | 2.04 (1.78) | 0.07** (0.04) | 2.05 (1.78) | 0.07* (0.04) | 2.05 (1.76) | 0.07* (0.04)
N | 10,936 | 12,004 | | 11,789 | | 11,832 | | 11,925 |

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard deviations (for means) and standard errors (for differences) are in parentheses. Differences are estimated with t-tests that take into account the complex survey design with cluster sampling and stratification.

Table 6: Comparison of Common Variables between the HEIS and the LFS, Jordan 2010

Variable | HEIS | LFS Q1 | diff. Q1 | LFS Q2 | diff. Q2 | LFS Q3 | diff. Q3 | LFS Q4 | diff. Q4
Household size | 6.32 (2.23) | 5.15 (2.25) | -1.17*** (0.04) | 5.13 (2.24) | -1.19*** (0.04) | 5.03 (2.20) | -1.29*** (0.04) | 5.04 (2.22) | -1.29*** (0.04)
Head's age | 48.38 (12.82) | 47.03 (14.36) | -1.35*** (0.22) | 47.33 (14.32) | -1.05*** (0.22) | 47.18 (14.39) | -1.20*** (0.22) | 47.08 (14.41) | -1.30*** (0.22)
Head's age squared | 2504.47 (1338.54) | 2417.54 (1491.15) | -86.94*** (22.48) | 2444.84 (1483.63) | -59.64*** (22.33) | 2432.93 (1500.16) | -71.54*** (23.38) | 2424.10 (1497.27) | -80.38*** (22.76)
Head is male | 0.91 (0.29) | 0.87 (0.33) | -0.03*** (0.00) | 0.87 (0.34) | -0.04*** (0.01) | 0.87 (0.34) | -0.04*** (0.01) | 0.86 (0.34) | -0.05*** (0.01)
Head is Jordanian | 0.96 (0.19) | 0.94 (0.24) | -0.02*** (0.00) | 0.94 (0.24) | -0.03*** (0.00) | 0.94 (0.24) | -0.03*** (0.00) | 0.94 (0.24) | -0.02*** (0.00)
Head's highest years of schooling completed | 9.72 (4.66) | 10.12 (4.86) | 0.40*** (0.08) | 10.19 (4.91) | 0.47*** (0.08) | 10.08 (4.95) | 0.36*** (0.08) | 10.13 (4.90) | 0.41*** (0.08)
Head worked in past 7 days | 0.61 (0.49) | 0.62 (0.48) | 0.01 (0.01) | 0.62 (0.49) | 0.00 (0.01) | 0.58 (0.49) | -0.03*** (0.01) | 0.60 (0.49) | -0.01 (0.01)
Share of household members age 0-14 | 0.34 (0.25) | 0.32 (0.26) | -0.02*** (0.00) | 0.32 (0.26) | -0.03*** (0.00) | 0.32 (0.26) | -0.03*** (0.00) | 0.31 (0.26) | -0.03*** (0.00)
Share of household members age 15-24 | 0.23 (0.21) | 0.19 (0.21) | -0.04*** (0.00) | 0.19 (0.22) | -0.04*** (0.00) | 0.18 (0.21) | -0.05*** (0.00) | 0.19 (0.22) | -0.04*** (0.00)
Share of household members age 25-59 | 0.36 (0.17) | 0.39 (0.21) | 0.03*** (0.00) | 0.39 (0.21) | 0.03*** (0.00) | 0.39 (0.21) | 0.04*** (0.00) | 0.40 (0.21) | 0.04*** (0.00)
Share of household members age 60 or older | 0.07 (0.17) | 0.10 (0.23) | 0.03*** (0.00) | 0.11 (0.24) | 0.03*** (0.00) | 0.11 (0.24) | 0.04*** (0.00) | 0.11 (0.24) | 0.03*** (0.00)
Share of household members working in past 7 days | 0.22 (0.16) | 0.24 (0.20) | 0.02*** (0.00) | 0.24 (0.19) | 0.02*** (0.00) | 0.21 (0.18) | -0.01*** (0.00) | 0.23 (0.20) | 0.01*** (0.00)
Household has at least one female member working in past 7 days | 0.17 (0.38) | 0.19 (0.39) | 0.02*** (0.01) | 0.19 (0.39) | 0.02*** (0.01) | 0.12 (0.33) | -0.04*** (0.01) | 0.18 (0.38) | 0.01 (0.01)
Household has at least one member working as employer | 0.09 (0.28) | 0.08 (0.27) | -0.01 (0.01) | 0.08 (0.27) | -0.01 (0.00) | 0.07 (0.25) | -0.02*** (0.00) | 0.07 (0.26) | -0.01** (0.00)
Household has at least one member self-employed | 0.12 (0.33) | 0.11 (0.31) | -0.02*** (0.01) | 0.11 (0.31) | -0.02*** (0.01) | 0.09 (0.29) | -0.03*** (0.01) | 0.09 (0.29) | -0.03*** (0.01)
Urban | 0.83 (0.38) | 0.84 (0.36) | 0.02*** (0.01) | 0.84 (0.36) | 0.02*** (0.01) | 0.84 (0.37) | 0.01** (0.01) | 0.84 (0.37) | 0.01*** (0.01)
Employment income | 2.17 (1.90) | 2.19 (1.82) | -0.02 (0.04) | 2.19 (1.86) | -0.02 (0.04) | 2.15 (1.83) | 0.02 (0.04) | 2.13 (1.83) | 0.04 (0.04)
N | 11,142 | 11,816 | | 11,721 | | 11,548 | | 11,575 |

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard deviations (for means) and standard errors (for differences) are in parentheses. Differences are estimated with t-tests that take into account the complex survey design with cluster sampling and stratification.

Table 7: Predicted Poverty Rates Based on Imputation from the HEIS into the LFS, Jordan 2008-2010 (percentage)

Panel A: Non-standardized characteristics
Year | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 | True rate
2008 | 12.9 (0.4) | 12.4 (0.4) | 11.8 (0.3) | 11.3 (0.3) | 19.5 (0.6)
N | 12,004 | 11,789 | 11,832 | 11,925 | 10,956
2010 | 10.1 (0.3) | 9.9 (0.3) | 9.8 (0.3) | 9.5 (0.3) | 14.4 (0.5)
N | 11,816 | 11,721 | 11,548 | 11,575 | 11,223

Panel B: Standardized characteristics
Year | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4 | True rate
2008 | 19.7 (0.4) | 19.7 (0.5) | 19.6 (0.4) | 19.3 (0.4) | 19.5 (0.6)
N | 12,004 | 11,789 | 11,832 | 11,925 | 10,956
2010 | 15.1 (0.4) | 15.2 (0.4) | 15.2 (0.4) | 15.1 (0.4) | 14.4 (0.5)
N | 11,816 | 11,721 | 11,548 | 11,575 | 11,223

Note: Imputation-based estimates of poverty rates using the LFS data in 2008 and 2010 are shown under the columns "Quarter 1" to "Quarter 4". The true poverty rate estimated from the HEIS for each year is shown under the column "True rate".
Model 6 in Table 3 is used to estimate the underlying consumption model, which regresses household per capita consumption on household size; the household head's age, age squared, gender, marital status, nationality, years of schooling, and work status in the past 7 days; the shares of household members aged 0-14, 15-24, and 25-59; the share of members working in the past 7 days; per capita income; and dummy variables indicating whether the household has at least one female member working in the past 7 days, at least one member working as an employer, and at least one self-employed member, and whether it resides in an urban area. The control variables in each quarter of the LFS are standardized against those in the HEIS so that they have the same weighted mean and standard deviation as the HEIS variables in 2008 and 2010, respectively. Household size and age in both surveys are transformed to normality using the Box-Cox method before standardization. We use 1,000 simulations for the estimates in each quarter. Standard errors are in parentheses. All estimates adjust for complex survey design with cluster sampling and stratification.

Table 8: Predicted Poverty Rates Based on Imputation from the HEIS into the LFS, Jordan 2009, 2011-2013 (percentage)

Year | Quarter 1 | Quarter 2 | Quarter 3 | Quarter 4
2009 | 17.5 (0.4) | 18.8 (0.4) | 19.0 (0.4) | 19.1 (0.4)
N | 11,731 | 11,765 | 11,769 | 11,686
2011 | 14.2 (0.4) | 15.5 (0.4) | 14.5 (0.4) | 13.9 (0.4)
N | 11,184 | 11,767 | 11,291 | 11,607
2012 | 13.3 (0.4) | 13.6 (0.4) | 13.3 (0.4) | 13.2 (0.4)
N | 11,599 | 11,657 | 11,393 | 11,011
2013 | 13.1 (0.4) | 13.4 (0.4) | 12.8 (0.4) | 12.8 (0.4)
N | 11,327 | 11,321 | 11,132 | 11,147

Note: Imputation-based estimates of poverty rates using the LFS data in 2009 and 2011-2013 are shown under the columns "Quarter 1" to "Quarter 4".
Model 6 in Table 3 is used to estimate the underlying consumption model, which regresses household per capita consumption on household size; the household head's age, age squared, gender, marital status, nationality, years of schooling, and work status in the past 7 days; the shares of household members aged 0-14, 15-24, and 25-59; the share of members working in the past 7 days; per capita income; and dummy variables indicating whether the household has at least one female member working in the past 7 days, at least one member working as an employer, and at least one self-employed member, and whether it resides in an urban area. The control variables in each quarter of the LFS are standardized using Proposition 3(ii), with the 2008 HEIS and LFS as the benchmark for 2009 and the 2010 HEIS and LFS as the benchmark for 2011-2013. Household size and age in both surveys are transformed to normality using the Box-Cox method before standardization. We use 1,000 simulations for the estimates in each quarter. Standard errors are in parentheses. All estimates adjust for complex survey design with cluster sampling and stratification.

Figure 1: GDP Trends in Real and Nominal Terms and Its Growth Rates
Data source: International Monetary Fund, World Economic Outlook Database, April 2014. Estimates start after 2010.

Figure 2: Predicted Poverty Trends Combining the HEIS and LFS, Jordan 2008-2013

Appendix 1: Proofs

Proposition 1

Given our consumption models for survey 1 and survey 2,

$$y_1 = \beta_1' x_1 + \mu_1 + \varepsilon_1 \tag{A1.1}$$

$$y_2 = \beta_2' x_2 + \mu_2 + \varepsilon_2 \tag{A1.2}$$

By Assumption 1, since both $x_1$ and $x_2$ are representative of the population (either at the same time or in different time periods), we can replace $x_1$ with $x_2$ in equation (A1.1) to obtain the imputed household consumption in survey 2.
$$ y_1^2 = \beta_1' x_2 + \mu_1 + \varepsilon_1 \tag{A1.3} $$

Writing out Assumption 2 in full, where by definition

$$ \Delta P = P(y_2 \le z_2) - P(y_1 \le z_1) \tag{A1.4} $$

and

$$ P(\Delta x \mid \Theta_j) = P\big(\beta_1'(x_2 - x_1) + \mu_1 + \varepsilon_1 \le z_1\big) = P(y_1^2 \le z_1) - P(y_1 \le z_1) \tag{A1.5} $$

Setting the right-hand sides of equations (A1.4) and (A1.5) equal, it follows that

$$ P(y_1^2 \le z_1) = P(y_2 \le z_2) \tag{A1.6} $$

where P(.) is the given poverty function.

Corollary 1.1

i) Since the poverty function P(.) is defined as the average poverty rate for the population, it is an expectation function. Using the law of iterated expectations,29 we can rewrite equality (A1.6) as

$$ E\big[P(y_1^2 \le z_1 \mid x_2)\big] = E\big[P(y_2 \le z_2 \mid x_2)\big] \tag{A1.7} $$

or equivalently,

$$ E\big[P(\beta_1' x_2 + \mu_1 + \varepsilon_1 \le z_1)\big] = P(y_2 \le z_2) \tag{A1.8} $$

We can estimate the left-hand side of equality (A1.8) by plugging in the estimated parameters $\hat\beta_1$, $\hat\mu_1$, and $\hat\varepsilon_1$:

$$ E\big[P(\hat\beta_1' x_2 + \hat\mu_1 + \hat\varepsilon_1 \le z_1)\big] = P(y_2 \le z_2) \tag{A1.9} $$

Since we do not know exactly the error terms $\hat\mu_1$ and $\hat\varepsilon_1$ in survey 1 that are associated with the characteristics $x_2$ in survey 2, we can simulate these error terms from their estimated distributions and approximate the left-hand side of equality (A1.9) as

$$ \frac{1}{S}\sum_{s=1}^{S} P\big(\hat y_{1,s}^{2} \le z_1\big) \xrightarrow{\text{asymptotically}} E\big[P(\hat\beta_1' x_2 + \hat\mu_1 + \hat\varepsilon_1 \le z_1)\big] \tag{A1.10} $$

where $\hat y_{1,s}^{2} = \hat\beta_1' x_2 + \tilde\mu_{1,s} + \tilde\varepsilon_{1,s}$, and $\tilde\mu_{1,s}$ and $\tilde\varepsilon_{1,s}$ represent the $s$-th random draw from the estimated distributions of $\hat\mu_1$ and $\hat\varepsilon_1$ (see, e.g., Gourieroux and Monfort, 1997). The number of simulations S should thus be large enough for (A1.10) to hold.

ii) The proposed variance formula is based on the total variance formula given in equality (5.20) of Little and Rubin (2002),30 where

$$ V(\hat P_2) = \frac{1}{S}\sum_{s=1}^{S} V(\hat P_{2,s} \mid x_2) + V\Big(\frac{1}{S}\sum_{s=1}^{S} \hat P_{2,s} \,\Big|\, x_2\Big) + \frac{1}{S}\,V\Big(\frac{1}{S}\sum_{s=1}^{S} \hat P_{2,s} \,\Big|\, x_2\Big) \tag{A1.11} $$

29 See, e.g., theorem 4.4.3 in Casella and Berger (2002).
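The simulation in (A1.10) and the variance combination in (A1.11) can be illustrated with a short Python sketch. It is not the authors' code, and several pieces are assumptions: the cluster effects $\mu$ and household errors $\varepsilon$ are drawn from normal distributions with standard deviations of the same magnitude as $\sigma_u$ and $\sigma_e$ for Model 6 in Table 2.1; the fitted values $\hat\beta_1' x_2$ (200 households in 20 clusters) and the log poverty line are hypothetical; the within-round variance is approximated by $\hat P(1-\hat P)/n$, as if survey 2 were a simple random sample rather than a clustered, stratified one; and the second term of (A1.11) is read as the between-simulation variance of $\hat P_{2,s}$, in line with Rubin's combining rule.

```python
import math
import random
import statistics


def normal_cdf(v):
    """Standard normal cdf, used only to check the simulated rate."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def simulate_poverty_rates(households, sigma_u, sigma_e, z, n_sims, seed=0):
    """Return the poverty headcount from each simulation round s = 1, ..., S.

    households: list of (cluster_id, xb) pairs, where xb is the fitted log
    consumption beta1_hat' x2 of a survey-2 household (hypothetical format).
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sims):
        cluster_effect = {}  # one draw of mu per cluster, shared by its households
        poor = 0
        for cluster, xb in households:
            if cluster not in cluster_effect:
                cluster_effect[cluster] = rng.gauss(0.0, sigma_u)
            y = xb + cluster_effect[cluster] + rng.gauss(0.0, sigma_e)
            if y <= z:
                poor += 1
        rates.append(poor / len(households))
    return rates


def combine_variance(rates, n):
    """Rubin-style total variance of the averaged poverty rate.

    The within-round variance of each rate is approximated by p(1 - p) / n,
    i.e. simple random sampling (an assumption; the paper adjusts for
    clustering and stratification). Returns (W + B + B/S, W + B).
    """
    s = len(rates)
    within = statistics.fmean([p * (1.0 - p) / n for p in rates])
    between = statistics.variance(rates)
    return within + between + between / s, within + between


# Hypothetical survey 2: 20 clusters of 10 households with fitted log consumption.
gen = random.Random(1)
households = [(i // 10, gen.gauss(7.0, 0.4)) for i in range(200)]
z = 6.8  # hypothetical poverty line on the log scale
sigma_u, sigma_e = 0.11, 0.29

rates = simulate_poverty_rates(households, sigma_u, sigma_e, z, n_sims=500)
p_hat = statistics.fmean(rates)  # left-hand side of (A1.10)
total_var, large_s_var = combine_variance(rates, len(households))

# With normal errors, household i is poor with probability
# Phi((z - xb_i) / sqrt(sigma_u^2 + sigma_e^2)); the simulated mean should be close.
sigma = math.hypot(sigma_u, sigma_e)
analytic = statistics.fmean([normal_cdf((z - xb) / sigma) for _, xb in households])
print(round(p_hat, 3), round(analytic, 3), round(math.sqrt(total_var), 4))
```

The gap between the two variance figures is B/S, so it shrinks as `n_sims` grows; this is the sense in which the third term of (A1.11) can be dropped when S is large.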
30 Note that even though Rubin derives this variance formula for use with a Bayesian approach, it also fits the standard frequentist framework (see, e.g., Rubin, 1987, pp. 67-68).

When S tends to infinity (or is, in practice, large enough), the third term on the right-hand side of equality (A1.11) vanishes, and the stated result follows.

Corollary 1.2

Given the stricter assumption of constant parameters in place of Assumption 2, that is, $\beta_1 \equiv \beta_2$, $\mu_1 \equiv \mu_2$, and $\varepsilon_1 \equiv \varepsilon_2$, it follows that

$$ y_1^2 = \beta_1' x_2 + \mu_1 + \varepsilon_1 \equiv \beta_2' x_2 + \mu_2 + \varepsilon_2 = y_2 \tag{A1.12} $$

which leads to

$$ W(y_1^2) = W(y_2) \tag{A1.13} $$

where W(.) is a very general welfare function that only needs to satisfy the one-to-one mapping condition from the range of $y_1^2$ to that of $y_2$. For example, W(.) can be a cumulative distribution function (cdf):

$$ F(y_1^2 \le k) = F(y_2 \le k) \tag{A1.14} $$

with k being any given constant. Clearly, this cdf includes the poverty function as a special case when k equals the poverty line z. As another example, W(.) can be an expectation function:

$$ E(y_1^2) = E(y_2) \tag{A1.15} $$

Thus Assumption 2 is less restrictive and allows for the more general case where equality (A1.12) may or may not hold.

Proposition 2

Using general matrix notation for the population, where $Y_j$ and $X_j$ are $n_j \times 1$ and $n_j \times k$ respectively, $\beta_j$ is $k \times 1$, and $\eta_j$ is $n_j \times 1$ and represents the vector of error terms, we can break down $X_j$ into two components: the observed variables $X_{j1}$ ($n_j \times k_1$) and the unobserved variables $X_{j2}$ ($n_j \times k_2$), for $k_1 + k_2 = k$ and $j = 1, 2$. We can rewrite equations (1) and (2) in a general format as

$$ Y_j = X_{j1}'\beta_{j1} + X_{j2}'\beta_{j2} + \eta_j \tag{A1.16} $$

where $\beta_{j1}$ and $\beta_{j2}$ are $k_1 \times 1$ and $k_2 \times 1$, respectively.
If imputed correctly, the predicted household consumption in survey 2 should be

$$ \hat Y_2^* = X_{21}'\hat\beta_{11} + X_{22}'\hat\beta_{12} + \hat\eta_1 \tag{A1.17} $$

However, since $X_{j2}$ are unobserved, the second term on the right-hand side of equation (A1.17) is absorbed into the error terms, and the predicted household consumption is instead

$$ \hat Y_2 = X_{21}'\hat\beta_{11} + \hat\nu_1 + \hat\eta_1 \tag{A1.18} $$

with $\hat\nu_1 = X_{12}'\hat\beta_{12}$. Subtracting equation (A1.18) from equation (A1.17), we have

$$ \hat Y_2^* - \hat Y_2 = (X_{22} - X_{12})'\hat\beta_{12} \tag{A1.19} $$

Assuming $\hat\beta_{12}$ consists of positive coefficients (e.g., on household assets), if the trend in the unobserved variables is positive, i.e., $(X_{22} - X_{12}) \ge 0$, we have

$$ P(\hat Y_2^*) \le P(\hat Y_2) \tag{A1.20} $$

since the poverty function P(.) is non-increasing in household consumption. The opposite holds if the trend in the unobserved variables is negative, given the same assumption of positive coefficients in $\hat\beta_{12}$.

Proposition 3

i) We want to show that the transformed variable $x_{2\to1,t}$ in survey 2 has the same distribution as $x_{1t}$ in survey 1 at time t. Assuming that $x_{1t} \sim N(\mu_{1t}, \sigma_{1t}^2)$ and $x_{2t} \sim N(\mu_{2t}, \sigma_{2t}^2)$, the mean, or first moment, of the standardized variable $x_{2\to1,t}$ is

$$ E(x_{2\to1,t}) = E\Big[(x_{2t} - \mu_{2t})\frac{\sigma_{1t}}{\sigma_{2t}} + \mu_{1t}\Big] = \mu_{1t} + \frac{\sigma_{1t}}{\sigma_{2t}}E(x_{2t} - \mu_{2t}) = \mu_{1t} = E(x_{1t}) \tag{A1.21} $$

where the next-to-last equality holds since, by definition, $E(x_{2t} - \mu_{2t}) = 0$. Its variance, or second central moment, is

$$ V(x_{2\to1,t}) = V\Big[(x_{2t} - \mu_{2t})\frac{\sigma_{1t}}{\sigma_{2t}} + \mu_{1t}\Big] = \frac{\sigma_{1t}^2}{\sigma_{2t}^2}V(x_{2t} - \mu_{2t}) = \sigma_{1t}^2 = V(x_{1t}) \tag{A1.22} $$

where the next-to-last equality holds since, by definition, $V(x_{2t} - \mu_{2t}) = \sigma_{2t}^2$. Since $x_2$ is assumed to be normally distributed, so is its linear transformation $x_{2\to1,t}$.31 Since the first and second moments completely determine the distribution of a normally distributed variable, $x_{2\to1,t}$ and $x_1$ have identical distributions.
In fact, strictly speaking, the assumption of normality is more restrictive than necessary; we can assume more generally that the distributions of $x_1$ and $x_2$ belong to the same location-scale family (see, e.g., Casella and Berger, 2002, p. 104).

ii) Similarly, we want to show that the transformed variable $x_{2\to1,t'}$ in survey 2 has the same distribution as $x_{1t'}$ in survey 1 at time t'. The assumption that the changes in the variables x between time t and time t' are the same for survey 1 and survey 2 is equivalent to

$$ \frac{\sigma_{1t}}{\sigma_{2t}}(\mu_{2t'} - \mu_{2t}) = \mu_{1t'} - \mu_{1t} \tag{A1.23} $$

The mean of the standardized variable $x_{2\to1,t'}$ is

$$ E(x_{2\to1,t'}) = E\Big[(x_{2t'} - \mu_{2t})\frac{\sigma_{1t}}{\sigma_{2t}} + \mu_{1t}\Big] = \mu_{1t} + \frac{\sigma_{1t}}{\sigma_{2t}}(\mu_{2t'} - \mu_{2t}) = \mu_{1t'} = E(x_{1t'}) \tag{A1.24} $$

where the next-to-last equality holds by equality (A1.23). The variance of the standardized variable $x_{2\to1,t'}$ is

$$ V(x_{2\to1,t'}) = V\Big[\Big(x_{2t'}\frac{\sigma_{2t}}{\sigma_{2t'}} - \mu_{2t}\Big)\frac{\sigma_{1t}}{\sigma_{2t}} + \mu_{1t}\Big] = \frac{\sigma_{1t}^2\,\sigma_{2t}^2}{\sigma_{2t}^2\,\sigma_{2t'}^2}V(x_{2t'}) = \sigma_{1t}^2 = V(x_{1t'}) \tag{A1.25} $$

where the last equality holds given our assumption that the variables x in different rounds of the same survey are on the same scale, i.e., $\sigma_{1t}^2 = \sigma_{1t'}^2$.

31 See, e.g., theorem 5.6.4 in DeGroot and Schervish (2012).
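The standardization in Proposition 3(i), preceded by the Box-Cox step that the table notes describe for household size and age, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation, and its choices are assumptions: λ is chosen by a grid search over the profile log-likelihood $\ell(\lambda) = -\tfrac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log x_i$ (unweighted, with a separate λ for each survey), the survey weights are hypothetical, and the lognormal data are simulated. Matching the weighted mean and standard deviation only equates the first two moments; as the proof notes, that pins down the whole distribution when both variables belong to the same location-scale family.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def fit_box_cox_lambda(x, grid=None):
    """Choose lambda by maximizing the Box-Cox profile log-likelihood on a grid."""
    if grid is None:
        grid = [i / 50 for i in range(-100, 101)]  # -2.00, -1.98, ..., 2.00
    n = len(x)
    sum_log_x = sum(math.log(v) for v in x)

    def loglik(lam):
        y = box_cox(x, lam)
        mean = sum(y) / n
        var = sum((v - mean) ** 2 for v in y) / n
        return -0.5 * n * math.log(var) + (lam - 1.0) * sum_log_x

    return max(grid, key=loglik)


def weighted_mean_sd(x, w):
    """Weighted mean and standard deviation (weights normalized to sum to 1)."""
    total = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, math.sqrt(var)


def standardize_to_reference(x2, w2, mu1, sigma1):
    """x_{2->1} = (x2 - mu2) * sigma1 / sigma2 + mu1, as in (A1.21)-(A1.22)."""
    mu2, sigma2 = weighted_mean_sd(x2, w2)
    return [(v - mu2) * sigma1 / sigma2 + mu1 for v in x2]


# Hypothetical right-skewed variable in a reference survey (1) and a target survey (2).
gen = random.Random(7)
x1 = [gen.lognormvariate(1.8, 0.5) for _ in range(2000)]
x2 = [gen.lognormvariate(1.7, 0.45) for _ in range(2000)]
w1 = [gen.uniform(0.5, 2.0) for _ in x1]  # hypothetical survey weights
w2 = [gen.uniform(0.5, 2.0) for _ in x2]

lam1, lam2 = fit_box_cox_lambda(x1), fit_box_cox_lambda(x2)
t1, t2 = box_cox(x1, lam1), box_cox(x2, lam2)  # transform toward normality first

mu1, sigma1 = weighted_mean_sd(t1, w1)
t2_on_1 = standardize_to_reference(t2, w2, mu1, sigma1)
print(lam1, lam2, weighted_mean_sd(t2_on_1, w2), (mu1, sigma1))
```

Because the simulated data are lognormal, both fitted λ values should land near 0 (a log transform), and the standardized survey-2 variable reproduces the survey-1 weighted mean and standard deviation up to floating-point error.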
Appendix 2: Additional Tables

Table 2.1: Estimation of Consumption Model Using the HEISs, Jordan 2008

                                    Model 1    Model 2    Model 3    Model 4    Model 5    Model 6    Model 7
Household size                      -0.123***  -0.093***  -0.096***  -0.104***  -0.102***  -0.066***  -0.080***
                                    (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Head's age                          0.023***   0.013***   0.015***   0.002      0.002      0.009***   0.001
                                    (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Head's age squared                  -0.000***  -0.000***  -0.000***  -0.000*    -0.000*    -0.000***  -0.000
                                    (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Head is male                        -0.074***  -0.059***  -0.074***  -0.090***  -0.089***  -0.040***  -0.066***
                                    (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Head is Jordanian                   0.076***   0.064***   0.055***   0.022      0.004      -0.020     -0.028**
                                    (0.02)     (0.02)     (0.02)     (0.01)     (0.01)     (0.02)     (0.01)
Head's highest years of schooling   0.035***   0.035***   0.034***   0.012***   0.011***   0.019***   0.006***
  completed                         (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Share of household members                     -0.764***  -0.718***  -0.703***  -0.713***  -0.401***  -0.508***
  age 0-14                                     (0.03)     (0.03)     (0.03)     (0.03)     (0.03)     (0.03)
Share of household members                     -0.192***  -0.232***  -0.297***  -0.303***  -0.090***  -0.188***
  age 15-24                                    (0.03)     (0.03)     (0.03)     (0.03)     (0.03)     (0.02)
Share of household members                     -0.131***  -0.251***  -0.298***  -0.302***  -0.130***  -0.209***
  age 25-59                                    (0.03)     (0.03)     (0.02)     (0.02)     (0.02)     (0.02)
Head worked in past 7 days                                -0.005     0.003      0.001      0.019**    0.017**
                                                          (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Share of household members                                0.247***   0.188***   0.180***   -0.283***  -0.154***
  working in past 7 days                                  (0.03)     (0.02)     (0.02)     (0.03)     (0.02)
Household has at least one female                         0.089***   0.002      -0.004     0.054***   -0.004
  member working in past 7 days                           (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Household has at least one                                0.205***   0.065***   0.059***   0.096***   0.021*
  member working as employer                              (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Household has at least one                                0.028**    0.013      0.011      0.019*     0.011
  member self-employed                                    (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Construction material for the                                        0.041***   0.039***              0.028***
  outside walls of the building                                      (0.00)     (0.00)                (0.00)
Number of rooms in the house                                         0.084***   0.070***              0.048***
                                                                     (0.00)     (0.00)                (0.00)
Main source of drinking water                                        0.025***   0.022***              0.020***
                                                                     (0.00)     (0.00)                (0.00)
Household owns a car                                                 0.211***   0.203***              0.169***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a computer                                            0.023***   0.004                 -0.005
                                                                     (0.01)     (0.01)                (0.01)
Household owns a television                                          0.151***   0.101***              0.095***
                                                                     (0.03)     (0.03)                (0.03)
Household owns a desk phone                                          0.068***   0.053***              0.038***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a cell phone                                          0.107***   0.084***              0.065***
                                                                     (0.01)     (0.01)                (0.01)
Household has internet connection                                    0.077***   0.073***              0.050***
                                                                     (0.01)     (0.01)                (0.01)
Household owns an air conditioner                                    0.113***   0.098***              0.070***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a microwave                                           0.064***   0.043***              0.038***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a water filter                                        0.054***   0.038***              0.030***
                                                                     (0.01)     (0.01)                (0.01)
Log of income per capita                                                                   0.413***   0.270***
                                                                                           (0.01)     (0.01)
Urban                               0.116***   0.123***   0.106***   0.046***   0.031***   0.103***   0.049***
                                    (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Constant                            6.660***   7.226***   7.209***   7.071***   7.014***   4.457***   5.233***
                                    (0.05)     (0.05)     (0.05)     (0.05)     (0.10)     (0.06)     (0.10)
σe                                  0.35       0.33       0.33       0.28       0.28       0.29       0.26
σu                                  0.18       0.17       0.15       0.11       0.10       0.11       0.09
ρ                                   0.20       0.21       0.18       0.12       0.12       0.13       0.11
R2                                  0.49       0.54       0.56       0.72       0.73       0.70       0.77
N                                   10956      10956      10936      10936      10936      10791      10791

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are in parentheses. All models are estimated as cluster random-effects models. Model 5 and Model 7 add to Model 4 the types of house, the energy sources used for cooking, and dummy variables indicating whether the household has a radio, camera, satellite dish or cable, video player, fax machine, solar boiler, freezer, fridge, washing machine, oven, gas-operated oven, dishwasher, vacuum cleaner, and sewing machine.
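The ρ row in Table 2.1 is consistent with the conventional random-effects definition of the share of the composite error variance due to the cluster effect, $\rho = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. That definition is an assumption here, since the table note does not state it. The short Python check below recomputes ρ from the rounded $\sigma_u$ and $\sigma_e$ reported for each model; the recomputed values agree with the reported ρ to within about 0.015, a gap that the two-decimal rounding of $\sigma_u$ and $\sigma_e$ can account for.

```python
def intraclass_correlation(sigma_u, sigma_e):
    """Share of the composite error variance attributable to the cluster effect."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)


# (sigma_u, sigma_e, reported rho) for each model in Table 2.1 (HEIS 2008)
TABLE_2_1 = {
    "Model 1": (0.18, 0.35, 0.20),
    "Model 2": (0.17, 0.33, 0.21),
    "Model 3": (0.15, 0.33, 0.18),
    "Model 4": (0.11, 0.28, 0.12),
    "Model 5": (0.10, 0.28, 0.12),
    "Model 6": (0.11, 0.29, 0.13),
    "Model 7": (0.09, 0.26, 0.11),
}

for model, (s_u, s_e, reported) in TABLE_2_1.items():
    recomputed = intraclass_correlation(s_u, s_e)
    print(f"{model}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```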
Table 2.2: Estimation of Consumption Model Using the HEISs, Jordan 2010

                                    Model 1    Model 2    Model 3    Model 4    Model 5    Model 6    Model 7
Household size                      -0.133***  -0.098***  -0.101***  -0.110***  -0.107***  -0.069***  -0.082***
                                    (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Head's age                          0.027***   0.015***   0.016***   0.004***   0.004**    0.010***   0.004**
                                    (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Head's age squared                  -0.000***  -0.000***  -0.000***  -0.000***  -0.000***  -0.000***  -0.000***
                                    (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Head is male                        -0.080***  -0.077***  -0.103***  -0.108***  -0.111***  -0.049***  -0.071***
                                    (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Head is Jordanian                   0.057***   0.049**    0.055***   -0.010     -0.019     -0.011     -0.041***
                                    (0.02)     (0.02)     (0.02)     (0.02)     (0.02)     (0.02)     (0.02)
Head's highest years of schooling   0.037***   0.037***   0.036***   0.013***   0.012***   0.018***   0.006***
  completed                         (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)     (0.00)
Share of household members                     -0.844***  -0.808***  -0.731***  -0.752***  -0.389***  -0.477***
  age 0-14                                     (0.04)     (0.03)     (0.03)     (0.03)     (0.03)     (0.03)
Share of household members                     -0.260***  -0.291***  -0.313***  -0.321***  -0.087***  -0.165***
  age 15-24                                    (0.03)     (0.03)     (0.03)     (0.03)     (0.03)     (0.02)
Share of household members                     -0.185***  -0.278***  -0.288***  -0.296***  -0.119***  -0.177***
  age 25-59                                    (0.03)     (0.03)     (0.02)     (0.02)     (0.02)     (0.02)
Head worked in past 7 days                                0.012      0.017*     0.014      0.025***   0.025***
                                                          (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Share of household members                                0.171***   0.133***   0.125***   -0.329***  -0.213***
  working in past 7 days                                  (0.03)     (0.02)     (0.02)     (0.03)     (0.02)
Household has at least one female                         0.070***   -0.010     -0.017*    0.054***   -0.003
  member working in past 7 days                           (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Household has at least one                                0.236***   0.082***   0.077***   0.143***   0.056***
  member working as employer                              (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Household has at least one                                0.036***   0.002      0.001      0.047***   0.019**
  member self-employed                                    (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Construction material for the                                        0.054***   0.050***              0.038***
  outside walls of the building                                      (0.00)     (0.00)                (0.00)
Number of rooms in the house                                         0.086***   0.076***              0.052***
                                                                     (0.00)     (0.00)                (0.00)
Main source of drinking water                                        0.012***   0.011***              0.009***
                                                                     (0.00)     (0.00)                (0.00)
Household owns a car                                                 0.186***   0.180***              0.137***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a computer                                            0.038***   0.026***              0.014**
                                                                     (0.01)     (0.01)                (0.01)
Household owns a television                                          0.128***   0.052                 0.066**
                                                                     (0.03)     (0.03)                (0.03)
Household owns a desk phone                                          0.075***   0.062***              0.040***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a cell phone                                          0.105***   0.077***              0.051***
                                                                     (0.02)     (0.02)                (0.02)
Household has internet connection                                    0.065***   0.056***              0.037***
                                                                     (0.01)     (0.01)                (0.01)
Household owns an air conditioner                                    0.101***   0.088***              0.054***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a microwave                                           0.059***   0.038***              0.032***
                                                                     (0.01)     (0.01)                (0.01)
Household owns a water filter                                        0.034***   0.023***              0.024***
                                                                     (0.01)     (0.01)                (0.01)
Log of income per capita                                                                   0.452***   0.315***
                                                                                           (0.01)     (0.01)
Urban                               0.061***   0.067***   0.044***   -0.019*    -0.038***  0.034***   -0.022**
                                    (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)     (0.01)
Constant                            6.761***   7.455***   7.430***   7.182***   6.808***   4.257***   4.748***
                                    (0.05)     (0.05)     (0.06)     (0.06)     (0.12)     (0.06)     (0.12)
σe                                  0.36       0.35       0.34       0.30       0.30       0.29       0.27
σu                                  0.19       0.19       0.18       0.12       0.12       0.12       0.10
ρ                                   0.22       0.23       0.21       0.15       0.14       0.15       0.13
R2                                  0.49       0.53       0.55       0.71       0.72       0.71       0.77
N                                   11176      11176      11142      11142      11142      10908      10908

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are in parentheses. All models are estimated as cluster random-effects models. Model 5 and Model 7 add to Model 4 the types of house, the energy sources used for cooking, and dummy variables indicating whether the household has a radio, camera, satellite dish or cable, video player, fax machine, solar boiler, freezer, fridge, washing machine, oven, gas-operated oven, dishwasher, vacuum cleaner, and sewing machine.
Table 2.3: Summary Statistics for the HEIS, Jordan 2008-2010

                                    2008       2010       Difference
Household size                      6.66       6.32       -0.35***
                                    (2.44)     (2.23)     (0.05)
Head's age                          47.86      48.38      0.48*
                                    (12.81)    (12.82)    (0.25)
Head's age squared                  2454.50    2504.47    47.03*
                                    (1320.48)  (1338.54)  (24.91)
Head is male                        0.92       0.91       -0.01**
                                    (0.27)     (0.29)     (0.00)
Head is Jordanian                   0.96       0.96       0.01
                                    (0.20)     (0.19)     (0.01)
Head's highest years of schooling   9.54       9.72       0.18*
  completed                         (4.85)     (4.66)     (0.11)
Share of household members          0.36       0.34       -0.01***
  age 0-14                          (0.25)     (0.25)     (0.00)
Share of household members          0.22       0.23       0.00
  age 15-24                         (0.21)     (0.21)     (0.00)
Share of household members          0.35       0.36       0.00*
  age 25-59                         (0.16)     (0.16)     (0.00)
Head worked in past 7 days          0.62       0.61       -0.01
                                    (0.48)     (0.49)     (0.01)
Share of household members          0.21       0.22       0.01***
  working in past 7 days            (0.16)     (0.16)     (0.00)
Household has at least one female   0.16       0.17       0.01*
  member working in past 7 days     (0.36)     (0.38)     (0.01)
Household has at least one          0.07       0.09       0.02***
  member working as employer        (0.25)     (0.28)     (0.01)
Household has at least one          0.11       0.12       0.01
  member self-employed              (0.32)     (0.33)     (0.01)
Construction material for the       4.66       4.69       0.03
  outside walls of the building     (1.15)     (1.17)     (0.03)
Number of rooms in the house        3.96       4.06       0.10***
                                    (1.29)     (1.37)     (0.03)
Main source of drinking water       1.78       2.53       0.76***
                                    (1.51)     (2.23)     (0.04)
Household owns a car                0.41       0.48       0.07***
                                    (0.49)     (0.50)     (0.01)
Household owns a computer           0.39       0.49       0.10***
                                    (0.49)     (0.50)     (0.01)
Household owns a television         0.99       0.99       0.00
                                    (0.10)     (0.10)     (0.00)
Household owns a desk phone         0.34       0.25       -0.09***
                                    (0.47)     (0.43)     (0.01)
Household owns a cell phone         0.96       0.98       0.03***
                                    (0.21)     (0.12)     (0.00)
Household has internet connection   0.08       0.15       0.07***
                                    (0.27)     (0.36)     (0.01)
Household owns an air conditioner   0.08       0.12       0.05***
                                    (0.26)     (0.33)     (0.01)
Household owns a microwave          0.22       0.37       0.15***
                                    (0.41)     (0.48)     (0.01)
Household owns a water filter       0.17       0.26       0.09***
                                    (0.38)     (0.44)     (0.01)
Log of income per capita            6.94       7.12       0.18***
                                    (0.67)     (0.68)     (0.02)
Urban                               0.82       0.83       0.01
                                    (0.39)     (0.38)     (0.02)
N                                   10,936     11,142

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard deviations (for the 2008 and 2010 columns) and standard errors (for the differences) are in parentheses. Differences are estimated with t-tests that take into account complex survey design with cluster sampling and stratification.
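The reason the note stresses the survey design can be seen from the household size row of Table 2.3. A naive standard error for the difference in means, computed from the reported standard deviations and sample sizes as if both rounds were simple random samples, is about 0.031. The reported design-based standard error is 0.05, so the implied design effect (the ratio of the two variances) is roughly 2.5, or between about 2 and 3 once rounding of the reported 0.05 is allowed for. The Python sketch below reproduces this back-of-the-envelope arithmetic. It is not the clustered, stratified t-test used to build the table.

```python
import math


def naive_se_of_difference(sd1, n1, sd2, n2):
    """SE of a difference in means under simple random sampling (no clustering)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Household size row of Table 2.3: standard deviations and sample sizes
sd_2008, n_2008 = 2.44, 10_936
sd_2010, n_2010 = 2.23, 11_142
design_se = 0.05    # reported design-based SE (rounded)
difference = -0.35  # reported difference, 2010 minus 2008

naive_se = naive_se_of_difference(sd_2008, n_2008, sd_2010, n_2010)
design_effect = (design_se / naive_se) ** 2
t_stat = difference / design_se

print(f"naive SE {naive_se:.4f}, design effect {design_effect:.2f}, t {t_stat:.1f}")
```

Ignoring the clustering would thus understate the standard error of this difference by a factor of roughly 1.6, although here the design-based t statistic of about -7 is large either way.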