Reissue Edition with a New Preface

The Analysis of Household Surveys
A Microeconometric Approach to Development Policy

Angus Deaton
Winner of the 2015 Nobel Prize in Economics

© 2018 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW, Washington, DC 20433
Telephone: 202-473-1000; Internet: www.worldbank.org

Some rights reserved

This work is a product of the staff of The World Bank with external contributions. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.

Nothing herein shall constitute or be considered to be a limitation upon or waiver of the privileges and immunities of The World Bank, all of which are specifically reserved.

References in this publication to “Taiwan,” “Republic of China,” and “Taiwan (China)” refer to the region “Taiwan, China.” References to “Hong Kong” refer to the region “Hong Kong SAR, China.”

Rights and Permissions

This work is available under the Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO) http://creativecommons.org/licenses/by/3.0/igo. Under the Creative Commons Attribution license, you are free to copy, distribute, transmit, and adapt this work, including for commercial purposes, under the following conditions:

Attribution: Please cite the work as follows: Deaton, Angus. 2018. The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. Reissue Edition with a New Preface. Washington, DC: World Bank. doi:10.1596/978-1-4648-1331-3. License: Creative Commons Attribution CC BY 3.0 IGO

Translations: If you create a translation of this work, please add the following disclaimer along with the attribution: This translation was not created by The World Bank and should not be considered an official World Bank translation. The World Bank shall not be liable for any content or error in this translation.

Adaptations: If you create an adaptation of this work, please add the following disclaimer along with the attribution: This is an adaptation of an original work by The World Bank. Views and opinions expressed in the adaptation are the sole responsibility of the author or authors of the adaptation and are not endorsed by The World Bank.

Third-party content: The World Bank does not necessarily own each component of the content contained within the work. The World Bank therefore does not warrant that the use of any third-party-owned individual component or part contained in the work will not infringe on the rights of those third parties. The risk of claims resulting from such infringement rests solely with you. If you wish to re-use a component of the work, it is your responsibility to determine whether permission is needed for that re-use and to obtain permission from the copyright owner. Examples of components can include, but are not limited to, tables, figures, or images.

All queries on rights and licenses should be addressed to World Bank Publications, The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA; e-mail: pubrights@worldbank.org.

ISBN (paper): 978-1-4648-1331-3
ISBN (electronic): 978-1-4648-1352-8
DOI: 10.1596/978-1-4648-1331-3

Cover design: Bill Pragluski, Critical Stages.

Library of Congress Cataloging-in-Publication Data has been requested.
Contents

Preface
Introduction
  Purpose and intended audience
  Policy and data: methodological issues
  Structure and outline
Chapter 1: The design and content of household surveys
  1.1 Survey design
    Survey frames and coverage
    Strata and clusters
    Unequal selection probabilities, weights, and inflation factors
    Sample design in theory and practice
    Panel data
  1.2 The content and quality of survey data
    Individuals and households
    Reporting periods
    Measuring consumption
    Measuring income
  1.3 The Living Standards Surveys
    A brief history
    Design features of LSMS surveys
    What have we learned?
  1.4 Descriptive statistics from survey data
    Finite populations and superpopulations
    The sampling variance of the mean
    Using weights and inflation factors
    Sampling variation of probability-weighted estimators
    Stratification
    Two-stage sampling and clusters
    A superpopulation approach to clustering
    Illustrative calculations for Pakistan
    The bootstrap
  1.5 Guide to further reading
Chapter 2: Econometric issues for survey data
  2.1 Survey design and regressions
    Weighting in regressions
    Recommendations for practice
  2.2 The econometrics of clustered samples
    The economics of clusters in developing countries
    Estimating regressions from clustered samples
  2.3 Heteroskedasticity and quantile regressions
    Heteroskedasticity in regression analysis
    Quantile regressions
    Calculating quantile regressions
    Heteroskedasticity and limited dependent variable models
    Robust estimation of censored regression models
    Radical approaches to censored regressions
  2.4 Structure and regression in nonexperimental data
    Simultaneity, feedback, and unobserved heterogeneity
    Example 1. Prices and quantities in local markets
    Example 2. Farm size and farm productivity
    Example 3. The evaluation of projects
    Example 4. Simultaneity and lags: nutrition and productivity
    Measurement error
    Selectivity issues
  2.5 Panel data
    Dealing with heterogeneity: difference- and within-estimation
    Panel data and measurement error
    Lagged dependent variables and exogeneity in panel data
  2.6 Instrumental variables
    Policy evaluation and natural experiments
    Econometric issues for instrumental variables
  2.7 Using a time-series of cross-sections
    Cohort data: an example
    Cohort data versus panel data
    Panel data from successive cross sections
    Decompositions by age, cohort, and year
  2.8 Two issues in statistical inference
    Parameter transformations: the delta method
    Sample size and hypothesis tests
  2.9 Guide to further reading
Chapter 3: Welfare, poverty, and distribution
  3.1 Living standards, inequality, and poverty
    Social welfare
    Inequality and social welfare
    Measures of inequality
    Poverty and social welfare
    The construction of poverty lines
    Measures of poverty
    The choice of the individual welfare measure
    Example 1. Inequality and poverty over time in Côte d’Ivoire
    Example 2. Inequality and poverty by race in South Africa
    Exploring the welfare distribution: inequality
    Lorenz curves and inequality in South Africa and Côte d’Ivoire
    Stochastic dominance
    Exploring the welfare distribution: poverty
  3.2 Nonparametric methods for estimating densities
    Estimating univariate densities: histograms
    Estimating univariate densities: kernel estimators
    Estimating univariate densities: examples
    Extensions and alternatives
    Estimating bivariate densities: examples
  3.3 Analyzing the distributional effects of policy
    Rice prices and distribution in Thailand
    The distributional effects of price changes: theory
    Implementing the formulas: the production and consumption of rice
    Nonparametric regression analysis
    Nonparametric regressions for rice in Thailand
    Bias in kernel regression: locally weighted regression
    The distributional effects of the social pension in South Africa
  3.4 Guide to further reading
Chapter 4: Nutrition, children, and intrahousehold allocation
  4.1 The demand for food and nutrition
    Welfare measures: economic or nutritional?
    Nutrition and productivity
    The expenditure elasticity of nutrition: background
    Evidence from India and Pakistan
    Regression functions and regression slopes for Maharashtra
    Allowing for household structure
    The effect of measurement errors
  4.2 Intrahousehold allocation and gender bias
    Gender bias in intrahousehold allocation
    A theoretical digression
    Adults, children, and gender
    Empirical evidence from India
    Boys versus girls in rural Maharashtra: methodology
    Standard errors for outlay equivalent ratios
    Boys versus girls in rural Maharashtra: results
    Côte d’Ivoire, Thailand, Bangladesh, and Taiwan (China)
  4.3 Equivalence scales: theory and practice
    Equivalence scales, welfare, and poverty
    The relevance of household expenditure data
    Cost-of-living indices, consumers’ surplus, and utility theory
    Calculating the welfare effect of price
    Equivalence scales, the cost of children, and utility theory
    The underidentification of equivalence scales
    Engel’s method
    Rothbarth’s method
    Other models of equivalence scales
    Economies of scale within the household
    Utility theory and the identification of economies of scale
  4.4 Guide to further reading
Chapter 5: Looking at price and tax reform
  5.1 The theory of price and tax reform for developing countries
    Tax reform
    Generalizations using shadow prices
    Evaluation of nonbehavioral terms
    Alternative approaches to measuring behavioral responses
  5.2 The analysis of spatial price variation
    Regional price data
    Household price data
    Unit values and the choice of quality
    Measurement error in unit values
  5.3 Modeling the choice of quality and quantity
    A stripped-down model of demand and unit values
    Modeling quality
    Estimating the stripped-down model
    An example from Côte d’Ivoire
    Functional form
    Quality, quantity, and welfare: cross-price effects
    Cross-price effects: estimation
    Completing the system
  5.4 Empirical results for India and Pakistan
    Preparatory analysis
    The first-stage estimates
    Price responses: the second-stage estimates for Pakistan
    Price estimates and taste variation, Maharashtra
  5.5 Looking at price and tax reform
    Shadow taxes and subsidies in Pakistan
    Shadow taxes and subsidies in India
    Adapting the price reform formulas
    Equity and efficiency in price reform in Pakistan
    Equity and efficiency in price reform in India
  5.6 Price reform: parametric and nonparametric analysis
  5.7 Guide to further reading
Chapter 6: Saving and consumption smoothing
  6.1 Life-cycle interpretations of saving
    Age profiles of consumption
    Consumption and saving by cohorts
    Estimating a life-cycle model for Taiwan (China)
  6.2 Short-term consumption smoothing and permanent income
    Saving and weather variability
    Saving as a predictor of income change?
  6.3 Models of saving for poor households
    The basic model of intertemporal choice
    Special cases: the permanent income and life-cycle models
    Further analysis of the basic model: precautionary saving
    Restrictions on borrowing
    Borrowing restrictions and the empirical evidence
  6.4 Social insurance and consumption
    Consumption insurance in theory
    Empirical evidence on consumption insurance
  6.5 Saving, consumption, and inequality
    Consumption, permanent income, and inequality
    Inequality and age: empirical evidence
    Aging and inequality
  6.6 Household saving and policy: a tentative review
    Motives, consequences, and policy
    Saving and growth
    Determinants of saving
  6.7 Guide to further reading
Code appendix
Bibliography
Subject index
Author index

Preface

It is a pleasure to write a new preface to The Analysis of Household Surveys on its twentieth birthday. It would be better still if this were a preface to a new edition, and I hope that one day I shall write it. In the meantime, I hope that this Preface might be a guide to a new reader, by labeling those parts of the book that seem still relevant, as well as those that would be leading candidates for revision or updating.

The origins of the book go back to the early days of the Living Standards Measurement Study (LSMS), which was set up in the World Bank around 1980. As its name suggests, the original idea was to promote household surveys that would enable the better measurement of poverty and of living standards around the world, something that was difficult to do with the data then available. As time went on, and people came and went, the LSMS surveys evolved into multi-purpose tools that would permit not only measurement, but also analysis, permitting a better understanding of how people’s lives work, what makes them tick, and why they are as well off or as poorly off as they are.
Hence “the analysis” of household surveys, not just the measurement of income, or consumption, or wellbeing. In the original conception, there was to be a series of volumes on different specific topics, with this volume being more general, though with examples of topics that could be covered using the approach.

Today, it is hard even to remember how relatively uncommon the analysis of individual household records was. Although there had been important very early studies, including the 1955 book on family budgets by Sigbert Prais and Hendrik Houthakker (which legend has it was the first economic analysis to use an electronic computer), and although micro analysis figured in some of the early textbooks, most economists were trained in econometric methods that explicitly or implicitly focused on aggregate time series. So there was a lot of important and survey-relevant material that remained uncovered. Students who had completed their econometrics courses and turned to household survey data found much that puzzled them.

Perhaps the most obvious gap in standard economics training was then (as it largely still is today) the topic of survey design, and how survey design should (if at all) be incorporated into analysis. Survey data come with “weights,” related to survey design (for example, they might be the reciprocals of the probabilities of selection), and many generations of economists have had to wonder for themselves what should be done with them. Of course, this is standard material for survey design statisticians, but those who design surveys do not work in the same way economists do and sometimes have different ways of thinking and different objectives. So, when I started writing the book, I wanted to understand these issues better for myself; I went back to the survey literature and wrote up what I found. Chapter 1 is still one of the few treatments of survey design from an economics perspective.
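To see why such weights matter for even the simplest statistic, here is a minimal sketch in Python. It is not the book's code; the function name `weighted_mean` and the numbers are hypothetical. It assumes the selection probability of every sampled household is known, weights each observation by the reciprocal of that probability, and uses the ratio form sum(w_i * y_i) / sum(w_i), which is one common probability-weighted estimator among several.

```python
def weighted_mean(values, selection_probs):
    """Probability-weighted estimate of a population mean.

    Each sampled unit gets weight 1 / (its selection probability), i.e. the
    number of population units it "stands for". Dividing by the sum of the
    weights (an estimate of the population size) gives the ratio form.
    """
    if len(values) != len(selection_probs):
        raise ValueError("values and selection_probs must have the same length")
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical sample: urban households (consumption 300) were selected with
# probability 0.02, rural households (consumption 100) with probability 0.005,
# so each rural observation represents four times as many households.
consumption = [300, 300, 100, 100]
probs = [0.02, 0.02, 0.005, 0.005]

print(sum(consumption) / len(consumption))  # unweighted mean: 200.0
print(weighted_mean(consumption, probs))    # weighted mean: 140.0
```

The unweighted mean overstates average consumption because the better-off urban households are overrepresented in this sample; weighting restores their population share. Whether weights should also be used in regressions is a separate and harder question, which the book takes up in Chapter 2.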
One important change is that, at around the time I was writing, STATA introduced a full suite of survey-design-based econometric software, so analysts no longer need to do what I had to do, which was to write my own code.

Thinking about survey design forces the analyst to confront the difference between sample and population, and to think seriously about the population to which the analysis is supposed to apply. For survey samplers, the aim is often the estimation of some characteristic of a finite population, such as the median consumption of Indian families in 2015, something that could be known with certainty from a census, a complete listing of the population. Is this the right way of thinking about a regression analysis? Or should we consider some possible superpopulation, of which the current population is but one possible realization? These issues are important, and are rarely considered. As an example, even today papers in economics and in health routinely publish standard errors of means calculated from complete enumerations, such as mortality rates.

In the same spirit, Chapter 2 was designed as a bridge between what is taught in an econometrics course and what applied economists will find when they confront microeconomic data. For several years, I used this chapter to teach a course at Princeton that helped prepare applied students in labor and development. When econometrics is taught by specialists, which has the huge advantage of providing an overarching statistical framework, it is often also useful to work through some of the nitty-gritty issues of practice that are not conceptually interesting, but can make the difference between convincing and unconvincing results. Sometimes this takes the form of warnings: technical fixes rarely fix anything by themselves, and techniques such as panel data methods, which can work magic in ideal conditions, can be undermined by imperfections of various kinds, particularly measurement error.
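The warning about panel methods and measurement error can be made concrete with a small simulation. The sketch below is in Python with invented parameter values; the function names `ols_slope` and `simulate` are hypothetical and the code is not from the book. It assumes a two-period panel in which the true regressor has unit variance, is highly persistent across periods (correlation rho = 0.9), and is observed with classical measurement error of variance 0.25. Under those assumptions the standard attenuation result gives a slope of var(x*) / (var(x*) + 0.25) = 1 / 1.25 = 0.8 in levels, but 2(1 - rho) / (2(1 - rho) + 2 * 0.25) = 0.2 / 0.7, about 0.29, in first differences: differencing removes most of the true signal while doubling the noise variance.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, rho=0.9, noise_sd=0.5, seed=1):
    """Return (levels slope, first-difference slope) for a simulated
    two-period panel. The true regressor has unit variance and correlation
    rho across periods, and is measured with independent error of standard
    deviation noise_sd in each period. No individual fixed effect is
    included, so any bias here comes from measurement error alone."""
    rng = random.Random(seed)
    x_obs1, x_obs2, y1, y2 = [], [], [], []
    for _ in range(n):
        true1 = rng.gauss(0.0, 1.0)
        true2 = rho * true1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Outcomes depend on the true regressor plus a small equation error.
        y1.append(beta * true1 + rng.gauss(0.0, 0.1))
        y2.append(beta * true2 + rng.gauss(0.0, 0.1))
        # The analyst sees the true regressor plus classical measurement error.
        x_obs1.append(true1 + rng.gauss(0.0, noise_sd))
        x_obs2.append(true2 + rng.gauss(0.0, noise_sd))

    levels = ols_slope(x_obs1, y1)  # first-period cross-section
    dx = [b - a for a, b in zip(x_obs1, x_obs2)]
    dy = [b - a for a, b in zip(y1, y2)]
    differences = ols_slope(dx, dy)  # first-difference (panel) estimator
    return levels, differences


if __name__ == "__main__":
    levels, differences = simulate()
    print(f"levels: {levels:.3f}  first differences: {differences:.3f}")
```

With the defaults, the two estimates should land close to the theoretical 0.8 and 0.29; pushing `rho` toward 1 or raising `noise_sd` widens the gap. The sketch deliberately leaves out the fixed effects that motivate differencing in the first place; its only point is that the transformation that removes them can also magnify the damage done by measurement error.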
If I were rewriting Chapter 2 today, I would be even more skeptical. As I taught the material over the years, it became clear that many of the uses of instrumental variables and natural experiments that had seemed so compelling at first lost a good deal of their luster with time. One problem is the reliance of instrumental variables on exclusion restrictions. The orthogonality of instruments to the error term requires that they be uncorrelated with omitted variables so that, when we are interested in the effect of x on y, and z is an instrument, z can affect y only through its effect on x, and not through any other mechanism. This often seems plausible when an instrument is first proposed, but over time, other researchers, or other facts, can make the story much less plausible. There is no general rule here, and some of the studies using natural experiments and instruments have worn well, but that is more the exception than the rule.

Twenty years later, I now find myself very much more skeptical about instruments in almost any situation. I should also note a mea culpa: the late Tony Atkinson, in his pre-publication review of the book, had noted that instrumental variables hardly ever worked. I should have paid more attention to his views.

Natural experiments are a form of instrumental variables. They often give a clean answer, eliminating effects that otherwise would cloud the analysis. Yet the “clean” answer is not always the answer that we want for policy or understanding. This is one aspect of the familiar trade-off between internal and external validity; a natural experiment is like a laboratory experiment where many factors are held constant, but where we have little idea whether the effect will be replicated in settings that may be more relevant for policy.
Of course, a good laboratory experiment tries to isolate some fundamental mechanism that will always be present, even when modified, but such experiments require more theory and background knowledge than is usually present in natural experiments in economics. Occasionally such experiments look more like anecdotes than analysis.

If I were writing the book today, there would be a new chapter on randomized controlled trials (RCTs).1 Two decades ago I, like most economists, thought that if only we could do RCTs, life would be straightforward; we could discover the laws of behavior, understand poverty, and eliminate it. And indeed, one of the major developments in applied microeconomics in the last 20 years has been the widespread use of RCTs, particularly in economic development. As always, the practice has brought useful experience with its complement of successes and failures. RCTs often yield new insights and unexpected findings. Yet they also have more problems than we anticipated, both in theory and practice. They are not magic tools, any more than panel data or instrumental variables were magic tools. Indeed, once upon a time, economists thought of linear regression as a magic tool, and the history of econometrics teaching has been one in which the great enthusiasm of the early days gave way to a sadder catalog of regression diseases and diagnostics. The same is happening, and will continue to happen, with RCTs, although, like regressions, they are likely to remain useful tools. But statistical inference with RCTs is much more difficult than it at first appears, causality can rarely be firmly established, the influence of omitted variables is not magically erased by randomization, and the lack of blinding (usually impossible in economics) can undermine estimation in the same way that failure of exclusion restrictions undermines instrumental variable estimation. And, as is widely understood, but often forgotten, the fact that some mechanism works in one place is no guarantee that it works anywhere else. RCTs cannot, by themselves, support a program of unconditional discovery of “what works.”

1. For readers interested in what such a chapter would look like, a good account can be found in my 2018 paper with Nancy Cartwright in Social Science and Medicine, Vol. 210, pp. 2–21: “Understanding and Misunderstanding Randomized Controlled Trials,” https://doi.org/10.1016/j.socscimed.2017.12.005.

In the introduction to the book, I speculate about what can be done in the absence of experiments. While I am now more skeptical about what can be done with experiments, I continue to believe in what I wrote then. The trick is to use the data to tell us something that we didn’t know before, and that can help us change our minds, or see things differently. Sometimes this can be done from simple descriptive statistics; that Indian per capita calorie consumption was falling during a period of rapid economic growth was an important finding in and of itself, and suggested a whole program of enquiry. That almost half of all children in India were severely malnourished was a finding that then Prime Minister Manmohan Singh described as “a national shame.” Such straightforward descriptions can have huge effects on policy, as can correlations and regressions.

Causal testing can also be supported in the same way through what philosophers call the “hypothetico-deductive method.” Given a hypothesis or an idea, the key is to develop and work it to the point where its implications can be transparently checked on the data. Sometimes, this will take an RCT, but if we are clever and diligent enough, it can often be tested by looking at old data in new ways, from a mean, a median, or a cross-tabulation.
The analytical work goes into drawing out the implications of the hypothesis, not into deriving complex econometric methods or technical fixes.

Chapters 3 through 6 are more substantive, though all use econometric methods to generate their findings. All of them could be updated with new results, and in some cases (such as the model of quality choice in Chapter 5), I would modify or be more skeptical about the underlying assumptions. But I believe that these chapters, as well as the STATA code that comes at the end, serve their original purpose of providing worked examples of the sort of analysis to which survey data are suited.

The book has been widely used since 1997, it is frequently cited, and it has been downloaded from the Bank’s website nearly 34,000 times. I am delighted that it will now be back in print for those who would like to have a “real” copy on their “real” desktops.

Angus Deaton
Princeton, December 2018

on different data sets and different machines. The code listed here is available on the following site: http://www.worldbank.org/householdsurveys

Two decades after its original publication, The Analysis of Household Surveys is reissued with a new preface by its author, Sir Angus Deaton, recipient of the 2015 Nobel Prize in Economic Sciences. This classic work remains relevant to anyone with a serious interest in using household survey data to shed light on policy issues. The book reviews the analysis of household survey data, including the construction of household surveys, the econometric tools useful for such analysis, and a range of problems in development policy to which this survey analysis can be applied. Chapter 1 describes the features of survey design that need to be understood in order to undertake appropriate analysis. Chapter 2 discusses the general econometric and statistical issues that arise when using survey data for estimation and inference. Chapter 3 covers the use of survey data to measure welfare, poverty, and distribution. Chapter 4 focuses on the use of household budget data to explore patterns of household demand. Chapter 5 discusses price reform, its effects on equity and efficiency, and how to measure them.
Chapter 6 addresses the role of household consumption and saving in economic development. The book includes an appendix providing code and programs using STATA, which can serve as a template for users’ own analysis.

ISBN 978-1-4648-1331-3