Policy Research Working Paper 7379

Lower Bounds on Inequality of Opportunity and Measurement Error

Carlos Felipe Balcázar

Poverty Global Practice Group
July 2015

Abstract

When measuring inequality of opportunity, researchers usually opt to eliminate within-type variation. Provided that in practice it is impossible to observe all circumstances, this implies that the researcher estimates a lower bound of the true level of inequality of opportunity. By using data drawn from 27 Demographic Household Surveys (circa 2008), it is found that lower bound estimates can have substantial measurement error, and that measurement error can vary considerably across countries. As a consequence, lower bound estimates of inequality of opportunity can demand too little redistribution to equalize inequalities due to circumstances and can make the “traditional” cross-country comparisons misleading.

This paper is a product of the Poverty Global Practice Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at cbalcazarsalazar@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team

Carlos Felipe Balcázar∗

JEL Codes: D63, D39, O50

Key words: inequality of opportunity, lower bounds, measurement error

∗ I thank Lidia Ceriani, Francisco Ferreira, Gabriel Lara, Yuan Li, Ambar Narayan, Hugo Ñopo, Sailesh Tiwari and an anonymous referee for their useful comments.

1 Introduction

The literature on Inequality of Opportunity (IOp) follows Roemer’s conceptual framework (Roemer, 1998), in which inequalities derived from circumstances beyond the control of individuals are morally objectionable; individuals should be held responsible only for the level of effort they exert in comparison to that exerted by other individuals. Therefore, to measure IOp is to measure the amount of inequality (or variation) that comes from circumstances. In practice, however, this is hard to do. On the one hand, it is virtually impossible to account for all characteristics that constitute individuals’ circumstances. On the other hand, the level of effort is usually unobservable. Here I focus on exploring the former problem.

Equality of opportunity embodies two basic principles: i) the compensation principle, which demands that inequalities due to circumstances be eliminated, and ii) the reward principle, which demands that inequalities due to differences in effort be accepted. These principles are at the core of any empirical application (Ramos and van de Gaer, 2012).

To measure IOp we can take an ex-ante or an ex-post approach to compensation. The former is concerned with outcome differences among individuals who exert the same effort but have different circumstances. The latter focuses on prospects, so there is equality whenever all individuals have the same level of outcome regardless of their circumstances.
The ex-ante approach requires that we observe effort or that we impose working assumptions about the relationship between effort and outcomes that enable identification. However, this approach is incompatible with the reward principle (Fleurbaey and Peragine, 2012). The ex-post approach, on the other hand, is compatible with the reward principle. We can achieve this by partitioning the distribution of outcomes for each group of individuals sharing the same circumstances, under the assumption that individuals at the same percentile of the outcome distribution have exerted the same degree of effort (Checchi and Peragine, 2010).

Although the literature has proposed several methodologies to measure IOp based on the ex-ante and ex-post approaches (Ramos and van de Gaer, 2012), researchers still confront several difficulties. One of the most salient is the partial observability of circumstances. Barros et al. (2010), Ferreira and Gignoux (2011) and Luongo (2011) show that estimates of IOp based on an incomplete list of circumstances are lower bound estimates of true IOp. Even if we make working assumptions about the distribution of an underlying effort variable, we can still face downward bias due to unobservable circumstances and efforts.

The problem is that lower bound estimates only pick up part of the observed inequality, hence any alternative number which exceeds the lower bound would be equally valid (Kanbur and Wagstaff, 2014). This has two major implications: i) Any lower bound estimate of IOp could increase by adding one additional circumstance correlated with the outcome variable. In other words, lower bounds can have measurement error. ii) When measuring IOp in different countries in order to make comparisons, countries could have the same true value of IOp but different estimated values. This problem implies that lower bound estimates of IOp can have different levels of measurement error, rendering them incomparable.
Recently, Niehues and Peichl (2014) address the problem of partial observability of circumstances by using panel data for Germany and the United States. They use a fixed-effects regression to estimate the amount of inequality that comes from time-invariant circumstances and show that “the existing lower bound estimates substantially underestimate IOp [...]”

In this paper I contribute further evidence in this regard. However, I focus on exploring an outcome for which effort (arguably) plays no role: height-for-age for children 0 to 2 years old. This outcome allows me to identify the size of measurement error coming from the empirical impossibility of including all circumstances when estimating IOp, since I do not have to make any assumptions about the distribution of effort. By including a comprehensive set of observable circumstances in my analysis I find that measurement error is large and that it varies considerably across countries, implying that making the “traditional” cross-country comparisons using lower bound estimates of IOp can be misleading.

The paper proceeds as follows. In the next section I describe the analytical framework. In Section 3 I describe the data and present the results of my analysis. In Section 4 I conclude.

2 Analytical framework

Let us consider a society of $N$ individuals. Each individual is described by two classes of variables: circumstances ($C$) and efforts ($e$). Let $X = \{x_1, \dots, x_i, \dots, x_N\} \in \mathbb{R}^N_+$ be the vector of outcomes in the population with $x = f(C, e)$, where $f : C \times e \to \mathbb{R}_+$. I consider, however, the special case where $x$ is completely independent of efforts and where circumstances are partially observable. Let $C = C^o \cup C^u$ be the full set of circumstances, where the superscripts $o$ and $u$ stand for observable and unobservable.
Traditionally, the measurement of IOp is based on a partition of the population into $K = \{1, \dots, k\}$ mutually exclusive types based on observable circumstances (ex-post approach to compensation), so that $\sum_{j \in K} N_j = N$. In this sense, a type is a group of individuals sharing the same observable circumstances.

Once the population is partitioned according to observable circumstances, we can rewrite the overall outcome as follows:

$$X = \{x^1, \dots, x^j, \dots, x^k\} \in \mathbb{R}^N_+,$$

where $x^j = \{x^j_1, \dots, x^j_i, \dots, x^j_{N_j}\} \in \mathbb{R}^{N_j}_+$. Now, $X$ can be partitioned into two vectors:

$$X_B = \{\bar{x}^1, \dots, \bar{x}^j, \dots, \bar{x}^k\}, \qquad X_W = \{\tilde{x}^1, \dots, \tilde{x}^j, \dots, \tilde{x}^k\}.$$

The first vector is obtained by a smoothing process through which each outcome $x \in x^j$ is substituted by the arithmetic mean of the outcome in vector $x^j$, denoted by $\bar{x}^j = \frac{1}{N_j} \sum_{i=1}^{N_j} x^j_i$. The second distribution is obtained by a standardization process which consists of rescaling the individuals’ outcome in each type as follows: $\tilde{x}^j_i = \frac{\bar{x}}{\bar{x}^j} x^j_i$, where $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$. Through this process $X_B$ eliminates the variation within each type. Analogously, $X_W$ removes the variation between types, so that we are left only with those inequalities caused by unobservable circumstances.

Now let $I : X \to \mathbb{R}_+$ be an index of IOp. If $I(X)$ is subgroup decomposable (Shorrocks, 1980), then

$$I(X) = I(X_B) + I(X_W),$$

where $I(X_B)$ accounts for between-type inequality and $I(X_W)$ accounts for within-type inequality. Provided that the Mean Log Deviation (MLD) is additively subgroup decomposable (Foster and Shneyerov, 2000), I use the MLD (without loss of generality) for my empirical exercises.¹

Note that since in our approach the degree of effort exerted does not play any role in the measurement of IOp, by grouping into types using $C^o \subset C$, the partition of the population we obtain is coarser than the one we would obtain from using $C$. This would cause an underestimation of the level of inequality.
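The smoothing and standardization steps described above can be illustrated with a minimal Python sketch. It is not the code used for the paper: the function names `mld` and `decompose`, the six outcome values, and the type labels are hypothetical and serve only to show that, for the MLD, the between-type and within-type components add up to total inequality.

```python
import numpy as np


def mld(x):
    """Mean log deviation: (1/N) * sum over i of ln(mean(x) / x_i)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))


def decompose(x, types):
    """Return (total, between, within) MLD for outcomes x grouped by types."""
    x = np.asarray(x, dtype=float)
    types = np.asarray(types)
    type_means = np.empty_like(x)
    for t in np.unique(types):
        mask = types == t
        type_means[mask] = x[mask].mean()
    # Smoothing: replace each outcome by the mean of its type (X_B).
    x_between = type_means
    # Standardization: rescale each outcome by overall mean / type mean (X_W).
    x_within = x * x.mean() / type_means
    return mld(x), mld(x_between), mld(x_within)


# Hypothetical outcomes (e.g. standardized heights) for two types.
x = np.array([90.0, 85.0, 88.0, 80.0, 78.0, 83.0])
types = np.array(["a", "a", "a", "b", "b", "b"])
total, between, within = decompose(x, types)
print(total, between, within)
```

In this sketch, collapsing all individuals into a single type drives the between-type component to zero, which is the extreme case of a partition that is too coarse.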
The underestimation is easy to see: if $I(X_W) > 0$, then $I(X_B) < I(X)$. Thus, $I(X_B)$ represents a lower bound estimate of IOp, as it is traditionally defined in the literature.

3 An empirical exercise using Demographic Household Surveys

The data I use come from 27 Demographic Household Surveys (DHS) circa 2008. I focus on children 0 to 2 years of age because we can confidently assume that children 24 months old or younger have not exerted efforts. Indeed, there is scientific evidence that toddlers start relating their own actions to their surrounding environment after 24 months of age (Rochat, 1998); that is, they become self-aware of their actions. This provides an empirical advantage: we can confidently assume that individual effort does not play a role in this outcome.

¹ The MLD is defined as $\mathrm{MLD}(X) = \frac{1}{N} \sum_{i=1}^{N} \ln \frac{\bar{x}}{x_i}$.

Height-for-age for toddlers is a human opportunity. On the one hand, several studies have shown that taller children are more likely to show higher cognitive test scores and, as a result, higher labor earnings.² On the other hand, children’s height is a function of both nature and nurture. The height of a child can depend on genetic factors, such as the height of his/her parents; but it can also depend on nutrition, access to medical services when the mother is pregnant, access to clean water, and many other potential environmental factors that can depend on parents’ choices and government interventions, which belong to the set of nurture variables.³

Note that both nature and nurture variables make up our set of circumstances. Although the DHS provides information on a broad number of these observable characteristics, many of these variables contain a large number of missing values, compromising the representativeness of the samples. Therefore, I include as circumstances only variables that do not compromise the representativeness of our samples due to missing values.
I include: age of the toddler (in months), gender (as a dummy), birth order (numerical), age of the mother (in years), height of the mother (in centimeters, as proxy for nature), educational attainment of the mother (in years), household wealth (the DHS’ index of wealth) and geographical location (urban or rural, as a dummy). These comprise a set of variables not much different from the set of variables that is used to analyze other human opportunities (Molina et al., 2013).⁴

It is important to note that since the height of toddlers increases in variance with age and varies by sex, I use the World Health Organization’s growth charts (WHO, 2006) to create a standardized measure for height ($x$), which corresponds to the equivalent height the child would have had if (s)he were a 24-month-old female. More formally,

$$x = F^{-1}_{\bar{a},\bar{g}}\left(F_{a,g}(h)\right),$$

where $F$ is the distribution function of heights in the reference population for the age and sex group of an individual of age $a$ and gender $g$; $h$ is the actual height of that individual; $\bar{a} = 24$ months; and $\bar{g} = \text{female}$.

² See Case and Paxson (2008), for example, for evidence and a discussion.

³ See Aturupane et al. (2008) for a discussion.

⁴ My full sample consists of 70,788 children. However, after dropping biologically implausible values for height from my sample (following the WHO (2006) guidelines to identify biologically implausible values for height) and dropping missing values in my variables of interest, I lose 6% of my full sample. Information on sample sizes can be found at: https://sites.google.com/site/cfbalcazars/misc

3.1 Results

Given that $x$ follows a normal distribution (WHO, 2006), I can estimate $X_B$ for each country by means of a linear regression.⁵ Following Ferreira and Gignoux (2011):

$$x_i = \alpha + \beta C^o_i + \varepsilon_i,$$

where $x_i$ is the observed standardized height for individual $i$, $C^o_i$ is the vector of observable circumstances, and $\varepsilon_i$ is an idiosyncratic error term.
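As an illustration of the parametric step, the following Python sketch fits the linear regression above by ordinary least squares and computes the MLD of the fitted values. It is a sketch on simulated data under stated assumptions, not the estimation code behind the results: the function name `lower_bound_iop`, the single observed circumstance (mother's height), the simulated unobserved component, and all numerical values are hypothetical.

```python
import numpy as np


def mld(x):
    """Mean log deviation: (1/N) * sum over i of ln(mean(x) / x_i)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.log(x.mean() / x)))


def lower_bound_iop(x, circumstances):
    """OLS of x on observed circumstances (with intercept).

    Returns (MLD of x, MLD of the fitted values); the second number plays
    the role of the lower bound MLD(X_B).
    """
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x)), np.asarray(circumstances, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    fitted = design @ coef
    return mld(x), mld(fitted)


# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n = 5000
mother_height = rng.normal(155.0, 6.0, n)  # observed circumstance
unobserved = rng.normal(0.0, 3.0, n)       # stands in for unobserved circumstances
x = 85.0 + 0.2 * (mother_height - 155.0) + unobserved

total, between = lower_bound_iop(x, mother_height.reshape(-1, 1))
print(total, between)
```

Because the simulated outcome depends on a component that the regression cannot see, the MLD of the fitted values falls well short of total inequality, which is the mechanism the paper measures with real data.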
Note that the distribution of predicted values of this regression ($X_B$) corresponds to the distribution obtained by replacing each individual’s outcome with the mean outcome of the type to which she/he belongs. Thus $\mathrm{MLD}(X_B)$ corresponds to my lower bound estimate of IOp.

Provided that the MLD is path independent, we can compute the relative size of within-type inequality as

$$IR(X_W) = \left(1 - \frac{\mathrm{MLD}(X_B)}{\mathrm{MLD}(X)}\right) \times 100,$$

with $IR(X_W) \in [0, 100]$. $IR(X_W)$ provides us with a comparable cross-country estimate of the extent of measurement error that arises from the impossibility of including all circumstances. Note that the higher $IR(X_W)$, the higher the level of measurement error.

Table 1 shows the results, but let us focus on the fourth column of results. My computations show that in spite of including a comprehensive set of circumstances, all lower bound estimates show a substantial amount of measurement error. Indeed, the value of $IR(X_W)$ ranges from 75.79 (Colombia) to 93.23 (Chad). In other words, as little as 76% and as much as 93% of total variation is not being explained by the set of observable circumstances, depending on the country.

Table 1: Inequality of opportunity decomposed by country

| Country | Total inequality: MLD(X) × 100 | Between-type inequality: MLD(X_B) × 100 | Within-type inequality: [MLD(X) − MLD(X_B)] × 100 | Relative within-type inequality: IR(X_W) |
|---|---|---|---|---|
| Azerbaijan | 0.216 | 0.027 | 0.190 | 87.70 |
| Bangladesh | 0.184 | 0.024 | 0.160 | 86.89 |
| Bolivia | 0.159 | 0.030 | 0.129 | 81.25 |
| Burkina Faso | 0.246 | 0.024 | 0.221 | 90.05 |
| Burundi | 0.189 | 0.024 | 0.164 | 87.09 |
| Cambodia | 0.181 | 0.024 | 0.157 | 86.97 |
| Cameroon | 0.235 | 0.026 | 0.208 | 88.76 |
| Chad | 0.341 | 0.023 | 0.318 | 93.23 |
| Colombia | 0.114 | 0.028 | 0.086 | 75.79 |
| Côte d’Ivoire | 0.205 | 0.026 | 0.179 | 87.49 |
| Egypt | 0.351 | 0.028 | 0.323 | 92.05 |
| Ethiopia | 0.259 | 0.025 | 0.234 | 90.38 |
| Guinea | 0.271 | 0.023 | 0.248 | 91.45 |
| Haiti | 0.171 | 0.026 | 0.145 | 84.61 |
| Honduras | 0.127 | 0.027 | 0.100 | 78.69 |
| Jordan | 0.134 | 0.022 | 0.112 | 83.46 |
| Kenya | 0.261 | 0.025 | 0.236 | 90.48 |
| Lesotho | 0.234 | 0.022 | 0.212 | 90.50 |
| Liberia | 0.246 | 0.026 | 0.220 | 89.35 |
| Morocco | 0.305 | 0.029 | 0.276 | 90.39 |
| Mozambique | 0.266 | 0.024 | 0.242 | 90.91 |
| Niger | 0.301 | 0.022 | 0.279 | 92.59 |
| Peru | 0.132 | 0.028 | 0.104 | 78.84 |
| Rwanda | 0.192 | 0.024 | 0.167 | 87.25 |
| Tanzania | 0.200 | 0.024 | 0.176 | 87.77 |
| Turkey | 0.162 | 0.026 | 0.136 | 83.78 |
| Uganda | 0.225 | 0.026 | 0.198 | 88.26 |

Note: I multiply each number in the first three columns of results by 100 for readability.
Source: Authors’ calculations based on Demographic Household Surveys.

⁵ If we were to use a non-parametric approach, we would face problems of small cell sizes: with a large number of types, the sample size of each type can be quite small, leading to biased estimations of type means.

Given that $IR(X_W)$ also varies considerably across countries, we can also conclude that lower bound estimates of inequality of opportunity are not comparable. To be comparable we should obtain a high correlation coefficient between $\mathrm{MLD}(X)$ and $\mathrm{MLD}(X_B)$, but the correlation is -0.17. To further show the implications of this, Figure 1 compares the ordinal country rankings for $\mathrm{MLD}(X)$ and the ordinal country rankings for $\mathrm{MLD}(X_B)$, both from most to least unequal. The results show that there are considerable re-rankings.

4 Conclusion

Several development reports and research papers have incorporated the IOp dimension when addressing development and social issues. Nonetheless, my results suggest that the empirical applications that use lower bound estimates of IOp can be misleading. On the one hand, lower bound estimates can substantially underestimate the actual value of IOp, demanding too little redistribution to equalize inequalities due to circumstances.
On the other hand, measurement error can vary considerably across samples, thus making the traditional cross-country comparisons unreliable. These conclusions derive from the empirical advantage of using an outcome variable for which we can confidently assume effort plays no role, so that we are able to identify the size of measurement error that comes from the empirical impossibility of including all circumstances in the analysis.

Figure 1: Rankings of MLD(X_B) vs. MLD(X)
[Scatter plot with a 45° line. Axes: rankings using Mean Log Deviation; rankings using Mean Log Deviation (between-type component).]
Source: Authors’ calculations based on Demographic Household Surveys circa 2008.

A promising path to address the problem of partial observability of circumstances is the use of panel and pseudo-panel data. Niehues and Peichl (2014) show that we can estimate an upper bound of inequality of opportunity through the fixed effects component of a fixed effects regression, addressing problems of measurement error. However, since panel data are scarce, we can use pseudo-panels in order to exploit the availability of household surveys (Verbeek and Vella, 2005). By using pseudo-panels we should be able to estimate the long-run upper bound of IOp à la Niehues and Peichl, addressing (plausibly) both measurement error and comparability across countries. This path has not been explored yet, and it is worth exploring.

References

Aturupane, H., Deolalikar, A., and Gunewardena, D. (2008). The determinants of child weight and height in Sri Lanka: A quantile regression approach. Research paper/UNU-WIDER 2008.53.

Barros, R., Molinas, J., and Saavedra, J. (2010). Measuring progress toward basic opportunities for all. Brazilian Review of Econometrics, 30(2):335–367.

Case, A. and Paxson, C. (2008). Stature and status: Height, ability, and labor market outcomes. Journal of Political Economy, 116(3):499–532.

Checchi, D. and Peragine, V. (2010). Inequality of opportunity in Italy. Journal of Economic Inequality, 8(4):429–450.

Ferreira, F. and Gignoux, J. (2011). The measurement of inequality of opportunity: Theory and an application to Latin America. Review of Income and Wealth, 57(4):622–657.

Fleurbaey, M. and Peragine, V. (2009). Ex ante versus ex post equality of opportunity. ECINEQ Working Paper, No. 141.

Kanbur, R. and Wagstaff, A. (2014). How useful is inequality of opportunity as a policy construct? ECINEQ WP 2014, 338.

Lasso, C. and Urrutia, A. (2005). Path independent multiplicatively decomposable inequality measures. Investigaciones Economicas, 22(2):379–387.

Luongo, P. (2011). The implication of partial observability of circumstances on the measurement of IOp. Research on Economic Inequality, 19(2):23–49.

Molina, E., Narayan, A., and Saavedra-Chanduvi, J. (2013). Outcomes, opportunity and development: Why unequal opportunities and not outcomes hinder economic development. Policy Research Working Paper Series 6735.

Niehues, J. and Peichl, A. (2014). Upper bounds of inequality of opportunity: Theory and evidence for Germany and the US. Social Choice and Welfare, 43:73–99.

Ramos, X. and van de Gaer, D. (2012). Empirical approaches to inequality of opportunity: Principles, measures, and evidence. IZA Discussion Papers 6672.

Rochat, P. (1998). Self-perception and action in infancy. Experimental Brain Research, 123:102–109.

Roemer, J. (1998). Equality of Opportunity. Harvard University Press, Cambridge MA.

Shorrocks, A. (1980). The class of additively decomposable inequality measures. Econometrica, 48(3):613–625.

Verbeek, M. and Vella, F. (2005). Estimating dynamic models from repeated cross-sections. Journal of Econometrics, 127(1):83–102.

WHO (2006). Length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: Methods and development. http://www.who.int/childgrowth/en/.