Policy Research Working Paper 8756

Environment and Development
Penalized Non-Parametric Inference of Global Trends in Deforestation, Pollution and Carbon

Bo Pieter Johannes Andrée, Phoebe Spencer, Andres Chamorro, Harun Dogo

Environment and Natural Resources Global Practice, February 2019

Abstract

This paper revisits the issue of environment and development raised in the 1992 World Development Report, with new analysis tools and data. The paper discusses inference and interpretation in a machine learning framework. The results suggest that production gradually favors conserving the earth’s resources as gross domestic product increases, but increased efficiency alone is not sufficient to offset the effects of growth in scale. Instead, structural change in the economy shapes environmental outcomes across GDP. The analysis finds that average development is associated with an inverted $U$-shape in deforestation, pollution, and carbon intensities. Per capita emissions follow a $J$-curve. Specifically, poverty reduction occurs alongside degrading local environments and higher income growth poses a global burden through carbon. Local economic structure further determines the shape, amplitude, and location of tipping points of the Environmental Kuznets Curve. The models are used to extrapolate environmental output to 2030. The daunting implications of continued development are a reminder that immediate and sustained global efforts are required to mitigate forest loss, improve air quality, and shift the global economy to a 2° pathway.

This paper is a product of the Environment and Natural Resources Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/research. The authors may be contacted at bandree@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Environment and Development: Penalized Non-parametric Inference of Global Trends in Deforestation, Pollution and Carbon

Bo Pieter Johannes Andrée¹,²,³,*, Andres Chamorro¹, Phoebe Spencer¹, and Harun Dogo¹

¹ World Bank Group
² Department of Spatial Economics/SPINlab, VU Amsterdam, Netherlands
³ Tinbergen Institute
* Corresponding author: bandree@worldbank.org or b.p.j.andree@vu.nl

Keywords: Environment, Development, Penalized Inference, Non-parametric models, Kernel Regularized Least Squares.

1 Introduction

Will continuation of economic development increase pressure on the earth’s finite resources, or does the increase in income provide the basis for environmental improvement? This question, asked by Grossman and Krueger (1995) and highlighted earlier in World Bank (1992), is critical to the design of sustainable development strategies but remains at the center of debate. Evidence from past development suggests that increases in wealth and income occur simultaneously with a structural transformation process in which the composition of inputs and methods of production used in the economy gradually shift in favor of less destructive production.
We revisit the empirical issue with panel data on environmental output and economic development for a large set of countries, using a flexible kernel model that allows dependencies to vary smoothly throughout the data. Interactions between the economic and natural systems are dominated by complex pathways, and the processes that produce the levels of environmental output are high dimensional. This is because environmental output is determined not only by the efficiency of production, which may improve nonlinearly across GDP, but also by the total production size, which varies across panels of countries (Stern et al., 1996; Stern, 2004). If the scale of the economy is large, minute changes in the efficiency of production can result in large differences in output levels. Therefore, if a panel is constructed that includes economies of widely different scales, the variance in environmental output levels can be expected to vary with the GDP levels of the countries. To cope with this, one should acknowledge in the model design that environmental output is de facto a result of both a scale component and a technology component.

Many empirical approaches have tried to model degradation levels directly, without distinguishing between the roles of scale and technology, and therefore assume an unrealistic degree of homogeneity of the samples. Vollebergh et al. (2005) pay particular attention to homogeneity assumptions in their environmental output regressions, and conclude that correctly modeling heterogeneity is essential to prevent spurious correlation in reduced-form panel estimations. Other issues relate simply to inappropriately dealing with the serial dependence that results from misspecification and may bias inference. This has partly been addressed by adding control variables, as in studies reviewed by Stern (1998), or by deploying fixed-effect approaches (Stern, 2004).
Time series approaches are also widespread, for example Perman and Stern (2003) and Stern (2004), who claim that the error correction approach provides appropriate diagnostic statistics and specification tests for the environment-economic relationship. However, even in the time dimension, linearity and constant variance assumptions break down for moderate time spans, because nonlinearities result from the income dependence of the derivatives of the function mapping changes in income to changes in degradation. From that perspective, the non-parametric error correction approach (Shahbaz et al., 2017) improves on previous work. More discussion on nonlinear cointegration can be found in Wagner (2015); the general conclusion is that the diagnostics available in the standard framework are not appropriate in the nonlinear case because powers of integrated processes are themselves not integrated. In the non-parametric context, on the other hand, causality and other correct-specification arguments are tightly related to the penalization technique, or bandwidth setting, which may take the limit criterion away from the true parameter.

In this paper, we focus on a cross-comparable technology component represented by degradation intensities to cope with the heteroskedasticity related to economic scales. To allow for a wide variety of potential nonlinearities with minimal parametric assumptions, we deploy a machine learning method that learns from similarities in the data using kernels. The method is known as Kernel Regularized Least Squares (Hainmueller and Hazlett, 2014). The key reason behind this choice is that, apart from its flexibility and its full use of the kernel learning framework, it remains straightforward to back out marginal effects. Machine learning methods are often developed with different applications in mind than the classical regression models that have been developed primarily for economic inference.
Because the limit results in non-parametric models depend on externally set tuning parameters that are not part of the vector of estimated parameters, it is not immediately clear whether the estimates can be interpreted in the same way as those obtained using parametric methods. Penalization in the non-parametric context differs from penalization in the GLM case, such as in the popular LASSO. While penalized GLMs require the penalty to vanish asymptotically for generality claims, positive penalization in the limit may be a necessity to ensure identifiable uniqueness for non-parametric models. Regularization or penalization, while primarily known for dealing with over-fitting, is in fact a way to flexibly establish simple subspaces in which consistency theorems hold. As a result, the consistency and normality limits are uniquely defined for every level of penalization, which makes it less straightforward to interpret the estimator, and its derivatives, with the usual confidence. However, penalties may still be found that yield estimates that conform to standard interpretation. Specifically, penalties that result from minimizing an out-of-sample criterion pull the consistency limit toward the result that induces the optimal conditional distribution implied by the weighted kernel across all penalties and weights, as judged by the out-of-sample criterion.¹

We use the framework, together with out-of-sample selection of fixed effects, to model remotely sensed and reported environmental data for 95 countries that include 85% of the world’s population, 83% of global carbon output and 72% of all forest cover. We find that production gradually favors conserving the earth’s finite resources as GDP increases, but our results are not supportive of a single Environmental Kuznets Curve. Rather, countries follow individual curves with heterogeneity in the location, shape, and height of tipping points, depending on the composition of the economy.
Focusing the policy dialog on a single average curve may thus result in harmful economic outcomes, as the traditional EKC may provide a poor estimate of local conditions. Across different possible paths of development, we do observe that average development is associated with an inverted $U$-shape in deforestation, pollution and carbon intensities of production units. Per capita carbon emissions follow a $J$-curve. We use the models to extrapolate environmental output forward to 2030 under alternative growth scenarios, allowing each country to progress further on its specific development path. These results highlight the daunting implications of continued development under the current standards of practice and remind us once more that immediate and sustained global efforts are required on multiple fronts to mitigate forest loss, improve air quality, and shift the global economy to a 2° pathway.

The remainder of this paper is organized as follows. We discuss estimation methods in Section 2. Additional discussion on the role of the penalty in establishing identifiable uniqueness, and on the interpretation of the estimation results, is provided in our appendix. Section 3 details the data used for our empirical analysis in Section 4. Finally, in Section 5, we use our empirical descriptions to explore the implications of continued growth for environmental output. Section 6 concludes.

¹ That result is the same as the standard convergence toward the minimizer of the expected Kullback-Leibler divergence (Kullback and Leibler, 1951) between the true conditional density of the data and the conditional density implied by the estimated model (White, 1994). Under that result, the estimator converges to the result that delivers the best approximation to the true distribution of the data.

2 Methods

We use the Kernel Regularized Least Squares estimator developed in Hainmueller and Hazlett (2014). We aim to summarize the key features of non-parametric kernel estimation.
We do not introduce new theoretical results; instead we aim to provide an overview that highlights differences with respect to parametric estimation. In particular, we detail the role that regularization plays in correct inference. Readers who are familiar with penalized kernel models may proceed directly to Section 2.4, in which we explain how we treat fixed effects in the estimation.

In many applications, the postulated model is assumed to consist of a finite set of parameters. This imposes strong assumptions about the behavior of the process being modeled. Specifically, linear models assume that the relationship between two variables $Y$ and $X$ described by a parameter $\beta$ is constant across levels of $Y$ and $X$. Such strong assumptions about the DGP are rarely if ever justified by economic theory and can lead to seriously erroneous conclusions. The single parameter elasticities in linear models can at most approximate the average of the nonlinear elasticities locally on a function. Linear approaches can yield useful evidence within a relatively narrow range of the overall state space, particularly if large parts of the population are likely to pass through that state. However, in general, local approximations fall short when building global arguments. The errors induced by fixing the relationship at the average increase with the divergence between the average observation and the values of the observations of interest. This is undoubtedly the case in the analysis of economic development and environmental output, in which the structural behavior of the outliers, such as the poorest or most polluted, is often of foremost concern to policymakers.

Finite dimensional nonlinear parametric models may address several of these issues, but require strong predictions from underlying economic theory on the implied form of the structural relationship for the parameters to be economically meaningful. Such functions may be difficult to parameterize.
Finite series approximators may also provide flexibility, but the resulting conclusions are often different from models in which the order of approximation is allowed to vary with the sample size; see Horowitz (2011) for discussion.

Non-parametric models make fewer assumptions about the DGP, and can produce approximations with varying flexibility. The key question in this case is how flexible the empirical function should be given the data that have been observed. A regularized non-parametric model exposed to growing data is in a sense an approximator that adjusts its belief of what an appropriate description of the DGP is according to the number of observations that it has seen. The key is that this belief changes appropriately as the data grow. A non-parametric model in which the size of the model is appropriately regulated results in a small size when samples are few, but may increase in dimensionality as the data grow. As a result, the approximation error declines with growing data.

Regulation of the order of approximation in non-parametric models occurs through tuning parameters. Correct inference is therefore strongly dependent on values that are not estimated by the criterion, but instead set by the researcher. While the relationship between standard loss minimization and correct parameter inference, as in the linear Least Squares literature, is a basic concept well known to many researchers, inference based on the estimators of a non-parametric model has to consider the effect of the external parameters, for which results do not follow under the same consistency and normality theorems. For example, while Hainmueller and Hazlett (2014) provide consistency and normality results for their model, they state explicitly that these results are different for every level of penalization. This leaves the relationship between the penalty and the DGP unclear.
2.1 Assumptions about the DGP

Suppose we observe an $n_x$-variate $T$-period sequence $x_T := \{x_t\}_{t=1}^{T}$ that describes the state of one economy throughout time. At each point in time, we observe $N$ trajectories of this $n_x$-variate sequence, i.e., we focus on repeated cross-sectional vectors of length $N$ describing the evolution of $N$ economies. Each vector contains observations of, for example, income and the composition of the economy, all indexed over a set of locations $i = 1, ..., N$. The matrix $X_t$, consisting of $n_x$ columns describing different variables and $N$ rows describing the different locations, is indexed by time. We consider a second, repeated cross-sectional sequence $y$ of degradation levels generated by:

$$y := \{y_t = h_0(X_t),\ t \in \mathbb{Z}\}. \qquad (1)$$

We can observe $y_T$, a subset of the results of this process, $y_T := \{y_t\}_{t=1}^{T}$. The function $h_0 : \mathcal{X} \to \mathcal{Y} \subseteq \mathbb{R}$ produces environmental output for every coordinate $x_t \in \mathcal{X}$.² We assume that $h_0$ is a unique measurable function that for each coordinate $x_t \in \mathcal{X}$ assigns a true value $y_t \in \mathcal{Y}$ for all $t \in \mathbb{Z}$. In a sense, by assuming this particular form, we assume that the environment does not endogenously degrade itself, i.e., that $y$ does not spontaneously generate itself. Instead, this assumes that the evolution of degradation levels for each economy $y$ is symptomatic of external, local economic development variables $x$. This does not exclude the possibility that $y$ may in part affect elements of $x$. It does require, however, that feedback effects are invertible, which is in turn implied by some form of stability, and that they follow levels in $x$ such that $h_0$ describes the net relationship between $x$ and $y$.³ We also assume that $h_0$ is smooth, particularly that it maps similar coordinates in $\mathcal{X}$ to similar values in $\mathcal{Y}$.

² Particularly a $\mathcal{B}(\mathcal{X})/\mathcal{B}(\mathcal{Y})$-measurable mapping, as kernels with a universal approximating property require at least that the target function is measurable; see for example Micchelli et al. (2006).
This implies that for each state of the economy at each point in time $x_t$, we observe a level of deforestation, pollution or carbon $y_t$ that is induced by the state of the economy through the function $h_0$, and that for small changes in the state of an economy we expect to see small changes in environmental output. Furthermore, it assumes that for two economies that are similar in terms of composition and scale, we expect similar environmental output.

2.2 Kernel Regularized Least Squares for panels

In certain situations, economic theory suggests a particular form of $h_0$. For example, in the studies of demand curves, Phillips curves, and Engel curves, there may be a strong prior belief that $h_0$ is nonlinear of a specific form. In more general situations, logic only results in stylized facts about local or global behavior of $h_0$ and we may want to impose little structure on $h_0$. In the current case, the Environmental Kuznets theory suggests an inverted $U$-shape between degradation and economic development. The relationships may of course be of a completely different form or differ across environmental variables, while ideally we keep the analysis of the relationships within a similar regression framework. We therefore postulate a very flexible regression of the form

$$\hat{y} := \{\hat{y}_t = h(X_t; \theta),\ \theta \in \Theta,\ t \in \mathbb{Z}\}. \qquad (2)$$

Our modeled function $h$ is defined as a mapping $h : \mathcal{X} \times \Theta \to \mathcal{Y}$, where $\Theta$ is the parameter space. In parametric regressions, $\Theta$ is assumed to be compact and finite dimensional. This immediately imposes structure on $h$, thus translating into assumptions about $h_0$ if we maintain a belief that $\theta_0 \in \Theta$. By reducing the size of $\Theta$ we simplify the possible structure of $h$, i.e., chances that $\theta_0 \in \Theta$ become increasingly slim. While we minimize assumptions about $h_0$ by working with $\Theta$ as an infinite dimensional space, some assumptions about $h_0$ are unavoidable as $\Theta$ has to be parameterized eventually. In our example, as we shall see, one still has to specify radial basis functions.
³ If $y = f_0(x) + g_0(y)$, with $g_0$ describing simultaneous feedback and $f_0$ describing the contemporaneous exogenous effects, then one can also write $y = h_0(x)$ if $I - g_0$ is invertible, with $h_0(x) = (I - g_0)^{-1}(f_0(x))$. Hence $h_0$ arises from the composite $(I - g_0)^{-1} \circ f_0$ and describes the combined effect of exogenous impulses and feedback.

In parametric regressions, $\Theta$ plays a key role as the Euclidean space containing all the possible coordinates of potential parameter vectors $\theta$. In the non-parametric case, there is a subtle difference. Suppose that for every $\theta \in \Theta$, there is a function $h(\cdot; \theta) : \mathcal{X} \to \mathcal{Y}$ that is $\mathcal{B}(\mathcal{X})/\mathcal{B}(\mathcal{Y})$-measurable. We can define $\mathcal{H}_\Theta(\mathcal{X})$ as the Hilbert space containing an infinite collection of functions $\{h(\cdot; \theta) : \theta \in \Theta\}$ generated by $\Theta$. We shall use a simplified notation to reduce clutter and instead write that $\theta$ indexes the functions $h_\theta \in \mathcal{H}_\Theta$. The common notation $\theta_0 \in \Theta$ is thus equivalent to saying $h_0 \in \mathcal{H}_\Theta$, i.e., $\theta_0 \in \Theta : h(x_t; \theta_0) = h_0(x_t)\ \forall\ x_t \in \mathcal{X}$. This clarifies the difference: in a parametric regression problem we are foremost concerned with searching a compact parameter space $\Theta$ for the parameter vector $\theta \to \theta_0$. In the current framework we are explicitly interested in searching across a space of functions, produced under some process $h_{\mathcal{X}}$ of generating flexible functions from simple parameter vectors given the sample space, for infinite $\theta \in \Theta$, for the function that best resembles the target function, $h_\theta \to h_0$.⁴ Hence, we can write the estimator also as⁵

$$\hat{h}_T := \arg\min_{h_\theta \in \mathcal{H}_\Theta} Q_T(y_T, X_T; h_\theta). \qquad (3)$$

⁴ Specifically, each $\theta$ indexes a member in $\mathcal{H}_\Theta(\mathcal{X})$ according to the map $h_{\mathcal{X}} : \Theta \to \mathcal{H}_\Theta(\mathcal{X})$ with $h_{\mathcal{X}}(\theta) := h(\cdot; \theta) \in \mathcal{H}_\Theta(\mathcal{X})\ \forall\ \theta \in \Theta$.

⁵ The criterion function $Q_T$ can also be written as $Q_T(X_T, h_0(X_T), h(X_T; \theta))$, as we started under the notion that $y_T = \{h_0(X_t)\}_{t=1}^{T} = h_0(X_T)$. Instead of highlighting that we search for functions, this representation makes the direct connection between the criterion function and the target function $h_0$ that generated the data explicit.

There are many ways to generate $\mathcal{H}_\Theta$. In the current framework, we focus on using a kernel $k$ together with a local parameter $\theta_i$ that weights the surface to produce any flexible functional form.

$$h_\theta := \sum_{i}^{N} \theta_i k(x, x_i) = h(x; \theta). \qquad (4)$$

The functions $h_\theta \in \mathcal{H}_\Theta$ are allowed to follow any kernel that has the universal approximation property. In this paper we adopt a Gaussian kernel $k(x_i, x_j; n_x) = \exp\left(-\frac{\|x_i - x_j\|^2}{n_x}\right)$, with $\|x_i - x_j\|$ being the Euclidean distance and $n_x$ being a fixed bandwidth equal to the dimension of $x_T$. We count the constant as being part of $x_T$. The kernel $k$ can be understood as a measure of similarity, which is seen by applying a Cauchy-Schwarz inequality

$$k(x_i, x_j)^2 \le k(x_i, x_i)\, k(x_j, x_j)\ \ \forall\ (x_i, x_j) \in \mathcal{X},$$

revealing that if $x_i$ and $x_j$ are similar, then $k(x_i, x_j)$ will be close to 1, and close to 0 when $x_i$ and $x_j$ are dissimilar. For a given observed collection $(y, x \in \mathcal{X})$, $h_\theta$ is thus a function resulting from placing kernels over $x_i$ and scaling the similarity surface using local coefficients $\theta_i$ such that the summed surface approximates the true density of the data. This produces flexible functions that can describe local relationships between $y$ and $x$ by assigning similar observations a similar scaling factor that maps onto similar output. Different parameterizations of the local coefficients $\theta_i$ may produce equally good, e.g. perfect, fits, such that the problem of estimating the vector $\theta = (\theta_1, ..., \theta_N)$ is generally ill-posed without adding further structure to the problem.
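To make the kernel construction concrete, the following is a minimal Python sketch, not the authors' implementation. It builds the Gaussian similarity matrix with the bandwidth fixed at $n_x$ and shows that unpenalized weights can reproduce the observed outcomes exactly. The function name `gaussian_kernel` and the six-point toy data set are assumptions made purely for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Pairwise k(a, b) = exp(-||a - b||^2 / bandwidth) for the rows of A and B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / bandwidth)  # clip rounding noise below zero

# Hypothetical toy data: 6 observations of n_x = 2 predictors scaled to [0, 1].
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
y = np.array([0.3, -1.2, 0.8, 2.0, 0.1, -0.4])

K = gaussian_kernel(X, X, bandwidth=X.shape[1])  # bandwidth fixed at n_x
theta = np.linalg.solve(K, y)                    # unpenalized local weights
fitted = K @ theta                               # interpolates y
```

Because the points are distinct, `K` is invertible and the unpenalized weights fit any `y` on these six points, including pure noise. This is the sense in which the weights need additional structure before they say anything about $h_0$.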
The specific estimation strategy to learn about the trends in the data is therefore of the form

$$\hat{h}_T := \arg\min_{h_\theta \in \mathcal{H}_\Theta} Q_T(y_T, X_T; h_\theta) + \pi(h_\theta), \qquad (5)$$

where $\pi(h_\theta) > 0\ \forall\ h_\theta \in \mathcal{H}_\Theta$ is a strictly positive function that increases monotonically in a measure of complexity defined on $h_\theta$. The penalty is critical to ensuring identifiability and consistency of the estimator within simple subset spaces of $\mathcal{H}_\Theta$. At the same time it allows fitting nonlinearities of varying smoothness while working with a fixed kernel bandwidth that produces a relatively smooth similarity surface, as $\theta_i$ is able to scale the nonlinearities locally, albeit at a cost $\pi(h_\theta)$. Hence, it favors less complex solutions to the criterion function by penalizing the high frequency domain in $\mathcal{H}_\Theta$. Specifically, let $K$ be an $N \times N$ symmetric kernel matrix with entries $k(x_j, x_i)$ measuring pair-wise similarities. This yields a model that is a linear combination of basis functions, each measuring similarity of one observation to another observation in the data set, and mapping it to a local output.

$$y_t = h(X_t; \theta_t) = K(X_t)\theta_t = \begin{pmatrix} k(x_{1t}, x_{1t}) & k(x_{1t}, x_{2t}) & \cdots & k(x_{1t}, x_{Nt}) \\ k(x_{2t}, x_{1t}) & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ k(x_{Nt}, x_{1t}) & \cdots & \cdots & k(x_{Nt}, x_{Nt}) \end{pmatrix} \begin{pmatrix} \theta_{1t} \\ \theta_{2t} \\ \vdots \\ \theta_{Nt} \end{pmatrix} \qquad (6)$$

The need for a regularization technique is obvious: the parameters $(\theta_{1t}, \theta_{2t}, ..., \theta_{Nt})$ can always rescale the similarity surface to match $y_t$ perfectly. Instead, the penalized estimator takes into account the complexity of the rescaling by introducing a factor $\lambda \|h_\theta\|_K^2$ and chooses the best fitting function by minimizing:

$$\arg\min_{h_\theta \in \mathcal{H}_\Theta} \sum_{i}^{N} \sum_{t}^{T} \left(y_{it} - h(x_{it}; \theta)\right)^2 + \lambda \|h_\theta\|_K^2, \qquad (7)$$

in which $\sum_{i}^{N} \sum_{t}^{T} (y_{it} - h(x_{it}; \theta))^2$ is the standard sum of squared residuals, and $\lambda \|h_\theta\|_K^2$, with $\|h_\theta\|_K^2 = \langle h_\theta, h_\theta \rangle_{\mathcal{H}_\Theta}$, is a penalty that increases monotonically as a function of the complexity of $h$ under $\theta$. We focus on the $L_2$ norm. Finally, $\lambda \in \mathbb{R}_{>0}$ is the parameter that determines the strength of the penalty.
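The penalized criterion can be illustrated with a short Python sketch. It assumes the matrix form $h_\theta = K\theta$ and $\|h_\theta\|_K^2 = \theta^\top K \theta$ used in eq. (9), for which setting the gradient to zero gives the well-known closed-form solution $\hat\theta = (K + \lambda I)^{-1} y$ when $K$ is invertible. The helper names `krls_fit` and `krls_predict` and the sine-shaped toy data are hypothetical. This is a sketch under those assumptions, not the estimation code used in the paper.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Pairwise k(a, b) = exp(-||a - b||^2 / bandwidth) for the rows of A and B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / bandwidth)

def krls_fit(X, y, lam):
    """Minimize (y - K theta)'(y - K theta) + lam * theta' K theta over theta."""
    K = gaussian_kernel(X, X, bandwidth=X.shape[1])
    theta = np.linalg.solve(K + lam * np.eye(len(y)), y)  # closed-form minimizer
    return theta, K

def krls_predict(X_new, X_train, theta):
    """Evaluate h(x; theta) = sum_i theta_i k(x, x_i) at new points."""
    return gaussian_kernel(X_new, X_train, bandwidth=X_train.shape[1]) @ theta

# Hypothetical one-dimensional example on the unit interval.
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
theta, K = krls_fit(X, y, lam=0.1)
fitted = krls_predict(X, X, theta)
```

Raising `lam` trades fit for smoothness: the residual sum of squares rises while $\theta^\top K \theta$ falls, and for very large values the weights shrink toward zero.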
Using this kernel we can work with an $NT \times NT$ kernel matrix by defining the dependent variable $Y$ as the vector of length $NT$ that results from stacking the time observations, $X$ as the $NT \times n_x$ matrix that results from stacking the columns similarly, and $\theta$ as a parameter vector of length $NT$.⁶ Using the Gaussian kernel, eq. (7) becomes

$$\hat{h}_{NT} = \arg\min_{h_\theta \in \mathcal{H}_\Theta} (Y - K(X)\theta)^\top (Y - K(X)\theta) + \lambda \theta^\top K(X)\theta. \qquad (9)$$

In a panel application, the functions $h_\theta$ that result from weighted kernels can produce interesting time-varying dynamics across levels of $X_t$. This is for example appropriate when a time-varying stationary process is of interest, in which the nonlinearities change throughout the data but do not depend on time itself. Alternatively, one can work with time itself as a covariate, in which case processes that are only locally stationary can be modeled. Intuitively, the kernel approach then results in similar coefficients for similar times. In the case of non-stationary data, the kernel can approximate local conditional means in the data that may vary throughout the sample space.

⁶ Specifically:

$$Y = \begin{pmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{N1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{N2} \\ \vdots \\ y_{1T} \\ y_{2T} \\ \vdots \\ y_{NT} \end{pmatrix},\quad X = \begin{pmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{N1} \\ x_{12} \\ x_{22} \\ \vdots \\ x_{N2} \\ \vdots \\ x_{1T} \\ x_{2T} \\ \vdots \\ x_{NT} \end{pmatrix},\quad \theta = \begin{pmatrix} \theta_{11} \\ \theta_{21} \\ \vdots \\ \theta_{N1} \\ \theta_{12} \\ \theta_{22} \\ \vdots \\ \theta_{N2} \\ \vdots \\ \theta_{1T} \\ \theta_{2T} \\ \vdots \\ \theta_{NT} \end{pmatrix}. \qquad (8)$$

2.3 The role of the penalty

The basic idea of penalizing the criterion function has been explored in many statistical applications, and is for example at the heart of the widely adopted LASSO estimator (Tibshirani, 1996; Zou, 2006). The added structure to the criterion function is a frequentist’s analogue to the role that the prior plays within the Bayesian framework. We note that the penalty in the current setting is not primarily a way to improve small sample performance, but that it is in fact the central feature of the learning model that determines what functional forms can be fitted. This differs from kernel approaches in which the bandwidth is the key tuning parameter. In the current approach the bandwidth is fixed to produce smooth functions, but nonlinearities are subsequently locally adjusted using the vector of weights $\theta$ to increase flexibility. The penalization approach is able to shrink the hypothesis space and flexibly establish a subspace in which consistency holds. By balancing between fit and complexity of the locally weighted kernel, the size of the subspace can be regulated by the penalty.

In the case of penalized GLMs considered in Blasques and Duplinskiy (2015), nonzero penalties take one away from $\theta_0$ in the limit if the penalty effect does not vanish asymptotically.⁷ In that sense, a penalized criterion delivers a pseudo-true parameter with a divergence from $\theta_0$ that is controlled by the penalty function. Setting an appropriate penalty therefore determines what one can infer about $\theta_0$. In the current context, positive penalties are a necessity to ensure uniqueness. This might lead to the thought that penalized non-parametric estimators that require positive penalization are biased by definition. The estimate of the weights $\hat\theta$ obtained through eq. (9) is different for every value of $\lambda$. The tuning parameter $\lambda$ thus represents the researcher’s predefined level of tolerance for accepting nonlinear functions. High values of $\lambda$ force the model to linearize its dependencies, whereas extreme values for $\lambda$ will set all coefficients to zero and describe the data using only an average expectation. Hence for every penalty, we find a different functional form $\hat{h}_\lambda$ induced by the estimate $\hat\theta_\lambda$ through eq. (4) given a specified kernel. Since $\lambda$ itself is not an estimated parameter, it is generally difficult, if not impossible, to tell whether eq. (9) yields an estimate $\hat{h}$ close to $h_0$. Without knowing the magnitude of $\|\hat{h}_\lambda - h_0\|$, the method may seem rather useless for economic inference.

⁷ Furthermore, $\theta_0$ in the standard context is the true parameter. In the non-parametric context, that true parameterization arguably does not exist. However, one can think of $\theta_0$ as the parameterization that produces $h_0$ through the kernel, or alternatively, selects $h_0$, the true (non)linear functional form in $\mathcal{H}_\Theta$ that produces the true density of the data.

Schölkopf and Smola (2001) suggest setting the penalty through an out-of-sample prediction minimization problem, to remove the dependence of the results on the external influence of the researcher who determines the level of penalization a priori. Hainmueller and Hazlett (2014) suggest one such strategy for Kernel Regularized Least Squares estimates by minimizing out-of-sample prediction errors over a vector $\lambda \in \Lambda$ based on leave-one-out predictions, noting that it performs well in practice. While practical performance and the removal of external influence on the results provide intuition to set penalties in this way, this does not address the question whether $\|\hat{h}_\lambda - h_0\|$ is in fact minimized. In our appendix, we provide additional discussion on the role of the penalty in ensuring identifiable uniqueness and establishing the consistency and normality results. We discuss that the strategy to set penalties by minimizing an out-of-sample criterion naturally pulls the estimator toward the weight vector that induces the true function in the limit, such that inference can be applied as usual.
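The leave-one-out strategy can be sketched in a few lines of Python. For a penalized least squares fit of the form $\hat{y} = H y$ with $H = K(K + \lambda I)^{-1}$, the leave-one-out residual of observation $i$ equals the in-sample residual divided by $1 - H_{ii}$, so no refitting is needed. The function names, the penalty grid, and the simulated data below are assumptions for illustration. The exact search routine in Hainmueller and Hazlett (2014) may differ.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Pairwise k(a, b) = exp(-||a - b||^2 / bandwidth) for the rows of A and B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / bandwidth)

def loo_residuals(K, y, lam):
    """Leave-one-out residuals via the hat matrix H = (K + lam I)^{-1} K."""
    H = np.linalg.solve(K + lam * np.eye(len(y)), K)
    return (y - H @ y) / (1.0 - np.diag(H))

def select_lambda(K, y, grid):
    """Return the penalty in `grid` with the smallest mean squared LOO error."""
    scores = [float(np.mean(loo_residuals(K, y, lam) ** 2)) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Hypothetical noisy data on the unit interval.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 25)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=25)
K = gaussian_kernel(X, X, bandwidth=X.shape[1])

grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0]   # illustrative candidate penalties
best_lam, scores = select_lambda(K, y, grid)
```

The selected penalty is the one whose fitted function predicts held-out observations best, which is the out-of-sample notion of optimality that the text relies on when it argues that this choice pulls the estimator toward the true function.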
Setting penalties out of sample has this property because, for a given penalty $\lambda$, the estimated function conditional on that penalty, $\hat{h}|\lambda$, provides the optimal density across all functions $h \in \mathcal{H}_\Theta|\lambda$ induced under that penalty. Choosing, from a set of results found using different penalties $\{\hat{h}|\lambda\}_{\lambda \in \Lambda}$, the estimate that provides the optimal out-of-sample density therefore also minimizes $\|\hat{h}_\lambda - h_0\|$ in the limit, since $h_0$ is the function that by definition provides the best out-of-sample density. In other words, estimating eq. (9) while setting $\lambda$ based on out-of-sample prediction error minimization yields an estimated function that minimizes $\|\hat{h} - h_0\|$ in the limit across the entire family of models generated under all weight vectors and all penalties. This is similar to the standard case in which the criterion converges to the parameter that induces the best conditional density across the entire parameter space (White et al., 1980; White, 1994).

2.4 Fixed effects and out-of-sample shrinkage

Linear effects can be included by using difference estimators as detailed in Hainmueller and Hazlett (2014). Nonlinear effects can be modeled by supplying group specific trend variables and group identifiers through $X$. In this case all coefficients may depend on time, and are similar across similar groups in the data. Nonlinear fixed effects approaches combined with non-parametric parts around the economic variables may result in models of enormous size, while the number of observations locally in the time dimension often remains relatively small in environmental economic panels. Model size relates not only to the complexity of functions around the economic variables, but also to the number of fixed effects in the model. In-sample selection strategies to decide on the right number of effects are complicated in the regularized non-parametric context. While in standard regressions additional variables always improve fit, this is not the case in the current context.
Adding fixed effects changes the complexity of the local weights vector. The effect of the complexity penalty in the criterion may therefore increase, such that the penalized estimator adjusts the weighting vector to achieve lower complexity. While this reduces the penalty value, it may lower the in-sample $R^2$. Comparing models with and without fixed effects is therefore a comparison between functional forms with different complexity and nonlinearities. This is a comparison of non-nested models with an unknown, possibly real-valued, difference in degrees of freedom.8

To decide on the right number of effects, we start by estimating a model that includes all fixed effects. We then remove the least significant dummy and obtain new results. We repeatedly evaluate the out-of-sample prediction performance while shrinking the effects, and select the model with the optimal out-of-sample density across all fixed effect models. Intuitively, this approach starts with a model similar to a linear Fixed Effects model, as the penalty heavily discounts the thresholds introduced by the effects, resulting in flattened marginal effects, and gradually allows fixed heterogeneity to be explained by nonlinearities across covariates instead. As a result, our final estimates are guaranteed to be preferred over the standard linear Fixed Effects model, as judged by the out-of-sample criterion.

3 Data

We combined measures of tree loss, air pollution concentrations and carbon emissions on one side, and GDP and indicators of economic structure on the other. Our data are from a variety of sources and include 95 countries measured over 1999-2014, containing approximately 85% of the world's population, 83% of the world's carbon output, and 72% of the world's forest cover. The selection is mainly driven by availability of data, as detailed in the following sub-sections. We have removed areas below 1500 square kilometers (essentially all small islands) from the analysis.
A summary of the data as it enters our regressions is given below. A list of all countries included in our sample can be found in fig. 1. Table 1 summarizes the data. All predictors are mapped into the [0, 1] interval to ensure the penalization effect is not driven by differences in variance of the different variables. We scale the estimation results back to the original units for easier interpretation.

3.1 Forest cover data

We use data from Hansen et al. (2013), which contain estimates of global tree cover extent (2000) and annual tree cover loss (2001-2014) at a spatial resolution of 30 meters.9 Hansen et al. (2013) analyzed satellite images from Landsat 5, 7, and 8 to identify tree cover extent, defined as vegetation taller than 5 meters in height, and loss, defined as complete removal of tree cover canopy.

8 Degrees of freedom is a parametric concept whose translation to the non-parametric setting is complex. One can approximate the degrees of freedom empirically, which may result in numbers that are real-valued.

9 The data can be found at https://earthenginepartners.appspot.com/science-2013-global-forest/download_v1.2.html.

The authors reported the tree cover loss data to have a false positive rate of 13 percent, a false negative rate of 12 percent, and a ratio of total forest gain to loss over 2001-2012 of 0.34. The derived data differ from statistics reported by the UN-FAO's Forest Resource Assessment, but due to the consistent methodology and definition of forests across countries, we believe these data are better suited for a global analysis. We define forests as pixels with a minimum canopy closure density of 30 percent. Finally, we convert the data to area measures and sum the data by country to calculate tree cover loss as a percentage of tree cover extent in 2000. Our intention is to examine "natural dense forests", but note that the data also capture forest plantations.
Our loss measure is thus only a proxy for deforestation, as there may be many other natural and anthropogenic processes (storm damage, fires, mechanical harvesting) that are reflected by the data. We refrain from including the tree cover gain data as there are significant differences in methodology that limit additivity or comparison with loss.

3.2 Pollution data

We use concentrations of fine particulate matter (PM2.5), particles of up to 2.5 micrometers in diameter, as a proxy for broader air pollution. The data, at a resolution of 0.01° × 0.01°, were developed by van Donkelaar et al. (2016) and include global annual ground-level PM2.5 (1999-2014) derived from a combination of satellite-, simulation- and monitor-based sources. The dataset has been developed from satellite-derived Aerosol Optical Depth reflectance values calibrated to ground-based PM2.5 observations using a Geographically Weighted Regression. Remote sensing methods aim to observe particulate matter but are prone to capturing fine dust released from barren lands that have similar reflectance properties in the high-frequency spectral wavelengths. This poses a difficulty in our analysis: countries in desert regions have high country-wide average pollution levels, while large countries, or those with substantial forest cover where ambient pollution is low, have lower average concentrations than what the larger population is exposed to on a regular basis. We used gridded population data, produced using a combination of light-at-night data and census data, to identify patches of urban areas.10 We averaged gridded pollution data that fall within urban boundaries to the country level, defining urban areas as places where population density was higher than 300 people per square kilometer. The results in fig. 1 show that this procedure results in higher pollution levels in large countries with known pollution problems in cities (notably China, Nepal, Pakistan) or those with forests (notably Lao PDR, Indonesia, Senegal), and in lower concentrations in areas with known deserts (notably Chad, Tunisia, Morocco).

10 The population grids are from http://sedac.ciesin.columbia.edu/data/collection/gpw-v4; we use the 2000 grids.

[Figure 1 plot. Vertical axis: difference between pollution level in urban areas and country averages, ranging from −10 to 20. Countries on the horizontal axis: Chad; Tunisia; Morocco; Cameroon; Ghana; Guatemala; Thailand; Ethiopia; Benin; Swaziland; Sri Lanka; Zimbabwe; Togo; Turkey; Gabon; Botswana; Solomon Islands; Guinea; Mozambique; Fiji; Rwanda; El Salvador; Zambia; Bolivia; Madagascar; Netherlands; Uganda; Germany; Sierra Leone; Malawi; Nicaragua; Cambodia; Costa Rica; Denmark; Australia; Venezuela, RB; Tanzania; Paraguay; Dominican Republic; Finland; Timor-Leste; Congo, Rep.; Belgium; Nigeria; Namibia; Panama; Norway; Ireland; Colombia; Kenya; Ecuador; Philippines; Lesotho; Japan; Spain; Honduras; Sweden; Brazil; Bangladesh; France; Mongolia; Switzerland; Vietnam; Suriname; Iran, Islamic Rep.; Greece; United Kingdom; Guinea-Bissau; Austria; Uruguay; Azerbaijan; United States; Central African Republic; Canada; Indonesia; Argentina; Italy; Mexico; Uzbekistan; Malaysia; Senegal; Angola; South Africa; Georgia; Lao PDR; Peru; Chile; India; Armenia; Tajikistan; Kazakhstan; Pakistan; Nepal; China; Cote d'Ivoire.]

Figure 1: Difference between average pollution in urban areas and country-wide average PM2.5 data.

3.3 Vegetation control data

We use the Normalized Difference Vegetation Index (NDVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) on NASA's Terra satellite to control for effects that relate to a variety of physical characteristics and natural assets of a country.11 This dataset provides spatial and temporal comparisons of global vegetation conditions. The original data have a monthly frequency at a resolution of 1 km. We calculated the mean NDVI value for each year in our analysis (we use 2000-2015 data) and summarized the data to the country level using the mean, minimum, and maximum value to get a broad description of the vegetation in a country.
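The country-level NDVI summary described above can be sketched as follows. This is one possible reading of the procedure, assuming that monthly values are first averaged per pixel within a year and that the mean, minimum, and maximum are then taken across the pixels of a country; the array layout, the function name, and the toy numbers are hypothetical.

```python
import numpy as np

def country_ndvi_summary(monthly_ndvi, country_ids):
    """Summarize one year of NDVI to the country level.

    monthly_ndvi: array (n_months, n_pixels), NaN where no valid observation.
    country_ids:  array (n_pixels,) with a country label per pixel.
    """
    monthly_ndvi = np.asarray(monthly_ndvi, dtype=float)
    country_ids = np.asarray(country_ids)
    # Per-pixel annual mean over the available months.
    annual_mean = np.nanmean(monthly_ndvi, axis=0)
    summary = {}
    for country in np.unique(country_ids):
        values = annual_mean[country_ids == country]
        summary[str(country)] = {
            "mean": float(np.nanmean(values)),
            "min": float(np.nanmin(values)),
            "max": float(np.nanmax(values)),
        }
    return summary

# Toy example: two months, three pixels, two hypothetical countries.
ndvi = np.array([[0.2, 0.4, 0.6],
                 [0.4, 0.6, 0.8]])
ids = np.array(["A", "A", "B"])
result = country_ndvi_summary(ndvi, ids)
```

Taking the minimum and maximum over months rather than over pixels would be an equally plausible reading; the text does not settle which was used.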
11 Available at https://modis.gsfc.nasa.gov/data/dataprod/mod13.php.

3.4 Other data

For our economic variables and data on carbon dioxide emissions, we rely on the World Bank's World Development Indicators (WDI). To ensure cross-country comparability, we use GDP per capita in constant 2011 international dollars adjusted for purchasing power parity. The CO2 emissions estimates retrieved from WDI were produced by the U.S. Department of Energy's Carbon Dioxide Information Analysis Center (CDIAC), and include anthropogenic emissions from fossil fuel consumption and world cement manufacturing.

3.5 Missing values and outliers

Forest cover loss included two outliers, losses of 10.8% and 5.4% in Namibia (2001 and 2005, respectively). For comparison, the median observation across time in this country was 1.7%. We have capped these numbers at 3%, which seemed an appropriate maximum for the range of forest loss after inspecting a kernel density. We applied a 3-period simple moving average to further smooth outliers. Our final dataset includes the losses for countries that held 72% of forest cover in 2000. The largest missing forest patch is that of the Russian Federation.

The WDI dataset contains a wealth of information, but some important observations are missing. GDP has only 0.12% missing, while manufacturing and services GDP shares have 5.57% missing. We interpolate these values by taking a weighted average of the nearest observations in the time dimension. The dataset includes approximately 83% of global carbon emissions and should be quite representative of missing countries, as it is close to the 85% of world population included in our samples.

The WDI does not report income shares in each country but sometimes reports a Gini index. We used this to back out income shares.12 After using both the Gini and reported shares, 61.25% of the observations remain missing, but missingness is only over the time dimension.
In the countries that have more observations in the time dimension, we observe that the income shares are relatively stable over time. We interpolate the remaining missing values using weighted averages of the nearest observations in the time dimension.

12 We make use of the fact that income shares held by a certain share of the population can be read off the Lorenz curve and that the Gini is a measure of dispersion of the Lorenz curve calculated from the summed surfaces under the Lorenz curve and under the 45° line. In total we are able to collect 937 observations of both Gini coefficients and income shares held by the first two quintiles. We estimate the nonlinear inverse map with high precision ($R^2$ of .99) using the penalized non-parametric estimator. We then used the Gini observations to predict the income shares.

Poverty rates have 69.54% missing values. A large part of the missing values is due to statistics that are not produced in high-income countries. We impute all missing poverty and undernourishment rates above 23,000 GDP per capita with zero. This completes undernourishment. The highest GDP per capita with a positive poverty rate in our data was Malaysia, with 24,500 GDP ppp per capita and a poverty rate of 1.3%; all other countries over 23,000, like Kazakhstan, had already attained 0% poverty rates in the published data. After this step, 49.54% of poverty remains missing because most countries are not complete in the time dimension. We first interpolate these variables by taking a weighted average over time. This works well for most variables, but may yield poor results for poverty, as we have seen a tremendous improvement in most countries in past years. We improve the time dynamics in the interpolated poverty data by using information about time dynamics contained in our other variables.
We vectorize the interpolated values and fit the kernel model using the full set of undernourishment, logarithmic GDP per capita, the shares of manufacturing and services, urban population shares, and bottom 40% income shares. The model reaches an $R^2$ of .91. We use this model to smooth the interpolated poverty values by taking an average of the interpolated values and the values predicted by this nonlinear model.

Table 1: Summary of the data used in our empirical application. Statistics are not weighted and not necessarily representative of the world averages.

| Statistic | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|
| Annual % Tree loss | 0.437 | 0.406 | 0.009 | 2.924 |
| Urban PM2.5 mcg/m3 | 18.924 | 12.609 | 0.311 | 63.498 |
| CO2 kg / $ | 0.239 | 0.199 | 0.014 | 1.990 |
| CO2 ton p.c. | 3.402 | 4.162 | 0.015 | 20.208 |
| GDP ppp p.c. 2011 international $ | 13,468.880 | 15,056.710 | 555.560 | 64,979.840 |
| Population density, people / sqkm | 101.592 | 138.270 | 1.524 | 1,148.514 |
| Undernourishment rate | 15.514 | 13.546 | 0.000 | 64.500 |
| Poverty 1.90$ at 2011 international $ | 21.177 | 22.040 | 0.000 | 84.740 |
| Manufacturing GDP share | 14.371 | 6.721 | 0.237 | 38.733 |
| Services GDP share | 70.065 | 12.734 | 29.279 | 93.881 |
| Urban population share | 53.685 | 22.754 | 12.082 | 97.818 |
| Bottom 40% income share | 15.970 | 4.130 | 7.510 | 28.024 |
| NDVI annual mean | 0.502 | 0.163 | 0.111 | 0.762 |
| NDVI annual min | 0.327 | 0.164 | −0.027 | 0.657 |
| NDVI annual max | 0.655 | 0.161 | 0.170 | 0.862 |
| Forest cover 2000 extent million ha | 3.628 | 2.815 | 0.0005 | 9.883 |
| Country area sqkm | 978,152 | 1,999,320 | 15,007 | 9,904,700 |

3.6 Transformation to degradation intensities

To address homogeneity concerns related to the scales of sovereign economies, we model standardized units of deforestation, pollution, and emissions, subsequently standardized per unit of GDP per capita in 2011 international dollars adjusted for purchasing-power parity. These environmental intensities follow a nonlinear trend with variance declining as GDP increases. Figure 2 shows that the variance in the logarithm of the degradation intensities of GDP is stable across GDP per capita.
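As an illustration of the transformation, the sketch below computes a log degradation intensity per 1000 dollars of GDP per capita. The function name and the toy numbers are hypothetical, and the exact standardization of the outcome variables is not reproduced here; the sketch only assumes that the intensity is the outcome divided by GDP per capita in thousands.

```python
import numpy as np

def log_intensity(outcome, gdp_per_capita):
    """Log of an environmental outcome per 1000 dollars of GDP per capita."""
    outcome = np.asarray(outcome, dtype=float)
    gdp_thousands = np.asarray(gdp_per_capita, dtype=float) / 1000.0
    return np.log(outcome) - np.log(gdp_thousands)

# Toy numbers (not taken from the data): the same outcome level
# observed at three different income levels.
gdp = np.array([1000.0, 10000.0, 50000.0])
tree_loss_pct = np.array([0.5, 0.5, 0.5])
li = log_intensity(tree_loss_pct, gdp)
```

Under this definition the log outcome equals the log intensity plus log GDP per capita (in thousands), so an elasticity of $\beta$ for the intensity with respect to GDP per capita corresponds to $1 + \beta$ for the outcome level.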
[Figure 2 plots, each against GDP per capita on a log scale. Panels: % tree cover loss; log tree loss intensity of GDP per capita (log % loss/ha per 1000$ GDP per capita); air pollution concentrations (PM2.5 mcg/m3); log pollution intensity of GDP per capita (log PM2.5 mcg/m3 per 1000 GDP per capita); carbon intensity of a dollar (CO2 kg per ppp $ GDP); log carbon intensity of a dollar per GDP per capita (log CO2 kg/$ per $1000 GDP per capita).]

Figure 2: Observed degradation intensities and degradation levels across income.

4 Empirical Results

The results show that the environmental output intensities are well explained by the data and that evidence for nonlinear dependencies is pervasive throughout all three models. Tables 2 to 4 show the marginal effects for individual models, summarized using the mean, quantiles and medians together with t-statistics.13 For brevity, we omit the control variables in the tables.14 All models have been checked for time fixed effects, but in all three cases the out-of-sample performance was optimal in the models without fixed components. The appendix contains conditional expectations together with confidence bands for each of the economic variables, holding the effects of all other variables constant at their mean values, which provide guidance throughout our discussion of the results.

13 We obtained our results using the R implementation of KRLS. Our out-of-sample shrinkage strategy is not implemented by default, and requires many model fits. We found that an optimized BLAS/LAPACK implementation provided better speed than the C++ implementation of bigKRLS.

14 Annual mean, min and max NDVI values, forest cover, and country size.

Variables for which the marginal effects within the inner 50% of the percentile range have an identical and significant sign are highlighted in the tables. This reveals that many of the variables contribute both positively and negatively to the output intensities, depending on the data levels at which effects are evaluated. This shows that nonlinearities are important. We find that income has an unambiguous effect: all three environmental intensities improve with income, but not sufficiently to offset scale growth.15 While increases in GDP provide a basis for the improvement of production efficiency, they appear not to lower the net environmental output. However, as GDP increases, a structural change occurs in which poverty goes down and the shares of manufacturing, services, and urban population gradually increase. We will highlight several of these structural effects, which are best visualized in fig. 8 and fig. 9. Figure 9 shows how poverty, the production composition, urban population shares, and the income distribution trend across GDP.

4.1 Individual model results

The results for deforestation show that early increases in population density correlate with a decrease in deforestation intensity, while high population densities correlate with an increase. The trend across urban populations is a weak inverted $U$.
The effects of man- ufacturing and services are less ambiguous, the move out of an agricultural society and specifically an increasing share in services that occurs with increasing GDP, is a strong correlate of declining deforestation rates. There is some evidence that economies with an unequal income distribution retain a higher deforestation intensity of production. The effect of the poverty variables is however mixed. Reducing the undernourishment rate ini- tially seems to increase deforestation, while the transition out of extreme poverty correlates with a decrease in deforestation intensity. In contrast with the deforestation results, we see that an increase in population density unambiguously drives pollution up. The pollution intensity trend across the urbanization rate is initially flat, but after 50% of the population has urbanized, the trend becomes 15 The log-log specification allows for a simple interpretation. To offset the scaling effects, the marginal effect of log GDP per capita needs to be smaller than -1, which we do not observe within the 25%-75% range of effects. 18 Table 2: Deforestation intensity results using the penalized kernel regression. (means) (25%) (50%) (75%) Dependent: Log deforestation intensity of 1000 GDP p.c. 
∗∗ Log 1000 GDP per capita -0.453*** -0.610*** -0.476*** -0.289*** (-16.962) (-22.856) (-17.83) (-10.822) Population density -0.001*** -0.003*** -0.002*** 0.001*** (-5.351) (-18.303) (-9.578) (8.449) Undernourishment rate 0.001 -0.008*** 0.002 0.011*** (0.371) (-4.48) (0.919) (6.171) Poverty 1.90$ rate -0.005*** -0.013*** -0.004*** 0.005*** (-4.118) (-10.353) (-2.878) (4.297) Manufacturing GDP share -0.015*** -0.055*** -0.016*** 0.023*** (-5.459) (-19.402) (-5.551) (8.04) Services GDP share∗∗ -0.032*** -0.051*** -0.036*** -0.016*** (-16.57) (-26.109) (-18.436) (-7.961) Urban population share 0.006*** -0.008*** 0.009*** 0.019*** (5.597) (-6.929) (7.767) (17.111) Bottom 40% income share∗ -0.027*** -0.06*** -0.033*** 0.003 (-5.416) (-12.304) (-6.671) (0.55) ∗ N = 1520 R2 = 0.922 λ=0.691. p<.1; ∗∗ p<.05; ∗∗∗ p<.01 Constant omitted, t-statistics in parenthesis. Optimal model contained no fixed effects. Model controls for mean, min and max NDVI, forest cover, and country size. ∗ Inner 50% of significant marginal effects same sign, but range includes zero. ∗∗ Inner 50% of marginal significantly excludes zero. Table 3: Pollution intensity results using the penalized kernel regression. (means) (25%) (50%) (75%) Dependent: Log pollution intensity of 1000 GDP p.c. 
∗∗ Log 1000 GDP per capita -0.691*** -0.842*** -0.692*** -0.567*** (-47.899) (-58.388) (-48.026) (-39.307) Population density∗∗ 0.002*** 0.001*** 0.002*** 0.003*** (18.523) (6.658) (17.063) (30.846) Undernourishment rate -0.002** -0.008*** -0.003*** 0.003*** (-2.172) (-8.675) (-3.256) (3.267) Poverty 1.90$ rate∗ 0.003*** 0.000 0.003 0.008*** (4.493) (0.371) (4.842) (11.603) Manufacturing GDP share -0.009*** -0.023*** -0.01*** 0.004*** (-6.436) (-15.78) (-6.756) (2.492) Services GDP share∗ -0.007*** -0.012*** -0.007*** -0.001 (-7.00) (-11.68) (-6.753) (-1.286) Urban population share -0.006*** -0.015*** -0.008*** 0.001** (-10.746) (-24.929) (-14.101) (2.547) Bottom 40% income share -0.019*** -0.045*** -0.015*** 0.012*** (-7.776) (-18.221) (-6.086) (4.682) ∗ N = 1520 R2 = 0.978 λ=0.691. p<.1; ∗∗ p<.05; ∗∗∗ p<.01 Constant omitted, t-statistics in parenthesis. Optimal model contained no fixed effects. Model controls for mean, min and max NDVI, forest cover, and country size. ∗ Inner 50% of significant marginal effects same sign, but range includes zero. ∗∗ Inner 50% of marginal significantly excludes zero. 19 Table 4: Carbon intensity results using the penalized kernel regression. (means) (25%) (50%) (75%) Dependent: Log carbon intensity of 1000 GDP p.c. 
Log 1000 GDP per capita∗∗     -0.630***   -0.755***   -0.635***   -0.519***
                              (-42.341)   (-50.71)    (-42.699)   (-34.903)
Population density            -0.000***   -0.001***   -0.001***   0.000***
                              (-4.812)    (-11.919)   (-5.656)    (3.356)
Undernourishment rate         0.006***    -0.001      0.008***    0.014***
                              (5.927)     (-0.523)    (8.248)     (14.741)
Poverty 1.90$ rate            0.002**     -0.002***   0.001**     0.008***
                              (2.475)     (-3.295)    (2.092)     (11.496)
Manufacturing GDP share∗      0.017***    0.001       0.019***    0.036***
                              (10.815)    (0.366)     (12.344)    (23.015)
Services GDP share            0.001       -0.007***   0.002       0.01***
                              (1.198)     (-6.708)    (1.667)     (9.01)
Urban population share        0.002***    -0.005***   0.002***    0.008***
                              (2.826)     (-7.41)     (2.995)     (12.768)
Bottom 40% income share       0.006**     -0.018***   0.002       0.026***
                              (2.269)     (-6.492)    (0.825)     (9.297)

N = 1520, R2 = 0.956, λ = 0.635. * p<.1; ** p<.05; *** p<.01. Constant omitted, t-statistics in parentheses. Optimal model contained no fixed effects. Model controls for mean, min and max NDVI, forest cover, and country size.
∗ Inner 50% of significant marginal effects same sign, but range includes zero.
∗∗ Inner 50% of marginal effects significantly excludes zero.

negative. This indicates that early urbanization is polluting, but that after reaching a tipping point, the city environment becomes cleaner. The trends across manufacturing and services are also primarily downward. Agricultural societies have a higher pollution intensity of income, while a shift into manufacturing and services reduces the environmental output per unit of production. It remains difficult to say whether these effects reduce pollution on a net basis, as this structural transformation occurs jointly with an increase in total productivity. However, for an identical amount of total GDP produced, the data seem to suggest that an agricultural economy produces the highest amount of pollution. An economy with a high manufacturing share produces less pollution, while an entirely service-oriented economy produces the lowest amount of pollution.
This may also relate to a differential in the value produced by these sectors, which may imply a different quality of production processes and a differential in the total amount of economic activity for a fixed level of GDP. Across poverty and undernourishment we see hyperbolic effects that suggest that the eradication of extreme hunger occurs jointly with an increase in pollution intensity, while later poverty eradication eventually occurs jointly with a reduction in pollution intensity. Poverty rates are unambiguously correlated with higher pollution intensities. Again, similar to the deforestation results, it seems that societies with high income inequality are also more polluting.

Carbon intensities also trend with urbanization. We find that carbon intensities initially increase together with the urbanization process; however, after the 50% urban population tipping point, the environment becomes more efficient in carbon consumption. The shift in production composition trends opposite to what we observed for deforestation and pollution. High manufacturing and high services shares in the production composition both correlate with higher carbon emission intensities. The initial decline in undernourishment rates occurs together with improvements in the carbon emission intensities; poverty reduction, however, trends with an increase. Finally, we see that equality (a stronger bottom 40%) increases carbon output when everything else is held constant, which is again the opposite of what we observed for deforestation and pollution.

4.2 Further results on heterogeneity

Combined, the results show that income and poverty reduction provide a basis for improvements in the efficiency of economies in their use of finite resources. The effects of the economic composition, however, are ambiguous.
To understand how structural transformation, urbanization, poverty reduction and increases in total production interact to produce a commonality in environmental output trends, we track the model predictions keeping the control variables at their means. We also keep the income distribution fixed at its mean value, as it does not trend clearly with GDP (see fig. 9), and keep population densities fixed at their means. Figure 3 shows the prediction surfaces using poverty and income as Cartesian coordinates. The model predictions fit the output levels well after scaling the log intensities, see fig. 10. While this shows that all countries gradually grow out of poverty and improve their efficiencies following a common pattern, it also reveals significant heterogeneity in the environmental output intensities that relates to differences in poverty and hunger rates, urban population shares and GDP composition. This highlights that the shape of the EKC strongly depends on the development path of a country across all its dimensions. Furthermore, while the progression in output intensities follows a similar path, slight deviations from the local average may result in large differences in total environmental output. This reveals that while different development paths may relate to relatively small differences in the environmental output intensities, they may produce rather large differences in actual forest loss, air quality and carbon emissions, depending on the scale of the economy.
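The procedure in the paragraph above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_predict` is a hypothetical stand-in for the fitted penalized kernel model (its coefficients are invented), the variable names are illustrative, and the rescaling assumes that the dependent variable is the log of output divided by GDP per capita in thousands of dollars.

```python
import math
from statistics import mean

def fix_at_mean(rows, held_fixed):
    """Return copies of the rows with the listed variables set to their sample means."""
    means = {name: mean(r[name] for r in rows) for name in held_fixed}
    return [{**r, **means} for r in rows]

def to_level(log_intensity, gdp_per_capita):
    """Undo the log and the division by GDP per capita in $1000 (assumed definition)."""
    return math.exp(log_intensity) * gdp_per_capita / 1000.0

def toy_predict(row):
    # Hypothetical stand-in for the fitted model; these are NOT the paper's estimates.
    return (1.0 - 0.7 * math.log(row["gdp_pc"] / 1000.0)
            + 0.003 * row["poverty"] + 0.002 * row["pop_density"])

# Two illustrative observations (made-up values).
rows = [
    {"gdp_pc": 1000.0, "poverty": 40.0, "pop_density": 50.0},
    {"gdp_pc": 8000.0, "poverty": 5.0, "pop_density": 150.0},
]

# Hold population density at its mean, then predict and rescale to output levels.
fixed_rows = fix_at_mean(rows, held_fixed=["pop_density"])
surface = [(r["gdp_pc"], r["poverty"], to_level(toy_predict(r), r["gdp_pc"]))
           for r in fixed_rows]
```

Each tuple in `surface` holds one point of a prediction surface indexed by income and poverty, with the rescaled level corresponding to the right-hand panels of Figure 3.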
[Figure 3: six panels in three rows, each showing a model fit with fixed control variables. Left column: log annual tree % loss per $1000 GDP per capita; log PM25 mcg/m3 per $1000 GDP per capita; log CO2 kg per ppp $ of GDP per $1000 GDP per capita. Right column: annual tree cover % loss; PM25 mcg/m3; CO2 kg per ppp $ of GDP. Horizontal axis: GDP per capita, log scale (500 to 66,000). Vertical axis: poverty rate (0 to 85).]

Figure 3: Model fits of degradation intensities of log GDP (left) and the rescaled environmental output levels (right) across poverty and income. Population densities and income equality as well as the control variables are held constant at the mean.
An important takeaway is that heterogeneity in the actual output levels (right) is large primarily around the income levels where output is also highest (around $4,000 for deforestation, $6,000 for pollution, and $8,000 for the carbon weight of a single dollar of production value). This indicates that the theorized Environmental Kuznets tipping points are also the points at which an averaged result, such as that obtained from a linear regression, provides the poorest indication of relationships at the individual country level. While a few general rules could be extracted from the marginal effects, such as the inequality, income and population density effects, the larger part of the environmental data seems to relate heterogeneously to economic variables.

4.3 Further results on average curvature

[Figure 4: two panels, both titled "Estimated standardized environmental degradation levels across income". First panel: normalized log intensity levels (0 to 100) for forest loss/ha/GDP p.c., pollution mcg/m3/GDP p.c. and carbon kg/$/GDP p.c. Second panel: normalized degradation levels (0 to 100) for forest loss/ha, pollution mcg/m3, carbon kg/$ and carbon kg p.c. Horizontal axis in both panels: logarithmic GDP per capita (1,100 to 22,000).]

Figure 4: Normalized predicted environmental output levels across income. Predictors are held at expectations conditional on GDP. The R2 values from logarithmic GDP per capita to poverty, undernourishment, manufacturing, services and urban population shares are respectively 0.801, 0.633, 0.142, 0.573 and 0.739. The conditional expectations are plotted in fig. 9. Population density, income equality, and controls are held at their means.
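The construction named in the caption of Figure 4, holding predictors at their expectations conditional on GDP and normalizing the resulting curves, can be sketched as follows. This is only an illustration: a Nadaraya-Watson kernel smoother is swapped in for the penalized kernel model used in the paper, the bandwidth and the toy data are invented, and the 0 to 100 normalization is assumed to be a min-max rescaling.

```python
import math

def kernel_smoother(x_obs, y_obs, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] with a Gaussian kernel.

    Stand-in for the paper's univariate penalized kernel fits (an assumption).
    """
    def fit(x):
        weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in x_obs]
        return sum(w * yi for w, yi in zip(weights, y_obs)) / sum(weights)
    return fit

def normalize_0_100(values):
    """Min-max rescale so the lowest value maps to 0 and the highest to 100."""
    lo, hi = min(values), max(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]

# Toy data (illustrative only): log GDP per capita and poverty rates.
log_gdp = [math.log(g) for g in (1100, 3000, 8100, 22000)]
poverty = [45.0, 20.0, 5.0, 1.0]

# Conditional expectation of poverty given log GDP, evaluated along the income axis.
expected_poverty = kernel_smoother(log_gdp, poverty, bandwidth=0.5)
path = [{"log_gdp": x, "poverty": expected_poverty(x)} for x in log_gdp]
```

Repeating the smoothing step for each transitional variable gives one row of "local averages" per income level; feeding those rows to the fitted model and normalizing the predictions yields curves that can be compared across outcomes.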
The heterogeneity in amplitude and location of tipping points, conditional on the economic variables, implies that a single Kuznets Curve, as it has often been treated in the literature, is a description that applies only poorly to individual country cases. However, to do some justice to the classical concept, we can still construct an average development path and explore how the models fit environmental outputs to it. To do so, we derive conditional expectations for poverty, undernourishment, GDP composition, and urban population shares, using only the logarithmic GDP per capita as an explanatory variable. We then use these conditional values to build a data set that includes all variables as local averages along with GDP itself. Again, we keep the control variables, the income distribution and population densities fixed. We normalize the results to compare the slopes and the location of tipping points across income. Figure 4 shows the curvatures associated with these development paths. We have dropped the lower 2.5% and the upper 20% of GDP observations. We focus on this range because of its particular relevance for development policy.

We note that the maximum total output associated with the average tipping point is an interesting statistic, but due to heterogeneity it may be a poor basis for predicting whether a country is close to its potential tipping point after observing only environmental output. The deforestation rate associated with the average development path attains a maximum of 0.66% annually, while the pollution concentration peaks at 28.7 mcg/m3 and the carbon weight of a dollar reaches 0.271 kg.

4.4 Further results on heterogeneity in curvature and tipping points

The average pathways accurately describe the transition out of poverty, but they provide less insight into the effects of compositional changes.
To better understand the importance of deviations in transitional variables, we plot the degradation levels associated with the average development path with additional differences in manufacturing shares, urban population shares and poverty rates. Figure 5 shows that changing these variables, while keeping everything else at the local averages, has important impacts on the location, shape, and height of tipping points. For example, increasing the share of manufacturing by 10 points shifts the tipping point of deforestation to the left, while economies that retain high agricultural shares reach a tipping point at higher income. This implies that an earlier transition out of agriculture may prevent high deforestation rates at higher income and lower pollution levels at their peak. This is a slightly counterintuitive result, as manufacturing has traditionally been portrayed as the main source of pollution. However, since our data only indicate the share of manufacturing in total GDP and not the quality or quantity of goods produced, higher rates may also correspond to differences in the number of manufacturing sites and the methods of production used. The carbon emissions associated with this structural change are higher, suggesting a trade-off between pollution-heavy and carbon-intense production. In a similar fashion, poor countries that have a high urbanization rate have higher deforestation rates and reach a pollution peak faster. Poor countries that have lower urbanization, on the other hand, eventually maintain higher pollution and carbon emission levels at higher income. This suggests that the draw-down in pollution output is not just a matter of income and productivity; it may relate to attaining critical urban population mass combined with increased income. The effects of poverty, finally, do not impact the location and shape of the environmental output levels.
Countries with high poverty rates unambiguously deforest and pollute more, but emit less carbon.

[Figure 5: nine panels in three rows. Rows vary manufacturing shares, urbanisation rates and poverty rates. Columns show estimated tree % loss, estimated PM25 and estimated CO2 (kg per ppp $) across GDP per capita (log scale, 1,100 to 22,000). Each panel compares three lines: +10 point %, local average, and −10 point %. R2 of underlying model: 0.922 (tree loss), 0.98 (PM25), 0.956 (CO2).]

Figure 5: Predicted environmental output levels across income. Predictors are held at expectations conditional on GDP and one variable has been incremented +/- 10 points in each plot. The local average trend is identical to those in fig. 4.

5 Projecting 2030

To explore whether continuation of current growth can be expected to lower environmental outputs without intervention, we extrapolate GDP into the future and calculate the associated model responses under three simplistic growth scenarios. We use the estimated models to back out the conditional expectations for the economic variables based on GDP, and use these data sets of hypothetical future economies to predict levels of environmental output at each point in time. In the base scenario, each sovereign grows at its individual median 1999-2014 compound rate, with the highest growth rate capped at the 90th percentile (5.27% annually). In a pessimistic future, each country continues at one asymmetric deviation unit (-3.67%) below the base rate, and in an opportunistic growth scenario countries continue one asymmetric deviation unit (+0.89%) above the base rates.¹⁶ In the opportunistic scenario, rates are capped at the 95th percentile rate (5.87% annually). Finally, we construct Business As Usual (BAU) as the average of the three results to balance possible asymmetry. Table 5 summarizes the growth rates.

To limit the complexity, we keep population growth slightly below individual country median compound rates, resulting in 8.5 billion people by 2030 (in line with United Nations projections).¹⁷ We use extrapolated GDP levels to derive fits for poverty and undernourishment levels, and the GDP composition, using univariate model fits of the penalized kernel model. We let the urban population depend additionally on log population densities.¹⁸ At each point in extrapolated time, we compare the conditional expectations to the predictions of our base year (2014), and compute the percentage change that we then multiply with observed 2014 values.
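The three growth scenarios described above can be sketched as follows. This is one possible reading, not the authors' code: the function names are hypothetical, the deviation units and caps are passed in as the percentages quoted in the text, and it is an assumption that the opportunistic cap is applied to the uncapped median plus the upward unit. The helper `asymmetric_units` follows the definition in footnote 16 (median minus the 25% quantile, and the 75% quantile minus the median).

```python
from statistics import median, quantiles

def drop_outliers(rates, k=2):
    """Drop the k rates that are largest in absolute value (footnote 16)."""
    return sorted(rates, key=abs)[: len(rates) - k]

def asymmetric_units(rates):
    """Distance from the median down to the 25% quantile and up to the 75% quantile."""
    q25, _, q75 = quantiles(rates, n=4, method="inclusive")
    mid = median(rates)
    return mid - q25, q75 - mid

def scenario_rates(country_rates, down, up, cap_base, cap_high):
    """Annual growth rates in percent for the three scenarios of one country."""
    base_uncapped = median(country_rates)
    base = min(base_uncapped, cap_base)
    return {
        "pessimistic": base - down,
        "base": base,
        # Assumption: the higher cap applies to the uncapped median plus the unit.
        "opportunistic": min(base_uncapped + up, cap_high),
    }

def compound(level, rate_percent, years):
    """Extrapolate a level at a constant annual compound growth rate."""
    return level * (1.0 + rate_percent / 100.0) ** years

def bau(results):
    """Business As Usual: the plain average of the three scenario results."""
    return sum(results.values()) / len(results)

# Illustrative country with made-up annual growth rates (percent).
rates = scenario_rates([2.0, 6.0, 7.0], down=3.67, up=0.89,
                       cap_base=5.27, cap_high=5.87)
gdp_2030 = {name: compound(10000.0, r, 16) for name, r in rates.items()}
```

Note that the text averages the three scenario results, so `bau` would be applied to the predicted environmental outputs of each scenario rather than to the growth rates themselves.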
We keep all data points within observed intervals, including a cap on the sum of agriculture and services shares. This means that, effectively, after a country reaches a level of $64,980 per capita, our projection halts both the income effect on efficiency improvements and the effect of scale increase on the environmental output for that country. In the pessimistic scenario, this does not affect any individual country result; in the base scenario, it fixes Norway's output at current levels and caps those of the U.S. and Switzerland in the last 4 and 5 years of the projection, respectively (the final 9 and 10 years in the high growth scenario).

¹⁶ Asymmetric deviation units have been calculated as the difference between the median and respectively the 25% and 75% quantiles of growth rates. In the calculations we have dropped the two largest outliers (in absolute value) for each country.
¹⁷ We reduced the population growth rates by 0.05 times the absolute point percentages globally to reduce growth everywhere, then reduced population growth rates by an additional 0.15 times the percentage rates in the top 40% income countries and an additional 0.25 times in the top 20% income countries. This simple scenario is designed to represent relatively higher growth in lower income countries and a slowdown in the developed countries, in line with UN projections.
¹⁸ The R2 values of the models are 0.632 for undernourishment, 0.785 for poverty, 0.142 for manufacturing, 0.573 for services, and 0.814 for urban population shares. The uncertainty of the impact of changes in manufacturing remains high in our projections.
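The projection step and the income cap described in this section can be sketched as below. The function names are hypothetical and the numbers in the example are made up; the only value taken from the text is the $64,980 ceiling. Reading "percentage change multiplied with observed 2014 values" as scaling the observed value by the ratio of the future fit to the 2014 fit is an interpretation.

```python
GDP_CAP = 64_980.0  # per capita income level at which the projection halts (from the text)

def project(observed_2014, fitted_2014, fitted_future):
    """Scale the observed base-year value by the relative change in the model fit."""
    return observed_2014 * fitted_future / fitted_2014

def gdp_path(gdp_2014, rate_percent, years, cap=GDP_CAP):
    """Extrapolate GDP per capita year by year, holding it at the cap once reached."""
    path = [min(gdp_2014, cap)]
    for _ in range(years):
        path.append(min(path[-1] * (1.0 + rate_percent / 100.0), cap))
    return path

# Illustrative values only: a country close to the observed income ceiling.
path = gdp_path(60000.0, 5.0, 3)

# A model fit that rises by 20% relative to 2014 lifts the observed value by 20%.
pm25_future = project(observed_2014=20.0, fitted_2014=10.0, fitted_future=12.0)
```

Because the path is frozen at the cap, both the efficiency gain from higher income and the scale effect stop changing for that country, which is the behaviour the text describes for Norway, the U.S. and Switzerland.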
[Figure 6: five panels, each showing Historical, Pessimistic, BAU and Opportunistic lines over the years 2000 to 2030. Panels: global population weighted GDP scenario (GDP per capita); global population weighted poverty projection (poverty rate); forest cover weighted % tree cover loss, projection for the world (annual % tree cover loss); population weighted PM25 concentrations, projection for the world (PM25 concentration); total carbon emission output, projection for the world (CO2, billion tons).]

Figure 6: Projection for global environmental outcomes. Table 6 contains aggregate statistics of the BAU line and highlights the distributional changes across income.

Figure 6 presents our projections at the global level, made by aggregating all the country-level results and assuming that the average in-sample trend scales appropriately with missing areas. Results are also available for income segments in table 6. In the average scenario, global extreme poverty falls below 7.4% of the global population. The poorest 20% of countries in our sample have stronger successes and go from 45% poverty to just over 33%.

While poverty reductions and GDP increases may improve livelihoods through economic gain, air pollution remains a serious threat to wellbeing as the average global citizen remains exposed to 36 mcg/m3, nearly twice the WHO prescribed guidelines. Additionally, development comes at the cost of an annual carbon output that reaches 63 GT, nearly double the 2014 level, and a total loss of 242 million hectares of forested land.
About 58% of forest loss is in countries with poverty rates above 3% or income in the bottom 60%. Many of those countries have tropical rain forests with slow regrowth rates estimated at 27% (Hansen et al., 2013). Using those statistics, loss in these countries totals 136 million ha between 2014 and 2030, netting over 3.4% of the global 2000 dense forest cover.

Other insights include that success in eradicating poverty likely slows as China and India near 0% poverty and populations in poor countries grow faster than those in the developed world. Our modeled data do not signal that development alone will result in a successful slowdown in natural capital depletion. At the global level, results suggest ongoing increases in global deforestation rates and carbon emissions. Global pollution exposure stabilizes regardless of the growth scenario; the results instead suggest a distributional shift toward lower income countries, with improving and worsening conditions balancing out at the global scale. Air pollution concentrations rise by 28% in the bottom 20% income countries. Table 6 shows that the entire bottom 30% income countries of our sample in fact continue to face increasing pollution exposure. Projections of forest cover and carbon emissions, on the other hand, are heavily dependent on the economic outlook. Growing wealth in the developing world together with rapid population growth may accelerate future global carbon output.

To focus explicitly on trends in the developing world, we aggregated the results for the bottom 60 percent per capita GDP of our samples; see fig. 7. These results suggest that deforestation rates, while high, may stabilize. Similarly to pollution at a global level, our modeled data suggest a distributional shift in which deforestation increases in the poorest countries while some improvements may occur in the third GDP quintile. The global increase depicted in fig.
6 is thus partly because decoupling of deforestation and economic activity has not fully occurred even in high income countries including Canada, the U.S. and Norway. These countries, however, have better regrowth rates (0.625% regrowth rate in the temperate climatic domain) than the tropical forests that largely lie in the developing world (Hansen et al., 2013). Pollution results suggest that development in the developing world may for the first time result in possible improvements in air quality. Regardless, pollution remains dangerously high at twice the WHO guidelines. Carbon emissions increase fast; the results indicate that involving the developing world (i.e., the bottom 60% per capita GDP countries), which currently contributes about 15% or 4.9 GT to total emissions, in GHG emission policy dialogs will become more important as its total output more than doubles to 11.4 GT in our BAU. This will be equivalent to 35% of current global emissions.

[Figure 7 panels: bottom 60 population weighted GDP scenario (GDP per capita); bottom 60 population weighted poverty projection (poverty rate); forest cover weighted % tree cover loss, projection for bottom 60; population weighted PM2.5 concentrations, projection for bottom 60; total carbon emission output, projection for bottom 60 (CO2, billion tons). Each panel shows the Historical, Pessimistic, BAU, and Opportunistic series by year, 2000 to 2030.]

Figure 7: Projection for environmental outcomes in the developing world. Table 6 contains aggregate statistics of the BAU line and highlights the distributional changes across income.
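The bottom-60 carbon figures quoted in the text can be cross-checked with simple arithmetic. The sketch below uses only the rounded numbers reported in this section (15%, 4.9 GT, 11.4 GT); the variable names are illustrative, and the implied global total is an approximation derived from those rounded inputs rather than a figure taken from the underlying data.

```python
# Rounded inputs as quoted in the text.
bottom60_current_gt = 4.9      # current emissions of the bottom 60% income countries
bottom60_current_share = 0.15  # their share of current global emissions
bottom60_bau_2030_gt = 11.4    # their BAU emissions in 2030

# Current global emissions implied by the share (approximate, from rounded inputs).
implied_global_current_gt = bottom60_current_gt / bottom60_current_share

# Growth of bottom-60 output and its size relative to current global emissions.
growth_factor = bottom60_bau_2030_gt / bottom60_current_gt
share_of_current_global = bottom60_bau_2030_gt / implied_global_current_gt

print(round(implied_global_current_gt, 1), round(growth_factor, 2),
      round(share_of_current_global, 3))
```

Under these inputs the bottom-60 output grows by a factor above two and ends up close to 35% of the implied current global total, which agrees with the statements in the text.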
To shed final light on the challenges that await global leadership at the fronts of global warming and reducing carbon emission outputs, we have broken down total carbon emissions in per capita GDP quintiles in fig. 11. Emissions in the bottom 20% are not significantly contributing to world aggregates. Starting at the second quintile, we see considerable emission outputs, all growing at alarming rates. Even the emissions of the current top 20% keep increasing in our modeled data, possibly suggesting that the 2008-2014 successes may have partly been driven by a combination of slow economies and improvements due to ongoing urbanization that will likely halt as rural areas become less populated. Improved economic conditions in the top income countries may result in renewed upward momentum, although the increase is much slower than what is seen in the lower income countries. The slowdown resulting from modeled data indicates that stabilized levels may possibly be reached soon in high income countries. Regardless of the discussion on decoupling in high income countries, which evidently must result from policy intervention, we see that carbon output in the fourth income quintile, and China in particular, dominates the global result. In total, 57% of average global 2030 emissions, or 35.8 GT, is produced in these countries. Without improved policies, the fourth income quintile produces more than the 2014 global output.

6 Discussion and Conclusion

Through this paper, we have discussed penalized non-parametric modeling of environmental output across economic development. This type of modeling works well for nonlinear processes given that they do not result in overly complex dynamics. We deployed the framework to study environmental data in a panel of 95 countries. We specifically modeled satellite-derived deforestation, satellite-derived air pollution, and reported carbon emissions.
To deal with heteroskedastic variance, we transformed the data to logarithmic degradation intensity of per capita GDP. Our out-of-sample shrinkage of fixed error components did not support time fixed effects.

Our results suggest that production gradually favors conserving the earth's finite resources as GDP increases, but that this alone is not sufficient to offset the effects of scale growth. Instead, structural change in the economy shapes environmental output curves. This process shares similarities between sovereigns, but remains largely heterogeneous. These results do not support a single Environmental Kuznets rule. Instead, the results emphasize the importance of local economic conditions on environmental results. Across all data levels, some effects hold unambiguously. Poverty and income inequality correlate with higher pollution, higher deforestation, and lower carbon emissions; agricultural GDP correlates with deforestation; population densities correlate with pollution; and higher manufacturing shares correlate with increased carbon emissions. We find various tipping points in other variables, notably across urbanization rates. While local conditions may be unique, average development is associated with an inverted $U$-shape in deforestation, pollution and carbon intensities of production units. Per capita carbon emissions follow a $J$-curve as the increase in per person productivity is not sufficiently offset by efficiency improvements.

Regardless of the level of per capita GDP, we observe that at least one form of natural capital degradation is high, conflicting with the belief that countries tend to "clean up" as they develop. One could argue that the scope of the impacts of externalities to production increases with development, with the burden falling to increasingly distant households both in time and space.
Although local air pollution may be more intrusive on daily life, the consequences of climate change will remain globally impactful for generations to come. We extrapolated our descriptions forward in time to highlight the daunting implications of development under continuation of current practices without improving policies. Our results are generally in line with emission paths associated with the high radiative forcing scenarios considered in the IPCC's 4.9°C world (RCP 8.5). Our projections did not indicate successes on the fronts of reducing deforestation. Air quality improves in some currently severely polluted places, but worsens in poor regions.

In our results, deforestation follows an inverted $U$-shape across average development in the developing countries. This confirms and extends recent results from Crespo Cuaresma et al. (2017) that provide evidence for a partial EKC for forest cover at low income. However, we find that growth alone is not sufficient to halt forest loss, and we find evidence that within the bottom 60% income countries, deforestation shifts to the bottom 30%, and that countries within the top 40% income do not fully stop deforesting. Others have similarly detailed forest loss in high-income countries, for example in the United States (Sleeter et al., 2012). Future efforts should also aim to understand the regrowth dynamics across economic development; we have only used regrowth as a control variable in our models. Generally, the temperate zones have much better regrowth rates. Taking this and projected increases in the bottom 20% into account, the African forests seem to be at increased risk as economic successes in these areas accelerate, while the Amazon faces only marginal improvements in the immediate future in our modeled projections.

On the pollution side, our model projects rising PM2.5 levels in the lowest 30% income countries, with a general decrease in PM2.5 in middle income countries.
PM2.5 remains far above WHO air quality guidelines in many countries, particularly in lower and middle income groups. Given population growth, these levels will expose ever larger populations to pollution-related health risks. Currently, about 90% of the global population is exposed to air quality that does not comply with the World Health Organization's Air Quality Guidelines (World Health Organization, 2016). Tallis et al. (2017) expect that by 2050, business-as-usual development will result in over 4.8 billion people living in countries with worse air quality than in 2010. As a comparison, in our average modeled data 52% of people currently live in places where air quality will have worsened by 2030. This totals approximately 4.4 billion people by 2030.

On the carbon end, our results suggest that emission levels that could lead to the high radiative forcing scenario in the IPCC's 4.9°C world (RCP 8.5) are largely in line with business-as-usual development. Worse scenarios may thus in fact be considered as relevant possibilities. Recent studies suggest we are not alone in such a conclusion. See for example Peters et al. (2012) and comments, suggesting, in line with our findings, that reported successes in carbon reduction are short-lived and largely relate to the 2008-2009 crisis and its aftermath. Emissions rapidly increased in many places with the recovery. Furthermore, Peters et al. (2013) and comments thereon reveal that recent emissions continue to track the high end of suggested emission scenarios, making it increasingly unlikely that global warming will stay below 2°C. This is in line with our result that continuing current development puts the world on emissions associated with a 4.9°C pathway. This is further substantiated by the conclusion that developments on the fronts of negative emissions are required to reach a 2°C future (Gasser et al., 2015).
Combined, the evidence suggests that a worst-case scenario of over 4.9°C in 2100 is not unrealistic and is overlooked in both the scientific community and the political arena.

References

Bates, C. and White, H. (1985). A Unified Theory of Consistent Estimation for Parametric Models. Econometric Theory, 1(02):151–178.
Blasques, F. (2010). Identifiable Uniqueness Conditions for a Large Class of Extremum Estimators. In ETA International Symposium on Econometric Theory and Applications, Singapore.
Blasques, F. and Duplinskiy, A. (2015). Penalized Indirect Inference.
Crespo Cuaresma, J., Danylo, O., Fritz, S., McCallum, I., Obersteiner, M., See, L., and Walsh, B. (2017). Economic Development and Forest Cover: Evidence from Satellite Data. Scientific Reports, 7:40678.
Dijk, D. V., Franses, P. H., and Lucas, A. (1999). Testing for Smooth Transition Nonlinearity in the Presence of Outliers. Journal of Business & Economic Statistics, 17(2):217.
Domowitz, I. and White, H. (1982). Misspecified models with dependent observations. Journal of Econometrics, 20(1):35–58.
Gasser, T., Guivarch, C., Tachiiri, K., Jones, C. D., and Ciais, P. (2015). Negative emissions physically needed to keep global warming below 2°C. Nature Communications, 6:7958.
Granger, C., King, M. L., and White, H. (1995). Comments on testing economic theories and the use of model selection criteria. Journal of Econometrics, 67(1):173–187.
Grossman, G. M. and Krueger, A. B. (1995). Economic Growth and the Environment. The Quarterly Journal of Economics, 110(2):353–377.
Hainmueller, J. and Hazlett, C. (2014). Kernel regularized least squares: Reducing misspecification bias with a flexible and interpretable machine learning approach. Political Analysis, 22(2):143–168.
Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., Thau, D., Stehman, S. V., Goetz, S. J., Loveland, T. R., Kommareddy, A., Egorov, A., Chini, L., Justice, C. O., and Townshend, J. R. G. (2013).
High-Resolution Global Maps of 21st-Century Forest Cover Change. Science, 342(6160):850–853.
Horowitz, J. L. (2011). Applied Nonparametric Instrumental Variables Estimation. Econometrica, 79(2):347–394.
Kent, J. T. and Tyler, D. E. (2001). Regularity and Uniqueness for Constrained M-Estimates and Redescending M-Estimates. The Annals of Statistics, 29(1):252–265.
Kullback, S. and Leibler, R. A. (1951). On Information and Sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
Micchelli, C. A., Xu, Y., and Zhang, H. (2006). Universal Kernels. Journal of Machine Learning Research, 7:2651–2667.
Perman, R. and Stern, D. I. (2003). Evidence from panel unit root and cointegration tests that the Environmental Kuznets Curve does not exist. Australian Journal of Agricultural and Resource Economics, 47(3):325–347.
Peters, G. P., Andrew, R. M., Boden, T., Canadell, J. G., Ciais, P., Le Quéré, C., Marland, G., Raupach, M. R., and Wilson, C. (2013). The challenge to keep global warming below 2°C. Nature Climate Change, 3(January):4–6.
Peters, G. P., Marland, G., Le Quéré, C., Boden, T., Canadell, J. G., and Raupach, M. R. (2012). Rapid growth in CO2 emissions after the 2008-2009 global financial crisis. Nature Climate Change, 2(1):2–4.
Pötscher, B. M. and Prucha, I. R. (1991). Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part I: consistency and approximation concepts. Econometric Reviews, 10(2):125–216.
Pötscher, B. M. and Prucha, I. R. (1997). Dynamic Nonlinear Econometric Models. Springer Berlin Heidelberg, Berlin, Heidelberg.
Rothenberg, T. J. (1971). Identification in Parametric Models. Econometrica, 39(3):577–591.
Schölkopf, B. and Smola, A. J. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond.
Shahbaz, M., Shafiullah, M., Papavassiliou, V. G., and Hammoudeh, S. (2017).
The CO2–growth nexus revisited: A nonparametric analysis for the G7 economies over nearly two centuries. Energy Economics, 65:183–193.
Sin, C.-Y. and White, H. (1996). Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics, 71(1):207–225.
Sleeter, B. M., Sohl, T. L., Wilson, T. S., Sleeter, R. R., Soulard, C. E., Bouchard, M. A., Sayler, K. L., Reker, R. R., and Griffith, G. E. (2012). Projected Land-Use and Land-Cover Change in the Western United States. Baseline and Projected Future Carbon Storage and Greenhouse-Gas Fluxes in Ecosystems of the Western United States.
Smarzewski, R. (1986). Strongly unique best approximation in Banach spaces. Journal of Approximation Theory, 47(3):184–194.
Stern, D. I. (1998). Progress on the Environmental Kuznets Curve? Environment and Development Economics, 3(2):173–196.
Stern, D. I. (2004). The Rise and Fall of the Environmental Kuznets Curve. World Development, 32(8):1419–1439.
Stern, D. I., Common, M. S., and Barbier, E. B. (1996). Economic growth and environmental degradation: The environmental Kuznets curve and sustainable development. World Development, 24(7):1151–1160.
Tallis, H., Hawthorne, P., Polasky, S., Reid, J., Beck, M., Brauman, K., Bielicki, J., Binder, S., Burgess, M., Cassidy, E., Clark, A., Costello, C., Fargione, J., Game, E., Gerber, J., Isbell, F., Kiesecker, J., McDonald, R., Metian, M., Molnar, J., Mueller, N., O'Connell, C., Ovando, D., Troell, M., Boucher, T., and McPeek, B. (2017). Meeting Economic Growth and Multiple Environmental Elements of Sustainable Development. Forthcoming.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58(1):267–288.
van Donkelaar, A., Martin, R. V., Brauer, M., Hsu, N. C., Kahn, R. A., Levy, R. C., Lyapustin, A., Sayer, A. M., and Winker, D. M. (2016).
Global Estimates of Fine Particulate Matter using a Combined Geophysical-Statistical Method with Information from Satellites, Models, and Monitors. Environmental Science & Technology, 50(7):3762–3772.
Vollebergh, H. R. J., Dijkgraaf, E., and Melenberg, B. (2005). Environmental Kuznets curves for CO2: heterogeneity versus homogeneity. SSRN, (November):39.
Wagner, M. (2015). The Environmental Kuznets Curve, Cointegration and Nonlinearity. Journal of Applied Econometrics, 30(6):948–967.
White, H. (1994). Estimation, inference, and specification analysis. Cambridge University Press.
White, H. (1980). Using Least Squares to Approximate Unknown Regression Functions. International Economic Review, 21(1):149–70.
World Bank (1992). World Development Report 1992. Development and the Environment. Technical report.
World Health Organization (2016). Ambient Air Pollution: A global assessment of exposure and burden of disease. World Health Organization, pages 1–131.
Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association, 101(476):1418–1429.

Supplementary Appendix

Penalized Non-parametric Inference of Global Trends in Deforestation, Pollution and Carbon

Bo Pieter Johannes Andrée, Andres Chamorro, Phoebe Spencer, Harun Dogo

A Additional discussion

Non-parametric approaches are capable of producing parameterization mappings that approximate nonlinearities arbitrarily well, but do not necessarily also produce uniquely identifiable solutions to the criterion function if the hypothesis space produces universal approximations that fit the data arbitrarily well for any sample size. Estimation is therefore problematic without additional structure to the estimator, which in our case comes in the form of a penalty to the criterion, but in other settings may relate to bandwidths or other tuning parameters.
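The need for additional structure can be illustrated with a minimal kernel regularized least squares sketch. This is an illustration under stated assumptions and not the estimator as implemented in the paper: the Gaussian kernel form, the bandwidth `sigma2`, the two penalty levels, and the synthetic data are all chosen for the example.

```python
import numpy as np

def gaussian_kernel(x, z, sigma2):
    # k(x_i, z_j) = exp(-(x_i - z_j)^2 / sigma2) for one-dimensional inputs.
    return np.exp(-((x[:, None] - z[None, :]) ** 2) / sigma2)

def krls_fitted_values(x, y, lam, sigma2=0.5):
    # Penalized least squares in the kernel basis: c = (K + lam * I)^{-1} y.
    K = gaussian_kernel(x, x, sigma2)
    c = np.linalg.solve(K + lam * np.eye(x.size), y)
    return K @ c

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 12)
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)  # smooth signal plus noise

# A nearly vanishing penalty reproduces the noisy sample almost exactly,
# while a positive penalty restricts the fit to simpler functions.
mse_weak = np.mean((y - krls_fitted_values(x, y, lam=1e-8)) ** 2)
mse_pen = np.mean((y - krls_fitted_values(x, y, lam=1.0)) ** 2)
print(mse_weak, mse_pen)
```

With the penalty close to zero the in-sample error is essentially zero, so the criterion alone cannot distinguish the signal from the noise; the positive penalty trades in-sample fit for a simpler function.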
It is a challenge in its own right to understand how this complexity-penalized estimator is positioned relative to the classical least squares approximation context as considered by White (1980).19 In the standard context, the best approximation is produced by a unique point in the entire parameter space, while in the penalty context a unique best approximation exists for every given penalty. Hence, the divergence between the true functional form and the pseudo-true approximation is not driven by boundaries to the parameter space as in the parametric case, but rather it is driven by the penalty. Ultimately penalization confines the hypothesis space to simple spaces and must reflect a prior belief that the true functional form would not result in a large penalty. That prior belief carries over to the limit result if the penalty does not vanish. This produces a bias, or may even render the result completely arbitrary if the penalty is set without caution. In the current setting, our penalty arises as a function of an out-of-sample criterion. As a result, the space of functions that are viewed as acceptable solutions to the criterion is generated by the data itself, and the penalized non-parametric method is able to obtain approximations of increasing complexity as the data size tends toward infinity. In the finite sample case, this estimator is appropriate given that the relationship between environmental degradation and indicators of economic development is not dominated by high-frequency components that would result in strong complexity.

19 White (1980) discusses convergence toward the unique least squares approximator that may differ from the true parameter in the presence of misspecification bias.

In this additional discussion, let $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{R}$ denote the sets of the natural, integer, and real numbers. $\mathbb{R}_{>0}$ includes all positive, non-zero reals. For a set $A$, we use $\mathcal{B}(A)$ to denote the Borel $\sigma$-algebra over $A$. We use $t, \ldots, T \in \mathbb{Z}$ to index time, and $i, \ldots, N \in \mathbb{N}$ to index cross-sections; $it, \ldots, NT \in \mathbb{N} \times \mathbb{Z}$ labels all locations in space-time. We use boldfaced letters, e.g., $\mathbf{a} \in A$, to denote vectors. Furthermore, $\times_{t=1}^{t=T} A = A^T$ denotes the Cartesian product of $T$ copies of $A$, and $A^\infty = \times_{t=-\infty}^{t=\infty} A$ is the Cartesian product of infinite copies. For two maps $f$ and $g$, $f \circ g$ is their composition resulting from a point-wise application, and $\langle \cdot, \cdot \rangle$ denotes the inner product.20 Finally, $\| \cdot \|_A$ denotes a norm on $A$.

A.1 Identifiability in nonlinear models

Closely related to regulating the size of non-parametric models is the ill-posedness of unregulated non-parametric models. Before discussing the relationship between penalization and identifiability of the criterion of non-parametric estimators, we provide a simplified discussion on identifiable uniqueness and its relation to inference in the context of finite dimensional nonlinear models.21 Hypothesis testing in a framework of finite parameter nonlinear models is often plagued by the problem that verification of the assumptions required for identifiability relies itself on the outcome of a hypothesis that may be difficult to test. This is problematic as identifiable uniqueness plays a key role in establishing consistency and normality of test statistics. This is illustrated by a model of the form:

$$y = \delta \exp\left( \frac{-(x - c)^2}{\gamma} \right) + \varepsilon,$$

in which the postulated relationship between $y$ and $x$ is assumed to follow a hyperbolic curve across levels of $x$. In this model the linearity hypothesis $H_0^1 : \gamma = 0$ relating to the non-existence of the curved functional form depends on a second hypothesis $H_0^2 : \delta = 0$ being true or false. This follows from the fact that for $\delta = 0$, $\gamma$ can take any value without changing the predicted density implied by the model.
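The identification problem in this model can be seen numerically. The sketch below evaluates the deterministic part of the model, $h(x; \delta, c, \gamma) = \delta \exp(-(x - c)^2 / \gamma)$, at two different values of $\gamma$; the grid and the parameter values are arbitrary choices made for the illustration.

```python
import numpy as np

def h(x, delta, c, gamma):
    # Deterministic part of the model: delta * exp(-(x - c)^2 / gamma).
    return delta * np.exp(-((x - c) ** 2) / gamma)

x = np.linspace(-5.0, 5.0, 101)

# With delta = 0 the predictions do not depend on gamma at all, so any two
# values of gamma are observationally equivalent.
same_when_delta_zero = np.array_equal(h(x, 0.0, 1.0, 0.5), h(x, 0.0, 1.0, 50.0))

# With delta bounded away from zero, the same two values of gamma produce
# clearly different curves.
gap_when_delta_nonzero = np.max(np.abs(h(x, 2.0, 1.0, 0.5) - h(x, 2.0, 1.0, 50.0)))

print(same_when_delta_zero, gap_when_delta_nonzero)
```

When $\delta = 0$ no amount of data separates the two values of $\gamma$, which is why a test on $\gamma$ cannot be carried out without first settling the hypothesis on $\delta$.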
In this case any form of completeness required for identifiable uniqueness of the estimator holds at most for a subset of the parameter space in which the model would in fact produce an inverted $U$-shaped form. Distributions corresponding to different values of $\gamma$ are only sufficiently distinct when $\delta$ is sufficiently bounded away from zero. Without establishing existence and uniqueness of a consistent estimator, it is impossible to establish normality, hence the distribution of test statistics remains unknown.22

20 As a generalization of the dot product in the Euclidean space to higher dimensional spaces, including infinite dimensional Hilbert spaces.
21 Identifiable uniqueness is a difficult concept. A more elaborate general discussion can be found in (Pötscher and Prucha, 1991); formal definitions and discussion at a deeper level regarding strongly unique best approximation in Banach spaces can be found in (Smarzewski, 1986); discussion on regulated $M$-estimation can be found in (Kent and Tyler, 2001); and an overview of concepts is given in the monograph of (Blasques, 2010).
22 Auxiliary test statistics may still be derived, but it is sometimes difficult to ensure that Taylor expansions do not capture nonlinearities of a type not predicted by the economic theory. See for example (Dijk et al., 1999) for a discussion in the threshold framework. Researchers may also choose to rely on information criteria to compare various descriptions of the data and decide between economic theories (Granger et al., 1995). In the limit, Penalized Likelihood Criteria select the model that minimizes Kullback-Leibler divergence with probability 1 (Sin and White, 1996), but convergence rates depend on the penalty chosen. The acceptance of an economic theory thus relies on information outside the model. In a sense, a researcher has flexibility to corroborate specific theories by designing the information criteria to support them.

More intuition is found in the following two definitions adapted from Definition 1 and Definition 2 in (Rothenberg, 1971).

DEFINITION. 1. Two points $\alpha_1 \in A$ and $\alpha_2 \in A$ are said to be observationally equivalent with respect to a function $h$ evaluated over $x$ if $h(x; \alpha_1) \equiv h(x; \alpha_2) \ \forall \ x \in \mathbb{R}$.

DEFINITION. 2. A point $\alpha_1 \in A$ is said to be identifiable by a function $h$ evaluated over $x$ if there is no other point $\alpha \in A$ that is observationally equivalent.

Let $\theta := (\delta, c, \gamma)$ denote a vector of parameters, with $\theta \in \Theta$, and $\theta_0 := (\delta_0, c_0, \gamma_0)$ be the true vector of parameters. For consistency toward the true parameter, one would not only require $\hat{\theta} - \theta_0 \to 0$ a.s. as $N \to \infty$, with $\hat{\theta} := \arg\min_{\theta \in \Theta} Q(y, x; \theta)$ the solution to the criterion, but it needs to be the identifiably unique solution. Following the definitions above, then by the definition of $\theta_0$ as the minimizer of $Q(y, x; \theta)$, there needs to be assurance of some form that

$$\arg\min_{\theta \in \Theta} Q(y, x; \theta_0) < \arg\min_{\theta \in \Theta} Q(y, x; \theta) \ \forall \ \theta \in \Theta \setminus \theta_0, \qquad (10)$$

excluding

$$\arg\min_{\theta \in \Theta} Q(y, x; \theta_0) \leq \arg\min_{\theta \in \Theta} Q(y, x; \theta) \ \forall \ \theta \in \Theta \setminus \theta_0, \qquad (11)$$

as the alternative. The standard assumption is that $\Theta$ is compact. Together with almost sure continuity in $\theta \in \Theta$, Weierstrass' theorem implies that $\theta_0$ exists as a non-empty set a.s. Equation (10) can result directly from the parameterized model $\hat{y} = h(x)$ if

$$h(x; \theta_0) \neq h(x; \theta) \ \forall \ \theta \in \Theta \setminus \theta_0, \qquad (12)$$

such that there is no point in $\Theta$ other than $\theta_0$ that is observationally equivalent to $\theta_0$. Specifically the observational equivalence definition may fail to hold if $\Theta$ is high dimensional. If eq. (10) is not implied by the nature of $h$, it can also be provided by additional structure to the criterion $Q(\cdot; \theta)$ conditional on regions in $\Theta$, or by limiting the search to remain within a subset $\hat{\theta} := \arg\min_{\theta \in \Theta_s \subset \Theta} Q(y, x; \theta)$, where $\Theta_s$ is a compact subset of the parameter space that may possibly grow in complexity along with the sample size.

DEFINITION. 1 and DEFINITION. 2 are intuitive, but provide no testable condition to decide upon the identifiability of an estimator. One insightful definition is the following adapted from (Bates and White, 1985) that ensures that the solution to the criterion is well separated.

DEFINITION. 3. Suppose $\theta_0$ minimizes a real-valued criterion $Q_\infty(\cdot; \theta)$ on a compact metric space $\Theta$, within a circular neighborhood $\aleph_0(r) \subset \Theta$ with radius $r > 0$ that has a compact complement $\aleph_0(r)^c := \Theta \setminus \aleph_0(r)$, then $\theta_0$ is uniquely identified on $\Theta$ if and only if for every $r > 0$,

$$\inf_{\theta \in \aleph_0(r)^c} [Q_\infty(\theta) - Q_\infty(\theta_0)] > 0.$$

A.2 Identifiability in non-parametric models

Non-parametric models aim to learn from the data without assuming that $h$ is known up to finitely many parameters, and work under the axiom that the parameter space $\Theta$ may in fact be infinitely dimensional. By allowing for that, we minimize the risk that our parameterization assumptions preclude $\theta_0 \in \Theta$, solving for misspecification bias that results from parametric assumptions. However, without imposing further structure to the criterion it is generally not possible to establish consistency of our estimate $\hat{\theta} \to \theta_0$ uniformly over $\Theta$ as the compactness assumption on $\Theta$ does not hold in infinite dimensions.23 This poses a problem in verifying DEFINITION. 3, and subsequently establishing a consistency result, such as those of (Domowitz and White, 1982).

One solution is to focus the arguments on establishing a compact subset of the parameter space such that over the complement of the compact subset the criterion function is eventually "large", see (Pötscher and Prucha, 1997). This follows by first constructing a subset $\Theta_s \subset \Theta$ such that $\theta_0 \in \Theta_s$, and such that uniformly,

$$[Q_\infty(\theta) - Q_\infty(\theta_0)] > \epsilon \ \forall \ \theta \in \Theta \setminus \Theta_s,$$

where $\epsilon$ is some positive finite constant. The subset is closed as its complement is open, and bounded as it is contained in a ball of finite radius, implied by the fact that

$$[Q_\infty(\theta) - Q_\infty(\theta_0)] \leq \epsilon \ \forall \ \theta \in \Theta_s(d)$$

can only hold for $d < \infty$, hence it is compact.
As a consequence it is sufficient to show that consistency holds within $\Theta_s$, since any $M$-estimator must eventually fall within the compact subset. We can summarize identifiable uniqueness of $\theta_0$ in an open space as follows.

DEFINITION. 4 (Identifiability in an open space). Suppose $\theta_0$ minimizes a real-valued criterion $Q_\infty(\cdot; \theta)$ on an open metric space $\Theta$. Suppose furthermore that $\theta_0$ minimizes $Q_\infty(\cdot; \theta)$ within a circular neighborhood $\Theta_s(d) \subset \Theta$ that has finite positive radius $d > 0$, then uniformly over $\Theta$ there exists some positive constant $\epsilon$ for which

$$[Q_\infty(\theta \in \Theta \setminus \Theta_s) - Q_\infty(\theta \in \Theta_s)] > \epsilon.$$

If furthermore, there is also a circular neighborhood $\aleph_0(r) \subset \Theta_s$ with radius $r < d$ that has a compact complement $\aleph_0(r)^c := \Theta_s \setminus \aleph_0(r)$, then $\theta_0$ is uniquely identified on $\Theta$ if and only if for every $r$, $0 < r < d$,

$$\inf_{\theta \in \aleph_0(r)^c \subset \Theta_s(d) \subset \Theta} [Q_\infty(\theta) - Q_\infty(\theta_0)] > 0.$$

In a sense, we want to exert some control over the structure of $Q_\infty(\cdot; \theta)$ on $\Theta$ such that any description $\theta^*$ observationally equivalent to $\theta_0$ for any $\theta \in \Theta_s$ is uniquely identifiable by the criterion within $\Theta_s$, regardless of how $\Theta$ is structured outside $\Theta_s$.

23 By definition a set $A \subset \mathbb{R}^d$ is compact if and only if it is closed and bounded.

One such approach can be found in the well-known kernel estimator. The solution offered by the kernel method depends on selecting an appropriate bandwidth that controls for the size of local neighborhoods in the sample space throughout which nonlinearities smoothly differ. For too small bandwidths, the kernel method creates a subspace $\Theta_k \supset \Theta_s$ that allows overly flexible fits to the data. This can create an ill-posed problem, in which multiple solutions to the criterion within $\Theta_k$ may still deliver equally good fits as judged by the criterion evaluated over $\Theta_k$. It is obvious that DEFINITION. 4 is not applicable in such a context. For too large bandwidths, the kernel method establishes $\Theta_k \subset \Theta_s$ that is small, and while DEFINITION.
4 may work for $\Theta_k$, we are not sure that in fact $\theta_0 \in \Theta_k$ due to the parameterization assumptions used to construct $\Theta_k$. The role of the bandwidth is therefore extremely important: identifiable uniqueness of the estimator requires the bandwidth to be sufficiently large, while reducing misspecification bias requires the bandwidth to be sufficiently small. In an ideal framework, both factors are balanced out and $\Theta_k$ grows as $N \to \infty$ at an appropriate rate.

A.3 The role of the penalty in the estimator

The fitted nonlinearities are allowed to be of any form, but $\lambda > 0$ implies the penalty is never removed completely. Positive penalization is key to ensuring that there exists a finite radius neighborhood $\Theta_s(d) \subset \Theta$ in which any $M$-estimator must eventually fall uniformly over $\Theta$ as $[Q_\infty(\theta \in \Theta \setminus \Theta_s) - Q_\infty(\theta \in \Theta_s)] > \epsilon(\theta; \lambda) > 0$, where $\epsilon(\theta; \lambda) > 0$ is ensured for any $\theta$ by $\lambda > 0$. Penalties that vanish completely at a pre-specified rate are interesting when the researcher wishes to impose penalties only when the estimator is confronted with small sample sizes. This requires, however, that the criterion is eventually uniquely identified at $\lambda = 0$. Vanishing penalties may improve inference when using estimators that have poor small sample behavior by ensuring that the estimator is relatively inert to weakly nonlinear signals and less likely to overfit the data in local regions of the sample space. Penalties that take values in $\mathbb{R}_{>0}$ can improve small sample behavior, but maintain a bias towards linear solutions that persists in the limit.

Note that eq. (9) reveals that convergence of our estimator $\| \hat{h}_{NT} - \hat{h}_\infty \| \to 0$ to a specific target function $\hat{h}_\infty \in \mathcal{H}_\Theta$, where $\hat{h}_\infty$ is possibly the true function or the best approximator as judged by the penalized limit criterion, is the same as $\| \hat{\theta}_{NT} - \hat{\theta}_\infty \| \to 0$, which is the more common notation. Hence, we shall use the latter, but really we are interested in ensuring that $\hat{h}_\infty$ is a uniquely identifiable point in $\mathcal{H}_\Theta$ as close to $h_0$ as possible. Consistency and normality theorems for eq. (7) are provided in Hainmueller and Hazlett (2014). The results ensure a limit convergence toward the best approximation of the conditional expectation function given penalization, hence the limit solution is conditional on the researcher's choice of $\lambda$. The theory provided is therefore to be understood in terms of $\hat{\theta}_{NT}$ converging to a pseudo-true parameter as $NT \to \infty$ that by construction minimizes the penalized criterion even if the penalty does not vanish. To understand the relationship between the pseudo-true parameter and the true parameter conditional on the penalty, it is helpful to consider precisely how the penalty influences the criterion and delivers the identifiable uniqueness property.

Let $\hat{\theta}_\pi$ be the point $\hat{\theta}_\pi := \arg\min_{\theta \in \Theta} Q_\infty(\theta) + \pi(\theta)$ that minimizes the penalized criterion, and $\theta_0$ be the point $\theta_0 := \arg\min_{\theta \in \Theta} Q_\infty(\theta)$ that minimizes an unpenalized out-of-sample criterion. $\hat{\theta}_\pi$ is the best approximator, similar to the misspecification case studied in White (1980), whereas $\theta_0$ is the true parameter, that is, the weights vector that induces $h_0$ through the kernel, which is the true function that provides the best out-of-sample density by its definition. The function $\hat{h}_\pi := h(x_t; \hat{\theta}_\pi)$ is the best approximator of $h_0 := h(x_t; \theta_0)$ as judged by the penalized criterion $Q_\infty(\theta) + \pi(\theta)$ for a given level of penalization $\pi$. The penalty does not imply that $h(x_t; \hat{\theta}_\pi) \neq h(x_t; \theta) \ \forall \ \theta \in \Theta \setminus \hat{\theta}_\pi$ and any $x_T \in X$ and all $NT \in \mathbb{N} \times \mathbb{Z}$. However, it ensures that $\hat{\theta}_\pi$ is identifiably unique as the minimizer of the limit criterion even in the case of two observationally equivalent parameterizations $h(x_t; \hat{\theta}_\pi) \equiv h(x_t; \theta^*)$ for some $\theta^* \in \Theta \setminus \hat{\theta}_\pi$ and any $x_T \in X$ and all $NT \in \mathbb{N} \times \mathbb{Z}$.

PROPOSITION. 1 (Identifiable uniqueness).
The function $\hat{h}_\pi := \arg\min_{h_\theta \in \mathcal{H}_\Theta} Q_\infty(h_\theta) + \pi(h_\theta)$ produced by $h_X$ at point $\hat{\theta}_\pi := \arg\min_{\theta \in \Theta} Q_\infty(\theta) + \pi(\theta)$ is uniquely identified within $\mathcal{H}_{\Theta_s}$, a simple subset in the infinite dimensional Hilbert space $\mathcal{H}_\Theta$, if $\pi$ is a strictly positive penalty function continuous on $\Theta$. The argument follows from:

Proof. $\hat{\theta}_\pi := \arg\min_{\theta \in \Theta} Q_\infty(\theta) + \pi(\theta)$ is by definition the minimizer of $Q_\infty(\cdot;\theta) + \pi(\theta)$, which is, by construction of the least squares function and the penalty function $\pi : \Theta \to \mathbb{R}_{>0}$, a real-valued criterion on an open metric space $\Theta$. Furthermore, there exists some positive constant $\epsilon$ for which

$$\big[Q_\infty(\theta \in \Theta \setminus \Theta_s) + \pi(\theta \in \Theta \setminus \Theta_s)\big] - \big[Q_\infty(\theta \in \Theta_s) + \pi(\theta \in \Theta_s)\big] > \epsilon,$$

because for $Q_\infty(\theta \in \Theta \setminus \Theta_s) \equiv Q_\infty(\theta \in \Theta_s)$,

$$[\pi(\theta \in \Theta \setminus \Theta_s) - \pi(\theta \in \Theta_s)] > \epsilon,$$

by the monotonicity of $\pi$ on $\Theta$, which implies that $\hat{\theta}_\pi$ minimizes $Q_\infty(\cdot;\theta) + \pi(\theta)$ within a neighborhood $\Theta_s \subset \Theta$. Furthermore, $\Theta_s(d)$ has finite radius $d < \infty$ because

$$[\pi(\theta \in \Theta_s(d)) - \pi(\theta \in \Theta \setminus \Theta_s(d))] \leq \epsilon \ \text{ implies } \ d < \infty,$$

by finiteness of $\epsilon$, in turn implied by continuity of the penalty. Finally, $\Theta_s(d) \subset \Theta$ is compact, as it is closed because its radius is finite and its complement $\Theta \setminus \Theta_s$ is open. We have now established that uniformly over $\Theta$, the estimator must eventually fall inside $\Theta_s$. The rest of the argument follows from standard identifiability arguments in compact parameter spaces as in (Bates and White, 1985; Domowitz and White, 1982), focused on $\Theta_s$. That is, define a circular neighborhood $\aleph_k(r) \subset \Theta_s$ with nonnegative radius $r < d$ that has a compact complement $\aleph_k(r)^c : \Theta_s \setminus \aleph_k(r)$. $\theta_k$ is uniquely identified on $\Theta$ as, by $0 < r < d < \infty$, for every $(r, d)$,

$$\inf_{\theta \in \aleph_k(r)^c \subset \Theta_s(d) \subset \Theta} \big[Q_\infty(\theta) + \pi(\theta)\big] - \big[Q_\infty(\hat{\theta}_\pi) + \pi(\hat{\theta}_\pi)\big] > 0.$$

In our case, this is implied by continuity of the criterion and additionally by the fact that, by definition of $\hat{\theta}_\pi$ as the minimizer of $\min_{\theta \in \Theta} Q_\infty(\theta) + \pi(\theta)$, for any observationally equivalent point $\theta^*$ such that $Q_\infty(\theta^*) \equiv Q_\infty(\hat{\theta}_\pi)$, the continuity of $\pi$ implies

$$\pi(\hat{\theta}_\pi) < \pi(\theta^*).$$

Central to the result is that $\pi(\hat{\theta}_\pi) < \pi(\theta^*)$, providing that for two observationally equivalent functions, the pseudo-true parameter is the parameter vector that induces a less complex functional form.

So far we have treated $\pi$ as fixed at a prespecified level. However, for any given level of penalization, the solution to the penalized criterion is different. We can make that more explicit by writing it as the limit estimate conditional on a penalty value, $(\hat{\theta}_\infty|\pi)$, and analyzing the role of $\pi$ in the divergence $\|(\hat{\theta}_\infty|\pi) - \theta_0\|$. This displays the heavy importance of determining an appropriate penalty $\pi$, as it is crucial to the outcome; see Blasques and Duplinskiy (2015) for some thoughts on how to choose appropriate penalty weights in a general context. Asymptotically, if the impact of $\pi$ vanishes, for example by using penalties of order $o((NT)^{-1/2})$, consistency toward $\theta_0$ can still be met in the limit; again see Blasques and Duplinskiy (2015) for detail. However, in small samples, similar to the Bayesian case, a researcher can exert influence on the outcome by setting the value of $\pi$. In the current framework, $\pi > 0$ prevents a generality claim as it would follow in the parametric case. However, we can still focus the argument on finding an optimal penalty that minimizes $\|(\hat{\theta}_\infty|\pi) - \theta_0\|$, or equivalently the function divergence $\|(\hat{h}_\infty|\pi) - h_0\|$.

PROPOSITION. 2 (Best approximation across penalties and weights). The divergence between the best approximation as judged by the penalized limit criterion given a level of penalization and the true function is smaller than the divergence as evaluated at all other limit estimates resulting under other penalty weights,

$$\|(\hat{h}_\infty|\pi_0) - h_0\| < \|(\hat{h}_\infty|\pi) - h_0\| \quad \forall\, \pi \in \Pi \setminus \pi_0 \subseteq \mathbb{R}_{>\delta},$$

and results under the penalty $\pi_0$ that minimizes an out-of-sample criterion,

$$\pi_0 : \arg\min_{\pi \in \Pi} Q_\infty(\hat{h}_\infty|\pi), \quad \Pi \subseteq \mathbb{R}_{>\delta},$$

for some small positive $\delta$. Hence, $(\hat{h}_\infty|\pi_0)$ is the best approximation of $h_0$ over $\mathcal{H}_{\Theta_s \times \Pi} := \{\mathcal{H}_{\Theta_s}|\pi_1 \times \mathcal{H}_{\Theta_s}|\pi_2 \times \dots \times \mathcal{H}_{\Theta_s}|\pi\}$ $\forall\, \pi \in \Pi$, that is, across all penalties and weights. The argument follows from:

Proof. Let $\hat{\theta}_\pi = \hat{\theta}_\infty|\pi := \arg\min_{\theta \in \Theta} Q_\infty(\theta) + \pi(\theta)$ be the minimizer of the penalized criterion for a certain level of penalization and $\theta_0 := \arg\min_{\theta \in \Theta} Q_\infty(\theta)$ the minimizer of an unpenalized out-of-sample criterion. When plugging the true parameter into the penalized criterion, taking $\|\hat{\theta}_\pi - \theta_0\| \to 0$ implies that similarly

$$\big|Q_\infty(\hat{\theta}_\pi) + \pi(\hat{\theta}_\pi) - [Q_\infty(\theta_0) + \pi(\theta_0)]\big| \to 0.$$

This minimization is solved if the in-sample criterion evaluates $|Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)| \to 0$ equivalently, as then immediately also $|\pi(\hat{\theta}_\pi) - \pi(\theta_0)| \to 0$. Hence, either $|\pi(\hat{\theta}_\pi) - \pi(\theta_0)| \to 0$ or $|Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)| \to 0$ is sufficient for $\|\hat{\theta}_\pi - \theta_0\| \to 0$. Any such result following from taking both penalties $\pi(\hat{\theta}_\pi)$ and $\pi(\theta_0)$ to zero simultaneously as $N \to \infty$ is prohibited by the fact that $\pi : \Theta \to \mathbb{R}_{>0}$. However, $|Q_\infty(\hat{\theta}_\pi) + \pi(\hat{\theta}_\pi)| - |Q_\infty(\theta_0) + \pi(\theta_0)|$ attains a minimum when setting the penalty to minimize the criterion defined on out-of-sample errors. Specifically, since $\theta_0$ is by construction the minimum of the out-of-sample criterion in the limit, setting $\pi_0$ to minimize $Q_\infty(\hat{\theta}_\pi)$ over $\pi \in \Pi$, $\Pi \subseteq \mathbb{R}_{\geq 0}$, gives

$$\pi_0 : \arg\min_{\pi \in \Pi} |Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)|,$$

and if $|Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)| \to 0$, it must follow that

$$|Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)| \to 0,$$

and

$$|\pi(\hat{\theta}_\pi) - \pi(\theta_0)| \to 0.$$

If $\Pi \subseteq \mathbb{R}_{\geq 0}$ is constructed such that it contains a $\pi_0 \in \Pi$ for which the arg min's above reach 0, then $\|(\hat{h}_\infty|\pi_0) - h_0\| = 0$ would follow and we reach the true target function. Now, $\Pi \subseteq \mathbb{R}_{\geq 0}$ can contain penalties infinitely close to 0. In practice, one must work with finite sets for a grid search across $\Pi$ and construct instead a set $\Pi \subseteq \mathbb{R}_{\geq \delta}$, being the set of all possible penalty parameters bounded away from zero by some arbitrarily small positive constant $\delta$.
If there is no $\pi_0 \in \Pi$ for which $\|(\hat{h}_\infty|\pi_0) - h_0\| = 0$, then still

$$|Q_\infty(\hat{\theta}_{\pi_0}) - Q_\infty(\theta_0)| < |Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)| \quad \forall\, \pi \in \Pi \setminus \pi_0 \subseteq \mathbb{R}_{>\delta},$$

thus also

$$|Q_\infty(\hat{\theta}_{\pi_0}) - Q_\infty(\theta_0)| < |Q_\infty(\hat{\theta}_\pi) - Q_\infty(\theta_0)| \quad \forall\, \pi \in \Pi \setminus \pi_0 \subseteq \mathbb{R}_{>\delta},$$

and therefore

$$\|(\hat{\theta}_\infty|\pi_0) - \theta_0\| < \|(\hat{\theta}_\infty|\pi) - \theta_0\| \quad \forall\, \pi \in \Pi \setminus \pi_0 \subseteq \mathbb{R}_{>\delta},$$

which induces, through the definition of $h$ as the weighted kernel, also

$$\|(\hat{h}_\infty|\pi_0) - h_0\| < \|(\hat{h}_\infty|\pi) - h_0\| \quad \forall\, \pi \in \Pi \setminus \pi_0 \subseteq \mathbb{R}_{>\delta},$$

implying that $(\hat{h}_\infty|\pi_0)$ turns out to be the best approximation of $h_0$ across all penalties in $\Pi$, each of which itself yields a best approximator within the subset $\mathcal{H}_{\Theta_s}|\pi$ in which the minimizer of the penalized criterion necessarily falls given the level of penalization.

PROPOSITION. 2 implies that if the penalty is chosen by minimizing a criterion out-of-sample, a weights vector can be estimated that produces the function closest to the target function across all penalties and weights. Effectively, a researcher is able to identify an approximation that is arbitrarily close to the true curve by solving the estimator iteratively on very large data for a sufficiently wide range of penalties and selecting the result that performs best as an out-of-sample predictor. This is an intuitive solution, as $\theta_0$ carries a natural interpretation as the optimal out-of-sample predictor.
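The selection rule suggested by PROPOSITION. 2 can be sketched numerically. The following is a minimal illustration only, not the estimation code used in the paper: it assumes a Gaussian kernel with a fixed bandwidth of 1, a simulated one-dimensional data set, a simple hold-out split as the out-of-sample criterion, and a hypothetical penalty grid bounded away from zero. All function and variable names are illustrative.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / bandwidth)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / bandwidth)


def fit_weights(X, y, penalty, bandwidth):
    """Penalized least squares weights: solve (K + penalty * I) c = y."""
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + penalty * np.eye(len(y)), y)


def predict(X_train, weights, X_new, bandwidth):
    """Evaluate the fitted function h(x) = sum_j c_j k(x, x_j) at new points."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ weights


def select_penalty(X_tr, y_tr, X_out, y_out, penalty_grid, bandwidth):
    """Grid search: return the penalty with the lowest out-of-sample MSE."""
    scores = {}
    for penalty in penalty_grid:
        weights = fit_weights(X_tr, y_tr, penalty, bandwidth)
        residuals = y_out - predict(X_tr, weights, X_out, bandwidth)
        scores[penalty] = float(np.mean(residuals ** 2))
    best = min(scores, key=scores.get)
    return best, scores


# Simulated data (an assumption for illustration): a smooth nonlinear signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(300, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(300)

# Hold out the last 100 observations as the out-of-sample set.
X_tr, y_tr, X_out, y_out = X[:200], y[:200], X[200:], y[200:]

# Finite penalty grid bounded away from zero (here delta = 1e-3).
penalty_grid = [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0]
best, scores = select_penalty(X_tr, y_tr, X_out, y_out, penalty_grid, bandwidth=1.0)

for penalty, mse in scores.items():
    print(f"penalty={penalty:g}  out-of-sample MSE={mse:.4f}")
print("selected penalty:", best)
```

Each penalty on the grid yields its own fitted function, and the one retained is the fit with the smallest out-of-sample error, which mirrors the role of $\pi_0$ in the argument above. In applied work a cross-validation scheme would typically replace the single hold-out split used here.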
B Additional Results

[Figure 8 panels: conditional expectations E[Y|X] of log Tree intensity (left two columns) and log PM25 intensity (right two columns), plotted against log GDP per capita, population density, undernourishment rate, poverty rate (1.90$), manufacturing GDP share, services GDP share, urban population share, and bottom 40% income share.]

Figure 8: Conditional expectations of deforestation (left two columns) and pollution (right two columns) intensity of income for each variable fixing other variables constant at their mean.

[Figure 9 panels: conditional expectations E[Y|X] of log CO2 intensity plotted against log GDP per capita, population density, undernourishment rate, poverty rate (1.90$), manufacturing GDP share, services GDP share, urban population share, and bottom 40% income share (left two columns); conditional expectations of undernourishment, poverty, manufacturing, services, urban population, and bottom 40% income plotted against GDP per capita on a log scale (right two columns).]

Figure 9: Conditional expectations of carbon intensity of income for each variable keeping other variables constant at their mean (left two columns).

Table 5: Summary of base rates in percentages by 5-percentiles used in the projection.
Income group   GDP    Population
1              1.33   2.57
2              2.78   2.54
3              2.22   2.28
4              2.93   2.10
5              3.02   1.99
6              2.49   1.98
7              3.69   1.40
8              2.55   1.58
9              2.62   1.51
10             1.67   0.96
11             3.62   0.96
12             2.31   1.31
13             2.97   0.83
14             2.82   1.04
15             2.93   1.15
16             3.85   0.85
17             1.10   0.46
18             1.47   0.30
19             1.47   0.23
20             1.24   0.49

[Figure 10 panels: country-level bar charts, Angola through Zimbabwe, titled "Modeled versus predicted CO2 emissions", "Modeled versus predicted pollution levels", and "Modeled versus predicted tree cover %loss". Legend: Predicted, Observed, Adjusted PM25.]

Figure 10: Accuracy of predicted degradation levels at high income.

Table 6: BAU-2030 and base year data aggregated by 5% percentiles of income. GDP per capita as population weighted averages, population in millions, number of poor people in millions, annual tree loss in square kilometers, PM2.5 in population weighted average concentrations, carbon emissions in million tons. Totals are scaled to world totals using multipliers (1.16 for population and carbon, based on the share of population in our data, and 1.42 for tree loss, based on the share of tree cover in the data).

Income GDP p.c. GDP p.c.   Pop.   Pop. No. Poor No. Poor Treeloss Treeloss  PM25  PM25     CO2     CO2
Group      2014     2030   2014   2030     2014     2030     2014     2030  2014  2030    2014    2030
1         1,094    1,548     68    101       41       55    3,382    3,640    15    20      10      28
2         1,495    2,611    171    262       74       70    3,750    3,977    17    21      20     109
3         1,899    2,388     46     67       19       24    1,415    1,684    23    25      22      39
4         2,305    3,308     96    139       36       43    1,821    1,931    29    31      27      71
5         2,917    4,720    237    305       57       45    1,295    1,331    47    52      96     285
6         4,249    5,381    265    366       43       48    5,782    5,713    46    48     190     282
7         5,265    9,887    111    133        6        2    4,716    4,031    23    20     170     385
8         5,426   10,301  1,519  1,936      341      288    4,440    3,769    54    51   2,286   5,634
9         6,875    9,344    176    231       21       22    3,297    3,043    12    13     212     399
10        8,219   10,627     15     17        1        1    4,002    3,395     9     9      14      22
11       10,078   16,232    292    352       22       11   17,799   17,676    15    15     540   1,338
12       11,993   16,578    114    141       11       11    3,784    4,037    15    17     646   1,150
13       13,135   24,563  1,691  1,825       49        5   33,540   39,407    44    44  11,212  25,839
14       16,347   20,955    206    244        4        3    2,032    2,132    22    24   1,128   1,864
15       18,386   28,014    162    193        3        2    5,160    5,859    17    17     750   1,402
16       22,607   37,980     53     60        0        0    1,822    2,194    20    24     438   1,112
17       33,000   39,059    204    219        0        0    6,647    8,832    21    22   1,160   1,770
18       38,515   47,345    244    253        0        0   26,962   32,972    12    13   2,324   3,324
19       43,584   53,144    128    132        0        0    4,694    5,794    15    16   1,276   1,788
20       51,694   64,462    354    384        0        0   17,649   19,466    10    11   5,534   7,414
world    14,088   19,899  7,137  8,537      730      630  218,664  242,656    36    36  32,544  62,934

[Figure 11 panels: total carbon emission output in billion tons of CO2 by year, 2000 to 2030, projected for the bottom 20, second quintile, third quintile, fourth quintile, top 20, and the world. Series: Historical, Pessimistic, BAU, Opportunistic.]

Figure 11: Future carbon under three scenarios broken down by income quintiles.