WPS5400

Policy Research Working Paper 5400

A Control Function Approach to Estimating Dynamic Probit Models with Endogenous Regressors, with an Application to the Study of Poverty Persistence in China

John Giles
Irina Murtazashvili

The World Bank
Development Research Group
Human Development and Public Services Team
August 2010

Abstract

This paper proposes a parametric approach to estimating a dynamic binary response panel data model that allows for endogenous contemporaneous regressors. This approach is of particular value for settings in which one wants to estimate the effects of an endogenous treatment on a binary outcome. The model is next used to examine the impact of rural-urban migration on the likelihood that households in rural China fall below the poverty line. In this application, it is shown that migration is important for reducing the likelihood that poor households remain in poverty and that non-poor households fall into poverty. Furthermore, it is demonstrated that failure to control for unobserved heterogeneity would lead the researcher to underestimate the impact of migrant labor markets on reducing the probability of falling into poverty.

This paper, a product of the Human Development and Public Services Team, Development Research Group, is part of a larger effort in the department to study the effects of rural to urban migration on household outcomes and investment decisions in migrant sending communities. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be contacted at jgiles@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

A Control Function Approach to Estimating Dynamic Probit Models with Endogenous Regressors, with an Application to the Study of Poverty Persistence in China*

John Giles+ and Irina Murtazashvili‡

JEL Codes: C13, C33, O15, P25

Key Words: Dynamic Binary Response Models; Control Function Approach; Poverty Persistence; Migration; Rural China

* The paper has benefitted from helpful comments and conversations with Alan de Brauw, Ana Maria Herrera, Martin Ravallion, Peter Schmidt, David Tschirley, Adam Wagstaff, and Jeffrey Wooldridge, from seminar participants at Ohio State University, and from conference participants at the June 2007 UNU-WIDER Conference on Fragile States held in Helsinki and the September 2009 Midwest Econometrics Group Annual Meeting. We gratefully acknowledge financial support for data collection from the National Science Foundation (SES-0214702), the Michigan State University Intramural Research Grants Program, the Ford Foundation (Beijing) and the Weatherhead Center for International Affairs (Academy Scholars Program) at Harvard University. The research discussion and conclusions presented in this paper reflect the views of the authors and should not be attributed to the World Bank or to any affiliated organization or member country.

+ Development Research Group, The World Bank. Email: jgiles@worldbank.org.

‡ Corresponding Author. Department of Economics, University of Pittsburgh, Pittsburgh, PA 15260. Tel: (412) 648-1762, fax: (412) 648-1793, and email: irinam@pitt.edu.
1 Introduction

Dynamic binary response models have considerable appeal for a diverse range of policy analyses in which identifying or controlling for state dependence is important and one is interested in a binary outcome.1 When the outcome is also affected by an endogenous treatment, an additional complication arises in efforts to identify the effects of the treatment on the outcome and on state dependence. In this paper, we propose a parametric approach to estimating dynamic binary response panel data models with endogenous contemporaneous regressors. Our method combines a recent approach to solving the unobserved heterogeneity and the initial conditions problems in non-linear dynamic models (Wooldridge, 2005) with a control function approach to controlling for endogeneity of contemporaneous explanatory variables in non-linear models (e.g., Smith and Blundell, 1986; Rivers and Vuong, 1988; Papke and Wooldridge, 2008).

Among other possible applications, the relevance and potential strength of our approach can be demonstrated in analyses of how migration in developing countries affects the poverty status of residents living in migrant source communities. In this setting, we are faced with two important sources of endogeneity. First, the migration decision of community residents may be driven by negative shocks that also raise the probability that households are poor. Second, we expect there to be correlation between migration decisions and the unobserved characteristics of individuals and communities, which may also affect poverty status. Our approach allows us to consistently estimate parameters of a dynamic binary response panel data model with unobserved heterogeneity when some of the continuous contemporaneous explanatory variables are endogenous.
To account for the endogeneity in migration from home communities, we employ a control function approach in which residuals from the reduced form for the endogenous regressor are introduced as covariates in the structural model. Recently, Papke and Wooldridge (2008) employed this approach to deal with an endogenous regressor in a static fractional response panel data model to study the effects of school inputs on student performance. In contrast with Papke and Wooldridge, this paper develops a control function approach for a dynamic model. To deal with the dynamic nature of the model, we consider two possibilities. We first use a "pure" random effects approach, which assumes that unobserved heterogeneity is independent of the observed exogenous covariates and initial conditions. Next, we relax this strong assumption by employing the dynamic correlated random effects model introduced by Wooldridge (2005). This approach is not only more relevant for analyses of poverty persistence, but also more flexible and computationally straightforward than alternative approaches currently in use.

1 The range of research areas for which dynamic binary response models have proven important includes: labor force participation (Heckman and Willis, 1977; Hyslop, 1999), the probability of receiving welfare (Bane and Ellwood, 1986), the experience of social exclusion (Devicienti and Poggi, 2007), and the identification of adverse selection in insurance markets (Chiappori and Salanie, 2000).

We then implement our empirical approach using panel household and village data from rural China. Following the market-oriented reforms introduced in the early 1980s, there was a pronounced decline in the proportion of China's population living below the poverty line (Ravallion and Chen, 2007).
While much of the literature examining growth in China's rural areas has focused on incentive effects related to reform and on the role of local non-farm employment, there has been relatively little research demonstrating the relationship between increasing migration and the probability that households within villages have consumption levels below the poverty line. Our empirical analysis demonstrates an economically significant causal relationship between migration and poverty reduction in rural China. In performing this exercise, we highlight the usefulness of our econometric approach in settings in which the researcher must work from binary indicators of poverty status, which are often the only information available from administrative data sources.

The paper proceeds as follows. In Section 2 below, we first review approaches to estimation of dynamic binary response panel data models, and then propose a general approach to estimating these models when there is an endogenous regressor. In Section 3, we introduce the rural China setting, motivate a specific implementation of the model developed in Section 2, and describe a strategy for identifying the effect of migration on poverty within China's villages. In Section 4, we discuss our estimation results and the performance of the model, and in Section 5 we summarize our results and discuss the potential value of the estimator introduced in the paper.

2 Estimation of a Dynamic Binary Response Panel Data Model with an Endogenous Regressor

2.1 Dynamic Binary Response Panel Data Models

Dynamic binary response panel data models with unobserved heterogeneity have been used extensively in theoretical and empirical studies.
Both parametric and semi-parametric methods have been proposed to solve the initial conditions problem and to obtain consistent estimates of model parameters when all explanatory variables other than the lagged dependent variable are strictly exogenous.2 Semi-parametric methods allow estimation of parameters without specifying a distribution of the unobserved heterogeneity, but they are often overly restrictive with respect to the strictly exogenous covariates. Honoré and Kyriazidou (2000), for example, propose an approach that does not allow for discrete explanatory variables. More importantly, because semi-parametric methods do not specify the distribution of the unobserved heterogeneity, the absolute importance of any of the explanatory variables in a dynamic binary response panel data model cannot be determined. Models which do not place any assumption on either the unobserved effects or the initial conditions, or their relationship to other covariates, are best described as fixed effects models, and the semi-parametric approach of Honoré and Kyriazidou (2000) falls into this class of models.3

2 With a structural binary outcome model that allows for unobserved effects, one must be concerned that bias could be introduced through a systematic relationship between an unobserved effect and the initial value of the dependent variable. This is known as the initial conditions problem.

3 We follow Chay and Hyslop (2000) in classifying models requiring no assumption on unobservable effects or initial conditions as fixed effect models, and refer to random effect models as those in which one specifies a distribution of unobserved effects and initial conditions given exogenous explanatory variables.

Due to their computational simplicity, parametric methods have received greater attention than semi-parametric methods. There are four main parametric approaches, all employing conditional maximum likelihood estimation (CMLE), that have been used to estimate dynamic nonlinear panel data models in which all covariates other than the lagged dependent variable are strictly exogenous. The first approach treats the initial conditions for each cross-sectional unit, $y_{i0}$, as nonrandom variables. If, in addition, the unobserved effects, $c_i$, are assumed to be independent of the exogenous explanatory variables, $z_i = (z_{i1}, z_{i2}, \ldots, z_{iT})$, one obtains the density of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ given the initial conditions, $y_{i0}$, and $z_i$, by integrating out the $c_i$. We refer to the relationship between the observed exogenous covariates and the unobserved heterogeneity in the first method as one of "pure" random effects because we assume $c_i$ to be independent of $z_i$ and $y_{i0}$. While this method may provide a way to obtain consistent estimates of the model parameters, nonrandomness of the initial conditions requires a very strong and often implausible assumption of independence between the initial conditions and the unobserved effects.

A second parametric approach involves treating the initial conditions as random and specifying the density for $y_{i0}$ given $(z_i, c_i)$. With this density, one can then obtain the joint distribution of all the outcomes, $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$, conditional on unobserved heterogeneity, $c_i$, and strictly exogenous observables, $z_i$. The most important drawback of this approach, however, lies in the difficulty of specifying the density of $y_{i0}$ given $(z_i, c_i)$.4

A third method, proposed by Heckman (1981), suggests approximating the density of the initial conditions, $y_{i0}$, given $(z_i, c_i)$ and specifying a density of the unobserved effects given the strictly exogenous explanatory variables. The density of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ given $z_i$ can then be obtained. While Heckman's approach avoids the drawbacks of the second method, it is computationally challenging.
Since both the second and the third methods explicitly specify a distribution of the unobserved heterogeneity conditional on strictly exogenous observables and a distribution of the initial conditions conditional on the unobserved effects and the exogenous covariates, they can be classified as random effects models.

Finally, an approach proposed by Wooldridge (2005) recommends obtaining a joint distribution of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $(y_{i0}, z_i)$ rather than a distribution of $(y_{i0}, y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $z_i$ as in Heckman's approach. For this method to work, one must specify a density of $c_i$ given $(y_{i0}, z_i)$.5 This fourth approach is more flexible and requires fewer computational resources than Heckman's technique. In this method, similar to Heckman's approach, we call the relationship between the observed exogenous covariates and the unobserved heterogeneity a "correlated" random effects relationship because we allow $c_i$ to be a linear function of $z_i$ and $y_{i0}$.

4 More details on this approach and potential drawbacks can be found in Wooldridge (2002), page 494.

5 The specification of this density in Wooldridge's method is motivated by Chamberlain's (1980) approach, which models the distribution of the unobserved effect conditional on the strictly exogenous variables.

In the next section we develop an approach to consistently estimating parameters of a dynamic binary response panel data model when the contemporaneous explanatory variables are not strictly exogenous. To do so, we employ a control function approach, popularized by Smith and Blundell (1986) and Rivers and Vuong (1988). The main idea of our approach is to add (control) variables into the structural model to control for endogeneity. We consider a model with two possible sources of endogeneity: correlation between the unobserved heterogeneity and a regressor, and correlation between a regressor and the structural error.
For this reason, we model the relationships among the unobserved effect, exogenous covariates, and the error from the reduced form equation for the endogenous explanatory variable.

2.2 A General Approach to Estimation

Our specification of the binary response model assumes that for a random draw $i$ from the population, there is an underlying latent variable model:

$$y^*_{1it} = z_{1it}\delta_1 + \alpha_2 y_{2it} + \rho\, y_{1i,t-1} + c_{1i} + u_{1it}, \tag{1}$$

$$y_{1it} = 1[y^*_{1it} \ge 0], \quad t = 1, \ldots, T, \tag{2}$$

where $z_{1it}$ is a $1 \times (K-1)$ vector of strictly exogenous covariates, which may contain a constant term, $y_{2it}$ is an endogenous covariate, $c_{1i}$ is an unobserved effect, and $u_{1it}$ is an idiosyncratic serially uncorrelated error such that $\mathrm{Var}(u_{1it}) = 1$. $1[\cdot]$ is an indicator function. We assume a sample of size $N$ randomly drawn from the population, and that $T$, the number of time periods, is fixed in the asymptotic analysis. For simplicity, we assume a balanced panel. Let $\theta$ denote $(\delta_1', \alpha_2, \rho)'$, a vector of $K+1$ parameters.

Importantly, this model allows the probability of success at time $t$ to depend not only on unobserved heterogeneity, $c_{1i}$, but also on the outcome in $t-1$. A key assumption is that the dynamics in model (1) are correctly specified, in which case dynamic completeness of the model implies that the error term is serially uncorrelated. Allowing $u_{1it}$ to have arbitrary serial correlation would suggest including more lags of the dependent variable in (1). For example, in the simplest case of a linear model, when an error term, $u_{it}$, follows an AR(1) process, a simple calculation shows that the dependent variable, $y_{it}$, depends not only on $y_{i,t-1}$ but also on $y_{i,t-2}$. Similarly, in the context of our model, one should have a good reason to expect a serially correlated error term $u_{1it}$ and yet to include only one lag of $y_{1it}$.

Further, we make additional assumptions on strict exogeneity of the contemporaneous explanatory variables.
First, conditional on $c_{1i}$, the contemporaneous covariates, $z_{1it}$, are assumed to be strictly exogenous. Second, we allow some of the explanatory variables, here represented by the scalar $y_{2it}$, to be endogenous:

$$\begin{aligned} y_{2it} &= z_{1it}\pi_1 + z_{2it}\pi_2 + c_{2i} + u_{2it} \\ &= z_{it}\pi + \bar{z}_i\lambda + a_{2i} + u_{2it} = z_{it}\pi + \bar{z}_i\lambda + v_{2it}, \end{aligned} \tag{3}$$

where $t = 1, \ldots, T$, $c_{2i}$ is an unobserved effect, and $u_{2it}$ is an idiosyncratic serially uncorrelated error with $\mathrm{Var}(u_{2it}) = \sigma_2^2$. Let $z_{it} = (z_{1it}, z_{2it})$ be a $1 \times L$ vector of instrumental variables, with $L \ge K$; i.e., we assume the vector $z_{2it}$ contains at least one element. Line two of equation (3) reflects our use of the Mundlak-Chamberlain device for the unobserved effect, $c_{2i}$. We replace $c_{2i}$ with its projection onto the time averages of all the exogenous variables: $c_{2i} = \bar{z}_i\lambda + a_{2i}$. Then, the new composite error term is $v_{2it} = a_{2i} + u_{2it}$. Further, $\bar{z}_i = T^{-1}\sum_{t=1}^{T} z_{it}$, and $\pi = (\pi_1', \pi_2')'$. We follow Rivers and Vuong (1988) and refer to (3) as a reduced form equation.

Next, consider the relationship between $u_{1it}$ and $u_{2it}$. We assume that $(u_{1it}, u_{2it})$ has a zero mean, bivariate normal distribution and is independent of $z_i = (z_{1i}, z_{2i}) = (z_{i1}, z_{i2}, \ldots, z_{iT})$. Note that under joint normality of $(u_{1it}, u_{2it})$, with $\mathrm{Var}(u_{1it}) = 1$, we write

$$u_{1it} = \eta\, u_{2it} + e_{1it} = \eta\,(v_{2it} - a_{2i}) + e_{1it}, \tag{4}$$

where $\eta = \tau_2/\sigma_2^2$, $\tau_2 = \mathrm{Cov}(u_{1it}, u_{2it})$, $\sigma_2^2 = \mathrm{Var}(u_{2it})$, and $e_{1it}$ is a serially uncorrelated random term, which is independent of $z_i$ and $u_{2it}$. The absence of serial correlation of $e_{1it}$ follows from the fact that $u_{1it}$ and $u_{2it}$ are both assumed not to suffer from serial correlation. If there were no lagged dependent variables on the right hand side of equation (1), there would be little need to worry about possible serial correlation in the error term $u_{2it}$ of equation (3), as long as we assume that $u_{1it}$ is also serially uncorrelated. However, we are interested in a dynamic model, and the assumption of no serial correlation in $u_{2it}$ is crucial for equation (4).
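As a brief check on assumption (4), the projection coefficient and the variance of the remaining error follow from standard bivariate normal algebra. The Greek letters below are labels adopted here for illustration, since the symbols are not legible in every copy of the text: $\eta$ for the coefficient on $u_{2it}$ in (4), $\tau_2$ for $\mathrm{Cov}(u_{1it}, u_{2it})$, $\sigma_2^2$ for $\mathrm{Var}(u_{2it})$, and $\kappa$ for $\mathrm{Corr}(u_{1it}, u_{2it})$.

```latex
\begin{aligned}
\eta &= \frac{\operatorname{Cov}(u_{1it},u_{2it})}{\operatorname{Var}(u_{2it})}
      = \frac{\tau_2}{\sigma_2^2},\\[4pt]
\operatorname{Var}(e_{1it}) &= \operatorname{Var}(u_{1it}-\eta u_{2it})
      = 1 - 2\eta\tau_2 + \eta^2\sigma_2^2
      = 1 - \frac{\tau_2^2}{\sigma_2^2}
      = 1 - \kappa^2,
\qquad \kappa = \frac{\tau_2}{\sigma_2}.
\end{aligned}
```

The last equality uses $\mathrm{Var}(u_{1it}) = 1$, so the correlation between the two errors equals $\tau_2/\sigma_2$.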
Since equation (3) is essentially a reduced form equation for the endogenous variable $y_{2it}$, the assumption of no serial correlation in $u_{2it}$ (and in $e_{1it}$, as a result) is appropriate in the context of our model.

Equation (4) is essentially an assumption regarding the contemporaneous endogeneity of $y_{2it}$. It suggests that the contemporaneous $v_{2it}$ is sufficient for explaining the relation between $u_{1it}$ and $v_{2it}$. In other words, once we somehow account for endogeneity of $y_{2it}$ in period $t$, we might think that $y_{2it}$ becomes "completely" exogenous, and we can estimate the parameters of interest using standard methods valid for exogenous explanatory variables. However, there is the possibility of an additional feedback from the endogenous variable $y_2$ in different time periods to the main dependent variable of interest, $y_1$, at time $t$. This possibility arises because we let the reduced form equation for the endogenous variable, $y_{2it}$, contain a time-constant unobserved effect, $a_{2i}$.

From assumption (4), $e_{1it} \sim \mathrm{Normal}(0, \sigma_{e1}^2)$, where $\sigma_{e1}^2 = 1 - \kappa^2$, since $\mathrm{Var}(u_{1it}) = 1$, and $\kappa = \mathrm{Corr}(u_{1it}, u_{2it})$; we write

$$\begin{aligned} y_{1it} &= 1[x_{1it}\theta + c_{1i} + \eta\,(v_{2it} - a_{2i}) + e_{1it} \ge 0] \\ &= 1[x_{1it}\theta + \eta\, v_{2it} + (c_{1i} - \eta\, a_{2i}) + e_{1it} \ge 0] \\ &= 1[x_{1it}\theta + \eta\, v_{2it} + c_{0i} + e_{1it} \ge 0], \end{aligned} \tag{5}$$

where $t = 1, \ldots, T$, $x_{1it} = (z_{1it}, y_{2it}, y_{1i,t-1})$, $\theta = (\delta_1', \alpha_2, \rho)'$, and $c_{0i} = c_{1i} - \eta\, a_{2i}$ is a composite unobserved effect. A potential limitation of the assumptions we use to arrive at equation (5) is that they rule out endogenous regressors that are discrete or have severely limited support. In the application we present in Section 3 below, $y_{2it}$ will be the share of registered long-term village residents who are employed as migrants outside the village, and the support of this variable will be comfortably within the [0,1] interval. Thus, the above assumptions are plausible.

Since the unobserved effect $c_{0i}$ is present in equation (5), we should consider the relation between the unobserved effect $c_{0i}$ and the explanatory variables in equation (5).
Importantly, the composite unobserved effect $c_{0i}$ is a function of $v_{2it}$, where $t = 1, \ldots, T$, by construction: $c_{0i} = c_{1i} - \eta\, a_{2i} = c_{1i} - \eta\,(v_{2it} - u_{2it})$, $t = 1, \ldots, T$. Thus, in order to obtain consistent estimates of the parameters from equation (5), we must take into account the relation between $c_{0i}$ and $v_{2it}$ in different time periods.

First, we use a "pure" random effects approach, i.e., we assume that

$$c_{0i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(\gamma_0 \bar{v}_{2i}, \sigma_{a1}^2), \tag{6}$$

which can be written as $c_{0i} = \gamma_0 \bar{v}_{2i} + a_{1i}$, where $a_{1i} \mid z_i, y_{1i0}, v_{2i} \sim \mathrm{Normal}(0, \sigma_{a1}^2)$ and is independent of $(z_i, y_{1i0}, v_{2i})$, where $\bar{v}_{2i} = T^{-1}\sum_{t=1}^{T} v_{2it}$ and $v_{2i} = (v_{2i1}, v_{2i2}, \ldots, v_{2iT})$. While a limiting assumption in many potential applications, the "pure" random effects assumption (6) may be relevant for certain cases. In particular, when every individual in the initial time period is in the same state (e.g., we are interested in the population of people who smoke), assumption (6) might be appropriate. Further, since we assume that the composite unobserved effect, $c_{0i}$, is independent of the initial condition, $y_{1i0}$, it is natural to think that the $v_{2it}$'s in different time periods have equal impacts on $c_{0i}$. Consequently, we employ $\bar{v}_{2i}$ as a sufficient statistic for describing the relation between $c_{0i}$ and the $v_{2it}$'s in different time periods. Then, under assumptions (1)-(4) and (6), we rewrite equation (5) as

$$y_{1it} = 1[x_{1it}\theta + \eta\, v_{2it} + \gamma_0 \bar{v}_{2i} + a_{1i} + e_{1it} \ge 0]. \tag{7}$$

Clearly, the estimates of $\theta_\sigma = \theta/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$, $\eta_\sigma = \eta/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$, and $\gamma_{\sigma 0} = \gamma_0/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$ can be obtained using standard random effects probit software by including $\hat{\bar{v}}_{2i}$ in each time period into the list of explanatory variables along with $x_{1it}$ and $\hat{v}_{2it}$, where $\hat{\bar{v}}_{2i} = T^{-1}\sum_{t=1}^{T} \hat{v}_{2it}$.

As we discussed earlier, however, the assumption of independence between the unobserved effect, the initial conditions and the exogenous covariates is often too restrictive. In
In 9 particular, the "pure"random e¤ects assumption is unrealistic in the context of the applica- tion to poverty persistence that we will examine below. For instance, unobserved dimensions of ability are very likely to be related to poverty status not only in the initial period, but also in future periods. Rather than using a "pure" random e¤ects approach, we build on the dynamic "corre- lated" random e¤ects model introduced by Wooldridge (2005). Instead of the conditional distribution of c0i assumed in (6), we now assume that 2 c0i jzi ; y1i0 ; v2i Normal(v2i 0 + zi 1 + 2 y1i0 ; a1 ); (8) 2 which follows from writing c0i = v2i 0 +zi 1 + 2 y1i0 +a1i ; where a1i jzi ; y1i0 ; v2i Normal(0; a1 ) and independent of (zi ; y1i0 ; v2i ). Since we allow for a nonzero correlation between the com- s posite unobserved e¤ect, c0i , and the initial condition, y1i0 , v2it ' in di¤erent time periods s might have di¤erent e¤ects on c0i . Thus, we let v2it ' from di¤erent time periods have un- s equal "weights" for explaining c0i . Assumption (8) extends Chamberlain' assumption for a static probit model to the dynamic setting. To allow for correlation between c0i and zi and y1i0 ; we assume a conditional normal distribution with linear expectation and constant variance. Assumption (8) is a restrictive assumption since it speci...es a distribution for c0i given zi ; y1i0 ; v2i . However, it is an improvement on the "pure" random e¤ects approach in that it allows for some dependence between the unobserved e¤ect and the vector of all explanatory variables across all time periods. 
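Then, under assumptions (1)-(4) and (8), we rewrite equation (5) as

$$\begin{aligned} y_{1it} &= 1[x_{1it}\theta + \eta\, v_{2it} + c_{0i} + e_{1it} \ge 0] \\ &= 1[x_{1it}\theta + \eta\, v_{2it} + v_{2i}\gamma_0 + z_i\gamma_1 + \gamma_2 y_{1i0} + a_{1i} + e_{1it} \ge 0]. \end{aligned} \tag{9}$$

Equation (9) suggests that we can estimate $\theta_\sigma = \theta/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$ and $\eta_\sigma = \eta/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$ along with $\gamma_{\sigma 0} = \gamma_0/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$, $\gamma_{\sigma 1} = \gamma_1/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$ and $\gamma_{\sigma 2} = \gamma_2/\sqrt{\sigma_{e1}^2 + \sigma_{a1}^2}$ using standard random effects probit software by including $\hat{v}_{2i}$, $z_i$, and $y_{1i0}$ in each time period into the list of explanatory variables along with $x_{1it}$ and $\hat{v}_{2it}$.

2.3 Allowing for Serial Correlation of Errors in the First Stage

If the first stage error, $u_{2it}$, is serially correlated, we must modify our two-step estimating procedure. To be specific, assume $u_{2it}$ follows an AR(1) process: $u_{2it} = \phi\, u_{2i,t-1} + e_{2it}$, where $e_{2it}$ is a white noise error with $\mathrm{Var}(e_{2it}) = \sigma_{e2}^2$. Then

$$\begin{aligned} \mathrm{Cov}(e_{1it}, e_{1i,t-1}) &= \mathrm{Cov}(u_{1it} - \eta\, u_{2it},\; u_{1i,t-1} - \eta\, u_{2i,t-1}) \\ &= \mathrm{Cov}(u_{1it} - \eta\phi\, u_{2i,t-1} - \eta\, e_{2it},\; u_{1i,t-1} - \eta\, u_{2i,t-1}) = \eta^2\phi\, E(u_{2i,t-1}^2), \end{aligned}$$

which is nonzero unless either $\eta = 0$ or $\phi = 0$. Clearly, assumption (4) is no longer appropriate and must be modified.

Define the variance-covariance matrix of $v_{2i}$ as $E(v_{2i}'v_{2i})$, a $T \times T$ matrix that we assume to be positive definite. Then,

$$E(v_{2i}'v_{2i}) = \sigma_{a2}^2\, j_T j_T' + \sigma_2^2 \begin{pmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-2} & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-3} & \phi^{T-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{T-4} & \phi^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \phi^{T-2} & \phi^{T-3} & \phi^{T-4} & \cdots & 1 & \phi \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & \phi & 1 \end{pmatrix}, \tag{10}$$

where $j_T$ is a $T \times 1$ vector of ones, and $\sigma_2^2 = \sigma_{e2}^2/(1 - \phi^2)$. We can obtain consistent estimates of the parameters in (10), and use them to transform $v_{2it}$ to $v^*_{2it}$, which is a first stage error free of serial correlation. One useful method for estimating $\phi$, $\sigma_{a2}^2$, $\sigma_{e2}^2$, and $\sigma_2^2$ is the minimum distance estimator, described in detail by Chamberlain (1984).6

6 Cappellari (1999) has developed code that conveniently implements this method in Stata.

Once we have first stage errors free of serial correlation, we use the transformation $u^*_{2it} = v^*_{2it} - a_{2i}$ to adjust assumption (4). We can then assume that under joint normality of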
$(u_{1it}, u^*_{2it})$,

$$u_{1it} = \eta\, u^*_{2it} + e_{1it} = \eta\,(v^*_{2it} - a_{2i}) + e_{1it}, \tag{11}$$

where $e_{1it}$ is a serially uncorrelated random term, which is independent of $z_i$ and $u^*_{2it}$. Inclusion of $u^*_{2it}$ instead of $u_{2it}$ in equation (11) guarantees that $e_{1it}$ will not be serially correlated. We are then able to write

$$\begin{aligned} y_{1it} &= 1[x_{1it}\theta + c_{1i} + \eta\,(v^*_{2it} - a_{2i}) + e_{1it} \ge 0] \\ &= 1[x_{1it}\theta + \eta\, v^*_{2it} + (c_{1i} - \eta\, a_{2i}) + e_{1it} \ge 0] \\ &= 1[x_{1it}\theta + \eta\, v^*_{2it} + c_{0i} + e_{1it} \ge 0], \end{aligned} \tag{12}$$

where $t = 1, \ldots, T$, and $c_{0i} = c_{1i} - \eta\, a_{2i}$ is a composite unobserved effect. Based on equation (12), it is straightforward to adjust the two-step estimating procedure discussed in Section 2.2 to account for the presence of serial correlation in $u_{2it}$. Under the "correlated" random effects assumption (8), equation (12) can be written as

$$\begin{aligned} y_{1it} &= 1[x_{1it}\theta + \eta\, v^*_{2it} + c_{0i} + e_{1it} \ge 0] \\ &= 1[x_{1it}\theta + \eta\, v^*_{2it} + v_{2i}\gamma_0 + z_i\gamma_1 + \gamma_2 y_{1i0} + a_{1i} + e_{1it} \ge 0]. \end{aligned} \tag{13}$$

Then, we can estimate the parameters $\theta$, $\eta$, $\gamma_1$, and $\gamma_2$ using standard random effects probit software by including $\hat{v}_{2i}$, $z_i$, and $y_{1i0}$ in each time period into the list of the explanatory variables along with $x_{1it}$.

2.4 Calculation of Average Partial Effects

To assess the magnitude of state dependence we must calculate the average partial effect (APE) of the lagged dependent variable on its current value. We follow an approach proposed by Wooldridge (2002) to calculate the APEs after our two-step estimation procedure. The APEs can be calculated by taking either differences or derivatives of

$$E[\Phi(x_{1t}\theta_\sigma + \eta_\sigma v_{2it} + v_{2i}\gamma_{\sigma 0} + z_i\gamma_{\sigma 1} + \gamma_{\sigma 2}\, y_{1i0})], \tag{14}$$

where $t = 1, \ldots, T$, variables with a subscript $i$ are random and all others are fixed.
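In order to obtain estimates of the parameter values in (14), we appeal to a standard uniform weak law of large numbers argument.7 For any given value of $x_{1t}$, say $x_1^0$, a consistent estimator for expression (14) can be obtained by replacing unknown parameters by consistent estimators:

$$N^{-1}\sum_{i=1}^{N} \Phi(x_1^0\hat{\theta}_\sigma + \hat{\eta}_\sigma \hat{v}_{2it} + \hat{v}_{2i}\hat{\gamma}_{\sigma 0} + z_i\hat{\gamma}_{\sigma 1} + \hat{\gamma}_{\sigma 2}\, y_{1i0}), \tag{15}$$

where $t = 1, \ldots, T$, the $\hat{v}_{2it}$ are the first-stage pooled OLS residuals from regressing $y_{2it}$ on $z_{it}$ (together with the time averages $\bar{z}_i$ in (3)), $\hat{v}_{2i} = (\hat{v}_{2i1}, \hat{v}_{2i2}, \ldots, \hat{v}_{2iT})$, the $\sigma$ subscript denotes multiplication by $\hat{\sigma}^{-1}$ with $\hat{\sigma}^2 = \hat{\sigma}_{e1}^2 + \hat{\sigma}_{a1}^2$, and $\hat{\theta}_\sigma$, $\hat{\eta}_\sigma$, $\hat{\gamma}_{\sigma 0}$, $\hat{\gamma}_{\sigma 1}$, $\hat{\gamma}_{\sigma 2}$, and $\hat{\sigma}^2$ are the conditional MLEs. Note that $\hat{\sigma}^2$ is the usual error variance estimator from the second-stage random effects probit regression of $y_{1it}$ on $x_{1it}$, $\hat{v}_{2it}$, $\hat{v}_{2i}$, $z_i$, and $y_{1i0}$. One may then employ either a mean value expansion or a bootstrapping approach to obtain asymptotic standard errors. We can compute either changes or derivatives of equation (15) with respect to $x_{1t}$ to obtain the APEs of interest.

7 See Wooldridge (2002) for details.

In common with the adjustment to our estimating procedure, one must also correct the estimated APEs when errors are serially correlated. We obtain the APEs by taking either differences or derivatives of

$$E[\Phi(x_{1t}\theta_\sigma + \eta_\sigma v^*_{2it} + v_{2i}\gamma_{\sigma 0} + z_i\gamma_{\sigma 1} + \gamma_{\sigma 2}\, y_{1i0})], \tag{16}$$

where $t = 1, \ldots, T$. For any given value of $x_{1t}$, say $x_1^0$, a consistent estimator of expression (16) is obtained by replacing unknown parameters by consistent estimators:

$$N^{-1}\sum_{i=1}^{N} \Phi(x_1^0\hat{\theta}_\sigma + \hat{\eta}_\sigma \hat{v}^*_{2it} + \hat{v}_{2i}\hat{\gamma}_{\sigma 0} + z_i\hat{\gamma}_{\sigma 1} + \hat{\gamma}_{\sigma 2}\, y_{1i0}), \tag{17}$$

where $t = 1, \ldots, T$, $\hat{v}^*_{2it}$ is a first stage residual cleaned of serial correlation, the $\sigma$ subscript denotes multiplication by $\hat{\sigma}^{-1}$ with $\hat{\sigma}^2 = \hat{\sigma}_{e1}^2 + \hat{\sigma}_{a1}^2$, and $\hat{\theta}_\sigma$, $\hat{\eta}_\sigma$, $\hat{\gamma}_{\sigma 0}$, $\hat{\gamma}_{\sigma 1}$, $\hat{\gamma}_{\sigma 2}$, and $\hat{\sigma}^2$ are the conditional MLEs. One may then compute derivatives of equation (17) with respect to $x_{1t}$ to obtain the APEs of interest.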
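The sample averages in (15) and (17) are simple to compute once scaled second-stage coefficients are in hand. The sketch below is illustrative only: the function and argument names are hypothetical, the coefficients are assumed to be already scaled as described above, and the household-specific controls (the residual vector, $z_i$ and $y_{1i0}$) are assumed to be stacked in a single matrix `W`. It computes the APE of the lagged outcome as the difference between the averaged response probabilities at $y_{1,t-1} = 1$ and $y_{1,t-1} = 0$.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF, applied elementwise."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def average_structural_function(x0, theta_s, eta_s, v2_t, W, gamma_s):
    """Sample analogue of (15) at a fixed covariate row x0.

    x0      : (K+1,) fixed values of (z_1t, y_2t, y_1,t-1)
    theta_s : (K+1,) scaled coefficients on x0
    eta_s   : scalar scaled coefficient on the period-t residual
    v2_t    : (N,) first-stage residuals for period t
    W       : (N, P) household-specific controls
    gamma_s : (P,) scaled coefficients on W
    """
    index = x0 @ theta_s + eta_s * v2_t + W @ gamma_s
    return float(normal_cdf(index).mean())

def ape_state_dependence(z1, y2, theta_s, eta_s, v2_t, W, gamma_s):
    """APE of the lagged outcome: (15) at y_{1,t-1} = 1 minus (15) at 0."""
    x_on = np.concatenate([z1, [y2, 1.0]])
    x_off = np.concatenate([z1, [y2, 0.0]])
    return (average_structural_function(x_on, theta_s, eta_s, v2_t, W, gamma_s)
            - average_structural_function(x_off, theta_s, eta_s, v2_t, W, gamma_s))
```

For a continuous regressor such as $y_{2it}$, the difference would be replaced by a derivative, that is, the average of the standard normal density at the same index multiplied by the scaled coefficient on $y_{2it}$. Passing serial-correlation-adjusted residuals as `v2_t` gives the analogue of (17).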
3 Migrant Labor Markets and Poverty Persistence in Rural China

Before applying the dynamic binary response model developed above to an analysis of how migration affects poverty status in rural China, we first briefly review the history of rural-urban migration in China, review other evidence on the impacts of migration in migrant sending communities, and introduce the data source that will be used for our analysis. Next, we propose a specific implementation of the dynamic binary response model to an analysis of the impact of migration on the probability that a rural household is poor. We then describe our approach to identifying the migrant networks, which affect the cost of finding migrant employment for village residents.

3.1 Rural-Urban Migration in China

Rapid growth in the volume of rural migrants moving to urban areas for work during the 1990s signalled a fundamental change in China's labor market. Estimates using the one percent sample from the 1990 and 2000 rounds of the Population Census and the 1995 one percent population survey suggest that the inter-county migrant population grew from just over 20 million in 1990 to 45 million in 1995 and 79 million by 2000 (Liang and Ma, 2004). Surveys conducted by the National Bureau of Statistics (NBS) and the Ministry of Agriculture include more detailed retrospective information on past short-term migration, and suggest even higher levels of labor migration than those reported in the census (Cai, Park and Zhao, 2008).

Before labor mobility restrictions were relaxed, households in remote regions of rural China faced low returns to local economic activity, reinforcing geographic poverty traps (Jalan and Ravallion, 2002). A considerable body of descriptive evidence related to the growth of migration in China raises the possibility that migrant opportunity may be an important mechanism for poverty reduction.
Studies of the impact of migration on migrant households suggest that migration is associated with higher incomes (Taylor, Rozelle and de Brauw, 2003; Du, Park, and Wang, 2006), facilitates risk-coping and risk-management (Giles, 2006; Giles and Yoo, 2007), and is associated with higher levels of local investment in productive activities (Zhao, 2003).

The use of migrant networks and employment referral in urban areas are important dimensions of China's rural-urban migration experience. Rozelle et al. (1999) emphasize that villages with more migrants in 1988 experienced more rapid migration growth by 1995. Zhao (2003) shows that the number of early migrants from a village is correlated with the probability that an individual with no prior migration experience will choose to participate in the migrant labor market. Meng (2000) further suggests that variation in the size of migrant flows to different destinations can be partially explained by the size of the existing migrant population in potential destinations.8

8 Referral through one's social network is a common method of job search in both the developing and developed world. Carrington, Detragiache, and Vishwanath (1996) explicitly show that in a model of migration, moving costs can decline with the number of migrants over time, even if wage differentials narrow between source communities and destinations. Survey-based evidence suggests that roughly 50 percent of new jobs in the US are found through referrals facilitated by social networks (Montgomery, 1991). In a study of Mexican migrants in the US, Munshi (2003) shows that having more migrants from one's own village living in the same city increases the likelihood of employment.

3.2 The RCRE Household Survey

The primary data sources used for our analyses are the village and household surveys conducted by the Research Center for Rural Economy at China's Ministry of Agriculture from 1986 through the 2003 survey year. We use data from 90 villages in eight provinces (Anhui, Jilin, Jiangsu, Henan, Hunan, Shanxi, Sichuan and Zhejiang) that were surveyed over the 17-year period, with an average of 6,305 households surveyed per year. Depending on village size, between 40 and 120 households were randomly surveyed in each village.

The RCRE household survey enumerates detailed household-level information on incomes and expenditures, education, labor supply, asset ownership, land holdings, savings, formal and informal access to credit, and remittances.9 In common with the National Bureau of Statistics (NBS) Rural Household Survey, respondent households keep daily diaries of income and expenditure, and a resident administrator living in the county seat visits households once a month to collect information from the diaries.

Our measure of consumption includes nondurable goods expenditure plus an imputed flow of services from household durable goods and housing. In order to convert the stock of durables into a flow of consumption services, we assume that current and past investments in housing are "consumed" over a 20-year period and that investments in durable goods are consumed over a period of 7 years.10 We also annually "inflate" the value of the stock of durables to reflect the increase in durable goods' prices over the period. Finally, we deflate all income and expenditure data to 1986 prices using the NBS rural consumer price index for each province.

There has been some debate over the representativeness of both the RCRE and NBS surveys, and concern over differences between trends in poverty and inequality in the NBS and RCRE surveys. These issues are reviewed extensively in Appendix B of Benjamin et al. (2005), but it is worth summarizing some of their findings here.
First, when comparing cross sections of the NBS and RCRE surveys with overlapping years from cross sectional surveys not using a diary method, it is apparent that some high and low income households are under-represented.11 Poorer illiterate households are likely to be under-represented because enumerators find it difficult to implement and monitor the diary-based survey, and refusal rates are likely to be high among affluent households who find the diary reporting method a costly use of their time. Second, much of the difference between levels and trends from the NBS and RCRE surveys can be explained by differences in the valuation of home-produced grain and treatment of taxes and fees.

9 One shortcoming of the survey is the lack of individual-level information. However, we know the numbers of working-age adults and dependents, as well as the gender composition of household members.

10 Our approach to valuing consumption follows the suggestions of Chen and Ravallion (1996) for the NBS Rural Household Survey, and is explained in more detail in Appendix A of Benjamin et al. (2005).

11 The cross-sections used were the rural samples of the 1993, 1997 and 2000 China Health and Nutrition Survey (CHNS) and a survey conducted in 2000 by the Center for Chinese Agricultural Policy (CCAP) with Scott Rozelle (UC Davis) and Loren Brandt (University of Toronto).

3.3 Migration and Poverty

One of the benefits of the accompanying village survey is a question asked each year of village leaders about the number of registered village residents working and living outside the village. In our analysis, we consider all registered residents working outside their home county to be migrants.12 Both the tremendous increase in migration from 1987 onward and heterogeneity across villages are evident in Figure 1. In 1987 an average of 3 percent of working age laborers in RCRE villages were working outside of their home villages, which rose steadily to 23 percent by 2003.
Moreover, we observe considerable variability in the share of working age laborers working as migrants. Whereas some villages still had a small share of legal village residents employed as migrants, more than 50 percent of working age adults from other villages were employed outside of home villages by 2003.

In other research using this data source, de Brauw and Giles (2008) use linear dynamic panel data methods with continuous regressors to demonstrate a robust relationship between the reduction of obstacles to rural-urban migration and household consumption growth. While one might suspect that the non-poor, who have sufficiently high human capital and other dimensions of ability, may benefit most from reductions in barriers to migration, general equilibrium effects of out-migration may lead to greater specialization of households in villages and this may have benefits for the poor. In particular, de Brauw and Giles demonstrate that, when out-migration increases, households at the lower end of the consumption distribution tend to expand both their labor supply to productive activities and the land per capita they cultivate by more than richer households do.

This raises the prospect that migration may be causally related to poverty reduction within rural communities as well. Changes in the village poverty headcount are negatively associated with the change in the number of out-migrants, suggesting that poverty declines with increased out-migration (Figure 2). Nonlinearities in the bivariate relationship are evident in the non-parametric lowess plot of the relationship. Whether obvious non-linearities are related to the simultaneity of shocks and increases in out-migration and poverty for some villages or the simple fact that we have not controlled for other characteristics of villages, establishing a relationship between migration and increased poverty within villages is likely to require an analytical approach that eliminates endogeneity bias due to simultaneity and potential sources of unobserved heterogeneity.

12 From follow up interviews with village leaders, it is apparent that registered residents living outside the county are unlikely to be commuters and generally live and work outside the village for more than six months of the year.

In the empirical application of our discrete binary response model below, we examine whether out-migration from villages is associated with reductions in the probability that household consumption falls below the poverty line in rural China. Researchers in the poverty literature have questioned the appropriateness of running poverty regressions of this type because the analyst discards richer information provided by the complete distribution of consumption in favor of a binary variable. Not only is information discarded, but one also introduces distributional assumptions associated with estimating a binary response model.13

While we recognize these concerns, our examination of poverty persistence using a dynamic binary response model is useful for two reasons. First, it helps to highlight the strengths of our approach to estimating dynamic binary response models. When analysts only have access to administrative data on such outcomes as receipt of unemployment benefits or welfare participation, then analysis of persistence in participation or receipt of support is important and requires a binary outcome model (e.g., Adren, 2007; Bane and Ellwood, 1986). While our analysis discards some information, we do this to provide evidence on the appropriateness of our approach to estimating dynamic binary response models. Second, use of a dynamic binary response model focuses attention on whether or not a household passes a specific point in the distribution of consumption, or alternatively income (e.g., Biewen, 2009; Hansen and Wahlberg, 2009).
By doing this, we address a policy-relevant question of how a treatment, in this case increased migration, affects the likelihood that poor households will remain poor and the likelihood that non-poor households will fall into poverty. We are agnostic as to whether poverty is reduced through direct participation in the migrant labor market, or through indirect general equilibrium effects that raise the return to labor in agricultural and other local activities.

13 See Ravallion (1996) for a useful exposition of these issues.

3.4 Estimating the Impact of Migrant Labor Markets on Poverty Persistence

We will estimate the dynamic binary outcome model for the likelihood that a household $i$ from village $j$ falls below the poverty line at time $t$:

$$pov_{it} = 1[\alpha_1 pov_{it-1} + \alpha_2 (M^i_{jt} \cdot pov_{it-1}) + \alpha_3 M^i_{jt} + X_{it}'\beta_1 + \beta_2 lpc_{it} + D_t + u_i + v_j t_t + \varepsilon_{it} > 0], \tag{18}$$

where $pov_{it}$ is a binary indicator for whether the household is poor in year $t$. Current poverty status will be affected by poverty status in the prior period, $pov_{it-1}$, the size of the migrant network from village $j$ through which the household $i$ may be able to obtain a job referral, $M^i_{jt}$, a vector of household demographic and human capital characteristics, $X_{it}$, household land per capita, $lpc_{it}$, and year dummies to control for macroeconomic shocks, $D_t$.

We will be concerned about the possibility that an unobserved household effect, $u_i$, may be systematically related to the size of the household's migrant network, to other covariates, and to household poverty status, and thus introduce endogeneity concerns. Since village fixed effects are at a higher level of aggregation than household fixed effects, when controlling for household fixed effects, we also effectively control for fixed effects associated with the village in which households are located.
Further, we will be concerned that there may be village-specific trends, $v_j t_t$, related to underlying endowments and initial conditions that also have an impact on household poverty status. The error term, $\varepsilon_{it}$, may be serially correlated, and we are concerned that shocks in the error term may also be systematically related to the size of the migrant network, $M^i_{jt}$, and to the possibility of falling into poverty, and thus contribute an additional source of endogeneity.

From the model specified in (18), we are particularly interested in identifying the coefficients on $pov_{it-1}$, $M^i_{jt}$ and $M^i_{jt} \cdot pov_{it-1}$. The coefficients on $pov_{it-1}$ and $M^i_{jt} \cdot pov_{it-1}$ allow us to gauge the importance of persistence in the probability that a household is poor, and the impact of access to migrant employment opportunities through the migrant network on poverty persistence. The coefficient on $M^i_{jt}$, $\alpha_3$, allows us to determine the impact of the migrant network on the probability that a household will fall into poverty.

The specification shown in (18) may have additional sources of endogeneity if we believe that household demographic and human capital variables in $X_{it}$, or land per capita, $lpc_{it}$, vary with unobserved shocks in period $t$ or $t-1$. We address the possible concern over endogenous household composition by using household demographic and human capital variables for the legal long-term registered residents of households. While household size may vary somewhat with shocks as individuals move in and out of the household for the purpose of finding temporary work elsewhere, such variations do not show up in registered household membership. Long-term membership only changes when households split with such events as marriage or legal change of residence to another location.

Land managed by the household may also vary with shocks.
Land markets in rural China do not function well: land cannot be bought and sold, and only in the last few years have farmers gained the right to explicitly transfer land. Instead land is allocated by village leaders, and reallocated or adjusted among households within village small groups if a household is judged to have too little land to support itself. Nonetheless, there is some possibility that reallocation may be related to shocks that occur in period $t$ or $t-1$ that may also be systematically related to poverty status and the migrant network size.14 We thus use the period $t-2$ value of land per capita and estimate:

$$pov_{it} = 1[\alpha_1 pov_{it-1} + \alpha_2 (M^i_{jt} \cdot pov_{it-1}) + \alpha_3 M^i_{jt} + X_{it}'\beta_1 + \beta_2 lpc_{it-2} + D_t + u_i + v_j t_t + \varepsilon_{it} > 0] \tag{19}$$

One remaining issue is that we do not perfectly observe the network $M^i_{jt}$ that household $i$ may use for job referrals. Instead, we observe the share of registered long-term village residents who are employed as migrants outside the village in a particular year, or $M_{jt}$. The true migrant network may include former legal registered residents who have now changed their long-term residence status, implying that the actual potential network is larger. Alternatively, the household may not be familiar with all of the village out-migrants, and thus the actual network through which a household may seek referrals may be smaller. Thus, we will estimate:

$$pov_{it} = 1[\alpha_1 pov_{it-1} + \alpha_2 (M_{jt} \cdot pov_{it-1}) + \alpha_3 M_{jt} + X_{it}'\beta_1 + \beta_2 lpc_{it-2} + D_t + u_i + v_j t_t + \varepsilon_{it} > 0] \tag{20}$$

14 Wooldridge (2002) shows that when the assumption of strict exogeneity of the regressors fails in the context of the standard FE estimation, the inconsistency of the estimator is of order $T^{-1}$.
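To make the timing of the regressors in equation (20) concrete, the following is a minimal Python sketch of how the lagged and interacted variables could be assembled from a household panel. It is an illustration only, not the authors' code: the record layout and the key names (`hh`, `wave`, `pov`, `M`, `lpc`) are hypothetical, and it assumes that "period t-1" and "period t-2" refer to the previous one and two survey waves.

```python
from collections import defaultdict


def build_regressors(rows):
    """Assemble the right-hand-side variables of equation (20).

    `rows` is a list of dicts with hypothetical keys:
      hh   - household identifier
      wave - integer index of the survey wave (assumed unit of "period")
      pov  - 1 if the household is poor in that wave, else 0
      M    - village migrant share observed in that wave
      lpc  - household land per capita in that wave
    Observations lacking a wave t-1 or wave t-2 record are dropped.
    """
    by_hh = defaultdict(dict)
    for r in rows:
        by_hh[r["hh"]][r["wave"]] = r

    out = []
    for hh, waves in by_hh.items():
        for w in sorted(waves):
            prev1, prev2 = waves.get(w - 1), waves.get(w - 2)
            if prev1 is None or prev2 is None:
                continue  # lags unavailable for this observation
            cur = waves[w]
            out.append({
                "hh": hh,
                "wave": w,
                "pov": cur["pov"],
                "pov_lag1": prev1["pov"],                 # pov_{it-1}
                "M": cur["M"],                            # M_{jt}
                "M_x_pov_lag1": cur["M"] * prev1["pov"],  # M_{jt} * pov_{it-1}
                "lpc_lag2": prev2["lpc"],                 # lpc_{it-2}
            })
    return out
```

Looking lags up by wave index rather than by row position means that a household with a missing wave simply loses the affected observations instead of silently receiving the wrong lag.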
In our identification strategy below, we will instrument the endogenous share of village out-migrants, $M_{jt}$, with village level instruments, identifying the size of the village migrant labor force, interacted with period $t-2$ lagged land per capita, $lpc_{it-2}$, in order to allow for differences in the effective value of the village migrant network for households with different amounts of land.

3.5 Identifying the Migrant Network

To identify the village migrant network, we make use of two policy changes that, working together, affect the strength of migrant networks outside home counties but are plausibly unrelated to consumption growth. First, a new national ID card (shenfen zheng) was introduced in 1984. While urban residents received IDs in 1984, residents of most rural counties did not receive them immediately. In 1988, a reform of the residential registration system made it easier for migrants to gain legal temporary residence in cities, but a national ID card was necessary to obtain a temporary residence permit (Mallee, 1995). While some rural counties made national IDs available to rural residents as early as 1984, others distributed them in 1988, and still others did not issue IDs until several years later. The RCRE follow-up survey asked local officials when IDs had actually been issued to rural residents of the county. In our sample, 41 of the 90 counties issued cards in 1988, but cards were issued as early as 1984 in three counties and as late as 1997 in one county.

It is important to note that IDs were not necessary for migration, and large numbers of migrants live in cities without legal temporary residence cards. However, migrants with temporary residence cards have a more secure position in the destination community, hold better jobs, and would thus plausibly make up part of a longer-term migrant network in migrant destinations. Thus, ID distribution had two effects after the 1988 residential registration (hukou) reform.
First, the costs of migrating to a city should fall after IDs became available. Second, if the quality of the migrant network improves with the years since IDs are available, then the costs of finding migrant employment should continue to fall over time. As a result, the size of the migrant network should be a function of both whether or not cards have been issued and the time since cards have been issued in the village. Given that the size of the potential network has an upper bound, we expect the years-since-IDs-issued to have a non-linear relationship with the size of the migrant labor force and we expect growth in the migrant network to decline after initially increasing with distribution of IDs.

In Figure 2, we show a lowess plot of the relationship between years since IDs were distributed and the change in the number of migrants from the village from year $t-1$ to $t$. Note the sharp increase in migrants from the time that IDs are distributed and then a slowing of the increase over time (which would imply an even slower growth rate). This pattern suggests non-linearity in the relationship between ID distribution and new participants in the village migrant labor force. We thus specify our instrument as a dummy variable indicating that IDs had been issued interacted with the years since they had been issued, and then experimented with quadratic, cubic and quartic functions of years-since-IDs-issued. We settle on the quartic function for our instruments because, as we show below, it fits the pattern of expanding migrant networks better than the quadratic or the cubic functions.
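The instrument construction described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and it assumes that the year of issuance itself counts as zero years elapsed, so that all terms are zero both before issuance and in the issuance year.

```python
def years_since_id_terms(year, id_year, degree=4):
    """Polynomial terms in years since national IDs were issued.

    The dummy for "IDs have been issued" is interacted with the elapsed
    years, so every term is zero before issuance. degree=4 gives the
    quartic; degree=2 or 3 gives the quadratic or cubic alternatives.
    Assumption: the issuance year itself counts as zero years elapsed.
    """
    issued = 1 if year >= id_year else 0
    years_since = issued * (year - id_year)
    return [years_since ** k for k in range(1, degree + 1)]


def interact_with_land(terms, lpc_lag2):
    """Interact each polynomial term with period t-2 land per capita."""
    return [term * lpc_lag2 for term in terms]
```

For a village in a county that issued IDs in 1988, the 1993 observation has five elapsed years, so the quartic terms are 5, 25, 125 and 625; each is then scaled by the household's lagged land per capita to obtain household-varying instruments.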
Since ID distribution was the responsibility of county level offices of the Ministry of Civil Affairs, which are distinctly separate from agencies involved in setting policies affecting land, credit, taxation and poverty alleviation (the Ministry of Agriculture and Ministry of Finance handle most decisions that affect these policies at the local level), it is plausible that ID distribution is not systematically related to unobservable policy decisions with a more direct relationship to household consumption.

Ideally, a policy would exist that was randomly implemented, affecting the ability to migrate from some counties but not others. As the differential timing of the distribution of ID cards was not random, we must be concerned that counties with specific characteristics or that followed specific policies were singled out to receive ID cards earlier than other counties, or that features of counties receiving IDs earlier are systematically correlated with other policies affecting consumption growth. These counties, one might argue, were "allowed" to build up migrant networks faster than others.

In two earlier papers, de Brauw and Giles (2008a and 2008b) address several possible concerns with use of the years-since-IDs quartic as instruments for the size of the village migrant labor force. They first show that timing of ID distribution appears to be related to remoteness of the village, but not systematically related to village policies that may affect consumption growth, to village administrative capacity, or to the demand for IDs within the village. They thus argue in favor of including a village fixed effect to control for features of the local county which may have affected timing of ID distribution, and then identify the size of the village migrant labor force off of non-linearities in the time that it requires for migrant networks to build up.
In this paper, we identify the village migrant network by further interacting the quartic in years-since-IDs with land per capita held by households in period $t-2$. Why might we expect that interacting with $lpc_{it-2}$ might achieve this? We believe that the land per capita managed by households will likely pick up a dimension of proximity of different households within the village. Within villages in rural China, households are separated into smaller units of roughly 20 households known as village small groups (cun xiaozu), which were referred to as production teams during the Maoist period. These households are located in clusters and will have closer relationships with one another than with households of other small groups. Moreover, property rights to land in rural China typically reside with the small group, not with the village. Thus, when land reallocations take place they typically take place within but not across small groups. Small groups make more frequent small adjustments to household land as the land per capita available starts to become unequal with differential changes in household structure across households within the small group, but there is much less flexibility in making adjustments across small groups. As a result, much of the variability of land per capita within villages occurs across small groups.15 Interacting a village level instrument for the migrant network with land per capita will allow the importance of $M_{jt}$ to vary across households, and much of the difference across households occurs because of unobserved differences in the small groups in which they reside and which migrants refer to as home.

As period $t-2$ lagged land per capita appears as an exogenous regressor and is also interacted with the quartic in years since IDs were distributed in the first stage, our estimation approach must also eliminate bias introduced through likely serial correlation of the error term in the first-stage regression. To this end, it is important to note that our two-step estimation procedure developed in Section 2 above allows for serial correlation of first-stage errors.

15 We do not know village small group membership in the RCRE survey prior to 2003 when a new survey instrument was introduced. If we regress land per capita on village dummy variables in 2003, we obtain an R-Squared of 0.503, while if we run a regression of land per capita on small group dummy variables, we obtain an R-Squared of 0.616. A Lagrange Multiplier test for whether the small group effects add anything significant over the village effects, which is effectively a test of whether small group coefficients are constant within villages, yields an LM statistic of 310.67, which has a p-value of 0.0000.

4 Results

Before estimating equation (20), we establish that our instruments are significantly related to the migrant share of the village labor force. We estimate the relationship as a quadratic, cubic, and quartic function of the years since IDs were issued, each interacted with period $t-2$ land per capita. These results are reported in columns (1)-(3) and columns (4)-(6) of Table 2 for odd years from 1989-2001.16 We find a strong relationship between our instruments and the size of the migrant network for each specification. For the remainder of our estimation we favor the quartic function interacted with $t-2$ land per capita for two reasons. First, the effects of ID card distribution on the migration network can be determined more flexibly when we use the quartic specification. Second, the partial $R^2$ increases slightly from the quadratic to the quartic for both samples we consider. After controlling for the household characteristics, the instruments have jointly significant effects on the share of migrants with an F-statistic of 44.62 for the 1989 to 2001 sample.
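The instrument-relevance check described above amounts to a joint test that the coefficients on the excluded instruments are zero in the first-stage regression. The sketch below shows one textbook version of that test in Python with NumPy. It is an illustration under stated assumptions, not a reproduction of the reported statistic: it uses the classical homoskedastic F formula based on restricted and unrestricted sums of squared residuals, whereas the text does not say which variance estimator underlies the reported F-statistic, and the function name and argument layout are hypothetical.

```python
import numpy as np


def joint_f_stat(y, controls, instruments):
    """Classical F statistic for H0: all instrument coefficients are zero.

    y           - (n,) first-stage dependent variable (e.g. migrant share)
    controls    - (n, k) included exogenous regressors, without a constant
    instruments - (n, q) excluded instruments
    Assumes homoskedastic, uncorrelated errors (no clustering correction).
    """
    y = np.asarray(y, dtype=float)
    controls = np.asarray(controls, dtype=float)
    instruments = np.asarray(instruments, dtype=float)
    n = y.shape[0]

    # Restricted model: constant + controls. Unrestricted adds instruments.
    x_restricted = np.hstack([np.ones((n, 1)), controls])
    x_unrestricted = np.hstack([x_restricted, instruments])

    def ssr(x):
        # Sum of squared OLS residuals from regressing y on x.
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ beta
        return float(resid @ resid)

    q = instruments.shape[1]                # number of restrictions
    dof = n - x_unrestricted.shape[1]       # residual degrees of freedom
    ssr_r, ssr_u = ssr(x_restricted), ssr(x_unrestricted)
    return ((ssr_r - ssr_u) / q) / (ssr_u / dof)
```

With panel data of the kind used here, a variance estimator that is robust to within-village correlation would typically be preferred; the classical version is shown only because it makes the logic of the test transparent.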
We next proceed to estimate model (20), but first treat migration as exogenous and show results for both linear probability and probit implementations in Table 3. In all four specifications, we observe a positive association between migrant share of the village and probability that a household is below the poverty line, and this reflects the response of households to short-term shocks and the simultaneity between short-term migration and consumption decisions. Given the descriptive evidence shown in Figure 2, it is unsurprising to find that short-term increases in the poverty headcount will be correlated with year to year changes in the share of migrants from the village. The positive relationship suggests that migration is truly endogenous and suggests the need for an estimation strategy that allows for identification in a dynamic binary response model where there are endogenous regressors. When we introduce the years-since-IDs instrument, which is shown elsewhere to be unrelated to short-term fluctuations in the local economy (de Brauw and Giles, 2008a and 2008b), we can identify the longer term relationship between growth of the migrant labor market and the probability that a household will fall below the poverty line.

16 Since the RCRE survey was not conducted in 1992 and 1994, we estimate the dynamic model with two-year spacing from 1989 to 2001.

In Table 4, we report the control function (CF) estimation results based on the "pure" random effects and "correlated" random effects approaches. For the purposes of comparison, we also estimate model (20) using a naive linear probability model (LPM). As one might expect, the coefficients on lagged poverty status are significant and positive, indicating a strong persistence in poverty status, both in the pure random effects approach shown in columns (1) and (3) and in the correlated random effects models shown in columns (2) and (4).
The decline in the value of the coefficient on lagged poverty status between pure and correlated random effects models, (1) and (2) for linear probability models and (3) and (4) for the dynamic probit models, suggests that unobserved heterogeneity associated with poverty status introduces considerable upward bias in estimates of poverty persistence. Estimates of poverty persistence using either a dynamic linear probability model or the dynamic probit would lead the researcher to overstate the importance of chronic, persistent poverty. The significant coefficient on the initial value of poverty status in the correlated random effects models suggests a substantial correlation between unobserved effects and the initial condition.

Once we instrument for migrant share of the registered village population, and thus control for simultaneity bias introduced through shocks to the local economy, we find that the migrant labor market is negatively associated with the probability of falling into poverty. Moreover, the coefficient on the interaction of village migrant share and lagged poverty status suggests that the magnitude of the effect of migration on poverty reduction is greater among households who were poor in the previous period, and thus migration reduces poverty persistence even more than it reduces the likelihood that the non-poor will fall into poverty. This result is consistent with de Brauw and Giles (2008a), who find that in a linear panel data framework, households with lower levels of prior consumption tend to experience more rapid consumption growth with increased out-migration from rural villages.17

17 We employ the Hausman test for endogeneity to formally assess the need to control for endogeneity of migration share. The t-statistic for the significance of the first-stage residuals in the pure RE probit model is 3.08 with p-value of 0.002, which suggests there is enough evidence to reject the null hypothesis that the share of village out-migrants is exogenous. For the correlated RE probit, the t-statistic for the first-stage residuals is 2.93 with p-value of 0.003. Thus, for the correlated RE model, we also reject the null hypothesis that the share of migrants is exogenous.

In order to examine the effect of migration on poverty persistence, we calculate the average partial effects (APEs) using the coefficient on share of migrants and the interaction term and show the estimates in Table 5. The APEs calculated using the correlated random effects dynamic probit approach (models 3 and 4) are generally smaller than those calculated using the linear probability model (models 1 and 2). The naive LPM approach, which is often preferred as a means of avoiding dynamic nonlinear models, will lead us to conclude that migration has a more pronounced impact on poverty reduction than one finds using the correlated random effects probit model.

Again the consequences of ignoring unobserved heterogeneity in the dynamic binary response model are of considerable interest. Failure to control for unobserved heterogeneity in the pure random effects model would lead us to overstate the effects of previous period poverty status on current poverty and understate the effect of the migrant labor market in contributing to reductions in the probability that a household would fall below the poverty line. For those households living above the poverty line, the correlated random effects CF estimate of the APE (model 4) suggests that a one percent increase in the share of village residents working as migrants would reduce the probability of falling into poverty by about 3.2 percentage points. For those already below the poverty line, the correlated random effects CF estimate of the APE shows that a one percent increase in the village migrant share will reduce the probability of remaining in poverty by 3.5 percentage points.

5 Conclusions

In this paper, we have developed a dynamic binary response panel data model that allows for an endogenous regressor. The control function approach which we implement is of particular value for settings in which one wants to estimate the effects of a treatment which is also endogenous. Our empirical example demonstrates that alleviating an omitted variables bias can lead to estimated effects that are larger in absolute value when we allow for the correlation between unobserved heterogeneity, initial conditions and exogenous variables.

We apply the model to examine the impact of rural-urban migration on the likelihood that households in rural China fall below the poverty line. Our application demonstrates that migration is important for reducing both the likelihood that poor households remain in poverty and the likelihood that households fall into poverty if they were not poor in the previous period. From this specific application, we show that failing to adequately control for unobserved heterogeneity in non-linear dynamic panel data models will introduce substantial bias to parameter estimates. In particular, failure to control for unobserved heterogeneity would lead us to overstate the persistence of poverty and to understate the role that migration plays in poverty reduction.

Apart from analyzing the effects of migration on a binary outcome, our application suggests that there may be many other settings in which the correlated random effects control function approach may improve on existing analytical approaches.
In any analysis aiming to examine how a new program affects persistence of a state, one may be concerned that unobserved heterogeneity will lead to upward bias in estimates of the effect of the initial state. Moreover, as program participation, or take-up, may be endogenous, the analyst will need to worry about this source of bias as well. The empirical strategy developed in Section 2 offers a parametric solution to the more general problem of identifying the impact of an endogenous treatment in a dynamic binary response model.

6 References

Adren, T. 2007. "The Persistence of Welfare Participation," IZA Discussion Paper 3100, October 2007.
Bane, M.J. and D.T. Ellwood. 1986. "Slipping into and out of Poverty: The Dynamics of Spells," The Journal of Human Resources, 21(1): 1-23.
Benjamin, D., L. Brandt and J. Giles. 2005. "The Evolution of Income Inequality in Rural China," Economic Development and Cultural Change 53(4): 769-824.
Biewen, M. 2009. "Measuring State Dependence in Individual Poverty Status: Are There Feedback Effects to Employment Decisions and Household Composition?" Journal of Applied Econometrics 24(7): 1095-1116.
Cai, F., A. Park and Y. Zhao. 2008. "The Chinese Labor Market," chapter prepared for China's Great Economic Transition, Loren Brandt and Thomas Rawski (eds), Cambridge University Press.
Cappellari, L. 1999. "Minimum Distance Estimation of Covariance Structures," 5th UK Meeting of Stata Users.
Carrington, W., E. Detragiache and T. Vishnawath. 1996. "Migration with Endogenous Moving Costs," American Economic Review 86(4): 909-930.
Chamberlain, G. 1980. "Analysis of Covariance with Qualitative Data," Review of Economic Studies 47, 225-238.
Chamberlain, G. 1984. "Panel Data," in Handbook of Econometrics, Volume 2, Z. Griliches and M. D. Intriligator (eds.). Amsterdam: North Holland, 1247-1318.
Chan, Kam Wing and Li Zhang. 1999. "The Hukou System and Rural-Urban Migration in China: Processes and Changes," China Quarterly 160: 818-55.
Chay, K.Y. and D.R. Hyslop. 2000. "Identification and Estimation of Dynamic Binary Response Models: Empirical Evidence Using Alternative Approaches," mimeo.
Chen, S. and M. Ravallion. 1996. "Data in transition: Assessing Rural Living Standards in Southern China," China Economic Review, 7(1): 23-56.
Chiappori, P., and B. Salanie. 2000. "Testing for Asymmetric Information in Insurance Markets," Journal of Political Economy 108, 56-78.
de Brauw, A. and J. Giles. 2008a. "Migrant Opportunity and the Educational Attainment of Youth in Rural China," Policy Research Working Paper 4585, The World Bank (February 2008).
de Brauw, A. and J. Giles. 2008b. "Migrant Labor Markets and the Welfare of Rural Households in the Developing World: Evidence from China," Policy Research Working Paper 4526, The World Bank (April 2008).
Devicienti, D. and A. Poggi. 2007. "Poverty and social exclusion: two sides of the same coin or dynamically interrelated processes?," LABORatorio R. Revelli Working Papers Series 62, LABORatorio R. Revelli, Centre for Employment Studies.
Du, Y., A. Park and S. Wang. 2005. "Is Migration Helping China's Poor?" Journal of Comparative Economics, 33(4): 688-709.
Giles, J. 2006. "Is Life More Risky in the Open? Household Risk-Coping and the Opening of China's Labor Markets," Journal of Development Economics 81(1): 25-60.
Giles, J. and K. Yoo. 2007. "Precautionary Behavior, Migrant Networks and Household Consumption Decisions: An Empirical Analysis Using Household Panel Data from Rural China," The Review of Economics and Statistics, 89(3): 534-551.
Hahn, J. and G. Kuersteiner. 2002. "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T Are Large," Econometrica 70, 1639-1657.
Hansen, J. and R. Wahlberg. 2009. "Poverty Persistence in Sweden," Review of the Economics of the Household, 7(2), 105-132.
Heckman, J.J. 1981. "The Incidental Parameters Problem and the Problem of Initial Conditions in Estimating a Discrete Time - Discrete Data Stochastic Process," in: C.F. Manski and D. McFadden (Eds.), Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA, 179-195.
Heckman, J.J. and R.J. Willis. 1977. "A Beta-logistic Model for the Analysis of Sequential Labor Force Participation by Married Women," Journal of Political Economy, 85, 27-58.
Honoré, B.E. and E. Kyriazidou. 2000. "Panel Data Discrete Choice Models with Lagged Dependent Variables," Econometrica 68, 839-874.
Hyslop, Dean R. 1999. "State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force Participation of Married Women," Econometrica 67(6): 1255-94.
Jalan, J. and M. Ravallion. 1998. "Transient Poverty in Post-Reform Rural China," Journal of Comparative Economics, 26(2): 338-357.
Jalan, J. and M. Ravallion. 2002. "Geographic Poverty Traps? A Micro Model of Consumption Growth in Rural China," Journal of Applied Econometrics 17(4): 329-46.
Liang, Z. and Z. Ma. 2004. "China's Floating Population: New Evidence from the 2000 Census," Population and Development Review 30(3): 467-488.
Mallee, H. 1995. "China's Household Registration System Under Reform," Development and Change 26(1): 1-29.
Meng, X. 2000. "Regional wage gap, information flow, and rural-urban migration," in Yaohui Zhao and Loraine West (eds), Rural Labor Flows in China, Berkeley: University of California Press, 251-277.
Montgomery, J.D. 1991. "Social Networks and Labor-Market Outcomes: Toward an Economic Analysis," American Economic Review 81(5): 1407-18.
Mundlak, Y. 1978. "On the Pooling of Time Series and Cross Section Data," Econometrica 46, 69-85.
Munshi, K. 2003. "Networks in the Modern Economy: Mexican Migrants in the U.S. Labor Market," Quarterly Journal of Economics 118(2): 549-99.
Papke, L.E. and J.M. Wooldridge. 2008. "Panel Data Methods for Fractional Response Variables with an Application to Test Pass Rates," Journal of Econometrics 145: 121-33.
Ravallion, M. 1996. "Issues in Measuring and Modelling Poverty," The Economic Journal 106: 1328-1343.
Ravallion, M. and S. Chen. 2007. "China's (Uneven) Progress Against Poverty," Journal of Development Economics, 82(1): 1-42.
Rivers, D. and Q. H. Vuong. 1988. "Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models," Journal of Econometrics 39, 347-366.
Rozelle, S., L. Guo, M. Shen, A. Hughart and J. Giles. 1999. "Leaving China's Farms: Survey Results of New Paths and Remaining Hurdles to Rural Migration," The China Quarterly 158: 367-393.
Smith, R. and R. Blundell. 1986. "An Exogeneity Test for a Simultaneous Equation Tobit Model with an Application to Labor Supply," Econometrica 54, 679-685.
Taylor, J.E., S. Rozelle, and A. de Brauw. 2003. "Migration and Incomes in Source Communities: A New Economics of Migration Perspective from China," Economic Development and Cultural Change, 52(1), 75-101.
Wooldridge, J.M. 2000. "A Framework for Estimating Dynamic, Unobserved Effects Panel Data Models with Possible Feedback to Future Explanatory Variables," Economics Letters 68, 245-250.
Wooldridge, J.M. 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA.
Wooldridge, J.M. 2005. "Simple Solutions to the Initial Conditions Problem in Dynamic, Nonlinear Panel Data Models with Unobserved Heterogeneity," Journal of Applied Econometrics 20, 39-54.
Zhao, Y. 2003. "The Role of Migrant Networks in Labor Migration: The Case of China," Contemporary Economic Policy 21(4): 500-511.

Figure 1: Share of Village Labor Force Employed as Migrants, by Year (vertical axis: share; horizontal axis: year, 1987 to 2003). Source: RCRE Village Surveys 1987 to 2003.
Figure 2. Change in Poverty Headcount Versus Change in Number of Migrants
[Figure: change in poverty headcount (vertical axis, -0.1 to 0.1) plotted against change in number of migrants in the village (horizontal axis, -200 to 200), with a lowess fit and a linear fit.]
Source: RCRE Village and Household Surveys, 1987 to 2003.

Figure 3. Change in Out-Migrants in Village Labor Force Versus Years Since IDs Were Distributed
[Figure: change in number of out-migrants in the village workforce (vertical axis, -5 to 15) plotted against years since ID cards were issued (horizontal axis, 0 to 15).]
Source: 2004 RCRE Supplemental Survey on Land and Village Governance.

Table 1. Household and Village Characteristics
Odd Years from 1989 to 2001

| Variable | Full Sample: Obs. | Full Sample: Mean | Full Sample: St. Dev. | Balanced Sample: Obs. | Balanced Sample: Mean | Balanced Sample: St. Dev. |
|---|---|---|---|---|---|---|
| Household Poverty Status | 42453 | 0.20 | 0.40 | 26159 | 0.20 | 0.40 |
| Household Income per Capita | 42447 | 721.4 | 649.3 | 26159 | 685.8 | 537.5 |
| Household Consumption per Capita | 42453 | 521.9 | 376.1 | 26159 | 499.1 | 332.6 |
| Number of Household Members | 42491 | 4.1 | 1.5 | 26159 | 4.2 | 1.4 |
| Number of Prime Age Household Laborers | 42491 | 2.5 | 1.1 | 26159 | 2.6 | 1.0 |
| Household Land per Capita | 42453 | 1.4 | 1.2 | 26159 | 1.4 | 1.1 |
| Household Average Years of Education | 41658 | 6.2 | 2.6 | 26156 | 6.3 | 2.5 |
| Household Share of Females | 41659 | 0.45 | 0.21 | 26156 | 0.45 | 0.20 |
| Share of Migrants from the Village | 42491 | 0.06 | 0.06 | 26159 | 0.06 | 0.06 |
| Year of ID Distribution in a Village | 41814 | 1988.0 | 2.1 | 26159 | 1988.0 | 2.1 |
| Years Since ID was Issued in a Village | 41814 | 6.7 | 4.5 | 26159 | 7.0 | 4.5 |

Notes: Consumption and income per capita are reported in 1986 RMB Yuan.

Table 2. What Factors Determine the Size of the Village Migrant Network?
First-Stage Regressions. Dependent Variable: Village Migrant Share
Odd Years from 1989 to 2001

| Variable | Model (1) | Model (2) | Model (3) |
|---|---|---|---|
| Household Population | -0.0003 (0.0003) | -0.0003 (0.0003) | -0.0003 (0.0003) |
| Number of Working Age Laborers in the Household | 0.0002 (0.0004) | 0.0002 (0.0004) | 0.0003 (0.0004) |
| Land Per Capita t-2 | -0.0040*** (0.0004) | -0.0036*** (0.0005) | -0.0017*** (0.0006) |
| Average Years of Education | -0.0006*** (0.0001) | -0.0006*** (0.0001) | -0.0007*** (0.0001) |
| Female Share of the Household | -0.0011 (0.0015) | -0.0011 (0.0015) | -0.0011 (0.0015) |
| (Years-Since-IDs Available) * (Land Per Capita t-2) | 0.0008*** (0.0001) | 0.0006*** (0.0002) | -0.0018*** (0.0004) |
| (Years-Since-IDs Available)^2 * (Land Per Capita t-2) | -0.0000*** (0.0000) | -0.0000 (0.0000) | 0.0007*** (0.00009) |
| (Years-Since-IDs Available)^3 * (Land Per Capita t-2) | 0.0008*** | -0.0000 (0.0000) | -0.0001*** (0.0000) |
| (Years-Since-IDs Available)^4 * (Land Per Capita t-2) | | 0.0006*** (0.0002) | 0.0000*** (0.0000) |
| Observations | 22422 | 22422 | 22422 |
| R-squared | 0.79 | 0.79 | 0.79 |
| F-Statistic on IVs with Averages | 62.51 | 58.11 | 44.62 |
| F-Statistic on IVs w/o Averages | 46.04 | 31.60 | 32.64 |
| Partial R2, IVs with Averages | 0.005 | 0.005 | 0.007 |
| Partial R2, IVs w/o Averages | 0.001 | 0.001 | 0.003 |

Notes: Fully robust standard errors are shown in parentheses; *** p<0.01, ** p<0.05, * p<0.1. All regressions include time averages of the explanatory variables, year dummies, and interactions between village dummies and a time trend.

Table 3.
Estimating Determinants of Poverty Status with Migrant Share Treated as Exogenous
Dependent Variable: Poverty Status

| Variable | (1) Linear Probability Model, Pure RE | (2) Linear Probability Model, Correlated RE | (3) Probit, Pure RE | (4) Probit, Correlated RE |
|---|---|---|---|---|
| Lag Poverty Status | 0.390*** (0.012) | 0.339*** (0.013) | 1.045*** (0.041) | 0.794*** (0.048) |
| Village Migrant Share Interacted with Lag Poverty Status | -0.974*** (0.134) | -0.767*** (0.132) | -2.356*** (0.465) | -1.654*** (0.484) |
| Village Migrant Share | 0.285*** (0.076) | 0.221*** (0.074) | 1.476*** (0.519) | 1.233** (0.543) |
| Number of Household Members | 0.046*** (0.002) | 0.057*** (0.004) | 0.257*** (0.014) | 0.371*** (0.021) |
| Number of Prime Age Household Laborers | -0.023*** (0.003) | -0.024*** (0.004) | -0.127** (0.017) | -0.147*** (0.023) |
| Second Lag of Land per Capita | -0.002 (0.003) | 0.000 (0.004) | -0.034* (0.018) | -0.027 (0.031) |
| Average Years of Education | -0.007*** (0.001) | 0.000 (0.001) | -0.042*** (0.006) | -0.007 (0.009) |
| Share of Females | -0.053*** (0.012) | -0.006 (0.015) | -0.293*** (0.068) | -0.028 (0.093) |
| Dependent Variable in 1989 | | 0.090*** (0.009) | | 0.508*** (0.047) |
| Observations | 22422 | 22422 | 22422 | 22422 |
| Number of households | 3737 | 3737 | 3737 | 3737 |
| R-Squared | 0.35 | 0.36 | | |

Notes: Fully robust standard errors are shown in parentheses; *** p<0.01, ** p<0.05, * p<0.1. All regressions include the explanatory variables in each year, year dummies, and interactions between village dummies and a time trend.

Table 4.
Estimating Determinants of Poverty Status with Endogenous Share of Migrants
Second-Stage Regressions. Dependent Variable: Poverty Status

| Variable | (1) Linear Probability Model, Pure RE | (2) Linear Probability Model, Correlated RE | (3) Control Function, Pure RE | (4) Control Function, Correlated RE |
|---|---|---|---|---|
| Lag Poverty Status | 0.391*** (0.013) | 0.335*** (0.012) | 1.046*** (0.054) | 0.792*** (0.052) |
| Village Migrant Share Interacted with Lag Poverty Status | -0.994*** (0.128) | -0.784*** (0.125) | -2.443*** (0.512) | -1.779*** (0.526) |
| Village Migrant Share | -2.628*** (0.833) | -3.955*** (1.039) | -12.201** (5.660) | -18.896** (8.191) |
| Number of Household Members | 0.047*** (0.003) | 0.057*** (0.004) | 0.261*** (0.020) | 0.368*** (0.028) |
| Number of Prime Age Household Laborers | -0.023*** (0.003) | -0.023*** (0.005) | -0.125*** (0.022) | -0.143*** (0.031) |
| Second Lag of Land per Capita | -0.006* (0.003) | -0.001 (0.005) | -0.050* (0.026) | -0.036 (0.043) |
| Average Years of Education | -0.009*** (0.001) | -0.002 (0.002) | -0.048*** (0.008) | -0.018 (0.013) |
| Share of Females | -0.058*** (0.013) | -0.012 (0.019) | -0.312*** (0.098) | -0.061 (0.130) |
| Dependent Variable in 1989 | | 0.086*** (0.009) | | 0.497*** (0.053) |
| Observations | 22422 | 22422 | 22422 | 22422 |
| Number of households | 3737 | 3737 | 3737 | 3737 |
| R-Squared | 0.29 | 0.32 | | |
| Replications for Bootstrap Errors | 100 | 100 | 100 | 100 |

Notes: Bootstrapped standard errors are shown in parentheses; *** p<0.01, ** p<0.05, * p<0.1. All regressions include the explanatory variables in each year, year dummies, and interactions between village dummies and a time trend. Regressions (1) and (3) include first-stage residuals free of serial correlation and their time averages. Regressions (2) and (4) include first-stage residuals free of serial correlation and residuals from the first stage in each year. For regressions (1) through (4), the instrumental variables are a quartic polynomial in years-since-ID-was-issued, with each term interacted with the second lag of land per capita.

Table 5.
Average Partial Effects of Determinants of Poverty Status (Endogenous Share of Migrants)

| Variable | (1) LPM, Pure RE | (2) LPM, Correlated RE | (3) Control Function, Pure RE | (4) Control Function, Correlated RE |
|---|---|---|---|---|
| Lag Poverty Status | 0.324*** (0.010) | 0.282*** (0.009) | 0.181*** (0.007) | 0.125*** (0.006) |
| Share of Migrants when Lag Poverty = 0 | -2.628*** (0.833) | -3.955*** (1.039) | -2.092** (1.056) | -3.156** (1.413) |
| Share of Migrants when Lag Poverty = 1 | -3.621*** (0.834) | -4.739*** (1.028) | -2.511** (1.044) | -3.453** (1.394) |
| Share of Migrants (averaged) | -2.641*** (0.833) | -3.965*** (1.038) | -2.260** (1.050) | -3.273** (1.405) |
| Number of Household Members | 0.047*** (0.003) | 0.057*** (0.004) | 0.045*** (0.004) | 0.061*** (0.005) |
| Number of Prime Age Household Laborers | -0.023*** (0.003) | -0.023*** (0.005) | -0.022*** (0.004) | -0.024*** (0.006) |
| Second Lag of Land per Capita | -0.006* (0.003) | -0.001 (0.005) | -0.009* (0.005) | -0.006 (0.007) |
| Average Years of Education | -0.009*** (0.001) | -0.002 (0.002) | -0.008*** (0.002) | -0.003 (0.002) |
| Share of Females | -0.058*** (0.013) | -0.012 (0.019) | -0.053*** (0.018) | -0.010 (0.022) |
| Poverty Status in 1989 | | 0.086*** (0.009) | | 0.092*** (0.009) |
| Replications | 100 | 100 | 100 | 100 |

Notes: Bootstrapped standard errors are shown in parentheses; *** p<0.01, ** p<0.05, * p<0.1.
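
To illustrate why the probit-scale coefficients in Table 4 (for example 0.792 on lag poverty status in column 4) correspond to much smaller average partial effects in Table 5 (0.125), the following minimal sketch computes average partial effects for a probit index with a lagged state, a continuous regressor, and their interaction. It is a simplified illustration and not the estimator used in the paper: the function names and the five `(a, m)` pairs are hypothetical, `a` is assumed to collect every other term in the index (covariates, heterogeneity terms, and control-function residuals), and the coefficient scaling and averaging steps of the full procedure are omitted.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def prob_poor(a, m, lag, rho, delta, gamma):
    """Response probability Phi(a + rho*lag + delta*m + gamma*m*lag)."""
    return norm_cdf(a + rho * lag + delta * m + gamma * m * lag)

def ape_lag_state(rows, rho, delta, gamma):
    """Average over rows of P(poor | lag = 1) - P(poor | lag = 0)."""
    diffs = [
        prob_poor(a, m, 1, rho, delta, gamma) - prob_poor(a, m, 0, rho, delta, gamma)
        for a, m in rows
    ]
    return sum(diffs) / len(diffs)

def ape_migrant_share(rows, lag, rho, delta, gamma):
    """Average derivative of P(poor) with respect to m, holding lag fixed."""
    slope = delta + gamma * lag
    derivs = [slope * norm_pdf(a + rho * lag + slope * m) for a, m in rows]
    return sum(derivs) / len(derivs)

# Hypothetical (a, m) pairs: a = rest of the index, m = village migrant share.
rows = [(-0.2, 0.02), (0.1, 0.05), (-0.6, 0.08), (0.4, 0.03), (-1.0, 0.12)]

# Point estimates taken from Table 4, column (4).
rho, delta, gamma = 0.792, -18.896, -1.779

print("APE of lag poverty status:", ape_lag_state(rows, rho, delta, gamma))
print("APE of migrant share, lag = 0:", ape_migrant_share(rows, 0, rho, delta, gamma))
print("APE of migrant share, lag = 1:", ape_migrant_share(rows, 1, rho, delta, gamma))
```

In the linear probability columns no such transformation is needed: the effect of migrant share when lag poverty equals 1 is the sum of the two Table 4 coefficients, for example -3.955 + (-0.784) = -4.739 in column (2), which matches the corresponding entry in Table 5.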