

Policy Research Working Paper                      6480




       PPML Estimation of Dynamic Discrete
       Choice Models with Aggregate Shocks
                                     Erhan Artuc




The World Bank
Development Research Group
Trade and International Integration Team
June 2013






  This paper is a product of the Trade and Integration Team, Development Research Group. It is part of a larger effort by
  the World Bank to provide open access to its research and make a contribution to development policy discussions around
  the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be
  contacted at eartuc@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
 PPML Estimation of Dynamic Discrete Choice Models
              with Aggregate Shocks
                                             Erhan Artuc∗
                                              May, 2013


                                                Abstract
           This paper introduces a computationally eﬃcient method for estimating structural
       parameters of dynamic discrete choice models with large choice sets. The method is
       based on Poisson pseudo maximum likelihood (PPML) regression, which is widely used
       in the international trade and migration literature to estimate the gravity equation.
       Unlike most of the existing methods in the literature, it does not require strong para-
       metric assumptions on agents’ expectations, thus it can accommodate macroeconomic
       and policy shocks. The regression requires count data as opposed to choice probabili-
       ties; therefore it can handle sparse decision transition matrices caused by small sample
       sizes. As an example application, the paper estimates sectoral worker mobility in the
       United States.

            Keywords: Poisson Pseudo Maximum Likelihood, Labor Mobility, Migration, Dis-
       crete Choice Models, Gravity Equation.
          JEL Codes: C25, F22, J61, J62, F16.




   ∗
    The views in this paper are the author’s and not those of the World Bank Group or any other institution.
Artuc: World Bank, Trade and International Integration Unit, Development Economics Research Group
(Economic Policy), 1818 H Street, NW Washington DC, 20433 USA; eartuc@worldbank.org. I thank Jim
Anderson, Chad Bown, Irene Brambilla, David Kaplan, Hiau Looi Kee, John Kennan, John McLaren, Caglar
Ozden, Daniel Lederman, Guido Porto, Ray Robertson, Diego Rojas and Yoto Yotov for their comments.
All errors and omissions are mine.


       Poisson pseudo maximum likelihood (henceforth PPML) regression has become very popular
in the international trade and migration literature, particularly for estimating the gravity equation.
It was introduced by Gourieroux, Monfort and Trognon (1984). More recently, Santos
Silva and Tenreyro (2006) showed that it is a simple but powerful method for estimating
bilateral resistance parameters of the gravity equation. After these two seminal papers, it has
become one of the standard tools in the international economics literature, widely used to
explain trade and, more recently, migration flows.¹ This paper extends this popular method
and shows how it can be used to estimate structural dynamic discrete choice models
by adding a linear reduced form regression step. This novel method can handle models
with large choice sets, heterogeneity, and aggregate shocks. Our approach is an intuitive
combination of well known and widely used methods; it therefore imposes little set-up cost
on the econometrician and can utilize standard statistical software.
       The method has two steps. First, we run a PPML regression using discrete choice data,
similar to the gravity equation estimation, to estimate the expected values that appear in the
Bellman equations. Second, we construct a linear regression equation by plugging the
estimated expected values into the Bellman equation that characterizes the dynamic decision
making process of agents. In the second step, we estimate distributional and utility flow
parameters of the discrete choice model. Since we estimate expected values rather than
calculating value functions by iteration or backward solution, expectations of agents are fully
accounted for even when they are not quantifiable by the econometrician. Both regressions
are based on orthogonality conditions rather than maximum likelihood; therefore the
distributions of payoff streams or aggregate shocks are not required for the estimation. The
estimated system does not need to be at the steady state. In fact, the steady state may not
even exist in the presence of macroeconomic and policy shocks. The orthogonality conditions
we use have analytical derivatives; therefore the method is much faster than maximum
   1
    Two recent examples from the migration literature that use the gravity equation are Beine, Docquier and
Ozden (2011) and Grogger and Hanson (2011). In the trade literature, Olivero and Yotov (2011) generalize
the gravity equation in a dynamic framework but estimate trade flows at the steady state without considering a
discrete choice specification.


likelihood based methods and it allows us to estimate a large number of parameters.
       Accounting for aggregate shocks is an important challenge for the estimation of dynamic
discrete choice models. In the literature, the most common methods are based on maximum
likelihood estimation (henceforth ML) using backward solution or conditional choice
probabilities. ML estimation requires strong distributional assumptions on aggregate shocks, and thus
on workers’ expectations about payoffs. The most common assumption is
the absence of aggregate shocks, because it is very difficult to rigorously model the transmission
of aggregate shocks into the payoff streams and workers’ expectations.² In contrast to ML
estimation, our method does not require distributional assumptions on aggregate shocks or
workers’ expectations, except rationality. Therefore, using this novel method, international
and internal migration, sectoral labor mobility, occupational mobility, and other dynamic
discrete choice models can be estimated efficiently in the presence of macroeconomic and
policy shocks.
       The two most recent papers that address similar discrete choice problems are Anderson
(2011) and Artuc, Chaudhuri and McLaren (2010). Anderson (2011) shows how the gravity
equation can be considered as an equilibrium condition for discrete choice problems, and that it
can be estimated with PPML regression. After the estimation step, he solves for the structural
push and pull parameters from the PPML regression coefficients along with multilateral
resistance parameters. Technically, our first step regression is similar to a gravity equation,
as in Anderson (2011). Unlike Anderson (2011), we interpret the PPML regression fixed effects as
expected values in the Bellman equation that gives the optimality condition for the
underlying discrete choice model. We estimate parameters of the Bellman equation rather than
the gravitational push and pull parameters.
   2
    For example, after the recent housing crisis of 2007, the demand for unskilled labor in the construction
sector decreased significantly. This negative shock potentially affected expected payoffs of workers in the
construction sector and welfare of low-skill immigrants in general. Consequently, the construction sector
shrank significantly and some states such as Alabama and Arizona passed strict anti-immigration laws. Any
migration or sectoral mobility model that is estimated via ML has to contain distributional assumptions
regarding such policy and macroeconomic shocks that are difficult to quantify.




    Artuc, Chaudhuri and McLaren (2010) derive an equilibrium condition for workers’ sec-
toral choice, that is in essence an Euler equation. Similar to ours, their method also allows
aggregate shocks. They impute workers’ values from observed gross ﬂows. However, their
expected value imputation method does not allow sparsity in the transition matrices of ﬂows
from one decision to another. This limits the number of choices, and also creates problems
in incorporating heterogeneity. Furthermore, it increases the standard errors, preventing
estimation of detailed versions of their model. Our method is free from these limitations.
    The idea of imputing values from conditional choice probabilities (henceforth CCP) was
first introduced by Hotz and Miller (1993), a milestone for discrete choice models due to their
inversion theorem. They consider stationary problems and estimate choice probabilities
using a non-parametric method. Aguirregabiria and Mira (2002) introduced a new algorithm,
“Nested Pseudo Maximum Likelihood” (henceforth NPM), that combines CCP with an
iterative step to improve the efficiency of CCP in small samples.³ More recently, in their
seminal paper, Arcidiacono and Miller (2011) combine CCP with the expectation-maximization
(henceforth EM) algorithm to allow certain non-stationary processes and unobserved
heterogeneity. Similar to Arcidiacono and Miller (2011), our estimation procedure can use an
EM loop to account for unobserved heterogeneity. In this paper we argue that PPML is a
convenient and efficient alternative to CCP for problems with large choice sets or a small
sample. However, it is not a substitute for EM or NPM; rather, it can be used in combination
with EM and NPM instead of non-parametric CCP estimation.
    Our procedure differs from the other non-iterative dynamic programming
methods in the literature in two major ways. First, we use Poisson regression rather than the Hotz and
Miller (1993) inversion equation to estimate expected values. Second, our method does not
rely on maximum likelihood estimation but on orthogonality conditions. Therefore, we do not
need distributional assumptions for aggregate shocks. Thanks to the linearity of the
estimating equations, it is well suited for problems with a large number of choices and structural
   3
     It is essentially an intuitive combination of Hotz and Miller (1993) and Rust (1987). See Aguirregabiria
and Mira (2010) for an extensive survey of the literature.


parameters. Our method is computationally efficient and can utilize standard statistical
software, widely used to estimate gravity equations in the context of migration and trade
flows.⁴ Similar to the other non-iterative solution methods, the state space has to be small
compared to the backward solution methods.
        In the next section, we present a representative discrete choice model that can be
estimated with our method. In the following sections, we summarize our estimation strategy,
provide an example application, and present simulation results.



1         Model
Consider an economy with L infinitely-lived agents and N sectors, where each agent is in
a discrete state s ∈ S. Sectors can be industries, occupations, cities, countries, or any
combination of such choices, while the state could be the type of agent such as education
level, gender, age or other individual characteristics. It is also possible to consider economic
policies as a part of the state space, such as trade policy, migration policy, or education policy.
We can also incorporate unobserved types, which are omitted from this section for the sake
of clarity.
        A type s agent chooses a sector i ∈ {1, 2, 3, ..., N} at the end of period t − 1, and receives
instantaneous utility $u_t^{i,s}$ at time t defined as

    $$ u_t^{i,s} = w_t^{i,s} + \eta^{i,s}, \tag{1} $$

where $w_t^{i,s}$ is the observed sector specific random payoff, with finite moments, common to all type s agents working
in sector i, and $\eta^{i,s}$ is the unobserved sector specific iid utility shock
also common to all type s agents. Hence, the state of each agent can be summarized with
the pair (i, s) where s is the type and i is the current sector.
    4
    PPML estimation is based on an orthogonality condition, which has an analytical derivative. This
computational convenience makes the estimation process much faster than alternatives. For example, it
converges within minutes even with hundreds of choices and many structural parameters.


      We assume that only $w_t^{i,s}$ is observed by the econometrician; $\eta^{i,s}$ is known by agents
but not by the econometrician. All agents are risk neutral, have rational expectations and
a common discount factor β < 1. The expected future payoff streams can change over time:
in general $E_{t+1} w_{t+n}^{i,s} \neq E_t w_{t+n}^{i,s}$ for n ≥ 1. The present discounted choice-specific utility of agent l is
equal to

    $$ U_t^{i,s,l} = w_t^{i,s} + \eta^{i,s} + \max_j \left\{ \beta E_t V_{t+1}^{j,s} - C_t^{ij,s} - \varepsilon_t^{j,l} \right\}, \tag{2} $$

where $C_t^{ij,s} + \varepsilon_t^{j,l}$ is the cost of choosing sector j for a type s agent l who is currently in sector
i. The “moving cost” has two components: a deterministic part, $C_t^{ij,s}$, common to all type
s agents, and a random part, $\varepsilon_t^{j,l}$, specific to agent l. All type s agents are identical except
for their individual moving cost shock $\varepsilon_t^{j,l}$. We assume that $C_t^{ii,s} = 0$, which means the fixed
component of the moving cost is zero for stayers.
      The timing of events is as follows:
      1. Agents learn the value of $w_t^{i,s}$ when they receive it.
      2. At the end of period t, they learn the random component of the “moving cost,” $\varepsilon_t^{j,l}$, for
         every j = 1, ..., N, and choose the next period's sector (based on the expected stream of future
         payoffs and moving costs).
      3. Agents pay the moving cost, $C_t^{ij,s} + \varepsilon_t^{j,l}$, where j is the chosen sector.
      4. Period t + 1 starts, and the cycle repeats itself.
      After taking the expectation of (2) with respect to agent specific shocks, the choice specific
value function can be expressed as

    $$ V_t^{i,s,l} = w_t^{i,s} + \eta^{i,s} + E_t \max_j \left\{ \beta \sum_{s' \in S} \pi(s, s') V_{t+1}^{j,s',l} - C_t^{ij,s} - \varepsilon_t^{j,l} \right\}, \tag{3} $$

where π(s, s') is the probability of switching from type s to type s'. We assume that
π(s, s') is exogenous.⁵ Henceforth, we drop the agent superscript l for notational convenience.
      We can rearrange the value function as

    $$ V_t^{i,s} = w_t^{i,s} + \eta^{i,s} + \beta \tilde{V}_{t+1}^{i,s} + E_t \max_j \left\{ \varepsilon_t^{j} + \varepsilon_t^{ij,s} \right\}, $$

  5
      It is possible to endogenize this transition matrix, but doing so is beyond the scope of this paper.


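where

    $$ \varepsilon_t^{ij,s} = \left[ \beta \tilde{V}_{t+1}^{j,s} - \beta \tilde{V}_{t+1}^{i,s} \right] - C_t^{ij,s}, $$

and

    $$ \tilde{V}_{t+1}^{i,s} = \sum_{s' \in S} \pi(s, s') E_t V_{t+1}^{i,s'}. \tag{4} $$

Then, the choice specific values can be written as

    $$ V_t^{i,s} = w_t^{i,s} + \eta^{i,s} + \beta \tilde{V}_{t+1}^{i,s} + \Omega_t^{i,s}. \tag{5} $$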


   The option value $\Omega_t^{i,s}$ is equal to

    $$ \Omega_t^{i,s} = \sum_{j=1}^{N} \int_{-\infty}^{\infty} \left( \varepsilon^j + \varepsilon_t^{ij,s} \right) f(\varepsilon^j) \prod_{k \neq j} F\left( \varepsilon^j + \varepsilon_t^{ij,s} - \varepsilon_t^{ik,s} \right) d\varepsilon^j, $$

where F(ε) is the cumulative distribution function and f(ε) is the probability density
function of the moving cost shocks. The option value, $\Omega_t^{i,s}$, is the extra utility generated by
being able to change sectors. As the moving cost $C_t^{ij,s}$ increases, the option value decreases, and
it diminishes to zero when the moving cost goes to infinity. The option value function is
crucial for the implementation of the estimation process since it can be solved analytically under
certain distributional assumptions.
   Assume that $\varepsilon_t^{i}$ is distributed iid extreme value type I with location parameter −νγ, scale
parameter ν, and cdf F(ε) = exp(−exp(−ε/ν − γ)), where E(ε) = 0, Var(ε) = π²ν²/6
and γ is Euler's constant.
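
   The distributional assumption above can be checked numerically by inverting the cdf. The following is a minimal sketch, not part of the original paper: the function name `draw_shocks`, the scale value and the sample size are illustrative assumptions. It draws shocks from F(ε) and compares the sample mean and variance with E(ε) = 0 and Var(ε) = π²ν²/6.

```python
import numpy as np

def draw_shocks(nu, size, rng):
    """Draw iid shocks with cdf F(e) = exp(-exp(-e/nu - gamma)) by inverting the cdf."""
    u = rng.uniform(size=size)
    # Solving u = F(e) for e gives e = -nu * (gamma + log(-log(u))).
    return -nu * (np.euler_gamma + np.log(-np.log(u)))

rng = np.random.default_rng(0)
nu = 0.7                                  # illustrative scale parameter (assumption)
eps = draw_shocks(nu, 1_000_000, rng)

sample_mean = eps.mean()                  # should be close to 0
sample_var = eps.var()                    # should be close to pi^2 * nu^2 / 6
theory_var = np.pi**2 * nu**2 / 6
```

   The location parameter −νγ is what centers the shocks at zero; without it the mean would be νγ.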
   Let $m_t^{ij,s}$ denote the fraction of type s agents in sector i who switch to
sector j. This can be interpreted as the gross flow from i to j, or the probability of choosing
j conditional on (i, s). The total number of agents moving from i to j is equal to
$y_t^{ij,s} = L_t^{i,s} m_t^{ij,s}$, where $L_t^{i,s}$ is the number of type s agents who are in i at time t. $y_t^{ij,s}$ can be
interpreted as the number of people migrating from one city to another, changing occupation,



changing industry, etc.
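
   As an illustration of how the count data could be assembled, the sketch below tabulates flows from individual records. It is a hypothetical example: the helper `flow_counts`, the 0-based sector codes and the nine records are assumptions made for illustration, not data from the paper.

```python
import numpy as np

def flow_counts(origin, destination, n_sectors):
    """Count transitions y[i, j] from sector i to sector j (0-based sector codes)."""
    y = np.zeros((n_sectors, n_sectors), dtype=int)
    np.add.at(y, (origin, destination), 1)   # unbuffered add handles repeated pairs
    return y

# Hypothetical survey records: last period's sector and this period's sector.
origin = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
destination = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2])

y = flow_counts(origin, destination, n_sectors=3)
L = y.sum(axis=1)            # number of agents initially in each sector
m = y / L[:, None]           # empirical gross flow shares
```

   Cells with no observed movers, such as `y[0, 2]` here, remain in the table as zeros. They are valid observations for a regression on counts, whereas the logarithm of the corresponding share would be undefined.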
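        Thanks to the extreme value distribution and McFadden (1973), the gross flow $m_t^{ij,s}$ is
equal to

    $$ m_t^{ij,s} = \frac{\exp\left[ \left( \beta \tilde{V}_{t+1}^{j,s} - \beta \tilde{V}_{t+1}^{i,s} - C_t^{ij,s} \right) \frac{1}{\nu} \right]}{\sum_{k=1}^{N} \exp\left[ \left( \beta \tilde{V}_{t+1}^{k,s} - \beta \tilde{V}_{t+1}^{i,s} - C_t^{ik,s} \right) \frac{1}{\nu} \right]}, \tag{6} $$

and we can show that the option value⁶ is equal to

    $$ \Omega_t^{i,s} = \nu \log \sum_{k=1}^{N} \exp\left[ \left( \beta \tilde{V}_{t+1}^{k,s} - \beta \tilde{V}_{t+1}^{i,s} - C_t^{ik,s} \right) \frac{1}{\nu} \right]. \tag{7} $$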

        Note that we could use an expression similar to (7) to construct a CCP representation of
the Bellman equation, because $\Omega_t^{i,s} = -\nu \log m_t^{ii,s}$ and $m_t^{ij,s}$ is a conditional choice probability;
see Appendix C for the details. We do not use the CCP method or the Hotz-Miller inversion
equation in this paper. In fact, unlike Hotz and Miller (1993) or Artuc, Chaudhuri and McLaren
(2010), we never take the logarithm of probabilities in the estimation algorithm. Unlike
the CCP representation, we estimate expected values directly from count data, which makes
our method more convenient when estimation of probabilities or evaluation of the likelihood
function is difficult. This is usually the case when the number of choices and structural
parameters is large.
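
        To make the closed forms concrete, the following minimal sketch evaluates the gross flows and the option value for a single type and period, and checks the relation between the option value and the stayer probability noted above. The function name, the parameter values and the three-sector setup are illustrative assumptions; the Monte Carlo step uses the rearranged form of the value function, in which the shock enters additively.

```python
import numpy as np

def choice_probs_and_option_value(V_next, C, beta, nu):
    """Gross flows m[i, j] and option values omega[i] for one type and one period.

    V_next[j] : expected next-period value of sector j
    C[i, j]   : deterministic moving cost from i to j, with C[i, i] = 0
    """
    # x[i, k] = (beta*V_next[k] - beta*V_next[i] - C[i, k]) / nu
    x = (beta * V_next[None, :] - beta * V_next[:, None] - C) / nu
    x_max = x.max(axis=1, keepdims=True)               # guard against overflow
    log_denom = x_max[:, 0] + np.log(np.exp(x - x_max).sum(axis=1))
    m = np.exp(x - log_denom[:, None])                 # logit gross flows
    omega = nu * log_denom                             # log-sum option value
    return m, omega

beta, nu = 0.95, 0.8                                   # illustrative values (assumptions)
V_next = np.array([10.0, 10.5, 9.8])
C = 2.0 * (1.0 - np.eye(3))                            # common cost for movers, zero for stayers
m, omega = choice_probs_and_option_value(V_next, C, beta, nu)

# Monte Carlo check for origin sector 0: the mean of max_k {shock_k + det_k}
# should be close to omega[0] when the shocks have mean zero.
rng = np.random.default_rng(1)
eps = -nu * (np.euler_gamma + np.log(-np.log(rng.uniform(size=(200_000, 3)))))
det = beta * V_next - beta * V_next[0] - C[0]          # deterministic part for origin 0
omega_mc = (eps + det).max(axis=1).mean()
```

        In this sketch `-nu * np.log(np.diag(m))` reproduces `omega` exactly, which is the CCP-style relation the text mentions but does not use for estimation.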
        In the next section, we describe the estimation procedure for the generic model presented
here. Equations (5), (6) and (7) play key roles in the estimation procedure.



2         Estimation
Our method has two stages: First, the Poisson regression stage, where we estimate expected
values associated with each choice for every time period. Second, the Bellman equation
    6
        See Appendix B for derivation of the equations.



stage, where we plug estimated expected values into a Bellman equation to construct a
linear regression and retrieve structural parameters of the model.

   Stage 1: PPML Regression
   In this stage, our goal is to estimate the expected values $\tilde{V}_t^{i,s}$ and the bilateral resistance
parameters $C_t^{ij,s}$. We construct a simple expression for flows between options, similar to the gravity
equation, which can be estimated with the Poisson pseudo maximum likelihood regression available in
many statistical software packages.
   The Stage 1 regression equation is

    $$ y_t^{ij,s} = \exp\left( \Lambda_t^{j,s} + \Gamma_t^{i,s} + \Psi_t^{ij,s} \right) + \xi_t^{1,ij,s}, \tag{8} $$

where $y_t^{ij,s}$ is the total number of agents with state (i, s) who choose j (hence $y_t^{ij,s} = L_t^{i,s} m_t^{ij,s}$),
$\Lambda_t^{j,s}$ is the destination fixed effect, $\Gamma_t^{i,s}$ is the origin fixed effect, and $\Psi_t^{ij,s}$ is the bilateral
resistance term. The equation above can be interpreted as a Poisson pseudo-maximum
likelihood regression.⁷
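
   A regression of this form can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the paper's implementation: the helpers `ppml_fit` and `gravity_design` are hypothetical, the bilateral term is restricted to a single common moving cost for all movers, one (t, s) cell is used, and the flows are noiseless expected values so that the true coefficients can be recovered exactly. In applied work a packaged Poisson regression routine would normally replace the hand-written solver.

```python
import numpy as np

def ppml_fit(X, y, n_iter=200, tol=1e-10):
    """Poisson pseudo-ML with a log link, solved by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, np.log(y + 1.0), rcond=None)[0]   # starting values
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                   # working response
        XtW = X.T * mu                            # Poisson IRLS weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def gravity_design(N):
    """Dummies for one (t, s) cell: N origin effects, N-1 destination effects, one mover dummy."""
    rows = []
    for i in range(N):
        for j in range(N):
            origin = np.eye(N)[i]
            dest = np.eye(N)[j][1:]               # destination effect of sector 1 dropped
            mover = [float(i != j)]               # assumes one common moving cost
            rows.append(np.concatenate([origin, dest, mover]))
    return np.array(rows)

N = 4
Gamma_true = np.log(np.array([900.0, 1200.0, 700.0, 1000.0]))   # illustrative origin effects
Lambda_true = np.array([0.0, 0.3, -0.2, 0.1])                   # effect of sector 1 is zero
Psi_true = -2.5                                                 # plays the role of -C/nu
X = gravity_design(N)
theta_true = np.concatenate([Gamma_true, Lambda_true[1:], [Psi_true]])
y = np.exp(X @ theta_true)        # noiseless expected flows; real data would be counts
theta_hat = ppml_fit(X, y)
Psi_hat = theta_hat[-1]
```

   Because the estimator only uses the conditional mean of the flows, the same code runs unchanged on integer counts that contain zeros.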
   Derivation of the Stage 1 regression equation:
   If we multiply (6) by $L_t^{i,s}$, we get

    $$ y_t^{ij,s} = \exp\left[ \frac{\beta}{\nu} \tilde{V}_{t+1}^{j,s} - \frac{\beta}{\nu} \tilde{V}_{t+1}^{i,s} + \log L_t^{i,s} - \frac{1}{\nu} \Omega_t^{i,s} - \frac{1}{\nu} C_t^{ij,s} \right], $$

then we can arrange the terms as i-specific terms, j-specific terms and bilateral terms.
(Note that we need to drop either the destination or the origin fixed effect for one choice; otherwise the
regression matrix becomes singular. Assume that we drop the destination fixed effect for
sector 1.)
  7
    See Gourieroux, Monfort and Trognon (1984) and Cameron and Trivedi (1998) for properties of the
PPML regression.



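   Then the j-specific term, or the destination fixed effect, $\Lambda_t^{j,s}$ is equal to

    $$ \Lambda_t^{j,s} = \frac{\beta}{\nu} \tilde{V}_{t+1}^{j,s} - \frac{\beta}{\nu} \tilde{V}_{t+1}^{1,s}, $$

the i-specific term, or the origin fixed effect, $\Gamma_t^{i,s}$ is equal to

    $$ \Gamma_t^{i,s} = -\frac{\beta}{\nu} \tilde{V}_{t+1}^{i,s} - \frac{1}{\nu} \Omega_t^{i,s} + \log(L_t^{i,s}) + \frac{\beta}{\nu} \tilde{V}_{t+1}^{1,s}, $$

and the bilateral resistance term $\Psi_t^{ij,s}$ is equal to

    $$ \Psi_t^{ij,s} = -\frac{1}{\nu} C_t^{ij,s}. $$

   Note that the option value term $\Omega_t^{i,s}$ can be expressed as

    $$ \frac{1}{\nu} \Omega_t^{i,s} = -\Lambda_t^{i,s} - \Gamma_t^{i,s} + \log(L_t^{i,s}). \tag{9} $$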
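
   The decomposition can be verified numerically. The sketch below, with illustrative parameter values that are assumptions and not estimates from the paper, builds the flows from the model primitives, forms the three Stage 1 terms from their definitions, and checks that they reproduce the flows and the option value.

```python
import numpy as np

beta, nu = 0.95, 0.8                       # illustrative parameter values (assumptions)
V_next = np.array([10.0, 10.5, 9.8])       # expected next-period values by sector
C = 2.0 * (1.0 - np.eye(3))                # moving costs, zero on the diagonal
L = np.array([500.0, 300.0, 200.0])        # agents initially in each sector

# Option value (log-sum formula) and flows y = L * m implied by the logit shares.
x = (beta * V_next[None, :] - beta * V_next[:, None] - C) / nu
omega = nu * np.log(np.exp(x).sum(axis=1))
y = L[:, None] * np.exp(x - omega[:, None] / nu)

# Stage 1 terms as defined in the text, with sector 1 (index 0) as the reference.
Lambda = beta / nu * (V_next - V_next[0])
Gamma = -beta / nu * V_next - omega / nu + np.log(L) + beta / nu * V_next[0]
Psi = -C / nu

y_from_effects = np.exp(Lambda[None, :] + Gamma[:, None] + Psi)
omega_from_effects = nu * (-Lambda - Gamma + np.log(L))   # option value from fixed effects
```

   Both reconstructions agree with the direct calculations up to floating point error, so the option value never has to be computed from choice probabilities once the fixed effects are estimated.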

    Stage 2: Bellman Equation
   In Stage 1, we estimated the expected values, $\Lambda_t^{j,s}$, and moving cost parameters,
$\Psi_t^{ij,s}$. The next step is to estimate the other parameters, including 1/ν. In Stage 2, the goal is to
construct the Bellman equation using the estimated parameters from Stage 1 and estimate
the remaining parameters.
   The Stage 2 regression equation is

    $$ \phi_t^{i,s} = \zeta_t^{s} + \tilde{\eta}^{i,s} + \frac{\beta}{\nu} \tilde{w}_{t+1}^{i,s} + \xi_t^{i,s}, \tag{10} $$

where $\zeta_t^{s}$ is the time dummy specific to type s, $\tilde{\eta}^{i,s}$ is the sector dummy specific to s,
$\tilde{w}_{t+1}^{i,s}$ is the expected wage constructed using (12), $\xi_t^{i,s}$ is the regression residual and finally
$\phi_t^{i,s}$ is the dependent variable constructed from the Stage 1 estimates using the equation

    $$ \phi_t^{i,s} = \Lambda_t^{i,s} + \sum_{s' \in S} \pi(s, s') \, \beta \left[ \Gamma_{t+1}^{i,s'} - \log(L_{t+1}^{i,s'}) \right]. \tag{11} $$

   The expected wages in (10) are equal to

    $$ \tilde{w}_{t+1}^{i,s} = \sum_{s' \in S} \pi(s, s') w_{t+1}^{i,s'}. \tag{12} $$

   It is possible to use the Generalized Method of Moments or the Instrumental Variables method
for the regression.⁸
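
   The mechanics of this stage can be sketched as follows. The example is a simplified illustration, not the paper's estimator: it assumes a single type (so the sum over s' collapses), uses ordinary least squares with realized next-period wages in place of GMM or IV, and checks the regression on synthetic data generated to satisfy the regression equation. The helper names `build_phi` and `stage2_ols` are hypothetical.

```python
import numpy as np

def build_phi(Lambda, Gamma, logL, beta):
    """Dependent variable with a single type: phi[t, i] = Lambda[t, i] + beta*(Gamma[t+1, i] - logL[t+1, i])."""
    return Lambda[:-1] + beta * (Gamma[1:] - logL[1:])

def stage2_ols(phi, w_next):
    """Regress phi[t, i] on time dummies, sector dummies (sector 1 omitted) and w[t+1, i]."""
    T, N = phi.shape
    t_idx, i_idx = np.meshgrid(np.arange(T), np.arange(N), indexing="ij")
    time_d = np.eye(T)[t_idx.ravel()]
    sector_d = np.eye(N)[i_idx.ravel()][:, 1:]
    X = np.column_stack([time_d, sector_d, w_next.ravel()])
    coef = np.linalg.lstsq(X, phi.ravel(), rcond=None)[0]
    return coef[-1]                                  # estimate of beta/nu

# Synthetic check: phi is generated from time effects, sector effects and wages, plus small noise.
rng = np.random.default_rng(2)
T, N, beta, nu = 12, 5, 0.95, 0.8                    # illustrative values (assumptions)
zeta = rng.normal(size=T)
eta = np.concatenate([[0.0], rng.normal(size=N - 1)])
w_next = rng.normal(loc=3.0, scale=0.5, size=(T, N))
phi = zeta[:, None] + eta[None, :] + beta / nu * w_next + 0.01 * rng.normal(size=(T, N))
slope = stage2_ols(phi, w_next)

# Shape check for build_phi on toy Stage 1 output covering three periods and two sectors.
phi_toy = build_phi(np.zeros((3, 2)), np.ones((3, 2)), np.zeros((3, 2)), 0.5)
```

   With a calibrated discount factor β, the scale parameter follows from the wage coefficient as ν = β / slope. When realized wages are used as a proxy for expected wages, an instrumental variables estimator is the natural replacement for the least-squares step.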
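      Derivation of the Stage 2 regression equation:
      After multiplying (5) by β/ν, aggregating it over possible states and moving all terms
to the left hand side, we get

    $$ E_t \left[ \frac{\beta}{\nu} \tilde{V}_{t+1}^{i,s} - \frac{\beta}{\nu} \sum_{s' \in S} \pi(s, s') \left( w_{t+1}^{i,s'} + \eta^{i,s'} + \beta \tilde{V}_{t+2}^{i,s'} + \Omega_{t+1}^{i,s'} \right) \right] = 0. $$

      Then, substituting the definition of the destination fixed effect, which gives
$\frac{\beta}{\nu} \tilde{V}_{t+1}^{i,s} = \Lambda_t^{i,s} + \frac{\beta}{\nu} \tilde{V}_{t+1}^{1,s}$, and the expression (9) for the option value, we obtain

    $$ E_t \left[ \Lambda_t^{i,s} + \frac{\beta}{\nu} \tilde{V}_{t+1}^{1,s} - \frac{\beta}{\nu} \tilde{w}_{t+1}^{i,s} - \tilde{\eta}^{i,s} - \sum_{s' \in S} \pi(s, s') \left( -\beta \Gamma_{t+1}^{i,s'} + \beta \log(L_{t+1}^{i,s'}) + \frac{\beta^2}{\nu} \tilde{V}_{t+2}^{1,s'} \right) \right] = 0, \tag{13} $$

where $\tilde{w}_{t+1}^{i,s}$ is defined in (12) and

    $$ \tilde{\eta}^{i,s} = \frac{\beta}{\nu} \sum_{s' \in S} \pi(s, s') \eta^{i,s'}. $$

      We define

    $$ \zeta_t^{s} = -\frac{\beta}{\nu} \tilde{V}_{t+1}^{1,s} + \sum_{s' \in S} \pi(s, s') \frac{\beta^2}{\nu} \tilde{V}_{t+2}^{1,s'}, $$

then we can re-arrange (13) and write it as

    $$ E_t \left[ \phi_t^{i,s} - \zeta_t^{s} - \tilde{\eta}^{i,s} - \frac{\beta}{\nu} \tilde{w}_{t+1}^{i,s} \right] = 0. $$

  8
    For a detailed analysis of identification problems in discrete dynamic models, see Magnac and Thesmar
(2002).
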
      Alternative specifications:
We focus on models that can be estimated using repeated cross-section data with retrospective
questions, such as household labor force surveys, which are available for many countries.
An example from the US is the March supplement of the Current Population Survey.⁹ However,
if longitudinal data are available, it is possible to consider unobserved heterogeneity in the
model. Arcidiacono and Miller (2011) show how an EM loop can be incorporated in CCP to
estimate unobserved heterogeneity. Their intuition can also be applied to PPML regression.
Appendix D illustrates how it is possible to use an EM loop within PPML regression when
panel data are available.
      Another alternative modeling approach is to use wage shocks rather than moving cost
shocks in agents’ utility function. In Appendix A, we provide an equation that can be used
instead of (10) in the case of such wage shocks.
      In the next section we present an example to illustrate a practical application of the
method.
  9 Other countries with such data that we are aware of so far are Indonesia, Mexico and Turkey.


3     Example Application: Sectoral Mobility in the US
In this section we present an application of the estimation method. First, we estimate
a disaggregated variant of Artuc, Chaudhuri and McLaren (2010) using the exact same
data, which is the Current Population Survey from years 1975 to 2001 (henceforth CPS).
Recently, Kaplan, Lederman and Robertson (2013), Artuc and McLaren (2012) and Artuc,
Bet, Brambilla and Porto (2013) used the estimation method we introduce herein.
    Model
    To elaborate on the generic model we presented in the previous section, consider that
sectors are industries in which workers choose to work in each time period. For each choice,
workers receive a payoff $w_t^i$ and an idiosyncratic utility $\eta^i$ common to all workers in sector i.
Assume that $\eta^1 = 0$ for normalization. For simplicity, we consider one type of worker, hence
drop the state superscript s. We allow the deterministic moving cost to change over time
such that $C_t^{ij} = c_t$ if $i \neq j$ and $C_t^{ij} = 0$ if $i = j$.
    We use two regressions to estimate structural parameters of the model. First, the Poisson
regression equation is

$$y_t^{ij} = \exp\left(\Gamma_t^i + \Lambda_t^j + \Psi_t 1_{i\neq j}\right) + e_t^{ij}, \qquad (14)$$

    where the regression coefficient $\Psi_t = -c_t/\nu$, the indicator function $1_{i\neq j}$ is equal to
one when $i \neq j$ and zero otherwise, and $e_t^{ij}$ is the residual. In many cases, the discount factor
cannot be identified; therefore we assume that it is equal to β = 0.97 and known by the
econometrician.
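The first-stage regression (14) can be sketched for a single year as follows. This is only an illustration under stated assumptions: the IRLS routine, the function names `ppml` and `design`, the six-sector setup and all parameter values are hypothetical and are not the paper's code or data. The first destination effect is normalized to zero because the full sets of origin and destination dummies are collinear.

```python
import numpy as np

def design(n):
    """Rows ordered by (origin i, destination j): n origin dummies,
    n - 1 destination dummies (first one normalized to zero), and 1{i != j}."""
    eye = np.eye(n)
    rows = [np.concatenate([eye[i], eye[j][1:], [float(i != j)]])
            for i in range(n) for j in range(n)]
    return np.array(rows)

def ppml(X, y, tol=1e-8, max_iter=200):
    """Poisson pseudo-ML by iteratively reweighted least squares."""
    # Starting values: OLS on log(y + 0.5), which is defined for zero cells.
    b = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(max_iter):
        mu = np.exp(X @ b)
        z = X @ b + (y - mu) / mu      # working response
        XtW = X.T * mu                 # X'W with W = diag(mu)
        b_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

# Hypothetical data: flows generated from known coefficients.
rng = np.random.default_rng(0)
n = 6
gamma = np.log(20000.0) + rng.normal(0.0, 0.3, n)   # origin effects
lam = rng.normal(0.0, 0.3, n - 1)                   # destination effects 2..n
psi = -4.5                                          # plays the role of -c_t / nu
X = design(n)
b_true = np.concatenate([gamma, lam, [psi]])
y = rng.poisson(np.exp(X @ b_true)).astype(float)   # flow counts y^{ij}
b_hat = ppml(X, y)
psi_hat = b_hat[-1]                                 # estimate of Psi_t
```

Zero counts enter the estimation directly as observations, which is the property the text relies on when transition matrices are sparse.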
    Second, the regression equation based on the Bellman equation is

$$\phi_t^i = \zeta_t + \eta^i + \frac{\beta}{\nu}w_{t+1}^i + \xi_t^i, \qquad (15)$$

where $\xi_t^i$ is the residual, $\phi_t^i = \Lambda_t^i + \beta\,\Gamma_{t+1}^i - \log(L_{t+1}^i)$ is the dependent variable, $\eta^i$ is a sector
dummy, and $\zeta_t$ is a time dummy. We set $\eta^1 = 0$ for the first sector.
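The second-stage regression (15) can be sketched as a two-stage least squares problem with time and sector dummies. Everything below is an assumption for illustration: the simulated AR(1) wages, the use of the current wage as the instrument for next year's wage, and the helper names `build_phi`, `two_sls` and `dummies` are hypothetical, not the paper's implementation or instrument set.

```python
import numpy as np

def build_phi(Lam, Gam, L, beta):
    """phi_t^i = Lambda_t^i + beta * Gamma_{t+1}^i - log(L_{t+1}^i).
    Inputs are (T + 1) x n arrays of first-stage estimates and labor allocations."""
    return Lam[:-1] + beta * Gam[1:] - np.log(L[1:])

def two_sls(y, X, Z):
    """2SLS: project X on the instruments Z, then regress y on the projection."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def dummies(index, size, drop_first=False):
    D = np.eye(size)[index]
    return D[:, 1:] if drop_first else D

rng = np.random.default_rng(1)
n, T, beta, nu = 16, 26, 0.97, 1.0
t_idx, i_idx = [a.ravel() for a in
                np.meshgrid(np.arange(T), np.arange(n), indexing="ij")]

# Hypothetical AR(1) sector wages; row s holds the wages of period s.
shock = rng.normal(0.0, 0.1, (T + 1, n))
w = np.zeros((T + 1, n))
for s in range(1, T + 1):
    w[s] = 0.8 * w[s - 1] + shock[s]
w_lag, w_next = w[:T].ravel(), w[1:].ravel()

# Simulated dependent variable; with real data it would come from build_phi.
zeta = rng.normal(0.0, 1.0, T)                               # time effects
eta = np.concatenate([[0.0], rng.normal(0.0, 0.5, n - 1)])   # eta^1 = 0
xi = 0.1 * shock[1:].ravel() + rng.normal(0.0, 0.01, T * n)  # correlated with w_next
phi = zeta[t_idx] + eta[i_idx] + (beta / nu) * w_next + xi

D = np.hstack([dummies(t_idx, T), dummies(i_idx, n, drop_first=True)])
coef = two_sls(phi, np.column_stack([D, w_next]), np.column_stack([D, w_lag]))
inv_nu_hat = coef[-1] / beta       # the coefficient on w_{t+1} is beta / nu
```

Dividing the wage coefficient by the assumed β gives the estimate of 1/ν; the sector-dummy coefficients are the remaining second-stage parameters.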
    In the “Alternative specification” we use a different set of sector dummies: we allow the
sector specific fixed utility to have linear time trends. This specification potentially
captures changes in employment opportunities in different sectors over time. Modifying the
assumption on fixed utility only affects the second stage regression.
       The second stage regression for the “Alternative Specification” is

$$\phi_t^i = \zeta_t + \eta_1^i + \eta_2^i\,t + \frac{\beta}{\nu}w_{t+1}^i + \xi_t^i, \qquad (16)$$

where $\eta_1^i$ is the intercept and $\eta_2^i$ is the trend. We set $\eta_1^1 = 0$ and $\eta_2^1 = 0$ for the first sector.
       Data
       Artuc, Chaudhuri and McLaren (2010) use CPS data for males between 25 and 64 years
old, from the year 1976 to 2001; we use the same data and sample selection procedure. CPS is
a repeated cross-section, and its March supplement provides retrospective industry questions
regarding workers’ industry in the previous year, along with their current industry. These
retrospective questions allow us to construct the number of workers moving from industry i to j,
denoted as $y_t^{ij}$. In addition to flow data, we use average wage data for each industry. Artuc,
Chaudhuri and McLaren (2010) aggregate industries to 6 major sectors10. Different from
them, we aggregate industries to 16 sectors. The sectors are: 1. Agriculture, 2. Mining, 3.
Construction, 4. Non-durable manufacturing, 5. Durable manufacturing, 6. Transportation,
7. Communications, 8. Utilities, 9. Wholesale trade, 10. Retail trade, 11. Finance, 12.
Business, 13. Personal services, 14. Entertainment, 15. Professional, and 16. Public.
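As a small illustration of how the flow counts can be tabulated from retrospective records, the sketch below assumes each respondent is summarized as a hypothetical `(year, previous_sector, current_sector)` tuple with sectors coded 1 to n. The record layout and the function name `flow_counts` are assumptions, not the CPS file format.

```python
from collections import Counter

def flow_counts(records, n_sectors):
    """records: iterable of (year, previous_sector, current_sector), sectors coded 1..n.
    Returns {year: n x n nested list}; entry [i-1][j-1] counts workers moving from i to j."""
    tally = Counter(records)
    years = sorted({year for year, _, _ in tally})
    return {
        year: [[tally[(year, i, j)] for j in range(1, n_sectors + 1)]
               for i in range(1, n_sectors + 1)]
        for year in years
    }

# Hypothetical records for three sectors.
sample = [(1990, 1, 1), (1990, 1, 3), (1990, 3, 3), (1990, 3, 3), (1991, 2, 1)]
y = flow_counts(sample, n_sectors=3)
```

Cells with no observed movers are simply recorded as zero, so the resulting matrices can be passed to a count-data regression without further adjustment.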
       In addition to increasing the number of choices, we consider sector specific iid utility
shocks, $\eta^i$, and let the deterministic part of moving cost, $c_t$, change over time. These two
changes improve their theoretical model significantly because some sectors may be preferred
by workers for non-pecuniary reasons and the moving costs may change over the
twenty-six-year sample. These possibilities are now addressed in the model. We use PPML
regression in the first step and IV regression in the second step. Everything else is exactly
  10 They were not able to disaggregate sectors further because their method did not allow zero cells in
the transition matrix. They were able to estimate at most seven structural parameters.



the same as their basic model and benchmark regression, including the choice of instruments.
    Results
    We estimate the distributional parameter $1/\nu$, 15 parameters for $\eta^i$, and 26 parameters for
$c_t$, thus 42 structural parameters in total. In the first stage, we estimate $c_t/\nu$, and destination
and origin fixed effects using equation (14). Then, we construct the second stage regression
equation (16) using the destination and origin fixed effects from the first stage regression. In
the second stage, we estimate the remaining structural parameters, $\eta_t^i/\nu$ and $1/\nu$. We use a
one year lag for the second stage IV regression.
    Table 1 shows the estimation results for the basic specification. We present robust
standard errors in the first stage regression. In the first step, all coefficients are significant at the 1
percent level. We find that $C_t/\nu$ ranges between 4.49 and 4.88, with an average of 4.67.
    In the second step, $1/\nu$ is estimated as 0.96 and is significant at the 1 percent level, 9
out of 15 unobserved utility coefficients are significant at the 1 percent level, and 5 coefficients
are significant at the 5 percent level.
    Table 2 shows the estimation results for the alternative specification. The first stage
regression for the alternative specification is identical to the basic specification, thus the $C_t/\nu$
estimates are exactly the same. We find that $1/\nu$ is estimated as 3.67, which is much larger
than the basic specification estimate. (Note that a larger $1/\nu$ means a smaller $\nu$ and $C$.)
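As a rough worked example of the last remark, the implied moving cost can be backed out by dividing the first-stage estimate by the second-stage estimate. Using the reported average $C_t/\nu = 4.67$ as a stand-in for a typical year (an illustrative choice, not a figure reported in the tables):

```latex
C = \frac{C/\nu}{1/\nu}:
\qquad
\text{basic: } \frac{4.67}{0.96} \approx 4.86,
\qquad
\text{alternative: } \frac{4.67}{3.67} \approx 1.27 .
```

The same first-stage coefficient therefore corresponds to a much smaller moving cost under the alternative specification.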
    In the following section, we simulate data for steady state and transition under policy
shocks and re-estimate the model with simulated data to illustrate the performance of our
estimation method relative to other methods in the literature.



4     Monte Carlo Simulations
Running counterfactual policy simulations is usually the main motivation for structural
estimation. Reduced form equations are subject to the Lucas critique and cannot be used
in policy simulations11. Although we use estimators which are traditionally reduced form,
each coefficient in the regression equations corresponds to a structural parameter. Using the
structural parameters, it is possible to simulate the model presented in the previous sections
under different policy scenarios. However, in this paper, we are not interested in particular
effects of policies per se: our goal is to show the performance of this new estimation method
using simulated data. We expose the system to policy shocks and illustrate the robustness of the
estimation method under non-stationary conditions. In a sense, we create aggregate shocks
artificially.
       For an illustration, we consider an open economy model with trade shocks, exogenous
prices and endogenous wages. To simulate trade shocks, we need to define equilibrium real
wages as functions of labor supply and prices. Assume that sectors are perfectly competitive
with simple Cobb-Douglas production functions. We assume that workers are paid their real
marginal products. Then, the following real wage equation closes the model

$$w_t^i = (p_t^i/P_t)\,a_i\,\tilde{A}^i\,(L_t^i)^{a_i - 1}, \qquad (17)$$

       where $p_t^i$ is the exogenous price of sector i output, $P_t$ is the consumer price index
$P_t = \prod_i (p_t^i)^{b_i}$ with basket shares $b_i$, and $\tilde{A}^i$ is a constant that is calibrated from the data.
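Equation (17) can be evaluated directly once prices, shares and labor allocations are given. The sketch below uses a hypothetical two-sector economy; the function names and all numbers are assumptions for illustration, not the calibrated values.

```python
import math

def price_index(p, b):
    """Cobb-Douglas consumer price index P = prod_i p_i ** b_i."""
    return math.prod(pi ** bi for pi, bi in zip(p, b))

def real_wages(p, b, a, A, L):
    """Equation (17): w_i = (p_i / P) * a_i * A_i * L_i ** (a_i - 1)."""
    P = price_index(p, b)
    return [(pi / P) * ai * Ai * Li ** (ai - 1.0)
            for pi, ai, Ai, Li in zip(p, a, A, L)]

# Hypothetical two-sector example.
b = [0.4, 0.6]                      # consumer basket shares
a, A, L = [0.5, 0.7], [2.0, 1.5], [4.0, 1.0]
w_steady = real_wages([1.0, 1.0], b, a, A, L)   # all prices normalized to one
w_shock = real_wages([0.8, 1.0], b, a, A, L)    # 20 percent price drop in sector 1
```

Holding labor fixed, the price drop lowers the real wage in the affected sector and raises it elsewhere through the cheaper consumption basket; in the full model labor then reallocates in response.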

       We calculate Cobb-Douglas labor shares and consumer basket shares from Bureau of
Economic Analysis data. Then, we calibrate $\tilde{A}^i$ to match average wages in given sectors.
The calibration exercise is similar to Artuc, Chaudhuri and McLaren (2010). The production
function and consumer price index parameters are reported in Table 2 along with wages and
labor allocations. We normalize all prices to one at steady state, $p_t^i = 1$ for i = 1, ..., 16, and
fix the deterministic part of moving cost to be constant12 over time, $c_t = 4.5$. We assume
  11 As an example, consider a policy experiment of reducing the moving costs, $C_t$, by 50 percent. Assume
that we would like to know the effect of this change on workers’ mobility decisions. We cannot simply change
the resistance coefficient $\Psi_t^{ij}$ and keep other coefficients as they were, because after a change in the moving
cost the values would also change, and thus the $\Gamma_t^i$ and $\Lambda_t^i$ parameters would change as well. So, it is impossible
to use reduced form parameters $\Gamma_t^i$ and $\Lambda_t^i$ for simulations. Because of the Lucas critique, one has to know the
underlying structural parameters.
$\nu = 1$ and assign arbitrary values to $\eta^i$.
   For the simulations, we use a multiple shooting algorithm similar to Lipton et al. (1982),
but one can use other shooting methods instead.
   We consider four simulation exercises:
   In Simulation I, we simulate the model around steady state13. Then, we estimate the
model using 26 years of simulated data.
   In Simulation II, we drop the manufacturing prices by 20 percent as a surprise one-time
shock, which implies a tariff reduction in the protected manufacturing industries (sectors 4
and 5). After this one-time shock, we let the system reach a new steady state over time. Then,
we estimate the model using simulated data during this transitory period.
   In Simulation III, we increase the number of years from 26 to 100 to show the asymptotic
properties of the estimation method.
   In Simulation IV, we decrease the number of choices from 16 to 8 to show the impact of
having a smaller number of observations because of a smaller number of choices.
   Then, we repeat all four simulation exercises 300 times. All simulations are conducted
with L = 20,000 agents, which is approximately equal to the sample size of March-CPS that
is used for the estimation in the previous section.
   Table 4 presents the Monte Carlo simulation results. The column labeled “Sim I” shows
that the estimates are reasonably close to the true values and expected to be unbiased.
   The column “Sim II” shows that using data contaminated with a non-stationary trade
policy shock does not affect the performance of the method. Note that we did not specify
the nature of the aggregate shock in the estimation procedure. The method introduced
herein does not require strong distributional assumptions about the aggregate shocks. The CCP
method and other maximum likelihood based methods require the aggregate shocks to be
fully specified and to be stationary.

  12 We assume that moving costs are constant over time for cosmetic reasons, so that the results are easy
to read.
  13 Note that wages show some fluctuations over time; for that reason we added an iid normal shock to
equilibrium wages with standard deviation equal to 0.05, approximately equal to the standard error of
average wages in the data. We also added an unexpected surprise shock to the wages with a standard
deviation equal to 0.05.
       The column “Sim III” presents the results for the longer time series with 100 years. It
suggests that the method has plausible asymptotic properties, i.e., standard errors decrease
as we increase the length of the time series, and the estimates converge to the true parameter
values.
       Finally, column “Sim IV” shows that as the number of choices decreases, the standard
errors increase in both stages.
       In the following tables, 5 and 6, we compare results of PPML and CCP based estimation
strategies. We use “Simulation I” data with 20,000 agents, then we repeat the exercise with
2,000 and 4,000 agents to demonstrate small sample properties of the estimators.
       Table 5 shows estimated values of $\Lambda^i$, with standard errors in parentheses, using PPML and
CCP methods with 2,000 agents, with 4,000 agents, with 20,000 agents and with infinitely
many agents14. We use equal weights in the non-parametric stage of CCP estimation. The
last two columns show that both PPML and CCP methods converge to the true values;
therefore they are asymptotically equivalent. With a finite number of agents, PPML estimates
are closer to the true values, especially when the sample size is small. However, we do
not argue that PPML is more efficient than CCP, because the non-parametric stage of
CCP estimation could be conducted with different weights and we cannot try all possible
weighting vectors. When the number of choices is large, PPML is more convenient than
non-parametric CCP estimation since it does not rely on taking logarithms of probabilities
that can be very close to zero.
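To see why small samples produce sparse transition matrices, the sketch below draws a hypothetical sample of 2,000 agents over 16 sectors. The staying probability, the uniform choice among other sectors and the function names are illustrative assumptions, not the paper's simulation design.

```python
import random

def simulate_counts(L_total, n, stay_prob, seed=0):
    """Spread L_total agents evenly over n origin sectors; each agent stays with
    probability stay_prob and otherwise moves to a uniformly chosen other sector."""
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    for agent in range(L_total):
        i = agent % n
        if rng.random() < stay_prob:
            j = i
        else:
            j = rng.choice([k for k in range(n) if k != i])
        counts[i][j] += 1
    return counts

def zero_cells(counts):
    """Number of origin-destination pairs with no observed movers."""
    return sum(c == 0 for row in counts for c in row)

small = simulate_counts(2000, 16, stay_prob=0.95)
n_empty = zero_cells(small)
```

With 125 agents per origin sector and 15 possible destinations, many off-diagonal cells are expected to be empty. The logarithm of an empirical choice probability is undefined for those cells, while a zero count is an ordinary observation in a Poisson regression.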
       Table 6 presents the estimation results (and standard errors in parentheses) for $C/\nu$ and
$1/\nu$ using different estimation methods. When we use a maximum likelihood based method,
we assume that the econometrician knows the distribution of aggregate shocks to wages in
order to use maximum likelihood estimation. We also assume that the econometrician knows
the true values of the $\eta$’s, because otherwise repeating the ML estimation procedure 300 times takes an
  14 The econometrician can observe true switching probabilities $m_t^{ij}$ when there are infinitely many agents.
But there may still be uncertainty due to the aggregate shocks to $w_t^i$.


unreasonably long time.
       PPML1 is the method described in the previous section, which is the two stage procedure
with PPML estimation in the first stage and linear regression in the second stage. The PPML2
method uses PPML to impute expected values within a maximum likelihood estimation
algorithm. It is similar to CCP, but rather than imputing expected values non-parametrically we
use PPML. The ACM method is the estimator used in Artuc, Chaudhuri and McLaren (2010)15.
The CCP method is the conditional choice probability method that uses maximum likelihood
and non-parametric estimation of expected values with the Hotz-Miller inversion equation. In
the CCP and PPML2 methods, we assume that the econometrician knows the distribution
of aggregate shocks, since it is needed for the maximum likelihood estimation. Table 6 shows
that all four methods perform well with a large sample. However, the CCP method did not
converge when the sample size was small (L = 2,000). PPML based methods seem to perform
better when the sample size is small. In addition, PPML1 has an important advantage over ML-based
methods since it does not require distributional assumptions about the aggregate shocks.
       In the next Monte-Carlo exercise, we shut down the aggregate shocks to wages. Without
aggregate shocks, it is straightforward to use iterative methods pioneered by Rust (1987).
Iterative estimation methods are out of the scope of this paper but the “Nested Pseudo-Maximum
Likelihood” method introduced by Aguirregabiria and Mira (2002) is relevant and
important. They showed that it is possible to use an iterative step to improve the performance of
CCP with small samples. Without aggregate shocks, we are able to compare the performance
of the NPM with PPML methods.
       Table 6 presents results of Monte-Carlo simulations with PPML1, PPML2, ACM, CCP
and NPM (standard errors are in parentheses). With all five methods, estimates converge
to the true parameter values as the sample size increases. NPM indeed improves the CCP
results for small samples; however, it is very difficult to implement when there are aggregate
  15 Unlike Artuc, Chaudhuri and McLaren (2010), we have zero cells in the transition matrix where $m_t^{ij} = 0$
for some i, j, t because we have 16 choices rather than 6. Different from them, we drop the observation when
$m_t^{ij} = 0$, which makes the ACM estimator biased for this particular exercise.




shocks. CCP cannot be used as a starting point for the NPM algorithm when the sample
size is very small (when we simulated 2,000 agents the CCP method did not converge to
finite numbers). Naturally, it is possible to use PPML estimates as a starting point for the
NPM algorithm. The row labeled “PPML-NPM” shows estimates of a variation of the NPM
method that uses PPML as a starting point rather than CCP. The extra NPM loop after
the PPML regression reduces the standard errors, but it is difficult to implement when there
are aggregate shocks.



5    Conclusion
We present a novel and computationally efficient method for estimating dynamic discrete
choice models with heterogeneity and time-varying resistance (i.e., moving cost) parameters.
The method performs well with a large number of choices, sparse decision transition
matrices (caused by small sample size) and aggregate shocks. All expectations of agents are
fully accounted for in the first step regression, which allows us to be agnostic about agents’
expectations and the distribution of aggregate shocks. Therefore the method can be used for
estimation out of steady state. Potential applications are migration, sectoral and
occupational labor mobility models with a large number of discrete choices, macroeconomic shocks
and limited heterogeneity.




   Appendix A: An Alternative Model with Wage Shocks
   The moving cost shock $\varepsilon_t^i$, in essence, is a utility shock. However, it is common in the
labor economics literature to consider wage shocks, rather than utility shocks, as the main
driving force behind labor mobility. Consider an alternative specification where $\varepsilon_{t-1}^i$ is a
wage shock that is revealed at the end of time t − 1 but affects the observed wage at time t,
rather than a utility shock. Assume that the econometrician observes $\bar{w}_t^i = w_t^i + \varepsilon_{t-1}^i$, but
not $w_t^i$. Then (10) cannot be used as the basis for the regression, since observed wages are
self-selected and $\bar{w}_t^i$ is a biased measure of the true underlying sectoral wage $w_t^i$.
   With wage shocks, there are not any changes in the PPML regression step, but the second
step has to be modified. The expected wage conditional on being in sector i is equal to

$$E_t\left[w_t^i + \varepsilon_{t-1}^i \mid i\right] = w_t^i - \nu\sum_{j=1}^{n} m_t^{ji}\log m_t^{ji},$$

where $m_t^{ji}$ is the ratio of agents who switch from j to i conditional on being in sector i in
period t,

$$m_t^{ji,s} = \frac{L_t^{j,s}\,m_t^{ji,s}}{\sum_{k=1}^{n} L_t^{k,s}\,m_t^{ki,s}}.$$

   Assume that

$$\mu_t^{i,s} = -\sum_{j=1}^{n} m_t^{ji,s}\log m_t^{ji,s},$$

then the wages in the second stage regression equation should be corrected using $\mu_t^{i,s}$.
Derivations of the equations are provided in Appendix B.3.
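The correction term can be computed from sector sizes and flow probabilities as in the sketch below. It assumes one particular reading of the notation: the ratio defined above is treated as an origin share (written `share[j][i]` here), built from flow probabilities `m[j][i]` of moving from j to i, and $\mu$ is evaluated on that share as displayed. The function names and the two-sector numbers are hypothetical.

```python
import math

def origin_shares(L, m):
    """share[j][i] = L[j] * m[j][i] / sum_k L[k] * m[k][i]: the fraction of workers
    now in sector i who came from sector j (m[j][i] is the j-to-i flow probability)."""
    n = len(L)
    inflow = [sum(L[k] * m[k][i] for k in range(n)) for i in range(n)]
    return [[L[j] * m[j][i] / inflow[i] for i in range(n)] for j in range(n)]

def mu(share):
    """mu[i] = -sum_j share[j][i] * log(share[j][i]), skipping empty cells."""
    n = len(share)
    return [-sum(share[j][i] * math.log(share[j][i])
                 for j in range(n) if share[j][i] > 0.0)
            for i in range(n)]

# Hypothetical two-sector example.
L = [100.0, 300.0]
m = [[0.9, 0.1],
     [0.2, 0.8]]
share = origin_shares(L, m)
mu_i = mu(share)
```

In this example 60 percent of the workers who end up in the first sector were already there, so `share[0][0]` is 0.6; the shares for each destination sum to one.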

   Appendix B: Derivation of Key Equations
   As noted in the main text, the cdf for the extreme value type I distribution with location
parameter $-\nu\gamma$ and scale parameter $\nu$ is:

$$F(\varepsilon) = \exp(-\exp(-\varepsilon/\nu - \gamma)),$$

where $E(\varepsilon) = 0$, $Var(\varepsilon) = \pi^2\nu^2/6$ and $\gamma$ is Euler’s constant ($\gamma \cong 0.577$). Then, the pdf is:

$$f(\varepsilon) = (1/\nu)\exp(-\varepsilon/\nu - \gamma - \exp(-\varepsilon/\nu - \gamma)).$$
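The stated mean and variance can be checked numerically. The sketch below integrates the pdf with a simple midpoint rule; the integration bounds, step count and function names are arbitrary choices made for illustration.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant

def cdf(eps, nu):
    return math.exp(-math.exp(-eps / nu - GAMMA))

def pdf(eps, nu):
    u = -eps / nu - GAMMA
    return math.exp(u - math.exp(u)) / nu

def moments(nu, lo=-15.0, hi=60.0, steps=150000):
    """Mean and variance by the midpoint rule on [lo * nu, hi * nu]."""
    h = (hi - lo) * nu / steps
    mean = second = 0.0
    for s in range(steps):
        e = lo * nu + (s + 0.5) * h
        weight = pdf(e, nu) * h
        mean += e * weight
        second += e * e * weight
    return mean, second - mean ** 2

mean, var = moments(nu=2.0)
```

For $\nu = 2$ the computed mean should be close to zero and the variance close to $\pi^2 \cdot 4/6$.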



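   B.1 Gross Flow Function
   We are dropping the state superscript s and time subscript t for notational convenience.
Define

$$\varepsilon^{ij} = [\beta V^j - \beta V^i] - C^{ij}.$$

   The gross flow function, $m^{ij}$, is equal to the probability that a given sector i worker will
switch to sector j, that is, the probability that a sector i worker has higher utility in sector
j in the next period. This probability is

$$m^{ij} = \Pr\left(\varepsilon^{ij} + \varepsilon^j \geq \varepsilon^{ik} + \varepsilon^k \text{ for } k = 1, \ldots, n\right),$$

   which can be written as

$$m^{ij} = \int_{-\infty}^{\infty} f(\varepsilon^j)\prod_{k\neq j} F(\varepsilon^j + \varepsilon^{ij} - \varepsilon^{ik})\,d\varepsilon^j.$$

   Thanks to the extreme value distribution and McFadden (1973), the gross flow $m_t^{ij,s}$ can
be written as

$$m_t^{ij} = \frac{\exp\left(\left[\beta V_{t+1}^j - \beta V_{t+1}^i - C_t^{ij}\right]\frac{1}{\nu}\right)}{\sum_{k=1}^{N}\exp\left(\left[\beta V_{t+1}^k - \beta V_{t+1}^i - C_t^{ik}\right]\frac{1}{\nu}\right)},$$

   which is equal to

$$m_t^{ij} = \frac{\exp\left(\varepsilon_t^{ij}/\nu\right)}{\sum_{k=1}^{N}\exp\left(\varepsilon_t^{ik}/\nu\right)}.$$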
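The logit form of the gross flows is straightforward to compute. The sketch below uses hypothetical next-period values and a common moving cost; the function name `gross_flows` and all numbers are assumptions for illustration.

```python
import math

def gross_flows(V_next, C, beta, nu):
    """m[i][j] = exp(eps_ij / nu) / sum_k exp(eps_ik / nu),
    with eps_ij = beta * (V_next[j] - V_next[i]) - C[i][j]."""
    n = len(V_next)
    m = []
    for i in range(n):
        eps = [beta * (V_next[j] - V_next[i]) - C[i][j] for j in range(n)]
        top = max(eps)                                  # guards against overflow
        weights = [math.exp((e - top) / nu) for e in eps]
        total = sum(weights)
        m.append([wgt / total for wgt in weights])
    return m

# Hypothetical three-sector example with moving cost c = 4.5.
V_next = [10.0, 10.5, 9.0]
c = 4.5
C = [[0.0 if i == j else c for j in range(3)] for i in range(3)]
m = gross_flows(V_next, C, beta=0.97, nu=1.0)
```

Each row of `m` sums to one, and the log odds of moving from sector 1 to sector 2 relative to staying equal $\varepsilon^{12}/\nu = 0.97 \times 0.5 - 4.5 = -4.015$.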



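   B.2 Option Values
   We follow the steps in Artuc, Chaudhuri and McLaren (2010). Define, for convenience,
$x = \varepsilon^j/\nu + \gamma$ and $z = \log\left(\frac{\sum_{k=1}^{n}\exp(z^k/\nu)}{\exp(z^j/\nu)}\right)$, where $z^k$ is shorthand for $\varepsilon^{ik}$.
   Now, define:

$$\begin{aligned}
\Phi^{ij} &\equiv \int_{-\infty}^{\infty}\varepsilon^j f(\varepsilon^j)\prod_{k\neq j}F(\varepsilon^j + \varepsilon^{ij} - \varepsilon^{ik})\,d\varepsilon^j \\
&= \frac{1}{\nu}\int \varepsilon^j \exp(-\varepsilon^j/\nu - \gamma - \exp(-\varepsilon^j/\nu - \gamma))\prod_{k\neq j}\exp(-\exp(-[\varepsilon^j + \varepsilon^{ij} - \varepsilon^{ik}]/\nu - \gamma))\,d\varepsilon^j
\end{aligned}$$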

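   Then,

$$\begin{aligned}
\Phi^{ij} &= \frac{1}{\nu}\int \varepsilon^j \exp(-\varepsilon^j/\nu - \gamma - \exp(-\varepsilon^j/\nu - \gamma))\exp\Big(-\sum_{k\neq j}\exp(-[\varepsilon^j + \varepsilon^{ij} - \varepsilon^{ik}]/\nu - \gamma)\Big)\,d\varepsilon^j \\
&= \frac{1}{\nu}\int \varepsilon^j \exp(-\varepsilon^j/\nu - \gamma)\exp\Big(-\sum_{k=1}^{n}\exp(-[\varepsilon^j + \varepsilon^{ij} - \varepsilon^{ik}]/\nu - \gamma)\Big)\,d\varepsilon^j \\
&= \frac{1}{\nu}\int \varepsilon^j \exp\Big((-\varepsilon^j/\nu - \gamma) - \sum_{k=1}^{n}\exp(-[\varepsilon^j + \varepsilon^{ij} - \varepsilon^{ik}]/\nu - \gamma)\Big)\,d\varepsilon^j \\
&= \frac{1}{\nu}\int \varepsilon^j \exp\Big((-\varepsilon^j/\nu - \gamma) - \exp(-\varepsilon^j/\nu - \gamma)\sum_{k=1}^{n}\exp(-[z^j - z^k]/\nu)\Big)\,d\varepsilon^j \\
&= \frac{1}{\nu}\int \varepsilon^j \exp\Big((-\varepsilon^j/\nu - \gamma) - \exp(-\varepsilon^j/\nu - \gamma)\sum_{k=1}^{n}\exp(z^k/\nu)/\exp(z^j/\nu)\Big)\,d\varepsilon^j
\end{aligned}$$

   Note that $x = \varepsilon^j/\nu + \gamma$ and $z = \log\left(\frac{\sum_{k=1}^{n}\exp(z^k/\nu)}{\exp(z^j/\nu)}\right)$. Then,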



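$$\begin{aligned}
\Phi^{ij} &= \int \varepsilon^j \exp(-x - \exp(-(x - z)))\,dx \\
&= \int \nu(x - \gamma)\exp(-x - \exp(-(x - z)))\,dx \\
&= (-\nu\gamma)\exp(-z) + \nu\int x\exp(-x - \exp(-(x - z)))\,dx \\
&= (-\nu\gamma)\exp(-z) + \nu\exp(-z)\int x\exp(-x + z - \exp(-(x - z)))\,dx
\end{aligned}$$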

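   We know that $\exp(-z) = m^{ij}$ from McFadden (1973). Substituting this in:

$$\begin{aligned}
\Phi^{ij} &= (-\nu\gamma)m^{ij} + \nu m^{ij}\int x\exp(-x + z - \exp(-(x - z)))\,dx \\
&= (-\nu\gamma)m^{ij} + \nu m^{ij}\int x\exp(-x + z - \exp(-(x - z)))\,dx \\
&\quad + \nu m^{ij}\int z\exp(-x + z - \exp(-(x - z)))\,dx \\
&\quad - \nu m^{ij}\int z\exp(-x + z - \exp(-(x - z)))\,dx
\end{aligned}$$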




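Then we set $y = x - z$, thus

$$\begin{aligned}
\Phi^{ij} &= (-\nu\gamma)m^{ij} + \nu m^{ij}\int (x - z)\exp(-x + z - \exp(-(x - z)))\,dx \\
&\quad + \nu m^{ij}\int z\exp(-x + z - \exp(-(x - z)))\,dx \\
&= (-\nu\gamma)m^{ij} + \nu m^{ij}\int y\exp(-y - \exp(-y))\,dy + \nu z m^{ij}\int \exp(-y - \exp(-y))\,dy \\
&= (-\nu\gamma)m^{ij} + \nu m^{ij}\int y\exp(-y - \exp(-y))\,dy + \nu z m^{ij}.
\end{aligned}$$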



   Noting that $\int y\exp(-y - \exp(-y))\,dy = \gamma$ (Euler’s constant), we can simplify:

$$\begin{aligned}
\Phi^{ij} &= (-\nu\gamma)m^{ij} + \nu z m^{ij} + \nu\gamma m^{ij} \\
&= -\nu\log(m^{ij})\,m^{ij}
\end{aligned}$$
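The closed form can be compared with a direct numerical evaluation of the defining integral. The sketch below is a sanity check under assumptions: the midpoint rule, the integration bounds, the function names and the example values of $\varepsilon^{ik}$ and $\nu$ are all illustrative choices.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant

def F(e, nu):
    return math.exp(-math.exp(-e / nu - GAMMA))

def f(e, nu):
    u = -e / nu - GAMMA
    return math.exp(u - math.exp(u)) / nu

def phi_numeric(eps_bar, j, nu, lo=-15.0, hi=60.0, steps=100000):
    """Midpoint-rule value of  int e f(e) prod_{k != j} F(e + eps_ij - eps_ik) de."""
    h = (hi - lo) * nu / steps
    total = 0.0
    for s in range(steps):
        e = lo * nu + (s + 0.5) * h
        term = e * f(e, nu)
        for k, eps_k in enumerate(eps_bar):
            if k != j:
                term *= F(e + eps_bar[j] - eps_k, nu)
        total += term * h
    return total

def phi_closed(eps_bar, j, nu):
    """-nu * m_ij * log(m_ij) with the logit probability m_ij."""
    denom = sum(math.exp(e / nu) for e in eps_bar)
    m = math.exp(eps_bar[j] / nu) / denom
    return -nu * m * math.log(m)

eps_bar = [0.0, -1.0, -2.5]     # hypothetical eps_ik for a fixed origin i
numeric = phi_numeric(eps_bar, j=1, nu=1.5)
closed = phi_closed(eps_bar, j=1, nu=1.5)
```

The two values should agree up to the integration error, which supports the simplification step by step without relying on the change of variables.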

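   Then we can add this across possible destinations j. Note that the utility of a worker in
i is equal to:

$$\begin{aligned}
V_t^i &= u_t^i + \sum_{j=1}^{n}\left(\Phi_t^{ij} - m_t^{ij}C_t^{ij} + \beta m_t^{ij}V_{t+1}^j\right) \\
&= u_t^i + \sum_{j=1}^{n} m_t^{ij}\left(-\nu\log(m_t^{ij}) - C_t^{ij} + \beta V_{t+1}^j\right) \\
&= u_t^i + \sum_{j=1}^{n} m_t^{ij}\left(-\nu\log(m_t^{ij}) - C_t^{ij} + \beta(V_{t+1}^j - V_{t+1}^i) + \beta V_{t+1}^i\right) \\
&= u_t^i + \sum_{j=1}^{n} m_t^{ij}\left(\varepsilon_t^{ij} - \nu\log(m_t^{ij})\right) + \beta V_{t+1}^i.
\end{aligned}$$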




                                                          25
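   Now, recall from above that $\log(m_t^{ij}) = \varepsilon_t^{ij}/\nu - \log\left(\sum_{k=1}^{n}\exp(\varepsilon_t^{ik}/\nu)\right)$. This yields:

\[
\begin{aligned}
V_t^i &= u_t^i + \sum_{j=1}^{n} m_t^{ij}\,\nu\log\left(\sum_{k=1}^{n}\exp(\varepsilon_t^{ik}/\nu)\right) + \beta V_{t+1}^{i} \\
      &= u_t^i + \nu\log\left(\sum_{k=1}^{n}\exp(\varepsilon_t^{ik}/\nu)\right) + \beta V_{t+1}^{i}.
\end{aligned}
\]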


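   The log-sum term is the option value $\Omega_t^i$. Setting $j = i$ in the expression for $\log(m_t^{ij})$, and noting that $\varepsilon_t^{ii} = 0$ when staying in sector $i$ is costless, the option value can be written as

\[
\Omega_t^i = -\nu \log m_t^{ii}.
\]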
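   The last two steps of the derivation can be checked numerically. The sketch below is only an illustration: the function names, the random draws for $\varepsilon_t^{ij}$, and the value $\nu = 0.8$ are assumptions made here, not taken from the paper. It builds logit transition probabilities from arbitrary $\varepsilon_t^{ij}$ with $\varepsilon_t^{ii} = 0$ and confirms that the probability-weighted sum collapses to the log-sum term, which in turn equals $-\nu \log m_t^{ii}$.

```python
import numpy as np

def transition_probs(eps, nu):
    """Logit probabilities m[i, j] = exp(eps[i, j]/nu) / sum_k exp(eps[i, k]/nu)."""
    z = eps / nu
    z = z - z.max(axis=1, keepdims=True)      # stabilise the exponentials
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)

def log_sum_term(eps, nu):
    """nu * log sum_k exp(eps[i, k]/nu), one value per origin sector i."""
    z = eps / nu
    zmax = z.max(axis=1)
    return nu * (zmax + np.log(np.exp(z - zmax[:, None]).sum(axis=1)))

rng = np.random.default_rng(0)
n, nu = 5, 0.8                                 # illustrative sizes, not from the paper
eps = rng.normal(size=(n, n))                  # stand-in for eps_t^{ij}
np.fill_diagonal(eps, 0.0)                     # staying put is costless: eps_t^{ii} = 0

m = transition_probs(eps, nu)
weighted_sum = (m * (eps - nu * np.log(m))).sum(axis=1)   # sum_j m^{ij}(eps^{ij} - nu log m^{ij})
option_value = -nu * np.log(np.diag(m))                   # -nu log m^{ii}

print(np.allclose(weighted_sum, log_sum_term(eps, nu)))
print(np.allclose(option_value, log_sum_term(eps, nu)))
```

   Both comparisons hold for any draw of `eps` with a zero diagonal, because the term in brackets does not depend on $j$ once the logit formula for $m_t^{ij}$ is substituted.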


   B.3 Wage Shocks
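   Let $d_t$ denote the agent's choice at time $t$. The expected value of $\varepsilon^j$, conditional on a sector $i$ agent choosing sector $j$, is equal to

\[
\begin{aligned}
E\left[\varepsilon_t^{j} \mid d_t = i,\ d_{t+1} = j\right]
  &= \frac{\int_{-\infty}^{\infty} \varepsilon^{j} f(\varepsilon^{j}) \prod_{k \neq j} F\left(\varepsilon^{j} + \varepsilon_t^{ij} - \varepsilon_t^{ik}\right) d\varepsilon^{j}}
          {\int_{-\infty}^{\infty} f(\varepsilon^{j}) \prod_{k \neq j} F\left(\varepsilon^{j} + \varepsilon_t^{ij} - \varepsilon_t^{ik}\right) d\varepsilon^{j}} \\
  &= \frac{\Phi_t^{ij}}{m_t^{ij}} \\
  &= -\nu \log m_t^{ij}.
\end{aligned}
\]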
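   A small Monte Carlo experiment illustrates the closed form $-\nu \log m_t^{ij}$. The sketch assumes that the idiosyncratic shocks are i.i.d. type I extreme value with scale $\nu$ and mean zero; the values in `v` (standing in for $\varepsilon_t^{ij}$ at a fixed origin $i$), the number of draws, and the function name are hypothetical choices made for illustration.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def conditional_shock_mean(v, nu, n_draws=400_000, seed=0):
    """Simulated mean of the chosen alternative's shock, by chosen alternative."""
    rng = np.random.default_rng(seed)
    # A Gumbel with location -nu*gamma and scale nu has mean zero.
    shocks = rng.gumbel(loc=-nu * EULER_GAMMA, scale=nu, size=(n_draws, len(v)))
    choice = np.argmax(v + shocks, axis=1)
    return np.array([shocks[choice == j, j].mean() for j in range(len(v))])

nu = 1.0
v = np.array([0.0, -1.0, 0.5])                 # hypothetical eps_t^{ij} for one origin i
m = np.exp(v / nu) / np.exp(v / nu).sum()      # logit choice probabilities m_t^{ij}

simulated = conditional_shock_mean(v, nu)
closed_form = -nu * np.log(m)
print(np.round(simulated, 2), np.round(closed_form, 2))
```

   The simulated and closed-form values should agree up to simulation noise; rarely chosen alternatives have larger conditional shocks because only agents with unusually favorable draws select them.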



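   Adding these conditional expectations across possible origins, weighted by origin shares, we find

\[
E_t\left[\varepsilon_{t-1}^{j} \mid d_t = j\right] = -\nu \sum_{i=1}^{n} \tilde{m}_t^{ij} \log m_t^{ij},
\]

   where $\tilde{m}_t^{ij}$ is the probability that a sector $j$ agent originated from sector $i$:

\[
\tilde{m}_t^{ij} = \frac{L_t^{i}\, m_t^{ij}}{\sum_{k=1}^{n} L_t^{k}\, m_t^{kj}}.
\]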
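   The origin-share weighting can be computed directly from sector sizes and the transition matrix. In the sketch below the labor allocations `L`, the transition matrix `m`, and the function names are made-up inputs for illustration only; the two functions implement the origin-share ratio and the weighted sum $-\nu \sum_i (\text{origin share})^{ij} \log m_t^{ij}$ described above.

```python
import numpy as np

def origin_shares(L, m):
    """Share of sector j's arrivals that come from sector i: L[i]*m[i, j] / sum_k L[k]*m[k, j]."""
    flows = L[:, None] * m                      # flows[i, j] = workers moving from i to j
    return flows / flows.sum(axis=0, keepdims=True)

def expected_shock_by_destination(L, m, nu):
    """-nu * sum_i share[i, j] * log m[i, j], one value per destination sector j."""
    return -nu * (origin_shares(L, m) * np.log(m)).sum(axis=0)

# Hypothetical three-sector example.
L = np.array([100.0, 300.0, 600.0])
m = np.array([[0.90, 0.06, 0.04],
              [0.05, 0.90, 0.05],
              [0.02, 0.03, 0.95]])

shares = origin_shares(L, m)
shock = expected_shock_by_destination(L, m, nu=1.0)
print(shares.sum(axis=0))                       # each column sums to one
print(shock)
```

   Each column of `shares` is a probability distribution over origins, so the result is a weighted average of the origin-specific terms $-\nu \log m_t^{ij}$.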




      Appendix C: CCP Representation of the Model
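      Following the steps in Appendix B.2 and using (3), the Bellman equation can be written as

\[
\begin{aligned}
V_t^{i,s} &= u_t^{i,s} + \max_{j}\left\{\beta V_{t+1}^{j,s} - C_t^{ij,s} - \varepsilon_t^{j}\right\} \\
  &= u_t^{i,s} + \sum_{j} m_t^{ij,s}\left[\beta V_{t+1}^{j,s} - C_t^{ij,s} + \frac{\Phi_t^{ij,s}}{m_t^{ij,s}}\right] \\
  &= u_t^{i,s} + \sum_{j} m_t^{ij,s}\left[-\nu\log(m_t^{ij,s}) - C_t^{ij,s} + \beta\left(V_{t+1}^{j,s} - V_{t+1}^{k,s}\right)\right] + \beta V_{t+1}^{k,s} \\
  &= u_t^{i,s} + \sum_{j} m_t^{ij,s}\left[C_t^{ij,s} - \beta\left(V_{t+1}^{j,s} - V_{t+1}^{i,s}\right) + \nu\log\left(\sum_{n=1}^{N}\exp\left(\bar{\varepsilon}_t^{in}/\nu\right)\right) - C_t^{ij,s} + \beta\left(V_{t+1}^{j,s} - V_{t+1}^{k,s}\right)\right] + \beta V_{t+1}^{k,s} \\
  &= u_t^{i,s} + \sum_{j} m_t^{ij,s}\left[\beta V_{t+1}^{i,s} + \nu\log\left(\sum_{n=1}^{N}\exp\left(\bar{\varepsilon}_t^{in}/\nu\right)\right) - \beta V_{t+1}^{k,s}\right] + \beta V_{t+1}^{k,s} \\
  &= u_t^{i,s} + \sum_{j} m_t^{ij,s}\left[\beta V_{t+1}^{i,s} + \nu\log\left(\sum_{n=1}^{N}\exp\left(\bar{\varepsilon}_t^{in}/\nu\right)\right) - \beta V_{t+1}^{k,s} + C_t^{ik,s}\right] + \beta V_{t+1}^{k,s} - C_t^{ik,s} \\
  &= u_t^{i,s} + \beta V_{t+1}^{k,s} - C_t^{ik,s} - \nu\log m_t^{ik,s}.
\end{aligned}
\]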



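      Then

\[
\begin{aligned}
V_t^{j,s} - V_t^{i,s} &= u_t^{j,s} + \beta V_{t+1}^{k,s} - C_t^{jk,s} - \nu\log m_t^{jk,s} - u_t^{i,s} - \beta V_{t+1}^{k,s} + C_t^{ik,s} + \nu\log m_t^{ik,s} \\
  &= \left(u_t^{j,s} - u_t^{i,s}\right) - C_t^{jk,s} - \nu\log m_t^{jk,s} + C_t^{ik,s} + \nu\log m_t^{ik,s}.
\end{aligned}
\]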


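      Since $k$ can be any sector, we can add the expression over all possible sectors, using weights that sum to one, to increase precision:

\[
V_t^{j,s} - V_t^{i,s} = \left(u_t^{j,s} - u_t^{i,s}\right) + \sum_{k=1}^{N} x_t^{ij,k}\left[-C_t^{jk,s} - \nu\log m_t^{jk,s} + C_t^{ik,s} + \nu\log m_t^{ik,s}\right], \tag{18}
\]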



     where $x_t^{ij,k}$ is an arbitrary weighting vector such that $\sum_k x_t^{ij,k} = 1$.
     Then (18) is the CCP representation of the model. Note that $u_t^{i,s} = w_t^{i,s} + \eta^{i,s}$, so the CCP representation requires guessed values of $\eta^{i,s}$. Therefore, all parameters must be estimated at once if we use CCP with maximum likelihood, which makes CCP computationally demanding when the number of choices and structural parameters is large. CCP may also be difficult to use when many of the observed conditional choice probabilities are close to zero.
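     To make (18) concrete, the sketch below computes the value differences from choice probabilities and checks them against values built directly from the one-step Bellman expression in Appendix B.2. Everything in it is an illustrative assumption: the state index $s$ is dropped, the weights are the same for every pair $(i, j)$, and the utilities, costs, next-period values, $\beta$, and $\nu$ are random or made-up numbers rather than estimates from the paper.

```python
import numpy as np

def value_difference(u, C, m, nu, x):
    """Right-hand side of (18): D[i, j] = V_t^j - V_t^i from choice probabilities.

    u: (N,) flow utilities, C: (N, N) moving costs, m: (N, N) choice
    probabilities, x: (N,) weights over the reference sector k, summing to one.
    """
    a = -C - nu * np.log(m)                     # a[i, k] = -C^{ik} - nu log m^{ik}
    term = a @ x                                # weighted sum over k for each origin
    return (u[None, :] - u[:, None]) + (term[None, :] - term[:, None])

rng = np.random.default_rng(0)
N, nu, beta = 4, 0.7, 0.95                      # illustrative values
u = rng.normal(size=N)
V_next = rng.normal(size=N)                     # stand-in for V_{t+1}
C = rng.uniform(0.5, 2.0, size=(N, N))
np.fill_diagonal(C, 0.0)

eps_bar = beta * (V_next[None, :] - V_next[:, None]) - C
m = np.exp(eps_bar / nu)
m = m / m.sum(axis=1, keepdims=True)            # logit choice probabilities

# Values implied by V_t^i = u^i + nu log sum_k exp(eps_bar^{ik}/nu) + beta V_{t+1}^i.
V = u + nu * np.log(np.exp(eps_bar / nu).sum(axis=1)) + beta * V_next
direct = V[None, :] - V[:, None]

x_uniform = np.full(N, 1.0 / N)                 # equal weights on every k
x_single = np.eye(N)[0]                         # all weight on k = 0
print(np.allclose(value_difference(u, C, m, nu, x_uniform), direct))
print(np.allclose(value_difference(u, C, m, nu, x_single), direct))
```

     With exact probabilities every weighting gives the same answer. The choice of weights matters only when the probabilities are estimated with noise, which is where averaging over $k$ can improve precision.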
     Appendix D: EM loop within PPML regression
     It is possible to incorporate an Expectation-Maximization (EM) algorithm into the first step of our estimation procedure. For notational convenience, we consider the case where agents' current and last two sectors are observed in the data; it is straightforward to generalize this procedure to panels with longer time dimensions.
     Assume that we observe each agent's decisions at times $t$ and $t+1$, and denote the agent's location at time $t$ by $i$, at time $t+1$ by $j$, and at time $t+2$ by $k$. The two-period flow at time $t$ is denoted by $m_t^{ijk,s}$, and the number of workers who chose $i$, $j$, and $k$ consecutively is equal to $y_t^{ijk,s} = L_t^{i} m_t^{ijk,s}$. Each agent has an unobserved discrete type $\sigma \in \tilde{S}$; the observed states (or types) are still denoted by $s \in S$. We are interested in finding the share of type $\sigma$ workers in the observed flow $m_t^{ijk,s}$; let us denote this probability by $\zeta_t^{ijk,s,\sigma}$. Then the number of workers with state $(i, s, \sigma)$ who choose $j$ and then $k$ starting at time $t$ is equal to $\zeta_t^{ijk,s,\sigma} y_t^{ijk,s}$.
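     Note that

\[
\zeta_t^{ijk,s,\sigma}\, m_t^{ijk,s}
= \frac{\exp\left(\beta V_{t+1}^{j,s,\sigma} - \beta V_{t+1}^{i,s,\sigma} - C_t^{ij,s,\sigma}\right)^{1/\nu}}
       {\sum_{n=1}^{N}\exp\left(\beta V_{t+1}^{n,s,\sigma} - \beta V_{t+1}^{i,s,\sigma} - C_t^{in,s,\sigma}\right)^{1/\nu}}
\cdot
  \frac{\exp\left(\beta V_{t+2}^{k,s,\sigma} - \beta V_{t+2}^{j,s,\sigma} - C_{t+1}^{jk,s,\sigma}\right)^{1/\nu}}
       {\sum_{n=1}^{N}\exp\left(\beta V_{t+2}^{n,s,\sigma} - \beta V_{t+2}^{j,s,\sigma} - C_{t+1}^{jn,s,\sigma}\right)^{1/\nu}},
\]

which can be represented in log-linear format to construct a Poisson regression, as we show in the first step.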
       Assume that we have an initial guess for $\zeta_t^{ijk,s,\sigma}$ at time $t$, and denote this initial guess by $\zeta_t^{ijk,s,\sigma,(1)}$. Using this initial guess, we can run the Poisson regression in the first step:

\[
\log\left(\zeta_t^{ijk,s,\sigma,(1)}\, y_t^{ijk,s}\right) = \Delta_t^{i,s,\sigma,(1)} + \Gamma_t^{j,s,\sigma,(1)} + \Lambda_t^{k,s,\sigma,(1)} + \Psi_t^{ijk,s,\sigma,(1)} X_t^{ijk,s} + e_t^{ijk,s,\sigma}.
\]



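       Then the updated probability is the fitted flow of type $\sigma$ relative to the fitted flows of all types:

\[
\zeta_t^{ijk,s,\sigma,(2)} =
\frac{\exp\left(\Delta_t^{i,s,\sigma,(1)} + \Gamma_t^{j,s,\sigma,(1)} + \Lambda_t^{k,s,\sigma,(1)} + \Psi_t^{ijk,s,\sigma,(1)} X_t^{ijk,s}\right)}
     {\sum_{\sigma' \in \tilde{S}} \exp\left(\Delta_t^{i,s,\sigma',(1)} + \Gamma_t^{j,s,\sigma',(1)} + \Lambda_t^{k,s,\sigma',(1)} + \Psi_t^{ijk,s,\sigma',(1)} X_t^{ijk,s}\right)}.
\]

       Hence the guess at step $\tau + 1$ will be

\[
\zeta_t^{ijk,s,\sigma,(\tau+1)} =
\frac{\exp\left(\Delta_t^{i,s,\sigma,(\tau)} + \Gamma_t^{j,s,\sigma,(\tau)} + \Lambda_t^{k,s,\sigma,(\tau)} + \Psi_t^{ijk,s,\sigma,(\tau)} X_t^{ijk,s}\right)}
     {\sum_{\sigma' \in \tilde{S}} \exp\left(\Delta_t^{i,s,\sigma',(\tau)} + \Gamma_t^{j,s,\sigma',(\tau)} + \Lambda_t^{k,s,\sigma',(\tau)} + \Psi_t^{ijk,s,\sigma',(\tau)} X_t^{ijk,s}\right)}.
\]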

Our simulations confirm that $\zeta_t^{ijk,s,\sigma,(\tau+1)}$ converges to the true $\zeta_t^{ijk,s,\sigma}$ and that we obtain consistent estimates for the destination and origin fixed effects (see footnote 16).
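The mechanics of this loop can be sketched in a few lines. The code below is a toy illustration under strong assumptions and is not the paper's implementation: a hand-written iteratively reweighted least squares routine stands in for a PPML package, a constant and one made-up covariate replace the fixed effects and $X_t^{ijk,s}$, there are two hypothetical types, and the counts are simulated. It shows only how the type shares are updated from fitted values at each step, and it is not evidence about convergence to the true shares.

```python
import numpy as np

def ppml(X, y, n_iter=100, tol=1e-10):
    """Poisson pseudo-ML with a log link, fitted by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                  # assumes column 0 of X is a constant
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                 # working response
        XtW = X.T * mu                          # IRLS weights equal the fitted means
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def update_type_shares(X, y, zeta0, n_steps=25):
    """Repeat: regress zeta*y on X for each type, then reset zeta to the fitted shares."""
    zeta = zeta0.copy()
    for _ in range(n_steps):
        fitted = np.column_stack([
            np.exp(X @ ppml(X, zeta[:, s] * y)) for s in range(zeta.shape[1])
        ])
        zeta = fitted / fitted.sum(axis=1, keepdims=True)
    return zeta

# Simulated toy data: each cell's count mixes two latent types.
rng = np.random.default_rng(1)
n_cells = 300
X = np.column_stack([np.ones(n_cells), rng.normal(scale=0.5, size=n_cells)])
mean_counts = np.exp(X @ np.array([1.0, 0.5])) + np.exp(X @ np.array([2.0, -0.5]))
y = rng.poisson(mean_counts).astype(float)

first = rng.uniform(0.3, 0.7, size=n_cells)     # asymmetric initial guess
zeta0 = np.column_stack([first, 1.0 - first])
zeta = update_type_shares(X, y, zeta0)
print(zeta[:3])
```

The initial guess must differ across types; with identical starting shares both regressions coincide and the shares never move.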




  16 Results are available upon request.


References
 [1] Anderson, James (2010). “The Gravity Model,” The Annual Review of Economics, 3(1).

 [2] Aguirregabiria, Victor and Pedro Mira (2002). “Swapping the Nested Fixed Point
    Algorithm: A Class of Estimators for Discrete Markov Decision Models,” Econometrica,
    70(4).

 [3] Aguirregabiria, Victor and Pedro Mira (2010). “Dynamic Discrete Choice Models: A
    Survey,” Journal of Econometrics, 156.

 [4] Arcidiacono, Peter, and Robert Miller (2011). “CCP Estimation of Dynamic Discrete
    Choice Models with Unobserved Heterogeneity,” forthcoming in Econometrica.

 [5] Artuc, Erhan, Shubham Chaudhuri and John McLaren (2010). “Trade Shocks and Labor
    Adjustment: A Structural Empirical Approach,” American Economic Review, 100(3).

 [6] Artuc, Erhan, and John McLaren (2012). “Trade Policy and Wage Inequality: A
    Structural Analysis with Occupational and Sectoral Mobility,” NBER Working Paper: 18503.

 [7] Artuc, Erhan, German Bet, Irene Brambilla and Guido Porto (2013). “Trade Shocks,
    Firm Level Investment Inaction and Labor Market Responses,” Mimeo: World Bank.

 [8] Beine, Michel, Frederic Docquier and Caglar Ozden (2009). “Diasporas,” Journal of
    Development Economics, 95(1).

 [9] Cameron, Colin and Pravin Trivedi (1998). “Regression Analysis of Count Data,”
    Cambridge University Press, Cambridge.

[10] Gourieroux, C., A. Monfort, and A. Trognon (1984). “Pseudo maximum likelihood
    methods: Applications to Poisson models,” Econometrica, 52(3).

[11] Grogger, Jeffrey and Gordon Hanson (2011). “Income Maximisation and the Selection
    and Sorting of International Migrants,” Journal of Development Economics, 95(1).

[12] Hotz, V. Joseph, and Robert Miller (1993). “Conditional Choice Probabilities and the
    Estimation of Dynamic Models,” Review of Economic Studies, 60(3).

[13] Kaplan, D.S., D. Lederman and R. Robertson (2013). “Worker-Level Adjustment Costs
    in a Developing Country: Evidence from Mexico,” Mimeo: World Bank.

[14] Lipton, D., J. Poterba, J. Sachs, L. Summers (1982). “Multiple shooting in rational
    expectation models,” Econometrica, 50.

[15] Magnac, Thierry, and David Thesmar (2002). “Identifying Dynamic Discrete Decision
    Processes,” Econometrica, 70(2).

[16] McFadden, Daniel (1973). “Conditional Logit Analysis of Qualitative Choice Behavior,”
    in P. Zarembka (ed.) Frontiers in Econometrics, New York, Academic Press.

[17] Olivero, Maria and Yoto Yotov (2011). “Dynamic Gravity: Theory and Empirical
    Implications,” Canadian Journal of Economics, forthcoming.

[18] Rust, John (1987). “Optimal Replacement of GMC Bus Engines: An Empirical Model
    of Harold Zurcher,” Econometrica, 55(5).

[19] Santos Silva, Joao, and Silvana Tenreyro (2006). “The log of gravity,” Review of
    Economics and Statistics, 88(4).




Table 1: Regression Results (Basic Specification)

                 Moving Cost
                       Estim        SE
        Mean C_t/ν     4.671 **    (0.055)
        Max C_t/ν      4.884 **    (0.059)
        Min C_t/ν      4.488 **    (0.054)
        1/ν            0.959 **    (0.255)

               η/ν (Utility)
         Sector Estim        SE
         1       0.000       -
         2       -0.595 ** (0.162)
         3       -0.156 * (0.094)
         4       -0.231 * (0.125)
         5       -0.224 * (0.114)
         6       -0.258 ** (0.113)
         7       -0.601 ** (0.166)
         8       -0.420 ** (0.131)
         9       -0.297 ** (0.122)
         10      -0.063      (0.070)
         11      -0.486 ** (0.169)
         12      -0.224 * (0.101)
         13      -0.140 ** (0.052)
         14      -0.385 ** (0.080)
         15      -0.285 * (0.133)
         16      -0.295 ** (0.128)


           * significant at 5% level.
           ** significant at 1% level.




Table 2: Regression Results (Alternative Specification)

                    Moving Cost
                          Estim        SE
           Mean C_t/ν     4.671 **    (0.055)
           Max C_t/ν      4.884 **    (0.059)
           Min C_t/ν      4.488 **    (0.054)
           1/ν            3.672 **    (0.667)


                    η/ν (Utility)
                 Intercept             Trend
  Sector    Estim       SE       Estim     SE
  1         0.000       -        0.000     -
  2         -2.378 ** (0.455) 0.009        (0.008)
  3         -1.422 ** (0.314) 0.027 ** (0.009)
  4         -1.528 ** (0.340) 0.003        (0.008)
  5         -1.509 ** (0.330) 0.011        (0.008)
  6         -1.690 ** (0.359) 0.023 ** (0.009)
  7         -2.257 ** (0.425) -0.004       (0.008)
  8         -1.565 ** (0.313) -0.014 * (0.008)
  9         -1.712 ** (0.362) 0.014 * (0.008)
  10        -0.857 ** (0.224) 0.014 * (0.008)
  11        -2.057 ** (0.407) -0.013       (0.008)
  12        -1.234 ** (0.269) 0.001        (0.007)
  13        -0.727 ** (0.166) 0.017 * (0.008)
  14        -1.351 ** (0.239) 0.018 ** (0.008)
  15        -1.497 ** (0.329) -0.010       (0.008)
  16        -1.505 ** (0.331) -0.006       (0.007)


              * significant at 5% level.
              ** significant at 1% level.




             Table 3: Descriptive Statistics and Simulation Parameters


                 Descriptive Statistics                  Simulation Parameters
         Labor Allocation          Wage          Labor Share   Constant    CPI Share
Sector   Mean        SE       Mean       SE           a             Ã           b
  1       0.02      (0.00)      0.58    (0.03)      0.30           0.14        0.07
  2       0.02      (0.00)      1.19    (0.05)      0.30           0.23        0.00
  3       0.09      (0.01)      0.92    (0.06)      0.85           0.75        0.30
  4       0.17      (0.02)      1.04    (0.03)      0.57           0.86        0.20
  5       0.10      (0.01)      1.00    (0.03)      0.57           0.64        0.10
  6       0.06      (0.00)      1.00    (0.05)      0.49           0.50        0.06
  7       0.02      (0.00)      1.21    (0.06)      0.42           0.28        0.03
  8       0.03      (0.00)      1.07    (0.05)      0.49           0.34        0.04
  9       0.06      (0.00)      1.04    (0.04)      0.58           0.53        0.00
  10      0.11      (0.01)      0.81    (0.05)      0.58           0.54        0.00
  11      0.05      (0.00)      1.22    (0.09)      0.22           0.54        0.01
  12      0.05      (0.01)      0.95    (0.05)      0.68           0.53        0.05
  13      0.01      (0.00)      0.71    (0.04)      0.61           0.22        0.03
  14      0.01      (0.00)      0.85    (0.05)      0.60           0.22        0.06
  15      0.14      (0.01)      1.08    (0.05)      0.68           0.84        0.06
  16      0.07      (0.01)      1.06    (0.03)      0.82           0.81        0.00




                Table 4: Simulation Results: Estimation with PPML


                              Moving Cost (C/ν and 1/ν)
                             Sim I            Sim II           Sim III           Sim IV
              Actual    Estim     SE      Estim     SE      Estim     SE      Estim     SE
Mean C_t/ν     4.500    4.503  (0.023)    4.503  (0.022)    4.503  (0.022)    4.504  (0.037)
Max C_t/ν      4.500    4.507  (0.024)    4.507  (0.022)    4.507  (0.022)    4.509  (0.035)
Min C_t/ν      4.500    4.500  (0.021)    4.497  (0.021)    4.497  (0.021)    4.498  (0.036)
1/ν            1.000    0.995  (0.119)    0.993  (0.109)    0.999  (0.049)    1.010  (0.186)

                              Fixed Utility (η^i/ν)
                             Sim I            Sim II           Sim III           Sim IV
Sector        Actual    Estim     SE      Estim     SE      Estim     SE      Estim     SE
2              0.100    0.103  (0.017)    0.101  (0.016)    0.101  (0.007)    0.101  (0.020)
3              0.150    0.153  (0.020)    0.153  (0.017)    0.151  (0.008)    0.151  (0.024)
4              0.200    0.202  (0.019)    0.203  (0.017)    0.201  (0.007)    0.201  (0.025)
5              0.250    0.252  (0.016)    0.252  (0.016)    0.250  (0.007)    0.252  (0.019)
6              0.300    0.301  (0.020)    0.301  (0.017)    0.301  (0.008)    0.303  (0.023)
7              0.350    0.352  (0.027)    0.350  (0.026)    0.351  (0.011)    0.352  (0.039)
8              0.400    0.400  (0.029)    0.399  (0.027)    0.400  (0.012)    0.403  (0.040)
9              0.000    0.002  (0.027)    0.002  (0.025)    0.000  (0.013)      -        -
10            -0.100   -0.098  (0.036)   -0.097  (0.034)   -0.100  (0.015)      -        -
11            -0.150   -0.148  (0.040)   -0.147  (0.039)   -0.149  (0.018)      -        -
12            -0.200   -0.197  (0.040)   -0.196  (0.038)   -0.199  (0.017)      -        -
13            -0.250   -0.248  (0.022)   -0.249  (0.022)   -0.251  (0.009)      -        -
14            -0.300   -0.298  (0.024)   -0.301  (0.023)   -0.302  (0.012)      -        -
15            -0.350   -0.346  (0.064)   -0.345  (0.064)   -0.350  (0.029)      -        -
16            -0.400   -0.396  (0.059)   -0.395  (0.055)   -0.399  (0.025)      -        -




 Table 5: Simulation Results: Imputing Values with PPML and CCP

                      L = 4,000          L = 20,000           L→∞
        Actual    PPML        CCP     PPML       CCP     PPML      CCP
Λ2       0.395     0.391     0.578     0.400     0.421    0.395   0.395
        (0.142)   (0.305) (0.327)     (0.185) (0.244)    (0.142) (0.142)
Λ3       1.025     1.030     1.378     1.026     1.020    1.025   1.025
        (0.145)   (0.298) (0.345)     (0.178) (0.230)    (0.145) (0.145)
Λ4       1.518     1.511     1.919     1.519     1.501    1.518   1.518
        (0.179)   (0.295) (0.339)     (0.198) (0.249)    (0.179) (0.179)
Λ5       1.248     1.262     1.642     1.251     1.252    1.248   1.248
        (0.138)   (0.275) (0.323)     (0.166) (0.219)    (0.138) (0.138)
Λ6       1.106     1.114     1.473     1.112     1.122    1.106   1.106
        (0.168)   (0.276) (0.313)     (0.194) (0.235)    (0.168) (0.168)
Λ7       0.716     0.730     1.009     0.713     0.730    0.716   0.716
        (0.153)   (0.295) (0.318)     (0.173) (0.224)    (0.153) (0.153)
Λ8       0.895     0.908     1.218     0.903     0.908    0.895   0.895
        (0.153)   (0.271) (0.298)     (0.182) (0.234)    (0.153) (0.153)
Λ9       0.797     0.823     1.143     0.785     0.794    0.797   0.797
        (0.155)   (0.273) (0.322)     (0.176) (0.223)    (0.155) (0.155)
Λ10      0.703     0.711     0.987     0.702     0.707    0.703   0.703
        (0.141)   (0.282) (0.334)     (0.176) (0.229)    (0.141) (0.141)
Λ11      0.736     0.745     1.028     0.738     0.757    0.736   0.736
        (0.146)   (0.285) (0.343)     (0.179) (0.242)    (0.146) (0.146)
Λ12      0.368     0.395     0.562     0.378     0.400    0.368   0.368
        (0.136)   (0.270) (0.319)     (0.183) (0.249)    (0.136) (0.136)
Λ13      -0.405    -0.416    -0.651    -0.407   -0.431    -0.405  -0.405
        (0.127)   (0.344) (0.447)     (0.174) (0.259)    (0.127) (0.127)
Λ14      -0.433    -0.452    -0.685    -0.425   -0.424    -0.433  -0.433
        (0.122)   (0.343) (0.468)     (0.190) (0.252)    (0.122) (0.122)
Λ15      0.840     0.837     1.137     0.840     0.846    0.840   0.840
        (0.156)   (0.292) (0.341)     (0.182) (0.245)    (0.156) (0.156)
Λ16      0.273     0.266     0.390     0.281     0.285    0.273   0.273
        (0.147)   (0.301) (0.363)     (0.194) (0.242)    (0.147) (0.147)




Table 6: Simulation Results: Comparing Different Methods (with Aggregate Shocks)

        Sample Size          Method          C/ν                 1/ν
                                       Estim      SE       Estim      SE
             -               Actual    4.500       -       1.000       -
     L = 2,000, T = 25       PPML1     4.530    (0.015)    1.001    (0.024)
                             PPML2     4.515    (0.016)    1.006    (0.027)
                             ACM       4.217    (0.250)    0.908    (0.084)
     L = 4,000, T = 25       PPML1     4.515    (0.010)    1.000    (0.020)
                             PPML2     4.506    (0.010)    1.003    (0.021)
                             ACM       4.429    (0.179)    0.958    (0.060)
                             CCP       4.517    (0.011)    1.083    (0.038)
     L = 20,000, T = 25      PPML1     4.503    (0.005)    0.999    (0.014)
                             PPML2     4.500    (0.005)    1.003    (0.016)
                             ACM       4.560    (0.074)    1.001    (0.032)
                             CCP       4.506    (0.005)    0.998    (0.018)
     L → ∞, T = 25           PPML1     4.500    (0.000)    0.999    (0.012)
                             PPML2     4.498    (0.001)    1.003    (0.014)
                             ACM       4.500    (0.000)    0.999    (0.019)
                             CCP       4.498    (0.001)    1.003    (0.014)




Table 7: Simulation Results: Comparing Different Methods (without Aggregate Shocks)

        Sample Size          Method            C/ν                 1/ν
                                         Estim      SE       Estim      SE
             -               Actual      4.500       -       1.000       -
     L = 2,000, T = 25       PPML1       4.530    (0.015)    0.999    (0.023)
                             PPML2       4.515    (0.015)    1.007    (0.023)
                             ACM         4.248    (0.269)    0.912    (0.081)
                             PPML-NPM    4.501    (0.014)    1.013    (0.016)
     L = 4,000, T = 25       PPML1       4.515    (0.010)    0.999    (0.016)
                             PPML2       4.507    (0.010)    1.003    (0.016)
                             ACM         4.429    (0.177)    0.961    (0.056)
                             CCP         4.520    (0.011)    1.080    (0.031)
                             NPM         4.495    (0.010)    1.006    (0.015)
     L = 20,000, T = 25      PPML1       4.503    (0.005)    1.000    (0.007)
                             PPML2       4.500    (0.005)    1.001    (0.007)
                             ACM         4.559    (0.073)    1.006    (0.025)
                             CCP         4.505    (0.005)    0.996    (0.012)
                             NPM         4.499    (0.005)    1.015    (0.016)
     L → ∞, T = 25           PPML1       4.500    (0.000)    1.000    (0.000)
                             PPML2       4.500    (0.000)    1.000    (0.000)
                             ACM         4.500    (0.000)    1.000    (0.000)
                             CCP         4.500    (0.000)    1.000    (0.000)
                             NPM         4.500    (0.000)    1.000    (0.000)



