Research Observer

                                           EDITOR
                           Shantayanan Devarajan, World Bank


                                         CO-EDITOR
                                Gershon Feder, World Bank


                                      EDITORIAL BOARD
                           Susan Collins, Georgetown University
                           Angus Deaton, Princeton University
                   Barry Eichengreen, University of California-Berkeley
                             Emmanuel Jimenez, World Bank
                                Benno Ndulu, World Bank
                         Howard Pack, University of Pennsylvania
                                 Luis Serven, World Bank
                                Sudhir Shetty, World Bank
                               Michael Walton, World Bank

The World Bank Research Observer is intended for anyone who has a professional interest in
development. Observer articles are written to be accessible to nonspecialist readers;
contributors examine key issues in development economics, survey the literature and the
latest World Bank research, and debate issues of development policy. Articles are reviewed by
an editorial board drawn from across the Bank and the international community of
economists. Inconsistency with Bank policy is not grounds for rejection.

The journal welcomes editorial comments and responses, which will be considered for
publication to the extent that space permits. On occasion the Observer considers unsolicited
contributions. Any reader interested in preparing such an article is invited to submit a
proposal of not more than two pages to the Editor. Please direct all editorial
correspondence to the Editor, The World Bank Research Observer, 1818 H Street, NW,
Washington, DC 20433, USA.

The views and interpretations expressed in this journal are those of the authors and do not
necessarily represent the views and policies of the World Bank or of its Executive Directors
or the countries they represent. The World Bank does not guarantee the accuracy of data
included in this publication and accepts no responsibility whatsoever for any consequences
of their use. When maps are used, the boundaries, denominations, and other information
do not imply on the part of the World Bank Group any judgment on the legal status of any
territory or the endorsement or acceptance of such boundaries.




        For more information, please visit the Web sites of the Research Observer at
            www.wbro.oxfordjournals.org, the World Bank at www.worldbank.org,
                  and Oxford University Press at www.oxfordjournals.org.

Research Observer



Volume 23    Number 1 Spring 2008




Governance Indicators: Where Are We, Where Should We Be Going?
       Daniel Kaufmann and Aart Kraay


Two Comments on "Governance Indicators: Where Are We, Where
Should We Be Going?" by Daniel Kaufmann and Aart Kraay
        Shantayanan Devarajan and Simon Johnson


Walking up the Down Escalator: Public Investment and Fiscal Stability
       William Easterly, Timothy Irwin, and Luis Serven


What Can Countries in Other Regions Learn from Social Security Reform
in Latin America?
       Indermit S. Gill, Ceren Ozer, and Radu Tatucu


Why OECD Countries Should Reform Rules of Origin
        Olivier Cadot and Jaime de Melo

Postal information: The World Bank Research Observer (ISSN 0257-3032) is published twice a year, in Feb. and Aug., by Oxford
University Press for the International Bank for Reconstruction and Development/THE WORLD BANK. Postmaster: send address
changes to The World Bank Research Observer, Journals Customer Service Department, Oxford University Press, 2001 Evans Road,
Cary, NC 27513-2009. Communications regarding original articles and editorial management should be addressed to The Editor,
The World Bank Research Observer, The World Bank, 1818 H Street, NW, Washington, D.C. 20433, USA.

     Governance Indicators: Where Are We,
     Where Should We Be Going?

     Daniel Kaufmann and Aart Kraay



Progress in measuring governance is assessed using a simple framework that
distinguishes between indicators that measure formal rules and indicators that measure
the practical application or outcomes of these rules. The analysis calls attention to the
strengths and weaknesses of both types of indicators as well as the complementarities
between them. It distinguishes between the views of experts and the results of surveys
and assesses the merits of aggregate as opposed to individual governance indicators.
Some simple principles are identified to guide the use and refinement of existing
governance indicators and the development of future indicators. These include transparently
disclosing and accounting for the margins of error in all indicators, drawing from a
diversity of indicators and exploiting complementarities among them, submitting all
indicators to rigorous public and academic scrutiny, and being realistic in expectations of
future indicators. JEL codes: H1, O17

       Not everything that can be counted counts, and not everything that
       counts can be counted.                                        -Albert Einstein

Most scholars, policymakers, aid donors, and aid recipients recognize that good
governance is a fundamental ingredient of sustained economic development. This
growing understanding, initially informed by a very limited set of empirical
measures of governance, has spurred intense interest in developing more refined,
nuanced, and policy-relevant indicators of governance. This article reviews
progress in measuring governance, emphasizing empirical measures explicitly
designed to be comparable across countries and in most cases over time. The goal
is to provide a structure for thinking about the strengths and weaknesses of
different types of governance indicators that can inform both the use of existing
indicators and ongoing efforts to improve them and develop new ones.1
    The first section of this article reviews definitions of governance. Although there
are many broad definitions of governance, the degree of definitional disagreement


© The Author 2008. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
doi:10.1093/wbro/lkm012                Advance Access publication January 31, 2008                                       23:1-30

can easily be overstated. Most definitions appropriately emphasize the importance
of a capable state that is accountable to citizens and operating under the rule of
law. Broad principles of governance along these lines are naturally not amenable to
direct observation and thus to direct measurement. As Albert Einstein noted, "Not
everything that counts can be counted." Many different types of data provide
information on the extent to which these principles of governance are observed across
countries. An important corollary is that any particular indicator of governance
can usefully be interpreted as an imperfect proxy for some unobserved broad
dimension of governance. This interpretation underlies a recurrent theme of this
review: there is measurement error in all governance indicators,
which should be explicitly considered when using these kinds of data to draw
conclusions about cross-country differences or trends in governance over time.
   The second section addresses what is measured. The discussion highlights the
distinction between indicators that measure specific rules "on the books" and indi-
cators that measure particular governance outcomes "on the ground." Rules on
the books codify details of the constitutional, legal, or regulatory environment;
the existence or absence of specific agencies, such as anticorruption commissions
or independent auditors; and so forth--components intended to provide the key
de jure foundations of governance. On-the-ground measures assess de facto govern-
ance outcomes that result from the application of these rules (Do firms find the
regulatory environment cumbersome? Do households believe the police are
corrupt?). An important message in this section concerns the shared limitations
of indicators of both rules and outcomes: Outcome-based indicators of governance
can be difficult to link back to specific policy interventions, and the links from
easy-to-measure de jure indicators of rules to governance outcomes of interest are
not yet well understood and in some cases appear tenuous at best. They remind
us of the need to respect Einstein's dictum that "not everything that can be
counted counts."
   The third section examines whose views should be relied on. Indicators based
on the views of various types of experts are distinguished from survey-based indi-
cators that capture the views of large samples of firms and individuals. A category
of aggregate indicators that combine, organize, provide structure, and summarize
information from these different types of respondents is examined. The fourth
section examines the rationale for such aggregate indicators, and their strengths
and weaknesses.
   The set of indicators discussed in this survey is intended to provide leading
examples of major governance indicators rather than an exhaustive stocktaking of
existing indicators in this taxonomy.2 A feature of efforts to measure governance
is the preponderance of indicators focused on measuring de facto governance out-
comes and the paucity of measures of de jure rules. Almost by necessity, de jure
rules-based indicators of governance reflect the views or judgments of experts. In
contrast, the much larger body of de facto indicators captures the views of both
experts and survey respondents.
   The article concludes with a discussion of the way forward in measuring govern-
ance in a manner that can be useful to policymakers. The emphasis is on the
importance of consumers and producers of governance indicators clearly recogniz-
ing and disclosing the pervasive measurement error in any type of governance indi-
cators. This section also notes the importance of moving away from oft-heard false
dichotomies, such as "subjective" or "objective" indicators or aggregate or
disaggregated ones. For good reason, virtually all measures of governance involve a degree
of subjective judgment, and different levels of aggregation are appropriate for differ-
ent types of analysis. In any case, the choice is not either one or the other, as most
aggregate indicators can readily be unbundled into their constituent components.



What Does Governance Mean?

The concept of governance is not a new one. Early discussions go back to at least
400 BCE, to the Arthashastra, a treatise on governance attributed to Kautilya,
thought to be the chief minister to the king of India. Kautilya presents key pillars
of the "art of governance," emphasizing justice, ethics, and anti-autocratic
tendencies. He identifies the duty of the king to protect the wealth of the state and its
subjects and to enhance, maintain, and safeguard this wealth as well as the inter-
ests of the kingdom's subjects.
   Despite the long provenance of the concept, no strong consensus has formed
around a single definition of governance or institutional quality. For this reason,
throughout this article the terms governance, institutions, and institutional quality
are used interchangeably, if somewhat imprecisely. Researchers and organizations
have produced a wide array of definitions. Some definitions are so broad that they
cover almost anything (such as the definition "rules, enforcement mechanisms,
and organizations" offered in the World Bank's World Development Report 2002:
Building Institutions for Markets). Others, like the definition suggested by North
(2000), are not only broad but risk making the links from good governance to
development almost tautological: "How do we account for poverty in the midst of
plenty? . . . We must create incentives for people to invest in more efficient
technology, increase their skills, and organize efficient markets. . . . Such incentives
are embodied in institutions."
   Some of the governance indicators surveyed capture a wide range of develop-
ment outcomes. While it is difficult to draw a line between governance and the
ultimate development outcomes of interest, it is useful at both the definitional and
measurement stages to emphasize concepts of governance that are at least somewhat
removed from development outcomes themselves. An early and narrower
definition of public sector governance proposed by the World Bank is that
"governance is the manner in which power is exercised in the management of a
country's economic and social resources for development" (World Bank 1992,
p. 1). This definition remains almost unchanged in the Bank's 2007 governance
and anticorruption strategy, with governance defined as "the manner in which
public officials and institutions acquire and exercise the authority to shape public
policy and provide public goods and services" (World Bank 2007, p. 1).
   Kaufmann, Kraay, and Zoido-Lobaton (1999a, p. 1) define governance as "the
traditions and institutions by which authority in a country is exercised. This
includes the process by which governments are selected, monitored and replaced;
the capacity of the government to effectively formulate and implement sound
policies; and the respect of citizens and the state for the institutions that govern
economic and social interactions among them."
   Although the number of definitions of governance is large, there is some con-
sensus. Most definitions agree on the importance of a capable state operating
under the rule of law. Interestingly, comparing the last three definitions cited
above, the one substantive difference has to do with the explicit degree of empha-
sis on the role of democratic accountability of governments to their citizens. Even
these narrower definitions remain sufficiently broad that there is scope for a wide
diversity of empirical measures of various dimensions of good governance.
  The gravity of the issues dealt with in these definitions of governance suggests
that measurement is important. In recent years there has been debate over whether
such broad notions of governance can be usefully measured. Many indicators can
shed light on various dimensions of governance. However, given the breadth of the
concepts, and in many cases their inherent unobservability, no single indicator or
combination of indicators can provide a completely reliable measure of any of these
dimensions of governance. Rather, it is useful to think of the various specific indi-
cators discussed below as all providing imperfect signals of fundamentally unobser-
vable concepts of governance. This interpretation emphasizes the importance of
taking into account as explicitly as possible the inevitable resulting measurement
error in all indicators of governance when analyzing and interpreting any such
measure. As shown below, however, the fact that such margins of error are finite
and still allow for meaningful country comparisons across space and time suggests
that measuring governance is both feasible and informative.
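
   The idea that each indicator is an imperfect signal with a finite margin of error
can be made concrete with a small sketch. The Python fragment below is illustrative
only: the country names, scores, and standard errors are hypothetical, and the rule
of comparing 90 percent confidence intervals for overlap is an assumed simple
convention, not the procedure of any particular indicator discussed in this article.

```python
from dataclasses import dataclass
from typing import Tuple

# Two-sided 90 percent critical value of the standard normal distribution.
Z_90 = 1.645


@dataclass
class Estimate:
    """A governance score observed with error (all values are hypothetical)."""
    country: str
    score: float      # observed indicator value, in arbitrary units
    std_error: float  # assumed standard error of the measurement

    def interval(self, z: float = Z_90) -> Tuple[float, float]:
        """Return the confidence interval implied by the margin of error."""
        margin = z * self.std_error
        return (self.score - margin, self.score + margin)


def clearly_different(a: Estimate, b: Estimate, z: float = Z_90) -> bool:
    """True only when the two confidence intervals do not overlap."""
    lo_a, hi_a = a.interval(z)
    lo_b, hi_b = b.interval(z)
    return hi_a < lo_b or hi_b < lo_a


# Hypothetical estimates, invented purely for illustration.
country_a = Estimate("Country A", 0.90, 0.20)
country_b = Estimate("Country B", 0.60, 0.20)
country_c = Estimate("Country C", -0.50, 0.25)

for left, right in [(country_a, country_b), (country_a, country_c)]:
    verdict = "distinguishable" if clearly_different(left, right) else "not distinguishable"
    print(f"{left.country} vs {right.country}: {verdict}")
```

In this sketch the small gap between the first two hypothetical countries lies within
the margins of error, while the large gap between the first and third does not. The
non-overlap rule is deliberately conservative; a formal test of the difference between
two scores would be a reasonable alternative.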



Governance Rules or Governance Outcomes?

This section examines both the rules-based and outcome-based indicators of gov-
ernance. A rules-based indicator of corruption might measure whether countries
have legislation prohibiting corruption or have an anticorruption agency.
An outcome-based measure could assess whether the laws are enforced or the
anticorruption agency is undermined by political interference. The views of firms,
individuals, nongovernmental organizations (NGOs), or commercial risk-rating
agencies could also be solicited regarding the prevalence of corruption in the
public sector. To measure public sector accountability, one could observe the rules
regarding the presence of formal elections, financial disclosure requirements for
public servants, and the like. One could also assess the extent to which these
rules operate in practice by surveying respondents regarding the functioning of
the institutions of democratic accountability.
   Because a clear line does not always distinguish the two types of indicators, it
is more useful to think of ordering different indicators along a continuum, with
one end corresponding to rules and the other to ultimate governance outcomes of
interest. Because both types of indicators have their own strengths and weak-
nesses, all indicators should be thought of as imperfect, but complementary
proxies for the aspects of governance they purport to measure.


Rules-Based Indicators of Governance

Several rules-based indicators are used to assess governance (tables 1 and 2).
They include the Doing Business project of the World Bank, which reports
detailed information on the legal and regulatory environment in a large set of
countries; the Database of Political Institutions, constructed by World Bank
researchers, and the POLITY-IV database of the University of Maryland, both of
which report detailed factual information on features of countries' political
systems; and the Global Integrity Index (GII), which provides detailed information
on the legal framework governing public sector accountability and transparency
in a sample of 41 countries, most of them developing economies.
   At first glance, one of the main virtues of indicators of rules is their clarity. It is
straightforward to ascertain whether a country has a presidential or a parliamen-
tary system of government or whether a country has a legally independent anti-
corruption commission. In principle, it is also straightforward to document details
of the legal and regulatory environment, such as how many legal steps are
required to register a business or fire a worker. This clarity also implies that it is
straightforward to measure progress on such indicators. Has an anticorruption
commission been established? Have business entry regulations been streamlined?
Has a legal requirement for disclosure of budget documents been passed? This
clarity has made such indicators very appealing to aid donors interested in
linking aid with performance indicators and in monitoring progress on such
indicators. Set against these advantages are three main drawbacks.

Table 1. Sources and Types of Information Used in Governance Indicators

                                                  Type of indicator
                                   Rules-based            Outcomes-based
Source of information              Broad      Specific    Broad                Specific

Experts
  Lawyers                                     DB
  Commercial risk-rating agencies                         DRI, EIU, PRS
  Nongovernmental organizations               GII         HER, RSF, CIR, FRH   GII, OBI
  Governments and multilaterals                           CPIA                 PEFA
  Academics                        DPI, PIV               DPI, PIV
Survey respondents
  Firms                                                                        ICA, GCS, WCY
  Individuals                                             AFR, LBO, GWP
Aggregate indicators combining
  experts and survey respondents                          TI, WGI, MOI

   Note: AFR is Afrobarometer, CIR is Cingranelli-Richards Human Rights Dataset, CPIA is Country Policy and
Institutional Assessment, DB is Doing Business, DPI is Database of Political Institutions, DRI is Global Insight
DRI, EIU is Economist Intelligence Unit, FRH is Freedom House, GCS is Global Competitiveness Survey, GII is
Global Integrity Index, GWP is Gallup World Poll, HER is Heritage Foundation, ICA is Investment Climate
Assessment, LBO is Latinobarometro, MOI is Ibrahim Index of African Governance, OBI is Open Budget Index,
PEFA is Public Expenditure and Financial Accountability, PIV is Polity IV, PRS is Political Risk Services, RSF is
Reporters Without Borders, TI is Transparency International, WCY is World Competitiveness Yearbook, and WGI
is Worldwide Governance Indicators.
   Source: Authors' compilation based on data from sources listed in table 2.

   They are less "objective" than they appear. It is easy to overstate the clarity and
objectivity of rules-based measures of governance. In practice, a good deal of
subjective judgment is involved in codifying all but the most basic and obvious
features of a country's constitutional, legal, and regulatory environments. (It is no
accident that the views of lawyers, on which many of these indicators are based,
are commonly referred to as opinions.) In Kenya in 2007, for example, a consti-
tutional right to access to information faced being undermined or offset entirely by
an official secrecy act and by pending approval and implementation of the Freedom
of Information Act. In this case, codifying even the legal right to access to
information requires careful judgment as to the net effect of potentially conflicting laws.
Of course, this drawback of ambiguity is not unique to rules-based measures of
governance: interpreting outcome-based indicators of governance can also involve
ambiguity, as discussed below. There has been less recognition, however, of the
extent to which rules-based indicators also reflect subjective judgment.

Table 2. Country Coverage and Frequency of Governance Surveys

                                       Number of
Name                                   countries covered   Frequency of survey   Web site

Afrobarometer                          18                  Triennial             www.afrobarometer.org
Cingranelli-Richards Human Rights      192                 Annual                www.humanrightsdata.com
  Dataset
Country Policy and Institutional       136                 Annual                www.worldbank.org
  Assessment
Doing Business                                             Annual
Database of Political Institutions                         Annual
Global Insight DRI                                         Quarterly
Economist Intelligence Unit                                Quarterly
Freedom House                                              Annual
Global Competitiveness Survey                              Annual
Global Integrity Index                                     Triennial
Gallup World Poll                                          Annual
Heritage Foundation                                        Annual
Investment Climate Assessment                              Irregular
Latinobarometro                                            Annual
Ibrahim Index of African Governance                        Triennial
Open Budget Index                                          Annual
Polity IV                                                  Annual
Political Risk Services                                    Monthly
Public Expenditure and Financial                           Irregular
  Accountability
Reporters without Borders              165                 Annual                www.rsf.org
World Competitiveness Yearbook         47                  Annual                www.imd.ch

   Source: Authors' compilation.

   The links between indicators and outcomes are complex, possibly subject to long lags,
and often not well understood. These problems complicate the interpretation of
rules-based indicators. In the case of rules-based measures, some of the most
basic features of countries' constitutional arrangements have little normative
content on their own; such indicators are for the most part descriptive. It makes
little sense, for example, to presuppose that presidential (as opposed to parliamentary)
 systems or majoritarian (as opposed to proportional) representation in voting
 arrangements are intrinsically good or bad. Interest in such variables as indi-
 cators of governance rests on the case that they may matter for outcomes, often
 in complex ways. In their influential book, Persson and Tabellini (2005)
 document how these features of constitutional rules influence the political process
 and ultimately outcomes such as the level, composition, and cyclicality of public
 spending (Acemoglu 2006 challenges the robustness of these findings). In such
 cases, the usefulness of rules-based indicators as measures of governance depends
crucially on how strong the empirical links are between such rules and the
 ultimate outcomes of interest.
    Perhaps more common is the less extreme case in which rules-based indicators
of governance have normative content on their own, but the relative
importance of different rules for outcomes of interest is unclear. The GII, for
example, provides information on the existence of dozens of rules, ranging from
the legal right to freedom of speech to the existence of an independent ombuds-
man to the presence of legislation prohibiting the offering or acceptance of bribes.
The Open Budget Index (OBI) provides detailed information on the budget
processes, including the types of information provided in budget documents,
public access to budget documents, and the interaction between executive and
legislative branches in the budget process. Many of these indicators arguably have
normative value on their own: having public access to budget documents is desir-
able and having streamlined business registration procedures is better than not
having them.
  This profusion of detail in rules-based indicators leads to two related difficulties
in using them to design and monitor governance reforms. The first is that, in the
absence of good information on the links between changes in specific
rules or procedures and outcomes of interest, it is difficult to know which rules
should be reformed and in what order. Will establishing an anticorruption com-
mission or passing legislation outlawing bribery have any impact on reducing cor-
ruption? If so, which is more important? Should, instead, more efforts be put into
ensuring that existing laws and regulations are implemented or that there is
greater transparency, access to information, or media freedom? How soon should
one expect to see the impacts of these interventions? Given that governments typi-
cally operate with limited political capital to implement reforms, these trade-offs
and lags are important.
  The second difficulty in designing or monitoring reforms arises when aid donors
or governments set performance indicators for governance reforms. Performance
indicators based on changing specific rules, such as the passage of a particular
piece of legislation or a reform of a specific budget procedure, can be very attractive
because of their clarity: it is straightforward to verify whether the specified policy
action has been taken.³ Yet "actionable" indicators are not necessarily also "action
worthy," in the sense of having a significant impact on the outcomes of interest.
Moreover, excessive emphasis on registering improvements on rules-based indicators
of governance leads to risks of "teaching to the test" or, worse, "reform illusion,"
in which specific rules or procedures are changed in isolation with the sole
purpose of showing progress on the specific indicators used by aid donors.


Major gaps exist between statutory rules on the books and their implementation on the
ground. To take an extreme example, in all 41 countries covered by the 2006 GII,
accepting a bribe is codified as illegal, and all but three countries (Brazil,
Lebanon, and Liberia) have anticorruption commissions or similar agencies. Yet
there is enormous variation in perceptions-based measures of corruption across
these countries. The 41 countries covered by the GII include the Democratic
Republic of Congo, which ranks 200 out of 207 countries on the 2006
Worldwide Governance Indicators (WGI) control of corruption indicator, and the
United States, which ranks 23.
  Another example of the gap between rules and implementation (documented
in more detail in Kaufmann, Kraay, and Mastruzzi 2005) compares the statutory
ease of establishing a business with a survey-based measure of firms' perceptions
of the ease of starting a business across a large sample of countries. In industrial
countries, where de jure rules are often implemented as intended, the two
measures correspond quite closely. In contrast, in developing economies, where
there are often gaps between de jure rules and their de facto implementation, the
correlation between the two is very weak; the de jure codification of the rules and
regulations required to start a business is not a good predictor of the actual constraints
reported by firms. Unsurprisingly, much of the difference between the
de jure and de facto measures could be statistically explained by de facto measures
of corruption, which subverts the fair application of rules on the books.
   The three drawbacks (the inevitable role of judgment even in "objective" indicators,
the complexity and lack of knowledge regarding the links from rules to
outcomes of interest, and the gap between rules on the books and their
implementation on the ground) suggest that although rules-based governance
indicators provide valuable information, they are insufficient on their own for
measuring governance. Rules-based measures need to be complemented by and
used in conjunction with outcome-based indicators of governance.


Outcome-Based Governance Indicators

Most indicators of governance are outcome based, and several rules-based indi-
cators of governance also provide complementary outcome-based measures. The
GII, for example, pairs indicators of the existence of various rules and procedures
with indicators of their effectiveness in practice. The Database of Political
Institutions measures not only such constitutional rules as the presence of a parliamentary
system, but also outcomes of the electoral process, such as the extent
to which one party controls different branches of government and the fraction of
votes received by the president. The Polity-IV database records a number of out-
comes, including the effective constraints on the power of the executive.
   The remaining outcome-based indicators range from the highly specific to the
quite general. The OBI reports data on more than 100 indicators of the budget
process, ranging from whether budget documentation contains details of assump-
tions underlying macroeconomic forecasts to documentation of budget outcomes
relative to budget plans. Other less specific sources include the Public Expenditure
and Financial Accountability indicators, constructed by aid donors with inputs of
recipient countries, and several large cross-country surveys of firms (including
the Investment Climate Assessments of the World Bank, the Executive Opinion
Survey of the World Economic Forum, and the World Competitiveness Yearbook of
the Institute for Management Development) that ask firms detailed questions
about their interactions with the state.
   Examples of more general assessments of broad areas of governance include
ratings provided by several commercial sources, including Political Risk Services,
the Economist Intelligence Unit, and Global Insight-DRI. Political Risk Services
rates 10 areas that can be identified with governance, such as "democratic
accountability," "government stability," "law and order," and "corruption." Large
cross-country surveys of individuals such as the Afrobarometer and
Latinobarometro surveys and the Gallup World Poll ask general questions, such
as "Is corruption widespread throughout the government in this country?"
   The main advantage of outcome-based indicators is that they capture the views
of relevant stakeholders, who take actions based on these views. Governments,
analysts, researchers, and decisionmakers should, and often do, care about public
views on the prevalence of corruption, the fairness of elections, the quality of
service delivery, and many other governance outcomes. Outcome-based governance
indicators provide direct information on the de facto outcome of how de jure
rules are implemented.
   Outcome-based measures also have some significant limitations. Such
measures, particularly where they are general, can be difficult to link back to
specific policy interventions that might influence governance outcomes. This is
the mirror image of the problem discussed above: Rules-based indicators of gov-
ernance can also be difficult to relate to outcomes of interest. A related difficulty
is that outcome-based governance indicators may be too close to ultimate develop-
ment outcomes of interest. To take an extreme example, the Ibrahim Index of
African Governance includes a number of ultimate development outcomes, such
as per capita GDP (gross domestic product), growth of GDP, inflation, infant mortality,
and inequality. While such development outcomes are surely worth monitoring,
including them in an index of governance risks making the links from
governance to development tautological.
   Another difficulty has to do with interpreting the units in which outcomes are
measured. Rules-based indicators have the virtue of clarity: either a particular
rule exists or it does not. Outcome-based indicators by contrast are often
measured on somewhat arbitrary scales. For example, a survey question might
ask respondents to rate the quality of public services on a five-point scale, with
the distinction between different scores left unclear and up to the respondent. In
contrast, the usefulness of outcome-based indicators is greatly enhanced by the
extent to which the criteria for differing scores are clearly documented. The World
Bank's Country Policy and Institutional Assessment (CPIA) and the
Freedom House indicators are good examples of outcome-based indicators based
on expert assessments that provide documentation of the criteria used to assign
specific scores on the indicators they compile. In the case of surveys, questions
can be designed to ensure that responses are easier to interpret: rather than
asking respondents whether they think "corruption is widespread," respondents
can be asked whether they have been solicited for a bribe in the past month.
   An example illustrates some of the main advantages and disadvantages of the
two types of measures. Figure 1 compares alternative indicators of democratic
accountability, a key dimension of governance. The horizontal axis measures a
very broad outcome-based indicator, taken from the 2005 Voice of the People
survey, a large cross-country household survey (www.voice-of-the-people.net). It
asks households to indicate whether they think elections in their country are free
and fair. The vertical axis reports two indicators of the quality of electoral institutions,
taken from Global Integrity. The points labeled "de jure" are based on a
factual assessment of the existence of a number of specific institutions related to



    Figure 1. De facto and de jure Indicators of Elections

    [Figure: scatter plot of countries, labeled by country code. Horizontal axis: Voice of the
    People Household Survey, "Are elections free and fair?" (0 to 1). Vertical axis: Global
    Integrity indicators of electoral institutions (0 to 100), with points labeled "de jure"
    and "de facto." Fitted line shown: y = 23.09x + 55.30, R2 = 0.19.]

     Note: ARG is Argentina, ARM is Armenia, AZE is Azerbaijan, BEN is Benin, BRA is Brazil, BGR is Bulgaria,
ZAR is Democratic Republic of Congo, EGY is Egypt, ETH is Ethiopia, GEO is Georgia, GHA is Ghana, GTM is
Guatemala, IND is India, IDN is Indonesia, ISR is Israel, KEN is Kenya, KGZ is Kyrgyz Republic, LBN is Lebanon,
LBR is Liberia, MEX is Mexico, MNP is Montenegro, MOZ is Mozambique, NPL is Nepal, NIC is Nicaragua, NGA
is Nigeria, PAK is Pakistan, PHL is Philippines, ROM is Romania, RUS is Russia, SEN is Senegal, YUG is Serbia,
SLE is Sierra Leone, ZAF is South Africa, SDN is Sudan, TJK is Tajikistan, TZA is Tanzania, UGA is Uganda, USA
is United States, VNM is Vietnam, YEM is Yemen, and ZWE is Zimbabwe.
     Source: Authors' analysis based on data described in the text.




elections, such as the existence of a legal right to universal suffrage and the existence
of an election monitoring agency. The points labeled "de facto" capture the
assessment of Global Integrity's experts as to the effectiveness of these
institutions.
   Several messages emerge from this figure. First, in some cases rules-based
measures of governance show remarkably little variation across countries, with
all countries receiving scores close to 100, indicating perfect scores on the de jure
basis of this important aspect of governance. As of 2005, for example, every
country surveyed by Global Integrity promised the legal right to vote, and a statu-
torily independent election-monitoring agency existed in all but three countries
(Lebanon, Montenegro, and Mozambique). Second, the link between a specific
objective indicator of rules and the broad outcome of interest (citizens' satisfaction
with elections) is at best very weak, with a correlation between the two measures
that is in fact slightly negative. Third, outcome-based indicators explicitly focusing
on the de facto implementation of rules can be useful. A noteworthy feature of
Global Integrity is its pairing of indicators of specific rules with assessments of
their functioning in practice. The correlation of the de facto measure with the
broad outcome measure of interest taken from the Voice of the People survey is
much stronger (0.46) than the correlation with the de jure measure. The correlation
is far from perfect, however, indicating the importance of relying on a
variety of indicators when assessing governance in a country.
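
The comparison described above amounts to computing cross-country correlations between each Global Integrity score and the survey response. The following is a minimal sketch of that calculation, not the authors' code. The four data points are hypothetical values invented purely for illustration (they are not taken from figure 1), and the function name `pearson` is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        # A score that is identical in every country carries no information.
        raise ValueError("correlation is undefined when a series has no variation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four made-up countries (NOT data from the article).
survey_share = [0.2, 0.4, 0.6, 0.8]   # share saying elections are free and fair
de_facto = [50, 60, 80, 70]           # expert rating of how institutions work
de_jure = [100, 95, 100, 95]          # existence of rules on the books

r_de_facto = pearson(survey_share, de_facto)
r_de_jure = pearson(survey_share, de_jure)
print(f"de facto vs. survey: {r_de_facto:.2f}")
print(f"de jure vs. survey:  {r_de_jure:.2f}")
```

In this made-up example the de jure scores cluster near 100 and bear little relation to the survey response, while the de facto scores track it more closely. If every country scored exactly 100 on the de jure measure, the correlation would be undefined, which the function reports as an error.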




Whose Views Should We Rely On?

A variety of governance assessments are produced by experts on behalf of commercial
risk-rating agencies and NGOs. The GII and the OBI, for example, rely on
locally recruited experts in each country to complete their detailed questionnaires
about governance, subject to peer review. Commercial organizations such as the
Economist Intelligence Unit rely on a network of local correspondents in a large
set of countries to provide information underlying the ratings they produce. Other
advocacy organizations, such as Amnesty International, Freedom House, and
Reporters without Borders, also rely on networks of respondents for the information
underlying their assessments.
   Governments and multilateral organizations are also major producers of expert
assessments. Some of the most notable include the Country Policy and Institutional
Assessments, produced by the World Bank, the African Development Bank, and the
Asian Development Bank. Each of these assessments is based on the responses of
each institution's country economists to a detailed questionnaire, responses that
are then reviewed for consistency and comparability across countries. The Public
Expenditure and Financial Accountability indicators mentioned earlier are also
based on experts' views.
   Several large cross-country surveys of firms and individuals contain questions
on governance. These include the Investment Climate Assessment and
the Business Environment and Enterprise Performance Surveys conducted by the
World Bank; the Executive Opinion Survey of the World Economic Forum; the
World Competitiveness Yearbook; Voice of the People; and the Gallup World Poll.



Expert Assessments

Expert assessments have several major advantages, which account for their pre-
ponderance among various types of governance indicators. One is cost: it is much
less expensive to ask a selection of country economists at the World Bank to
provide responses to a questionnaire on governance as part of the CPIA process
than to carry out representative surveys of firms or households in a hundred or
more countries. The second advantage is that expert assessments can more
readily be tailored for cross-country comparability: Many of the organizations
listed in table 2 have elaborate benchmarking systems to ensure that scores are
comparable across countries. Finally, for certain aspects of governance experts are
the natural respondents for the type of information being sought. (Consider, for
example, the OBI's detailed questionnaire on national budget processes, the particulars
of which are not the sort of common knowledge that survey data can
easily collect.)
   Expert assessments nevertheless have several important limitations. A basic
one is that, like survey respondents, different experts may have different views
about similar aspects of governance. While this is not surprising, it suggests that
users of governance indicators should be cautious about relying too heavily on
any one set of expert assessments. These differences are evident in comparing the
CPIA ratings of the World Bank and the African Development Bank, which in
recent years harmonized their procedures for constructing CPIA ratings. An identical
questionnaire covering 16 dimensions of policy and institutional performance
is completed by two very similar sets of expert respondents: country
economists with in-depth experience working on behalf of these two organizations
in the countries they are assessing. Despite the homogeneity of the respondents
and the very similar rating criteria, there are nontrivial differences between the
two organizations' assessments on the 16 components of the CPIA (table 3). For
example, the 0.67 correlation between the two assessments on the question on
transparency, accountability, and corruption in the public sector is far from
perfect, suggesting that it is prudent to base assessments of governance for policy
purposes on the views of a variety of expert assessments.

Table 3. Correlation Among Alternative Indicators of Corruption

                                    Expert assessments                              Surveys

                     World    African                    World      World Economic
                     Bank     Development   Global       Markets    Forum Executive   Gallup
Indicator            CPIA     Bank CPIA     Integrity    Online     Opinion Survey    World Poll

World Bank CPIA      1.00     0.67          0.30         0.56       0.25              0.13
African                       1.00          0.49         0.51       0.45              0.24
 Development Bank
 CPIA
Global Integrity                            1.00         0.34       0.29              0.11
World Markets                                            1.00       0.88              0.59
 Online
World Economic                                                      1.00              0.70
 Forum Executive
 Opinion Survey
Gallup World Poll                                                                     1.00

  Source: Authors' analysis based on data described in the text.




   The second criticism, that the country ratings assigned by different groups of
experts are too highly correlated, is just the opposite. Suppose that one set of
experts comes up with an assessment of governance for a set of countries based
on its own independent research and the second set of experts simply reproduces
the assessments of the first. In this case, the high correlation of two expert assess-
ments cannot be interpreted as evidence of their accuracy. Rather, it would reflect
the fact that the two sources make correlated errors in measuring governance.
   Nevertheless, even if the errors made by two data sources are highly but not
perfectly correlated, there will be benefits to relying on both data sources. The
important empirical question is whether this hypothetical correlation of errors
across sources is large or not. Empirically identifying correlations in errors across
sources is difficult. Simply observing whether the assessments provided in the two
data sources are highly correlated is not enough, as the high correlation can
reflect the fact that both sources are either measuring governance accurately or
making correlated measurement errors.
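
The identification problem can be illustrated with a small simulation. The sketch below is not the authors' model. It assumes, purely for illustration, that each source reports true governance plus an error, that the two errors have equal variance, and that they are correlated with coefficient rho. Under those assumptions the correlation between the two sources is (var_g + rho * var_e) / (var_g + var_e), so two accurate sources with independent errors and two noisy sources with correlated errors can look identical. All function names and parameter values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def implied_correlation(var_g, var_e, rho):
    """Correlation between two sources under the assumptions stated above."""
    return (var_g + rho * var_e) / (var_g + var_e)


def simulate(n, sd_g, sd_e, rho, seed=0):
    """Simulate two sources measuring the same governance level with
    errors of standard deviation sd_e and error correlation rho (0 to 1)."""
    rng = random.Random(seed)
    source_1, source_2 = [], []
    for _ in range(n):
        g = rng.gauss(0, sd_g)          # true (unobserved) governance
        shared = rng.gauss(0, 1)        # error component common to both sources
        e1 = sd_e * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1))
        e2 = sd_e * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0, 1))
        source_1.append(g + e1)
        source_2.append(g + e2)
    return pearson(source_1, source_2)


# Case A: fairly accurate sources, independent errors.
accurate = simulate(20000, sd_g=1.0, sd_e=0.5, rho=0.0, seed=1)
# Case B: noisier sources whose errors are correlated.
copied = simulate(20000, sd_g=1.0, sd_e=1.0, rho=0.6, seed=2)
print(f"accurate, independent errors: {accurate:.2f}")
print(f"noisy, correlated errors:     {copied:.2f}")
```

Both cases imply the same correlation between the two sources under the formula, which is why the observed correlation alone cannot distinguish accuracy from shared error.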
   To make progress, one needs to make identifying assumptions. Kaufmann,
Kraay, and Mastruzzi (2006) detail two sets of assumptions that allow potential
sources of correlation in the errors to be disentangled. One is that surveys of firms
or individuals are less likely to make errors that are correlated with other data
sources than, for example, assessments by commercial risk-rating agencies. If this
is the case, and the rating agencies do make correlated errors, one would expect
their assessments to be very highly correlated with one another, but less so
with surveys. This turns out not to be the case. The average correlation of the five
major commercial risk-rating agencies for corruption in 2002-05 was 0.80. The
correlation of each of these assessments with a large cross-country survey of
firms was slightly higher (0.81), in contrast with what one would expect if the
rating agencies had correlated errors. Conducting this exercise across all six
aggregate governance indicators reveals at most modest evidence of error corre-
lation. While this is unlikely to be the final word on this important question, it is
a useful step forward to propose and implement tests of error correlation based on
explicit identifying assumptions.
   The third criticism is that expert assessments are subject to various biases.
Some researchers claim that many of these sources are biased toward the views of
the business community, which may have very different views of what constitutes
good governance than do other types of respondents. In short, goes the critique,
businesspeople like low taxes and less regulation, while the public good demands
reasonable taxation and appropriate regulation. This critique does not seem
particularly compelling. If it were true, the responses of commercial risk-rating
agencies, which serve mostly business clients, or the views of firms themselves to
questions about governance, should not be highly correlated with ratings provided
by respondents who are more likely to sympathize with the common good, such
as individuals, NGOs, or public sector organizations. Yet, in most cases, these
correlations are strong (Kaufmann, Kraay, and Mastruzzi 2007b). Cross-country
surveys of firms and of individuals, such as the World Economic Forum's
Executive Opinion Survey and the Gallup World Poll, yield similar corruption
rankings, with the two surveys correlated at 0.7 (table 3).
   Another potential source of bias in expert assessments, particularly those produced
by NGOs, is that they are colored by the ideological orientation of the ratings
organization. Kaufmann, Kraay, and Mastruzzi (2004) find that the assessments of
think tanks and firm surveys are not systematically correlated with the political
orientation of a country's government, casting doubt on this possible source of
bias. A potentially greater problem of bias is at the country respondent level. For
example, the views of pro-government and antigovernment "experts" might be
very different, affecting both levels and trends over time. This risk is perhaps greatest
for the sources that rely on local experts, such as the GII. This risk is also much
more difficult to test for systematically, as the biases may affect individual country
scores without introducing systematic biases into the source as a whole.
Nevertheless, careful comparisons of many different data sources can often turn up
anomalies in a single source that require more careful scrutiny.


Surveys of Firms and Individuals

Governance indicators derived from surveys of firms and individuals have the fundamental
advantage that they elicit the views of the ultimate beneficiaries of good
governance: citizens and firms in a country. The views of these stakeholders
matter because they are likely to act on those views. If firms or individuals believe
that the courts and the police are corrupt, they are unlikely to try to use their ser-
vices (Hellman and Kaufmann 2004). Individuals are less likely to vote or to hold
their elected leaders accountable if they think that elections are not free and fair.
   Another advantage of governance indicators based on surveys of domestic firms
and individuals is greater domestic political credibility. Governments often dismiss
external expert assessments of governance as uninformed pontification by outsi-
ders. It is much harder for them to dismiss the views of their own citizens or of
firms operating in their country. Survey-based data on governance can therefore
be particularly useful in galvanizing the politics of governance reforms. The
experience of many countries implementing their own in-depth Governance and
Anti-Corruption diagnostics (assisted by the World Bank Institute and other
agencies and implemented with institutions in the requesting country), based on
in-country surveys of enterprises, users of services, and public officials, supports
this point: the views expressed by thousands of domestic stakeholders provide
powerful input for action to reformist policymakers and civil society groups.
   Set against these important advantages of surveys are a number of disadvan-
tages. First, there is the usual array of potential problems with any type of survey
data, ranging from issues of sampling design to issues of nonresponse bias. Expert
assessments, which are based on the views of a very small number of respondents,
are less likely to be representative of the population of firms or households.
While these generic issues are important for all surveys, the focus here is on diffi-
culties specific to measuring governance using survey data.
   Some survey questions on governance can be vague and open to interpretation.
An interesting example comes from the innovative recent work by Razafindrakoto
and Roubaud (2006). They use specially designed surveys in eight African
countries to contrast corruption perceptions based on household surveys with
those based on expert assessments. The unique feature of this exercise is that the
experts were asked to predict the country-level average responses from the house-
hold survey. Experts' ratings were essentially uncorrelated with the household
survey responses. The authors conclude that the household surveys capture the
"objective reality" of petty corruption and that the experts are just plain wrong.
   Their interpretation that there is measurement error only in the expert assess-
ment and not in the household survey is contestable. Households were asked
whether they had been "victims of corruption." There are a variety of reasons why
households might falsely think they were victimized by corruption. For example, a
patient waiting in line to see a state-provided doctor might think (incorrectly) that
people at the head of the line had bribed someone to get there. Conversely, house-
holds might well have paid a bribe, received the associated benefit, and found them-
selves quite satisfied and not at all "victimized" by the transaction. A more modest
interpretation of their finding is that there likely is measurement error in both the
household survey and the matching expert assessments. Moreover, in many other
cases, expert assessments and household survey responses are strongly correlated
across much larger samples of countries.
   Well-designed survey questions on corruption have become increasingly specific.
For example, questions in the Executive Opinion Survey of the World Economic
Forum in some years have asked firms to specifically report the fraction of contract
value solicited in bribes on public procurement contracts. Greater attention is also
being paid to techniques that enable respondents to respond more truthfully to sensitive
questions. For example, questions about corruption put to firms are often prefaced by
"in your experience, do firms like your own typically pay bribes for ...?" Innovative
techniques such as randomized response methods are used to protect the confidentiality
of individual responses by allowing respondents to "camouflage" their response
to sensitive questions by generating some of their responses at random based on the
outcome of a coin toss, although these methods have not yet been widely used in
large cross-country surveys. A related concern has to do with surveys in authoritarian
countries, where respondents might legitimately be fearful of responding truthfully
to any question that might be interpreted as critical of the government.
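
The coin-toss idea can be illustrated with a small simulation. This is only a sketch of one common variant of randomized response, not the design of any survey discussed here. It assumes each respondent privately tosses a coin, answers truthfully on heads, and on tails lets a second toss decide the answer. The function names, the 30 percent bribery rate, and the sample size are all hypothetical.

```python
import random

def simulate_responses(true_rate, n, rng):
    """Simulate n answers under a two-coin randomized response design."""
    answers = []
    for _ in range(n):
        paid_bribe = rng.random() < true_rate
        if rng.random() < 0.5:
            # First coin is heads: the respondent answers truthfully.
            answers.append(paid_bribe)
        else:
            # First coin is tails: a second coin toss decides the answer.
            answers.append(rng.random() < 0.5)
    return answers

def estimate_rate(answers):
    """Recover the population rate from the camouflaged answers.

    Under this design P(yes) = 0.5 * rate + 0.25, so
    rate = 2 * (P(yes) - 0.25), clipped to the interval [0, 1].
    """
    share_yes = sum(answers) / len(answers)
    return min(1.0, max(0.0, 2.0 * (share_yes - 0.25)))

rng = random.Random(0)  # fixed seed so the illustration is reproducible
answers = simulate_responses(true_rate=0.30, n=20000, rng=rng)
estimate = estimate_rate(answers)
print(f"estimated share paying bribes: {estimate:.3f}")
```

No individual answer reveals whether that respondent paid a bribe, yet the population rate can still be estimated. The price is a noisier estimate than a direct question would give for the same sample size.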
   Another potential difficulty in cross-country surveys is cultural bias. It is often
argued that because respondents in different countries may have different norms
regarding what does or does not constitute corruption, their responses are not com-
parable across countries. Presumably, however, these cultural biases should not be
present in cross-country expert assessments that are deliberately designed to be
comparable across countries. Moreover, in many cases it turns out that surveys and
expert assessments tend to produce very similar cross-country rankings. Kaufmann,
Kraay, and Mastruzzi (2006) document strong correlations between expert assessments
and the World Economic Forum's Executive Opinion Survey for six different
dimensions of governance. A glance at table 3 provides similar examples: the
cross-country correlation between the corruption assessments of World Markets Online,
a commercial rating agency, and the Executive Opinion Survey is 0.88. While
culture undoubtedly matters in interpreting survey responses across countries, the
problem does not appear to be a first-order difficulty.
   In short, each type of data has its own strengths and weaknesses. As neither
type of respondent is clearly superior for all purposes, it is important to rely on a
diversity of data sources.



Should Aggregate or Individual Indicators Be Used?

Does it make sense to combine individual indicators of governance into aggregate
or composite indicators that draw on information from multiple sources? Table 1
includes three aggregate indicators: the WGI, the Corruption Perceptions Index
(CPI) of Transparency International, and the very recently released Ibrahim Index
of African Governance.
   The WGI consist of six aggregate indicators of governance covering more than
200 countries and combining cross-country data on governance provided by 30
organizations. The CPI measures only corruption, using a smaller set of data
drawn from nine organizations. The WGI control of corruption indicator uses
these nine data sources, as well as 13 others not used in the CPI. The Ibrahim
Index is an extremely broad collection of a variety of types of governance indi-
cators and several very broad development outcomes, including per capita
income, growth, inequality, and poverty. This makes the Ibrahim Index by far the
broadest indicator surveyed here, but it also makes it difficult to think of it as a
pure governance indicator, because it contains many development outcomes as
well. However, three of the five components of the Ibrahim Index, based primarily
on subjective governance measures such as those used by Transparency
International and the WGI, correspond more closely to established notions of
governance.


Ubiquitous Measurement Error

All governance indicators have limitations, which make them imperfect proxies
for the concepts they are intended to measure. The presence of measurement
error in all governance indicators that this implies is central to the rationale for
constructing aggregate indicators. It is useful to distinguish between two broad
types of measurement error that affect all types of governance indicators.
  Any specific governance indicator will have measurement error relative to the
concept it seeks to measure, because of intrinsic measurement challenges.
A survey question about corruption, for example, will have the usual sampling
error associated with it. Efforts to objectively document the specifics of the insti-
tutional environment or regulatory regime face challenges in coming up with a
factually accurate description of the relevant laws and regulations in each setting.
Measures of the composition and volatility of public spending, for example, which
are sometimes interpreted as indicators of undesirable policy instability, are
subject to all of the usual difficulties in measuring public spending consistently
across countries and over time. Finally, different groups of experts may come up
with different assessments of the same phenomenon in a particular country.
These divergences of opinion can also be interpreted as measurement error.
  To the extent that one is interested in broad concepts of governance, any
specific indicator is almost by definition an imperfect measure of the broader con-
cepts to which it pertains, no matter how accurate or reliable it is. A specific
assessment of corruption in public procurement would not be fully informative



about overall corruption in the public sphere even if it were fully accurate about
this specific type of corruption. Information about the statutory requirements for
business entry regulation need not reflect the actual practice of how these
requirements are implemented on the ground, and they are not informative about
regulatory burdens in other areas. Information about freedom of the press is only
one of many factors contributing to the accountability of governments to their
citizens. Notwithstanding some clear advantages that specificity of an indicator
may have for some purposes, one should be careful not to interpret specific
indicators as sufficient statistics for broader notions of governance.
   How important is this measurement error? Unfortunately, the vast majority of
governance indicators do not explicitly acknowledge the extent of measurement
error. One of the few exceptions is the WGI, discussed below. Fortunately, some
simple calculations can shed light on the likely magnitude of measurement error
in individual governance indicators as well. The key to doing so is to identify pairs
of indicators that measure similar concepts, up to an unavoidable measurement
error component. A useful way to interpret the imperfect correlation between the
World Bank's CPIA and African Development Bank's CPIA regarding transpar-
ency and corruption is to note that both are measuring the same concept of
transparency and corruption but with a degree of measurement error. Intuitively,
the less measurement error there is in these two sources, the more correlated
they should be. Thus one can interpret the correlation between them as saying
something about the degree of measurement error present.
   More formally, think of the observed scores from two organizations, $y_1$ and $y_2$,
as a combination of a signal of unobserved governance, $g$, and source-specific
noise, $\varepsilon_1$ and $\varepsilon_2$ (that is, $y_1 = g + \varepsilon_1$ and
$y_2 = g + \varepsilon_2$). Assume that the variance
of measurement error in the assessments of the two organizations is the same,
and without loss of generality, that the variance of governance is one.10 Some
simple arithmetic reveals that the standard deviation of measurement error is
$\mathrm{SD}(\varepsilon) = \sqrt{(1 - \rho)/\rho}$, where $\rho$ is the correlation between
the two expert assessments.11 For several pairs of indicators discussed in the article,
this standard deviation of error ranges from 0.70 to 1.53 (table 4). The standard errors
associated with aggregate indicators such as the WGI are much smaller,
reflecting the benefits of aggregation in reducing noise in the individual indicators.
The standard error for the WGI estimate of control of corruption for a
typical country in 2006 is just 0.17, or less than a quarter of the standard error
of the most precise pair of individual indicators in this example.
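
The "simple arithmetic" can be spelled out in a few lines. The sketch below uses the assumptions stated in the text (equal error variances and unit variance of governance). It also assumes that the errors are uncorrelated with governance and with each other, which the text does not state explicitly. The symbol $\sigma^2$ for the common error variance is introduced here for convenience.

```latex
\begin{align*}
y_1 &= g + \varepsilon_1, \qquad y_2 = g + \varepsilon_2, \qquad
\operatorname{Var}(g) = 1, \qquad
\operatorname{Var}(\varepsilon_1) = \operatorname{Var}(\varepsilon_2) = \sigma^2 \\
\operatorname{Cov}(y_1, y_2) &= \operatorname{Var}(g) = 1, \qquad
\operatorname{Var}(y_1) = \operatorname{Var}(y_2) = 1 + \sigma^2 \\
\rho &= \frac{\operatorname{Cov}(y_1, y_2)}{\sqrt{\operatorname{Var}(y_1)\operatorname{Var}(y_2)}}
      = \frac{1}{1 + \sigma^2}
\quad\Longrightarrow\quad
\sigma^2 = \frac{1 - \rho}{\rho}
\quad\Longrightarrow\quad
\operatorname{SD}(\varepsilon) = \sqrt{\frac{1 - \rho}{\rho}}
\end{align*}
```

For example, a correlation of 0.67 implies a standard deviation of error of about 0.70, the figure reported in table 4 for the two CPIA ratings.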
   To appreciate the magnitude of this measurement error, it is useful to go one
step further and calculate the width of a 90 percent confidence interval for gov-
ernance based on any one of these individual indicators and on the additional
assumption that governance and the error term are jointly normally distributed.
The width of the confidence interval is $2 \times 1.64 \times \mathrm{SD}(g \mid y) = 3.28 \times \sqrt{1 - \rho}$.


Kaufmann and Kraay                                                                19

Table 4. Measurement Error in Individual Governance Indicators

                                                                 Standard deviation   Width of
Measure                                            Correlation   of error             confidence interval

Transparency, accountability, and corruption       0.67          0.70                 1.88
 World Bank CPIA-16
 African Development Bank CPIA-16
Business entry regulation                          0.48          1.04                 2.37
 World Economic Forum Executive Opinion Survey
 Doing Business
Elections
 Global Corruption Barometer Survey
 Global Integrity Elections Index

  Source: Authors' analysis based on data described in the text.




Since the assumptions imply that 95 percent of countries would have governance
levels between -2 and 2, these figures imply that a 90 percent confidence interval
for governance for any individual country would span one-half to two-thirds of
the entire most likely range of the governance indicator.
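
The first two rows of table 4 can be reproduced with a short calculation. The sketch below assumes the setup described in the text: unit variance of governance, equal error variances, and joint normality. Under those assumptions the error standard deviation is sqrt((1 - rho) / rho) and the conditional standard deviation of governance given one indicator is sqrt(1 - rho). The function names are illustrative only.

```python
import math

def error_sd(rho):
    """SD of measurement error implied by the correlation between two sources."""
    return math.sqrt((1.0 - rho) / rho)

def ci_width(rho, z=1.64):
    """Width of the 90 percent confidence interval for governance given one source.

    Uses SD(g | y) = sqrt(1 - rho), which follows from Var(g) = 1 and normality.
    """
    return 2.0 * z * math.sqrt(1.0 - rho)

# Correlations reported in table 4 for the CPIA pair and the business entry pair.
for rho in (0.67, 0.48):
    width = ci_width(rho)
    print(f"rho={rho:.2f}  SD(error)={error_sd(rho):.2f}  "
          f"CI width={width:.2f}  share of [-2, 2] range={width / 4.0:.2f}")
```

Dividing the interval width by 4, the length of the range from -2 to 2, gives the share of the likely range of governance that a single indicator's confidence interval covers.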


Why Aggregate Indicators?

All indicators of governance include measurement error. Aggregate indicators of
governance can be a useful way of combining, organizing, and summarizing infor-
mation from different sources, thereby reducing the influence of measurement
error in any individual indicator. Aggregation also allows for the construction of
explicit margins of error for both the aggregate indicator and its component indi-
vidual indicators.
   The WGI illustrate how these margins of error can be calculated (box 1). The
statistical methodology underpinning the WGI (the unobserved-components
model) explicitly assumes that the true level of governance is unobservable and
that the observed empirical indicators of governance provide imperfect signals of
the fundamentally unobservable concept of governance. This formalizes the
notion that all available indicators are imperfect proxies for governance. The esti-
mates of governance that come out of this model are simply the conditional
expectation of governance in each country, conditioning on the observed data for
each country. Moreover, the unobserved-components model allows one to sum-
marize uncertainty about these estimates for each country with the standard
deviation of unobserved governance, conditional on the observed data. These
standard deviations can be used to construct confidence intervals for governance
estimates, often referred to informally as margins of error. Intuitively, the larger
the number of data sources available for a given country, the smaller these



margins of error should be. The variance of the error term can be estimated
in each individual underlying governance indicator using this methodology,
following a calculation that generalizes the simple one discussed above.
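
A stripped-down version of this calculation is sketched below. It is a simplification, not the WGI methodology itself. It assumes governance is standard normal, each source equals governance plus independent normal noise with a known standard deviation, and all sources are already on a common scale. The scores and error standard deviations are hypothetical.

```python
import math

def governance_estimate(scores, error_sds):
    """Conditional mean and SD of governance given the observed scores.

    Assumes g ~ N(0, 1) and score_k = g + e_k with independent normal
    errors e_k ~ N(0, error_sds[k] ** 2).
    """
    precisions = [1.0 / sd ** 2 for sd in error_sds]
    total_precision = 1.0 + sum(precisions)  # 1.0 is the prior precision of g
    mean = sum(p * y for p, y in zip(precisions, scores)) / total_precision
    sd = math.sqrt(1.0 / total_precision)
    return mean, sd

# Hypothetical country observed by one source, then by five sources.
one_source = governance_estimate([0.8], [0.7])
five_sources = governance_estimate([0.8, 0.5, 1.1, 0.6, 0.9], [0.7] * 5)

for label, (mean, sd) in (("one source", one_source), ("five sources", five_sources)):
    print(f"{label}: estimate={mean:.2f}, 90% interval = +/- {1.64 * sd:.2f}")
```

The conditional standard deviation shrinks as sources are added, which is the sense in which more data sources yield smaller margins of error. More precise sources also receive more weight in the estimate.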


--------

Box 1. The Worldwide Governance Indicators: Critiques and Responses

The Worldwide Governance Indicators (WGI) are among the most widely used cross-country governance
indicators (see Kaufmann, Kraay, and Mastruzzi 2007a for a description). They report on six dimensions
of governance for more than 200 countries for 1996-2006. The indicators are based on hundreds of
underlying individual indicators drawn from 30 organizations, based on responses from tens of
thousands of citizens, enterprise managers, and experts.
  As one of the most prominent and widely used collections of cross-country governance indicators, the
WGI have naturally generated criticism. Most of these criticisms appear largely invalid (Kaufmann, Kraay,
and Mastruzzi 2007b).

Lack of Comparability over Time and across Countries
Several critics have raised concerns that the WGI are not comparable over time and across countries, that
the indicators use units that set the global average of governance to be identical in all periods, that
comparisons of pairs of countries or single countries over time are based on different sets of underlying
data sources, and that there are substantial margins of error in the aggregate WGI.
  These criticisms appear unjustified, for several reasons. First, there is no clear evidence of a trend in
one direction or another in global averages of governance in any of the underlying individual data
sources (the overall evidence pointing to general stagnation). The choice of a constant global average is
therefore no more than an innocuous choice of units. Second, changes in the set of underlying data
sources on average contribute only minimally to changes over time in countries' scores on the aggregate
WGI; the majority of cross-country comparisons using the aggregate WGI are based on a substantial
number of common data sources. Third, the presence of explicit margins of error in the WGI is an
important advantage, serving as a useful antidote to superficial comparisons of country ranks or country
performance over time that are often made with other governance indicators. A substantial fraction of
cross-country and over-time comparisons using the WGI result in statistically significant differences,
suggesting that the WGI are informative.

Biases in Expert Assessments
Several critics have alleged various biases in the data sources underlying the WGI, including an
excessive emphasis on business-friendly regulation on the part of some data providers; ideological biases,
such as a bias against left-wing governments, on the part of some data providers; and "halo effects,"
whereby countries with good economic performance receive better-than-warranted governance scores.
Convincing empirical evidence in support of such biases has not been provided. Empirical work by
Kaufmann, Kraay, and Mastruzzi (discussed in the main text) suggests that these biases are quantitatively
unimportant.

Correlated Perception Errors
Several critics have suggested that expert assessments make similar errors when assessing the same
country, leading to correlations in the perception errors across expert assessments. While this is plausible,
there is little convincing empirical evidence; work by Kaufmann, Kraay, and Mastruzzi (discussed below)
suggests that these biases are quantitatively unimportant.

  A related concern is that correlated perception errors will lead to the overweighting of such sources in
the aggregate WGI, which weights individual data sources by estimates of their precision, which in turn
are based on the observed intercorrelation among sources. Given the at best modest evidence of
correlated perception errors, this is unlikely to be quantitatively important. The WGI country rankings
are highly robust to alternative weighting schemes (Kaufmann, Kraay, and Mastruzzi 2006).

Definitional Issues
Some critics have taken issue with the definitions of governance and thus the assignment of individual
governance indicators to the six aggregate WGI. As there is no consensus on the definition of
governance, there cannot be any right or wrong definitions or corresponding measures of governance.
That said, most reasonable definitions of governance cover similar broad areas, and aggregate indicators
capturing these broad areas are likely to be similar. Moreover, as virtually all of the individual indicators
underlying the WGI are publicly available on the WGI Web site, researchers can easily construct
alternative indicators corresponding to their preferred notions of governance.

Reliance on "Subjective" Data
Various critics have argued that the perceptions-based data on which the WGI are based do no more
than reflect vague and generic perceptions rather than specific objective realities and that "specific,
objective, and actionable" measures of governance are needed to make progress in governance reforms.
Virtually all governance indicators necessarily involve some element of subjectivity. Perceptions-based
data are extremely valuable, because they capture the views of relevant stakeholders who act on these
views. Moreover, specific changes to policy rules are very difficult to link to changes in
outcomes of interest, making it difficult to identify indicators that are "action worthy" as opposed to
merely "actionable."

--------



   From the standpoint of users, the margins of error associated with estimates of
governance are nontrivial. For many pairs of countries with similar scores on the
2006 WGI control of corruption indicator, the confidence intervals overlap, indicating
that the small differences between them are unlikely to be statistically, or
practically, significant (figure 2). However, many of the possible pair-wise comparisons
between countries do result in significant differences. Roughly two-thirds of the
possible pair-wise comparisons of corruption across countries result in differences
that are significant at the 90 percent confidence level, and nearly three-quarters
of comparisons are significant at the 75 percent confidence level. Clearly, far
fewer pair-wise comparisons would be significant if they were based on any single
individual indicator, whose margins of error had not been reduced by averaging
across alternative data sources. For the WGI control of corruption indicator, for
example, only 16 percent of cross-country comparisons based on a typical indi-
vidual data source, such as Global Insight-DRI, would be significant at the
90 percent confidence level.
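
The effect of the margin of error on pair-wise comparisons can be illustrated with made-up numbers. The sketch below is not the WGI's own procedure. It assumes a simple test in which two countries differ significantly when the gap in their scores exceeds 1.64 standard errors of the difference, with independent errors across countries. The five scores are hypothetical, and the two standard errors (0.17 and 0.70) are borrowed from the examples in the text.

```python
import math
from itertools import combinations

def share_significant(estimates, std_errors, z=1.64):
    """Share of country pairs whose score gap exceeds z standard errors."""
    pairs = list(combinations(range(len(estimates)), 2))
    significant = 0
    for i, j in pairs:
        gap = abs(estimates[i] - estimates[j])
        se_gap = math.sqrt(std_errors[i] ** 2 + std_errors[j] ** 2)
        if gap > z * se_gap:
            significant += 1
    return significant / len(pairs)

# Hypothetical scores for five countries (not actual WGI data).
scores = [-1.2, -0.4, -0.3, 0.5, 1.6]
precise = share_significant(scores, [0.17] * 5)  # aggregate-like precision
noisy = share_significant(scores, [0.70] * 5)    # single-indicator-like precision
print(f"significant pairs: precise={precise:.0%}, noisy={noisy:.0%}")
```

With the same underlying scores, the noisier indicator separates far fewer pairs of countries, which mirrors the contrast drawn in the text between the aggregate indicator and a typical individual source.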
   The WGI are unusual among governance indicators in their transparent recog-
nition of such margins of error. The vast majority of investment climate and gov-
ernance indicators simply report country scores or ranks, without quantifying the



Figure 2. Margins of Error in Estimates of Governance in Selected Countries, 2006

[Figure: governance level and margins of error for selected countries]

  Source: Kaufmann, Kraay, and Mastruzzi 2007a.




measurement error these rankings inevitably contain. This has contributed to a
spurious sense of precision among users of these indicators and to an overempha-
sis on small differences across countries.
   Of course, aggregate indicators have their own shortcomings. Foremost
among them is the inevitable loss of specificity. Averaging one indicator of judicial
corruption and another indicator of bureaucratic corruption arguably yields a
more informative indicator of overall corruption, but not necessarily a more
informative indicator of either of the two specific types of corruption. Averaging
an indicator of freedom of the press with an indicator of electoral integrity
yields a more informative indicator of overall democratic accountability, but not
of either particular concept. For some purposes the broad aggregate indicators
will be useful; for others the disaggregated indicators will be more useful. This
is not a shortcoming, however, because virtually all aggregate governance
indicators can readily be disaggregated into their constituent components, giving
the user the freedom to choose the appropriate level of aggregation for the task
at hand.12



   The second concern with aggregate indicators is that their effectiveness at
reducing measurement error depends crucially on the extent to which their
underlying sources provide independent information on governance. Some types
of expert assessments may make correlated errors in their governance rankings
(although empirical evidence suggests that these error correlations are not likely
to be very large). Aggregate indicators can mitigate only the component of
measurement error that is truly independent across the different underlying indi-
cators. This point is particularly relevant when contrasting multiple- and single-
source aggregate indicators.
  The WGI are multiple-source aggregate indicators, combining information from
a large number of sources. In contrast, many other data sources report aggregates
of their own subcomponents. For example, there is an aggregate CPIA rating in
conjunction with the 16 underlying components, and there are six aggregate
Global Integrity indicators, which combine information from more than 200
underlying individual indicators. All of the underlying individual indicators for a
given country are scored by the same respondents. As a result, any respondent-
specific biases are likely to be reflected in all of the individual indicators; the gain
in precision from relying on the aggregate indicators from these sources will not
be as large as when aggregate indicators are based on multiple sources.
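
The limit that correlated errors place on aggregation can be made concrete. The sketch below assumes a simple unweighted average of sources that share the same error standard deviation and the same pair-wise error correlation. This is a stylized case, not the precision-weighted scheme used by the WGI, and the numbers are hypothetical.

```python
import math

def aggregate_error_sd(error_sd, n_sources, error_corr):
    """SD of the error in a simple average of n_sources indicators.

    Assumes every source has the same error SD and every pair of sources
    has the same error correlation, so that
    Var(mean error) = error_sd**2 * (1 + (n_sources - 1) * error_corr) / n_sources.
    """
    variance = error_sd ** 2 * (1.0 + (n_sources - 1) * error_corr) / n_sources
    return math.sqrt(variance)

independent = aggregate_error_sd(0.7, n_sources=10, error_corr=0.0)
correlated = aggregate_error_sd(0.7, n_sources=10, error_corr=0.5)
print(f"independent errors: {independent:.2f}, correlated errors: {correlated:.2f}")
```

With independent errors, averaging ten sources cuts the error standard deviation by a factor of sqrt(10). With an error correlation of 0.5 most of that gain disappears, and with perfectly correlated errors, as when one respondent scores every component, averaging yields no gain at all.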
  In summary, aggregate governance indicators can play a useful role in synthe-
sizing and summarizing the large variety of individual governance indicators.
Using aggregate indicators is one way to exploit the complementarities between
the different types of indicators (rules or outcomes, surveys or experts).
Aggregation can also increase the precision with which these aggregate indicators
measure broad but unobservable concepts of governance. Of course, for some pur-
poses, more specific indicators are useful. It is thus important to be able to easily
disaggregate aggregate indicators into their constituent components, as is the
case with the WGI.



Moving Forward

A sobering picture emerges from this review: while most indicators of governance
have many virtues, all face distinct challenges. Researchers, therefore, need to
look at a variety of indicators and sources when monitoring or assessing governance
across countries, within a country, or over time. A few principles may be
useful as this work, as well as the use of governance indicators in public sector
policymaking and civil society monitoring, continues.
  Avoid false dichotomies. Too often, discussions of governance indicators overemphasize
distinctions between types of governance indicators, with insufficient
regard for the strong complementarities between them. Artificially sharp



distinctions are often drawn between "subjective" and "objective" indicators of
governance, when, in fact, virtually all indicators of governance rely on the
judgments or perceptions of respondents in one way or another. In some cases,
even the terminology is misleading. The recently released Ibrahim Index of
African Governance, for example, touts itself as providing objective assessments of
governance, even though its core governance components are based primarily on
purely subjective data, including the Transparency International CPI and subjec-
tive ratings by the Heritage Foundation and the Economist Intelligence Unit.
   Distinctions between aggregated and disaggregated indicators often have an
artificial element also. Some aggregate indicators transparently disclose each disaggregated
source, enabling users to take advantage of the complementarities
between the two types of indicators and blurring the distinction between the two.
For some purposes, it is useful to combine information from many individual indi-
cators into some kind of summary statistics, while, for other purposes, the disag-
gregated data are of primary interest. Even where disaggregated data are of
primary interest, however, it is important to rely on a number of independent
sources for validation, because the margins of error and the likelihood of extreme
outliers are significantly higher for a disaggregated indicator.
   An excessively narrow emphasis on "actionable" indicators detailing specific
policy interventions immediately under the control of governments can divert
attention from equally important discussions of which of these indicators are
"action worthy," in the sense of having significant impacts on outcomes of inter-
est. The answer is often context-specific and rarely obvious a priori. Focusing too
much on "actionable" indicators while downplaying scrutiny of outcome indi-
cators may result in undue emphasis on measures that may not translate into
concrete progress.
   Use indicators appropriate for the task at hand. As with all tools, different types of
indicators are suited for different purposes. Governance indicators can be used for
regular cross-country comparisons. While many of these indicators have become
increasingly specific, they often remain blunt tools for monitoring governance and
studying the causes and consequences of good governance at the country level.
For these purposes, a wide variety of innovative tools and methods of analysis has
been deployed in many countries (reviewing these methods is beyond the scope of
this survey). Examples of in-country tools include the World Bank's Investment
Climate Assessments, the World Bank Institute's Governance and Anti-Corruption
diagnostics, the corruption surveys conducted by some chapters of Transparency
International, and the institutional scorecard carried out by the Public Affairs
Center in Bangalore, India. Many project-specific interventions and diagnostics
can also be used to measure governance at this level.13
   Public and professional scrutiny is essential for the credibility of governance indicators.
Virtually all of the governance indicators listed in table 2 are publicly
available, either commercially or at no cost to users. This transparency is
central to their credibility for monitoring governance. Open availability permits
broad scrutiny and public debate about the content and methodology of indi-
cators and their implications for individual countries. Many indicators are also
produced by nongovernmental actors, making it more likely that they are
immune from either the perception or the reality of self-interested manipulation
on the part of the government. Scholarly peer review can also strengthen the
quality and credibility of governance indicators. For example, articles describing
the methodology of the Doing Business indicators, the Database of Political
Institutions, and the WGI have appeared in peer-reviewed professional journals.
Transparency with respect to details of methodology and its limitations is also
essential for credible use of governance indicators. It is important that users of
governance indicators understand fully the characteristics of the indicators they
are using, including any methodological changes over time and time lags
between the collection of data and publication.
   It is thus of concern that some proposed and existing indicators of governance
are insufficiently open to public scrutiny. While the recent disclosure of the World
Bank's CPIA ratings for low-income countries represents a positive step, these
indicators are being disclosed for only about half of the roughly 130 countries for
which they are prepared each year, and none of the historical data from 2005 or
earlier are publicly available. Historical data on the CPIA ratings of the African
Development Bank and Asian Development Bank have also not been disclosed
publicly. This is unfortunate, given that the decision to selectively disclose recent
CPIA data and not to disclose historical CPIA data is made by the executive
boards of these organizations and therefore reflects the desire of the very govern-
ments these ratings are supposed to assess. Regarding transparency, it is also of
concern that although the Public Expenditure and Financial Accountability
initiative has been ongoing since 2000, it had resulted in indicators and reports
on just 42 countries as of March 2007, for only one period per country, only nine
of them publicly available. Moreover, because these reports are prepared in collab-
oration with the governments in question, their credibility may not be the same
as those associated with third-party indicators. Similar concerns affect recent
Organization for Economic Co-operation and Development-led efforts to construct
indicators of public procurement practices.
   Transparently acknowledge margins of error of all governance indicators. All governance
indicators include measurement error and so should be thought of as imperfect
proxies for the fundamentals of good governance. This is not just an abstract
statistical point, but rather one of fundamental importance for all users of govern-
ance indicators. Wherever possible, such margins of error should be explicitly
acknowledged, as they are in the WGI, and taken seriously when the indicators
are used to monitor progress on governance. At times the lack of disclosure of



margins of error is rationalized by suggesting that they would be missed by most
readers. Experience with the WGI suggests that this is not the case, with many
users recognizing and benefiting from this additional degree of transparency
about data limitations.
   Exploit the wealth of available indicators, recognizing that progress in developing new
indicators is likely to be incremental. Much more work needs to be done to exploit
the large body of disaggregated measures of governance already in existence.
Linking disaggregated indicators to disaggregated outcomes, both across countries
and over time, is likely to be an important area of research over the next several
years, with important implications for policymakers.
   There is also scope for developing new and better indicators of governance.
Work to improve such indicators will be important, as indicators are increasingly
used to monitor the success and failure of governance reform efforts. But given
the many challenges of measuring governance, it is important to recognize that
progress in this area over the next several years is likely to be incremental rather
than fundamental. Alongside efforts to develop new indicators, there is also a
case to improve existing indicators, particularly in increasing the periodicity of
heretofore one-off efforts and in broadening their country coverage (covering
industrial and developing economies), as well as covering issues for which data
are still scarce, such as money laundering.



Notes

Daniel Kaufmann is a director of global programs at the World Bank Institute; his email address is
dkaufmann@worldbank.org. Aart Kraay is a lead economist in the Development Research Group at
the World Bank; his email address is akraay@worldbank.org. The authors would like to thank
Shanta Devarajan for encouraging them to write this survey, Simeon Djankov and three anonymous
referees for their helpful comments, and Massimo Mastruzzi for assistance.
    1. For surveys of and user guides to governance indicators, see UNDP (2005), Arndt and Oman
(2006), and Knack (2006). Because of space constraints, no attempt is made here to review the
important body of work focused on in-depth within-country diagnostic measures of governance that
are not designed for cross-country replicability and comparisons.
    2. A fuller compilation of governance datasets is available at
www.worldbank.org/wbi/governance/data.
    3. Indeed, this is reflected in the terminology of "actionable" governance indicators emphasized
in the World Bank's Global Monitoring Report (World Bank 2006).
    4. See King and Wand (2007) for a description of how this problem can be mitigated by the use
of "anchoring vignettes" that provide a common frame of reference to respondents in interpreting
the response scale. The basic idea is to provide an understandable anecdote or vignette describing
the situation faced by a hypothetical respondent to the survey. For example, "Miguel frequently finds
that his applications to renew a business license are rejected or delayed unless they are accompanied
by an additional payment of 1,000 pesos beyond the stated license fee." Respondents are then asked
to assess how great an obstacle corruption is for Miguel's business, using a 10-point scale. Since
all respondents use the scale to assess the same situation, this rating can be used to "anchor" their
responses to questions referring to their own situation.
    5. These two indicators are measured as the average of 14 "in law" components and the 10 "in
practice" components of the elections indicator of Global Integrity.
    6. Starting with the 2005 data, both the African Development Bank and the World Bank have
made their CPIA scores public. The African Development Bank does so for all borrowing countries;
the World Bank does so only for countries eligible for its most concessional lending.
    7. Kaufmann, Kraay, and Zoido-Lobatón (1999a) show how the estimated margins of error of
their aggregate governance indicators would increase if they assume that the errors made by
individual data sources were correlated. Recently Svensson (2005), Arndt and Oman (2006), and
Knack (2006) have raised this criticism again, largely without the benefit of systematic evidence.
Kaufmann, Kraay, and Mastruzzi (2007b) provide a detailed response.
    8. This is not to say that all of the surveys used to measure governance are necessarily represen-
tative in any strict sense of the term. In fact, one general critique is that several large cross-country
surveys of firms that provide data on governance are not very clear about their sample frame and
sampling methodology. The Executive Opinion Survey of the World Economic Forum, for example,
states that it seeks to ensure that the sample of respondents is representative of the sectoral and size
distribution of firms (World Economic Forum 2006). But it reports that it "carefully select[s] compa-
nies whose size and scope of activities guarantee that their executives benefit from international
exposure" (p. 133). It is not clear from their documentation how these two conflicting objectives are
reconciled.
    9. A simple example is that respondents are asked whether they have ever offered a bribe. But
before answering, the respondent is instructed to privately toss a coin and to answer "yes" if
either they have in fact offered a bribe, or the coin comes up heads. See Azfar and Murrell (2006)
for an assessment of the extent to which randomized response methods correct for respondent
reticence and an innovative approach to using this methodology to weed out less than candid
respondents.
    10. The assumption of a common error variance is necessary in this simple example with two
indicators in order to achieve identification. In this example, just one sample correlation in the
data can be used to infer the variance of measurement error; just one measurement error
variance can thus be identified. In more general applications of the unobserved components model,
such as the WGI, this restriction is not required because there are three or more data sources.
    11. For details on this calculation, see Kaufmann, Kraay, and Mastruzzi (2004, 2006). Gelb,
Ngo, and Ye (2004) perform a similar calculation comparing the African Development Bank and
World Bank CPIA scores. Their conclusion that the CPIA ratings have little measurement error is
driven largely by the fact that the authors focus on the aggregate CPIA scores, which are very
highly correlated between the two institutions. The focus here is on one of 16 specific questions;
at this level of disaggregation, the correlation between the two sets of ratings is considerably
lower.
    12. For example, virtually all of the individual indicators underlying the aggregate WGI are
available at www.govindicators.org.
    13. One of the best-known and best-executed recent studies of this type examines corruption
in a local road-building project (Olken 2007).
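
As a rough illustration of the randomized response design described in note 9, the following Python sketch simulates the coin-toss protocol and backs out the share of respondents who have offered a bribe. It rests on assumptions that go beyond the note: a fair coin, respondents who follow the instructions exactly, and a hypothetical true bribery rate of 30 percent; the function names are illustrative only. Under those assumptions the expected share of "yes" answers is 0.5 + 0.5p, where p is the true share, so p can be estimated as twice the observed share minus one.

```python
import random


def simulate_responses(true_rate, n, seed=0):
    """Simulate n respondents following the coin-toss protocol (hypothetical data)."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        bribed = rng.random() < true_rate   # respondent has in fact offered a bribe
        heads = rng.random() < 0.5          # private toss of a fair coin
        answers.append(bribed or heads)     # "yes" if either condition holds
    return answers


def estimate_bribe_rate(answers):
    """Invert P(yes) = 0.5 + 0.5 * p to recover p, truncated at zero."""
    share_yes = sum(answers) / len(answers)
    return max(0.0, 2.0 * share_yes - 1.0)


# Assumed true rate of 30 percent; no individual answer reveals who bribed.
answers = simulate_responses(true_rate=0.30, n=100_000, seed=1)
estimate = estimate_bribe_rate(answers)
print(f"estimated share offering bribes: {estimate:.3f}")
```

The estimate is noisier than a direct question would be with fully candid respondents, because the coin adds variance; that loss of precision is the price paid for protecting each respondent's answer.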




References

Acemoglu, Daron. 2006. "Constitutions, Politics, and Economics: A Review Essay on Persson and
   Tabellini's The Economic Effects of Constitutions." Journal of Economic Literature 43(4):1025-45.

Arndt, Christiane, and Charles Oman. 2006. "Uses and Abuses of Governance Indicators." OECD
   Development Center Study, Organisation for Economic Co-operation and Development, Paris.



Azfar, Omar, and Peter Murrell. 2006. "Identifying Reticent Respondents: Assessing the Quality of
   Survey Data on Corruption and Values." University of Maryland, Department of Economics,
   College Park, Maryland.

Gelb, Alan, Brian Ngo, and Xiao Ye. 2004. "Implementing Performance-Based Aid in Africa: The
   Country Policy and Institutional Assessment." World Bank Africa Region Working Paper 77,
   Washington, D.C.

Hellman, Joel, and Daniel Kaufmann. 2004. "The Inequality of Influence." In J. Kornai and S. Rose-
   Ackerman, eds., Building a Trustworthy State in Post-Socialist Transition. New York: Palgrave
   Macmillan.

Kaufmann, Daniel, Aart Kraay, and Pablo Zoido-Lobatón. 1999a. "Aggregating Governance
   Indicators." Policy Research Working Paper 2195. World Bank, Washington, D.C.

---. 1999b. "Governance Matters." Policy Research Working Paper 2196. World Bank,
   Washington, D.C.

Kaufmann, Daniel, Aart Kraay, and Massimo Mastruzzi. 2004. "Governance Matters III: Governance
   Indicators for 1996, 1998, 2000 and 2002." World Bank Economic Review 18(2):253-87.

---. 2005. "Governance Matters IV: Governance Indicators for 1996-2004." Policy Research
   Working Paper 3630. World Bank, Washington, D.C.

---. 2006. "Governance Matters V: Governance Indicators for 1996-2005." Policy Research
   Working Paper 4012. World Bank, Washington, D.C.

---. 2007a. "Governance Matters VI: Aggregate and Individual Governance Indicators for
   1996-2006." Policy Research Working Paper 4280. World Bank, Washington, D.C.

---. 2007b. "The Worldwide Governance Indicators Project: Answering the Critics." Policy
   Research Working Paper 4149. World Bank, Washington, D.C.

Kautilya. 1992. [400 B.C.E.] The Arthashastra. New Delhi, India: Penguin Classic Edition.

King, Gary, and Jonathan Wand. 2007. "Comparing Incomparable Survey Responses: Evaluating
   and Selecting Anchoring Vignettes." Political Analysis 15(1):46-66.

Knack, Steven. 2006. "Measuring Corruption in Eastern Europe and Central Asia: A Critique of the
   Cross-Country Indicators." Policy Research Department Working Paper 3968. World Bank,
   Washington, D.C.

North, Douglass. 2000. "Poverty in the Midst of Plenty." Hoover Institution Daily Report. October 2.
   (www.hoover.org.)

Olken, Benjamin. 2007. "Monitoring Corruption: Evidence from a Field Experiment in Indonesia." Journal
   of Political Economy 115(2):200-49.

Persson, Torsten, and Guido Tabellini. 2005. The Economic Effects of Constitutions. Cambridge, Mass.:
   MIT Press.

Razafindrakoto, Mireille, and François Roubaud. 2006. "Are International Databases on Corruption
   Reliable? A Comparison of Expert Opinion Surveys and Household Surveys in Sub-Saharan
   Africa." Development Research Institute, Development Institutions and Long-Term Analysis
   (IRD/DIAL), Paris.

Svensson, Jakob. 2005. "Eight Questions about Corruption." Journal of Economic Perspectives 19(3):
   19-42.

UNDP (United Nations Development Programme). 2005. Governance Indicators: A User's Guide.
   New York: UNDP.

World Bank. 1992. Governance and Development. Washington, D.C.

---. 2002. Building Institutions for Markets. New York: Oxford University.

---. 2006. Global Monitoring Report. Washington, D.C.

---. 2007. "Strengthening World Bank Group Engagement on Governance and
   Anticorruption." Joint Ministerial Committee of the Boards of Governors of the Bank and the
   Fund on the Transfer of Real Resources to Developing Countries, Washington, D.C.
   [www.worldbank.org/html/extdr/comments/governancefeedback/gacpaper.pdf].

World Economic Forum. 2006. The Global Competitiveness Report 2006-2007. New York: Palgrave
 Macmillan.




                                            The World Bank Research Observer, vol. 23, no. 1 (Spring 2008)

             Two Comments on "Governance
            Indicators: Where Are We, Where
            Should We Be Going?" by Daniel
                 Kaufmann and Aart Kraay

   The following comments by Shantayanan Devarajan and Simon Johnson provide two
perspectives on indicators in general and the World Governance Indicators in particular.




                                            Shantayanan Devarajan




The World Bank Research Observer publishes balanced surveys of the literature.
When the authors of a survey are also the proponents of one of the major indi-
cators being surveyed, it invites comments to ensure that balance is maintained.
       Kaufmann and Kraay provide a useful taxonomy of governance indicators,
distinguishing between those measuring "rules on the books" and "rules on the
ground" and those reflecting the views of experts and the results of surveys.
While providing a balanced overview of the pros and cons of different methods,
they make a strong case in favor of measuring rules on the ground based on an
aggregated mix of expert- and survey-based indicators, along the lines of their
World Governance Indicators (WGI).
   Any assessment of governance indicators, or of any indicator for that matter,
must be based on the purposes to which the indicators will be put, as Kaufmann
and Kraay note. This comment examines how well the WGI and other indicators
perform in two specific instances.
   The first is the allocation of resources across low-income economies, such as the
concessional aid provided by the World Bank's International Development Association
(IDA) or the US Millennium Challenge Corporation (MCC). These
resources are allocated according to a formula that includes, among other factors,
the productivity of aid in reducing poverty in a particular country, a factor that
is partly a function of the quality of governance. An indicator is thus needed that




© The Author 2008. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
doi:10.1093/wbro/lkn001           Advance Access publication February 12, 2008                                     23:31-36

measures the quality of governance across countries. By focusing on rules on the
ground, using a mix of expert- and survey-based information, aggregating across
indicators within a country, and providing standard errors around the means
of these aggregate indicators, the WGI provide a defensible method of making
cross-country comparisons. The IDA uses the judgment of World Bank experts in
assessing a country's policies and institutions, but that judgment is informed by
the WGI. The MCC uses five of the WGI, along with 11 other indicators, to
determine a country's eligibility for its programs.
   The WGI have produced some seemingly anomalous results (India, for instance,
ranks in the bottom quartile worldwide on "political stability and lack of
violence"; see note 1). But aggregating across surveys in each country is needed if results are
not to reflect surveys conducted across only a limited set of countries or some
other bias-inducing method.
   A second possible use of the WGI is to help identify the nature of the "govern-
ance problem" in a country. Bangladesh has ranked in the bottom quartile world-
wide (and considerably below the low-income country average) in all but one or
two of the aggregate WGI for the past decade and has been at or very near the
bottom of Transparency International's corruption indicators. At the same time,
the growth rate of per capita income has risen 0.1 percent every decade since the
1970s (the growth rate is now close to 5 percent), and poverty fell 0.8 percent in
five years (twice the rate in India). The country has already achieved universal
primary enrollment and gender equity in secondary school, and it is on track to
reduce child mortality by two-thirds (relative to 1990 levels) by 2015.
   In what sense, then, does Bangladesh have a "governance problem"? Is it possible
that the data on development indicators have been mismeasured? Although
some development indicators may be weaker than they appear (there is some
evidence, for example, that enrollment rates among the lowest quintile of
Bangladesh's population remain extremely low; World Bank 2007), the country
has clearly made tremendous gains in development over the past two decades. It
is possible that it would have performed even better absent its governance problems
(World Bank 2006). But this does not answer the question of how Bangladesh's
development outcomes are so much better than those of other countries with
"better" governance.
   One explanation is that governance indicators in Bangladesh fail to capture
the fact that the country has a vibrant and active civil society that not only deli-
vers services, but provides some accountability to government. The WGI also
seem to overlook the increasingly mature media, including vernacular newspa-
pers, which play something of a watchdog role. But these measurement errors
may be second-order considerations compared with the fundamental facts that
the governance indicators are capturing. Bangladesh does score in the 30th
percentile on the WGI voice and accountability indicator; it comes close to the
25th percentile on government effectiveness. But it ranks in the 5th percentile
on control of corruption. These rankings are very low for a country that is per-
forming so well.
   Another explanation is that the relation between governance and development
in Bangladesh is unique, because the Bangladeshi people have worked around the
country's governance problems to spur development. When the country was
born, out of a civil war, there was hardly a government. International and
national nongovernmental organizations (NGOs) filled the vacuum by delivering
basic services, such as health, family planning, and education, and by creating
microcredit schemes. As these efforts proved effective, the government made space
for these NGOs and the private sector, in some cases contributing to their finan-
cing. The government funds secondary education, for example, although 95
percent of it is provided by the nonstate sector. Similarly, Bangladesh's garment
export sector grew rapidly, thanks to duty drawback systems and bonded ware-
houses that enabled textiles to come into the country duty-free, circumventing a
highly opaque customs system.
   Of course, this account does not explain why Bangladesh was able to work
around its governance problems when so many other countries were not.
Bangladesh is a densely populated, homogeneous society, in which innovations
spread like wildfire. Soon after one village discovers something that works, neigh-
boring villages find out about it and adopt it. As a result, family planning, micro-
finance, and other programs took off in Bangladesh more easily than they might
have elsewhere. By the time the government grew strong enough to control the
NGOs and others, it was too late: microfinance, family planning, and private
schooling had already become commonplace. (To its credit, the government recog-
nized this and proceeded to support the providers of essential services with
finance.) The result is a country with weak governance indicators, but impressive
development.
   The Bangladesh case illustrates the fact that governance indicators such as
the WGI do not capture the multifaceted ways in which governance affects
development in a particular country. It would be dangerous to use indicators to
jump to simple conclusions without understanding the specific relation between
governance and development in a particular country; the indicators should cer-
tainly not be used by themselves to design policy responses to problems of weak
governance.
   This is the downside of having an indicator that permits intercountry compari-
sons of governance: the richness of country-specific detail is lost. Kaufmann and
Kraay recognize this tradeoff in their concluding discussion about more disaggre-
gated indicators. One should not go too far down this path, however, lest the
main benefit of the WGI and other such indicators, their comparability across
countries, be diluted in the quest for indicators that are more country specific. In
the Tinbergen tradition of not having more objectives than instruments, one
should not expect governance indicators to serve too many purposes.



Notes

Shantayanan Devarajan is the chief economist of the South Asia Region at the World Bank; his
email address is sdevarajan@worldbank.org.
   1. This result may not be as anomalous as it appears at first blush, however: according to the
Indian prime minister, 170 of 602 districts have a significant Naxalite (Maoist) presence (Singh
2005).




References

Singh, Manmohan. 2005. "PM's Reply in Rajya Sabha to the Debate on Motion of Thanks to the
  President's Address." March 11. New Delhi.

World Bank. 2006. Can South Asia End Poverty in a Generation? South Asia Region. Washington,
  D.C.

---. 2007. To the MDGs and Beyond: Accountability and Institutional Innovation in
  Bangladesh. Bangladesh Development Series No. 14. Dhaka.



                                     Simon Johnson



Kaufmann and Kraay took a major step forward in thinking about governance
when (together with Pablo Zoido-Lobaton) they first published their World
Governance Indicators (WGI), in 1999. These indicators provided a new way to
combine comparable indices along various dimensions of governance. The WGI
included more countries and more pieces of data than had previously been avail-
able, complementing various other measures.
  From the beginning, the WGI have clearly indicated the underlying sources
of the data and how the data are constructed, explicitly reporting error bands
for all estimates. This was a major innovation and remains of paramount
importance.
  The authors (together with Massimo Mastruzzi) have also done a great service
by continuing to update the data and refine their methods. The WGI are now well
established as one of the standard sets of measures that any researcher or policy
analyst must consult. No measure is perfect, of course, but anyone trying to
establish that a particular set of results is or is not robust is well advised to work
carefully through the various vintages of the WGI.



   The appearance of the WGI coincided with a major backlash against carefully
measuring and thinking hard about governance. While this backlash initially
seemed rather academic and sometimes arcane, it had considerable immediate
impact on both policy discussions and the impetus to research in this area.
Rather than building on the WGI, serious endeavor in this area lost momentum,
as three main arguments against governance indicators gained currency.
   The first is that because the WGI explicitly report "errors," they cannot be
relied on. This argument is completely mistaken: similar error bands should be
reported for all the data used in economics, particularly for countries with weak
statistical systems.
   The second argument is that governance measures are not useful because
many countries are growing, despite weak governance. What are the uses of these
measures if they cannot predict who will and will not grow? While countries with
weak institutions can, and indeed often do, grow for prolonged periods, over the
longer run, it is hard to escape weak governance. Far from the exception, growth
spurts are actually a standard feature almost everywhere (Berg, Ostry, and
Zettelmeyer 2006), as is the inability of many countries with weak governance
to sustain the gains made during the good years.
   Some countries, albeit only relatively few recently, have managed to sustain
rapid growth, despite weak initial governance. Too little is known about these
cases, but one common element appears to be a strong focus on exports,
particularly of manufacturing goods (see Johnson, Ostry, and Subramanian 2007 for
specific country examples).
   These exceptions notwithstanding, if a country's governance indicators are
weak, as measured by the WGI, the presumption should be that sustained
growth will be difficult. Strengthening governance should help increase the odds
of sustaining growth.
   The third and the most interesting argument is that aggregate indicators miss
a great deal of rich detail, failing to pick up the de facto arrangements that
effectively take the place of more formal governance rules. Close observation of how
society is organized always turns up functional forms of improvisation, that is,
ways to organize transactions even where contract enforcement is weak. Some
informal mechanisms may be quite efficient, in the sense that the transaction
costs are not much higher than they would have been had formal contracting
worked well.
   Some of this organization can occur in and around the manufactured export
sector. Initial improvisation could lead to more durable rules over time, both in
the manufacturing sector and, through institutional spillovers, in the country as
a whole.
   One needs to be very careful about drawing policy implications from apparent
conditions that cannot be measured, however; it is hard to make much progress
on the basis of anecdotal evidence. Informal governance arrangements may
shift and be fragile; they may also prevent people from entering new activities.
The limited work on this phenomenon suggests that informal governance is often
a weak substitute for well-functioning governance.
   What is needed is a WGI-type approach to measure governance at a more local
level, for both sectors and cities. There have already been ambitious steps in this
direction [for example, Kaufmann, Leautier, and Mastruzzi (2005)], and there has
been some indication that the World Bank, which is uniquely positioned in this
regard, would move in this direction. So far, however, progress has been limited.
   There are many good reasons to move carefully in this area, and no doubt we
need to think carefully about how to make best use of resources, something that
Kaufmann and Kraay reflect in this article, with their balanced assessment of
alternative sources within and outside the WGI. But it would be unfortunate if
the legitimate debate over interpretation of macrolevel governance measures were
to undermine confidence in this area of enquiry and prevent further work from
being undertaken. The field is still at an early stage. Better measures are needed
to gauge what works and what does not. Such measures should stand firmly on
the shoulders of the WGI.



Note

Simon Johnson is an economic counselor and director of the Research Department at the
International Monetary Fund; his email address is sjohnson@imf.org.



References

Berg, Andrew, Jonathan Ostry, and Jeromin Zettelmeyer. 2006. "What Makes Growth Sustained?"
   IMF Working Paper. International Monetary Fund, Washington, D.C.

Kaufmann, Daniel, Frannie Leautier, and Massimo Mastruzzi. 2005. "Governance and the City: An
   Empirical Exploration into Global Determinants of Urban Performance." Policy Research Working
   Paper 3712. World Bank, Washington, D.C.

Johnson, Simon, Jonathan Ostry, and Arvind Subramanian. 2007. The Prospects for Sustained Growth
   in Africa: Benchmarking the Constraints. NBER Working Paper 13120. Cambridge, MA: National
   Bureau of Economic Research.




                                              The World Bank Research Observer, vol. 23, no. 1 (Spring 2008)

    Walking up the Down Escalator: Public
              Investment and Fiscal Stability



                       William Easterly, Timothy Irwin, and Luis Serven


When growth-promoting spending is cut so much that the present value of future gov-
ernment revenues falls by more than the immediate improvement in the cash deficit,
fiscal adjustment becomes like walking up the down escalator. Although short-term cash
flows matter, too tight a focus on them encourages governments to invest too little.
Cash-flow targets also encourage governments to shift investment spending off budget by
seeking private investment in public projects, irrespective of its real fiscal or economic
benefits. To deal with this problem, some observers have suggested excluding certain
investments (such as those undertaken by public enterprises deemed commercial or
financed by multilaterals) from cash-flow targets. These stopgap remedies may help
protect some investments, but they do not provide a satisfactory solution to the
underlying problem. Governments can more effectively reduce the biases created by the focus
on short-term cash flows by developing indicators of the long-term fiscal effects of
their decisions, including accounting and economic measures of net worth, and, where
appropriate, including such measures in fiscal targets or even fiscal rules. JEL codes:
O23, E62, H60, H54



A popular phrase during the era of macroeconomic stabilization of the 1990s was
"adjustment with growth." The focus of this article is on the surprising possibility
that some types of fiscal austerity not only fail to bring growth, but they may not
even bring "adjustment" in the long run.
    Consider the following anecdote from the World Bank's own budgeting
experience. In 1993 the World Bank Research Department unexpectedly produced a
bestseller entitled The East Asia Miracle. The Research Department soon exhausted
its administrative budget allocation for reprinting the book. The World Bank's
centralized budget department denied a request for extra budgetary resources for
printing more copies of the book on the grounds that the Research Department


© The Author 2008. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
doi:10.1093/wbro/llon014        Advance Access publication January 28, 2008                                   23:37-ih

had already exceeded its printing budget, even though producing more copies of
the book would have more than paid for itself!
   This kind of unreason is not confined to the world of bureaucratic budgetary
management; it also extends to fiscal policy practice. The primary concern of
most fiscal programs is to ensure public sector solvency, commonly viewed as an
essential ingredient of macroeconomic stability. Solvency is by definition an inter-
temporal concept, relating to the present value of revenues and expenditures and
encompassing both assets and liabilities. A cut in public investment that lowers
growth will lower the present value of revenues; it is conceivable that the govern-
ment's intertemporal position deteriorates at the same time as the cash deficit
improves. In practice, however, it is customary to assess the strength of public
finances almost exclusively on the basis of the cash deficit (or "overall balance"),
that is, the rate of acquisition of debt by the public sector.
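
The intertemporal point can be made concrete with a small numerical sketch in Python. Everything in it is an assumption introduced for illustration rather than a figure from this article: a one-time investment cut of 100, a constant annual revenue loss that follows from lower growth, a 5 percent discount rate, and a 30-year horizon. The sketch compares the immediate cash saving with the present value of the revenue forgone; when the second exceeds the first, the cash deficit improves while net worth deteriorates.

```python
def present_value(annual_flow, rate, years):
    """Present value of a constant annual flow received in years 1..years."""
    return sum(annual_flow / (1.0 + rate) ** t for t in range(1, years + 1))


def net_worth_change(investment_cut, annual_revenue_loss, rate, years):
    """Cash saved today minus the present value of the revenue lost later."""
    return investment_cut - present_value(annual_revenue_loss, rate, years)


# Hypothetical numbers: the same cut of 100 under two assumed revenue effects.
RATE, YEARS, CUT = 0.05, 30, 100.0
for loss in (4.0, 8.0):
    change = net_worth_change(CUT, loss, RATE, YEARS)
    verdict = "improves" if change > 0 else "deteriorates"
    print(f"annual revenue loss {loss}: net worth {verdict} by {abs(change):.1f}")
```

With the smaller assumed revenue loss the cut strengthens the intertemporal position; with the larger one the cash deficit still falls by 100 in the first year, yet net worth declines, which is the down-escalator case.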
   Latin America offers a good illustration of this practice. There is rising concern
across the region that the fiscal adjustment that many countries had to undertake
since the early 1990s may have come with an excessive fall in public investment
(figure 1). To the extent that the response of private investment has been
insufficient to offset the decline of public investment in key sectors, such as infrastruc-
ture, current levels of public investment are perceived by many as too low to
support long-term growth rates consistent with rapid poverty reduction.
   Political economy considerations make adjustment difficult at the best of times.
The recent backlash against free-market reforms in Latin America and the long-
standing sensitivity to conditions perceived as imposed by outsiders make it more
important than ever that adjustment programs be well conceived.



Figure 1. Primary Deficit and Public Infrastructure Investment in Latin America, 1980-2001

[Figure not reproduced. Series plotted: primary deficit; public investment in infrastructure.]

  Note: Figure is based on data from Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Mexico, and Peru.
  Source: Calderón and Servén (2004).




   The international evidence suggests that Latin America's experience is the rule
rather than the exception. Declines in infrastructure spending often account for
the lion's share of fiscal deficit reduction, as Hicks (1991) shows for developing
economies in a cross-regional context, Easterly and Serven (2003) show for Latin
America, and Estache (2004) shows for Sub-Saharan Africa. For industrial
countries, Roubini and Sachs (1989) and De Haan et al. (1996) find that capital
expenditure falls disproportionately at times of fiscal stringency. Balassone and
Franco (2000) show that fulfillment of the Maastricht deficit targets sped the
decline of public investment in the European Union (figure 2): of the nine countries
that exceeded the deficit target in 1992, eight met it in 1997. In all eight, public
investment had fallen relative to GDP; in seven of them, it had also fallen relative to
total primary expenditure. In contrast, three of the six countries that met the target
in 1992 raised their public investment in the subsequent years.
   The tendency toward compression of public investment at times of fiscal austerity
underlies the fact that investment is the most volatile of all public spending items, as
Talvi and Vegh (2000) document using data from developing economies and Lane
(2003) documents using data from industrial countries. Of course, declining public
investment would be of little consequence if it reflected improved spending efficiency
or were fully matched by increased investment by the private sector. In most
countries in Latin America, the only developing region for which adequate data are
available, this may have been the case in the telecommunications sector. But the
evidence suggests that in most infrastructure sectors in most countries, private
investment did not offset public sector retrenchment (Calderon and Serven 2004).

Figure 2. Primary Deficit and Public Investment in the European Union, 1980-2002

[Figure: series plotted are the primary deficit and public investment.]

 Note: Figure is based on data from Austria, Belgium, Finland, France, Germany, Spain, Sweden, and the United
Kingdom.
 Source: OECD Economic Outlook database.

Easterly, Irwin, and Serven                                                                                39

   Declining investment is a cause for concern when it results in decreased
accumulation of public capital and public capital is productive. This is not always
the case; many projects labeled as public investment can be white elephants,
which bring no future output benefits. The link between public investment spend-
ing and capital accumulation can be fragile if investment involves significant
waste, when projects are poorly selected and public procurement is inefficient or
beset by corruption, for example (Pritchett 2000). With weak governance, public
investment may become a vehicle for dispensing political favors rather than
acquiring productive assets (Keefer and Knack 2007).
   The empirical literature is far from unanimous on the contribution of public
capital to aggregate output or growth; this lack of agreement is hardly surprising
in the context of growth empirics. Nevertheless, most studies, especially the more
recent ones, do find a positive impact. The conclusions appear to depend in part
on the approach followed: studies using measures of physical infrastructure assets
find significantly positive output contributions in the vast majority of instances,
whereas those that measure public capital using cumulative investment flows
tend to be less conclusive, likely for the reasons outlined in the preceding para-
graph. In some cases, however, both approaches yield similar results; using both
financial and physical measures of public capital, for example, Ferreira and Araujo
(2005) find significantly positive output effects in Brazil.
   Moreover, even if wasteful public investment spending weakens the link
between spending and outcomes, an across-the-board reduction in public invest-
ment will still result in cuts in productive infrastructure projects. Sacrificing such
projects weakens the economy's growth potential; the right response is instead to
protect high-return projects from spending cuts. If government does otherwise, it
is trying to walk up the down escalator.
   This article offers a selective overview of these issues. It draws from the Latin
American experience, because it has been relatively well documented, but its con-
cerns are much more general. Indeed, they cut across developing as well as indus-
trial regions, although they are more pressing in developing countries, which still
have a long way to go in building up their infrastructure capital stocks. The
article draws policy implications for the design and monitoring of fiscal targets
consistent with both solvency and the efficient utilization of fiscal resources.
   The article is organized as follows. The next section reviews the shortcomings
of the current approach to fiscal discipline. The second and third sections deal
with two types of remedies: granting exceptions to existing fiscal targets and
introducing new targets. The last section offers some concluding comments.



Shortcomings of the Standard Approach to Fiscal Discipline

Fiscal adjustment programs typically focus on the short-term time path of the
government's cash deficit, whose measurement is usually the center of attention
of fiscal accounting. Short-term cash deficits and debt are the key fiscal concern
of official creditors and form the basis of loan conditions in the fiscal and macro-
economic dimensions. They are also closely scrutinized by multilateral insti-
tutions, private creditors and investors, and economic analysts.
   There are good reasons why these fiscal aggregates should be closely watched. The
cash deficit approximates the government's financing needs, which are a primary
concern for the fiscal authorities as well as financial market participants. It can also
give an indication of the public sector's contribution to overall aggregate demand and
thus its stance from the viewpoint of short-term stabilization, although the primary
deficit (which excludes interest payments) may be preferable for this purpose.
   Debt and the cash deficit can be misleading as solvency measures, however,
because they do not take into account the assets and future income the government
may acquire by incurring debt today. This, of course, is hardly surprising:
liquidity and solvency are fundamentally different concepts; as in corporate
finance, different indicators are needed to gauge them. A corporation does not
seek to maximize just this year's cash flow; it seeks to maximize the present dis-
counted value of all future cash flows. Telling the public sector to improve the
cash balance no matter what would be like telling Apple to forgo investing in a
new iPod factory in order to improve this year's cash flow.
   Solvency assessments based on debt and the cash deficit implicitly treat all
public expenditures in the same way, because they all pose the same claim on
today's fiscal resources. This blurs the distinction between public investment and
public consumption and, more precisely, between expenditures that yield future
fiscal benefits and those that do not, even though they may have radically different
implications for tomorrow's public revenues and therefore for solvency itself.
   Such practice distorts the tradeoffs faced by fiscal policy, both across time and
among different kinds of public expenditures. Across time, binding debt and cash
deficit targets today tend to encourage postponement of expenditures and
advancement of tomorrow's revenues, even if their present value, which is the
relevant concern for solvency, remains unchanged (or declines as a result of
delaying urgently needed expenditures, for example). Across expenditure types,
liquidity targets pose a one-for-one tradeoff at the margin, regardless of the type
of expenditures involved, whereas solvency targets do not. Faced with these tradeoffs,
governments having to strengthen public finances frequently choose adjustment
paths that, by altering the time profile, the composition of expenditures, or
both, attain the prescribed liquidity targets without any significant improvement
in solvency. They resort to deferring payments to the first day of the next year,
accumulating arrears to government workers or suppliers, advancing the collection
of taxes, awarding higher pensions instead of increased wages, or granting
guarantees instead of subsidies. Easterly (1999) and Easterly and Serven (2003)
provide a variety of examples of this kind of illusory fiscal adjustment.
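
   The gap between liquidity and solvency in the payment-deferral case can be made concrete with a small calculation. The sketch below is illustrative only: the bill of 100, the 5 percent discount rate, and the function names are assumptions introduced here, not figures from the studies cited. It compares paying a bill on the last day of the year with paying it one day later.

```python
# Illustrative sketch: all numbers and names are hypothetical assumptions.

def present_value(amount: float, years_from_now: float, discount_rate: float) -> float:
    """Discount a single future payment back to today."""
    return amount / (1.0 + discount_rate) ** years_from_now


def this_year_cash_outlay(amount: float, years_from_now: float) -> float:
    """Cash-basis outlay recorded this year: only payments made within one year count."""
    return amount if years_from_now <= 1.0 else 0.0


BILL = 100.0          # assumed size of the bill
DISCOUNT_RATE = 0.05  # assumed annual discount rate

on_time = 1.0                 # paid on the last day of this year
deferred = 1.0 + 1.0 / 365.0  # paid on the first day of next year

# Liquidity view: how much this year's cash deficit improves.
cash_improvement = (
    this_year_cash_outlay(BILL, on_time) - this_year_cash_outlay(BILL, deferred)
)
# Solvency view: how much the present value of the obligation falls.
pv_improvement = (
    present_value(BILL, on_time, DISCOUNT_RATE)
    - present_value(BILL, deferred, DISCOUNT_RATE)
)

print(f"Improvement in this year's cash deficit: {cash_improvement:.2f}")
print(f"Improvement in present value of obligations: {pv_improvement:.4f}")
```

   Under these assumptions, the cash deficit improves by the full amount of the bill, whereas the present value of the obligation changes by well under 0.1 percent of the bill.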



   Thus, other things being equal, governments facing binding liquidity targets
today may devote too few resources to expenditures that yield returns tomorrow.
This effect of liquidity targets on public spending composition is additional to the
biases introduced by other political economy factors. These factors (which include
governments' short time horizons and political clientelism) can distort spending
choices by discouraging public expenditures whose benefits accrue in the future
in favor of those with immediate fiscal or political payoffs. Far from correcting
these distortions, the conventional approach to fiscal discipline magnifies them.
  If fiscal adjustment disproportionately cuts infrastructure spending that
enhances growth, it can lead to a vicious circle in which low growth generates
unsustainable debt dynamics, which force fiscal adjustment implemented through
investment cuts, which lowers growth further and prompts additional fiscal
retrenchment and investment cuts. In other words, if debt stabilization is pursued
primarily by cutting productive spending, destabilization can ensue.
  This phenomenon has been documented in both industrial and developing
countries. In industrial countries, Alesina and Perotti (1997) find that fiscal cor-
rections based primarily on public investment contraction are typically unsuccess-
ful: they have an adverse effect on growth, and their stabilizing effect on public
finances is eventually reversed. Calderon and Serven (2003) review the impact of
public investment cuts in selected Latin American countries over the past decade.
Their calculations suggest that the ensuing slump in growth and tax collection
may have greatly weakened the intended solvency-enhancing effects of the capital
expenditure decline.
  These issues concern all kinds of public expenditures that generate future fiscal
benefits. Public infrastructure investment is the leading example, to the extent
that public capital yields financial returns that the government can capture.
Conceptually, infrastructure investments can be divided into three groups:

    • Investment that generates direct financial returns through user fees, such as
      ports, airports, railways, and toll roads.
    • Investment that does not generate user fees but increases growth and future
      tax collection.
    • Investment that generates no future fiscal or growth benefits, whether or not
      the project has a positive social return (as in the case of environmental
      projects).

   The first two types of projects may pay for themselves, that is, generate a
stream of financial returns whose present value exceeds the cost of the projects.
On solvency grounds, deficit financing of those projects, termed "self-liquidating"
(Mintz and Smart 2007), is most likely to be justified, because
such projects increase government net worth, even if they raise public debt in
the short run. In practice, for the projects to increase government net worth,
the government must be able to capture the returns. For the first type of
projects, user fees must be sufficient to cover project costs; for the second type
of projects, taxes must be high enough to translate the additional growth into
sufficient additional revenues.
  In the absence of user fees, many growth-enhancing projects may fail to generate
sufficient tax revenues to cover their cost. With the low (marginal) tax collection
rates of many developing economies, the growth impacts have to be
considerable to yield the required tax revenue increases. For example, with a tax
rate of 0.2, the output contribution would have to be five times as high as the
project's user cost for the government to break even. So if the user cost is about
10 percent (say, a 5 percent real interest rate and a 5 percent rate of depreciation),
the project's marginal productivity must be at least 50 percent for the given
tax rate to yield sufficient revenues (see Serven 2007 for the analytics of this and
similar calculations).
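
   The break-even arithmetic in this example can be written out as a short calculation. The sketch below assumes, as a simplification, that the government's only return is tax collected at a constant marginal rate on the extra output, so that the required marginal product is the user cost divided by the tax rate. The function and variable names are illustrative.

```python
# Illustrative sketch of the break-even condition described in the text.
# Assumption: revenue captured = tax_rate * marginal_product per unit of public capital.

def break_even_marginal_product(user_cost: float, tax_rate: float) -> float:
    """Marginal product of public capital at which extra tax revenue just covers the user cost."""
    if not 0.0 < tax_rate <= 1.0:
        raise ValueError("tax_rate must lie in (0, 1]")
    return user_cost / tax_rate


real_interest_rate = 0.05  # from the example in the text
depreciation_rate = 0.05   # from the example in the text
tax_rate = 0.2             # from the example in the text

user_cost = real_interest_rate + depreciation_rate
required = break_even_marginal_product(user_cost, tax_rate)

print(f"User cost of capital: {user_cost:.0%}")
print(f"Required marginal product: {required:.0%}")
```

   Under this simplification, a lower tax rate raises the required marginal product in inverse proportion, which is why low marginal collection rates make self-financing harder to achieve.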
   Such high productivity is more likely to arise in situations in which the initial
endowment of public capital is low (relative to that of other productive assets),
specifically, when public capital services are substantially underprovided so that
the marginal product of capital exceeds its user cost by a wide margin.
Empirically, the international evidence appears to be consistent with the view
that the marginal productivity of infrastructure capital is higher in developing
economies, especially poorer ones, than in industrial countries (Calderon and
Serven 2007).
   Probably as a reflection of these country-specific ingredients, empirical results
are mixed regarding whether public investment may be self-financing through its
growth and tax-collection effects. Perotti (2004) examines this issue in five
countries in the Organisation for Economic Co-operation and Development
(OECD), using a vector autoregression approach. He finds that in Canada and the
United Kingdom, the extra public capital makes a negative growth contribution;
in Australia and the United States, the growth and tax collection effects finance
only 20-30 percent of the investment cost. Only in Germany do these effects
finance more than 100 percent of the cost. Using similar techniques but a
broader sample of industrial countries, Pereira and Pinho (2006) find that public
investment is roughly self-financing in France, Greece, and Ireland and more than
self-financing in Germany and Italy. The growth effects are large in the majority
of countries considered.
   Because developing economies possess smaller infrastructure capital endow-
ments than industrial countries do, infrastructure capital might be expected to
have a higher marginal productivity and so to come closer to being self-financing.
Ferreira and Araujo (2005) find that public infrastructure investment is self-
financing in Brazil, although it takes 10 years or more for the government to
collect sufficient tax revenues to recoup the investment cost.

   Even if public sector projects fall in the intermediate area of having high
returns for the economy as a whole but insufficient returns for public finances to
improve public sector solvency, it may still be suboptimal to cut such projects
during periods of fiscal austerity. An ideal marginal revenue collection scheme
would allow the public sector to capture the returns, thereby eliminating the
wedge between economywide and public sector returns. After all, the business of
the public sector is precisely to provide public goods that yield a high return for
the economy as a whole.
   In addition, public investment projects are more likely to exhibit higher
marginal productivity ex post if the government's ex ante project evaluation capa-
bilities are sufficiently strong that they select high-return projects and reject low-
return ones. This, however, is far from assured in practice. Many developing econ-
omies lack ex ante and even ex post project evaluation capabilities. (One exception
is Chile, which has thorough procedures for evaluating projects; see Fontaine 1997.)
   Unconditional endorsement of public infrastructure spending would lead to
wasteful investments, as Tanzi and Davoodi (2002) convincingly argue. Roads
would be built to nowhere, and useful roads would not be maintained. Power
plants would lie idle after being built too far ahead of demand. Water supply net-
works would be fully used but still burden the budget, because the tariff increases
on which their financial viability was predicated were not allowed. Some invest-
ments would be well motivated but poorly informed; others would be motivated
by bribes, patronage, or photo opportunities (Keefer and Knack 2007).
   Infrastructure investment is the expenditure item that has attracted most atten-
tion in the ongoing debate about the design of fiscal policy. But the link between
spending composition and solvency arises in a broader context. On the one hand,
not all public investment projects yield future income to the government. On the
other hand, some current expenditures do yield future fiscal returns.
Infrastructure operation and maintenance expenditure is a case in point.
Operation and maintenance determines the useful life of capital and hence has a
"capital-creating" effect similar to that of investment. If public capital yields finan-
cial returns to the government, so does operation and maintenance. In fact, the
financial, as well as social, return on operation and maintenance expenditure
may well exceed that of new capital when the assets are not being properly main-
tained (Rioja 2003a, b; Kalaitzidakis and Kalyvitis 2004; Serven 2007).
   This does not imply that developing countries should rush to raise public
investment or that, as a rule, public investment increases should be financed with
debt (or in any other particular way). The decision to invest should be guided by
the return on the investment. That return is determined primarily by the mar-
ginal productivity of public capital, itself dependent on the government's ability to
select good projects and the (relative) scarcity of services rendered by public
capital (for example, the availability of infrastructure services). Both return and
cost calculations should embody risk adjustments to take account of uncertainty.
Indeed, assessments of the effect on net worth of public investment projects
should err on the side of caution, particularly when the government's initial
indebtedness is high, because in such cases even small changes in interest rates
may have very large adverse effects on public finances.



Excluding Certain Public Investments from Fiscal Targets

Broadly speaking, there are two possible ways to address the bias against pro-
ductive public spending implicit in existing fiscal targets. One is to retain the
targets but exempt from their action certain public investments deemed more
likely to enhance growth and solvency. The other is to adopt new fiscal targets.
This section reviews the first alternative; the next section discusses the second.


Privately Financing Public Investment Projects

One way to place investment projects beyond the reach of short-term deficit and
debt targets is to have private firms finance them. Indeed, across the developing
world, many governments have turned to the private sector to finance new invest-
ments. In Latin America, for example, many governments have privatized their
telecommunications firms and parts of their power and water industries. In the
transport sector, private firms are often engaged in public-private partnerships in
which the government retains an important financial role but the private sector
finances investment. Chile and Colombia have had roads privately financed under
arrangements in which the government provides revenue or foreign exchange
guarantees. Other countries have begun to use a different form of public-private
partnership in which a private firm finances an asset (such as a school, hospital,
or prison), but the government purchases the service under a long-term contract.
   These arrangements may improve the returns to investment and thus enhance
government solvency. In many cases, however, concerns about efficiency and sol-
vency have played a minor role, and the resort to private financing has been
guided primarily by the desire to evade the pressure of liquidity targets on public
investment. Projects conceived with such a purpose in mind may not be well
designed from the point of view of efficiency or solvency.
   The difference between liquidity and solvency effects is particularly apparent in
privately financed projects in which the government purchases the service under
a long-term contract. In such projects, explicit debt is replaced by similar commit-
ments that are typically off balance sheet, without any major change in the mag-
nitude of the government's financial obligations. It is also apparent when the
government provides guarantees to private investors (such as guarantees of the
private firm's debt or revenue) that leave the public sector bearing much of the
investment risk (Hemming and International Monetary Fund 2006; Irwin
2007b). Even when such guarantees are not formally offered ex ante, they may
be provided ex post through renegotiation of concession agreements. The bailout
of the Mexican toll road program in 1997, for example, cost 1.0-1.7 percent of
GDP (World Bank 2005; see also Guasch 2004).
   On the whole, private financing has not come to play the dominant role in the
provision of infrastructure services in Latin America or elsewhere that some
observers expected. Although private financing now dominates telecommunica-
tions and some other infrastructure industries in some countries, it still plays a
small role in roads and water and sanitation, something that is unlikely to
change in the near term. Moreover, it would be undesirable for decisions about
the ownership of infrastructure firms to be driven by short-term fiscal constraints.
When private ownership works better than public ownership in terms of effi-
ciency, equity, or both, a state-owned firm should be privatized, even if ownership
of the firm requires little investment. Likewise, when public ownership works
better, the firm should remain public, even if major investment is required. Private
financing may thus lessen the problem caused by liquidity targets in certain
cases, but it is not an appropriate response to the general problem.


Excluding Specific Public Projects

Another option is to exclude from fiscal targets certain investments undertaken
by the public sector. A recent proposal would exclude projects financed by multi-
lateral institutions on the grounds that such projects are more likely than others
to be carefully screened and designed. This idea has not garnered much support,
partly because the fungibility of money means that the marginal financing from
multilateral institutions would not necessarily support the intended projects.
Furthermore, in many developing economies, total multilateral flows are too
small to make a big difference.
   A second proposal, developed and refined by the International Monetary Fund
(IMF 2004), is to exclude from fiscal targets investments by public enterprises
that are deemed to be commercially run. This proposal (which is not new; see
Afonso 2005) is, in principle, potentially important for countries in which public
enterprises are included in the public sector aggregates monitored under fiscal
programs. This is the case in Latin America but not in most other regions.
   In practice, this approach poses several problems. First, appropriate criteria to
identify commercially oriented public enterprises are difficult to establish. Second,
the fact that enterprises that meet likely criteria are the exception rather than the
rule in many countries (IMF 2005) detracts from the practical relevance of this
approach for public investment.



  Another difficulty is that excluding commercially run public enterprises from
targets may further restrict investment elsewhere in the public sector if those
enterprises make a positive net contribution to the aggregate budget surplus.
Exclusion of those enterprises would then make fiscal targets more, rather than
less, stringent for productive public expenditure. One way around this difficulty is
to exclude the investments, but not the savings, of these firms from fiscal targets.
Doing so, however, adds complexity, which detracts from the transparency of the
approach. An alternative would be to relax the targets by the net saving of the
excluded public enterprises.
  The fundamental problem with this proposal is that investments by enterprises
that are not commercially run may still have offsetting fiscal and economic
benefits, as is the case of many investments in roads, for example. In other
words, the investments of commercially run public enterprises may not be the
ones with highest priority from a fiscal or social perspective. Removing restrictions
only on investment by commercially run public enterprises may still leave an
overall public investment program far removed from the socially desirable one. For
example, it is unlikely that Brazil's infrastructure needs would be significantly alle-
viated by allowing more investment by the commercially run public oil company
PETROBRAS. Such outcomes are a general problem with any proposals on the
basis of the special treatment of specific investments.



Developing New Fiscal Targets Incorporating Measures of Net Worth

The limitations of "selective" approaches that exempt certain investments suggest
that it is important to consider more fundamental changes, in particular, whether
governments can develop measures of net worth that are sufficiently accurate and
objective to be used as a basis for fiscal targets. Measuring net worth (the difference
between the value of assets and the value of liabilities) requires forecasts.
The value of an asset is the present value of the net revenues it will generate, and
such revenues are usually uncertain. Likewise, the value of a liability is the
present value of the payments it will cause the government to make, and such
payments are often uncertain. Because of these uncertainties, indicators of net
worth are inherently approximate.
   Governments seeking to beautify their reported fiscal positions can take advan-
tage of uncertainty to overestimate future revenues and underestimate future
costs. Governments seeking to protect some category of public spending on politi-
cal grounds can exaggerate the value of the "assets" it creates, even if the spend-
ing is actually wasteful. If an indicator of net worth is too vulnerable to such
manipulation, it has too little credibility to be useful.



    Two indicators are used to measure net worth, one generated by modern
accrual accounting, the other by long-term fiscal projections. This section exam-
ines the accuracy and reliability of each.


Modern Accrual Accounting

Like traditional cash accounts, modern accrual accounts include information on
short-term cash flows. Unlike traditional cash accounts, accrual accounts also
include a balance sheet showing assets and liabilities. As a result, accrual
accounts generate a measure of net worth. They also include a measure of the
surplus or deficit that is not based on current cash flows. That measure includes
revenues that have been earned but are not yet collected and bills that are
payable but not yet paid. Crucially, investment itself is not counted as an expense
in the period of investment; only the depreciation of the investment is included.
The difference between accrual revenues and accrual expenses gives an income-
statement surplus that is roughly equal to the increase in the government's net
worth.
    To see the implications of accrual accounting, consider a government investing
$200 million in a power plant, financed entirely by borrowing (this example is
taken from Irwin 2007a). Assume that, in the first year, no revenue is received,
no operating costs are incurred, and no depreciation occurs. Under cash account-
ing, the government's accounts show $200 million in extra expenditure, which
increases the cash deficit and debt by the same amount (table 1). Under modern
accrual accounting (table 2), these consequences of the investment on the government's
cash flows and debt are revealed, but so, too, are the consequences for the
government's assets. The accounts report that the investment has no net effect on
the government's net worth or income-statement surplus.
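
    The power plant example can also be traced through both sets of accounts in a few lines of code. The sketch below is a deliberate simplification rather than an implementation of any accounting standard: it assumes that all revenue and operating costs are settled in cash within the year, that the balance sheet starts from zero, and that the class and function names are purely illustrative.

```python
# Illustrative sketch of the power plant example; a simplification, not an accounting standard.
from dataclasses import dataclass


@dataclass
class Year:
    revenue: float = 0.0         # assumed both earned and collected in the year
    operating_cost: float = 0.0  # assumed both incurred and paid in the year
    investment: float = 0.0      # cash spent acquiring the asset
    depreciation: float = 0.0    # fall in the asset's book value during the year
    borrowing: float = 0.0       # new debt raised


def cash_accounts(y: Year) -> dict:
    """Traditional cash accounts: investment is treated like any other expenditure."""
    expenditure = y.operating_cost + y.investment
    return {
        "revenues": y.revenue,
        "expenditure": expenditure,
        "surplus": y.revenue - expenditure,
        "debt": y.borrowing,
    }


def accrual_accounts(y: Year) -> dict:
    """Accrual accounts: only depreciation, not the investment itself, is an expense."""
    expenses = y.operating_cost + y.depreciation
    cash_surplus = y.revenue - y.operating_cost - y.investment
    change_in_cash = cash_surplus + y.borrowing  # borrowing funds the cash shortfall
    assets = (y.investment - y.depreciation) + change_in_cash
    liabilities = y.borrowing
    return {
        "income_statement_surplus": y.revenue - expenses,
        "assets": assets,
        "liabilities": liabilities,
        "net_worth": assets - liabilities,
        "cash_surplus": cash_surplus,
        "cash_from_financing": y.borrowing,
    }


# $200 million investment financed entirely by borrowing, as in the text.
power_plant = Year(investment=200.0, borrowing=200.0)
cash = cash_accounts(power_plant)
accrual = accrual_accounts(power_plant)

print(cash)
print(accrual)
```

    Both views record the same cash outflow and the same debt. The difference is that the accrual view also records the asset acquired, so net worth and the income-statement surplus are unchanged.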
    Accrual-based accounting standards for financial reporting have the advantage
of being designed to limit the self-serving bias that uncertainty makes possible.
Most obviously, financial reports, whether based on cash or accrual accounting,


Table 1. Debt-Financed Investment in Cash Accounting

Itc~nl                                                                                               Anlorrnt ($ million)

Revenues                                                                                                         ()

Expenditure                                                                                                   200

Surplus                                                                                                     - 200

Debt                                                                                                          200

   Note: Cash surplus is the sum of cash disbursed to operations and cash disbursed to investment
   Source: Authors.
                                                            --                - -




48                                                  The Worltl Bonk Rcscrarch O b s e r ~ ~rv~l.2 3, no. 1 (Spring 2008)
                                                                                            e ~

Table 2. Debt-Financed Investment in Modern Accrual Accounting

Item                                                                                         Amount ($ million)

Income statement
   Revenue                                                                                             0
   Expenses                                                                                            0
   Income-statement surplus                                                                            0

Balance sheet
   Assets                                                                                            200
   Liabilities                                                                                       200
   Net worth                                                                                           0

Cash-flow statement
   Cash disbursed to investment                                                                      200
   Cash surplus                                                                                     -200
   Cash from financing                                                                               200

   Note: Cash surplus is the sum of cash disbursed to operations and cash disbursed to investment.
   Source: Authors.
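
The entries in table 2 all follow from a single transaction: borrowing $200 million and spending it on a new asset. The short Python sketch below reproduces that arithmetic. It is an illustration only; the class and function names are hypothetical, and the sign convention (disbursements reduce the cash surplus) is an assumption chosen to match the table.

```python
from dataclasses import dataclass


@dataclass
class Statements:
    """Summary figures from the three accrual statements (hypothetical layout)."""
    income_surplus: float       # revenue minus expenses
    assets: float
    liabilities: float
    cash_surplus: float         # negative when cash is disbursed
    cash_from_financing: float

    @property
    def net_worth(self) -> float:
        return self.assets - self.liabilities


def debt_financed_investment(amount: float) -> Statements:
    """Record borrowing `amount` and spending all of it on a new asset."""
    revenue, expenses = 0.0, 0.0      # assumption: no operating activity
    cash_to_operations = 0.0
    cash_to_investment = amount
    return Statements(
        income_surplus=revenue - expenses,
        assets=amount,                # asset recorded at acquisition cost
        liabilities=amount,           # matching debt
        cash_surplus=-(cash_to_operations + cash_to_investment),
        cash_from_financing=amount,
    )


s = debt_financed_investment(200.0)
print(s.income_surplus, s.net_worth, s.cash_surplus, s.cash_from_financing)
```

The cash surplus deteriorates by the full amount of the investment, while the income-statement surplus and net worth are unchanged, which is the contrast the table is meant to show.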




must be audited by an independent auditor. In addition, accrual accounting stan-
dards tackle bias by preferring measures that are objectively verifiable even at the
cost of some relevance. Some standards, for example, require an asset to be
valued by recording its acquisition cost and then depreciating the cost according
to a simple formula. The resulting value can only approximate the asset's true
value, but the measure is less vulnerable to bias than alternative measures. When
standards require the reporting of market value, they sometimes require the
valuation to be performed by an independent expert (other than the auditor).
Accounting scandals show that these safeguards can fail to prevent biased report-
ing, but they are surely better than nothing. Moreover, although the risk of mis-
leading accrual information is real, it does not provide a strong argument against
the adoption of accrual accounting, because cash-based reporting is at least as
vulnerable to manipulation, as argued earlier.
    Reporting according to modern accrual accounting standards is, however, more
costly than reporting according to cash standards, and it can take years for a gov-
ernment to move from cash to accrual accounting. These costs are more likely to
be justified in middle- and high-income countries than in the poorest developing
economies. Many high-income countries (including Australia, Canada, New
Zealand, the United Kingdom, and the United States) have already made the
transition, and many middle-income countries (including Chile, Indonesia, the
Philippines, and South Africa) are adopting accrual accounting.
    Although accrual accounting generates valuable information missed by traditional
cash accounting, it is not sufficient for the assessment of net worth, even
in middle- and high-income countries. For one thing, accounting values can
diverge too much from true values. After many years of inflation, the depreciated
acquisition cost of an asset may greatly underestimate the present value of the
cash flows it will generate. In addition, accounting values of assets that generate
user fees and higher tax payments at best capture only the value of the user fees,
because the present value of future tax revenues does not count as an asset from
the conventional accounting perspective. In contrast, durables are generally
treated as assets and valued at their depreciated acquisition or replacement cost,
even if they generate no future cash flows from either user fees or taxes.
Expenditure on a bridge to nowhere can create an accounting asset even if it
generates no tolls and does nothing to increase economic output.
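
A small numerical sketch shows how far a depreciated acquisition cost can drift from a value expressed in today's prices. It assumes straight-line depreciation and a constant inflation rate; the asset cost, life, age, and inflation rate are hypothetical, and the price-adjusted figure is only a rough proxy for the asset's true value.

```python
def straight_line_book_value(cost: float, life_years: int, age_years: int) -> float:
    """Depreciated acquisition cost under straight-line depreciation."""
    remaining_years = max(life_years - age_years, 0)
    return cost * remaining_years / life_years


def price_adjusted_value(cost: float, life_years: int, age_years: int,
                         inflation: float) -> float:
    """Same depreciation schedule, with the cost restated at today's prices."""
    price_index = (1.0 + inflation) ** age_years
    return straight_line_book_value(cost, life_years, age_years) * price_index


# Hypothetical asset: cost 100, 40-year life, 20 years old, 5 percent inflation.
book = straight_line_book_value(100.0, 40, 20)
adjusted = price_adjusted_value(100.0, 40, 20, 0.05)
print(f"book value: {book:.1f}, price-adjusted value: {adjusted:.1f}")
```

Under these assumptions the book value is well under half of the price-adjusted value, and neither figure says anything about the tax revenue the asset may generate.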


Long-Term Fiscal Projections

Long-term fiscal projections have the potential to remedy some of the short-
comings of accrual accounting. Such projections, prepared in various ways by
countries such as Australia, New Zealand, the United Kingdom, and the United
States, can include estimates of the government's operating and investing cash
flows over the next 50-75 years, which can then be discounted back to the
present to arrive at an estimate of the government's net worth. Crucially, all
expected cash flows under current policies can be projected, including taxes and
welfare expenditure.
  The projections can include public investment, expenditure on operations and
maintenance, and payments to privately financed firms in public-private partner-
ships. Revenues from user fees can be included. If evidence suggests that some
investments increase tax revenue by generating more taxable economic activity,
this extra revenue can also be included.
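
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any government's actual model: the cash-flow categories, amounts, discount rate, and the `initial_net_debt` parameter are hypothetical, and every flow is discounted at a single rate.

```python
from typing import Dict, Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount a stream of year-end cash flows (year 1, 2, ...) to the present."""
    return sum(cf / (1.0 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def projected_net_worth(inflows: Dict[str, Sequence[float]],
                        outflows: Dict[str, Sequence[float]],
                        rate: float,
                        initial_net_debt: float = 0.0) -> float:
    """Present value of all projected inflows minus outflows, net of existing debt."""
    pv_in = sum(present_value(flows, rate) for flows in inflows.values())
    pv_out = sum(present_value(flows, rate) for flows in outflows.values())
    return pv_in - pv_out - initial_net_debt


years = 50  # hypothetical projection horizon
inflows = {"taxes": [20.0] * years, "user_fees": [1.0] * years}
outflows = {
    "public_investment": [3.0] * years,
    "operations_and_maintenance": [2.0] * years,
    "ppp_payments": [1.0] * years,
    "welfare": [14.0] * years,
}
print(round(projected_net_worth(inflows, outflows, rate=0.04, initial_net_debt=10.0), 2))
```

Because taxes and welfare spending enter the calculation alongside investment and user fees, the result covers items that a conventional balance sheet leaves out.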
  Long-term fiscal projections can be designed to take account of the uncertainty
of future cash flows. On the one hand, future revenues and expenditure can be
adjusted for risk: risky tax revenues, for example, can be discounted at a higher
rate than more predictable pension spending. On the other hand, the projections
can show how net worth changes with critical assumptions about life expectancy,
health care costs, and output. If output is modeled as a function of the stock of
public capital, the projections can show how sensitive the government's net
worth is to this assumption.
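
Both ways of handling uncertainty can be illustrated together. The sketch below discounts risky tax revenue at a higher rate than pension spending and then recomputes net worth under different growth assumptions. All parameter values and names are hypothetical, and output growth is treated as an exogenous assumption rather than as a function of public capital.

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount a stream of year-end cash flows (year 1, 2, ...) to the present."""
    return sum(cf / (1.0 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def net_worth(growth: float, years: int = 50, tax_share: float = 0.20,
              pension_spending: float = 19.0, risky_rate: float = 0.06,
              safe_rate: float = 0.03, initial_output: float = 100.0) -> float:
    """Net worth when taxes grow with output and pensions are fixed."""
    taxes = [tax_share * initial_output * (1.0 + growth) ** year
             for year in range(1, years + 1)]
    pensions = [pension_spending] * years
    # Risk adjustment: the riskier revenue stream gets the higher discount rate.
    return present_value(taxes, risky_rate) - present_value(pensions, safe_rate)


# Sensitivity analysis: vary the growth assumption and recompute net worth.
for growth in (0.01, 0.02, 0.03):
    print(f"growth {growth:.0%}: net worth {net_worth(growth):.1f}")
```

Even in this toy setting the estimate of net worth moves a great deal with the growth assumption, which is why the choice of assumptions matters so much.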
  The biggest disadvantage of long-term projections is a corollary of their useful-
ness: generating the relevant information requires estimates that are subject to
enormous uncertainty. What will the future rate of growth of GDP be? How will it
be affected, if at all, by public investment? The large extent of reasonable disagree-
ment about such estimates implies a large range of reasonable estimates of the
government's net worth. This makes room for self-serving projections.


Table 3. Benefits and Drawbacks of Alternative Sources of Fiscal Indicators

                                                            Modern accrual    Long-term
Feature                               Cash accounting       accounting        projections

Provides information on short-term    Yes                   Yes               Yes
  cash flows
Provides information on net worth,    No                    Partially         Yes
  given current policies
Incorporates uncertainty              Avoids the issue      Partially         Yes
Limits self-serving forecast bias     Yes                   Yes               Not easily

  Source: Authors.




   Though uncertainty in the estimates is unavoidable, a government can make
its projections more credible. It can allow others to see how the results are gener-
ated by making its projection model publicly available (in a spreadsheet on the
Internet, for example). For critical parameters or key variables, it can use the esti-
mates of a panel of independent experts, as Chile does in estimating its structural
surplus. It can prepare and publish standards that it will follow in making the
projections. And it can legislate that an independent auditor must opine on
whether the projections follow those standards and reflect the stated assumptions.
International organizations and financial institutions could help develop the
standards and expertise necessary for some of these steps.
   Each approach has benefits and drawbacks (table 3). Cash accounting offers
the necessary information on liquidity and, because of its short-term focus, limits
opportunities for some sorts of bias, but it provides no information on the crucial
issue of government net worth. Modern accrual accounting fills this gap using
methods designed to limit bias, but partly because of the concern to limit bias, it
includes some poor estimates of values and provides no information on crucial
elements such as future tax revenues. Long-term projections can overcome these
problems, but only at the cost of requiring more estimates, creating more leeway
for bias. Given the advantages and disadvantages of each approach, the best strat-
egy, at least for middle- and high-income countries, would seem to be for govern-
ments to develop both accrual accounting and long-term projections while
continuing to monitor short-term cash flows.



Fiscal Targets and Fiscal Rules

Indicators of net worth can also form the basis for new fiscal targets and rules
that promote solvency without sacrificing public investment. One alternative is
the so-called golden rule, according to which governments can borrow only to
invest in the creation of new assets. This idea of separating the current and
capital budgets is hardly new, but it has recently been revived (a classic reference
is Musgrave 1939; see Bassetto and Sargent 2005 for historical background). The
British government has adopted a version of the golden rule that "over the econ-
omic cycle" allows it to borrow only to invest and only as long as public sector
net debt remains below 40 percent of GDP (H.M. Treasury 2004).
   Accrual accounting suits the golden rule, because another way of stating the
rule is to say that governments must not run an income-statement deficit (that is,
they must not reduce their net worth). Because the income-statement surplus
excludes expenditure on investment, the golden rule encourages investment.
Because the income-statement surplus includes depreciation, however, adopting
the golden rule is not the same as simply exempting investment from the fiscal
targets; it amounts to exempting net investment from fiscal targets. Observance
of the golden rule over a long period of time would eventually result in a public
debt stock no larger than the public capital stock so that to a first approximation
the outstanding debt would be fully backed by public assets.
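
The link between the golden rule, net investment, and the debt stock can be checked with a short simulation. This is a sketch under simplifying assumptions: depreciation is a fixed share of the capital stock, interest is treated as part of current expenses, and the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def golden_rule_borrowing_limit(gross_investment: float, depreciation: float) -> float:
    """With a zero income-statement balance, borrowing equals net investment."""
    return gross_investment - depreciation


def simulate_golden_rule(years: int, gross_investment: float,
                         depreciation_rate: float, income_surplus: float = 0.0,
                         capital: float = 0.0, debt: float = 0.0
                         ) -> List[Tuple[float, float]]:
    """Return (capital, debt) at the end of each year under the golden rule."""
    if income_surplus < 0:
        raise ValueError("the golden rule forbids an income-statement deficit")
    path = []
    for _ in range(years):
        depreciation = depreciation_rate * capital
        # Any income-statement surplus reduces the borrowing that is needed.
        borrowing = golden_rule_borrowing_limit(gross_investment, depreciation) - income_surplus
        capital += gross_investment - depreciation
        debt += borrowing
        path.append((capital, debt))
    return path


for year, (k, d) in enumerate(simulate_golden_rule(5, 20.0, 0.05), start=1):
    print(f"year {year}: capital {k:.1f}, debt {d:.1f}")
```

When the income-statement balance is exactly zero, debt grows one for one with the capital stock; with a surplus, debt stays below it. The simulation tracks book values only, so it says nothing about whether the assets earn enough to service the debt.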
   The limitations of typical accounting mean that the golden rule does not
ensure solvency or even expected solvency. The reason is that the assets may not
yield an expected return high enough to cover the interest on the debt that
financed their acquisition. Furthermore, by treating current and capital spending
differently, the golden rule offers an incentive for opportunistic misclassification of
expenditures.
   An alternative that, in principle, avoids these problems is the permanent-
balance rule. Roughly stated, it requires governments to set tax rates at a constant
fraction of output that over the long run pays for the government's present and
future expenditure (Buiter and Grafe 2004). Named by analogy with Milton
Friedman's permanent-income hypothesis, the rule allows governments to borrow
when revenue is temporarily low or when current investment opportunities are
greater than future investment opportunities. Long-term fiscal projections suit the
permanent-balance rule, because such projections are required to determine the
required minimum tax rate. Implementing the permanent-balance rule success-
fully would thus require addressing the problems discussed earlier about the
reliability of long-term fiscal projections, which is no easy matter.
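
A stylized version of the calculation behind the permanent-balance rule is sketched below. It is a simplification rather than Buiter and Grafe's own formulation: the horizon is finite, a single discount rate applies to all flows, and the projected paths of output and expenditure are hypothetical.

```python
from typing import Sequence


def present_value(cash_flows: Sequence[float], rate: float) -> float:
    """Discount a stream of year-end cash flows (year 1, 2, ...) to the present."""
    return sum(cf / (1.0 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def permanent_balance_tax_rate(output: Sequence[float],
                               expenditure: Sequence[float],
                               rate: float, initial_debt: float = 0.0) -> float:
    """Constant share of output whose present value covers debt plus expenditure."""
    if len(output) != len(expenditure):
        raise ValueError("output and expenditure must cover the same years")
    return (initial_debt + present_value(expenditure, rate)) / present_value(output, rate)


output = [100.0] * 30
expenditure = [30.0] * 5 + [20.0] * 25   # temporary investment push in years 1-5
tau = permanent_balance_tax_rate(output, expenditure, rate=0.04)
borrowing = [g - tau * y for g, y in zip(expenditure, output)]
print(f"tax rate: {tau:.3f}; year-1 borrowing: {borrowing[0]:.2f}; "
      f"year-30 borrowing: {borrowing[-1]:.2f}")
```

The government borrows while investment is temporarily high and runs surpluses afterward, with the tax rate held constant throughout. The result depends entirely on the projected paths fed into the calculation.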
   A compromise solution that avoids some of the problems with both the golden
rule and the permanent-balance rule is the modified golden rule recently proposed
by Mintz and Smart (2007). They suggest that governments should be able to
borrow to invest in self-liquidating assets, such as power plants that generate future
revenues for the government, but not in assets that generate no revenues, such as
typical public schools. Furthermore, just as firms do not finance all their assets
with debt, governments should be limited to borrowing only some fraction of the
value of the revenue-generating assets. These features would result in a rule that is
more conducive to solvency than the golden rule, without relying as much on
potentially unreliable long-term forecasts as does the permanent-balance rule.
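
The borrowing limit implied by such a modified golden rule is simple to state. The sketch below is illustrative only: the project list, the costs, and the 60 percent debt fraction are hypothetical, and Mintz and Smart's proposal is not reproduced in detail.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Project:
    name: str
    cost: float
    generates_revenue: bool   # True for self-liquidating assets


def modified_golden_rule_limit(projects: Iterable[Project], debt_fraction: float) -> float:
    """Maximum borrowing: a fraction of the cost of revenue-generating assets only."""
    if not 0.0 <= debt_fraction <= 1.0:
        raise ValueError("debt_fraction must lie between 0 and 1")
    eligible = sum(p.cost for p in projects if p.generates_revenue)
    return debt_fraction * eligible


projects = [
    Project("power plant", 100.0, generates_revenue=True),
    Project("public school", 50.0, generates_revenue=False),
]
print(modified_golden_rule_limit(projects, debt_fraction=0.6))
```

Under these assumptions the school and the remaining share of the power plant would have to be financed from current revenue.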



Conclusions

In many industrial and developing economies, governments have cut back on
public investment as they have brought their budgets closer to balance. Although
budget cuts were probably necessary, the cuts in public investment may have
been counterproductive, because much theory and evidence suggest that public
investment has the potential to increase future output.
   In the worst case, investment cuts trigger a vicious circle, in which the sub-
sequent deterioration of future revenue forces further investment cuts, leading to
yet further deterioration, further investment cuts, and so on. What is supposed to
be fiscal adjustment in this case actually has the same consequences as fiscal pro-
fligacy. Cutting investment to promote solvency becomes the fiscal equivalent of
walking up the down escalator: riders step up only to end up below where they
started.
   The cuts in public investment should have come as no surprise when most
countries measure their fiscal position not in terms of net worth but in terms of
short-term cash flows and gross debt, and cutting investment can reduce debt
and short-term cash flows, even as it reduces net worth. The problem afflicts both
industrial and developing economies, but it is more pressing in developing econ-
omies, which have not yet built up their public capital stocks.
   The decline in public investment suggests the need to rethink fiscal strategies.
In some cases, it may be best to increase public investment and accept a higher
short-term cash deficit in exchange for higher tax and user-fee revenues later.
This strategy is unlikely to be right for all countries, however. Those with good
infrastructure and bad fiscal positions may indeed do well to cut public invest-
ment. Countries with high taxes and debt may do best to increase public invest-
ment but finance it by cutting current expenditure. Still others, with high debt
and little room for cuts in current expenditure, may have no choice but to raise
taxes or forgo improvements in their infrastructure. Each case must be analyzed
on its merits and, given the tendency to be optimistic in forecasting growth and
the performance of investments, with a degree of skepticism. One general lesson is
that appropriate spending composition has to be an essential part of fiscal adjust-
ment and consolidation strategies, because it affects growth outcomes. In other
words, spending targets and growth forecasts cannot be set without regard to the
composition of expenditure, as they currently are.
   All governments are likely to benefit from better fiscal information. The idea is
not to abandon measures of debt and short-term cash flows, which are clearly
important, but rather to supplement them with measures of assets, yielding a
measure of net worth and its change over time. What is needed is information
that allows governments to quickly determine when improvements in short-term
cash flows are coming at the expense of declining net worth. Two means of gener-
ating such information are constructing long-term projections of fiscal cash flows
and adopting modern accrual accounting.
   Better fiscal information is helpful irrespective of whether the government follows
quantitative fiscal rules or targets. But the question also arises whether govern-
ments should set themselves fiscal rules or targets incorporating measures of invest-
ment or net worth. Because debt and short-term cash flows matter, rules or targets
based exclusively on net worth may not be helpful. But combining net worth with
conventional fiscal measures may have merit. The United Kingdom, which adopted
a version of the golden rule combined with a debt target, offers one example. An
even better option might be a modified golden rule that allows borrowing to
finance a portion of the cost of cash-generating assets but also requires that some
proportion be financed by current taxes. There is no obvious "best" solution, but
whatever the specific solution chosen, it is clearly time to change the exclusive
focus on public sector liabilities and bring public sector assets into the picture when
designing fiscal adjustment. It is much easier to walk up the up escalator.




Notes

William Easterly is professor of economics at New York University, where he is also a faculty affiliate
of Africa House and co-director of the Development Research Institute; his email address is
william.easterly@nyu.edu. Timothy Irwin is a senior economist in the Finance, Economics, and Urban
Department at the World Bank; his email address is tirwin@worldbank.org. Luis Serven (corre-
sponding author) is research manager for macroeconomics and growth in the Development
Research Group at the World Bank; his email address is lserven@worldbank.org. An earlier version
of this article was prepared for the World Bank's Latin American Regional Studies Program. The
authors are grateful to Penelope Brook, Antonio Estache, Jose Luis Irigoyen, Guillermo Perry, Sergio
Rebelo, Augusto de la Torre, and three anonymous referees for helpful comments.




References

Afonso, J. 2005. "Fiscal Space and Public Sector Investment in Infrastructure." IPEA Texto para
   Discussão 1141, Instituto de Pesquisa Econômica Aplicada, Brasilia.

Alesina, A., and R. Perotti. 1997. "Fiscal Adjustments in OECD Countries: Composition and
   Macroeconomic Effects." International Monetary Fund Staff Papers 44:210-48.

Balassone, F., and D. Franco. 2000. "Public Investment, the Stability Pact and the Golden Rule."
   Fiscal Studies 21(2):207-29.


Bassetto, M., and T. Sargent. 2005. "Politics and Efficiency of Separating Capital and Ordinary
   Government Budgets." NBER Working Paper 11030, National Bureau of Economic Research,
   Cambridge, Mass.

Buiter, W., and C. Grafe. 2004. "Patching Up the Pact: Suggestions for Enhancing Fiscal
   Sustainability and Macroeconomic Stability in an Enlarged European Union." Economics of
   Transition 12(1):67-102.

Calderon, C., and L. Serven. 2004. "Trends in Infrastructure in Latin America." World Bank Policy
   Research Working Paper 3400, Washington, D.C.

Calderon, C., and L. Serven. 2007. "Is Infrastructure Capital Productive?" World Bank, Washington, D.C.

De Haan, J., J. Sturm, and B. Sikken. 1996. "Government Capital Formation: Explaining the
   Decline." Weltwirtschaftliches Archiv 132(1):55-74.

Easterly, W. 1999. "When Is Fiscal Adjustment an Illusion?" Economic Policy 14(28):55-86.

Easterly, W., and L. Serven. 2003. The Limits of Stabilization: Infrastructure, Public Deficits and Growth
   in Latin America. Stanford, Calif.: Stanford University Press.

Estache, A. 2004. "What Do We Know about Sub-Saharan Africa's Infrastructure and the Impact of
   Its 1990s Reforms?" World Bank, Washington, D.C.

Ferreira, P., and C. Araujo. 2005. "On the Economic and Fiscal Effects of Infrastructure Investment
   in Brazil." Fundação Getulio Vargas da Escola de Pós-Graduação em Economia, Ensaio
   Econômico 613, São Paulo, Brazil.

Fontaine, E. 1997. "Project Evaluation Training and Public Investment in Chile." American Economic
   Review 87(2):63-7.

Guasch, J. Luis. 2004. Granting and Renegotiating Infrastructure Concessions: Doing It Right. WBI
   Development Studies. Washington, D.C.: World Bank.

Hemming, R., and International Monetary Fund. 2006. Public-Private Partnerships, Government
   Guarantees, and Fiscal Risk. Washington, D.C.: International Monetary Fund.

Hicks, N. 1991. "Expenditure Reductions in Developing Countries Revisited." Journal of International
   Development 3(1):29-37.

International Monetary Fund. 2004. "Public Investment and Fiscal Policy." Washington, D.C.

International Monetary Fund. 2005. "Public Investment and Fiscal Policy: Lessons from the Pilot
   Country Studies." Washington, D.C.

Irwin, T. 2007a. "Accrual Accounting, Long-Term Fiscal Projections, and Public Investment in
   Infrastructure." In G. Perry, L. Serven, and R. Suescun, eds., Fiscal Policy, Stabilization, and
   Growth: Prudence or Abstinence? Washington, D.C.: World Bank.

Irwin, T. 2007b. Government Guarantees: Allocating and Valuing Risk in Privately Financed Infrastructure
   Projects. Washington, D.C.: World Bank.

Kalaitzidakis, P., and S. Kalyvitis. 2004. "On the Macroeconomic Implications of Maintenance in
   Public Capital." Journal of Public Economics 88(3-4):695-712.

Keefer, Philip, and Steven Knack. 2007. "Boondoggles, Rent-Seeking and Political Checks and
   Balances: Public Investment under Unaccountable Governments." Review of Economics and
   Statistics 89(3):566-72.

Lane, P. 2003. "The Cyclical Behavior of Fiscal Policy: Evidence from the OECD." Journal of Public
   Economics 87(12):2661-75.

Mintz, J., and M. Smart. 2007. "Incentives for Public Investment under Fiscal Rules." In G. Perry,
   L. Serven, and R. Suescun, eds., Fiscal Policy, Stabilization, and Growth: Prudence or Abstinence?
   Washington, D.C.: World Bank.



Musgrave, R. 1939. "The Nature of Budgetary Balance and the Case for a Capital Budget." American
   Economic Review 29(2):260-71.

OECD. Various years. Economic Outlook. Paris.

Pereira, A., and M. Pinho. 2006. "Public Investment, Economic Performance and Budgetary
   Consolidation: VAR Evidence for the 12 Euro Countries." College of William and Mary,
   Department of Economics Discussion Paper 40, Williamsburg, Va.

Perotti, R. 2004. "Public Investment: Another (Different) Look." IGIER Working Paper 277,
   Innocenzo Gasparini Institute for Economic Research, Milan.

Pritchett, L. 2000. "The Tyranny of Concepts: CUDIE (Cumulated, Depreciated, Investment Effort) Is
   Not Capital." Journal of Economic Growth 5(4):361-84.

Rioja, F. 2003a. "Filling Potholes: Macroeconomic Effects of Maintenance versus New Investments
   in Public Infrastructure." Journal of Public Economics 87(9-10):2281-304.

Rioja, F. 2003b. "The Penalties of Inefficient Infrastructure." Review of Development Economics 7(1):
   127-37.

Roubini, N., and J. Sachs. 1989. "Government Spending and Budget Deficits in the Industrial
   Countries." Economic Policy 4(8):99-132.

Serven, L. 2007. "Fiscal Discipline, Public Investment, and Growth." In G. Perry, L. Serven, and R.
   Suescun, eds., Fiscal Policy, Stabilization, and Growth: Prudence or Abstinence? Washington, D.C.:
   World Bank.

Talvi, E., and C. Vegh. 2000. "Tax Base Variability and Procyclical Fiscal Policy." NBER Working
   Paper 7499, National Bureau of Economic Research, Cambridge, Mass.

Tanzi, Vito, and Hamid Davoodi. 2002. "Corruption, Public Investment, and Growth." In George T.
   Abed and Sanjeev Gupta, eds., Corruption and Economic Performance. Washington, D.C.:
   International Monetary Fund.

Treasury, H.M. 2004. "Long-Term Public Finance Report: An Analysis of Fiscal Sustainability."
   December. London.

World Bank. 2005. Infrastructure in Latin America: Recent Developments and Key Challenges.
   Washington, D.C.: World Bank.




                                                 The World Bank Research Observer, vol. 23, no. 1 (Spring 2008)

      What Can Countries in Other Regions
       Learn from Social Security Reform in
                                                    Latin America?



                                    Indermit S. Gill, Ceren Ozer, and Radu Tatucu


About a dozen countries in Latin America have enacted reforms that include elements
being contemplated elsewhere, including the partial privatization of social security. It is
not easy to draw universal lessons for social security reform from the experience of
countries such as Argentina, Chile, and Mexico, however, where sizeable public pension
systems went bankrupt before the populations aged, mainly because of mismanagement.
Most developing economies have much smaller social security systems. Relatively well-
managed systems in industrial countries face problems that are long term in nature and
have been brought about by an aging population. The experiences of Latin America never-
theless offer some general lessons for countries in other parts of the world. These lessons
relate to changes in labor market incentives accompanying reforms and how workers
react to them, government actions that have met with success in managing the transition
to funded pensions, and the expectations of individuals from social security systems.
Latin America's reforms suggest that the most effective approach is to keep payroll taxes
low, governments solvent, and social security systems focused on providing reasonable
insurance against poverty in old age. JEL codes: G23, H31, H53, H55, J26.




Latin America has long experience with social security reforms. Since the early
1980s, especially during the 1990s, about a dozen countries in the region have
attempted to radically reform their social security systems. Many observers have
studied social security reforms and outcomes in Latin America to draw lessons.
Among the more widely discussed social security reform experiences have been
those of Chile, Argentina, and Mexico. Gill, Packard, and Yermo (2005), among
others, have analyzed these experiences, with the objective of informing pension


© The Author 2008. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
doi:10.1093/wbro/lkm011                        Advance Access publication January 15, 2008                                 23:57-76

Table 1. Actual and Projected Total Fertility and Life Expectancy at Birth, by Region

                                            Total fertility (children per woman)          Life expectancy at birth

Region                               1970-75     2000-05         Low      Medium High    2000-05        2045-50

World
Africa
Asia                                    5.08        2.47         1.42      1.91  2.41       67.3            77.2
Europe                                  2.16        1.40         1.33      1.83  2.33       73.7            80.6
Latin America and the Caribbean         5.05        2.55         1.36      1.86  2.36       71.5            79.5
Northern America                        2.01        1.99         1.35      1.85  2.35       77.6            82.7
Oceania                                 3.23        2.32         1.42      1.92  2.42       74.0            81.2

  Source: Jousten 2007, based on data from United Nations Secretariat 2005.




policy in Latin America. This article attempts to draw lessons for other countries
from the Latin American experience.
   As in Latin America, many other developing regions are experiencing signifi-
cant demographic change. Birth rates have fallen, and life expectancy is on the
rise (table 1).
   Latin America offers the most varied experience with structural pension
reform. Thus it provides insights into how radical reform of pension systems can
help meet the pressures created by rising longevity and persistent old-age poverty.
   The fiscal deficits and increasing contingent liabilities (obligations to pay sums
dependent on future events) of generous public pension systems, often combined
with system mismanagement, created an immediate impetus for governments to
institute structural pension reforms in Latin America. Rising pension costs raise
questions of fiscal sustainability across the globe, but so far only a few countries
have engaged in major structural reform. Many focused instead on parametric
changes, adjusting the size and scope of their single-pillar social security systems
(the mandatory, pay-as-you-go, publicly provided part) by changing the rates of
contributions, the benefit calculations, and the retirement age. This article
assesses whether the demographic and social pressures these countries face
require structural reform of social security, along the lines of the reforms adopted
in Latin America.
   This article does not address the special challenges of reforming pension
systems for government workers: it focuses on the social security system for the
employees of enterprises. Civil service pensions put significant fiscal pressures on
more than half of the world's countries (including some of the largest developing
economies, such as Brazil, China, and India) which have separate pension
schemes for civil servants. Civil service pension reform is a contentious issue. Not
surprisingly, in most Latin American countries the reforms did not affect private
and public sector employees equally. For political reasons reforming governments
often avoided structural changes to the public pension systems benefiting the mili-
tary and civil servants.
   The article is organized as follows. The next section summarizes social security
problems and reforms in various regions. The following section assesses the
performance of reform in Latin America and draws lessons from that experience
for other countries. The last section draws implications for policymakers consider-
ing reforms.



Social Security Problems and Reforms

The underlying conceptual framework is based on the "comprehensive insurance"
concept of Ehrlich and Becker (1972, 2000). In the face of a possible loss a "com-
prehensive insurance" approach suggests that individuals can insure against the
loss, take steps to lower the likelihood that the loss will occur, or do nothing. The
purchase of insurance transfers income from "good" to "bad" times in order to
reduce the magnitude of losses in bad times. Individuals can insure themselves in
two ways: through mechanisms that pool the risk of the loss occurring among
those who are exposed to this risk or by consumption smoothing through individ-
ual savings ("self-insurance"). In a perfect world, pensions could be left to private
insurance and to individuals' voluntary saving decisions. In the presence of
imperfect information, missing markets, and other distortions government invol-
vement becomes necessary.
    In addition to insurance and consumption-smoothing objectives the two other
primary objectives of pensions are poverty relief and redistribution. Redistribution
complements the role of progressive taxation, for example, by subsidizing the con-
sumption smoothing of individuals who earned little during their working years
(Barr and Diamond 2006). Pension policy may also have secondary goals, such
as improving the operation of labor and capital markets and encouraging individ-
uals to save more. Some might argue that promoting economic growth is an
additional objective of pensions. One of the key debates on pensions centers on
the relative weights of these different objectives.
    Another debate concerns whether pensions should be pay-as-you-go or funded.
Most state-run pension schemes remain pay-as-you-go. Private schemes are gen-
erally funded, with pensions paid from a fund built up over time by members'
contributions. Many countries in Latin America added mandatory contributions
to private-funded pensions to their existing pay-as-you-go schemes.
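
The difference between the two financing methods can be made concrete with a stylized sketch. It assumes a pay-as-you-go scheme that pays out each year's contributions to current retirees and a funded account with a constant wage, contribution rate, and return; all names and numbers are hypothetical.

```python
def payg_pension_per_retiree(contribution_rate: float, average_wage: float,
                             workers: int, retirees: int) -> float:
    """Pay-as-you-go: this year's contributions are shared among this year's retirees."""
    return contribution_rate * average_wage * workers / retirees


def funded_balance_at_retirement(contribution_rate: float, wage: float,
                                 years: int, annual_return: float) -> float:
    """Funded: a worker's own contributions accumulate with investment returns."""
    balance = 0.0
    for _ in range(years):
        balance = balance * (1.0 + annual_return) + contribution_rate * wage
    return balance


# Pay-as-you-go pensions fall as the number of workers per retiree falls.
for workers_per_retiree in (6, 4, 2):
    pension = payg_pension_per_retiree(0.10, 100.0, workers_per_retiree, 1)
    print(f"{workers_per_retiree} workers per retiree: pension {pension:.1f}")

print(f"funded balance after 40 years: "
      f"{funded_balance_at_retirement(0.10, 100.0, 40, 0.03):.1f}")
```

In this sketch the pay-as-you-go pension depends on the ratio of workers to retirees, whereas the funded balance depends on the worker's own contributions and the return earned on them.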
    There is general agreement on the desirability of regulated voluntary pensions;
disagreements remain over whether private schemes should be mandatory.
Supporters of the mandatory-funded private schemes emphasize that individual
incentives to work, save, and incur risks are least distorted if the mandated saving
flows into privately managed accounts. Individuals' pension-related decisions
involve long-run choices and require a good understanding of pensions products,
which are often complex. Both factors create information problems, which
reduce, often considerably, people's ability to make choices that maximize their
long-term well-being (Barr and Diamond 2006). Arenas and Mesa-Lago (2006)
find, for example, that many workers in Chile lack the data and skills to make an
informed selection of the best pension provider. Barr and Diamond (2006) argue
that imperfect information in this context cannot be addressed simply by offering
more information, because the issue at hand is an information-processing
problem.
   The most common component of a pension system is a national defined-benefit
scheme in which pension benefits depend on a worker's wages and age (table 2).
Some countries also require that workers contribute to individual retirement
accounts, and pensions are paid from the accumulated funds. Many countries
also encourage workers to contribute to individual accounts managed by financial
institutions. The last two kinds of pensions are called defined-contribution plans
(for a more detailed explanation of the economics of pensions, see Barr and
Diamond 2006).

Table 2. Instruments for Old-Age Income Security

                           Mainstay: pooling          Mainstay: saving
Nature of instrument       Mandatory                  Mandatory               Voluntary

Common name                First pillar               Second pillar           Third pillar
Main function              Insure against poverty     Smooth consumption      Smooth consumption
                           in old age, reduce         over life cycle         over life cycle
                           income inequality
Main role of government    Define benefits            Define contributions    Define incentives
Principal risk bearer      Government                 Worker                  Worker
Financial instrument       Unfunded pay-as-you-go     Funded individual       Funded tax-preferred
                                                      accounts                individual accounts

  Source: Authors' compilation.

   A recent review of pension systems in 53 countries, comprising all 30 Organization for
Economic Co-operation and Development (OECD) members plus 23 countries in
Eastern Europe and Central Asia, Latin America and the Caribbean, and the
Middle East and North Africa, shows that the primary differences across systems
lie in three main areas (Whitehouse 2007). First, the level of social security
benefits (the target replacement rates) varies considerably. There is a negative link
between the target replacement rate for mandatory pensions and the relative
importance of voluntary private pensions. In countries such as Canada, the
United Kingdom, and the United States, where voluntary private pension provisions
are widespread, mandatory pensions are relatively small.
   Second, the relative emphasis on pooling and saving differs significantly.
Countries such as Australia, Canada, New Zealand, and the United Kingdom
emphasize the pooling function; Nordic countries and most Latin American
countries also have progressive systems. In continental Europe (outside the
Nordic countries) and the Middle East and North Africa, pensions are more
strongly correlated with earnings, signaling a greater emphasis on the savings
function.
   Third, the relative reliance on public and private sectors varies greatly. In
addition to the larger role for voluntary and private provision in countries with
low target replacement rates, many countries also involve the private sector in
running the mandatory pension system. The best-known cases of pension reforms
that increase the role of the private sector are in Latin America and the
Caribbean, followed by Eastern Europe and Central Asia. The private sector plays
an important role in mandatory pension provision in about one-third of the high-
income OECD countries. At the other end of the spectrum, countries in the
Middle East and North Africa do not yet involve the private sector in mandatory
pensions, and voluntary provisions barely exist.
    The experiences of Brazil, China, India, the Russian Federation, and South
Africa illustrate the range of problems faced by developing economies and
transition economies. Reforms in these and other countries are described below.



Brazil

Conditions in Brazil today resemble those in some Latin American countries
before they undertook the type of structural reforms assessed below. Pension
reform was motivated mainly by fiscal pressures. Despite having a young popu-
lation, Brazil's level of public expenditure on pensions is large. Subsidies to cover
Brazil's public pension regimes' deficits rose from 4.6 percent of GDP in 1998 to
5.6 percent in 2004 (Giambiagi and de Mello 2006). So far pension reform has
involved streamlining the system's mandatory first pillar and developing a third
pillar of voluntary, complementary, personal saving schemes, without creating a
second pillar of mandatory individual saving plans. Since the late 1990s, reform
of the regime for private sector workers has been aimed at tightening eligibility
conditions, reducing replacement rates, and increasing the share of population
covered by social security.


East Asia

East Asia has a diverse set of pension systems. Malaysia and Singapore have large
provident fund systems operating under public administration at the national
level on the basis of defined contributions. The Republic of Korea, the Philippines,
and Thailand have OECD-style defined-benefit pension schemes (though with
lower coverage rates than industrial countries), with more emphasis on
redistribution.
   Like other communist countries, China formerly had defined-benefit pay-as-
you-go pension systems covering urban public sector employees, and contri-
butions were largely the responsibility of state-owned enterprises. Rising pension
expenditures (caused partly by the use of early retirement as a mechanism to deal
with excess workers at state enterprises) and declining contributions (the result of
poor performance by state enterprises, rising unemployment, a growing informal
sector, and weak enforcement) motivated China to seek the best reform alterna-
tives (Asher and others 2005).


South Asia

India, like other countries in South Asia, is still at the beginning of its demo-
graphic transition. It has a low ratio of pensioners to workers, who continue to
contribute to social security schemes. Although only 13 percent of India's labor
force is covered by pensions, pension debt is becoming a serious issue. Implicit
pension debt is estimated at 25 percent of GDP nationally, and in some states the
extent of the problem is much greater. India is in the process of passing into law a
new pension system that would shift all new central government employees to a
defined-contribution plan from the current defined-benefit scheme, shifting the
risk of retirement financing from the government to individuals (Shah 2006).
Participants in the new scheme will have access to a range of investment products
from selected private sector companies. The new pension system will be offered on
a voluntary basis to private sector workers. Aiming to fulfill the social protection
dimension of pensions, India's noncontributory pensions target the elderly poor:
the means-tested schemes administered by states and supplemented with federal
funds reach 1 of every 10 elderly Indians.


Eastern Europe and Central Asia

Regionally, Europe and Central Asia is second only to Latin America in terms of
pension reform activity. In the early transition period, countries in the region
faced serious challenges to their social security systems as output fell, contri-
butions declined, and the number of beneficiaries grew. Transition proved to be
challenging politically. Some countries chose parametric reforms, such as raising
the retirement age, with or without changing the benefit formulas; others,
especially the European Union accession countries, undertook structural reform.
Ten countries in Europe and Central Asia created second pillars of mandatory,
funded individual accounts.
   The Russian Federation began structural reforms of its pension system in 2001.
The new system comprises three pillars. The first pillar, the major component of
the system, is a publicly managed pay-as-you-go, defined-benefit scheme that con-
sists of a flat basic benefit and a notional defined-contribution scheme. The
second pillar is a mandatory defined-contribution scheme with mixed public-
private management. The third pillar is the voluntary privately managed com-
ponent (OECD 2006).



Middle East and North Africa

Countries in the Middle East and North Africa have already put in place defined-
benefit pension systems financed on a pay-as-you-go basis. Egypt, Iran, and Libya
have also developed noncontributory pension schemes. Even with young popu-
lations, many pension systems in the region are not financially sustainable
without reform. Despite this, reforms have been limited, with Lebanon and
Morocco among the few countries considering systemic pension reform.



Sub-Saharan Africa

Noncontributory schemes in Sub-Saharan Africa exist in only a few countries,
such as South Africa, where they are financed by general revenues. As in South
Asia, the pension reform agenda in Sub-Saharan Africa is driven by the fiscal
pressures arising from civil service pensions. Average coverage in the region is less
than one-fifth of the labor force, with the rest of the population relying on its own
resources and informal old-age support.
    Pension systems around the world are thus diverse, with every country facing a
unique set of problems. But there are some common features. Many countries in
the developing world, including major economies such as China and India,
have to deal with expanding social security systems that cover only small parts of
their populations, and they have to do so within tight fiscal constraints. Industrial
countries face problems that are long term in nature and have been brought
about by an aging population. While the immediate concern behind the reform
process has often been fiscal sustainability, getting the incentives right is equally
important.


Performance in Latin America and Lessons of Experience

The experiences of Latin America offer some general lessons for countries in other
parts of the world. Chile first adopted structural reform in 1981. Argentina,
Bolivia, Colombia, Mexico, Peru, and Uruguay followed in the 1990s, and Costa
Rica, El Salvador, the Dominican Republic, Ecuador, and Nicaragua followed after
2000. The notable exceptions to reform are Brazil and Republica Bolivariana de
Venezuela.
   The details of the reforms vary across countries. What is common is that a
publicly mandated and administered pay-as-you-go component operated on a
defined-benefit basis was retained and a publicly mandated but privately adminis-
tered system of defined-contribution personal accounts was added. Governments
also made some attempts to increase voluntary saving through defined incentives,
such as Individual Retirement Accounts in the United States, which encourage
individual retirement savings through tax benefits.
   But the headline item has been the new system of personal accounts.
Mandatory personal accounts constitute the second pillar. Second-pillar reforms
can be classified into three categories. The first is the "Chilean model," which
made private accounts mandatory for all new workers; Bolivia, El Salvador, and
Mexico also adopted this model. The second is what might be called the
"Peruvian model," which Colombia also adopted. Under this model, new workers
are given a choice between a downsized pay-as-you-go pension and a private
account. Under the third approach, which can be termed the "Argentine model,"
new workers have a pay-as-you-go tier combined with a private account tier;
Costa Rica and Uruguay also adopted this model.
  Until the early 2000s, Latin American countries were inclined to adopt some
variant of these three approaches, with a tendency for later reformers to select
the Chilean model, which gave new workers no choice but the personal accounts.
Since then, pension reforms are as likely to eschew privatization entirely. Brazil,
for example, has chosen to reform the parameters of its pay-as-you-go pensions,
and Ecuador and Nicaragua have decided to postpone structural reforms.
  Assessment of the fiscal, financial, and labor market effects of the reforms
reveals that the results have been mixed. Performance in a variety of areas is
assessed below.


Effect of Reform on System Balances

Fiscal imbalances were the primary motivation behind reforms, just as they
appear to be the main concern in India, the Russian Federation, and the United
States. The reforms seem to have had some success. Simulations by Gill, Packard,
and Yermo (2005) for eight Latin American reformers indicate that the rate of
accumulation of pension debt fell sharply in most countries as a result of the
reform (figure 1). In Bolivia, for example, pension-related debt would have been
almost 160 percent of GDP in 2030 without reforms but is less than 50 percent
of GDP with reforms. In Uruguay pension-related debt ratios for 2030 would be
about 150 percent without reform and 70 percent with reform.

Figure 1. Pension-related Long-term Deficits after Reforms

  Source: Based on Gill, Packard, and Yermo (2005).

   These are the long-term effects. In the immediate aftermath of reforms,
however, these countries had to deal with the transition costs, as contributions
were diverted from paying benefits to the elderly to investing in the private
accounts of workers. The promised benefits to current pensioners and older
workers under the old system had to be paid, while part of the payroll tax flowing
in had to be diverted to fund individual accounts. With contributions diverted
into funded pension accounts, governments had to find ways of financing existing
pay-as-you-go liabilities. For a variety of reasons, in some countries these tran-
sition costs proved to be higher than expected at the time of the reform. In
Bolivia, for example, the pension-related deficit has been rising instead of falling,
as projected.
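   The arithmetic of the transition can be made concrete with a minimal sketch. The
function below is an illustration only, not a model used by the authors: it assumes a
single year, a fixed benefit bill, and a hypothetical share of payroll revenue diverted
to individual accounts. The function name, parameter names, and numbers are all
assumptions introduced here.

```python
def transition_deficit(payroll_revenue, benefits_due, diverted_share):
    """Annual pay-as-you-go shortfall once part of payroll revenue is diverted.

    All arguments are hypothetical illustration inputs expressed in the same
    unit (for example, percent of GDP). diverted_share is the fraction of
    payroll revenue redirected to funded individual accounts.
    """
    if not 0.0 <= diverted_share <= 1.0:
        raise ValueError("diverted_share must be between 0 and 1")
    # Revenue still available to pay benefits under the old system.
    retained = payroll_revenue * (1.0 - diverted_share)
    # A positive result is a gap the government must finance.
    return benefits_due - retained


# Illustrative numbers only (not taken from the article).
before = transition_deficit(payroll_revenue=5.0, benefits_due=5.0, diverted_share=0.0)
after = transition_deficit(payroll_revenue=5.0, benefits_due=5.0, diverted_share=0.4)
print(before, after)
```

   With these illustrative inputs, a system that was in balance before the reform shows
a shortfall equal to the diverted revenue, which the government must cover through
taxes, borrowing, or spending cuts.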
   More important is the fact that many countries had to finance the transition
through increased government debt, some of it held by the new pension funds.
With the investment regulations favoring government debt and with the thin
capital markets in much of the region, Latin American workers essentially
swapped pay-as-you-go debt for government bonds. Fully two-thirds of the
average investment portfolio consisted of government securities (figure 2). While a
case can be made that this debt is more secure, the case of Argentina, where the
government wrote down its debt by more than two-thirds, suggests that this
greater security is a matter of degree. Moreover, with less than 20 percent of
these funds going to corporate bonds and equities, the growth effects of the
reform were also likely weak.

Figure 2. Importance of Government Bonds in Private Account Portfolios
[Portfolio categories shown: government securities; financial institutions; corporate bonds and equities.]

  Source: Based on Gill, Packard, and Yermo (2005).

  The reforms aimed at improving labor market efficiency by strengthening the
links between contributions and benefits and by reducing the regressive transfers
that characterized the previous social security systems. They appear to have been
successful in reducing regressive transfers (figure 3).
  Although reforms have addressed within-system equity concerns, the most
inequitable aspect of unreformed social security systems in Latin America was
that they excluded large shares of the population from even a semblance of
income security. Reforms have been less effective in addressing this problem.
While the closer links between benefits and contributions may have increased
participation, the effect was small. Participation rates in most countries have
essentially flat-lined at levels ranging from about 10 percent to about 67 percent
of the active labor force (figure 4). This lackluster performance, despite closer
links between contributions and benefits, can be attributed to high (and rising)
payroll taxes, as discussed below. But some of it also reflects the high management
and insurance fees private pension providers charge.

Figure 3. Effect of Reforms on Within-system Equity
[Bars compare "Reform" and "No reform" for men and women in Chile, Peru, Colombia, Argentina, Uruguay, Mexico, Bolivia, and El Salvador.]

  Note: Figure shows percentage-point difference in internal rates of return earned from national retirement
security system by wealthiest and poorest workers.
  Source: Based on Gill, Packard, and Yermo (2005).

Effect of Reform on Financial Markets

Reforms stimulated financial markets, but often at a high price for contributors. In
Latin America as a whole the administrative costs of private schemes have been
considerably higher than those of public schemes (Arenas de Mesa and Mesa-Lago
2006). In theory, private systems can reduce administrative costs through competition.
In practice, multiple private providers lose the advantage of economies of
scale, and considerable resources are spent on advertising and sales commissions.
Administrators in private systems charge a commission (as a percentage of wages)
for managing the old-age program plus a premium, transferred to an insurance
company, to cover disability and survivor risks (Mesa-Lago 2006).
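   Because commissions and premiums are quoted as a percentage of wages, their weight
relative to what a worker actually pays in is not obvious at first glance. The sketch
below, under assumed rates that are not drawn from the article, converts wage-based
charges into a share of the total payroll deduction. The function and parameter names
are hypothetical, and the definition of the total deduction used here (savings plus
commission plus premium) is only one possible convention.

```python
def fee_share(savings_rate, commission_rate, insurance_rate):
    """Fraction of the total payroll deduction that goes to fees.

    All rates are fractions of the wage and are hypothetical inputs.
    The total deduction is assumed to be savings + commission + premium.
    """
    total = savings_rate + commission_rate + insurance_rate
    if total <= 0:
        raise ValueError("total deduction must be positive")
    return (commission_rate + insurance_rate) / total


# Illustrative rates only (not taken from the article): 10 percent of wages
# saved, a 1.5 percent commission, and a 1 percent insurance premium.
share = fee_share(0.10, 0.015, 0.01)
print(f"{share:.1%} of the total deduction goes to fees")
```

   Under these assumptions, charges that look small relative to wages absorb a much
larger fraction of the amount deducted from the worker's pay.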
    The evidence indicates that ensuring a captive clientele for these pension funds
has led to the growth of a new industry in the reforming countries; assets held by
pension funds more than doubled as a share of GDP between 1998 and 2004. But
while financial market development has been hastened, contributors have generally
paid a high price. This was especially the case in the early days of reform in
countries such as Chile: for workers who had contributed for a decade or longer,
more than a quarter of contributions may have gone toward management and
insurance fees. Even today, 15-20 percent of contributions go to management fees,
and workers have to pay insurance fees as well (figure 5). Administrative costs may
be even higher for low-wage workers in some countries. Low-wage workers often
did not participate in the system before reform because of the weak connection
between contributions and benefits; they may now find the new systems unattractive
because of higher payroll tax rates and onerous administrative fees.

Figure 4. Pension System Participation Rates, before and after Reform, 1980-2000
[Line chart; vertical axis runs from 0.0 to 70.0. Series: Argentina, Bolivia, Chile, Colombia, El Salvador, Mexico, Uruguay, Costa Rica, Ecuador, Nicaragua, Brazil.]

  Note: Reform dates are as follows: Argentina, 1994; Bolivia, 1997; Chile, 1981; Colombia, 1994; El Salvador,
1998; Mexico, 1997; Uruguay, 1996.
  Source: Based on Gill, Packard, and Yermo (2005).

   Administrative costs have generally come down as the pension fund industries
have matured, but in some countries, such as Peru, the fees charged to contribu-
tors have not fallen commensurately. This has meant high profits for the fund
managers. Between 1998 and 2002 the share of workers' contributions going to
fees remained steady in Peru, while fund expenses fell. As a result, profit rates sky-
rocketed (figure 6).
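   The mechanism is simple: if fee income holds steady while operating expenses fall,
the difference accrues to fund managers. The minimal sketch below uses hypothetical
figures, not Peru's actual data, and defines the profit margin as one minus the ratio
of operating expenses to fee income. The function name and this definition of the
margin are assumptions made for illustration.

```python
def profit_margin(fee_income, operating_expenses):
    """Profit as a share of fee income: 1 - expenses / fees."""
    if fee_income <= 0:
        raise ValueError("fee_income must be positive")
    return 1.0 - operating_expenses / fee_income


# Hypothetical path: fee income constant at 100 while expenses fall.
expenses_by_year = (80.0, 65.0, 50.0)
margins = [profit_margin(100.0, expenses) for expenses in expenses_by_year]
for expenses, margin in zip(expenses_by_year, margins):
    print(f"expenses {expenses:.0f}: margin {margin:.0%}")
```

   In this stylized case the margin rises each year even though contributors pay
exactly the same fees throughout.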
   Chile's recent experience appears to be similar. Competition among private
pension providers in Chile has been limited because of the small number of
administrators and high and increasing concentration among the largest funds
(Arenas de Mesa and Mesa-Lago 2006). Even capable Latin American govern-
ments appear to find it difficult to effectively regulate these oligopolies.



Figure 5. Administrative Fees Paid by Workers, as Percentage of Total Contribution, 2002
[Bar chart covering Argentina, Bolivia, Chile, Colombia, El Salvador, Mexico, Peru, Uruguay, and the average.]

  Source: Based on Gill, Packard, and Yermo (2005).



Lessons for Other Countries

Not all of these findings are relevant for other countries, because the countries
and the social security issues being debated are different. However, some general
lessons do emerge.




Figure 6. Pension Fund Costs, Fees, and Profits in Peru, 1998-2002
[Chart with two vertical axes: left scale 0.0 to 100.0; right scale 0 to 30, labeled fees/net contributions and expenses/net fees (percent). One series is labeled operational expenses/net fees.]

  Source: Based on Gill, Packard, and Yermo (2005).

Effect of payroll taxes on labor market incentives. While the move to private
accounts had a small but positive effect on participation, the negative effect of
higher payroll taxes may have been greater. Given near universal coverage of
social security in developed countries and many transition economies, the discus-
sion here is focused on the incentives to work rather than to contribute to social
security. In particular, the concern is about the distorting effect of social security
on the age of retirement.
   The Latin American experience appears to support those who argue against
raising the payroll tax rate. Two findings are noteworthy. The first is that there is
little evidence from Latin America that, in the presence of high transaction costs,
individual accounts led to stronger labor market incentives, as evidenced in partici-
pation rates. Payroll taxes went up in all countries except Chile and Uruguay
(figure 7).
   The second finding is that diverting payroll contributions from pay-as-you-go
systems to individual accounts appears to have adverse fiscal implications that are
far more potent. Most Latin American countries had to create space for the
second pillar, which required that they downsize and redesign the first pillar.
Chile, Bolivia, and Mexico provided a minimum pension guarantee to low-income
workers whose personal accumulations fell below a specified amount. Many struc-
tural reformers in Latin America had to deal with financing the transition. One
option was to increase taxation (through payroll or broader taxes, such as
income tax or a general consumption tax) or borrowing (by issuing conventional
public sector debt). Another was to reduce public spending, on pensions or in
general, or create new revenue (through privatization, for example).

Figure 7. Payroll Tax Rates before and after Reform
[Bar chart covering Chile, Peru, Colombia, Argentina, Uruguay, Mexico, Bolivia, El Salvador, Costa Rica, Nicaragua, and the Dominican Republic.]

  Source: Based on Gill, Packard, and Yermo (2005).

   When labor is mobile across sectors and the informal economy is large, structur-
ing the premiums for social insurance programs as payroll taxes may be ineffective.
Greater reliance on a broad tax base, such as an income or consumption tax
instead of a payroll tax, may be more efficient. Using a broader tax is also more
consistent with the poverty prevention and redistributive functions of the remain-
ing public pooling pillar after introduction of the multipillar model, because such a
tax reduces the wedge between the formal and informal parts of the labor force.
The public pooling pillar enables individuals and households to manage shocks to
their income should they need to, enabling them to be more enterprising.


Effect of privatization. The pension debate in Latin America has centered on two
costs associated with the move to private accounts. One is the administrative fees
charged by the special pension funds set up to manage these accounts and the
costs of annuitizing the accumulated funds. The other is the fiscal costs associated
with the transition to private accounts. Latin America's experience with fiscal
costs may be relevant for other countries.
    The cases of Chile and Argentina provide contrasting experiences of the inter-
action of pension reform and fiscal effort. In Chile a strong fiscal effort character-
ized the lead-in to the pension reform: fiscal surpluses averaged more than 5
percent of GDP in the years before the 1981 reform, so that Chile's fiscal deficits
after the reform were mild and short-lived. In contrast, Argentina did not substan-
tially bolster its fiscal situation in the years leading up to its 1994 reform.
Though it ran small fiscal surpluses in the two years before the reform, there is
reason to believe that its fiscal stance after the reform was worse than indicated
by published figures. Payroll tax deductions reduced revenues and increased
pension system deficits. About half of the deterioration of the consolidated public
sector fiscal deficit between 1994 and the 2001 crisis was caused by the worsen-
ing social security balance.
    The degree of protection against policy risk offered by privatizing a portion of
mandated pensions may also be exaggerated. The experience in countries such as
Argentina illustrates how any government-organized social security system,
whether directly administered or simply mandated, can fall prey to politicians.
Since the start of the system about half of all privately managed assets have been
invested in government bonds. During the 2001 crisis, when the government
forced the pension funds to swap dollar-denominated government bonds for peso
debt, the share of government bonds in the private funds' portfolio rose above
three-quarters. Argentina is not unique in this regard. In Mexico, for example,
several years after the reform the share of government bonds in pension fund
portfolios is as high as in Argentina.
   Pension system deficits contributed significantly to the deterioration of
the fiscal balance in Argentina and other Latin American reformers. Many
observers would underscore the importance of having a relatively strong fiscal
position before undertaking structural reforms and of reducing the implicit debt
of unfunded pay-as-you-go systems before making the debt explicit by shifting
to a funded second pillar. But there are also concerns that this replacement
could actually worsen the fiscal balances as reneging on explicit debt may be
more costly than eroding real pension benefits (that is, reneging on implicit
debt). The Latin American experience supports the views of advocates of
general fiscal discipline rather than social security privatization as a prerequi-
site for ensuring a stable domestic financial sector and a friendly environment
for private saving.


Worker objectives in buying old-age insurance. The experience in some Latin
American countries raises questions about what individuals expect from their gov-
ernments. Fiscal stability appears to be necessary for governments to fulfill these
expectations; other conditions may also be required.
   Latin America's experience can offer useful insights on how to curb rising
pension costs and prevent pensioner poverty at the same time. Indeed, many developing
economies already face rising pension spending, often combined with significant
pensioner poverty (Barr 2006). Bourguignon and others (2004) calculate the
incidence of poverty among the elderly in 19 Latin American countries. Using
household surveys to simulate the fiscal cost and impact on poverty rates of various
uniform pension schemes, they show that a universal minimum pension would
substantially reduce poverty among the elderly in all countries except Argentina,
Brazil, Chile, and Uruguay, where minimum pension systems already exist.
   Evidence from Chile and Peru, where the results of surveys (Gill, Packard, and
Yermo 2005) designed to determine how households manage economic risk are
available, reveals something about what workers expect from governments.1 At
the time of the survey in Peru the government had not instituted a minimum
pension guarantee; the survey revealed that private financial institutions were
trusted more than all three branches of government. It also revealed that more
risk-averse workers chose private funds over the reformed but still risky govern-
ment pay-as-you-go option.
   In Chile the survey results were more revealing. Two decades after reform,
workers seem to be using a system intended to act primarily as a vehicle for
savings (with a small pooling component) mainly as a risk-pooling mechanism.
Each cohort of workers that completes the minimum contribution requirements
appears to be content with qualifying simply for the government's guarantee of
minimum pension, a modest means-tested amount of about 80-90 percent of
the minimum wage.


72                                    The World Bank Research Observer, vol. 23, no. 1 (Spring 2008)

   Some researchers attribute this outcome to the moral hazard associated with
low-income workers realizing that any contributions beyond this are a pure tax.
In fact, this behavior occurs less among the working poor and more among
middle- and higher income groups; it is more consistent with a desire to purchase
some insurance against old-age poverty. The switch to other savings instruments
after they have qualified for this insurance also indicates that workers see the
mandated private accounts as relatively expensive or risky compared with other
investments. There is evidence that other retirement investments in Chile (housing,
household enterprise, even the education of children) are perceived as
less risky than saving in the reformed pension system (Packard 2002). There is
also evidence that households prefer to gain eligibility for the low, government-
guaranteed annuity and continue to save outside the system, despite the variable
but high real returns they could earn in the system. This evidence suggests that
they may place greater value on security than on real rates of return (Gill,
Packard, and Yermo 2005).
   In countries where government is generally viewed as reliable, one could make
the case that workers view the social security system more as a mechanism for
insurance against poverty and less as a vehicle for saving to smooth consumption.
The implication may be that the social security benefit structure should be made
more progressive, perhaps even more progressive than it currently is in
developed countries.



Main Policy Implications

It is difficult to draw universal lessons on how to reform social security systems
from the experience in Latin America. Developed countries already have universal
coverage and well-developed financial markets; many developing economies
outside Latin America do not have well-developed contributory social security
systems; and transition economies in Europe and Central Asia face entirely different
challenges than emerging markets in Latin America.
   These differences notwithstanding, the Latin American experience provides
some useful information about the behavior of (rational) workers, the responses
of (profit-seeking) firms, and the responsibility of (fiscally constrained) govern-
ments. Put another way, the experience provides insights into how workers and
firms react to changes in the structure of social security systems, what workers
expect from their governments, and how governments can meet these expectations.
The main policy pointers appear to be the following:
     Keep payroll taxes low. Strengthening the links between contributions and
     benefits can improve labor market incentives somewhat, but higher payroll
     tax rates will offset these benefits.

     Keep benefits frugal. Public pension benefits should be small and secure, in
     order not to unduly discourage saving for old age while providing insurance
     against poverty in old age.

     Keep governments solvent. Fiscal prudence is the most important rule for
     governments that wish to provide both a safe environment for private saving and
     reliable insurance against old-age poverty.



   While these lessons emerge from the experience in Latin America, they are also
consistent with fundamental principles of the economics of insurance. Ehrlich
and Becker (2000) and others propose that optimal insurance implies that rarer
and more idiosyncratic losses are better pooled, while frequent and more systemic
losses should be saved for. This principle can be applied to the losses associated
with old age. The blessing of rising longevity implies that losing the capacity to
earn is an increasingly frequent loss for individuals. The blessing of falling
poverty rates implies that being poor in old age is becoming an increasingly rare
loss. Rising longevity necessitates a shift to self-insurance or saving as the way to
smooth consumption over one's lifetime, while falling poverty implies a shift to
market insurance or pooling.
  The role of governments is to facilitate these actions by individuals to insure,
self-insure, and self-protect. Since there are relatively few serious impediments to
the ability of individuals to save for old age, the role for governments in encoura-
ging saving for old age should be secondary and diminish. In contrast, in the case
of poverty, because of the "social" nature of the loss being insured against and
well-known problems with insurance markets, the role for governments is
primary.
  Much of the discussion in any country should center on the role of govern-
ments in helping individuals save and smooth consumption over their lifetimes
and the need to help individuals insure against the losses associated with becom-
ing destitute in old age. While it is clear that the mainstay for consumption
smoothing should be individual saving, it is less clear what role the government
should play in getting individuals to save. In the case of destitution in old age,
however, the role for governments is clearer: it needs to provide an instrument for
insurance against the increasingly rare loss associated with falling into poverty.
  For various reasons social security systems have historically bundled these two
functions. The implication is that as the role of government in saving is scaled
down, the insurance function becomes more, not less, important. As Lindbeck
and Persson (2003, p. 60) note in their cross-regional survey of social security
reforms, "Reforms do not diminish the need for basic, or guaranteed, pensions.
Quite the contrary; growing reliance on quasi-actuarial and actuarially fair
systems which in themselves do not encompass any systematic intra-generational
redistributive elements, makes it even more imperative to maintain a safety net to
prevent poverty in old age."




Indermit S. Gill is the director of World Development Report 2009 at the World Bank; his email
address is Igill@worldbank.org. Ceren Ozer is an economist in the South Asia Poverty Reduction
and Economic Management Unit at the World Bank; her email address is cozer@worldbank.org.
Radu Tatucu is a junior professional in the East Asia Poverty Reduction and Economic Management
Unit at the World Bank; his email address is rtatucu@worldbank.org. The authors would like to
thank Jeffrey Brown, Peter Diamond, Andras Simonovits, and participants at a workshop at the
2006 American Economic Association meetings for useful comments, as well as three anonymous
referees for many suggestions that improved this article.
   1. Data were collected in specially designed surveys on risk, savings, and social insurance
(Encuestas de Prevision de Riesgos Sociales) conducted in Santiago, Chile, in January 2000, and in
Lima, Peru, in May 2002.




References

Arenas de Mesa, Alberto, and Carmelo Mesa-Lago. 2006. "The Structural Pension Reform in Chile:
   Effects, Comparisons with Other Latin American Reforms, and Lessons." Oxford Review of
   Economic Policy 22(1):149-67.

Asher, Mukul, Nicholas Barr, Peter Diamond, Edward Lim, and James Mirrlees. 2005. "Social
   Security Reforms in China: Issues and Options." Policy Study of the China Economic Research
   and Advisory Program. London School of Economics and Political Science, London.
   (http://econ.lse.ac.uk/staff/nb/index_own.html)

Barr, Nicholas. 2006. "Pensions: Overview of the Issues." Oxford Review of Economic Policy 22(1):
   1-14.

Barr, Nicholas, and Peter Diamond. 2006. "The Economics of Pensions." Oxford Review of Economic
   Policy 22(1):15-39.

Bourguignon, François, Martin Cicowiez, Jean-Jacques Dethier, Leonardo Gasparini, and Pierre
   Pestieau. 2004. "What Impact Would a Minimum Pension Have on Old-Age Poverty? Evidence
   from Latin America." Paper presented at the conference "Keeping the Promise of Old Age
   Security," Bogota, June 22-23.

Ehrlich, Isaac, and Gary Becker. 1972. "Market Insurance, Self-Insurance, and Self-Protection."
   Journal of Political Economy 80(4):623-48.

Ehrlich, Isaac, and Gary Becker. 2000. "Market Insurance, Self-Insurance, and Self-Protection."
   In G. Dionne and S. Harrington, eds., Foundations of Insurance Economics: Readings in
   Economics and Finance. Boston: Kluwer Academic Publishers.

Giambiagi, Fabio, and Luiz de Mello. 2006. "Social Security Reforms in Brazil: Achievements and
   Remaining Challenges." OECD Economics Department Working Paper 534. Organisation for
   Economic Co-operation and Development, Paris.

Gill, Indermit, Truman Packard, and Juan Yermo. 2005. Keeping the Promise of Social Security in
   Latin America. Stanford, CA: Stanford University Press for the World Bank.

Jousten, Alain. 2007. "Public Pension Reform: A Primer." IMF Working Paper 07/28. International
   Monetary Fund, Washington, D.C.

Lindbeck, Assar, and Mats Persson. 2003. "The Gains from Pension Reform." Journal of Economic
   Literature 41(1):74-112.

Mesa-Lago, Carmelo. 2006. "Private and Public Pension Systems Compared: An Evaluation of the
   Latin American Experience." Review of Political Economy 18(3):317-34.

OECD (Organisation for Economic Co-operation and Development). 2006. Reform and Challenges for
   Private Pensions in Russia. Private Pension Series 7. Paris. (http://www.oecd.org/document/40/
   0,2340,en_2649_34853_36734824_1_1_1_1,00.html).

Packard, Truman G. 2002. "Pooling, Savings, and Prevention: Mitigating the Risk of Old Age
   Poverty in Chile." World Bank Working Paper No. 2849. Washington, D.C. (http://econ.worldbank.
   org).

Shah, Ajay. 2006. "Indian Pension Reform: A Sustainable and Scalable Approach." In David A.
   Kelly, Ramkishen S. Rajan, and Gillian H.L. Goh, eds., Managing Globalization: Lessons from
   China and India. Singapore: World Scientific Publishing Company. (http://www.mayin.org/
   ajayshah/PDFDOCS/Shah2005-sustainable-pension-reform.pdf).

United Nations Secretariat. 2005. World Population Prospects: The 2004 Revision Highlights.
   New York: United Nations.

Whitehouse, Edward. 2007. Pensions Panorama: Retirement-Income Systems in 53 Countries.
   Washington, D.C.: World Bank.





Why OECD Countries Should Reform Rules of Origin

Olivier Cadot and Jaime de Melo


With preferential trade agreements on the rise worldwide, rules of origin (which are
necessary to prevent trade deflection) are attracting increasing attention. At the same
time, preference erosion for Generalized System of Preferences (GSP) recipients is
increasing resistance to further multilateral negotiations. Drawing on different
approaches, this article shows that the current system of rules of origin that is used by
the European Union and the United States in preferential trade agreements (including
the GSP) and that is similar to systems used by other Organisation for Economic
Co-operation and Development countries should be drastically simplified if developed
economies really want to help developing economies integrate into the world trading
system. In addition to diverting resources for administrative tasks, current rules of
origin carry significant compliance costs. More fundamentally, it is becoming increasingly
clear that they have often been designed to force developing economies to buy inefficient
intermediate products from developed economies to "pay for" preferential access for
the final product. The evidence also suggests that a significant share of the rents associated
with market access (net of rules of origin compliance costs) is captured by developed
economies. Finally, the restrictiveness of rules of origin is found to be beyond the levels
that would be justified to prevent trade deflection, suggesting a capture by special interest
groups. The article outlines some alternative paths to reforms. JEL codes: F13, F15



Rules of origin are an integral part of proliferating free trade agreements
(countries belong to an average of six, according to a recent tally by the World
Bank [2005, table 2.1]) and nonreciprocal preferential trade agreements such as
the Generalized System of Preferences (GSP).1 Given the lack of progress on
harmonization at the World Trade Organization (WTO) and given that regionalism is
here to stay, rules of origin are likely to be increasingly important in the world
trading system.


© The Author 2007. Published by Oxford University Press on behalf of the International Bank for Reconstruction and
Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
doi:10.1093/wbro/lkm010           Advance Access publication October 4, 2007                                23:77-105

   The primary justification for rules of origin in preferential trade agreements is
to prevent "trade deflection," or taking advantage of low external tariffs or weak
customs-monitoring capacities to bring in imports destined for more protected
markets in a trading bloc (possibly after superficial conditioning or assembly). In
effect, rules of origin are needed to prevent trade deflection for all preferential
trade agreements short of customs unions, where trade deflection is not an issue
because members have a common external tariff. Beyond the largely unimportant
issue of tariff revenue, what is at stake is the unwanted extension of preferences to
out-of-bloc producers, which would erode the value of those preferences to eligible
producers. In preferential trade agreements between developed and developing
economies, rules of origin are also sometimes justified on "developmental"
grounds because they can help foster integrated manufacturing activities in devel-
oping economy partners.
   However, this article provides evidence that, by their complexity, rules of origin
impose substantial compliance costs on preferred producers. For instance, in
addition to regime-wide rules, the European Union has more than 500 product-
specific rules of origin (Cadot, de Melo, and Pondard 2006). As a result, these
rules are increasingly difficult to observe. In the least developed economies the
rules divert scarce customs resources from other tasks such as trade facilitation.2
In preferential trade agreements between developed and developing economies,
forcing developing economy producers to source relatively inefficient intermediate
goods locally or from developed economy partners rather than from the most
price-competitive sources (as in, say, Asia) increases inefficiency and raises costs.
The result is reduced value of preferences (compounding preference erosion in
particular for least developed economies) and rent creation for developed country
producers.
   This potential for rules of origin to become a form of "export protection" was
first observed by Krueger (1998) during negotiations for the North American
Free Trade Agreement (NAFTA). It applies to all preferential trade agreements
(including nonreciprocal preferential schemes) granted by Organisation for
Economic Co-operation and Development (OECD) countries to developing economies.
Moreover, there is overwhelming evidence that this protectionist effect of
rules of origin is not incidental but by design. Because rules of origin, unlike
more traditional forms of trade protection such as voluntary export restraints or
antidumping provisions, have so far largely escaped WTO disciplines, they are
potentially a choice instrument for creeping protectionism.
   New evidence reported in this article shows that the burden imposed by the
rules of origin applied by the two main protagonists in preferential trade agree-
ments, the European Union and the United States, is substantial whenever prefer-
ential margins are anything more than negligible. All told, the detailed evidence
gathered here suggests that the current system of rules of origin applied by
developed economies is out of hand and defeats both the spirit of reforms
aimed at bringing greater transparency to the multilateral trading system and the
development-friendly intent of preference schemes.
  In a recent communication, the European Union decided to consider simplifying
its rules of origin.3 However, other OECD countries have so far refrained from
reforming their rules and have opposed any discussion of reform of preferential
rules of origin at the WTO. This article is a contribution to an overdue debate on
how to design benign, transparent, and WTO-compatible rules of origin.
  This article is organized as follows. The first section briefly recounts how
product-specific rules of origin are defined in EU and U.S. preferential schemes
and proposes an ordinal restrictiveness index summarizing their complexity. This
index is shown to be correlated with EU and U.S. most favored nation tariffs (and
thus, with the depth of trade preferences). The second section presents a simple
framework for quantifying the costs associated with rules of origin: distortionary,
administrative, and rent-transfer. The third section provides direct evidence of the
effect of rules of origin on preference use and rent sharing using preference utiliza-
tion rates and unit values. The fourth section qualifies the direct evidence by
considering the Asian exception and the natural experiment provided by compar-
ing the EU Everything But Arms initiative and the U.S. African Growth and
Opportunity Act (AGOA), which have similar tariff-preference margins but different
rules of origin. The fifth section provides further indirect evidence. The sixth
section draws policy implications from the article's findings and makes recom-
mendations for simplifying existing rules of origin.



Rules of Origin: Definition and Measurement

Rules of origin in preferential trade agreements have two components: a small set
of regime-wide rules and a large set of product-specific rules, typically defined at
the Harmonized System six-digit level of disaggregation (HS-6). Together, the two sets
of rules are meant to ensure sufficient transformation. Because the European Union and
the United States are the main users of preferential trade schemes among OECD
countries, this article follows the approach of Cadot, de Melo, and Portugal-Perez
(2005), describing briefly the rules for NAFTA, which have been in place for a
long time and correspond closely to those applied by the United States in other
preferential trade agreements, and those for the European Union's "Pan-European
system (PANEURO)," also called the "single-list" because it covers the common set
of product-specific rules of origin that the European Union applies in all its prefer-
ential trade agreements (regime-wide rules differ across the European Union's
preference schemes such as the GSP or Cotonou Agreement). The analysis starts
with regime-wide rules then turns to product-specific rules of origin.

Regime-wide Rules

Regime-wide rules usually include five components (these and other terms are
defined in the glossary at the end of the article):

      A de minimis (or tolerance) criterion that stipulates the maximum percentage
      of nonoriginating materials that can be used without affecting the origin of
      the final product.
      A cumulation rule.
      A provision on whether "roll-up" applies.
      The status of duty drawbacks.
      The applicable certification method.
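
The de minimis criterion lends itself to a small numerical illustration. The sketch below is only indicative: it assumes that the nonoriginating share is measured by value against the value of the final product (actual agreements may use weight or another base, and some carve out sectors), and the function name and the example product are hypothetical. The tolerance levels are those reported in table 1.

```python
def within_tolerance(nonoriginating_value: float,
                     product_value: float,
                     tolerance: float) -> bool:
    """Return True if the nonoriginating share does not exceed the tolerance.

    Assumption: the share is measured by value against the value of the
    final product; real agreements may use weight or a different base.
    """
    if product_value <= 0:
        raise ValueError("product_value must be positive")
    return nonoriginating_value / product_value <= tolerance


# Tolerance levels reported in table 1, expressed as fractions
# (sectoral exceptions are ignored in this sketch).
TOLERANCE = {"NAFTA": 0.07, "EU GSP": 0.10, "Cotonou Agreement": 0.15}

# Hypothetical product: 12 out of 100 units of value are nonoriginating.
results = {name: within_tolerance(12.0, 100.0, level)
           for name, level in TOLERANCE.items()}
```

In this hypothetical case the same product would keep its originating status only under the most generous tolerance rule.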

    Table 1 describes how these regime-wide rules differ between the European
Union and the United States.


Table 1. EU and U.S. Examples of Regime-wide Rules of Origin

Preferential trade   De minimis or             Absorption    Cumulation   Drawback      Certification
agreement            tolerance rule            (roll up)                  allowed       method

NAFTA                7 percent (except         Yes (except   Bilateral    Not after     Self-certification
                     agricultural and          autos)                     7 yrs
                     industrial products),
                     7 percent of weight
                     for goods in chapters
                     50-63

United States-Chile  10 percent (except        Yes           Bilateral    Not           Self-certification
                     agricultural and                                     mentioned
                     processed agricultural
                     products)

U.S.-GSP             10 percent, 10 percent    Not           Bilateral,   Not           Self-certification
                     of weight for goods in    mentioned     limited      mentioned
                     chapters 50-63                          diagonal

Cotonou Agreement    15 percent                Yes           Full         Not           Two-step private
                                                                          mentioned     and public and limited
                                                                                        self-certification

EU GSP               10 percent (except        Yes           Bilateral,   Not           Two-step private
                     goods in chapters                       limited      mentioned     and public and limited
                     50-63)^a                                diagonal                   self-certification

  Note: Classification is carried out at the six-digit Harmonized System tariff line level. Each cell is the
percentage of tariff lines that have the rules of origin in the corresponding row and in the corresponding
column.
  a. Goods in chapters 50-63 (textiles and apparel) do not benefit from a de minimis provision.
  Source: Cadot, de Melo, and Portugal-Perez 2005, table 1.





   Even for regime-wide rules, table 1 gives the impression of "made-to-measure"
rules. It also shows that regime-wide rules differ across preferential trade agree-
ments for the same developed economy partner, confirming the hub-and-spoke
characteristic of preferential trade agreements between developed and developing
economies. Certification methods also differ between EU and U.S. preferential
trade agreements; certification is easier to carry out in U.S. agreements, at least in
principle, than in EU ones.



Product-Specific Rules of Origin

Devising methods for determining sufficient processing (or substantial transform-
ation) has turned out to be very complex in all existing preferential trade agree-
ments because the Harmonized System was not designed to define the origin of
goods. Three criteria are used by the European Union and the United States to
determine whether sufficient transformation has taken place in activities requir-
ing processing (that is, anything but crude products):

     A change of tariff classification (at various levels of the Harmonized System),
     meaning that the final product and its imported components should not
     belong to the same tariff classification (in other words, that the local
     processing should be substantial enough to induce a change of tariff classification).
     A critical threshold for value added (in short, a value content rule).
     A specific manufacturing process (a so-called "technical requirement").
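
The first of these criteria can be illustrated with a short sketch. It relies on the structure of Harmonized System codes, in which the first two digits identify the chapter, the first four the heading, and the first six the subheading. The function names and the product codes used in the example are illustrative assumptions, and the sketch ignores the exceptions, allowances, and additional requirements that actual rules attach to a change of tariff classification.

```python
# Number of leading digits of an HS code that identify each level.
HS_LEVEL_DIGITS = {"chapter": 2, "heading": 4, "subheading": 6}


def hs_prefix(code: str, level: str) -> str:
    """Return the leading digits of an HS code at the requested level."""
    digits = "".join(ch for ch in code if ch.isdigit())
    n = HS_LEVEL_DIGITS[level]
    if len(digits) < n:
        raise ValueError(f"HS code {code!r} is too short for level {level!r}")
    return digits[:n]


def meets_change_of_classification(final_code: str,
                                   input_codes: list[str],
                                   level: str) -> bool:
    """True if no imported input shares the final product's classification
    at the given level (a simplified change-of-classification test)."""
    final_prefix = hs_prefix(final_code, level)
    return all(hs_prefix(code, level) != final_prefix for code in input_codes)


# Illustrative codes only: a final product in chapter 62 made from one
# input in chapter 52 and one input in another heading of chapter 62.
final_product = "6205.20"
passes_chapter = meets_change_of_classification(final_product, ["5208.21"], "chapter")
fails_chapter = meets_change_of_classification(final_product, ["6217.10"], "chapter")
passes_heading = meets_change_of_classification(final_product, ["6217.10"], "heading")
```

The example shows why the level matters: the same imported input can satisfy a change-of-heading rule while failing a change-of-chapter rule.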

   For crude products the typical rule is "wholly obtained," which permits no
foreign content whatsoever, although other rules apply in special cases, such as
fish products.
   Both NAFTA (whose rules are also used in other U.S. preferential trade agreements)
and PANEURO have a long list of criteria, including such technical
requirements as the "triple transformation" requirement in textiles and apparel,
which requires apparel to be woven from originating fabric and yarn. Criteria also
include exceptions (making them more stringent) and allowances (making them
less stringent). NAFTA relies more heavily on changes of tariff classification,
though often in combination with other criteria. PANEURO relies mostly on value
content and wholly obtained criteria, with wholly obtained criteria prevalent for
GSP and African, Caribbean, and Pacific (ACP) exports of primary products with
little processing.
   As Krishna (2006) points out, when analyzing rules of origin, the devil is in
the details because the complexity of rules of origin is what provides an opportu-
nity for special interests to influence their design and administration. While many
facets of rules of origin have been explored, rigorous empirical study of their
effects has been hampered by two difficulties, one relating to data on utilization
rates, the other to measurement of the rules' restrictiveness.
   First, data on preference utilization have been made freely available to the
public only recently for the United States but not yet for the European Union (for
example, Brenton and Manchin 2003 and the studies collected in Cadot,
Estevadeordal et al. 2006).
   Second, because rules of origin are a set of complex, heterogeneous legal rules,
it has proved difficult to develop a reliable measure of their restrictiveness to serve
as a synthetic indicator (much like effective rates of protection are a synthetic
indicator of the restrictiveness of a country's trade regime). Estevadeordal (2000)
has proposed an ordinal index of product-specific rules of origin restrictiveness (or
R-index), taking values between one and seven, with higher values corresponding
to more restrictive rules of origin. The index, constructed from a simple observa-
tion rule at the HS-6 level, where rules of origin are defined, is described below.
   The observation rule is as follows (Cadot, de Melo, and Portugal-Perez 2005).
Let CC stand for a change of chapter, CH for a change of heading, CS for a
change of subheading, and CI for a change of item. A change of classification at
the item level can be taken as less stringent than one at the subheading level, and
so forth. So the criterion for classifying changes of tariff classification,
reading $>$ as "more stringent than," is

\[ CC > CH > CS > CI. \]

But a change of tariff classification is often accompanied by one or two (in a few
cases even three) additional requirements, such as value content rules, technical
requirements, exceptions, or allowances. The observation rule assigns higher
index values to changes of tariff classification when these requirements are added
and lower ones in the case of allowances. For instance, a change of heading is
given an index value of four, which rises to a five when accompanied by a techni-
cal requirement or exception but shrinks to three when accompanied by an
allowance.
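
A minimal sketch of such an observation rule follows. Only the values for a change of heading (four on its own, five with a technical requirement or exception, three with an allowance) are taken from the text. The base values assigned to the other changes of classification are hypothetical placeholders that merely respect the ordering from item to chapter, and the treatment of a rule carrying both a requirement and an allowance is likewise an assumption. The sketch should not be read as the index actually used in the studies cited.

```python
# Base index values for a change of tariff classification on its own.
# Only CH = 4 is stated in the text; CI, CS, and CC are HYPOTHETICAL
# placeholders chosen to respect the ordering CI < CS < CH < CC.
BASE_VALUE = {"CI": 1, "CS": 2, "CH": 4, "CC": 6}


def r_index(change: str,
            technical_requirement: bool = False,
            exception: bool = False,
            allowance: bool = False) -> int:
    """Ordinal restrictiveness value between one and seven (sketch)."""
    value = BASE_VALUE[change]
    if technical_requirement or exception:
        value += 1  # added requirements make the rule more stringent
    if allowance:
        value -= 1  # allowances make the rule less stringent
    # Assumption: keep the result within the index's stated range of 1 to 7.
    return max(1, min(7, value))


change_of_heading = r_index("CH")
with_requirement = r_index("CH", technical_requirement=True)
with_allowance = r_index("CH", allowance=True)
```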
   Though not as amenable to quantification as effective rates of protection, the
R-index plays the same analytical role; it is intended as an overall indicator of how
trade-inhibiting are the requirements that a product must meet to obtain originating
status. There is preliminary evidence that preferences have hidden compliance
costs and that those compliance costs may be related to rules of origin.
Table 2 shows evidence for the textile and apparel sector under NAFTA, the EU
GSP, and the Cotonou Agreement (which grants tariff-free access for most ACP
products to the EU market).4 Although NAFTA's and Cotonou's preference
margins are equal, at 10.4 percentage points, their utilization rates vary widely:
50 percent for Cotonou compared with 79.9 percent for NAFTA. Cotonou's low rate of
uptake despite deep preferences suggests hidden barriers. ACP countries benefit
from full rather than diagonal cumulation (that is, intermediate purchases from




Table 2. Preferences and Utilization Rates for Textiles and Apparel

Preferential trade agreement    Number of          Utilization rate    Preference margin
                                observations       (percent)           (percentage points)

NAFTA (2001)                    618                79.9                10.4
EU GSP (2004)                   16,555 (HS-8)      52.2                 1.8
                                12,920 (HS-6)
Cotonou Agreement (2004)        1,370 (HS-8)       50.0                10.4

  Note: Averages are unweighted. HS-6 is the six-digit Harmonized System level; HS-8 is the eight-digit
Harmonized System level.
  Source: Cadot, de Melo, and Portugal-Perez (forthcoming), table 3b.




Table 3. Preferences and Utilization Rates, All Goods

                                                         Preference margin
Preferential trade agreement              τ ≥ 4 percent^a    τ ≥ 8 percent^a    τ ≥ 12 percent^a

North American Free Trade Agreement^b     87 (1,239)         86.0 (558)         82.8 (287)
GSP^c                                     50.2 (1,297)       52.5 (91)          66.2 (44)
Cotonou Agreement^c                       92.5 (1,627)       94.3 (892)         96.4 (566)

  Note: Averages are unweighted. Numbers in parentheses are the number of tariff lines.
  a. τ_i = (t_i^MFN - t_i^PREF)/(1 + t_i^PREF) is the preference margin.
  b. Computed at the six-digit Harmonized System tariff-line level with 2001 data.
  c. Computed at the eight-digit Harmonized System tariff-line level with 2004 data for 92 countries (GSP) and
37 countries (Cotonou Agreement) qualifying for preferential market access.
  Source: Cadot, de Melo, and Portugal-Perez (forthcoming), table 2.




all partners qualify as originating) and a 15 percent tolerance rule compared
with only 10 percent for the GSP, which also excludes the textile and apparel sector
(chapters 50-63) from the 10 percent tolerance rule.
    Table 3 shows that the evidence of hidden costs goes beyond the textile and
apparel sector, where differences in uptake at similar margins may reflect composition
effects. Define the preferential margin $\tau_i$ by the normalized difference
between most favored nation and preferential tariffs:

\[ \tau_i = \frac{t_i^{MFN} - t_i^{PREF}}{1 + t_i^{PREF}} \]

Table 3 shows that, contrary to expectations, when the preferential margin rises,
utilization rates fall for NAFTA. This suggests that an omitted variable is positively



Table 4. Tariff Peaks and the R-index

                                                  Restrictiveness-index value
                                  North American Free Trade Agreement        PANEURO

Tariff peaks^a                    6.2 (257)                                  5.2 (780)
Low tariffs^b                     4.8 (1,432)                                3.9 (3,241)
Total number of tariff lines      3,555                                      4,961

  Note: Numbers in parentheses are numbers of tariff lines. Restrictiveness indexes are unweighted.
  a. Tariff lines whose tariffs exceed three times the GSP average.
  b. Tariff lines whose tariffs are less than one-third of the GSP average.
  Source: Cadot, de Melo, and Portugal-Perez 2005, table 3.




correlated with tariffs but negatively correlated with preference utilization. Rules
of origin are an obvious culprit.
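
As a small worked illustration of the preference margin used in table 3 (the normalized difference between the most favored nation tariff and the preferential tariff), the Python sketch below computes the margin for two cases. The function name and the tariff values are hypothetical and chosen only for illustration; tariffs are entered as fractions rather than percentages.

```python
def preference_margin(mfn_tariff: float, pref_tariff: float) -> float:
    """Normalized preference margin: (t_MFN - t_PREF) / (1 + t_PREF).

    Both tariffs are ad valorem rates expressed as fractions (0.12 = 12 percent).
    """
    if pref_tariff <= -1.0:
        raise ValueError("preferential tariff must be greater than -100 percent")
    return (mfn_tariff - pref_tariff) / (1.0 + pref_tariff)


# Hypothetical tariff line with a 12 percent most favored nation tariff.
margin_duty_free = preference_margin(0.12, 0.0)   # tariff-free preferential access
margin_partial = preference_margin(0.12, 0.05)    # preferential tariff of 5 percent

print(f"duty-free margin: {margin_duty_free:.4f}")
print(f"partial margin:   {margin_partial:.4f}")
```

With tariff-free preferential access the margin is simply the most favored nation tariff; with a positive preferential tariff the margin is smaller than the raw tariff difference because of the normalization.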
   Table 4 shows that lines with tariff peaks (that is, with tariffs more than three
times the average), where preferential margins are highest, do have higher R-
index values than those with low tariffs. This relationship holds for both NAFTA
and PANEURO.
  Figure 1 confirms the patterns in tables 2-4; utilization rates do not really
increase with tariff-preference margins. For NAFTA, they actually decrease due
largely to the influence of the textile and apparel sector, where tariff preferences
are deep and rules of origin stringent.



Quantifying the Effects of Rules of Origin

Although product-specific rules of origin, as already noted, take a variety of legal
forms (changes of tariff classification, value content rules, technical requirements,
and the like), they can all be represented conceptually as floors on domestic value
added. Suppose that a producer in Madagascar wishes to sell a shirt under preferential
access in the European Union. This shirt is made with both originating
intermediate goods (that is, intermediate goods that are either local, EU-made, or
imported from other qualifying countries, according to cumulation rules) and
nonoriginating intermediate goods, say from Bangladesh, China, or India. Now
assume that to satisfy origin requirements (whether change of tariff classification,
value content rule, or technical requirement), the Malagasy producer uses a
higher proportion of originating inputs than would be the case in the absence of
rules of origin (which is precisely the rule's purpose).
  Let superscript R denote a choice restricted by rules of origin. Unrestricted
value added is $va_i$, and restricted value added is $va_i^R$, so rules of origin content



Figure 1. Average Utilization Rates for Different Preferential Margin Thresholds

[Figure: average utilization rate (percent) by preferential margin threshold, with separate series for the Cotonou Agreement, GSP, and NAFTA.]


  Note: Cotonou Agreement includes 37 countries, computed at the eight-digit Harmonized System level; GSP
includes 92 countries, computed at the eight-digit Harmonized System level; and NAFTA includes 3
countries, computed at the six-digit Harmonized System level. Data are unweighted averages computed at the
most disaggregated tariff-line level (table 2). Averages are based on more than 100 observations except for GSP
(minimum of 27 observations for preference margins, $\tau$, equal to or greater than LO percent).
  Source: Cadot, de Melo, and Pondard 2006.




reduces to $va_i^R > va_i$, whether or not it explicitly takes the form of a value content
rule. Thus, conceptually a value content rule can be thought of as a generic rule
that can play the role of all others by quantifying the objective common to all.
This principle is important because it underlies an approach to rules of origin
reform, discussed later, that substitutes a value content rule (possibly, although
not necessarily, at differentiated rates across products) for the current array of
instruments. It also highlights how information on rules of origin restrictiveness
can be aggregated across instruments and subsumed into a single restrictiveness
index, which itself can then be aggregated across product lines by averaging.
   Five results emerge from the quantitative analysis of the relationship between
rules of origin restrictiveness and preference uptake:


• For a given preference margin, a higher restrictiveness index translates into a
  lower utilization rate, all other things being equal.
• For a given restrictiveness index, a higher tariff-preference margin translates into a
  higher utilization rate, all other things being equal.
• The compliance decisions of individual firms are binary; how the decisions aggregate
  into industry-wide utilization rates depends on the unobserved distribution
  of compliance costs.
• A lower pass-through of tariff preferences for the least developed economies (due
  to low bargaining power) implies lower uptake of preferences, all other things
  being equal.
• Improvements in the uptake of preferences can be obtained either from reductions
  in the restrictiveness of rules of origin or from cost-reducing administrative
  simplifications (such as transparent and uniform criteria).
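
The logic behind these results can be illustrated with a minimal Python sketch in which each firm makes a binary compliance decision and the industry-wide utilization rate is the export-weighted share of complying firms. The decision rule (comply when the retained margin exceeds the ad valorem compliance cost), the `Firm` class, and all numbers are assumptions introduced here for illustration; they are not taken from the underlying studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Firm:
    exports: float          # export value to the preference-granting market
    compliance_cost: float  # ad valorem compliance cost (fraction of unit price)


def uses_preference(firm: Firm, margin: float, pass_through: float = 1.0) -> bool:
    """Binary decision: comply only if the retained margin exceeds the cost."""
    return pass_through * margin > firm.compliance_cost


def utilization_rate(firms: List[Firm], margin: float, pass_through: float = 1.0) -> float:
    """Share of export value shipped under the preferential regime."""
    total = sum(f.exports for f in firms)
    if total == 0:
        return 0.0
    preferential = sum(f.exports for f in firms if uses_preference(f, margin, pass_through))
    return preferential / total


# Hypothetical industry with heterogeneous compliance costs.
firms = [Firm(100.0, 0.02), Firm(100.0, 0.05), Firm(200.0, 0.09)]

rate_low_margin = utilization_rate(firms, margin=0.04)
rate_high_margin = utilization_rate(firms, margin=0.10)
rate_low_pass_through = utilization_rate(firms, margin=0.10, pass_through=0.4)

# Stricter rules of origin, modeled here as a uniform 3-point rise in costs.
stricter = [Firm(f.exports, f.compliance_cost + 0.03) for f in firms]
rate_stricter_rules = utilization_rate(stricter, margin=0.10)

print(rate_low_margin, rate_high_margin, rate_low_pass_through, rate_stricter_rules)
```

In this toy setting the utilization rate rises with the margin, falls when costs rise or pass-through drops, and depends entirely on where the (in practice unobserved) cost distribution sits relative to the margin.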


  The third result implies that the statistical relationship between R-index values,
preference margins, and utilization rates can only be "noisy" (that is, affected by a
large unexplained component) at the aggregate (product-line) level. But notwith-
standing the noise introduced by unobserved firm characteristics (which could be
investigated only with firm-level data that are not currently available), figure 1
suggests an unambiguous relationship between preference margins, rules of
origin restrictiveness, compliance costs, and utilization rates. It also suggests that,
without a proxy for rules of origin restrictiveness such as the R-index, attempting
to evaluate the effect of tariff-preference margins on the uptake of those prefer-
ences may lead to omitted-variable bias.
  Keeping in mind that this framework captures only some of the effects associ-
ated with rules of origin, several observations are in order.5 First, administrative
costs act as a technical barrier to trade; they result in resource waste, and in the
welfare calculus of the effects of rules of origin they are more costly than the
usual deadweight losses. Second, if costs are associated with certification, requests
for preferential status would not be observed when preference margins are low.
Third, compliance costs are particularly high for differentiated products, for
which there can be quality as well as price differences between eligible (local) and
noneligible intermediate goods. Because part of those costs is passed on to consumers
in the countries that determine the rules of origin, high utilization rates
do not necessarily imply that rules of origin have small effects.
  Stiff rules of origin may inhibit or deflect trade altogether, not just the uptake
of preferences. This was shown in the case of the Europe Agreements, free-trade
agreements signed in 1991 between the European Union and the Central and
Eastern European countries. Tumurchudur (2007a) showed that a large share of
the exports from Central and Eastern Europe was deflected from EU markets by
rules of origin, resulting in heavy losses. Evidence of trade-inhibiting effects is
also apparent in the evolution of textile and apparel exports under AGOA and the
Everything But Arms initiative, which is discussed in the exception and quasi-
natural experiment section below.




Direct Evidence

In the absence of firm-level data, Carrère and de Melo (2006) assume that the preference
utilization rate for product line i (the percent of exports sent under the
preferential regime rather than the most favored nation one), referred to as $U_i$,
rises with the tariff-preference margin, $\tau_i$ (which may be just equal to the most
favored nation tariff when preferential access means tariff-free access) and shrinks
with rules of origin compliance costs, $c_i$. That is, $U_i = f(\tau_i - c_i)$, where $f(\cdot)$ is an
increasing function, and $c_i = g(RoO_i)$, where $g(\cdot)$ is an increasing function (true
compliance costs are firm-specific and are thus unobserved; all that is observed is
the presence of $RoO_i$). These assumptions lead to an estimable relation of the form




where $RoO_{ik}$ is a set of dummy variables indicating the presence of product-specific rules
of origin (change of tariff classifications, exceptions, and so on). Results from estimating
equation (3) on NAFTA data confirm that utilization rates rise with preferential margins
and shrink in the presence of rules of origin (see Cadot, de Melo, and Portugal-Perez 2005 for
results using data for the European Union).
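
As a reading aid, one linear specification consistent with this description could be written as follows. The coefficient symbols $\alpha$, $\beta$, and $\gamma_k$, the error term $\varepsilon_i$, and the linear functional form are assumptions introduced here for exposition; they are not taken from Carrère and de Melo (2006).

```latex
U_i = \alpha + \beta\,\tau_i + \sum_k \gamma_k\,RoO_{ik} + \varepsilon_i,
\qquad \text{with expected signs } \beta > 0 \text{ and } \gamma_k < 0 .
```

Under this reading, a positive $\beta$ corresponds to utilization rising with the preference margin, and negative $\gamma_k$ correspond to utilization shrinking in the presence of each type of rule of origin.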
   Carrère and de Melo (2006) combined their estimates with R-index values to
compute an estimated ad valorem equivalent of total rules of origin compliance
costs (administrative costs and costs due to higher input costs). Their estimates
range from 3.5 percent for a change of chapter to more than 15 percent for com-
binations of rules of origin involving technical requirements. The strongly inhibit-
ing effect of technical requirements appears to be an empirical regularity.
   Even if the estimates are robust to a range of specifications, it is difficult to infer
a sense of robustness from estimates derived from a relation like equation (3)
because so much heterogeneity and so many "unobservables" influence prefer-
ence uptake. Estimates have proved fairly sensitive to the inclusion of control vari-
ables, in particular when using EU GSP data.
   An alternative is to restrict the analysis to products for which the sole criterion
used to determine origin is a value content rule. Drawing on the variation in EU
value content criteria across product lines with value content the sole criterion,
Cadot, Carrère, and Strauss-Kahn (2007) estimate an equation similar to
equation (3), in which, however, the dummy variables for rules of origin are
replaced with the continuous value content rule value.6 Using dummy variables
for Harmonized System sections to control for heterogeneity across sectors and
restricting the sample to tariff lines with substantial tariff-preference margins
(above 2 or 5 percent), they find that utilization rates rise, all other things being
equal, with the maximum foreign content allowed by the value content rule.



Table 5. Estimated Effects on Preference Utilization and Rent Transfer of Relaxing a Value
Content Requirement

                                                  $\tau_i \geq 2\%$   $\tau_i \geq 5\%$   $\tau_i \geq 2\%$   $\tau_i \geq 5\%$
                                                  ACP + GSP           ACP + GSP           GSP                 GSP

Number of observations                            19,261              5,958
Mean preferential margin ($\tau_i$) (percent)     3.74                5.14
Mean utilization rate (percent)                   0.12                0.17
Mean value content (percent of unit price)        58.8                58.2
Mean value of imports (euros)                     1,475,182           2,376,301

Simulation: local content requirement reduced by 10 percentage points

Change in preference utilization rate             2.0                 5.2
 (percentage points)
Total rent transfer from increased                21.7                37.4
 utilization (millions of euros)^a

  a. Evaluated at the mean value of imports.
  Source: Authors' computations based on Cadot, de Melo, and Portugal-Perez (forthcoming), table 6.




   Since a single value content criterion is a serious candidate for reform, at least
in the case of the European Union (Stevens et al. 2006 and Cadot, de Melo, and
Pondard 2006), table 5 reports two illustrative simulations based on these estimates.
The mean local content requirement is 58 percent and the mean preference margin is
3-5 percent, depending on the sample; mean utilization rates are rather
low, between 12 and 22 percent. The bottom of the table shows the first-round
effects (no supply response) of reducing the local-content requirement by 10 percentage
points. Utilization rates rise by 2-5 percentage points (row 6), raising the
rent transfer by 21-37 million euros, for a mean value of imports of 1.5-3.0
billion.
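
The order of magnitude of the rent transfers in table 5 can be retraced with back-of-the-envelope arithmetic. The Python sketch below assumes, and this formula is not stated in the source, that the first-round transfer equals the change in the utilization rate times the mean preference margin times total imports (mean value of imports times the number of observations). With the rounded inputs from the first two columns of table 5, it gives roughly 21 and 38 million euros, close to but not identical with the reported 21.7 and 37.4.

```python
def first_round_rent_transfer(delta_utilization_pp: float,
                              mean_margin_pct: float,
                              mean_imports_eur: float,
                              n_lines: int) -> float:
    """Assumed formula: extra share of imports claiming preferences
    x preference margin x total import value (in euros)."""
    total_imports = mean_imports_eur * n_lines
    return (delta_utilization_pp / 100.0) * (mean_margin_pct / 100.0) * total_imports


# Inputs read from table 5 (ACP + GSP samples).
transfer_margin_2pct = first_round_rent_transfer(2.0, 3.74, 1_475_182, 19_261)
transfer_margin_5pct = first_round_rent_transfer(5.2, 5.14, 2_376_301, 5_958)

print(f"sample with margins of at least 2 percent: {transfer_margin_2pct / 1e6:.1f} million euros")
print(f"sample with margins of at least 5 percent: {transfer_margin_5pct / 1e6:.1f} million euros")
```

The small gaps relative to the table plausibly reflect rounding of the published inputs or a slightly different evaluation in the original computations; the sketch is meant only to show how the pieces fit together.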
   To fully grasp the welfare effects of rules of origin, the rent distribution between
the exporting and importing country must be factored in. This implies estimating
the pass-through effect of tariffs on consumer prices (that is, the extent to which
preferences translate into a higher producer price for exporters). Estimates for
AGOA preferences (Olarreaga and Özden 2005) and for the Caribbean
Community (Özden and Sharma 2006) are that between one-third and one-half
of tariff reductions are passed on to producers.
   However, part of the border-price increase could reflect the compliance
costs discussed above. Using a monopolistic-competition model with differen-
tiated products in which Mexican exporters can export product j either to the
rest of the world (under most favored nation status, at price $p_j^{MFN}$) or to the



Table 6. Exports, Unit Costs, and Prices under Preferential Market Access and a Binding
Minimum Local Content Requirement

                                                               Simulations
                                                  (1)      (2)      (3)^a    (4)      (5)

Preference margin (percent)                       10       10       10       10       10
Administrative unit costs (percent of             0        0        0        2.5      1.0
 unit price)
Unconstrained, minimum local content              40       40       40       40       36
 requirement (percent of unit price)
Constrained, minimum local content                         50       50       50       40
 requirement (percent of unit price)
Preferential exports (percent change              15.9     11.1     -0.15    7.1      10.7
 from scenario with no preferential
 access)
Unit costs (percent change from                   0        1.9      6.7      1.9      0.4
 scenario with no preferential access)
Unit net price (percent change from               2.9      3.0      3.4      2.2      2.1
 scenario with no preferential access)

  Note: Unit net price set equal to 1, initial output to 100, and value-added to 20. All output is exported (40
percent to preference-receiving destination). For columns 1-4 nonoriginating inputs are set to 75 percent of
intermediate good input purchases. This implies that initial (unconstrained) local content is 20 + 0.25(80) =
40. Setting the minimum local content requirement at 50 percent implies reducing nonoriginating intermediate
goods to 62.5 percent of intermediate good purchases. For column 5 nonoriginating inputs are set at 80 percent
and reduced to 75 percent through the minimum local content rule.
  a. Same as column 2 but with low value for the elasticity of substitution between originating and
nonoriginating materials (0.5 instead of 2).
  Source: Authors' computations adapted from model in Cadot et al. 2005.
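
The local content arithmetic in the note to table 6 can be checked with a short Python sketch. It takes the note's normalization (output of 100, value added of 20, hence intermediate purchases of 80) and treats local content as value added plus originating intermediates. The function names are hypothetical, introduced here only for illustration.

```python
def local_content(value_added: float, intermediates: float,
                  nonoriginating_share: float) -> float:
    """Local content = value added + originating intermediate inputs."""
    return value_added + (1.0 - nonoriginating_share) * intermediates


def max_nonoriginating_share(value_added: float, intermediates: float,
                             min_local_content: float) -> float:
    """Largest nonoriginating share of intermediates compatible with the floor."""
    return 1.0 - (min_local_content - value_added) / intermediates


VALUE_ADDED = 20.0     # from the table note
INTERMEDIATES = 80.0   # output of 100 minus value added of 20

# Columns 1-4: nonoriginating inputs are 75 percent of intermediate purchases.
unconstrained_cols_1_to_4 = local_content(VALUE_ADDED, INTERMEDIATES, 0.75)

# A 50 percent floor caps the nonoriginating share of intermediates.
share_under_50_floor = max_nonoriginating_share(VALUE_ADDED, INTERMEDIATES, 50.0)

# Column 5: nonoriginating share cut from 80 to 75 percent.
unconstrained_col_5 = local_content(VALUE_ADDED, INTERMEDIATES, 0.80)
constrained_col_5 = local_content(VALUE_ADDED, INTERMEDIATES, 0.75)

print(unconstrained_cols_1_to_4, share_under_50_floor,
      round(unconstrained_col_5, 6), constrained_col_5)
```

The results match the table: local content of 40 in columns 1-4, a nonoriginating share of 62.5 percent under the 50 percent floor, and a move from 36 to 40 in column 5.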




United States (under NAFTA, at price $p_j^{NAFTA}$), Cadot et al. (2005) estimate the
following relationship

$$\text{NAFTA markup}_j = \alpha_0 + \alpha_1 \tau_j + \alpha_2 CC_j + \alpha_3 TECH_j + \varepsilon_j \qquad (4)$$

where "NAFTA markup" is the percentage by which Mexico's NAFTA shipment prices
are raised over comparable most favored nation shipment prices, $\tau_j$ is the
tariff-preference margin, $CC_j$ is a dummy variable marking a change of tariff
classification at the chapter level, and $TECH_j$ is a dummy variable marking a
technical requirement.
    When estimated at the HS-8 level, equation (4) is the best tool to compare
prices in different markets. With complete pass-through (k = 1 in equation (4)),
the estimated coefficient $\alpha_1$ would be close to one, but Cadot et al. (2005)
find it substantially below one. They also obtain negative and significant estimates
for $(\alpha_2, \alpha_3)$, indicating that rules of origin costs are at least to some extent passed
on to consumers. Once rules of origin are taken into account, the backward
pass-through of preferences to producer prices falls from 80 percent of the margins
to only 50 percent. They also show, using input-output links, that U.S. producers
of intermediate goods are able to retain a substantial share of the rents generated
by rules of origin downstream. That is, stiff rules of origin on, say, Mexican shirts
exported to the United States significantly raise the price of fabric exported by the
United States to Mexico for use in those shirts. This reflects the fact that rules of
origin create a captive market for U.S. intermediate goods.



An Exception and a Quasi-Natural Experiment

The covariation of utilization rates and margins does not account for all the
effects of rules of origin. Case studies such as those reported in Cadot, de Melo,
and Pondard (2006) and Stevens et al. (2006) provide useful complementary evi-
dence, although they conclude that each case is different, thereby explaining if
not justifying the current maze. An exception and a quasi-natural experiment
are drawn here, with both suggesting that rules of origin are, as they stand,
unnecessarily restrictive.


Asian Exception

In a world where rules of origin are as cumbersome and complicated as they are
(see Estevadeordal and Suominen 2006 for a detailed description), the Association of
Southeast Asian Nations (ASEAN) Free Trade Area (AFTA) and the ASEAN-China
Free Trade Area (ACFTA) stand out as exceptions. To obtain originating
status (that is, to fulfill the criterion of sufficient processing), either the wholly
obtained criterion (for a few agricultural products) or a single value content rule
requiring 40 percent local content (for most products) is used. This rule has been
relaxed by allowing a choice between criteria for countries that found it too
constraining. For instance, under ACFTA a change of tariff classification can be
used as an alternative to the 40 percent local content rule for
obtaining origin for leather goods, and some specific process criteria are also
accepted for some textile products.7
   So why are rules of origin under AFTA less stringent than elsewhere? First,
until recently Asian regionalism was more about cooperation than about prefer-
ential trade. Under the aegis of the United States, Asia-Pacific Economic
Cooperation was set up specifically to avoid preferential trade and the formation
of an Asian trade bloc. Much of the region's integration in the world economy
has been driven by unilateral tariff reductions. Second, regional trade has made
possible the rise of the Asian manufacturing matrix in which labor-intensive
stages of production initially carried out in Japan (and later in the Republic of
Korea) were outsourced to the region's lower wage countries. The resulting
regional production networks have contributed to the price-competitiveness of
Asia's exports, which has benefited the whole region. Stiff rules of origin would
have jeopardized this successful model.
  This "Asian exception" has been conducive to the successful development of
Asian countries that have fully participated in "verticalizing" trade (the develop-
ment of cross-border supply chains generating trade in intermediate products). In
this unusual setup (relative to other global trading patterns), intraregional trade
in politically sensitive final products where protection is highest was insignificant.
Thus, the political-economy forces that would usually lead to the complex rules of
origin observed elsewhere have not been at work so far. As a result, low-income
countries such as Cambodia and Lao PDR have been able to participate in the
fragmentation of production according to comparative advantage.8 Arguably,
Asia's simple and uniform rules of origin requirement is an example of the kind
of rules of origin that would really be development-friendly.


AGOA and Everything But Arms: A Natural Experiment

In the textile and apparel sector, the choice area for obscure and trade-inhibiting
rules of origin, the one notable exception is the U.S. preferences granted to 22
Sub-Saharan African least developed economies under AGOA. Thus, comparing
African apparel exports to the European Union and the United States provides a
quasi-experimental situation in which the effects of rules of origin on the uptake
of trade preferences are analyzed. This quasi-experimental situation, first studied
by Brenton and Özden (2005), comes from the combination of different rules of
origin with very similar rates of preference margins (textiles and apparel receive
approximately the same protection in the EU and U.S. markets. In 2001 the EU-
15's most favored nation tariff was 10.1 percent compared with 11.7 percent for
the United States, and duty-free access applied to both Everything But Arms eli-
gible and the 34 AGOA-eligible African countries).
   To qualify for preferential access to the U.S. market, an exporter must prove
that the garments are produced, cut, and sewn in the area benefiting from prefer-
ential access (here, AGOA). Cotton products must be made from originating
fabric, yarn, and thread, with diagonal cumulation somewhat relaxing the
requirement, since fabric originating in other member countries qualifies.
However, this rule, known as "the triple transformation" rule, was relaxed for 22
least developed economies under AGOA's "special regime," which permits the use
of third-country fabric.9 That is, the special regime reduces the transformation
requirement to a single transformation (from fabric to garment).
   Fifteen of AGOA's special regime beneficiaries are also eligible for the European
Union's Everything But Arms initiative. But no such relaxation applies to exports



to the European Union under either the Cotonou Agreement or Everything But Arms
preferences. EU rules of origin for apparel require production from originating
yarn, which implies a "double transformation" from yarn to fabric and from
fabric to clothing. The European Union's "double-transformation" rule obviously
makes compliance difficult for countries that have no textile industry. Small or
poor countries that cannot profitably produce fabric (weaving is a capital-intensive
activity involving expensive machinery, particularly for woven products)
should not, from an economic-efficiency viewpoint, set up the vertically integrated
local value chains that would satisfy the double-transformation rule.
   In apparel, preference utilization rates are very high under both AGOA (97.36
percent in 2004) and Everything But Arms/Cotonou (94.9 percent). Cotonou has
rules similar to those that Everything But Arms has for apparel. However, export
volumes evolved quite differently for the 15 least developed economies that benefit
from both schemes. Figure 2 shows a substantial increase in the value of apparel
exports with AGOA's entry into force in 2000 (in particular for Lesotho and
Madagascar). By contrast, the value of exports from this same group of countries
did not rise following the adoption of Everything But Arms; in fact, it fell slightly.
Of course, the exports that remained flat for those countries should come as no
surprise since they already benefited from Cotonou preferences, which give almost


Figure 2. Apparel Exports of 22 Countries Benefiting from the AGOA Special Regime, 2004

[Figure: four series showing U.S. imports from 22 countries^a, U.S. imports from 7 top exporters^b, EU imports from 22 countries^a, and EU imports from 7 top exporters^b.]

  Note: a. Benin, Botswana, Cameroon, Cape Verde, Ethiopia, Ghana, Kenya, Lesotho, Madagascar, Malawi, Mali,
Mozambique, Namibia, Niger, Nigeria, Rwanda, Senegal, Sierra Leone, Swaziland, Tanzania, Uganda, and
Zambia. b. Botswana, Cameroon, Ghana, Kenya, Lesotho, Madagascar, Namibia, Nigeria, and Swaziland.
  Source: Portugal-Perez (2007) based on the WTO Integrated Data Base.




as much access as Everything But Arms (with slightly more lenient rules on
cumulation). In effect, nothing changed for them on this front, and along with
other ACP countries they largely continued to request access under Cotonou, with
which they were familiar, rather than Everything But Arms. But AGOA's special
regime did not merely trigger a catch up of U.S.-bound exports toward already
high levels of EU-bound exports: it dwarfed them. Thus, unlike AGOA's special
regime neither Cotonou nor Everything But Arms appeared to have offered a pre-
ference mix (tariff preferences and rules of origin) conducive to export growth.
   Because the data in figure 2 are computed at the HS-6 product level, it is safe
to assume that heterogeneity in export composition is largely controlled for. This
is confirmed by formal econometric evidence. In a model that controls for differ-
ences in preference margins and for demand shifters in the EU and U.S. markets,
Portugal-Perez (2007) finds that relaxing rules of origin for apparel (captured by
a dummy variable corresponding to the introduction of the AGOA's special
regime) raised apparel exports significantly for beneficiary countries. Because the
special regime was not introduced in the same year for all countries, its effects are
well identified statistically, and Portugal-Perez' results strongly suggest that the
difference in performance apparent in figure 2 is indeed attributable to differences
in rules of origin regimes.
   AGOA's special regime seems to have encouraged growth not only at the "inten-
sive margin" (higher volumes) but also at the "extensive margin" (diversification
by addition of new products). As new products were exported to both markets
(an active extensive margin), the rate of increase in new products was several
orders of magnitude higher for U.S.-bound goods than for EU-bound ones,
which is an important achievement. Product diversification is one measure of
industrialization, particularly at early stages of the economic development process
(Cadot, Carrère, and Strauss-Kahn 2007 and references therein). Controlling for
other factors, countries that have a more diversified industrial base enjoy less
volatile growth and are better poised to absorb shocks. Only three countries
in Sub-Saharan Africa (Lesotho, Madagascar, and Senegal) export more than
50 products to either the European Union or the United States. Thus, if the development
objective of rules of origin is to be taken seriously, encouraging export
growth at the extensive margin is important, and in this regard Everything But
Arms and Cotonou's performance are again disappointing compared with that of
AGOA's special regime.
   Taken together, the brief discussion here on the Asian exception and the com-
parison of AGOA with the Everything But Arms initiative suggests two results:


• Limited differences between preferential regimes can have drastic effects on their
  performance; AGOA's relaxation of the triple transformation rule gave a significant
  boost to Sub-Saharan African apparel exports.
• Utilization rates are an incomplete measure of the performance of preferential
  regimes, as the inhibiting effect of stiff rules of origin can be felt on trade
  volumes as well.




Indirect Evidence

Taking inspiration from the early work by Herin (1986) for EFTA, Cadot, de Melo,
and Portugal-Perez (forthcoming) applied revealed-preference arguments to esti-
mate upper and lower bounds of compliance costs. Arguably, this nonparametric
approach could be more robust than the parametric evidence reported above. By
revealed preference, for products with 100 percent utilization rates the net benefit
of preferences is positive for all firms. Since everyone uses the preferences, the ad
valorem equivalent of compliance costs cannot be larger than the tariff-preference
margin. Conversely, for products with zero percent utilization rates, since no one
uses the preferences, the compliance cost cannot be smaller than the preference
margin.
  For remaining sectors (those with utilization rates between 0 and 100 percent)
the story is more complicated because of firm heterogeneity, so assumptions must
be made. Cadot, de Melo, and Portugal-Perez (forthcoming) argue that, firm het-
erogeneity notwithstanding, the average exporter (in terms of compliance costs) is
not too far from indifference between the preferential and the most favored nation
regimes, which means that the compliance cost is about equal to the tariff-preference
margin. Applying this reasoning gives trade-weighted ad valorem estimates
of 4.7-8.2 percent depending on sectors for PANEURO and 1.8-1.9
percent for NAFTA, values in line with the econometric estimates of Carrère and
de Melo (2006) reported earlier.
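
The revealed-preference reasoning can be summarized in a short Python sketch that maps a product line's utilization rate and preference margin into bounds on the ad valorem compliance cost. The function name is hypothetical, the zero lower bound at full utilization is an added assumption (costs cannot be negative), and the interior case applies the near-indifference assumption described above; the sketch is not the authors' code.

```python
from typing import Optional, Tuple


def compliance_cost_bounds(utilization_pct: float,
                           margin_pct: float) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) bounds on the ad valorem compliance cost, in percent.

    None means the revealed-preference argument gives no bound on that side.
    """
    if not 0.0 <= utilization_pct <= 100.0:
        raise ValueError("utilization rate must lie between 0 and 100 percent")
    if utilization_pct == 100.0:
        # Every firm uses the preference: the cost cannot exceed the margin.
        return (0.0, margin_pct)
    if utilization_pct == 0.0:
        # No firm uses the preference: the cost cannot be below the margin.
        return (margin_pct, None)
    # Interior case: the average exporter is assumed close to indifference,
    # so the cost is taken to be about equal to the margin.
    return (margin_pct, margin_pct)


full_use = compliance_cost_bounds(100.0, 6.0)
no_use = compliance_cost_bounds(0.0, 2.0)
partial_use = compliance_cost_bounds(55.0, 4.5)

print(full_use, no_use, partial_use)
```

The margins used in the three calls are illustrative values only.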
  How then should requests for preferential status be interpreted when tariff pre-
ferences are nil? Beyond (likely) errors in data transcription, the logical possibility
would be that administrative costs are negligible, but this contradicts the evidence
(the nonparametric approach described in the previous paragraph gave estimates
of pure administrative costs slightly above 3 percent in ad valorem form).
Francois, Hoekman, and Manchin (2006) elegantly addressed this problem by
modeling the determinants of utilization rates for EU trade with ACP countries in
a switching-regression framework where the relationship between the variable of
interest (utilization rates) and explanatory variables varies between two regimes:
one for low-margin sectors and the other for high-margin ones. The dividing
point between the two regimes is determined by the data using an algorithm
developed by Hansen (2000).10 They found that exporters start requesting preferences
when preferential margins are in the 4.0-4.5 percent range, a result that


94                                       The World Bank Research Observer; vol. 23. no. 1 (Spring 2008)

is also broadly consistent with the nonparametric estimates of compliance costs
reported above.
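
The idea of letting the data pick the dividing point between two regimes can be sketched as a grid search that minimizes the combined sum of squared errors of separate least squares fits. The code below is a simplified illustration with a single regressor, not the authors' implementation or Hansen's full procedure (it performs no inference on the threshold); the function names and the `min_obs` parameter are assumptions introduced here.

```python
from typing import Sequence, Tuple


def ols_sse(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of squared residuals from a one-regressor OLS fit y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def threshold_grid_search(
    margins: Sequence[float], utilization: Sequence[float], min_obs: int = 3
) -> Tuple[float, float]:
    """Return (cutoff, sse) for the split that minimizes total SSE.

    Observations with margin <= cutoff form the low-margin regime and the
    rest form the high-margin regime; each regime gets its own OLS fit.
    """
    pairs = sorted(zip(margins, utilization))
    best_cutoff = None
    best_sse = float("inf")
    for cutoff in sorted({m for m, _ in pairs}):
        low = [(m, u) for m, u in pairs if m <= cutoff]
        high = [(m, u) for m, u in pairs if m > cutoff]
        if len(low) < min_obs or len(high) < min_obs:
            continue  # skip splits that leave a regime too thin to fit
        sse = ols_sse(*zip(*low)) + ols_sse(*zip(*high))
        if sse < best_sse:
            best_cutoff, best_sse = cutoff, sse
    if best_cutoff is None:
        raise ValueError("not enough observations for two regimes")
    return best_cutoff, best_sse
```

Applied to sector-level data on preferential margins and utilization rates, the returned cutoff plays the role of the margin at which exporters start requesting preferences.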
   Other studies using aggregate bilateral trade data also suggest costs associated
with the presence of rules of origin. Using a gravity model of bilateral trade,
Anson et al. (2005) find that after controlling for the other determinants of the
volume of bilateral trade, including the presence of free trade agreements, the
intensity of bilateral trade is inversely related to the values taken by the R-index.
Using a similar framework, Augier, Gasiorek, and Tang (2005) find that the
volume of bilateral trade is lower when cumulation is on a bilateral rather than a
full basis, leading them to suggest that rules of origin should be relaxed to allow
for full cumulation.
   The evidence reported so far in this article is overwhelming: rules of origin are
burdensome and foster economic inefficiency. But this article also argues that
they have a role in combating trade deflection, so calling them trade barriers is
not enough. To make progress in designing "clean" rules of origin, a key part of
the argument is to tell apart, in their current characteristics (and in particular
their restrictiveness), how much is attributable to their antideflection role com-
pared with how much is simply capture by special interests. Portugal-Perez
(2006) tries to address this issue by decomposing variations in the R-index into a
component attributable to trade deflection and one associated with lobbying or
political-economy motives. He estimates this decomposition for Mexican textile
and apparel exports to the United States under NAFTA using the following
equation



where RoO_i is the R-index value at the HS-6 level. The regressors are a trade-deflection
vector, which includes a proxy for the extent of product differentiation (the more
homogeneous the product, the more there is to gain from arbitraging even small differences in
external tariffs) and differences in external tariffs (the larger these differences, the more
there is to arbitrage), and political-economy variables, including the level of the United States'
most favored nation tariff (a proxy for lobbying power), revealed comparative-advantage
indexes, and the value of Mexican exports to the rest of the world (a proxy for potential
penetration of the U.S. market).
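
A linear specification consistent with this description can be sketched as follows. It is offered as an illustration, not as the author's exact equation: the labels TD_i (trade-deflection vector) and PE_i (political-economy vector), the intercept, and the error term are assumed notation, while the coefficient vector on the political-economy variables is the one that the counterfactual exercise sets to zero.

```latex
\begin{equation*}
RoO_i = \alpha + \beta' \, TD_i + \gamma' \, PE_i + \varepsilon_i
\end{equation*}
```

Under this reading, the counterfactual R-index for good i is the fitted value with the political-economy term removed, that is, the estimated intercept plus the estimated trade-deflection component.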
   Portugal-Perez finds strong and quite robust correlations, suggesting that both
sets of factors are at work in explaining cross-sectoral variations in rules of origin
restrictiveness. Using estimated parameter values, he constructs a counterfactual
distribution of R-index values across goods in the absence of political-economy
correlates (that is, by setting γ = 0 in equation (5)). The two distributions (actual
and counterfactual) are reported in figure 3. They show that political-economy
concerns (which shift the actual distribution to the right of the counterfactual)
contribute to the overall restrictiveness of rules of origin. Drawing on the



Figure 3. Counterfactual Distribution for R-Index




  Source: Portugal-Perez 2006, figure 3.




estimates discussed earlier by Carrere and de Melo (2006), he concludes that
capture by special interests may have raised the costs of rules of origin by an average
of 3.5-11 percent of the good's value, a very steep increase in the face of the shallow
preferences that are generally granted.
   Simulation methods provide another way of obtaining orders of magnitude of
rules of origin effects on trade. Francois, Hoekman, and Manchin (2006) use
their estimate of compliance costs to simulate the effects of trade liberalization by
developed economies on low-income countries in a multiregional trade model.
Despite preference erosion, low-income countries gain instead of losing from trade
liberalization by the European Union because the "rectangle" deadweight losses
associated with compliance costs are eliminated.
  Table 6 provides alternative estimates from a partial-equilibrium perspective,
taking as an example a GSP country benefiting from a 10 percent preferential
margin in the EU (or U.S.) market (row 1) but forced to raise its minimum local
content from the value in row 3 (40 percent, except in column 5) to the value in
row 4 (50 percent, except in column 5). When present, administrative costs, also
expressed as a percentage of the unit price, are given in row 2. The table's bottom
three rows show the effect of rules of origin on equilibrium exports and prices.
  Column 1 shows the benefits that accrue to the GSP producer from receiving
a 10 percent preference margin with no constraint on the sourcing of inputs. For
this constellation of elasticities (all are on the high side to reflect the likelihood



that products from different origins are close substitutes, whether at the intermediate-
or final-good level), the pass-through is 2.9 percent (row 7) out of a preference
margin of 10 percent, in line with econometric estimates mentioned in the
section on direct evidence. Exports increase by 16 percent, but costs do not
increase because inputs are bought at constant world prices.
   Column 2 shows what happens when the producer must reduce the use of
nonoriginating materials to meet a value content rule of 50 percent (a 25
percent increase from column 1). For the example, where value added is 20
percent and unconstrained purchases of nonoriginating intermediate goods equal
75 percent of the value of total intermediate good purchases, raising the
minimum local content from 40 to 50 percent implies that purchases of nonorigi-
nating intermediate goods must be reduced to 62.5 percent. The result of forcing
producers to shift away from preferred intermediate goods is a higher unit pro-
duction cost resulting in lower export volume, with the 1.9 percent increase in
unit cost passed on to EU and U.S. consumers. Matters get worse if substitution
possibilities for materials from different origins are low (column 3), which might
be representative of industries with a lot of transformation and many production
stages.11
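
The arithmetic behind the column 2 example can be checked with a short sketch. It assumes, as the example does, that local content is measured as a share of the unit price and that value added counts as local; the function names are hypothetical.

```python
def local_content(value_added_share: float, nonoriginating_share: float) -> float:
    """Local content as a share of unit price.

    value_added_share: value added as a fraction of the unit price.
    nonoriginating_share: nonoriginating inputs as a fraction of all
    intermediate good purchases.
    """
    if not 0.0 <= value_added_share <= 1.0:
        raise ValueError("value_added_share must lie in [0, 1]")
    if not 0.0 <= nonoriginating_share <= 1.0:
        raise ValueError("nonoriginating_share must lie in [0, 1]")
    intermediates = 1.0 - value_added_share
    return 1.0 - intermediates * nonoriginating_share


def max_nonoriginating_share(value_added_share: float, min_local_content: float) -> float:
    """Largest nonoriginating share of intermediates allowed by the rule."""
    if not 0.0 <= value_added_share <= 1.0:
        raise ValueError("value_added_share must lie in [0, 1]")
    intermediates = 1.0 - value_added_share
    if intermediates == 0.0:
        return 1.0  # no intermediates, so the rule cannot bind
    allowed_foreign_value = 1.0 - min_local_content
    return max(0.0, min(1.0, allowed_foreign_value / intermediates))
```

With value added of 20 percent and 75 percent of intermediates nonoriginating, `local_content(0.20, 0.75)` gives 40 percent, and `max_nonoriginating_share(0.20, 0.50)` gives the 62.5 percent ceiling cited in the text. The same function reproduces the AFTA illustration in note 8 (64 and 46 percent originating value).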
   Column 4 mirrors column 2 but adds administrative compliance costs of
2.5 percent. This further penalizes the GSP producer, even though part of this
cost increase can again be passed on to consumers in the importing country. Of
course, if GSP producers were competing with close substitutes, they would be
unable to pass on the price increase. Finally, column 5 considers a simulation
that might be fairly representative of an industry with enough originating inter-
mediate good purchases that the shift to a 40 percent minimum local content
would not affect producers much. In this case, the net price to producers might
go up by about one-third of the preference margin, resulting in a modest supply
response of about 10 percent.



Implications for Reform

If rules of origin are a legitimate way to prevent trade deflection by mandating
that sufficient processing take place in the preferential zone, the accumulated evi-
dence reported in this article indicates that they have gone vastly beyond that
role, becoming akin to technical barriers to trade. Various estimates suggest that
the compliance costs associated with meeting origin requirements in preferential
trade agreements range between 3 and 5 percent of final product prices, a very
stiff price tag for preference margins that are often thin, given that most favored
nation tariffs are low in most sectors except textiles and apparel. Controlling
for preferential margins, utilization rates are lower in product lines with more
restrictive rules of origin and when producers are limited in the sourcing of their
intermediate good purchases.
   Because of their trade-inhibiting effects, rules of origin hinder the integration
of preference-receiving least developed economies in the world economy and thus
work at cross-purposes with the development-policy goals of EU and U.S. prefer-
ences. For Sub-Saharan African countries supplying apparel products to the
European Union, even high utilization rates hide obstacles to export growth
caused by the double-transformation requirement imposed on those products.
   This article also shows that in the case of the European Union and the United
States, the two largest users of preferential trade agreements, rules of origin are
stricter for products with tariff peaks where preferences could be most valuable.
The correlation between the presence of tariff peaks and that of highly restrictive
rules of origin suggests capture by protectionist interests, a hypothesis largely con-
firmed by political-economy theory and evidence. Moreover, because rules of
origin have so far escaped WTO disciplines (whereas other, more traditional
trade-policy instruments are brought under increasingly stringent ones), they
stand as a choice candidate for creeping protectionism.
   Despite the prevalence of capture by special interests, two quasi-natural
experiments point to broad directions for reform. First, the relaxation of the U.S.
triple-transformation requirement in textile and apparel for Sub-Saharan
African producers under AGOA has proved to strongly encourage export diversi-
fication and growth compared with exports destined to the European Union,
which are subject to stricter rules under the Everything But Arms initiative
(which otherwise features similar preference margins). Second, low-income
Asian countries operating under simple and benign rules of origin have been
able to rapidly integrate themselves into cross-border supply chains and have, as
a result, tremendously benefited from the verticalization of world trade.
   These observations suggest that a multilateral agenda for preferential rules of
origin reform, a key step in bringing preferential trade agreements under WTO
disciplines, would have to move along three dimensions: harmonization, simplifi-
cation, and relaxation. Harmonization between trading blocs, although unlikely
to be attained anytime soon, is desirable in view of the "spaghetti bowl" of prefer-
ential trade agreements and is a prerequisite for simple and mutually consistent
cumulation rules. The European Union has set an example in this regard with
the PANEURO system, designed precisely to facilitate cumulation across preferen-
tial zones.
   For simplification, arguments in favor of a single across-the-board rule are
much like those in favor of uniform tariffs: simplification fosters transparency
and mitigates capture. Clearly, technical requirements should be targeted for
elimination first because they are the most opaque, difficult to harmonize, and
capture-prone instruments. Leaving aside agricultural products that could still



operate under the wholly obtained criterion and keeping in mind that any
uniform rule will affect industries and countries differently, two avenues could be
considered: a simple change of tariff classification, say at the subheading (HS-6)
level so that it is not too restrictive, or a uniform value content rule.
   Some information can be gleaned in this regard from the European Union's
recent review. The change of tariff classification has the advantage of simplicity,
transparency, and low administrative costs. But the Harmonized System tariff
nomenclature was designed to collect trade statistics, not to separate products and
confer origin, so defining the change of tariff classification at a uniform level
would produce erratic results across sectors. This would call for exceptions to
uniformity, opening up a Pandora's box of special deals. Moreover, a change of tariff
classification would not easily lend itself to differential treatment for least
developed economies, which should be an objective (see below).
   Notwithstanding conceptual clarity, a value content rule may be less than
straightforward to apply in practice.12 It may increase producer risk due to the
sensitivity of costs to exchange-rate, wage, and commodity-price fluctuations and
is also burdensome to apply for customs officials. However, it is simple to specify
and transparent, and it allows for differential treatment of least developed econ-
omies. All told, if properly specified, it is the best candidate for an across-the-
board criterion, ideally in combination, at the exporter's choice, with a change of
tariff classification. In this spirit Tumurchudur (2007b) estimated for each good
the maximum foreign content that would make a value content rule equivalent
to the current array of NAFTA's rules of origin. Her method consisted of three
steps.
   First, she estimated the statistical relationship between utilization rates and
rules of origin, including value content rules. Second, she inverted that relation-
ship to find the rate of a value content rule that would give a utilization rate
equal to the current one. Third, she calculated the trade-weighted average of that
maximum content. This neutral average turns out to be a very low 21 percent of
the good's value in maximum foreign content, confirming the diagnosis that
NAFTA's rules of origin are very restrictive. More important, this rate provides a
transparent and fully comparable benchmark on which to base discussions of
reform and harmonization.
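
The second and third steps of this method can be illustrated with a small sketch. It takes the first-step estimates as given and assumes, purely for illustration, a linear relationship in which the utilization rate equals an intercept plus a slope times the maximum foreign content allowed (in percent); the function names and this functional form are assumptions, not the specification used by Tumurchudur (2007b).

```python
from typing import Sequence


def equivalent_foreign_content(utilization: float, intercept: float, slope: float) -> float:
    """Step 2: invert utilization = intercept + slope * content.

    Returns the maximum foreign content (percent, clipped to [0, 100]) that
    would reproduce the observed utilization rate under the assumed line.
    """
    if slope == 0.0:
        raise ValueError("slope must be nonzero to invert the relationship")
    content = (utilization - intercept) / slope
    return max(0.0, min(100.0, content))


def trade_weighted_average(values: Sequence[float], trade: Sequence[float]) -> float:
    """Step 3: average the good-level contents using trade values as weights."""
    if len(values) != len(trade):
        raise ValueError("values and trade must have the same length")
    total = sum(trade)
    if total <= 0.0:
        raise ValueError("total trade must be positive")
    return sum(v * t for v, t in zip(values, trade)) / total
```

In use, each good's current utilization rate is passed through `equivalent_foreign_content`, and the resulting contents are averaged with `trade_weighted_average` to obtain a single benchmark rate.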
   If the slow pace of harmonization talks at the WTO is any indication, the
reform agenda described above may be overambitious by several orders of magnitude;
even if the European Commission manages to complete the agenda, competition
between systems may trigger similar rounds of simplification elsewhere,
including in free trade agreements between developing economies in Africa and
Latin America, whose rules of origin are often directly inspired by NAFTA and
PANEURO. However, the outcome of the EU reform process is highly uncertain at
this stage; moreover, even if the plan to adopt an across-the-board value content



criterion survives, it is not clear that the rate of this value content rule would be
uniform. Nor is it certain (perhaps even less) that it would relax the restrictive-
ness of the current system.
   More immediate, win-win steps may be a better way to proceed. A simple first
step would consist of eliminating rules of origin requirements for tariff lines with
preferential margins below 3 or perhaps even 5 percent (the rate could be agreed
upon in the context of multilateral negotiations at the WTO). This would be an
all-around winning proposition since resources would be freed for other purposes,
especially in developing economies, but also for consumers in developed econom-
ies, who would no longer bear part of the increased costs associated with compli-
ance. A second step would be to allow for differential treatment not across
sectors, but across beneficiaries, with low value content requirements for least
developed economies reflecting the empirical observation that the "slices" of value
added in least developed economies through cross-border production networks are
generally thin. In this regard, the experience with the U.S. special regime granted
in textile and apparel to African producers under AGOA is most encouraging.




Appendix. Glossary of terms

Harmonized System. A system of classification for traded goods in which all
countries belonging to the World Customs Organization participate. It classifies
traded goods into (by increasing order of disaggregation) 21 sections (one digit),
99 chapters (two digits), 1,417 items (four digits), and 4,998 subitems (six
digits). Beyond that (eight- and ten-digit), classification systems are no longer har-
monized across countries and are subject to frequent classification changes.


Preference Margin. The difference between most favored nation and preferential
tariffs.


Preference Pass-Through. The percentage of a tariff-preference margin that is
"appropriated" by exporters in the form of an increase in the export price. It is
inversely related to the bargaining power of importers.


Preferential Status. Whether a good is eligible for the preferential tariff rate.


Technical Requirement. Rule of origin that imposes a certain type of production
process or the use of certain specified technology or standard.



Trade Deflection. Use of the country with the lowest external tariff by importers in
a free trade agreement (which reduces tariff revenue for others). This notion is
distinct from Vinerian "trade diversion."


Utilization Rate. Share of exports shipped under the preferential (as opposed to
most favored nation) regime.


Regime-wide Rules of Origin

Absorption or Roll-up. Principle that allows nonoriginating materials that have
acquired origin by meeting specific processing requirements to maintain this
origin when used as input in a subsequent transformation. In other words, the
nonoriginating materials are no longer taken into account in calculating value
added. The roll-up or absorption principle is used in most preferential trade agree-
ments (in particular, the EU GSP and the Cotonou Agreement), although a few
have exceptions for the automotive sector.


Cumulation. Principle that allows producers from one member country in a prefer-
ential trade agreement to import nonoriginating materials from another member
country without affecting the final product's originating status. There are three
types of cumulation rules: bilateral, diagonal, and full. Bilateral cumulation is
the most common type and applies to trade between two partners in a preferential
trade agreement. It stipulates that producers in country A can use inputs from
country B without affecting the final good's originating status as long as the
inputs satisfy the area's rules of origin.


Diagonal Cumulation. Under diagonal cumulation (the basic principle of the EU's
PANEURO system), countries in a preferential trade agreement can use materials
that originate in any member country as if the materials originated in the
country where the processing is undertaken. Under full cumulation all stages of
processing or transformation of a product within countries in a preferential trade
agreement can be counted as qualifying content regardless of whether the proces-
sing is sufficient to confer originating status to the materials themselves. Full
cumulation allows for greater fragmentation of the production process than bilat-
eral and diagonal cumulation.


Duty Drawbacks. Refunds to exporters of tariffs paid on imported intermediate good
inputs. Many preferential trade agreements, especially in the Americas, mandate
the elimination of duty-drawback schemes for exports to partner countries on the
grounds that a duty drawback claimed by a producer in country A to export to
country B would put that producer at a competitive advantage compared with
domestic producers in country B given that the producer in country A already
benefits from the elimination of intrabloc tariffs. Eliminating duty drawbacks as
part of a preferential trade agreement can harm the profitability of final-good
assembly for export to partner countries in the area, although tariff escalation,
when present, already provides some protection for final-assembly operations
(because it implies lower tariffs on intermediate goods than on final ones).


Product-Specific Rules of Origin

Allowance. An amendment to a mandated change of tariff classification that
excludes some categories from noneligibility (that is, a final good belonging to,
say, chapter 11 can embody imported inputs belonging to any other chapter or
from chapter 11 itself but between headings X and Y).

Change of Tariff Classification. Rule of origin requiring that a final good made with
imported inputs belong to a Harmonized System category that differs from that of
its imported inputs (as proof of transformation). The mandated change of tariff
classification can be specified at the chapter (two digits), heading (four digits),
subheading (six digits), or item (eight digits) level.

Exception. An amendment to a mandated change of tariff classification that
excludes some categories from eligibility (that is, a final good belonging to, say,
chapter 11 can embody imported inputs belonging to any other chapter except
headings X to Y).


Value Content. Rule of origin requiring a minimum percentage of local value
(materials or value added) or a maximum percentage of foreign value.
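
The change of tariff classification criterion defined above amounts to comparing Harmonized System code prefixes. The sketch below is a minimal illustration that ignores allowances and exceptions; the function name is hypothetical, and codes are assumed to be digit strings of at least the length required by the chosen level.

```python
from typing import Iterable

# Number of leading HS digits compared at each level named in the glossary.
LEVEL_DIGITS = {"chapter": 2, "heading": 4, "subheading": 6, "item": 8}


def satisfies_ctc(final_good_hs: str, input_hs_codes: Iterable[str], level: str = "heading") -> bool:
    """True if the final good's HS category differs from that of every imported input."""
    if level not in LEVEL_DIGITS:
        raise ValueError(f"unknown level: {level}")
    digits = LEVEL_DIGITS[level]
    codes = [final_good_hs, *input_hs_codes]
    for code in codes:
        if not code.isdigit() or len(code) < digits:
            raise ValueError(f"HS code {code!r} is too short or not numeric")
    final_prefix = final_good_hs[:digits]
    # Origin is conferred only if no imported input shares the final good's prefix.
    return all(code[:digits] != final_prefix for code in codes[1:])
```

A final good coded 620342 made from an input coded 520811 meets the rule even at the chapter level, whereas the same good made from an input coded 620311 meets it only at the subheading level, which shows why a rule specified at a finer level is less restrictive.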




Olivier Cadot is Professor of Economics at the University of Lausanne, associated scholar at Centre
d'Etudes et de Recherches sur le Developpement International (CERDI), and fellow at the Centre for
Economic Policy Research (CEPR); his email address is olivier.cadot@unil.ch. Jaime de Melo
(corresponding author) is Professor of Economics at the University of Geneva, associated scholar at CERDI,
and fellow at CEPR; his email address is demelo@ecopo.unige.ch. The authors thank Paul Brenton
and Marcelo Olarreaga for many useful suggestions and their colleagues and co-authors Celine
Carrere, Antoni Estevadeordal, Alberto Portugal-Perez, Akiko Suwa-Eisenmann, and Bolormaa
Tumurchudur for permission to draw on joint work. They also thank three referees for comments
on a previous draft.
   1. According to this same tally, 45 developing economies have signed bilateral trade
agreements with a developed country, and 90 of the 109 preferential trade agreements between
developed and developing economies have been created since 1990.
   2. According to a survey administered by the World Customs Organization to customs officials in
developing economies (as reported by Brenton and Imagawa 2004), 67 percent of respondents in
Sub-Saharan Africa agree that dealing with rules of origin under overlapping trade agreements
causes problems, and a majority also agrees that rules of origin are more labor-intensive.
Administering rules of origin detracts from other objectives of tax collection and trade facilitation.
   3. Because meeting the requirements is difficult and appears unnecessarily complex, in view of
the European Commission's objective to grant some preferential access to its market for GSP-eligible
countries, on 16 March 2005 the commission adopted Communication COM (2005) on "The Rules
of Origin in Preferential Trade Arrangements." The communication explores alternative rules of
origin that would be simpler and more development friendly. A key proposal under consideration is
to replace the current product-specific rules of origin with a single rule based on a minimum of orig-
inating value added.
   4. By comparison, the average preferential margin (computed over tariff lines with positive
tariffs) was 4.5 percent for NAFTA (almost all tariffs had been eliminated on NAFTA trade by
2001), 2.4 percent for GSP-eligible countries, and 4.6 percent for ACP countries (not eligible for
Everything But Arms status). Data for the European Union are for 2004, when 62 percent of trade for
GSP-eligible countries and over 80 percent of trade for ACP countries took place at zero tariffs (some ACP
countries also benefited from Everything But Arms status at zero tariffs in the EU market).
    5. Krishna (2006) discusses other effects that are more difficult to quantify: effects such as rules
of origin-jumping investment and effects on intermediate prices. Thoenig and Verdier (2006) also
consider the implications of rules of origin for multinationals confronted with outward-processing
decisions.
    6. The United States rarely uses a value content criterion as the sole requirement for origin, and
when it does it tends to rely on a single 40 percent foreign content requirement. The European
Union has value content criteria ranging from 50 to 15 percent of domestic value added.
    7. Cumulation is, in principle, only diagonal (see the glossary in the appendix), but the domestic
content can be calculated as an aggregate of value added in any ASEAN member state; so in effect
AFTA provides for full cumulation, although, as noted by Brenton (2006), the rules stipulate that
the final stage of manufacture must be carried out in the exporting member state (what constitutes
"the final stage" is not defined). Because vertical links and outsourcing are very important in Asia,
full cumulation considerably relaxes the requirements of satisfying origin.
    8. To drive home the importance of trade in intermediate goods, consider the following example.
On the basis of the input-output data in Baldwin (2006, table 1) for Indonesia, Malaysia,
Philippines, and Thailand (middle-income Asian countries), an average of 35-40 percent of
intermediate goods are sourced outside AFTA. For example, take an activity with 10 percent value added
and 40 percent nonoriginating intermediate goods; that is, 36 percent of the final unit product
price is nonoriginating. Originating value for this activity would be 64 percent. Then take the
plausible example of an activity with the same value added but with 60 percent of materials
nonoriginating; originating value falls to 46 percent, barely above the 40 percent minimum currently
stipulated in AFTA.
    9. The special regime was recently extended until 2015. Figure 2 lists the 22 beneficiary
countries.
    10. The algorithm is in essence a grid search over cutoffs whose criterion is the minimization of
the concentrated sum of squared errors of the ordinary least squares regressions in the two regimes.
    11. The decline in exports to the preferential-giving destination suggests that producers
would choose to export under most favored nation status. In the illustrative simulations reported
here, with constant elasticity throughout and smooth substitution possibilities across the origin
for intermediate good purchases and export destination sales, producers pass on cost increases to
consumers.
    12. The authors of this article are aware of concerns voiced by the private sector in the
course of the EU review about the practical difficulty of a value content criterion for small firms
and, if based on costs, its potential to force unwanted disclosure of strategic information to
powerful EU buyers that would enhance their ability to squeeze rents from developing country
producers.



References

Anson, J., O. Cadot, C. Carrere, A. Estevadeordal, J. de Melo, and B. Tumurchudur. 2005. "Rules of
   Origin in North-South Preferential Trading Arrangements with an Application to NAFTA."
   Review of International Economics 13(3):501-17.

Augier, P., M. Gasiorek, and C. Tang. 2005. "The Impact of Rules of Origin on Trade Flows."
   Economic Policy 20(43):567-624.

Baldwin, R. 2006. "Managing the Noodle Bowl: The Fragility of East Asian Regionalism." CEPR
   Discussion Paper 5561. Centre for Economic Policy Research, London.

Brenton, P. 2006. "Notes on Rules of Origin with Implications for Regional Integration in South
   East Asia." World Bank, Washington, D.C.

Brenton, P., and H. Imagawa. 2004. "Rules of Origin, Trade and Customs." In L. de Wulf
   and J. Sokol, eds., Customs Modernization Handbook. Washington, D.C.: World Bank.

Brenton, P., and M. Manchin. 2003. "Making EU Trade Agreements Work: the Role of Rules of
   Origin." The World Economy 26(5):755-69.

Brenton, P., and C. Özden. 2005. "Trade Preferences for Apparel and the Role of Rules of Origin:
   The Case of Africa." World Bank, Washington, D.C.

Cadot, O., C. Carrere, and V. Strauss-Kahn. 2007. "Export Diversification: What's Behind the
   Hump?" University of Lausanne, Switzerland.

Cadot, O., J. de Melo, and A. Portugal-Perez. 2005. "Market Access and Welfare under Free Trade
   Agreements: the Case of Textiles under NAFTA." World Bank Economic Review 19(3):379-405.

Cadot, O., J. de Melo, and A. Portugal-Perez. Forthcoming. "Rules of Origin for Preferential
   Trading Arrangements: Implications for the ASEAN FTA of EU and US Experience." Journal of
   Economic Integration.

Cadot, O., J. de Melo, and E. Pondard. 2006. "Evaluating the Consequences of a Shift to a Value-added
   Method for Determining Origin in the European Union's GSP Preferential Scheme." Report prepared
   for the European Commission, Brussels.

Cadot, O., A. Estevadeordal, and A. Suwa-Eisenmann. 2006. "Rules of Origin as Export Subsidies."
   In O. Cadot and others, eds., The Origin of Goods: Rules of Origin in Regional Trade Agreements.
   London: Oxford University Press.

Cadot, O., C. Carrere, J. de Melo, and B. Tumurchudur. 2006. "Product Specific Rules of Origin in
   EU and US Preferential Trading Arrangements: an Assessment." World Trade Review 5(2):
   199-224.

Cadot, O., A. Estevadeordal, A. Suwa-Eisenmann, and T. Verdier, eds. 2006. The Origin of Goods:
   Rules of Origin in Regional Trade Agreements. London: Oxford University Press.

Carrere, C., and J. de Melo. 2006. "Are Different Rules of Origin Equally Costly? Estimates from
   NAFTA." In O. Cadot and others, eds., The Origin of Goods: Rules of Origin in Regional Trade
   Agreements. London: Oxford University Press.

Estevadeordal, A. 2000. "Negotiating Preferential Market Access: the Case of NAFTA." Journal of
   World Trade 34(1):141-66.

Estevadeordal, A., and K. Suominen. 2006. "Mapping Rules of Origin Around the World." In
   O. Cadot and others, eds., The Origin of Goods: Rules of Origin in Regional Trade Agreements.
   London: Oxford University Press.

Francois, J., B. Hoekman, and M. Manchin. 2006. "Preference Erosion and Multilateral Trade
   Liberalization." World Bank Economic Review 20(2):197-216.

Hansen, B. 2000. "Sample Splitting and Threshold Estimation." Econometrica 68(3):575-603.




Herin, J. 1986. "Rules of Origin and Differences between Tariff Levels in EFTA and in the EC." EFTA
   Occasional Paper 13. Geneva: European Free Trade Agreement.

Krishna, K. 2006. "Understanding Rules of Origin." In O. Cadot and others, eds., The Origin of Goods:
   Rules of Origin in Regional Trade Agreements. London: Oxford University Press.

Krueger, A. O. 1998. "Free Trade Areas versus Customs Unions." Journal of Development Economics
   54(1):169-87.

Olarreaga, M., and C. Özden. 2005. "AGOA and Apparel: Who Captures the Tariff Rent in the
   Presence of Preferential Market Access?" World Economy 28(1):63-77.

Özden, C., and G. Sharma. 2006. "The Price Effects of Preferential Market Access: Caribbean Basin
   Initiative and the Apparel Sector." World Bank Economic Review 20(2):241-60.

Portugal-Perez, A. 2006. "Disentangling the Determinants of Rules of Origin in North-South
   Preferential Trade Agreements: Evidence for NAFTA." University of Geneva.

Portugal-Perez, A. 2007. "The Costs of ROO in Apparel: African Apparel Exports to the US and EU."
   University of Geneva.

Stevens, C., M. Gasiorek, J. Chweijczak, and J. Kennan. 2006. "Creating Development Friendly Rules
   of Origin in the EU." Overseas Development Institute, International Economic Development
   Group, London. (www.odi.org.uk/iedg/publications/Rules_of_Origin_FinalReport.pdf).

Thoenig, M., and T. Verdier. 2006. "The Impact of ROO on Strategic Outsourcing: An IO
   Perspective." In O. Cadot and others, eds., The Origin of Goods: Rules of Origin in Regional Trade
   Agreements. London: Oxford University Press.

Tumurchudur, B. 2007a. "Rules of Origin and Market Access in the Europe Agreements."
   University of Lausanne, Switzerland.

Tumurchudur, B. 2007b. "Reforming NAFTA's Rules of Origin." University of Lausanne, Switzerland.

World Bank. 2005. Global Economic Prospects 2005: Trade, Regionalism and Development. Washington,
   D.C.

WTO (World Trade Organization). 2002. "Rules of Origin in Regional Trade Agreements." WT/REG/
   W/45. Geneva, Switzerland.

Wulf, L. de, and J. Sokol, eds. 2004. Customs Modernization Handbook. Washington, D.C.: World Bank.



