Policy Research Working Paper 10051

Measuring What Matters: Principles for a Balanced Data Suite That Prioritizes Problem-Solving and Learning

Kate Bridges
Michael Woolcock

Development Economics, Development Research Group
May 2022

Abstract

Responding effectively and with professional integrity to the many challenges of public administration requires recognizing that access to more and better quantitative data is necessary but insufficient. Overreliance on quantitative data comes with its own risks, of which public sector managers should be keenly aware. This paper focuses on four such risks. The first is that attaining easy-to-measure targets becomes a false standard of broader success. The second is that measurement becomes conflated with what management is and does. The third is that measurement inhibits a deeper understanding of the key policy problems and their constituent parts. The fourth is that political pressure to manipulate key indicators can lead, if undetected, to falsification and unwarranted claims or, if exposed, to jeopardizing the perceived integrity of many related (and otherwise worthy) measurement efforts. Left unattended, the cumulative concern is that these risks will inhibit rather than promote the core problem-solving and implementation capabilities of public sector organizations, an issue of high importance everywhere but especially in developing countries. The paper offers four cross-cutting principles for building an approach to the use of quantitative data—a "balanced data suite"—that strengthens problem-solving and learning in public administration: (1) identify and manage the organizational capacity and power relations that shape data management; (2) focus quantitative measures of success on those aspects which are close to the problem; (3) embrace a role for qualitative data, especially for those aspects that require in-depth, context-specific knowledge; and (4) protect space for judgment, discretion, and deliberation in those (many) decision-making domains that inherently cannot be quantified.

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at k8bridges@gmail.com and mwoolcock@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team Measuring What Matters: Principles for a Balanced Data Suite That Prioritizes Problem- Solving and Learning Kate Bridges (Independent Consultant) and Michael Woolcock (World Bank)1 JEL codes: C80, H83, O20 Keywords: Mixed Methods, Public Administration, Data Curation, Organizational Learning, Problem Solving 1 The views expressed in this paper are those of the authors alone, and should not be attributed to the World Bank, its executive directors, or the countries they represent. Our thanks to Galileu Kim, Daniel Rogger, Christian Schuster and participants at an authors’ workshop for helpful comments and constructive suggestions. More than twenty years of collaboration with Vijayendra Rao have also deeply shaped the views expressed herein. Remaining errors of fact or interpretation are solely ours. The final version of this paper will be a chapter in Daniel Rogger and Christian Schuster (eds.) Government Analytics: An Empirical Guide to Measurement in Public Administration (Washington, DC: World Bank, forthcoming). I. Introduction “What gets measured gets managed; and what gets measured gets done” is one of those ubiquitous (even clichéd) management phrases that hardly requires explanation; it seems immediately obvious that the data generated by regular measurement and monitoring is what makes possible the improvement of results. Less well known than the phrase itself is the fact that although it is commonly attributed to the acclaimed management theorist Peter Drucker, Drucker himself never actually said it. 2 In fact, Drucker’s views on the subject were reportedly far more nuanced, along the lines of V. F. Ridgway, who argued over 65 years ago that not everything that matters can be measured and not everything that can be measured matters (Ridgway 1956). Simon Caulkin, a contemporary business management columnist, neatly summarized Ridgway’s argument, in the process expanding the truncated to-measure-is-to-manage phrase to “What gets measured gets managed — even when it’s pointless to measure and manage it, and even if it harms the purpose of the organisation to do so.” 3 Ridgway and Caulkin’s warnings – repeated in various guises by many since 4 – remind us that an indiscriminate usage of quantitative measures and an undue confidence in what they can tell us may turn out to be highly problematic in certain situations, sometimes derailing the very performance improvements that the data was intended to support (Merry et al 2015). We hasten to add, of course, that seeking more and better quantitative data is clearly a worthy aim in public administration (and elsewhere). Many important gains in human welfare (e.g., recognizing and responding to learning disabilities) can be directly attributed to interventions conceived and prioritized on the basis of the empirical documentation of the reality, scale, and consequences of the underlying problem. The wonders of modern insurance are possible because actuaries can quantify all manner of risks over time, space and groups. What we will argue in the following sections, however, is that access to quantitative data alone is not a sufficient condition for achieving many of the objectives that are central to public administration and economic development. This paper has five sections. Following this Introduction (Section I), we lay out in Section II the ways in which the collection, curation, analysis, and interpretation of data is embedded in contexts: no aspect takes place on a blank slate. 
2 According to the Drucker Institute; see https://www.drucker.institute/thedx/measurement-myopia/ (accessed 6 December 2021).

3 See https://www.theguardian.com/business/2008/feb/10/businesscomment1 (accessed 6 December 2021).

4 See, for example, former USAID Administrator Andrew Natsios (2011), citing Lord Wellington in 1812, on the insidious manner in which measures of 'accountability' can compromise rather than enable central policy objectives (in Wellington's case, winning a war). For his part, Stiglitz has argued that "What you measure affects what you do. If you don't measure the right thing, you don't do the right thing." (As quoted in the New York Times, October 4, 2009.) Pritchett (2014), exemplifying this point, notes (at least at the time of his writing) that the Indian state of Tamil Nadu had 817 indicators for measuring the delivery of public education, but none that actually assessed whether students were learning – in this instance, an abundance of "measurement" and "data" was entirely disconnected from (what should have been) the policy's central objective. In many cases, however, it is not always obvious, especially ex ante, what constitutes the "right thing" to measure – hence the need for alternative methodological entry points to elicit what these might be.

On one hand, the institutional embeddedness of the data collection and usage cycle – in rich and poor countries alike – leaves it susceptible to a host of possible ways in which subsequent delivery efforts may be compromised, stemming from an organization's lack of capability to manage and deploy data in a consistently professional manner. At the same time, the task's inherent political and social embeddedness ensures that it will be susceptible to influence by existing power dynamics and the normative expectations of those leading and conducting the work, especially when the political and financial stakes are high. In contexts where much of everyday life transpires in the informal sector – thereby rendering it "illegible" to, or enabling it to actively avoid engagement with, most standard measurement tools deployed by public administrators – sole reliance on formal quantitative measures will inherently only capture a slice of the full picture.

In Section III, we highlight four specific ways in which an indiscriminate increase in the collection of what is thought to be "good data" can lead to unintended and unwanted (potentially even harmful) consequences. The risks are that: (1) the easy-to-measure becomes a misleading or false measure of broader reality; (2) measurement becomes conflated with what management is and does; (3) an emphasis on what is readily quantified inhibits a fuller and more accurate understanding of the underlying policy problem(s) and their constituent elements; and (4) political pressure to manipulate selected indicators leads, if undetected, to falsification and unwarranted expectations – or, if exposed, to the perceived compromised integrity of otherwise worthy measurement endeavors. Thankfully, there are ways to anticipate and mitigate these risks and associated unintended consequences. Having flagged how unwanted outcomes can emerge, we proceed to highlight, in Section IV, some practical ways in which public administrators might thoughtfully anticipate, identify, and guard against them.
We discuss what a balanced suite of data tools might look like in public administration and suggest four principles that can help us apply these tools to the greatest effect, thereby enabling the important larger purposes of data to be served. We stress from the outset that our concerns are not with methodological issues per se, or the quality or comprehensiveness of quantitative data; these are addressed elsewhere, in every econometrics textbook, and should always be considered as part of doing ‘normal social science’. The concerns we articulate are salient even in a best-case scenario, in which analysts have access to great data acquired from a robust methodology, though obviously they are compounded when the available data is of poor quality – as is often the case, especially in low-income countries – and when too much is asked of it. II. How data is impacted by the institutional and socio-political environment in which it is collected For all administrative tasks, but especially those entailing high-stakes decision-making, the collection and use of data is a human process inherently subject to human foibles (Porter 1995). Much of this is widely accepted and understood: for example, key conceptual constructs in development (such as ‘exclusion’, ‘household’, ‘fairness’) can mean different things to different people and translate awkwardly into different languages. With this in mind professional data collectors will always give serious attention to ‘construct validity’ concerns in an effort to ensure 3 there is close alignment between the questions they ask and the questions their informants hear. 5 For present purposes we draw attention to issues given less attention, but which are critical nonetheless, namely institutional and political factors that together comprise the context shaping which data is (and is not) collected, how it is collected, from whom, how well it is curated over time, and how carefully conclusions and their policy implications are drawn from analyses of it. We briefly address each item in turn. i. Institutional embeddedness of data Beyond the purposes to which it is put, the careful collection, curation, analysis, and interpretation of public data is itself a complex technical and administrative task, requiring broad, deep, and sustained levels of organizational capability. In this section, we briefly explore three institutional considerations shaping these factors: dynamics shaping the (limited) ‘supply’ and refinement of technical skills; the forging of a professional culture that is a credible mediator of complex (potentially heated) policy issues yet sufficiently robust to political pressure; and the related capacity to infer what even the best data analysis ‘means’ for policy, practice, and problem-solving. These issues apply in every country, but are especially salient in low-income countries, where the prevailing level of implementation capability in the public sector is likely to be low, and where the corresponding expectations of those seeking to improve it by expanding the collection and use of quantitative data may be high. At the individual level, staff with the requisite quantitative analytical skills are likely to be in short supply, because acquiring such skills requires considerable training, while those who do have them are likely to be offered much higher pay in the private sector. 
(One could in principle 'outsource' some data collection and analysis tasks to external consultants, but doing so would be enormously expensive and potentially compromise the integrity and privacy of unique public data.) So understood, it would be unreasonable to expect the 'performance' of data-centric public agencies to be superior to other service-delivery agencies in the same context (e.g., public health). Numerous studies suggest the prevailing levels of implementation capability in many (if not most) low-income countries are far from stellar (Andrews et al 2017). 6 For example, Jerven's (2013) important work in Africa on the numerous challenges associated with maintaining the System of National Accounts – the longest-standing economic data collection task asked of all countries, from which their respective GDP is determined – portends the difficulties facing less high-profile metrics (see also Sandefur and Glassman 2015). 7 Put differently: if many developing countries struggle to curate the single, longest-standing, universally-endorsed, most important measure asked of them, on what basis do we expect these countries to manage lesser, lower-stakes measures? To be sure, building quantitative analytical skills in public agencies is highly desirable; for present purposes, our initial point is a slight variation on the old adage 8 that the quality of outcomes derived from quantitative data is only as good as the quality of the 'raw material' and the competence with which it is analyzed and interpreted. Fulfilling an otherwise noble ambition to build a professional public sector whose decisions are informed by 'evidence' requires a prior and companion effort to build the requisite skills and sensibilities.

5 Social science methodology courses classically distinguish between four key issues that are at the heart of efforts to make empirical claims in applied research: (1) 'construct validity' (the extent to which any concept, such as 'corruption' or 'poverty', matches particular indicators), (2) 'internal validity' (the extent to which causal claims have controlled for potential confounding factors, such as sample selection bias), (3) 'external validity' (the likelihood that claims are generalizable at larger scales, to more diverse populations, or novel contexts), and (4) 'reliability' (the extent to which similar findings would be reported if repeated or replicated by others). See, among many others, Johnson et al (2019). Of these four issues, qualitative methods are especially helpful in ensuring construct validity, since certain terms may mean different things to different people in different places, complicating matters if one seeks to draw comparisons across different linguistic/cultural/national contexts. In survey research, for example, it is increasingly common to include what is called an 'anchoring vignette' – a short real-world example of the phenomena in question, such as an instance of 'corruption' by a government official at a port – before asking the formal survey question so that cross-context variations in interpretation can be calibrated accordingly (see, among others, King and Wand 2007). Qualitative methods can also contribute to considerations pertaining to internal validity (Cartwright 2017) and external validity – helping to identify the conditions under which findings 'there' might apply 'here' (Woolcock 2018; see also Cartwright and Hardie 2012).
Put differently, precisely because effective data management is itself such a complex and difficult task, in contexts where agencies struggle to implement even basic policy measures at a satisfactory level (e.g., delivering mail, ensuring attendance at work) it is unlikely that, ceteris paribus, invoking such agencies to also take a more ‘data driven’ approach will elicit substantive improvement. More and better ‘data’ will not fix a problem if the absence of such data is not itself the key problem or the ‘binding constraint’; as such, the priority issue is discerning what is in fact the key policy problem and its constituent elements. From this starting point, more and better data can be part of, but not a substitute for, strategies for enhancing the effectiveness of public sector agencies. Even if both data management and broad institutional capability are functioning at high and complementary levels, there remains the structural necessity of interpreting what the data means. Policy inference from even the best data and most rigorous methodology is never self-evident; it must always be undertaken in the light of theory. This might sound like an abstract academic concern, but it is especially important when seeking to draw lessons from, and/or make big decisions regarding the fate of, complex interventions. This is so because a defining characteristic of a complex problem is that it generates highly variable outcomes across time, space, and groups. 6 If such agencies/departments do in fact happen to perform especially strongly – in the spirit of the ‘positive deviance’ cases of government performance in Ghana provided in McDonnell (2020) – then it would be useful to understand how and why this has been attained. For present purposes, our point is that, perhaps paradoxically, we should not expect, ex ante, that agencies/departments in the business of collecting and curating data for guiding policy and performance to themselves be exemplary exponents of the deployment of that data to guide their own performance – because doing so is a separate ontological task, requiring distinctive professional capabilities. Like the proverbial doctors, if data analysts cannot “heal thyselves” we should not expect other public agencies to be able to do so merely by infusing them with more and better data. 7 A special issue of The Journal of Development Studies (Volume 51, Issue 2) was dedicated to this issue. For example, on the enduring challenges associated with agricultural data – another sector with a long history of data collection experience – see Carletto, Jolliffe, and Banerjee (2015). 8 Popularly known as GIGO: garbage in, garbage out. 5 Promoting gender equality, for example, is a task that rarely generates rapid change – it can take a generation (or several, or centuries) for rules invoking/requiring equal participation in community meetings, or equal pay for equal work, to become the ‘new normal’. 9 So, assessed over a five-year timeframe, a ‘rigorous’ methodology and detailed data may likely yield an empirical finding showing that a given Gender Empowerment Project (GEP) has had “no impact”; taken at face value, this is precisely what ‘the data’ would show and the type of policy conclusion (“GEP doesn’t work”) that would be drawn. 
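To make the timing problem concrete, here is a minimal stylized sketch; the trajectory shape and all numbers are our own illustrative assumptions, not estimates from any actual GEP-type evaluation:

```python
# Stylized illustration (assumed numbers, not from any actual evaluation):
# an intervention whose true impact follows a long "stasis then take-off"
# trajectory registers essentially no effect when judged at year 5.
import math

def true_impact(year, ceiling=0.40, midpoint=15, steepness=0.5):
    """Hypothetical logistic impact trajectory: share of communities adopting
    the new norm, approaching `ceiling` only after a long period of stasis."""
    return ceiling / (1 + math.exp(-steepness * (year - midpoint)))

for year in (5, 10, 15, 20, 25):
    print(f"year {year:>2}: measured impact = {true_impact(year):.3f}")

# year  5: measured impact = 0.003   (a 5-year evaluation reads "no impact")
# year 15: measured impact = 0.200   (take-off well under way)
# year 25: measured impact = 0.397   (near the long-run ceiling)
```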
However, interpreted in the light of a general theory of change incorporating the likely impact trajectory that GEP-type interventions follow – i.e., a long period of stasis eventually leading to a gradual but sustained take-off – a "doesn't work" conclusion would be unwarranted; five years is simply too soon to draw such a firm conclusion (Woolcock 2018). 10 High quality data and a sound methodology alone cannot solve this problem: GEP may well be fabulous, indifferent, useless, or a mixture of all three, but discerning which of these it is – and why, where, and for whom it functions in the way it does – will require the incorporation of different kinds of data into a close dialogue with a practical theory of change fitted for this sector, this context, and the development problem being addressed.

9 See the evolution of the early and subsequent work on gender inclusion in rural India (Ban and Rao 2007, Duflo 2012, Sanyal and Rao 2018).

10 This does not mean, of course, that nothing can be said about GEP after five years – managers and funders would surely want to know by this point whether the apparent "no net impact" claim is a result of (a) poor technical design; (b) weak implementation; (c) contextual incompatibility; (d) countervailing political pressures; or (e) insufficient time having elapsed. Moreover, they would likely be interested in learning whether GEP's zero "average treatment effect" is nonetheless a process of offsetting outcomes manifest in a high standard deviation (meaning GEP works wonderfully for some groups in some places but disastrously for others), and/or is yielding unanticipated or unmeasured outcomes (whether positive or negative). For present purposes, our point is that reliance on a single form and methodological source of data is unlikely to be able to answer these crucial administrative questions; with a diverse suite of methods and data, however, such questions become both ask-able and answerable. (See Rao et al 2017 for an instructive example, discussed below.)

ii. Socio-political embeddedness of data

Beyond these institutional concerns, a second important form of embeddedness shaping data collection, curation, and interpretation is the manner in which all three are shaped by socio-political processes and imperatives. All data is compiled for a purpose; in public administration, assembling data of the required scale and sophistication is costly and complex (therefore requiring significant financial outlay and thus competing with rival claimants). Data is frequently called upon to adjudicate both the merits of policy proposals ex ante (e.g., the Congressional Budget Office in the US) and the effectiveness of programmatic achievements ex post (the World Bank's Independent Evaluation Group), which often entails entering into high-stakes political gambits: e.g., achieving signature campaign proposals in the early days of an administration and proclaiming their subsequent widespread success (or failure) as election time beckons again. (See more on this below.) Beyond the intense political pressure 'data' is asked to bear in such situations, a broader institutional consideration is the role large-scale numerical information plays in "rendering legible" (Scott 1998) complex and inherently heterogeneous realities, such that they can be managed, mapped, and manipulated for explicit policy purposes.
We hasten to add that such “thin simplifications” (Scott’s term) of reality can be both benign and widely beneficial: comprehensive health insurance programs and pension systems have largely tamed the otherwise debilitating historical risks of, respectively, disease and old age by generating premiums based on general demographic characteristics and the likelihood of experiencing different kinds of risks (e.g., injuries, cancer) over the course of one’s life. A less happy aspect of apprehending deep contextual variation via simplified (often categorical) data, however, is the corresponding shift it can generate in the political status and salience of social groups. The deployment of the census in colonial India, for example, is one graphic demonstration of how the very act of ‘counting’ certain social characteristics – such as the incidence of caste, ethnicity and religion – can end up changing these characteristics themselves, rendering what had heretofore been relatively ‘fluid’ and ‘continuous’ categories as ‘fixed’ and ‘discrete’. In the case of India, this massive exercise in data collection on identity led to “caste” being created, targeted and mobilizable as a politically salient characteristic that had (and continues to have) deep repercussions (e.g., at independence, when Pakistan split from India, and more recently the rise of Hindu nationalism) (see Dirks 2011). 11 More recently, influential scholars have argued that the infamous Hutu/Tutsi massacre in Rwanda was possible at the scale at which it was enacted because of ethnic categories being formalized and fixed via public documents whose origins lie in colonial rule (e.g., Mamdani 2002). For Scott (1998), public administration can only function to the extent its measurement tools successfully turn wide-spread anthropological variation, such as languages spoken, into singular modern categories and policy responses (e.g., to ensure that education is conducted in one national language, in a school, on the basis of a single curriculum 12); the net welfare gains to society might be unambiguous, but poorer, isolated, marginalized, and less numerous groups are likely to bear disproportionately the costs of this trade-off. If official ‘data’ itself constitutes an alien or distrusted medium by which certain citizens are asked to discern the performance of public agencies, merely providing (or requiring) “more of it” is unlikely to bring about positive change. In such circumstances, much antecedent work may need to be undertaken to earn the necessary trust from citizens, and to help them more confidently engage with their administrative systems. 13 By way of reciprocity, perhaps it will also require such systems to interact with 11 One could say that this is a social scientific version of the Heisenberg Uncertainty Principle, in which the very act of measuring something changes it. See also Breckenridge (2014) on the politics and legacy of identity measurement in pre- and post-colonial South Africa, and Hostetler (2021) on the broader manner in which imposing singular (but often alien) measures of time, space and knowledge enabled colonial administration. More generally, Sheila Jasanoff’s voluminous scholarship shows how science is a powerful representation of reality, which when harnessed to technology can reduce “individuals to standard classifications that demarcate the normal from the deviant and authorize varieties of social control” (Jasanoff 2004: 13). 
12 Among the classic historical texts on this issue are Peasants into Frenchman (Weber 1976) and Imagined Communities (Anderson 1983). For more recent discussions, see Lewis (2015) on “the politics and consequences of performance measurement” and Beraldo and Milan (2019) on the politics of Big Data. 13 This is the finding, for example, from a major empirical assessment of cross-country differences regarding Covid- 19 (Bollyky et al 2022), wherein – controlling for a host of potential confounding variables – those countries with both high infections and high fatalities are characterized by low levels of trust between citizens and their government, and between each other. See further discussion on this study and its implications below. 7 citizens themselves in ways that more readily comport with citizens’ own everyday (but probably rather different) vernacular for apprehending the world, interpreting events, and responding to them. Either way, it is critical that officials be wary of the potentially negative or unintended effects of data collection, even when it may begin with a benign intention to facilitate social inclusion and more equitable policy ‘targeting’. 14 III. The unintended consequences of an indiscriminate pursuit of “more data” There is a sense in which it is axiomatic that more and better data is always a good thing. But the institutional and socio-political embeddedness of data generation and its use in public administration (as discussed in the preceding section) means we need to qualify this otherwise laudable assertion by focusing on where and how challenges can arise. With that in mind, we turn our attention in this section to those instances wherein the increased collection of what is thought to be “good data” has led to perverse outcomes. Here we highlight four such outcomes that may materialize as the result of an undue focus on issues, concepts, inputs or outcomes which happen to be most amenable to being quantified. 1. The easy to measure may become a false standard of success What may start as a well-intentioned managerial effort to be better at quantifying meaningful success can end up generating instead a blinkered emphasis on that which is simply most easy to quantify. The result can be a skewed or false sense of what a project has (or has not) achieved, and how, where, and for whom outcomes were achieved. In a recent study, we demonstrate how a variety of institutional incentives align across the Government of Malawi and the World Bank in such a way that both Government of Malawi and World Bank officials consistently favor easy-to-measure indicators (inputs and outputs, or what we refer to as “changes in form rather than function”) as the yardstick of project success (Bridges and Woolcock 2017). 
This was a quintessential example of what strategy writer Igor Ansoff describes as a situation in which "managers start off trying to manage what they want, and finish up wanting what they can measure." 15 As a result of evaluating Public Financial Management (PFM) projects that were implemented over the course of twenty years in Malawi, we show that almost 70% of what projects measure or aim for is "change in terms of whether institutions look like their functioning counterparts (i.e., have the requisite structures, policies, systems and laws in place)" whereas only 30% of what is measured can be said to be "functional" – that is, focused on "purposeful changes to budget institutions aimed at improving their quality and outcomes" (Andrews 2013: 7). What's more, we find that World Bank PFM projects have considerably more success in achieving the "form" results than the "functional" ones. Unsurprisingly, demonstrable improvements in actual performance are far harder to achieve than changes that are primarily regulative, procedural or systems oriented. Unfortunately, an emphasis on what is easy-to-measure obfuscates this reality and allows reform "success" to be claimed. In practice, Malawi's history of PFM reform is littered with projects that claim "success" based on hardware procured, software installed, legislation developed, and people trained, whereas even a basic analysis reveals stagnation or even regression in terms of more affordable spending decisions, spending that reflects budgeted promises, greater ability to track the flow of funds, or a reduction in corruption. As long as the World Bank and the Malawian government focus on the "form" measures, they are able to maintain the illusion of success. That is, until something like Malawi's 2013 "Cashgate" crisis – in which about US$ 32 million in government funds were spectacularly revealed to have been misappropriated between April and September 2013 – lifts the lid on the deep-rooted financial management problems that have remained largely unaffected by millions of dollars of reform efforts. In this sense, Malawi is a microcosm of many institutional reform efforts globally. Although similar financial reforms have been implemented globally in a manner that suggests some level of consensus about "what works", the outcomes of those reforms are varied at best and often considerably lower than anticipated (Andrews 2013).

14 The British movie 'I, Daniel Blake' provides a compelling example of how even the literate in rich countries can be excluded by administrative systems and procedures that are completely alien to them – e.g., filling out forms for unemployment benefits on the Internet that require them to first "log on" and then "upload" a "CV". The limits of formal measurement to bring about positive policy change have long been recognized; when the Victorian-era writer George Eliot was asked why she wrote novels about the lives of the downtrodden rather than contribute to official government reports more formally documenting their plight, she astutely explained that "appeals founded on generalizations and statistics require a sympathy ready-made, a moral sentiment already in activity…" (cited in Gill 1970:10). Forging such a Smithian 'sympathy' and 'moral sentiment' is part of the important antecedent work that renders 'generalizations and statistics' legible and credible to those who might otherwise have no reason for engaging with, or experience interpreting, such encapsulations of reality.

15 Quoted in Cahill (2017: 152).
In the same way that an emphasis on the easy-to-measure can lead to an over-estimation of success, it can also contribute to an under-estimation. Reforms can sometimes yield meaningful change via what McDonnell (2017) calls "the animating spirit of daily practice" but end up being missed because managers do not have good means of measuring, attributing, and enhancing these kinds of shifts. For example, when researching the impact of technical assistance to a large government health program in Nigeria, we found that there were strong indications of important innovations and shifts taking place at the local level, including in aspects as difficult to shift as cultural practices regarding contraceptives (Bridges and Woolcock 2019). These shifts in practice and their impact on contraceptive uptake could not be apprehended by aggregated state-wide indicators, however, and since no measurement was being done below this level, the progress and valuable lessons of such interventions were being missed.

Another example of the importance of having access to a broader suite of data comes from an assessment of a program in rural India seeking to promote participatory democracy in poor communities, where the curation of such a data suite enabled more nuanced and constructive lessons to be drawn (see Rao, Ananthpur, and Malik 2017). The results of the initial randomized controlled trial (RCT) deemed the program to have had no mean impact – and if that was the only data available, that would have been the sole conclusion reached. 16 Upon closer inspection, however, it was learned that there was in fact considerable variation in the program's impact. The average of this variation may have been close to zero, but for certain groups the program had worked quite well, for others it had had no impact, while for still others it had been detrimental. Who were these different groups, and what was it about them that led to such variable outcomes? A companion qualitative process evaluation 17 was able to discern that the key differences were the quality of implementation received by different groups, the level of support provided to them by managers and political leaders, and variations in the nature and extent of local-level inequalities (which in turn shaped which groups were able to participate, and on what terms). The administrative rules and implementation guidelines provided to all groups were identical, but in this case a qualitative process evaluation was able to document the ways and places in which variable fidelity to them yielded widely different outcomes (albeit with no net impact). Moreover, the qualitative data was able to discern subtle positive effects from the program that reliance on the quantitative survey instrument alone would have missed.

16 We fully recognize that, in principle, econometricians have methods available to both identify outcome heterogeneity and the factors driving it. Even so, if local average treatment effects are reported as zero the 'no impact' conclusion is highly likely to be the (only) key take-away message. The primary benefit of incorporating both qualitative and econometric methods is the capacity of the former to identify factors that were not anticipated in the original design (see Rao 2022). In either case, Ravallion's (2001) injunction to "look beyond averages" when engaging with complex phenomena deserves to be heeded by all researchers (and those that interpret the researchers' findings), no matter their disciplinary or methodological orientation.

17 On the use of mixed methods in process evaluations, see Rogers and Woolcock (forthcoming).

2. Measurement becomes conflated with management

An extension of the above point is that undue emphasis on quantitative data can lead to measurement becoming a substitute for rather than a complement to management. This is evident when only that which is quantifiable receives any significant form of managerial attention, an outcome made possible when the easily quantifiable becomes the measure of success, in turn becoming the object of management's focus, typically to the exclusion of all else. As Wilson (1989: 161) famously intoned in his classic study of bureaucratic life, "[w]ork that produces measurable outcomes tends to drive out work that produces immeasurable outcomes". In one sense this is hardly surprising; the need for managers to make decisions on the basis of partial information is difficult and feels risky, so anything that claims to fill that gap and bypass the perceived uncertainty of subjective judgement will be readily welcomed. "The result", Simon Caulkin argues, "both practically and theoretically, is to turn today's management into a technology of control that attempts to minimise rather than capitalise on the pesky human element." 18 And in public administration, a managing-it-by-measuring-it bias can mean that, over time, the bulk of organizational resources is directed away from the "pesky human element" of change processes, despite the fact that it is this element which is often central to attaining any transformational outcomes managers are seeking.

18 https://www.treasurers.org/hub/treasurer-magazine/decision-making-how-make-most-data

This dynamic characterizes key aspects of the Saving One Million Lives (SOML) initiative, an ambitious health sector reform program launched by the Government of Nigeria. The original goal of SOML was to save the lives of one million mothers and children by 2015; to this end, SOML gave priority to a package of health interventions known as 'the six pillars'. 19 The World Bank actively supported SOML, using its "Program for Results" instrument 20 to financially reward Nigerian states based on improvements from their previous best performance on six key indicators. 21 Improvements were to be measured through yearly household surveys providing robust estimates at the state level. In practice, of course, these six pillars (or intervention areas) were wildly different in their drivers and complexity – improvement within them was therefore destined to move at different trajectories and different speeds for different groups in different places. State actors, keen to raise their aggregate measure of success and get paid for it, soon realized that there was some gaming to be done. Our field research documented how the emphasis on singular measures of success introduced a perverse incentive for states to focus on the easier metrics at the expense of the hard (Bridges and Woolcock 2019).
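The incentive arithmetic can be made concrete with a stylized sketch. The US$325,000-per-percentage-point rule follows the Program Appraisal Document wording quoted in note 21 below; the per-indicator "gains per unit of effort" and the historical baseline are invented for illustration only:

```python
# Stylized arithmetic of a pay-for-results formula that rewards the SUM of
# gains across six very different indicators. The $325,000-per-point rule
# follows the Program Appraisal Document wording quoted in note 21 below;
# all other numbers are invented for illustration.
GRANT_PER_POINT = 325_000   # US$ per percentage point above the historical gain
HISTORICAL_AVG_GAIN = 2.0   # assumed average annual gain, in summed points

# Assumed annual percentage-point gain a state can buy with one unit of effort:
# "easy" indicators move quickly, "hard" ones barely move.
gain_per_unit_effort = {
    "Vitamin A supplementation":   3.0,   # easy: campaigns move this fast
    "Pentavalent 3 immunization":  1.5,
    "ITN use by children under 5": 1.2,
    "Skilled birth attendance":    0.4,   # hard: behavioural and system change
    "Contraceptive prevalence":    0.3,
    "PMTCT of HIV":                0.5,
}

def payout(effort_allocation):
    """Grant earned given units of effort allocated to each indicator."""
    total_gain = sum(gain_per_unit_effort[k] * e for k, e in effort_allocation.items())
    return max(0.0, total_gain - HISTORICAL_AVG_GAIN) * GRANT_PER_POINT

balanced = {k: 1.0 for k in gain_per_unit_effort}   # six units of effort spread evenly
gamed = {k: (6.0 if k == "Vitamin A supplementation" else 0.0)
         for k in gain_per_unit_effort}             # same six units, all on the easy metric

print(f"balanced effort : ${payout(balanced):,.0f}")   # $1,592,500
print(f"gamed effort    : ${payout(gamed):,.0f}")      # $5,200,000
```

Under these assumed numbers, the same total effort earns more than three times the grant when concentrated on the indicator that moves fastest, which is precisely the kind of gaming incentive the field research described above documented.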
Interviews with state officials revealed that front-line staff were increasingly focusing their time and energies on those constituent variables that they discerned were easiest to accomplish (e.g., dispensing vitamin supplements) over those that were harder or slower – typically those that involved a plethora of “pesky human elements” – such as lowering maternal mortality or increasing contraceptive use. Thus, in selecting certain outcomes for measurement and managing these alone, others inevitably end up being sidelined. Likewise, a recent report on Results-based Financing (RBF) in the Education Sector (Dom et al. 2020) finds evidence of a “diversion risk” associated with the signposting effect of certain reward indicators, with important areas deprioritized because of the RBF incentive. For example, in Mozambique they find that an emphasis on simple process indicators and focus on targets appears to have led to officials diverting resources and attention away from “more fundamental and complex issues”, such as power dynamics in the school council, the political appointment of school directors, or the teachers’ use of training. Dom et al. also report evidence of “cherry- picking risks”, in which less costly or politically favored subgroups or regions see greater resources, in part because they are more likely to reach a target. For example, in Tanzania, they found evidence that the implementation of school rankings based on exam results was correlated with weaker students not sitting, presumably in an effort by the schools to raise average exam pass rates. This tendency becomes a particular issue when the sidelined outcomes end up being the ones we care most about. Andrew Natsios (2011), the former Director of the United States Agency for 19 The six pillars were (i) Maternal, newborn and child health; (ii) Childhood essential medicines and increasing treatment of important childhood diseases; (iii) Improving child nutrition; (iv) Immunization; (v) Malaria control; and (vi) the Elimination of Mother to Child Transmission (EMTCT) of HIV. 20 A PfR is one of the World Bank’s three financing instruments. Its unique features are that it uses a country’s own institutions and processes, and links disbursement of funds directly to the achievement of specific program results. Where ‘traditional’ development interventions proceed on the basis of ex ante commitments (e.g., to designated ‘policy reforms’, to the adoption of procedures compliant with international standards), PfR-type interventions instead reward attainment of predetermined targets, typically set on the basis of extrapolating from what recent historical trajectories have attained. 21 According to the Program Appraisal Document “each state would be eligible for a grant worth $325,000 per the percentage point gain they made above average annual gain in the sum of six indicators of health service coverage.” The six indicators are: Vitamin A, Pentavalent3 immunization, Use of ITNs by children under 5, Skilled birth attendance, Contraceptive prevalence rate, and Prevention of mother-to-child transmission of HIV. 
International Development (USAID, an organization charged with "demonstrating the impact of every aid cent that Congress approves"), argues compellingly that the tendency in aid and development towards what he called "Obsessive Measurement Disorder" (OMD) is a manifestation of a core dictum among field-based development practitioners – namely "that those development programs that are most precisely and easily measured are the least transformational, and those programs that are most transformational are the least measurable." The change we often desire most is in very difficult-to-measure aspects, such as people's habits, their cultural norms, leadership characteristics, or mindsets. This reality is also aptly illustrated in many anti-corruption efforts, whereby imported solutions have managed to change the easy-to-measure – new legislation approved, more cases brought, new financial systems installed, more training sessions held – but failed to shift cultural norms regarding the non-acceptability of whistle blowing or the social pressures for nepotism (Andrews 2013). A failure to measure and therefore manage these informal drivers of the problem ensures that any apparent reductions in fund abuses tend to be short-lived or illusory. This phenomenon is hardly limited to poor countries. A more brutal example of how what cannot be measured does not get managed, with disastrous results, can be found in the UK's National Health Service. While investigating the effects of competition in the NHS, Propper et al. (2008) discovered that the introduction of inter-hospital competition improved waiting times while also substantially increasing the death rate following emergency heart attacks. The reason for this was that waiting times were being measured (and therefore managed), while emergency heart-attack deaths were not tracked, and were thus neglected by management. The result was shorter waiting times but more deaths, owing to the choice of measure. The authors note that the issue here was not intent, but the extent to which one target consumed managerial attention, to the detriment of all else; as they put it, it "seems unlikely that hospitals deliberately set out to decrease survival rates. What is more likely is that in response to competitive pressures on costs, hospitals cut services that affected [heart-attack] mortality rates, which were unobserved, in order to increase other activities which buyers could better observe" (Propper et al. 2008). More recently, in October 2019 the Global Health Security Index sought to assess which countries were "most prepared" for a pandemic, using a model that gave the highest ranking to the United States and the United Kingdom, largely on the basis of these countries' venerable medical expertise and technical infrastructure, factors which are readily measurable. 22 Alas, the model did not fare so well when an actual pandemic arrived soon thereafter: a subsequent analysis, published in the Lancet on the basis of pandemic data from 177 countries between January 2020 and September 2021, found that "[p]andemic-preparedness indices … were not meaningfully associated with standardised infection rates or IFRs [infection/fatality ratios]. Measures of trust in the government and interpersonal trust, as well as less government corruption, had larger, statistically significant associations with lower standardised infection rates" (Bollyky et al 2022, p. 1).
Needless to say, variables such as ‘trust’ and ‘government corruption’ are (a) hard to measure, (b) hard to incorporate into a single theory anticipating or informing a response to a pandemic, and (c) map awkwardly onto any corresponding policy instrument. For present purposes, the 22 See https://www.statista.com/chart/19790/index-scores-by-level-of-preparation-to-respond-to-an-epidemic/ 12 inference we draw from these findings is not that global indices have no place; rather, it suggests the need, from the outset, for curating a broad suite of data when anticipating and responding to complex policy challenges, the better to promote real-time learning. Doubling down on what can be readily measured limits the space for eliciting those ‘unobserved’ (and perhaps unobservable) factors that may turn out to be deeply consequential. 3. An emphasis on the easy to quantify inhibits understanding of the foundational problem An indiscriminate emphasis on aggregated, quantitative data can erode important nuances about the root causes of the problems we want to fix, thereby hampering our ability to craft appropriate solutions and undermining the longer-term problem-solving capabilities of an organization. All too often the designation of indicators and targets has the effect of causing people to become highly simplistic about the problems they are trying to address. In such circumstances, what should be organizational meetings held to promote learning and reflection on what is working and what is not become instead efforts in accounting and compliance. Reporting, rather than learning, is incentivized and management increasingly focuses on meeting the target numbers rather than solving the problem. Our concern here is that, over time, this tendency progressively erodes an organization’s problem-solving capabilities. The education sector is perhaps the best illustration of this; time and again practitioners have sought to codify “learning” and time and again this has resulted in an obfuscation of the actual causes underlying the problem. In a well-intentioned effort to raise academic performance, “the standards movement” in education promoted efforts hinged on quantitative measurement, as reported in the league tables of the Program for International Student Assessment (PISA). 23 PISA runs tests in mathematics, reading, and science every three years with groups of fifteen- year-olds in countries around the world. Testing on such scale requires a level of simplicity and “standardization”, thus the emphasis is on written examinations and extensive use of multiple- choice tests so that students’ answers can be easily codified and processed (Robinson and Aronica 2015). Demonstrating competence on fundamental learning tasks certainly has its place, but critics have increasingly argued that such tests are based on an incorrect assumption that what drives successful career and life outcomes is the kind of learning that is capable of being codified via a standardized test (Khan 2021, Claxton and Lucas 2015). In reality, the gap between the skills that children learn and are tested for, and the skills that they need to excel in the 21st century, is becoming more obvious. The World Economic Forum noted in 2016 that the traditional learning captured by standardized tests falls short of equipping students with the knowledge they need to thrive. 
24 Yong Zhao, the presidential chair and director of the Institute for Global and Online Education in the College of Education at the University of Oregon, points out that there is in fact an inverse relationship between those countries that excel on PISA tests and those that excel in aspects like entrepreneurism, for example (see Figure 1). 25

Figure 1: The inverse relationship between those countries who excel on PISA tests and those that excel in entrepreneurism

While a focus on assessing learning is laudable – and a vast improvement over past practices (e.g., in the Millennium Development Goals) of merely measuring attendance (World Bank 2018) – for present purposes the issue is that the drivers of learning outcomes are far more complex than a quantifiable content deficit in a set of subjects. This is increasingly the case in the 21st century, which has brought with it a need for new skills and mindsets that go well beyond the foundational numeracy and literacy skills required during the Industrial Revolution (Robinson and Aronica 2015). A survey of chief human resources and strategy officers by the World Economic Forum finds a significant shift between 2015 and 2020 in the top skills future workers will need, with "habits of mind" like critical thinking, creativity, emotional intelligence and problem-solving ranking well ahead of any specific content acquisition. 26 None of this is to say that data does not have a role to play in measuring the success of an educational endeavor. Rather, the data task in this case needs to be informed by the complexity of the problem and the extent to which holistic learning resists easy quantification. 27

23 These tables are based on student performance in standardized tests in mathematics, reading, and science, which are administered by the Paris-based Organisation for Economic Co-operation and Development (OECD).

24 https://www.weforum.org/agenda/2016/03/21st-century-skills-future-jobs-students/; see also https://www.weforum.org/agenda/2016/01/the-10-skills-you-need-to-thrive-in-the-fourth-industrial-revolution: "Whereas negotiation and flexibility are high on the list of skills for 2015, in 2020 they will begin to drop from the top 10 as machines, using masses of data, begin to make our decisions for us. A survey done by the World Economic Forum's Global Agenda Council on the Future of Software and Society shows people expect artificial intelligence machines to be part of a company's board of directors by 2026. Similarly, active listening, considered a core skill today, will disappear completely from the top 10. Emotional intelligence, which doesn't feature in the top 10 today, will become one of the top skills needed by all."

25 http://zhaolearning.com/2012/06/06/test-scores-vs-entrepreneurship-pisa-timss-and-confidence/

26 World Economic Forum 2016 New Vision for Education: Fostering Social and Emotional Learning Through Technology. https://www3.weforum.org/docs/WEF_New_Vision_for_Education.pdf accessed February 2022.

27 Many companies and tertiary institutions are ahead of the curve in this regard.
Recently, over 150 of the top private high schools in the U.S., including Phillips Exeter and Dalton – storied institutions which have long relied on the status conveyed by student ranking – have pledged to shift to new transcripts that provide more comprehensive, qualitative feedback on students while ruling out any mention of credit hours, GPAs, or A–F grades. And colleges – the final arbiters of high school performance – are signaling a surprising willingness to depart from traditional assessments that have been in place since the early 19th century. From Harvard and Dartmouth to small community colleges, more than 70 U.S. institutions of higher learning have weighed in, signing formal statements asserting that competency-based transcripts will not hurt students in the admissions process.

Finally, relying exclusively on high level aggregate data can result in presuming uniformity in underlying problems, and thus lead to the promotion of simplistic and correspondingly generic solutions. McDonnell (2020) notes, for example, that because many developing countries have relatively high corruption scores, an unwelcome outcome has been that all the institutions in the country tend to be regarded by would-be-reformers as similarly corrupt and uniformly ineffectual. In her impressive research on "clusters of effectiveness", however, she offers evidence of the variation in public sector performance within states, noting how the aggregated data on 'corruption' masks the fact that the difference in corruption scores between Ghana's best- and worst-rated state agencies approximates the difference between Belgium (WGI = 1.50) and Mozambique (WGI = –.396), in effect "spanning the chasm of so-called developed and developing worlds." The tendency of reform actors to be guided by simplistic aggregate indicators – such as those that are used to determine a poor country's 'fragility' status and eligibility for IDA funding – has prevented a more detailed and context-specific understanding of lessons that could be drawn from positive outlier cases, 28 or what McDonnell refers to as "the thousand small revolutions quietly blooming in rugged and unruly meadows."

28 See Milante and Woolcock (2017) for a complementary set of dynamic quantitative and qualitative measures by which a given country might be declared a "fragile" state.

4. Pressure to improve select indicators leads to falsification and unwarranted impact claims (thereby jeopardizing the perceived integrity of broader measurement efforts)

As an extension of our previous point regarding how the easy-to-measure can become the yardstick for success, it is important to acknowledge that public officials are often going to be under extreme pressure to demonstrate success in these selected indicators. Once data itself, rather than the more complex underlying reality, becomes the primary objective by which entire governments publicly assess (and manage) their 'progress', it is essentially inevitable that vast political pressure will be placed on these numbers to bring them into alignment with expectations, imperatives, and interests. Similar logic can be expected at lower units of analysis (e.g., field offices), where it tends to be even more straightforward to manipulate data entry and analysis. This in turn contributes to a perverse incentive to falsify or skew data, to aggregate numbers across wildly different variables into single indices, and to draw unwarranted inferences from them. An example of when this risk is particularly acute is when annual "global rankings" are publicly released (assessing, for example, a country's 'investment climate', 'governance', and gender equity), thereby shaping major investment decisions, credit ratings, eligibility for funding from international agencies, and the fate of senior officials charged with "improving" their country's place in these global league tables. Readers will surely be aware of the case at the World Bank in September 2021, when an external review revealed that the 'Doing Business' indicators had been subjected to such pressure, with alterations being made to certain indicators from certain countries. 29 Such rankings are now omnipresent, and if they are not done by one organization then they will inevitably be done by another. Even so, as The Economist magazine concluded, some might regard the 'Doing Business' episode as "proof of 'Goodhart's law', which states that when a measure becomes a target, it ceases to be a good measure." At the same time, it pointed out that there is a delicate dance to be done here, since "the Doing Business rankings were always intended to motivate as well as measure, to change the world, not merely describe it" and "[i]f these rankings had never captured the imagination of world leaders, if they had remained an obscure technical exercise, they might have been better as measures of red tape. But they would have been worse at cutting it." 30 Such are the wrenching trade-offs at stake in such exercises, and astute public administrators need to engage in them with their eyes wide open. Even (or especially) at lower units of analysis, where there are perhaps fewer prying eyes or quality-control checks, the potential is rife for undue influence to be exerted on data used for political and budgetary allocation purposes. Fully protecting the integrity of data collection, collation and curation (in all its forms) should be a first-order priority, but so too is the need for deploying what should be standard 'risk diversification' strategies on the part of managers, namely, not relying on single numbers or methods to assess inherently complex realities.

29 https://thedocs.worldbank.org/en/doc/84a922cc9273b7b120d49ad3b9e9d3f9-0090012021/original/DB-Investigation-Findings-and-Report-to-the-Board-of-Executive-Directors-September-15-2021.pdf

30 https://www.economist.com/finance-and-economics/2021/09/17/how-world-bank-leaders-put-pressure-on-staff-to-alter-a-global-index

IV. Principles for an expansive, qualified data suite that fosters problem-solving and organizational learning

In response to the four risks identified above, we offer a corresponding set of cross-cutting principles for addressing them. Figure 2 summarizes the four risks in the left-hand column and presents the principles as vertical text on the right, illustrating the extent to which the principles, when applied in combination, can serve to produce a more balanced data suite that prioritizes problem-solving and learning.

Figure 2: Four risks with corresponding principles for mitigating them, to ensure a balanced data suite. (Note that the principles are 'cross-cutting', in the sense that they apply in some measure to all the risks, not one-to-one.)

1. Identify and manage the capacity and power dynamics that are going to shape your task

The data collection and curation process takes place not in isolation but in a densely populated political and institutional ecosystem.
It is difficult, expensive, and fraught work; building a professional team capable of reliably and consistently doing this work – from field-level collection, to curation at headquarters, to technical analysis and policy interpretation – will be as challenging as it is in every other public sector organization. Breakdowns can happen at any point, potentially compromising the integrity of the entire endeavor. As such, it is important for managers not just to hire those with the requisite skills but to cultivate, recognize and reward a professional ethos wherein staff are able to do their work in good faith, shielded from political pressure. Such practices, in turn, need to be protected by having clear, open and safe procedures staff can use for reporting undue pressure being placed upon them, complemented by accountability to oversight or advisory boards comprising several external members selected for their technical expertise and professional integrity. In the absence of such mechanisms, noble aspirations for pursuing an 'evidence-based policy' agenda risk becoming perceived as a means of providing merely 'policy-based evidence.'

The contexts within or from which data is collected are also likely to be infused with their own socio-political characteristics. Collecting data on the incidence of crime and violence, for example, requires police to faithfully record such matters and their response to them, but to do so in an environment where there may be strong pressures to under-report such matters, whether out of personal safety concerns, lack of adequate administrative resources, or pressure to show that a given unit's performance is improving (where this is measured by showing a 'lower incidence' of crime). In this respect, good diagnostic work will reveal the contours of the institutional and political ecosystem wherein the data work will be conducted, and the necessary authorization, financing, and protection sought; it will also help managers learn how to understand and successfully navigate that space. 31 32 The inherent challenges of engaging with such issues might be eased somewhat if those closest to them see data deployment not as an end in itself or an instrument of compliance but rather as a means to higher ends, namely learning, practical problem solving, and enhancing the quality of policy options, choices and implementation capability. 33

31 For development-oriented organizations, a set of tools and guidelines for this initial assessment – crafted by USAID and ODI (London) and adopted by certain parts of the World Bank – is 'Thinking and Working Politically Through Applied Political Economy Analysis: A Guide for Practitioners'. https://usaidlearninglab.org/sites/default/files/resource/files/pea_guide_final.pdf
32 Hudson et al (2016) offer a guide for "everyday Political Analysis", which introduces a stripped-back political analysis framework designed to help frontline practitioners make quick but politically informed decisions. It aims to complement more in-depth political analysis by helping programming staff to develop the 'craft' of political thinking in a way that fits their everyday working practices. https://www.dlprog.org/publications/research-papers/everyday-political-analysis
33 On the application of such efforts to the case of policing in particular, see Sparrow (2018).

A related point is that corresponding efforts need to be made to clearly and accurately communicate to the general public those findings that are derived from data managed by public administrators, especially when these findings are contentious or speak to inherently complex issues. This issue has been readily apparent during the Covid-19 pandemic, with high-stakes policy decisions (e.g., requiring vulnerable populations to forgo income) needing to be made on the basis of limited but evolving evidence. Countries such as Vietnam have been praised for the clear and consistent manner in which they issued Covid-19 response guidelines to citizens (Ravallion 2020), but the broader point is that even when the most supported decisions are based on the best evidence generated by the most committed work environments, it remains important for administrators to appreciate that the very acts of large-scale measurement and empirical interpretation, especially when enacted by large public organizations, can be threatening to or misunderstood by the very populations they are seeking to assist.
2. Focus the collection of quantitative data on those aspects closest to the problem

If we wish to guard against the tendency to falsely ascribe success based on the achievement of poorly selected indicators, then we should ensure that any indicators used to claim or deny reform success are as readily operational and close to the service delivery problem as possible. Output and process indicators are useful in their own way, but we should not make the mistake of conflating their achievement with "problem fixed". The tendency to claim reform success based on whether a new mechanism or oversight structure has been created, a new law passed, or a certain percentage of participation achieved often comes with strong institutional incentives, of course, but if meaningful change is sought, these need to be countered. All of these measures are changes in form that, while useful as indicators of outputs being met, can be achieved (and have been in the past) without any attendant functional shifts in the underlying quality of service delivery.

Officials can guard against this tendency by taking some time to ensure that an intervention is focused on specific problems – including those that matter at a local level – and that the metrics by which the intervention's success is judged are accurate measures of those problems being fixed. Tools like the PDIA Toolkit (see below) can help guide practitioners in this process. 34

34 The PDIA Toolkit: A DIY Approach to Solving Complex Problems, https://bsc.cid.harvard.edu/PDIAtoolkit (prepared by Salimah Samji, Matt Andrews, Lant Pritchett and Michael Woolcock), is designed by members of Harvard's Building State Capability program to guide government teams through the process of identifying, deconstructing and solving complex problems. See in particular the section on "Constructing your problem", which guides practitioners through the process of defining a problem that matters and building a credible, measurable vision of what success would look like.

Figure 3 illustrates the step-by-step "problem-driven iterative adaptation" (PDIA) approach, which is designed to help practitioners break down their problems into root causes, identify entry points, search for possible solutions, take action, reflect upon what they have learned, adapt and then act again. By embedding any intervention in such a framework, practitioners can ensure that success metrics are well-linked to functional, locally felt problems. Whatever tool is applied, the goal should be to arrive at metrics of success that represent a compelling picture of the root performance problem being addressed (and hopefully solved).

Figure 3: The Problem-Driven Iterative Adaptation (PDIA) process

So, to return to our education example, metrics such as the number of teachers hired, the percentage of budget dedicated to education, and the number of schools built are all output measures that say nothing about actual learning. Of course, the assumption is that these outputs lead to children learning, but as many recent studies now show, such assumptions are routinely mistaken; these indicators can be achieved even as actual learning regresses (Pritchett 2013, World Bank 2018). By contrast, when a robust measure of learning – in this case literacy acquisition – was applied in India, it allowed implementers to gain valuable insights about which interventions actually made a difference, revealing that teaching to a child's actual level of learning, not their age or grade, led to marked and sustained improvements. Crucially, such outcomes are the result of carefully integrated qualitative and quantitative approaches to measurement (Banerjee et al 2016).
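As a purely illustrative sketch – not part of the PDIA Toolkit itself, and with hypothetical field names and values – the snippet below shows how a team might record a problem deconstruction so that success is judged against the problem (children actually learning) rather than against output indicators:

```python
# A hypothetical sketch (not from the PDIA Toolkit) of how a reform team might
# record a problem deconstruction so that success is judged against the problem
# itself rather than against output or process indicators.
from dataclasses import dataclass, field

@dataclass
class ProblemRecord:
    problem: str                  # the locally prioritized performance problem
    root_causes: list[str]        # candidate causes to probe via entry points
    output_indicators: list[str]  # useful for tracking activity, not success
    problem_metrics: list[str]    # measures that would show the problem shrinking
    review_notes: list[str] = field(default_factory=list)  # reflections per iteration

    def reflect(self, note: str) -> None:
        """Log what was learned in the latest iteration before adapting the plan."""
        self.review_notes.append(note)

record = ProblemRecord(
    problem="Most grade 3 pupils in district X cannot read a simple paragraph",
    root_causes=["teaching pitched at curriculum level, not the child's level",
                 "no routine assessment of reading ability"],
    output_indicators=["teachers hired", "schools built", "share of budget on education"],
    problem_metrics=["share of grade 3 pupils reading at grade level (sample-based)"],
)
record.reflect("Piloted teaching-at-the-right-level groups; reassess literacy next term.")
print(record.problem_metrics)
```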
Going further, various cross-national assessments around the world are trying to tackle the complex challenge of finding indicators that measure learning not just in the acquisition of numeracy, science and literacy skills but in competencies that are proving to be increasingly valuable in the 21st century: grit, curiosity, communication, leadership and compassion. PISA, for example, has included an "innovative domain" in each of its recent rounds, including creative problem solving in 2012, collaborative problem solving in 2015, and global competence in 2018. In Latin America, the Latin American Laboratory for Assessment of the Quality of Education (LLECE) included a module on socio-emotional skills for the first time in its 2019 assessment of sixth grade students, focusing on the concepts of conscience, valuing of others, self-regulation and self-management. 35 Much tinkering remains to be done, but the increase in assessments that include skills and competencies such as citizenship (local and global), social-emotional skills, ICT literacy and problem solving is a clear indication of a willingness to have functional measures of success, capturing outcomes that matter.

35 https://www.globalpartnership.org/sites/default/files/document/file/2020-01-GPE-21-century-skills-report.pdf

In summary, then, those public administrators who wish to guard against unwarranted impact claims and ensure that metrics of success are credible can begin by making sure that the intervention itself is focused on a specific performance problem that is locally prioritized, and thereafter ensure that any judgement on that intervention's success or failure is based not on output or process metrics but on measures of the problem being fixed. And having ensured that measures of success are functional, it is important that practitioners allow flexibility of implementation where possible, so that strategies can shift if it becomes clear from the collected data that they are not making progress on fixing the problem, possibly due to mistaken assumptions in their theory of change.

3. Embrace an active role for qualitative data
The issues we have raised thus far, we argue, imply that public administrators should adopt a far more expansive concept of what constitutes "good data", namely one that includes insights from theory and qualitative research. Apprehending complex problems requires different forms and sources of data; correctly interpreting empirical findings requires active dialogue with reasoned expectations about what outcomes should be attained by when. Doing so helps avoid creating distortions that can generate (potentially wildly) misleading claims regarding "what's going on, and why" and "what should be done". Specifically, we advocate for the adoption of a complementary suite of data forms and sources that favors flexibility, is focused on problem-solving (as opposed to being an end in itself), and values insights derived from seasoned experience.

In the examples we have explored above, it was reliance on a single form of data (sometimes even a single number) that rendered it vulnerable to political manipulation, to unwarranted conclusions, and to being unable to bear the decision-making burdens thrust upon it. More constructively, it was the incorporation of alternative methods and data in dialogue with a reasoned theory of change that enabled decision-makers to anticipate and address many of these same concerns.

To this end, we have sought to get beyond the familiar presumption that the primary role of qualitative data and methods in public administration research (and elsewhere) is to provide distinctive insights into the idiosyncrasies of an organization's "context" and "culture" (and thus infuse some "color" and "anecdotes" for accompanying boxes). 36 Qualitative approaches can potentially yield unique and useful material that contributes to claims about whether policy goals are being met and delivery processes duly upheld (Cartwright 2017); these can be especially helpful when the realization of policy goals requires integrating both adaptive and technical approaches to implementation – e.g., in responding to Covid-19. But perhaps the more salient contributions of qualitative approaches, we suggest, are to (a) help explore how, for whom, and from whom data of all kinds are being deployed as part of broader imperatives to meet political requirements and administrative logics in a professional manner; and (b) elicit either novel or heretofore 'unobserved' variables shaping policy outcomes.

36 As anthropologist Mike McGovern (2011: 353) powerfully argues, taking context seriously "is neither a luxury nor the result of a kind of methodological altruism to be extended by the soft-hearted. It is, in purely positivist terms, the epistemological due diligence work required before one can talk meaningfully about other people's intentions, motivations, or desires. The risk in foregoing it is not simply that one might miss some of the local color of individual 'cases'. It is one of misrecognition. Analysis based on such misrecognition may mistake symptoms for causes, or two formally similar situations as being comparable despite their different etiologies. To extend the medical metaphor one step further, misdiagnosis is unfortunate, but a flawed prescription based on such a misrecognition can be deadly." More generally, see Hoag and Hull (2017) for a summary of the anthropological literature on the civil service. Bailey (2017) provides a compelling example of how insights from qualitative fieldwork help explain the strong preference among civil servants in Tanzania for providing new water infrastructure projects over maintaining existing ones. Though a basic benefit/cost analysis favored prioritizing maintenance, collective action problems among civil servants themselves, the prosaic challenges of mediating local water management disputes overseen by customary institutions, and the performance targets set by the government all conspired to create suboptimal outcomes.
4. Leave room for judgment

Our caution is against using data reductively: as a substitute for managing. Management must be about more than measuring. A good manager needs to be able to accommodate the immeasurable, since so much that is important to human thriving falls into this category; dashboards and the like certainly have their place, but if these were all that was needed then 'managing' could be conducted by machines. We all know from personal experience that the best managers and leaders are those who take a holistic interest in their staff, making the time and effort to understand the subtle, often intangible processes that connect their respective talents. As organizational management theorist Henry Mintzberg (2015) wisely puts it:

Measuring as a complement to managing is a fine idea: measure what you can; take seriously what you can't; and manage both thoughtfully. In other words: If you can't measure it, you'll have to manage it. If you can measure it, you'll especially have to manage it. Have we not had enough of leadership by remote control: sitting in executive offices and running the numbers—all that deeming and downsizing? 37

37 https://mintzberg.org/blog/measure-it-manage-it. Says Mintzberg: "Someone I know once asked a most senior British civil servant why his department had to do so much measuring. His reply: 'What else can we do when we don't know what's going on?' Did he ever try getting on the ground to find out what's going on? And then using judgment to assess that?"

Contrary to the "what can't be measured can't be managed" idea, we can manage the less measurable if we embrace a wider set of tools and leave space for judgment. The key for practitioners is to begin with a recognition that measurability is not an indicator of significance and that professional management involves far more than simply "running the numbers", as Mintzberg puts it. Perhaps the most compelling empirical case for the importance of 'navigating by judgment' in public administration has been made by Honig (2018), who shows – using a mix of quantitative data and case study analysis – that the more complex the policy intervention, the more necessary it becomes to grant discretionary space to front-line managers, and the more necessary such discretion is to achieving project success. Having ready access to relevant, high-quality quantitative data can aid in this 'navigation', but true navigation requires access to a broader suite of empirical inputs.

In a similar vein, Ladner (2015: 3) points out that "standard performance monitoring tools are not suitable for highly flexible, entrepreneurial programs as they assume that how a program will be implemented follows its original design". In order to avoid 'locking in' a theory of change that prevents exploration or responsive adaptation, some practitioners have provided helpful suggestions for how to use various planning frameworks in ways that support program learning. 38 39

38 Teskey (2017) and Wild et al (2017) give examples of an adaptive logframe, drawn from DFID experiences, that sets out clear objectives at the outcome level and focuses monitoring of outputs on the quality of the agreed rapid-cycle learning process.
39 Strategy Testing (ST) is a monitoring system that The Asia Foundation developed specifically to track programs that are addressing complex development problems through a highly iterative, adaptive approach.
The Building State Capability team highlights lighter-touch methods, such as their PDIA "check-ins", which include a series of probing questions to assist teams in capturing learning and maximizing adaptation. Teskey and Tyrrel (2017) recommend participating in regularized formal and informal Review and Reflection (R&R) points, during which a contractor can demonstrate how politics, interests, incentives and institutions were systematically considered in problem selection and design, and in turn justify why certain choices were made to stop, drop, halt or expand any activity or budget during implementation. The common connection across all these tools is that they seek to carve out meaningful space for qualitative data and the hard-won insights born of practical experience.

In summary, then, public administrators can embed the recognition that management must be about more than measuring by, first, recognizing that whatever they choose to measure will inevitably create incentives to neglect processes and outcomes that cannot be measured (or are hard to measure) but are nonetheless crucial for discerning whether, how, where, and for whom policies are working; having recognized this, they need to be very careful about what they choose to measure. Second, they can actively identify what matters but cannot (readily) be measured, and take it seriously, developing strategies to manage that as well. A key part of those strategies will be creating space for judgment: allowing meaningful room for qualitative data inputs and the practical experience of embedded individuals (focus group discussions, case studies, semi-structured interviews, review and reflection points, etc.) and treating these inputs as equally valid alongside more quantitative ones.

In terms of longer-term strategies to manage the immeasurable, administrations can work towards developing organizational systems that foster navigation. Such systems might include, for example, (i) a management structure that delegates high levels of discretion so as to allow those on the ground the ability to navigate complex situations; (ii) recruiting strategies that foster high numbers of staff with extensive context-specific knowledge; and (iii) systems of monitoring and learning that encourage the routine evaluation of theory against practice.
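The sketch below is likewise purely illustrative (the record structure and field names are hypothetical, not a standard instrument): it simply shows how a review point might hold quantitative readings, qualitative inputs, and an explicitly discretionary judgment side by side, so that no single number settles the verdict:

```python
# A hypothetical sketch of the "balanced data suite" idea: a review-and-reflection
# record that deliberately carries quantitative readings, qualitative inputs and a
# reasoned judgment together, so that no single number determines whether a
# program is "working". Field names are illustrative only.
from dataclasses import dataclass

@dataclass
class ReviewPoint:
    indicator_name: str
    indicator_value: float         # the measurable part
    qualitative_inputs: list[str]  # e.g., field interviews, case notes, R&R discussions
    managers_judgment: str         # the explicitly discretionary call
    action: str                    # stop, adapt, or continue

review = ReviewPoint(
    indicator_name="reported crime incidents (quarterly change)",
    indicator_value=-0.12,         # a 12% drop: genuine improvement or under-reporting?
    qualitative_inputs=[
        "station-level interviews suggest pressure to show falling incidence",
        "community focus group reports unchanged levels of petty theft",
    ],
    managers_judgment="treat the drop as unverified; commission an independent spot check",
    action="adapt",
)
print(review.managers_judgment)
```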
V. Conclusion

Quantitative measurement in public administration is undoubtedly a critical arrow in the quiver of any attempt to improve the delivery of public services. And yet, since not everything that matters can be measured and not everything that can be measured matters, a managerial emphasis on measurement alone can quickly and inadvertently generate unwanted outcomes and/or unwarranted conclusions. In the everyday practices of public administration, effective and professional action requires forging greater complementarity between different epistemological approaches to collecting, curating, analyzing, and interpreting data. We fully recognize that this is easier said than done.

The risks of reductive approaches to measurement are not unknown, and yet simplified appeals to "what gets measured gets managed" persist because they offer managers a form of escape from those "pesky human elements" that are difficult to understand, and even more so to shift. Most public administrators might agree in principle on the need for a more balanced data suite with which to navigate their professional terrain, yet such aspirations are too often honored in the breach: under sufficient pressure to 'deliver results', staff from the top to the bottom of an organization can be readily tempted to reverse engineer their behavior in accordance with what 'the data' says (or can be made to say). Management as measurement is tempting for any individual or organization that fears the vulnerability of their domain to unfavorable comparisons with other (more readily measurable and 'legible') domains, the complexity of problem-solving, and the necessity of subjective navigation that it often entails. But given how heavily institutional and socio-political factors shape which data is collected, how well it is collected and curated, and how it can be manipulated for unwarranted purposes, a simplistic approach to data as an easy fix is virtually guaranteed to obscure learning and hamper change efforts.

If administrations genuinely wish to build their problem-solving capabilities, then access to more and better quantitative data will be necessary, but it will not be sufficient. Beginning with an appreciation that much of what matters cannot be (formally) measured, public administrations must routinely remind themselves that promoting and accessing data is not an end in itself: data's primary purpose is not just monitoring processes, compliance, and outcomes, but contributing to problem-solving and organizational learning. More and better data will not fix a problem if the absence of such data is not itself the key problem or the 'binding constraint'. Administrations that are committed to problem-solving will therefore need to embed their measurement task in a broader problem-driven framework, seeking to integrate complementary qualitative data and to value embedded experience so that they might apprehend and interpret complex realities more accurately. Their first priority in undertaking good diagnostic work should be to identify and deconstruct key problems, using varied sources of data, and then to track and learn from potential solutions authorized and enacted in response to the diagnosis. Accurate inferences for policy and practice are not derived from data alone; close interaction is required between data (in various forms), theory and experience. In doing all this, public administrators will help mitigate the distortionary (and ultimately self-defeating) effects of managing only that which is measured.
Box 1: Summary of lessons for promoting problem-solving and learning in public administration

Principle 1: Identify and manage the capacity and power relations that shape your measurement task.
Actions for practitioners:
• Professional principles and standards for collecting, curating, analyzing and interpreting data must be made clear to all staff – from external consultants to senior managers – in order to affirm and enforce commitments to ensuring the integrity of the data itself and the conclusions drawn from it.
• Make measurement accountable to advisory boards with relevant external members.
• Communicate measurement results to the public in a clear and compelling way, especially on contentious, complex issues.

Principle 2: Focus quantification on those aspects which are close to the problem.
Actions for practitioners:
• Make sure that the measurement approach itself is anchored to a specific performance problem.
• Measurement investments should be targeted at those performance problems that are prioritized by the administration.
• Thereafter ensure that any judgements on an intervention's success or failure are based on credible measures of the problem being fixed and not simply on output or process metrics.
• Where measures of success relate to whether the intervention is functioning, practitioners should allow flexibility in the implementation of the intervention (where possible) and in the related measurement of its functioning. In this way, implementation strategies can shift if it becomes clear from the collected data that they are not making progress on fixing the problem.

Principle 3: Embrace a role for qualitative data and a theory of change.
Actions for practitioners:
• Include qualitative data collection as a complement to quantitative data.
• This may be as a prelude to future large-scale quantitative instruments, or perhaps as the only available data option for some aspects of public administration in some settings (such as those experiencing sustained violence or natural disasters).
• Draw on qualitative methods as a basis for eliciting novel or 'unobserved' factors driving variation in outcomes.
• Tie measurement (both qualitative and quantitative) back to a theory of change. If the implementation of an intervention is not having its intended impacts on the problem, assess whether there are mistaken assumptions regarding the theory of change.

Principle 4: Leave room for judgment, discretion and deliberation, because not everything that matters can be measured.
Actions for practitioners:
• Consider carefully what you choose to measure, recognizing that whatever you choose will inevitably create incentives to neglect processes and outcomes that cannot be measured.
• Actively identify what you cannot (readily) measure that matters, and take it seriously, developing strategies to manage that as well.
• This will include identifying those aspects of implementation in the public sector that require inherently discretionary decisions.
• Employ strategies that value reasoned judgement, allowing meaningful space for qualitative data inputs and the practical experience of embedded individuals, treating such inputs as having value alongside more quantitative ones.
• In the longer term, develop organizational systems that foster "navigation by judgment". For example: (i) a management structure that delegates high levels of discretion so as to allow those on the ground the space to navigate complex situations; (ii) recruitment strategies that foster high numbers of staff with extensive context-specific knowledge; and (iii) systems of monitoring and learning that encourage the routine evaluation of theory against practice.

References

Anderson, Benedict (1983) Imagined Communities: Reflections on the Origins and Spread of Nationalism London: Verso
Andrews, Matt (2013) The Limits of Institutional Reform in Development: Changing Rules for Realistic Solutions New York: Cambridge University Press
Andrews, Matt, Lant Pritchett and Michael Woolcock (2017) Building State Capability: Evidence, Analysis, Action New York: Oxford University Press
Bailey, Julia (2017) 'Bureaucratic blockages: Water, civil servants and community in Tanzania' Policy Research Working Paper No. 8101 Washington, DC: World Bank
Ban, Radu and Vijayendra Rao (2008) 'Tokenism or agency? The impact of women's reservations on village democracies in South India' Economic Development and Cultural Change 56(3): 501-530
Banerjee, Abhijit, Rukmini Banerji, James Berry, Esther Duflo, Harini Kannan, Shobhini Mukherji, Marc Shotland, and Michael Walton (2016) 'Mainstreaming an effective intervention: Evidence from randomized evaluations of "Teaching at the Right Level" in India' Working Paper No. w22746 Cambridge, MA: National Bureau of Economic Research
Beraldo, Davide and Stefania Milan (2019) 'From data politics to the contentious politics of data' Big Data & Society 6(2): 2053951719885967
Bollyky, T.J., Hulland, E.N., Barber, R.M., Collins, J.K., Kiernan, S., Moses, M., Pigott, D.M., Reiner Jr, R.C., Sorensen, R.J., Abbafati, C. and Adolph, C. (2022) 'Pandemic preparedness and Covid-19: An exploratory analysis of infection and fatality rates, and contextual factors associated with preparedness in 177 countries, from Jan 1, 2020, to Sept 30, 2021' The Lancet, February 1
Breckenridge, Keith (2014) Biometric State: The Global Politics of Identification and Surveillance in South Africa, 1850 to the Present Cambridge, UK: Cambridge University Press
Bridges, Kate and Michael Woolcock (2017) 'How (not) to fix problems that matter: Assessing and responding to Malawi's history of institutional reform' Policy Research Working Paper No. 8289 Washington, DC: World Bank
Bridges, Kate and Michael Woolcock (2019) 'Implementing adaptive approaches in real world scenarios: A Nigeria case study, with lessons for theory and practice' Policy Research Working Paper No. 8904 Washington, DC: World Bank
Cahill, Jonathan (2017) Making a Difference in Marketing: The Foundation of Competitive Advantage London: Routledge
Carletto, Calogero, Dean Jolliffe, and Raka Banerjee (2015) 'From tragedy to renaissance: Improving agricultural data for better policies' The Journal of Development Studies 51(2): 133-148
Cartwright, Nancy (2017) 'Single case causes: What is evidence and why', in Hsiang-Ke Chao and Julian Reiss (eds.) Philosophy of Science in Practice New York: Springer, pp. 11-24
Cartwright, Nancy and Jeremy Hardie (2012) Evidence-Based Policy: A Practical Guide to Doing it Better New York: Oxford University Press
Claxton, Guy and Bill Lucas (2015) Educating Ruby: What our Children Really Need to Learn New York: Crown House Publishing
Dirks, Nicholas (2011) Castes of Mind Princeton, NJ: Princeton University Press
Dom, Catherine, Alasdair Fraser, John Patch and Joseph Holden (2020) 'Results-Based Financing in the Education Sector: Country-Level Analysis. Final Synthesis Report' Submitted to the REACH Program at the World Bank by Mokoro Ltd.
Duflo, Esther (2012) 'Women empowerment and economic development' Journal of Economic Literature 50(4): 1051-79
Gill, Stephen (1970) 'Introduction', in Elizabeth Gaskell, Mary Barton: A Tale of Manchester Life London: Penguin
Hoag, Colin and Matthew Hull (2017) 'A review of the anthropological literature on the civil service' Policy Research Working Paper No. 8081 Washington, DC: World Bank
Honig, Dan (2018) Navigation by Judgment: Why and When Top-Down Management of Foreign Aid Doesn't Work New York: Oxford University Press
Hostetler, Laura (2021) 'Mapping, registering, and ordering: Time, space and knowledge', in Peter Fibiger Bang, C. A. Bayly and Walter Scheidel (eds.) The Oxford World History of Empire: Volume One: The Imperial Experience New York: Oxford University Press, pp. 288-317
Jasanoff, Sheila (2004) 'Ordering knowledge, ordering society', in Sheila Jasanoff (ed.) States of Knowledge: The Co-Production of Science and the Social Order London: Routledge, pp. 13-45
Jerven, Morten (2013) Poor Numbers: How We Are Misled by African Development Statistics and What to Do About It Ithaca, NY: Cornell University Press
Johnson, Janet Buttolph, Henry T. Reynolds, and Jason D. Mycoff (2019) Political Science Research Methods (9th edition) Thousand Oaks, CA: Sage
Khan, Salman (2012) The One World Schoolhouse: Education Reimagined London: Hodder & Stoughton
King, Gary and Jonathan Wand (2007) 'Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes' Political Analysis 15(1): 46-66
Ladner, Debra (2015) 'Strategy testing: An innovative approach to monitoring highly flexible aid programs' Working Politically in Practice Case Study 3 San Francisco: The Asia Foundation
Lewis, Jenny M. (2015) 'The politics and consequences of performance measurement' Policy and Society 34(1): 1-12
McDonnell, Erin (2020) Patchwork Leviathan: Pockets of Bureaucratic Effectiveness in Developing States Princeton, NJ: Princeton University Press
McGovern, Mike (2011) 'Popular development economics: An anthropologist among the mandarins' Perspectives on Politics 9(2): 345-355
Merry, Sally Engle, Kevin E. Davis, and Benedict Kingsbury (eds.) (2015) The Quiet Power of Indicators: Measuring Governance, Corruption, and Rule of Law New York: Cambridge University Press
Milante, Gary and Michael Woolcock (2017) 'New approaches to identifying state fragility' Journal of Globalization and Development 8(1)
Mintzberg, Henry (2015) https://mintzberg.org/blog/measure-it-manage-it
Natsios, Andrew (2011) 'The clash of the counter-bureaucracy and development' Essay, Center for Global Development. Available at: https://www.cgdev.org/sites/default/files/1424271_file_Natsios_Counterbureaucracy.pdf
Porter, Theodore M. (1995) Trust in Numbers: The Pursuit of Objectivity in Science and Public Life Princeton, NJ: Princeton University Press
Preskill, Hallie, Srikanth Gopal, Katelyn Mack and Joelle Cook (2014) 'Evaluating complexity: Propositions for improving practice' Boston: FSG. http://www.fsg.org/publications/evaluating-complexity
Pritchett, Lant (2013) The Rebirth of Education: Schooling Ain't Learning Washington, DC: Center for Global Development
Pritchett, Lant (2014) 'The risks to education systems from design mismatch and global isomorphism: Concepts, with examples from India' WIDER Working Paper No. 2014/039 Helsinki: UNU-WIDER
Propper, Carol, Simon Burgess and Denise Gossage (2008) 'Competition and quality: Evidence from the NHS internal market 1991-9' The Economic Journal 118(525): 138-170
Rao, Vijayendra (2022) 'Can economics become more reflexive? Exploring the potential of mixed-methods' Policy Research Working Paper No. 9918 Washington, DC: World Bank
Rao, Vijayendra, Kripa Ananthpur and Kabir Malik (2017) 'The anatomy of failure: An ethnography of a randomized trial to deepen democracy in rural India' World Development 99(11): 481-497
Ravallion, Martin (2001) 'Growth, inequality and poverty: Looking beyond averages' World Development 29(11): 1803-1815
Ravallion, Martin (2020) 'Pandemic policies in poor places' CGD Note (April 24) Washington, DC: Center for Global Development
Ridgway, V. F. (1956) 'Dysfunctional consequences of performance measurements' Administrative Science Quarterly 1(2): 240-47
Robinson, Ken and Lou Aronica (2015) Creative Schools: Revolutionizing Education from the Ground Up London: Penguin UK
Rogers, Patricia and Michael Woolcock (forthcoming) 'Process and implementation evaluation methods', in Anu Rangarajan and Diane Paulsell (eds.) Oxford Handbook of Social Program Design and Implementation Evaluation New York: Oxford University Press
Sandefur, Justin and Amanda Glassman (2015) 'The political economy of bad data: Evidence from African survey and administrative statistics' The Journal of Development Studies 51(2): 116-132
Sanyal, Paromita and Vijayendra Rao (2018) Oral Democracy: Deliberation in Indian Village Assemblies New York: Cambridge University Press
Scott, James C. (1998) Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed New Haven, CT: Yale University Press
Sparrow, Malcolm (2018) 'Problem-oriented policing: Matching the science to the art' Crime Science 7(1): 1-10
Teskey, Graham (2017) 'Thinking and working politically: Are we seeing the emergence of a second orthodoxy?' ABT Associates, Governance Working Paper Series
Teskey, Graham and Lavinia Tyrrel (2017) 'Thinking and working politically in large, multi-sector facilities: Lessons to date' Canberra: ABT Associates, Governance Working Paper Series, Issue 2
Weber, Eugen (1976) Peasants into Frenchmen: The Modernization of Rural France, 1870-1914 Palo Alto, CA: Stanford University Press
Wild, Leni, David Booth and Craig Valters (2017) 'Putting theory into practice: How DFID is doing development differently' London: ODI
Wilson, James Q. (1989) Bureaucracy: What Government Agencies Do and Why They Do It New York: Basic Books
World Bank (2018) World Development Report 2018: Learning to Realize Education's Promise Washington, DC: World Bank
World Bank (2021) World Development Report 2021: Data for Better Lives (Chapter 6) Washington, DC: World Bank
Woolcock, Michael (2018) 'Reasons for using mixed methods in the evaluation of complex projects', in Michiru Nagatsu and Attilia Ruzzene (eds.) Contemporary Philosophy and Social Science: An Interdisciplinary Dialogue London: Bloomsbury Academic, pp. 149-171