ACCOUNTABILITY: THE LAST MILE ON THE ROUTE TO QUALITY SERVICE DELIVERY
EVIDENCE FROM JORDANIAN SCHOOLS & PRIMARY HEALTH CENTERS

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ACRONYMS AND ABBREVIATIONS
I. ACCOUNTABILITY AND QUALITY OF SERVICE DELIVERY
   1.1 INTRODUCTION
   1.2 ACCOUNTABILITY AND PROVIDER EFFORT
       1.2.1 Top-down Accountability
       1.2.2 Bottom-up Accountability
       1.2.3 Within-facility Accountability
   1.3 MOTIVATION OF THE PRESENT REPORT
   1.4 REPORT ROADMAP
II. EDUCATION QUALITY, TEACHER EFFORT, AND ACCOUNTABILITY
   2.1 INTRODUCTION
       2.1.1 Teacher Effort
       2.1.2 Holding Teachers Accountable to Increase Teacher Effort
       2.1.3 Roadmap to the Chapter
   2.2 THE EDUCATION SECTOR IN JORDAN
   2.3 PRINCIPAL MONITORING AND TEACHER EFFORT
       2.3.1 Data
       2.3.2 Empirical Strategy
       2.3.3 Results
   2.4 MONITORING, TEACHER EFFORT, AND STUDENT LEARNING IN JORDAN
       2.4.1 Empirical Strategy
       2.4.2 Results
   2.5 COMPARATIVE CASE STUDY IN JORDANIAN SCHOOLS
       2.5.1 Methodology
       2.5.2 Results
   2.6 CONCLUSIONS
III. HEALTHCARE QUALITY, PROVIDER EFFORT, AND ACCOUNTABILITY
   3.1 INTRODUCTION
       3.1.1 Healthcare Provider Effort
       3.1.2 Holding Healthcare Providers Accountable to Increase Provider Effort
       3.1.3 Roadmap to the Chapter
   3.2 THE HEALTH SECTOR IN JORDAN
   3.3 CMO MONITORING AND PROVIDER EFFORT
       3.3.1 Study Sample
       3.3.2 Respondent Selection
       3.3.3 Instruments and Measures
       3.3.4 Administration of the Instruments
       3.3.5 Statistical Analyses
       3.3.6 Results
       3.3.7 Study Limitations
   3.4 CONCLUSIONS
IV. CONCLUSIONS AND POLICY RECOMMENDATIONS
APPENDIX A: EDUCATION SECTOR
APPENDIX B: SENSITIVITY ANALYSIS
APPENDIX C: HEALTH SECTOR

BOXES
Box 1: Sampling
Box 2: Limitations of the Principal Monitoring Index
Box 3: Caveat for Teacher Effort Measures
Box 4: Bivariate Correlations Among Measures of Teacher Effort
Box 5: Multilevel Mediation Analysis
Box 6: Robustness Check and Sensitivity Analysis

FIGURES
Figure 1: Public Education Expenditure as a Share of Total Government Expenditure and Average PISA Math Scores
Figure 2: Public Education Expenditure as a Share of GDP and Average PISA Math Scores
Figure 3: Principal Monitoring Measures
Figure 4: Principal Monitoring Index
Figure 5: Measures of Teacher Effort Mapped Against the FFT
Figure 6: Creating an Environment of Respect and Rapport
Figure 7: Providing Feedback to Students
Figure 8: Designing Student Assessments
Figure 9: Designing Coherent Instruction
Figure 10: Causal Pathways of Principal Monitoring on Student Learning
Figure 11: Letter Sound Knowledge
Figure 12: Reading Comprehension
Figure 13: Number Identification
Figure 14: Word Problems
Figure 15: Life Expectancy: Jordan, MENA Average, and Selected Other Countries, 1980-2011
Figure 16: Infant Mortality Versus Income and Total Health Spending, 2011
Figure 17: Maternal Mortality Relative to Income and Spending, 2010
Figure 18: Total Health Expenditure as a Share of GDP and Income Per Capita, 2011
Figure 19: Relationship Between Monitoring and Rights-Based Care by Sanction Level

TABLES
Table 1: Study Instruments
Table 2: Measures of Teacher Effort
Table 3: Control Variables
Table 4: Substantive Effects – Principal Monitoring and Teacher Effort
Table 5: Measures of Student Outcomes
Table 6: Control Variables Included in the Mediation Analysis
Table 7: Number of Primary Health Facilities Sampled by Governorate
Table 8: Contents of Data Collection Instruments
Table 9: Measures of Provider Effort
Table 10: Measures of Within-facility Accountability
Table 11: Potential Confounding Factors
Table 12: Percentage of Healthcare Providers Following CPGs (N=2,101)
Table 13: Percentage of Providers Practicing Rights-Based Care (N=2,101)
Table 14: Correlations Between Indicators of Provider Effort (N=122)
Table 15: Correlations Between Within-Facility and Top-Down and Bottom-Up Measures of Accountability (N=122)
Table 16: Relationship Between Accountability Practices and Provider Effort (N=122)

ACKNOWLEDGMENTS

This report is the product of a collaborative effort by a core team led by Tamer Rabie and comprising Samira Nikaein Towfighian, Cari Clark, and Melani Cammett.
We sincerely appreciate the strategic guidance and support of Ferid Belhaj, Ernest Massiah, Safaa El Tayeb El-Kogali, Hana Brixi, Pilar Maisterra, Tania Meyer, and Haneen Ismail Sayed. This work would not have been possible without the distinguished collaboration and warm hospitality of government officials at the Ministry of Planning and International Cooperation, Ministry of Education, Ministry of Health, and High Health Council of the Hashemite Kingdom of Jordan. We are especially grateful to Rajaa Khater, Firyal Aqel, Feda Jaradat, and Ikram Khasawneh for their most valuable insights and comments. We are also thankful to all members of the Technical Advisory Committee, who provided excellent guidance and support throughout this study. We are greatly indebted to Alan Potter and Brett Casper for their most capable research assistance, and to Son Nam Nguyen and Dina Abu-Ghaida for their very helpful comments in their capacity as peer reviewers. We are especially thankful to Ellen Lust for her significant intellectual contribution at the early stages of this study. We are also thankful to Samira Halabi for her contributions. This report also benefited from the administrative support of Fatima-Ezzahra Mansouri and Mariam Wakim, the editorial work of Amy Gautam, and the data collection efforts of the Dajani Consulting team. We are very grateful to the United States Agency for International Development (USAID) and to RTI International for the design and implementation of the Early Grade Reading Assessment (EGRA), Early Grade Math Assessment (EGMA), and Snapshot of School Management Effectiveness (SSME) tools, and for sharing their datasets. Finally, we are thankful to the MENA Multi-Donor Trust Fund for funding this work and for their support.

ACRONYMS AND ABBREVIATIONS

CHC     Community Health Committee
CHCC    Comprehensive health centers
CLASS   Classroom Assessment Scoring System
CMO     Chief Medical Officer
CPG     Clinical practice guideline
EGMA    Early Grade Math Assessment
EGRA    Early Grade Reading Assessment
EMIS    Education Management Information System
FFT     Framework for Teaching
GDP     Gross domestic product
HCAC    Healthcare Accreditation Council
JD      Jordanian dinar
MENA    Middle East and North Africa
MOH     Ministry of Health
NCD     Noncommunicable disease
OECD    Organisation for Economic Co-operation and Development
PHCC    Primary healthcare center
PISA    Program for International Student Assessment
PMS     Performance management system
RMS     Royal Medical Service
SD      Standard deviation
SI      Sequential ignorability
TIMSS   Trends in International Mathematics and Science Study
USAID   United States Agency for International Development

ACCOUNTABILITY: THE LAST MILE ON THE ROUTE TO QUALITY SERVICE DELIVERY

EXECUTIVE SUMMARY

International evidence calls for greater attention to provider effort to improve the quality of education and healthcare service delivery. In many developing countries, governments have invested substantial resources in the provision of basic services such as healthcare and education. These investments frequently yield minimal improvements in student learning and health outcomes, however. One reason can be found in a growing body of research suggesting that investment in the structural dimensions of service quality beyond a certain threshold is unlikely to improve service delivery outcomes.
Indeed, the quantity and quality of structural determinants of education and healthcare services, such as infrastructure, classroom and medical supplies, and even teacher and medical training, are largely irrelevant if teachers and healthcare providers do not exert the requisite effort to translate these inputs into effective teaching and medical services. Essentially, providers must exert adequate levels of effort, coming to work regularly and complying with technical and professional standards, to provide high-quality education and healthcare services.

Promoting adequate provider effort necessitates accountability, including effective within-facility accountability, the focus of this report. To exert adequate effort, providers must feel accountable for the quality of the services they provide. Yet a sense of accountability among providers does not necessarily occur naturally; it often requires mechanisms to monitor and incentivize provider effort. These mechanisms can operate top-down, bottom-up, or within the facility. As the name implies, top-down accountability aims to promote provider effort through government oversight. Bottom-up accountability gives citizens the means to hold providers directly accountable. Both approaches play an important role in improving provider accountability. Within the accountability framework, however, the role of supervisors in the facilities where service provision occurs has thus far been underemphasized. By capitalizing on the technical knowledge of supervisors within health centers and schools and on their proximity to the actual service delivery exchange, within-facility accountability may be able to overcome some of the limitations of top-down and bottom-up mechanisms, substantially contributing to improved provider accountability.

This report contributes to addressing this underemphasis, focusing specifically on the linkages between within-facility accountability and provider effort in the health and education sectors in Jordan. In the case of healthcare, a study was developed to generate novel insights from an original survey instrument. Notably, this is the first nationally representative study in Jordan to measure within-facility accountability and provider effort in primary healthcare facilities, and the first study in the Middle East and North Africa (MENA) region to investigate these linkages. The study relies on a nationally representative sample of 122 primary healthcare facilities where data were collected through patient exit interviews and surveys administered to chief medical officers (CMOs), doctors, and nurses who work at the centers and, where available, a representative of the community health committee. In the case of education, an empirical analysis was conducted, relying on existing data collected through principal, teacher, and student surveys, third-party classroom observations and school inventories, and math and reading student assessments from a nationally representative sample of 156 schools. The latter was complemented by a comparative case study of six Jordanian schools using statistical matching and a process-tracing procedure.

Jordan provides an excellent case to study the role of accountability in improving the quality of education and healthcare service delivery.
In the last two decades, Jordan has achieved close to universal primary school enrollment (97 percent) and completion (93 percent), as well as high enrollment (88 percent) and completion (90 percent) rates at the secondary level, on par with OECD countries. Yet international student assessments keep refocusing the country's attention on what actually matters: student learning. In spite of high levels of educational attainment, 15-year-old Jordanians' average mathematics, language, and science PISA (Program for International Student Assessment) scores rank among the lowest of PISA-participating countries. Similarly, grade 8 students' average achievement in both mathematics and science is near the bottom of the list of TIMSS (Trends in International Mathematics and Science Study)-participating education systems. Such indicators are somewhat unexpected given Jordan's internationally comparable expenditure levels in the education sector. Public education expenditure as a share of total government expenditure stood at roughly 10.3 percent in 2012, slightly above the OECD average for that same year (9.8 percent), and on par with, for example, strong PISA performers such as Germany, Austria, and Poland. Further, public education expenditure as a share of GDP was 3.4 percent in 2011, just below the OECD average (5.2 percent), and yet at the same level as Singapore, Japan, and China's administrative regions of Macao and Hong Kong, the top PISA performers.

Similarly, the past two decades have witnessed remarkable progress in improving the health status of the population. Life expectancy at birth increased from 69.9 years in 1990 to 73.7 years in 2012; maternal mortality declined from 86 per 100,000 live births in 1990 to 50 in 2013; infant mortality declined from 34 per 1,000 live births in 1990 to 17 in 2012; and the under-5 mortality rate declined from 39 per 1,000 live births to 21 over the same period. Despite these gains, Jordan's health indicators, especially infant and maternal mortality, suggest that considerable health gains can still be made in terms of quality of care. While one might conclude that the perceived inadequate quality of services in Jordan is driven by limited resources going into the system, the evidence suggests otherwise. In 2011, Jordan's public spending on health as a percentage of GDP stood at 6 percent, almost double the MENA average of 3 percent. This was mirrored in per capita health expenditures, which stood at US$392. This is well above the averages for low- and middle-income countries and for developing countries in the MENA region, although it is not the highest in the region. Jordan stands out within the region, and among countries with similar economies more generally, for its high levels of public health spending.

The evident contrast between Jordan's high spending on education and healthcare services and the somewhat inadequate levels of student learning and health outcomes achieved by the country suggests that the quality production function in Jordan is constrained not by structural inputs but by limitations on how providers translate inputs into services. Thus, this study seeks to understand how within-facility accountability mechanisms can be used to improve service delivery in a country where structural inputs are largely already in place, providing a valuable case study for countries in MENA as well as in other regions.
Evidence from Jordanian schools and primary healthcare centers reveals that the effort teachers and healthcare providers put forth in their jobs is seemingly low. Taking stock of existing education data, the study identifies four substantive measures of teacher effort that are aligned with teachers' professional standards in Jordan, as stipulated by the Kingdom's Civil Service Bureau. Teachers are expected to strive to: (i) provide continuous feedback to students; (ii) respond to students' questions in a way that is conducive to creating a respectful and emotionally supportive environment for learning; (iii) design a range of student assessment methods that provide a variety of performance opportunities for students; and (iv) consider specific student performance and needs while designing lessons. The study finds that the effort teachers put forth in meeting these standards is seemingly low. Only one in five teachers marks all pages of students' copybooks, while roughly 25 percent of teachers mark only a few pages and 3.4 percent do not mark even a single page. When a student is unable to answer a question, students report that as many as 70 percent of teachers simply repeat the exact same question to the same student or ask another student instead, while 5.4 percent of teachers scold the student or send her outside the classroom or to stand in a corner. Moreover, almost two in three teachers report using only one or two methods of student assessment, and as few as one-fourth of all teachers report using these assessments to inform their lesson planning. While these findings are exclusive to teachers in early primary grades, they may be indicative of a wider challenge present across education levels in the Kingdom.

Findings from the analysis of the original data collected in the primary healthcare sector similarly show that provider effort (measured as absenteeism, the expenditure of clinical effort during a patient encounter, the amount of time spent with patients, and the provision of rights-based care) is low in multiple areas. During field visits to health centers, 17 percent of health providers on average were reported absent (both excused and unexcused). Although this average compares favorably with rates found in studies of other, similarly developed countries, it masks substantial variation across facilities: while some clinics were operating fully staffed, others were missing over half of their providers, suggesting gaps in access to care. Based on interviews conducted with patients exiting healthcare facilities, study findings highlight low provider effort during the clinical encounter. On average, health providers performed only half of the key exam elements, suggesting that diagnoses and other health-related decisions are made with limited clinical information. Further, these decisions occur during clinical encounters that last as little as 4 minutes. The average length of an encounter was 10 minutes, but thorough, high-quality, rights-based care is difficult to deliver in the span of 10 minutes, let alone 4. The data substantiated this: shorter encounters were associated with lower clinical effort and a lower likelihood of the provision of rights-based care, although on average, patients reported that they received respectful, responsive, rights-based care.
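For readers interested in how an association of this kind might be checked, the sketch below computes a simple clinical-effort score from exit-interview exam checklists and relates it to encounter length. This is a minimal illustration on simulated data, not the study's actual code: the dataset, column names, and scoring rule are all hypothetical assumptions.

```python
import numpy as np
import pandas as pd

# Simulated exit-interview data: one row per patient encounter.
# Column names and the data-generating process are illustrative assumptions.
rng = np.random.default_rng(0)
n = 500
duration = rng.gamma(shape=2.5, scale=4.0, size=n)  # encounter length in minutes

# Ten binary indicators for whether key exam elements were performed;
# longer encounters are simulated to include more elements.
exam_items = [f"exam_item_{i}" for i in range(1, 11)]
df = pd.DataFrame({
    item: rng.binomial(1, p=np.clip(duration / 20, 0.05, 0.95))
    for item in exam_items
})
df["duration_min"] = duration

# Clinical-effort score: share of key exam elements performed.
df["effort_score"] = df[exam_items].mean(axis=1)

# Association between encounter length and clinical effort.
r = df["duration_min"].corr(df["effort_score"])
print(f"Pearson r(duration, effort) = {r:.2f}")

# Mean effort in very short (<5 minutes) versus longer encounters.
print(df.groupby(df["duration_min"] < 5)["effort_score"].mean())
```

A check like this establishes only an association; it cannot by itself show that lengthening encounters would raise clinical effort.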
Increasing principal and CMO monitoring may yield tangible improvements in provider effort in the workplace. School principals and Chief Medical Officers (CMOs) are well placed to identify low levels of provider effort when they see them: they are trained as teachers and medical doctors, have spent numerous years teaching in the classroom and providing clinical services, and share the same workspace as the teachers and healthcare providers they oversee. This study provides new evidence about the critical role that school principals and CMOs can play in strengthening provider accountability and assisting teachers and healthcare providers to exert the effort needed to provide quality services.

The education study finds that principal monitoring, as measured by a constructed composite index of monitoring practices, is indeed a strong predictor of teacher effort, but that the effect of principal monitoring is a function of principals' ability to observe teacher effort in a given effort area. Further, the study reveals that principal monitoring is strongly associated with student learning, and that this association is mediated by those areas of teacher effort that are observable to the principal. Findings in the health sector mimic those in education. Health providers exert greater effort in examining and treating patients and spend more time with patients when CMOs institute and carry out monitoring procedures at the facility level.
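To make the education finding concrete, the sketch below shows one plausible way to construct a composite monitoring index (standardize several monitoring-practice indicators, then average them) and to estimate its bivariate association with a teacher-effort measure. The variable names and simulated data are illustrative assumptions, not the report's actual index construction.

```python
import numpy as np
import pandas as pd

# Simulated school-level data: binary indicators of principal monitoring
# practices (names are hypothetical, for illustration only).
rng = np.random.default_rng(1)
n = 150
monitoring = pd.DataFrame({
    "reviews_lesson_plans": rng.binomial(1, 0.6, n),
    "observes_classrooms": rng.binomial(1, 0.5, n),
    "checks_teacher_attendance": rng.binomial(1, 0.7, n),
})

# Composite monitoring index: z-score each indicator, then average.
z = (monitoring - monitoring.mean()) / monitoring.std(ddof=0)
index = z.mean(axis=1)

# Simulated teacher-effort outcome, loosely tied to monitoring plus noise.
effort = 0.4 * index.to_numpy() + rng.normal(0.0, 1.0, n)

# Bivariate OLS slope of effort on the index (closed form: cov(x, y) / var(x)).
x = index.to_numpy()
beta = np.cov(x, effort)[0, 1] / np.var(x, ddof=1)
print(f"OLS slope of teacher effort on the monitoring index = {beta:.2f}")
```

In the study itself the relationship is estimated with controls and a multilevel structure (see Chapter II); this sketch conveys only the mechanics of building and using such an index.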
Reaping the highest value from principal and CMO monitoring necessitates a strong incentives environment. Despite the effort gains that are possible through appropriate monitoring, the accountability environment in Jordan's education and health sectors provides very few incentives for teachers and healthcare providers to exert the highest level of effort possible. On one hand, financial incentives to encourage provider effort are absent. At the central level, salary schemes for teachers and healthcare providers are tied only to providers' credentials and years of experience, providing no incentive for providers to perform to their knowledge frontier. At the facility level, principals' and CMOs' limited managerial autonomy and constrained facility budgets preclude the use of financial incentives, while their inability to hire and fire staff curbs the impact of their efforts to bolster provider accountability. On the other hand, principals and CMOs in Jordan seldom rely on nonfinancial mechanisms to incentivize provider effort. And when they do, they mostly make use of mechanisms to sanction, largely underutilizing the potential of positive nonfinancial incentives.

The move toward performance-based education and health systems in Jordan is imperative. The largely adequate structural inputs in Jordan's education and health sectors stand in sharp contrast to the seemingly low effort exerted by teachers and healthcare providers, significantly hindering the country's ability to provide high-quality services. This calls for a move toward performance-based education and health systems in the Kingdom, whereby provider accountability is put at the heart of each sector's reform agenda. Moving toward such performance-based accountability systems requires Jordan to ponder four key considerations: (i) the need to select and establish adequate indicators to measure provider performance; (ii) the need to standardize and systematize the collection of performance indicators; (iii) the need to design effective rewards and sanctions schemes tied to performance indicators to incentivize high provider effort; and (iv) the need to institute mechanisms that hold principals and CMOs themselves accountable, while providing them with the training and managerial autonomy they need to perform their supervisory roles and champion such an important undertaking across the country. Doing so requires a systems approach that integrates performance-based accountability into a larger performance management system in which performance indicators inform the design of strategic professional development programs for providers.

I. ACCOUNTABILITY AND QUALITY OF SERVICE DELIVERY

1.1 INTRODUCTION

In many developing countries, governments have invested vast resources in the provision of basic services such as healthcare and education. Public health facilities and schools now extend across national territories, even in rural areas. In many countries, a booming private sector has emerged to compete with public services, at least for those consumers who can afford them. As a result, citizens enjoy unprecedented access to basic services, particularly in middle-income countries. But the provision of healthcare and education does not guarantee that people receive the correct diagnoses and treatment they require, or develop the literacy, numeracy, and other life skills they need to become productive and informed members of society. In many countries, this is because the quality of social services presents distinct challenges that prevent citizens from obtaining their de jure entitlements to basic schooling or healthcare in the public sector.

At the most fundamental level, the quality of healthcare or education can be disaggregated into several dimensions related to the structure, process, and outcome of the delivery of services (Donabedian 1988; RAND 2012). The structural dimension of quality refers to the material and human resources and the physical and organizational characteristics of the facility where service delivery occurs (Donabedian 1988). This includes the availability and condition of relevant equipment, the level of training among staff members, inputs, supplies, and appropriate infrastructure up and down the supply chain. The process-oriented component of quality includes the technical and interpersonal processes through which services are provided (Donabedian 1988). Process measures assess the degree to which staff members apply their technical knowledge to deliver the service in question in an appropriate and responsive manner, and the extent of provider adherence to guidelines or standards specific to the service delivery type. Finally, outcome measures of quality denote the results of the service exchange. These can include intermediate outcomes, such as utilization of healthcare services or school enrollment rates, as well as metrics that capture physical and financial access to services (Roberts et al. 2008). In addition, at the far end of the spectrum, quality outcomes include human development outcomes such as the health status of patients, student learning, and client satisfaction.
A growing body of research on the quality of both health and education suggests that prioritizing investment in the structural dimensions of quality beyond a certain threshold or ceiling is likely to yield minimal benefits for health and educational outcomes (Das and Hammer 2014; Cristia et al. 2012; Glewwe et al. 2004; Hanushek 2003; RAND 2012). In other words, while investments in physical and organizational structures are needed and desirable, concentrating only on these dimensions and ignoring the incentive environment and its influence on what actually happens in patient-provider or student-teacher interactions will ultimately produce minimal gains in patient health outcomes or student learning (Mitchell et al. 1998; Hanushek 2003). Many developing countries have already invested substantially in the social sectors, and yet human development outcomes have not improved at a commensurate rate (Das and Hammer 2014; Glewwe et al. 2013; RAND 2012). The real challenges to health and educational systems relate to the quality of service delivery, which is less easily measured than expenditures and, as noted above, plays a key role in improving outcomes (World Bank 2003). This calls for a shift in emphasis from "having the right things" (structures) to "doing the right things" (processes) to "having the right things happen" (outcomes) (Mitchell et al. 1998).

Doing the right things requires that teachers and healthcare providers come to work regularly, comply with technical and professional standards, and exert sufficient effort to ensure that community members receive the services required to meet their needs. This assumes a certain level of provider knowledge, without which the quality production function would be compromised (Darling-Hammond 1999; Das et al. 2015; Goldhaber and Brewer 1996). However, what providers are capable of doing, measured through applied knowledge or competence, is oftentimes not predictive of what they actually do in practice, that is, their level of exerted effort or performance (Das et al. 2015; Hanushek and Luque 2003; Hanushek and Rivkin 2006; Kane, Rockoff, and Staiger 2008; Rethans et al. 1991). Traversing the chasm between what providers know and what they actually do, the "know-do gap," is of paramount importance to meet quality standards in the delivery of social services and to impact human development outcomes. Service providers must work to their knowledge frontiers and consistently meet their professional duties and responsibilities. In short, finding ways to increase provider effort is critical to the quality of service delivery in both the education and health sectors (Das and Hammer 2014; Donabedian 1988; RAND 2012).

1.2 ACCOUNTABILITY AND PROVIDER EFFORT

The notion of accountability rests on a relationship in which one party is answerable to another and is liable for his or her actions. In the realm of service delivery, this implies that a doctor or teacher feels an obligation to provide good-quality services and, at a minimum, fulfills the terms of an explicit or implicit set of commitments to the patient or student. Furthermore, the provider is prepared to take responsibility for her actions. Accountable providers are more likely to exert the effort required to carry out their duties effectively, increasing the quality of services delivered (Andrabi, Das, and Khwaja 2014; Björkman and Svensson 2009; Hastings and Weinstein 2008; Pandey, Goyal, and Sundararaman 2009; Pradhan et al.
2014).1

1 Kosack and Fung (2014) present a framework to explain the conditions that shape the effectiveness of interventions to improve provider accountability.

A sense of accountability among providers does not always occur naturally. Rather, it can, and more often than not needs to, be promoted through a variety of mechanisms (Banerjee and Duflo 2006; Chaudhury et al. 2006). Based on a two-dimensional conceptualization (Schedler 1999), effective accountability requires, on the one hand, monitoring and oversight mechanisms that allow one to find facts, generate evidence of actual performance, and prevent underperformance or high performance from going unnoticed. On the other hand, it necessitates mechanisms of enforcement that align incentives to ensure that good performance is rewarded and poor performance sanctioned (Schedler 1999). These two broad types of mechanisms, monitoring and incentives, can increase the likelihood that providers will come to work, adhere to standards and guidelines, and be responsive to client needs toward the provision of good-quality services.

On its own, monitoring, or even just the knowledge that monitoring may occur, can sometimes provide sufficient motivation for teachers, doctors, and other staff members to fulfill their professional obligations (Panagopoulos 2010). Monitoring arrangements can be formal or informal and can take place at multiple levels: by superiors within the facility, community members, or local officials. In the absence of consequences, however, monitoring may not induce behavioral changes. Monitoring is more likely to be effective when coupled with incentives, which can be negative or positive (Willis-Shattuck et al. 2008; World Bank 2004). Sanctions in response to failures to carry out professional duties, or rewards for good performance, provide one set of motivations, for which monitoring is a prerequisite. Sanctions can be financial, as in penalties such as lost wages or benefits, or nonfinancial, such as public reprimands or professional demotions. Rewards can also be financial or nonfinancial. Financial incentives, whether in terms of salary or allowances, play a clear role in motivating providers to carry out their duties and remain in their posts. As elaborated in subsequent chapters, though, a growing body of research attests to the critical, and sometimes even more important, role of nonfinancial incentives in shaping provider effort (Ashraf, Bandiera, and Jack 2014; Francois and Vlassopoulos 2008; Willis-Shattuck et al. 2008; Mathauer and Imhoff 2006). These include a broad array of incentives, such as official recognition for a job well executed or the availability of continuing education and training programs tied to good performance. Managerial techniques, such as providing continuous feedback on staff performance, encouraging new ideas or initiatives, or involving staff in critical decisions that may affect them, can also motivate providers by fostering a positive work environment and enabling professional staff to gain recognition from their superiors, colleagues, peer groups, or communities (Dieleman, Gerretsen, and van der Wilt 2009; Harris, Cortvriend, and Hyde 2007). Effectively monitoring provider performance is essential for enforcing all forms of incentives, whether negative or positive, financial or nonfinancial, to increase provider effort.
To allocate financial and nonfinancial incentives, local health and education officials, managers, and other decision makers must have ways to evaluate staff members. Researchers and policy makers have tested a wide variety of policies, institutional arrangements, and management tools to improve the accountability of service delivery and increase the likelihood that service providers show up for work and adhere to established standards of good practice. These efforts can be classified as top-down, bottom-up, or within-facility approaches.2

2 This classification is aligned with the accountability relationship framework described in the 2004 World Development Report, "Making Services Work for Poor People" (World Bank 2004).

1.2.1 Top-down Accountability

As the name implies, top-down accountability aims to promote provider effort through government oversight. Within the public sector, it includes formal administrative jurisdictions at the national, provincial, district, municipal, village, or local levels, and involves agencies engaged in the provision of services or those charged with the financing or regulation of service providers, whether public or private. Government supervision and regulation of service facilities and their personnel often entail a compact, or an explicit or implicit agreement between the state and providers, to induce doctors or teachers to meet their obligations, usually in return for performance-based rewards or penalties (World Bank 2004). Tools associated with top-down accountability entail official oversight over the performance and output of service facilities, usually by local government officials who then report up to superiors. For example, electronic methods used by local officials to monitor providers' attendance, such as through smartphones or other devices, have been shown to increase staff attendance (Callen et al. 2013; Dhaliwal and Hanna 2014; Banerjee and Duflo 2006). Attaching incentives to monitoring mechanisms, such as performance-based pay schemes, bonuses, promotions, and official recognition by local governments, has proven to induce greater commitment and compliance with standards (Banerjee, Duflo, and Glennerster 2008; Chimhutu, Lindkvist, and Lange 2014; de Walque et al. 2015; Gertler and Vermeersch 2013; Huillery and Seban 2014; Muralidharan and Sundararaman 2011).

Top-down accountability faces challenges, however. In many developing countries, democracies and nondemocracies alike, the state's regulatory capacity is lacking. Even the most well-intentioned government officials may not be able to induce social service providers to fulfill their obligations, either because they lack sufficient information on performance or because they have insufficient means to enforce the terms of a compact. As a result, doctors or teachers may fail to show up to work, underperform in their jobs, mistreat or neglect patients and students, or solicit bribes before they will carry out their basic duties.

1.2.2 Bottom-up Accountability

Given the potential limitations of top-down accountability, bottom-up accountability gives citizens the means to directly hold providers accountable (World Bank 2004). Patients, students, and their families are well placed to monitor their providers since they have the most direct contact with doctors, teachers, or other professional staff at their local service facilities.
Methods of boosting bottom-up accountability generally rely on formal and informal means of exercising citizen or community influence over their providers. For example, community recognition of good providers, which responds to the natural human desire for favorable acknowledgment, is a powerful source of motivation and a relatively low-cost, informal means of inducing improved provider effort (Björkman and Svensson 2009; Panagopoulos 2010). More formal, institutionalized forms of client power are also an option. Local management of facilities, such as through school-based management or health committees composed of community members, introduces a hands-on method of monitoring provider behavior and influencing the operations of schools or health centers. Similarly, community control over budgeting for social expenditures, such as through block grant programs that empower local residents to decide on spending priorities, can potentially induce providers to increase their effort to gain more resources (Olken, Onishi, and Wong 2012).

The potential negative repercussions of poor performance can also shape provider behavior. The threat of exit, introduced by giving clients a choice of providers, may incentivize teachers or health center staff to exhibit greater effort and improve human development outcomes. For example, Couch et al. (1993) find that competition from private schools leads to better test scores in the United States. In the case of healthcare, Bloom et al. (2015) find that opening a new hospital in districts in England increases management performance in existing hospitals. Similarly, greater control over hiring and firing by facility-based committees can reduce provider absenteeism and increase commitment to professional duties (Duflo, Dupas, and Kremer 2015; King and Ozler 2005).

Bottom-up accountability, too, has serious limitations. First, collective action is often hard to achieve among disparate groups of citizens unless pre-established social or personal ties have already brought them together (Lieberman 2003; Singh 2010; Tsai 2007). Second, even when these disparate groups are able to overcome collective action problems and organize, they often lack sufficient technical knowledge about social service sectors, giving them an informational disadvantage in the provider-client relationship (see Akerlof 1970, cited in Das and Hammer 2014). Third, ordinary citizens, rather than political and economic elites, often lack influence over decision makers and officials, limiting their ability to effect change in the behavior of local providers (Blimpo and Evans 2011; Patrinos, Barrera-Osorio, and Fasih 2009; Pradhan et al. 2014). Lastly, even if all of these obstacles can be overcome, initiatives designed to encourage greater citizen participation and control over the allocation of resources are subject to elite capture, limiting the efficacy of citizen voice in promoting provider effort (Dasgupta and Beard 2007; Platteau 2000).

1.2.3 Within-facility Accountability

Within the accountability framework, one aspect has been underemphasized: the role of supervisors in the facilities at the frontlines of service delivery. The way in which accountability is promoted and ensured within schools and health centers is critical to inducing compliance with technical and professional standards and other measures of provider effort, which in turn can improve the quality of health and education service provision.
Thus, mechanisms of promoting provider effort within facilities deserve more attention than they have received thus far in development research, as they may be able to address some of the limitations of both top-down and bottom-up accountability mechanisms. An emphasis on within-facility accountability capitalizes on the technical knowledge of supervisors within health centers and schools and on their proximity to the actual service delivery exchange between providers and clients. These advantages address two issues that may bedevil efforts to build accountability in service delivery, notably the "observability challenge" and the "farther outcome problem."

The Observability Challenge

The observability challenge refers to the inherent difficulty in observing and evaluating what doctors and teachers actually do in their workplaces, and especially in clinical examination rooms and classrooms. Monitoring requires that supervisors have sufficient technical knowledge to distinguish between different levels and types of provider effort, as well as the proximity to observe these periodically. As trained physicians, chief medical officers (CMOs) have the background to assess whether doctors follow proper protocols or prescribe the correct treatment plan. Similarly, principals are qualified to determine whether teachers provide adequate instructional support to students, which requires knowledge of the principles of pedagogy and instructional evaluation. CMOs and school principals are located within facilities, have the mandate to observe the performance of their employees, and possess the technical know-how to interpret what they see. The combination of these attributes therefore gives these facility-level managers unique advantages in fulfilling the monitoring function of accountability.

The Farther Outcome Problem

The farther outcome problem refers to the use of outcomes that are more easily observed and quantified, such as health outcomes or student test scores, to gauge and incentivize provider effort, rather than what actually occurs in clinical examination rooms and classrooms. But the use of such indicators may not be effective in improving the quality of services, for two reasons. First, many factors that influence easily observed and quantifiable outcomes are outside the control of providers. Individual, family, and contextual characteristics all influence students' test scores and patients' health outcomes, making it challenging to attribute changes in these outcomes to service providers' actions. Second, relying on easily observable and quantified outcomes creates incentives that may not promote optimal provider behavior. In the education sector, reliance on test scores as performance assessment criteria can incentivize teachers to "teach to the test" or invest more in test preparation while neglecting actual student learning. In the case of healthcare, reliance on farther outcomes such as patient surveys may incentivize providers to prescribe unnecessary medications to satisfy patients, who lack the knowledge to accurately evaluate provider performance (Das and Hammer 2014).

A focus on within-facility assessments of provider effort can help overcome this problem. Because they are often trained as teachers and have spent numerous years teaching in the classroom before entering school administration, school principals can detect and assess different dimensions of teacher effort when they see them.
Similarly, CMOs have the technical knowledge to rely on more proximate measures of effort that can be generated within facilities by reviewing medical records, observing clinical interactions, or employing other methods. Essentially, within-facility accountability may be able to overcome the observability challenge and the farther outcome problem because it relies on supervisors who are proximate to and knowledgeable of client-provider interactions.

Key tools to promote within-facility accountability entail systems to monitor and incentivize greater provider effort (Dieleman, Gerretsen, and van der Wilt 2009; Harris, Cortvriend, and Hyde 2007; Kabene et al. 2006; West et al. 2006). Monitoring can involve random checks of medical records or verification of teachers' lesson plans. It may also involve joining health providers in clinics or conducting classroom observations. Other tools may incorporate the use of surveys to gauge client satisfaction or systems to track provider absenteeism at the facility (a minimal sketch of such a tracker follows below). Within health centers and schools, managers can institutionalize a variety of positive and negative incentives to encourage doctors or teachers to apply their knowledge and training in clinical interactions or in the classroom, thereby exerting high levels of effort. Positive incentives might include financial rewards and bonuses, if budgets permit. They might also entail nonfinancial approaches, such as fostering workplace satisfaction by building a team-oriented culture, granting staff members greater autonomy in their daily responsibilities, or recognizing staff through "employee-of-the-month" awards and related approaches. Examples of negative incentives are official reprimands and sanctions, the withholding of salaries or imposition of financial penalties, or, at the extreme, the suspension or termination of employment. Research on human resource management indicates that positive incentives are more likely to induce greater provider effort than sanctions, which can backfire by reducing workplace morale (Ashraf, Bandiera, and Jack 2014; Mathauer and Imhoff 2006; Willis-Shattuck et al. 2008).
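As a concrete illustration of the absenteeism-tracking tool mentioned above, the sketch below aggregates a facility's daily attendance log into per-provider absence rates that a principal or CMO could review. The data layout and field names are hypothetical assumptions, not a description of any system in use in Jordan.

```python
import pandas as pd

# Hypothetical daily attendance log kept at a single facility
# (dates, provider IDs, and fields are illustrative assumptions).
log = pd.DataFrame({
    "date": ["2015-03-01"] * 3 + ["2015-03-02"] * 3,
    "provider_id": ["d01", "d02", "n01", "d01", "d02", "n01"],
    "present": [True, False, True, True, True, False],
})

# Per-provider absence rate over the logging period.
absence = 1.0 - log.groupby("provider_id")["present"].mean()
print(absence.sort_values(ascending=False))

# Facility-wide absenteeism, as a supervisor or district office might review it.
print(f"Facility absenteeism: {1.0 - log['present'].mean():.0%}")
```

Even a simple record of this kind serves the monitoring function described above: it generates evidence of actual performance to which rewards or sanctions can then be attached.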
Within-facility accountability does not operate in a vacuum and therefore cannot be entirely divorced from top-down and bottom-up forms of accountability. The engagement of local authorities and the broader community may compel CMOs or school principals to monitor more rigorously and to design and implement programs that elicit greater commitment to professional responsibilities. Indeed, the very fact that the efficacy of a given monitoring or incentive scheme varies across different studies suggests that the context in which implementation occurs moderates its impact (Pritchett and Sandefur 2013). The involvement of local authorities, communities, and management, adaptation to local circumstances, and the active involvement of local staff in identifying and implementing solutions to problems all increase the success of policies aimed at better performance of service facilities in low- and middle-income countries (Christenson and Cleary 1990; Dieleman, Gerretsen, and van der Wilt 2009; Johnson, Monk, and Swain 2000). Thus, the full value of within-facility accountability may be best realized when it works in coordination with top-down and bottom-up forms of accountability.

1.3 MOTIVATION OF THE PRESENT REPORT

Ultimately, doctors, teachers, and other staff members who perform well and devote themselves to fulfilling their duties are accountable providers. The key challenge, then, is to seek effective ways to boost provider accountability. A combination of distinct mechanisms, monitoring and incentives, is likely to yield the most marked improvements in provider performance. These mechanisms can operate at multiple levels through top-down, bottom-up, and within-facility accountability.3

3 Such mechanisms have been reflected upon in the MENA Flagship Report "Trust, Voice, and Incentives: Learning from Local Success Stories in the Middle East and North Africa" (World Bank 2015), which examined the role of trust, incentives, and citizen engagement as critical determinants of service delivery performance in both the health and education sectors in MENA countries. Examining the powerful role of bottom-up accountability mechanisms, the report incorporated two case studies from Jordan (one in the health and another in the education sector) in which communities managed to attain extraordinary outcomes using innovative local solutions to prevailing problems. The present study builds on that endeavor by expounding on the accountability mechanisms within service delivery facilities.

Research on accountability and quality of services has thus far underemphasized within-facility accountability, the focus of this report. Compelling reasons exist to devote further policy attention to this node in the service delivery chain. The managers of service institutions (e.g., CMOs and school principals) are uniquely well situated and qualified to monitor and incentivize higher provider effort given their technical skills, experience, and proximity to the service delivery exchange. Furthermore, to the extent that they entail minimal expense and work with existing human capital, policies to promote within-facility accountability through monitoring and incentives can be cost-effective and feasible. Given that service delivery institutions are located at the nexus of local government offices and communities, within-facility accountability is also affected by external influences, whether from state agencies, civil society organizations, or citizens themselves. Ultimately, the way in which accountability is promoted and ensured within health centers and schools determines how effective these measures will be in increasing provider effort.

This report investigates the linkages between within-facility accountability and provider effort in the health and education sectors in Jordan through an original study in primary healthcare facilities and rigorous analyses of existing data on the education sector in Jordan. The limited contributions of the structural dimensions of quality to human development outcomes, and the extensive resources that Jordan has already invested in its health and educational systems, which are not likely to yield substantial additional payoffs, justify this focus. Indeed, Jordan's social expenditures are relatively high vis-à-vis other countries in the region, with average public expenditures on health and education amounting to between 7 and 8 percent of gross domestic product (GDP) over the past three decades, whereas governments in other Middle East and North Africa (MENA) countries outside the oil-rich Gulf spent between 5.4 and 6.4 percent of GDP on the social sectors in the same period (Cammett et al. 2015). In comparison with other middle-income countries, Jordan also exhibits high social expenditures.
For example, between 1996 and 2013, average public spending on health as a percentage of GDP was about twice as high in Jordan as in the average middle-income country. Such high levels of spending do not necessarily buy superior health outcomes: in 2013, public expenditure on health was 3 percent of GDP in Jordan and only 1.4 percent in Sri Lanka, yet the infant mortality rate was 16.3 per 1,000 and life expectancy at birth was 73.9 years in the former, while the infant mortality rate was 8.7 per 1,000 and life expectancy was 74.2 years in the latter, despite Sri Lanka's lower per capita GDP (World Bank 2015).

The gap between expenditures and outcomes is especially evident in the education sector. Enrollment rates in Jordan are high, but the performance of 15-year-old Jordanians on international assessments reveals that Jordan is one of the lowest-scoring countries participating in the PISA (Program for International Student Assessment) exam (OECD 2012). At the same time, public expenditure on education as a percentage of GDP was 3.4 percent in 2011, below the OECD average of 5.2 percent but on par with levels in countries that performed much more strongly in international assessments, such as Singapore, Japan, and Hong Kong (OECD 2012; WDI 2012). The failure of structural investment to substantially improve outcomes in Jordan appears to hold in the broader MENA region as well. Analysis of the health and education sectors in Jordan thus provides a valuable case study for understanding provider service in the entire region.4 Finding ways to encourage doctors and teachers to fulfill their professional duties and work to their knowledge frontiers promises to yield tangible improvements while entailing minimal additional financial outlays.

1.4 REPORT ROADMAP

Chapter II describes the role of accountability in promoting teacher effort and student learning and provides a brief overview of the Jordanian education sector. The chapter then presents the design, methods, and results of a rigorous empirical study linking accountability mechanisms used by school principals to teacher effort and student outcomes in Jordan. Chapter III first focuses on the role of accountability in improving the delivery of healthcare in the Jordanian health sector and describes the design, methods, and results of an original research study on the relationship between accountability mechanisms used by CMOs and health provider effort in Jordan. Chapter IV builds on the lessons of these original research studies to elaborate a series of policy recommendations aimed at capitalizing on within-facility accountability to improve provider effort and, ultimately, human development outcomes in Jordan.

4 In the broader MENA region, health and education expenditures as a percentage of GDP were cut minimally or remained stable during periods of fiscal austerity, while rank averages of the Human Development Index declined markedly from the 1990s through the 2010s (Cammett et al. 2015). The MENA region devoted a higher percentage of GDP to health during the 1990s than East Asia yet had significantly lower health outcomes (World Bank 2002). Literacy is also lower than expected given income levels. In 2010, adult literacy in the developing countries of the MENA region was 77.9 percent, as compared with 81.4 percent in low- and middle-income countries and 98.3 percent in OECD countries (World Development Indicators).
Indicators of academic performance, such as the Trends in International Mathematics and Science Study (TIMSS), which measures fourth- and eighth-grade student outcomes and is administered internationally every four years to a large sample of countries, as well as the OECD's PISA, indicate that students in the region fare poorly in comparison with students in countries with similar per capita income levels (TIMSS 2007).

References

Akerlof, George A. 1970. "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism." The Quarterly Journal of Economics 84 (3): 488-500.

Andrabi, Tahir, Jishnu Das, and Asim Ijaz Khwaja. 2014. "Report Cards: The Impact of Providing School and Child Test Scores on Educational Markets." Cambridge, MA: Harvard University.

Ashraf, Nava, Oriana Bandiera, and B. Kelsey Jack. 2014. "No Margin, No Mission? A Field Experiment on Incentives for Public Service Delivery." Journal of Public Economics 120: 1-17.

Banerjee, Abhijit, and Esther Duflo. 2006. "Addressing Absence." Journal of Economic Perspectives 20 (1): 117-132.

Banerjee, Abhijit V., Esther Duflo, and Rachel Glennerster. 2008. "Putting a Band-Aid on a Corpse: Incentives for Nurses in the Indian Public Health Care System." Journal of the European Economic Association 6 (2-3): 487-500.

Björkman, Martina, and Jakob Svensson. 2009. "Power to the People: Evidence from a Randomized Field Experiment of a Community-Based Monitoring Project in Uganda." Quarterly Journal of Economics 124 (2): 735-769.

Blimpo, Moussa P., and David K. Evans. 2011. "School-Based Management and Educational Outcomes: Lessons from a Randomized Field Experiment." World Bank Working Paper. Washington, D.C.: World Bank.

Bloom, Nicholas, Carol Propper, Stephan Seiler, and John Van Reenen. 2015. "The Impact of Competition on Management Quality: Evidence from Public Hospitals." The Review of Economic Studies 82 (2): 457-489.

Callen, Michael Joseph, Saad Gulzar, Ali Hasanain, and Muhammad Yasir Khan. 2013. "The Political Economy of Public Employee Absence: Experimental Evidence from Pakistan." SSRN Working Paper 2316245.

Cammett, Melani, Ishac Diwan, Alan Richards, and John Waterbury. 2015. A Political Economy of the Middle East. 4th ed. Boulder, CO: Westview.

Chaudhury, N., J. Hammer, M. Kremer, K. Muralidharan, and F. H. Rogers. 2006. "Missing in Action: Teacher and Health Worker Absence in Developing Countries." Journal of Economic Perspectives 20 (1): 91-116.

Chimhutu, Victor, Ida Lindkvist, and Siri Lange. 2014. "When Incentives Work Too Well: Locally Implemented Pay for Performance (P4P) and Adverse Sanctions Towards Home Birth in Tanzania - A Qualitative Study." BMC Health Services Research 14 (1): 23.

Christenson, Sandra L., and Maureen Cleary. 1990. "Consultation and the Parent-Educator Partnership: A Perspective." Journal of Educational and Psychological Consultation 1 (3): 219-241.

Couch, Jim F., William F. Shughart, and Al L. Williams. 1993. "Private School Enrollment and Public School Performance." Public Choice 76 (4): 301-312.

Cristia, J., P. Ibarrán, S. Cueto, A. Santiago, and E. Severín. 2012. "Technology and Child Development: Evidence from the One Laptop per Child Program." IZA Discussion Paper No. 6401. Bonn, Germany: Forschungsinstitut zur Zukunft der Arbeit GmbH.

Darling-Hammond, Linda. 1999. "Target Time Toward Teachers." Journal of Staff Development 20 (2): 31-36.
Das, Jishnu, Alaka Holla, Aakash Mohpal, and Karthik Muralidharan. 2015. "Quality and Accountability in Healthcare Delivery: Audit Evidence from Primary Care Providers in India." World Bank Policy Research Working Paper 7334. Washington, D.C.: World Bank.

Das, Jishnu, and Jeffrey Hammer. 2014. "Quality of Primary Care in Low-Income Countries: Facts and Economics." Annual Review of Economics 6: 525-553.

Dasgupta, Aniruddha, and Victoria A. Beard. 2007. "Community Driven Development, Collective Action and Elite Capture in Indonesia." Development and Change 38 (2): 229-249. doi: 10.1111/j.1467-7660.2007.00410.x.

de Walque, Damien, Paul J. Gertler, Sergio Bautista-Arredondo, Ada Kwan, Christel Vermeersch, Jean de Dieu Bizimana, Agnès Binagwaho, and Jeanine Condo. 2015. "Using Provider Performance Incentives to Increase HIV Testing and Counseling Services in Rwanda." Journal of Health Economics 40: 1-9. doi: 10.1016/j.jhealeco.2014.12.001.

Dhaliwal, Iqbal, and Rema Hanna. 2014. "Deal with the Devil: The Successes and Limitations of Bureaucratic Reform in India." NBER Working Paper No. 20482. Cambridge, MA: National Bureau of Economic Research.

Dieleman, Marjolein, Barend Gerretsen, and Gert Jan van der Wilt. 2009. "Human Resource Management Interventions to Improve Health Workers' Performance in Low and Middle Income Countries: A Realist Review." Health Research Policy and Systems 7 (7): 1-13.

Donabedian, Avedis. 1988. "The Quality of Care: How Can It Be Assessed?" Journal of the American Medical Association 260 (12): 1743-1748.

Duflo, Esther, Pascaline Dupas, and Michael Kremer. 2015. "School Governance, Teacher Incentives, and Pupil-Teacher Ratios: Experimental Evidence from Kenyan Primary Schools." Journal of Public Economics 123: 92-110. doi: 10.1016/j.jpubeco.2014.11.008.

Francois, Patrick, and Michael Vlassopoulos. 2008. "Pro-Social Motivation and the Delivery of Social Services." CESifo Economic Studies 54 (1): 22-54.

Gertler, Paul, and Christel Vermeersch. 2013. "Using Performance Incentives to Improve Medical Care Productivity and Health Outcomes." NBER Working Paper No. 19046. Cambridge, MA: National Bureau of Economic Research.

Glewwe, Paul, Eric A. Hanushek, Sarah Humpage, and Renato Ravina. 2013. "School Resources and Educational Outcomes in Developing Countries: A Review of the Literature from 1990 to 2010." In Education Policy in Developing Countries, edited by Paul Glewwe. Chicago: University of Chicago Press.

Glewwe, Paul, Michael Kremer, Sylvie Moulin, and Eric Zitzewitz. 2004. "Retrospective vs. Prospective Analyses of School Inputs: The Case of Flip Charts in Kenya." Journal of Development Economics 74 (1): 251-268.

Goldhaber, Dan D., and Dominic J. Brewer. 1996. "Evaluating the Effect of Teacher Degree Level on Educational Performance."

Hanushek, Eric A. 2003. "The Failure of Input-Based Schooling Policies." The Economic Journal 113 (485): F64-F98.

Hanushek, Eric A., and Javier A. Luque. 2003. "Efficiency and Equity in Schools Around the World." Economics of Education Review 22 (5): 481-502.

Hanushek, Eric A., and Steven G. Rivkin. 2006. "Teacher Quality." Handbook of the Economics of Education 2: 1051-1078.

Harris, Claire, Penny Cortvriend, and Paula Hyde. 2007. "Human Resource Management and Performance in Healthcare Organisations." Journal of Health Organization and Management 21 (4-5): 448-459. doi: 10.1108/14777260710778961.

Hastings, Justine S., and Jeffrey M. Weinstein. 2008. "Information, School Choice, and Academic Achievement: Evidence from Two Experiments." Quarterly Journal of Economics 123 (4): 1373-1414.
Huillery, Elise, and Juliette Seban. 2014. "Performance-Based Financing, Motivation and Final Output in the Health Sector: Experimental Evidence from the Democratic Republic of Congo." Discussion Paper 2014-12. Sciences Po.

Johnson, Sally, Martin Monk, and Julian Swain. 2000. "Constraints on Development and Change to Science Teachers' Practice in Egyptian Classrooms." Journal of Education for Teaching: International Research and Pedagogy 26 (1): 9-24.

Kabene, Stefane M., Carole Orchard, John M. Howard, Mark A. Soriano, and Raymond Leduc. 2006. "The Importance of Human Resources Management in Health Care: A Global Context." Human Resources for Health 4 (20): 1-17.

Kane, Thomas J., Jonah E. Rockoff, and Douglas O. Staiger. 2008. "What Does Certification Tell Us About Teacher Effectiveness? Evidence from New York City." Economics of Education Review 27 (6): 615-631.

King, Elizabeth M., and Berk Ozler. 2005. "What's Decentralization Got to Do with Learning? School Autonomy and Student Performance." Kyoto University: Interfaces for Advanced Economic Analysis, DP 54: 51-60.

Kosack, Stephen, and Archon Fung. 2014. "Does Transparency Improve Governance?" Annual Review of Political Science 17: 65-87.

Lieberman, Evan. 2003. Race and Regionalism in the Politics of Taxation in Brazil and South Africa. Cambridge: Cambridge University Press.

Mathauer, Inke, and Ingo Imhoff. 2006. "Health Worker Motivation in Africa: The Role of Non-Financial Incentives and Human Resource Management Tools." Human Resources for Health 4 (1): 1-17.

Mitchell, Pamela H., Sandra Ferketich, and Bonnie M. Jennings. 1998. "Quality Health Outcomes Model." Image: The Journal of Nursing Scholarship 30 (1): 43-46.

OECD. 2012. Program for International Student Assessment 2012. Paris: OECD.

Olken, Benjamin A., Junko Onishi, and Susan Wong. 2012. "Should Aid Reward Performance? Evidence from a Field Experiment on Health and Education in Indonesia." NBER Working Paper. Cambridge, MA: National Bureau of Economic Research.

Panagopoulos, Costas. 2010. "Affect, Social Pressure and Prosocial Motivation: Field Experimental Evidence of the Mobilizing Effects of Pride, Shame and Publicizing Voting Behavior." Political Behavior 32 (3): 369-386.

Pandey, Priyanka, Sangeeta Goyal, and Venkatesh Sundararaman. 2009. "Community Participation in Public Schools: Impact of Information Campaigns in Three Indian States." Education Economics 17 (3): 355-375.

Patrinos, Harry Anthony, Felipe Barrera-Osorio, and Tazeen Fasih. 2009. Decentralized Decision-Making in Schools: The Theory and Evidence on School-Based Management. Washington, D.C.: World Bank.

Platteau, Jean-Philippe. 2000. Institutions, Social Norms, and Economic Development. Amsterdam: Harwood Academic Publishers.

Pradhan, Menno, Daniel Suryadarma, Amanda Beatty, Maisy Wong, Arya Gaduh, Armida Alisjahbana, and Rima Prama Artha. 2014. "Improving Educational Quality Through Enhancing Community Participation: Results from a Randomized Field Experiment in Indonesia." American Economic Journal: Applied Economics 6 (2): 105-126.

Pritchett, Lant, and Justin Sandefur. 2013. "Context Matters for Size: Why External Validity Claims and Development Practice Do Not Mix." Journal of Globalization and Development 4 (2): 161-197.

RAND. 2012. Teachers Matter: Understanding Teachers' Impact on Student Achievement. Santa Monica, CA: RAND Corporation.

Rethans, Jan-Joost, Ferd Sturmans, Riet Drop, Cees Van Der Vleuten, and Pie Hobus. 1991. "Does Competence of General Practitioners Predict Their Performance? Comparison Between Examination Setting and Actual Practice." BMJ 303 (6814): 1377-1380.
Schedler, Andreas. 1999. "Conceptualizing Accountability." In The Self-Restraining State: Power and Accountability in New Democracies, edited by Andreas Schedler, Larry Diamond, and Marc F. Plattner, 13-28. Boulder, CO: Lynne Rienner Publishers.

Singh, Prerna. 2010. "We-Ness and Welfare: A Longitudinal Analysis of Social Development in Kerala, India." World Development 39 (2): 282-293.

TIMSS. 2007. TIMSS 2007 Assessment. International Association for the Evaluation of Educational Achievement (IEA). Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

Tsai, Lily. 2007. Accountability without Democracy: Solidary Groups and Public Goods Provision in Rural China. Cambridge, U.K.: Cambridge University Press.

West, Michael A., James P. Guthrie, Jeremy F. Dawson, Carol S. Borrill, and Matthew Carter. 2006. "Reducing Patient Mortality in Hospitals: The Role of Human Resource Management." Journal of Organizational Behavior 27 (7): 983-1002.

Willis-Shattuck, Mischa, Posy Bidwell, Steve Thomas, Laura Wyness, Duane Blaauw, and Prudence Ditlopo. 2008. "Motivation and Retention of Health Workers in Developing Countries: A Systematic Review." BMC Health Services Research 8 (1): 1-8.

World Bank. 2002. Reducing Vulnerability and Increasing Opportunity: Social Protection in the Middle East and North Africa. Washington, D.C.: World Bank.

World Bank. 2003. Better Governance for Development in the Middle East and North Africa: Enhancing Inclusiveness and Accountability. Washington, D.C.: World Bank.

World Bank. 2004. World Development Report 2004: Making Services Work for the Poor. Washington, D.C.: World Bank.

World Bank. 2015. World Development Indicators 2015. Washington, D.C.: World Bank.

II. EDUCATION QUALITY, TEACHER EFFORT, AND ACCOUNTABILITY

2.1 INTRODUCTION

To improve student learning, researchers and policy makers have strived to better understand the different school factors involved in the education process and to estimate their relative contribution to student learning. These factors include school infrastructure and facilities, classroom supplies, learning materials, the curricula, class size, the school principal, and the teacher, and can be thought of as inputs within the school education production function. In the last decade, an increasing number of rigorous impact evaluations have been conducted to test the individual contribution of many of these inputs to student learning. From supplying instructional flip charts in Kenya (Glewwe et al. 2004), to reducing class sizes in the United States (Krueger 1999; Krueger and Whitmore 2001), Bolivia (Urquiola 2006), Israel (Angrist and Lavy 1999), and India (Banerjee et al. 2007), to equipping schools and students with computers in Peru (Cristia et al. 2012) and Colombia (Barrera-Osorio and Linden 2009), the evidence consistently suggests that these inputs have a very small effect on student learning, if any effect at all. The evidence also points to the fact that among all school inputs, teachers matter the most. In fact, a teacher is estimated to have two to three times the impact of any other school factor on student learning (RAND 2012). Effect sizes attributed to a one standard deviation (SD) increase in teacher quality range from 0.08 SD for reading and 0.11 SD for math (Kane et al. 2008) to as high as 0.26 SD and 0.36 SD for reading and math, respectively (Nye et al. 2004).
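The 0.33 SD figure below follows from a back-of-the-envelope calculation. As a sketch, assuming teacher quality is approximately normally distributed and taking roughly 0.10 SD as the conservative effect of a one SD improvement in teacher quality (the midpoint of the Kane et al. estimates; both simplifications are this document's illustration, not figures stated in the sources):

\[ z_{0.95} - z_{0.05} \approx 1.645 - (-1.645) \approx 3.3 \ \text{SD of teacher quality}, \]
\[ \Delta\,\text{outcomes} \approx 3.3 \times 0.10\ \text{SD} \approx 0.33\ \text{SD}. \]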
In other words, using the more conservative estimates, moving a student from a teacher at the 5th percentile of teacher quality to one at the 95th percentile in the United States increases student outcomes by roughly 0.33 SD. In developing countries, the impact of teacher quality is even larger, with a similar move yielding a 0.85 SD increase in student outcomes (Bau and Das forthcoming). The magnitude of such an effect becomes evident when compared to the effect size associated with a full academic year of instruction – roughly a 0.25 SD increase in test scores (Kane 2004) – and to typical measures of income achievement gaps of 0.7-1 SD (Hanushek and Rivkin 2010). These effects are even more dramatic when one considers that a series of high- or low-quality teachers across the school years compounds them and can lead to unbridgeable gaps in student learning levels (Bruns and Luque 2014).

With teacher quality the single most consequential school input for improving or undermining student learning, a key priority for policy makers and school administrators is to identify the drivers of teacher quality and find ways in which they can be boosted. Contrary to common belief, a recent strand of the education literature has found that high-quality teachers cannot be reliably identified based on easily observable characteristics such as their level of education, certification status, or years of experience – important determinants of teacher pay in many countries. With regard to teachers' level of education, the evidence suggests that having, for example, a master's degree in the United States, or a university degree in over 30 developing countries studied, has no systematic relationship with teacher quality as measured by student outcomes. Moreover, there is little indication that specialized training (in addition to or in place of a university degree) has any impact on student learning (Hanushek and Rivkin 2006; Hanushek and Luque 2003). The picture is no different for teacher certification. A study conducted in New York City public schools found that, on average, the certification status of a teacher (certified, uncertified, or alternatively certified) has at most small impacts on student test scores (Kane, Rockoff, and Staiger 2008). As for teachers' years of experience in the profession, the literature in developed and developing countries alike indicates that experience improves teacher quality in the early years (1-2 years) of teaching; beyond this point, additional experience has no effect on teacher quality (Hanushek et al. 2005; Bau and Das forthcoming). In summary, these observable characteristics together explain no more than 5 percent of the variation in teacher quality (Bau and Das forthcoming).

One explanation is that some teachers do not perform up to their knowledge frontier. Arguably, teachers' education, specialized training, certification, and years of experience, in addition to other inherent characteristics such as talent, shape the upper bound of their knowledge, or their ability to translate a given level of school-related inputs (e.g., the curricula, classroom supplies, learning materials) into learning for their students. What teachers know, however, might not be consistently reflected in what they actually do in the classroom. This "know-do" gap might be a key explanatory factor behind the large and persistent differences in teacher quality among teachers with the same level of education, certification status, and/or years of experience.
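One minimal way to formalize this argument (the notation here is an illustration, not the report's):

\[ q_{it} = k_i - g_{it}, \qquad g_{it} \ge 0, \]

where \( q_{it} \) is the quality teacher \( i \) delivers in period \( t \), \( k_i \) is the knowledge frontier shaped by education, training, certification, experience, and talent, and \( g_{it} \) is the know-do gap. Credentials inform \( k_i \), but because \( g_{it} \) varies with effort, credentials alone explain little of the observed variation in \( q_{it} \).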
In other words, at any given level of education, certification, and/or experience, teachers might exert different levels of effort in their classrooms. As such, with the largest share of countries' education expenditure devoted to teachers (in the form of salaries, education, and certifications), the know-do gap is one of the most significant sources of inefficiency in the education system. At the same time, closing this gap by increasing teachers' effort up to their knowledge frontier is likely to have a very significant impact on student learning.

2.1.1 Teacher Effort

At its most elementary level, low teacher effort can take the form of teacher absenteeism. It is reasonable to assume that teachers know that their timely attendance in class is needed for their students to learn. Yet teacher absenteeism is a significant problem in many countries. Using unannounced visits, nationally representative surveys found that 16 percent of teachers in Bangladesh, 14 percent in Ecuador, 25 percent in India, 19 percent in Indonesia, 11 percent in Peru, and 27 percent in Uganda were absent during normal school hours (Chaudhury et al. 2006). As expected, closing this very elementary know-do gap by increasing teacher effort yields significant effects on student learning. For example, evidence from a randomized controlled trial in rural India shows that reducing teacher absenteeism from 42 percent to 21 percent increases student test scores by 0.17 SD (Duflo, Hanna, and Ryan 2012).

But even when teachers do show up to work, low effort can persist in teachers' allocation of classroom time. Teachers' classroom time can be thought of in terms of three sets of activities with descending levels of effort: instructional activities; classroom management activities (e.g., taking attendance, cleaning the blackboard, or distributing papers); and time spent completely off-task, by being absent from the room or engaging in non-instructional socializing activities (Stallings 1986). Although it is reasonable to assume that teachers know that good practice for classroom time use consists of maximizing instructional time, minimizing classroom management activities, and abstaining from off-task activities, evidence from a number of developing countries suggests large variations in teachers' use of classroom time across schools, which in turn is strongly predictive of student achievement. The single most consistent finding across a sample of schools in Rio de Janeiro, Mexico City, Honduras, Colombia, Jamaica, and Peru is the negative association between time off-task and student achievement. For example, in Rio de Janeiro, classroom observations revealed that the top 10 percent of schools by performance spent an average of 70 percent of classroom time on instruction, 27 percent on classroom management, and only 3 percent off-task. This stands in stark contrast with the bottom 10 percent of schools, which spent only 54 percent of classroom time on instruction and a surprisingly high share of their time on classroom management (39 percent) and off-task activities (7 percent), resulting in students receiving an average of 32 fewer days of instruction per academic year compared to their counterparts in high-performing schools (Bruns and Luque 2014).
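The 32-day figure can be recovered from these time-use shares. As a sketch, assuming an academic year of roughly 200 school days (the statutory minimum in Brazil; the exact year length used in the source calculation is an assumption here):

\[ (0.70 - 0.54) \times 200\ \text{days} = 0.16 \times 200 = 32\ \text{days of instruction}. \]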
Yet as these elementary levels of teacher effort become satisfied (i.e., as the teacher absenteeism rate approaches zero and classroom time spent on instructional activities is nearly optimal), more substantive measures of teacher effort will be needed to understand differences in teacher quality. The case of Jordan is particularly illustrative in this matter. According to administrative data from school principal reports, Jordan benefits from an average teacher absenteeism rate of only 2.6 percent. Moreover, classroom observations suggest that, on average, teachers spend barely 4.5 percent of classroom time on non-instructional activities, and that at virtually no time are teachers observed to be outside of the classroom during their lesson (USAID 2012). Within-country variance in students' PISA math test scores, however, reveals a substantial gap between students at the 10th percentile (290 points) and their counterparts at the 90th percentile (485 points) (OECD 2012). Such variability in student outcomes may point to important differences in teacher quality that are not captured in the elementary measures of teacher effort. Although less evident, low levels of the more substantive dimensions of effort can be prevalent even among teachers with excellent attendance rates who spend an optimal share of time on instructional activities.

A variety of frameworks developed to assess a teacher's classroom instructional practice could help capture more substantive measures of teacher effort. These include the Framework for Teaching (FFT) developed by Charlotte Danielson, the Classroom Assessment Scoring System (CLASS) developed at the University of Virginia, the Mathematical Quality of Instruction (MQI) developed at the University of Michigan and Harvard University, and the Protocol for Language Arts Teaching Observation (PLATO) and the Quality Science Teaching (QST) instrument, both developed at Stanford University. Broadly, these frameworks provide a set of dimensions, such as providing continuous feedback to students and designing coherent instruction, against which to assess teacher practices within the classroom. Evidence from the United States has found correlations between these dimensions – as assessed by qualified observers – and gains in student outcomes as high as 0.18 SD (for FFT) and 0.25 SD (for CLASS). In fact, moving a teacher from the bottom quartile of the distribution on these dimensions to the top quartile corresponds to 0.06 SD and 0.08 SD in student outcome gains as measured by FFT and CLASS, respectively (Gates Foundation 2012). This suggests that teachers may be exerting suboptimal levels of effort in these substantive dimensions.

2.1.2 Holding Teachers Accountable to Increase Teacher Effort

Against this backdrop, increasing teacher effort is a key priority for policy makers aiming to improve student outcomes. Doing so requires holding teachers accountable through, on the one hand, monitoring mechanisms that allow one to establish facts and generate evidence on what teachers actually do, and, on the other hand, mechanisms to incentivize high effort and penalize shirking. Yet monitoring efforts face a critical "observability challenge," described in the subsection below. Incentive schemes also encounter a "farther outcome problem," also explained below, making the exercise of teacher accountability quite challenging.

The Observability Challenge

Monitoring and overseeing teachers' level of effort – accountability's first function – require teacher effort to be observable.
The clearest example comes from India, where an experiment provided teachers with cameras with a tamper-proof date and time function and required them to photograph themselves with their students at the beginning and end of each class. This allowed decision makers to observe teachers' attendance and to make teachers' salaries a function of their attendance rate; teacher absenteeism rates decreased by 21 percentage points (Duflo, Hanna, and Ryan 2012). Yet as elementary levels of teacher effort (such as teacher attendance) are met and more substantive measures of teacher effort become necessary to explain differences in teacher quality, efforts to bring suboptimal levels of effort up to teachers' knowledge frontier face a twofold observability challenge. On one hand, higher technical knowledge is required to distinguish between different levels of teacher effort. While assessing teachers' level of effort through their attendance rate only requires knowledge of whether or not a teacher came to school, a similar assessment based on teachers' level of instructional support to students, for example, necessitates technical knowledge of language modeling, instructional conversation, literacy instruction, richness of instructional methods, concept development, and use of formative assessments (Hamre and Pianta 2007). On the other hand, and closely tied to the knowledge challenge, is the proximity problem. The closer the measure of teacher effort is to the heart of teaching, the harder it is to observe for those outside the classroom and school. It is feasible for school principals, students' parents, school committees, and even decision makers to observe teacher attendance. But even with the necessary technical knowledge, parents and school committees would only be able to observe teachers' level of instructional support to students if they had access to classrooms, which tends not to be the case in many countries. For their part, district supervisors or decision makers, with several schools to oversee, might be able to observe teachers' level of instructional support sporadically, but not in a systematic fashion. Arguably, with both the necessary technical knowledge and the proximity to teachers' classrooms, school principals seem best positioned to fulfill the monitoring function of accountability.

The Farther Outcome Problem

Recognizing the accountability void in many teacher compensation systems, some scholars and policy makers have recently begun to devise powerful incentives for teachers. As with the example of the photographic cameras in India, once decision makers were able to observe teachers' effort (i.e., their attendance rate), they tied rewards and sanctions directly to this level of effort and were able to significantly reduce the absenteeism rate (Duflo, Hanna, and Ryan 2012). Yet with relatively low levels of observability for the substantive measures of teacher effort to which incentives might be tied, policy makers have turned to a farther outcome they can observe: student test scores. That providing teacher incentives will improve student test scores as teachers exert higher levels of effort seems intuitive in principle, but the evidence of its effectiveness thus far is mixed. An experiment in India that provided bonuses to teachers based on their students' test scores increased math and language scores by 0.28 SD, while also increasing the likelihood of teachers assigning homework and classwork and paying special attention to weaker students (Muralidharan and Sundararaman 2011).
In contrast, a similar experiment in Kenya that rewarded schools based on student achievement found that teachers increased test preparation sessions, which resulted in higher scores on multiple-choice tests but had no effect on open-ended tests, suggesting no actual student learning (Glewwe, Ilias, and Kremer 2010). If anything, the findings in this strand of the literature have cautioned about some of the undesirable and perverse practices through which higher test scores can be achieved, void of actual student learning. It is reasonable to assume that, by tying incentives to students' test scores, many teachers will increase their levels of effort only in those activities that are least costly to them and most effective in achieving immediate student gains as measured by test scores. Shielded by the observability challenge, those not close enough to the classroom would again fail to notice these undesirable, perverse practices in time.

The Role of School Principals

The key role that school principals can play in teacher accountability systems by easing the "farther outcome problem" and circumventing the "observability challenge" has been underemphasized thus far. Principals can ease the farther outcome problem inherent in teacher accountability mechanisms used by policy makers ("top-down accountability") and/or school committees and parents ("bottom-up accountability"). As complements to top-down and/or bottom-up accountability, principals' monitoring and enforcement capacities within a school can be a powerful tool for ensuring that student outcomes are a product of increasing levels of effort in desirable teaching practices and, at minimum, for detecting when they are not. School principals are not constrained by the twofold observability challenge. With many of them trained as teachers and having spent numerous years teaching in the classroom before entering school administration, it is reasonable to assume that principals can identify different levels of the substantive aspects of teacher effort when they see them. Further, because they share the same work space as teachers and have the smooth functioning of every classroom in the school as their main job, direct observation of teacher practices in all classrooms within a school is not only technically feasible for them, but also an implicit, continuous responsibility (Bruns and Luque 2014). Principals who leverage this position of visibility by continuously monitoring teachers to become aware of potential know-do gaps in their school's classrooms could effectively contribute to bringing teachers' levels of effort up to their knowledge frontier. In fact, given the sizeable relative contribution of teachers within the school education production function, reducing teachers' know-do gap is potentially the most direct mechanism through which principals can affect student outcomes. Empirical research has shown that, indeed, highly effective principals raise the achievement of a typical student in their schools by between two and seven months of learning in a single school year (Branch, Hanushek, and Rivkin 2013). Yet the pathways through which principals affect student outcomes have been underexplored thus far. The study herein provides new evidence of a strong association between the degree to which principals leverage their position of visibility to monitor teachers and teachers' levels of substantive measures of effort, which in turn are predictive of student learning.
2.1.3 Roadmap to the Chapter

The next section turns to the Jordanian education sector. An overview of the education system in Jordan indicates the value of focusing on teacher effort to improve student outcomes in the country. Section 2.3 uses evidence from a nationally representative sample of 156 schools in Jordan to test the association between principal monitoring and teacher effort. Section 2.4 specifically tests teacher effort as the pathway through which principal monitoring affects student learning. Finally, Section 2.5 provides complementary evidence from a comparative case study to ease potential endogeneity concerns.

2.2 THE EDUCATION SECTOR IN JORDAN

In the last two decades, Jordan has achieved close to universal primary enrollment (97 percent) and completion (93 percent), as well as high enrollment (88 percent) and completion (90 percent) rates at the secondary level that are on par with OECD countries. Yet international student assessments keep refocusing the country's attention on what actually matters: student learning. In spite of high levels of educational attainment, 15-year-old Jordanians' average PISA mathematics, language, and science scores rank among the lowest of PISA-participating countries and economies (OECD 2012). Similarly, grade 8 students' average achievement in both mathematics and science is near the bottom of the list of TIMSS-participating education systems (IEA 2011). These indicators are somewhat unexpected given Jordan's internationally comparable expenditure levels in the education sector. Public education expenditure as a share of total government expenditure stood at roughly 10.3 percent in 2012 (World Bank 2016), slightly above the OECD average for that same year (9.8 percent) and on par with, for example, strong PISA performers such as Germany, Austria, and Poland (Figure 1). Further, public education expenditure as a share of GDP was 3.4 percent in 2011 (World Bank 2016), just below the OECD average (5.2 percent), and yet at the same level as Singapore, Japan, and China's administrative regions of Macao and Hong Kong – the top PISA performers (Figure 2). Average PISA scores mask considerable within-country variability in student learning, however. For example, students in the top quartile in mathematics score as high as the OECD average, while those in the bottom quartile perform worse than their counterparts in all participating countries but Peru and Qatar (OECD 2012). The picture is not very different for language and science scores, suggesting sizeable inequalities in student learning. These inequalities are unlikely to be attributable solely to differences in students' background and socioeconomic status. In fact, the difference in Jordanian students' mathematics performance associated with a one-unit increase in the PISA index of economic, social, and cultural status (ESCS) is one of the smallest among PISA-participating countries and economies (OECD 2012).
Figure 1: Public Education Expenditure as a Share of Total Government Expenditure and Average PISA Math Scores. [Scatter plot of average PISA math scores against public education expenditure as a share of total government expenditure; Jordan scores below the OECD average despite an expenditure share above the OECD average.] Source: OECD 2012; WB 2012. Note: 2011 expenditure data were used for Estonia, Canada, Poland, Belgium, Germany, Austria, Ireland, Slovenia, Denmark, Czech Republic, United Kingdom, Iceland, Norway, Portugal, Italy, Slovakia, United States, Sweden, Hungary, Croatia, Israel, Cyprus, Bulgaria, Malaysia, Mexico, and Uruguay.

This suggests that important drivers of the inequality in student learning might be found within schools. Yet an examination of the school production function suggests no sizable insufficiencies in structural inputs.5 School observations conducted in a nationally representative sample of schools in the country reveal basic school infrastructure to be almost universal, with 100 percent of schools having a source of electricity (97.4 percent functioning on the day of the visit), all schools having working toilets or latrines (88.2 percent found to be very or somewhat clean), and roughly 90 percent of schools having a working drinking water source. Similarly, widespread availability of resources is found inside Jordanian classrooms, with 96.7 percent equipped with a blackboard/whiteboard and 97.7 percent with chalk/markers. Classroom inventories also reveal that 99.6 percent of students are provided with a desk or bench/chair arrangement, and almost all students have an Arabic language textbook (99.3 percent), a math textbook (97.7 percent), and a pen or pencil to write with (99.2 percent) (USAID 2012). As for teachers – with a cadre of over 80,000, the most important school input – average class size stands at 27,6 on par with top PISA performers such as Korea and Japan, and just slightly above the OECD average of roughly 21 students per classroom (OECD 2012). Teacher educational attainment is adequate and without substantial variability: the vast majority of teachers (83 percent) have a bachelor's degree or higher diploma, 12 percent have a diploma, and roughly 5 percent have a postgraduate degree (USAID 2012).

5 Exogenous shocks, namely the recent incorporation of many refugee students into the education system due to the ongoing conflict in the Middle East, may have affected the resilience of the education system and, with it, education inputs and/or outputs in localized areas. This exogenous shock, with its corresponding inflow of inputs in the form of international aid, occurred after the data used in this study were collected and is thus outside the scope of this study.

6 As would be expected, there is an urban-rural divide, as well as regional variation in class size, ranging from an average class size of 17 in the rural South Region to 31 in the urban Center Region (World Bank 2016).
Further, available evidence on teacher training courses suggests a balance between pedagogical theory and methods, on the one hand, and subject matter knowledge, on the other: prospective primary school teachers spend 18 percent of their total training on pedagogical theory and methods, 27 percent of their time on mathematics, science, and language (9 percent each), and the remaining 55 percent of their time divided among six other subjects (social studies, English, computer science, art, physical education, and Islamic learning) (World Bank 2010). With adequate levels of structural school inputs and class size, and an arguably satisfactory teacher knowledge frontier, the value of focusing on teacher effort to improve and bridge the gap in student learning becomes evident.

Are teachers in Jordan performing up to their knowledge frontier? Available evidence indicates that Jordan benefits from an average teacher absenteeism rate of only 2.6 percent,7 on par with international standards. Moreover, classroom observations suggest that, on average, teachers spend barely 4.5 percent of classroom time on non-instructional activities, and that at virtually no time are teachers observed to be outside of the classroom during their lesson – again, on par with international standards (USAID 2012). Yet important differences in teacher quality may not be captured in these elementary measures of teacher effort. Section 2.4 identifies four substantive measures of teacher effort in Jordan that are predictive of student learning, and for which there is significant variability across teachers in the country.

7 This is the teacher absenteeism rate as reported by school principals, which might underestimate the actual rate by masking excused absences allowed under certain conditions by the principal. Teacher absenteeism rates measured by external observers through unannounced visits are not available for Jordan.

Figure 2: Public Education Expenditure as a Share of GDP and Average PISA Math Scores. [Scatter plot of average PISA math scores against public education expenditure as a share of GDP; Jordan falls below the OECD average on scores at an expenditure share comparable to several top performers.] Source: OECD 2012; WB 2012. Note: 2011 expenditure data were used for the OECD average, Poland, Belgium, Germany, Austria, Denmark, Czech Republic, United Kingdom, Iceland, Norway, Portugal, Italy, Slovakia, United States, Lithuania, Sweden, Hungary, Croatia, Israel, Cyprus, Malaysia, Mexico, Uruguay, and Colombia.

2.3 PRINCIPAL MONITORING AND TEACHER EFFORT

Is stronger principal monitoring associated with higher teacher effort? The first hypothesis of this study is that higher levels of principal monitoring are associated with higher levels of teacher effort. To test this hypothesis, a multilevel model is estimated using data from a nationally representative sample of schools in Jordan. The results suggest that principal monitoring is indeed a strong predictor of teacher effort, but the estimates also suggest that the effect of principal monitoring is a function of principals' ability to observe teacher effort in a given effort area. The characteristics of the data, the empirical strategy used, and the results from the analysis are detailed below.

2.3.1 Data

The empirical analysis relies on data collected under the Student Performance in Reading and Mathematics, Pedagogic Practice, and School Management Study conducted by USAID in Jordan. Data were collected for a nationally representative sample comprising 156 schools, and field work was completed at the end of May 2012. Sampling was carried out in three stages to minimize bias and to ensure that the sample approximates the wider population characteristics as closely as possible (see Box 1 for sampling details).

Box 1: Sampling

In the first sampling stage, all primary schools listed in Jordan's Education Management Information System (EMIS) were stratified by region (North, Central, and South) and school gender (all-boys, all-girls, and mixed schools), thus forming nine different strata. A random sample of schools was then selected proportional to the combined grade 2 and grade 3 enrollments as reported by the EMIS. This procedure resulted in a total of 156 randomly sampled schools.

During the second stage, classes/teachers were sampled within each selected school. In a given school, one grade 2 class was selected at random from all the existing grade 2 classes (each with an equal probability of selection). The selection process was repeated for the third grade within each school, thus creating a sample of 156 randomly selected grade 2 classes and 156 randomly selected grade 3 classes.

The third sampling stage randomly selected 10 students within each class who were present on the day of the fieldwork. This process resulted in 1,529 randomly selected grade 2 students and 1,534 randomly selected grade 3 students.

For each school in the sample, the principal (or the assistant principal if the principal was not available) was automatically chosen to complete the School Principal Questionnaire as well as the School Observation Instrument. For each selected classroom, an external evaluator completed the Classroom Inventory Instrument, and the classroom's teacher was automatically chosen to complete the Teacher Questionnaire. Finally, each student in the sample completed the Student Questionnaire, the Early Grade Math Assessment (EGMA), and the Early Grade Reading Assessment (EGRA) instruments. Table 1 summarizes the final count of completed instruments.

Table 1: Study Instruments

Instrument | Level of administration | Total number of instruments completed
School Principal Questionnaire | School/principal | 156
School Observation Instrument | School/principal | 156
Teacher Questionnaire | Class/teacher | 306
Classroom Inventory Instrument | Class/teacher | 306
Student Questionnaire | Student | 3,063
Early Grade Math Assessment (EGMA) | Student | 3,063
Early Grade Reading Assessment (EGRA) | Student | 3,063

2.3.2 Empirical Strategy

To test the relationship between principal monitoring and teacher effort, a dataset is constructed by collapsing all school-, teacher-, and student-level variables in the USAID dataset at the teacher level. Next, a multilevel linear model that allows the intercept in the regression equation to vary by directorate and school is estimated. Such a model accounts for the hierarchical nature of the data, which is nested into four groupings – teachers, schools, directorates, and governorates.
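The report does not write out the estimating equation. One plausible formalization of the random-intercept specification described above is the following (the control vector and the normality assumptions are illustrative additions, not taken from the study):

\[ \text{Effort}_{tsd} = \beta_0 + \beta_1\,\text{Monitoring}_{tsd} + \boldsymbol{\gamma}'\mathbf{X}_{tsd} + u_d + v_{sd} + \varepsilon_{tsd}, \]
\[ u_d \sim N(0, \sigma_u^2), \qquad v_{sd} \sim N(0, \sigma_v^2), \qquad \varepsilon_{tsd} \sim N(0, \sigma_\varepsilon^2), \]

where \( t \) indexes teachers, \( s \) schools, and \( d \) directorates; \( u_d \) and \( v_{sd} \) are the random intercepts at the directorate and school levels; and \( \mathbf{X}_{tsd} \) is a vector of teacher-, school-, and student-level controls.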
Constructing a Principal Monitoring Index

A composite measure of a principal's level of teacher monitoring is constructed using: (a) a measure of the frequency with which a principal observes teachers in the classroom; and (b) an indicator of the frequency with which she reviews teachers' lesson plans. Both measures are good indicators of the extent to which principals meet their key role of "following up with staff's daily performance," as stipulated by the Civil Service Bureau.8

8 Jordan's Civil Service Bureau stipulates seven specific roles for school principals, against which the latter are evaluated in their Annual Performance Record. These are: (1) following up with staff's daily performance; (2) developing and enabling school staff and ensuring the provision of an appropriate learning environment; (3) understanding and complying with the philosophy and core values of the education system; (4) organizing the school council's meetings and activating the engagement of the local community; (5) cooperating with the supervisors to improve the teachers' performance; (6) providing school supplies to ensure effective procedures; and (7) enhancing education concepts and codes of conduct for staff and students.

a. Observing teachers in the classroom

Through a teacher survey, teachers were asked to recall how often their principals observed their teaching. Answers were recorded on a seven-point scale: (0) never, (1) once a year, (2) once every 2-3 months, (3) once every month, (4) once every 2 weeks, (5) once every week, and (6) daily. Figure 3 shows the distribution of this measure in the sample. The largest shares of teachers (33.6 percent and 23.7 percent) reported that the principal conducted classroom observations once every 2-3 months or once every month, respectively. Five percent reported never being observed by the principal when teaching, and only 0.7 percent reported being observed every single day.

b. Checking teachers' lesson plans

Teachers were also asked to recall how often their principals checked their lesson plans. Answers were recorded on the same seven-point scale; the distribution is also shown in Figure 3. Most teachers (71.5 percent) reported that their lesson plans are checked by the principal once every week. Roughly 6 percent reported this to be a daily occurrence, while 2 percent of teachers recalled that this had never happened.

Figure 3: Principal Monitoring Measures. [Bar chart of the share of teachers reporting each frequency of monitoring, from never to daily, for the two measures: principal checks lesson plans and principal conducts classroom observations.]

On its own, each of these measures provides information about one very specific type of monitoring mechanism used by principals. The question at hand, however, calls for an independent variable that provides a reasonable measure of the overall monitoring environment created by principals. As such, a composite measure – referred to hereafter as the Principal Monitoring Index – is constructed by adding the two measures together; it ranges from 0 to 12. A value of 0 means a teacher reported that the principal never checks lesson plans or observes teaching; a value of 12 means a teacher reported that the principal observes teaching and checks lesson plans every day.
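As a minimal sketch of this construction, assuming the two survey items are already coded on the 0-6 frequency scale described above (the function and variable names are illustrative, not drawn from the study's codebook):

    def principal_monitoring_index(observation_freq, lesson_plan_freq):
        """Sum the two 0-6 frequency codes (0 = never ... 6 = daily),
        yielding the Principal Monitoring Index on a 0-12 scale."""
        for code in (observation_freq, lesson_plan_freq):
            if not 0 <= code <= 6:
                raise ValueError("frequency codes must lie between 0 and 6")
        return observation_freq + lesson_plan_freq

    # Example: observed once a month (3), lesson plans checked weekly (5) -> 8.
    print(principal_monitoring_index(3, 5))  # 8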
The composite measure provides a reasonable proxy for the overall monitoring environment since frequent teaching observation and lesson plan checking are likely to be correlated with frequent monitoring in other areas.9 The distribution of this composite measure, shown in Figure 4, reveals that almost 70 percent of schools have a Principal Monitoring Index between 6 and 8, while 9 percent of schools score below 6 on the index, and 21 percent have an outstanding monitoring index above 8.

9 The two component measures complement each other; this complementary relationship helps make the Principal Monitoring Index a reasonable proxy for the true monitoring environment within a given school.

Figure 4: Principal Monitoring Index. [Bar chart of the distribution of the Principal Monitoring Index (0 to 12) across principals; the mass is concentrated between 6 and 8, peaking at 29.0 percent.]

Box 2: Limitations of the Principal Monitoring Index

An important shortcoming of this index is that it only captures the monitoring function of principal accountability. Arguably, principal accountability requires that principals both monitor teacher performance and, as a function of the information they gather through monitoring, reward or penalize teachers to incentivize higher effort. The dataset, however, provides no good proxy for incentives, thus leading this study to limit its independent variable of interest to principal monitoring. Section 2.5 addresses this limitation, presenting results from a qualitative study aimed at disentangling the effect that each of the two accountability functions – monitoring and enforcement – could potentially have on teacher effort.

Measuring Teacher Effort

To guide the selection of the study's dependent variables, four measures of teacher effort were identified from the USAID dataset. These measures match the teacher professional standards in Jordan, as stipulated by the Civil Service Bureau, which in turn map onto three domains of the Framework for Teaching (FFT)10 – an internationally used and comprehensive framework developed by education expert Charlotte Danielson to assess teachers' practices. The first variable – Creating an Environment of Respect and Rapport – matches the national teacher standard on treating students with courtesy and falls under FFT's Classroom Environment Domain. The second variable – Providing Feedback to Students – matches the teacher standard on grading students' assignments and falls under FFT's Instruction Domain. The third variable – Designing Student Assessment – matches Jordan's teacher standard on using effective educational strategies and evaluation methods and falls under FFT's Planning and Preparation Domain. Finally, the fourth variable – Designing Coherent Instruction – matches the teacher standard on planning for effective learning considering students' individual differences and also falls under FFT's Planning and Preparation Domain.
As shown in Figure 5, a reasonable degree of heterogeneity in observability exists across the selected variables, with those pertaining to the Planning and Preparation Domain mostly requiring the exertion of teacher effort outside of the classroom, and those relating to the Classroom Environment and Instruction Domains necessitating teachers' daily effort inside the classroom. If principal monitoring is indeed a strong predictor of teacher effort, the association should be expected to be a function of the degree of observability of the different measures of effort.

10 FFT divides the complex activity of teaching into 22 components, clustered into four domains of teaching responsibility: (1) Planning and Preparation, (2) Classroom Environment, (3) Instruction, and (4) Professional Responsibilities. Table I in Appendix A presents FFT's four domains and respective components. FFT's fourth domain (Professional Responsibilities) has not been included in this study, as it is aimed at capturing teachers' professional development, which this study regards as part of teachers' knowledge frontier, and teachers' effort in outreach activities with the community at large.

Figure 5: Measures of Teacher Effort Mapped Against the FFT. [Diagram mapping the selected measures along an observability spectrum. Less observable, exerted outside the classroom, Domain 1 (Planning and Preparation): Designing Student Assessment, Setting Instructional Outcomes, Designing Coherent Instruction. More observable, exerted inside the classroom, Domain 2 (Classroom Environment): Creating an Environment of Respect and Rapport; Domain 3 (Instruction): Providing Feedback to Students.]

The four selected measures of teacher effort are described in Table 2.

Table 2: Measures of Teacher Effort

Creating an Environment of Respect and Rapport. Description and scale: This variable is proxied through an ordinal measure of how a teacher responds when a student is unable to answer a question during instruction. Higher values represent higher levels of teacher effort in responding to students in a way that is more conducive to creating a respectful and emotionally supportive environment for learning; lower values represent the opposite. The variable ranges from 0 to 2. A score of 2 is assigned where the student reports that the teacher rephrases/explains the question, encourages the student to try again, or corrects the student but does not scold him/her. A score of 1 is assigned where the student reports that the teacher asks another student or asks the same student the exact same question. A score of 0 is assigned where the student reports that the teacher scolds the student, sends the student outside of the classroom, hits the student, or sends the student to the corner of the classroom. Source: Interviews with 10 randomly selected students in each sampled classroom.

Providing Feedback to Students. Description and scale: This variable is proxied through an ordinal measure of how many comments or corrections a teacher provides in each student's Arabic language copybook. The variable ranges from 0 to 4, with higher values representing more frequent comments or corrections and lower values representing fewer to no marks in students' copybooks. Source: An external observer who visited classrooms and examined the Arabic language copybooks of 10 randomly selected students per classroom.

Designing Student Assessments. Description and scale: This variable is proxied through an ordinal measure denoting how many of the following assessment methods a teacher uses to monitor student learning and to provide a variety of performance opportunities for students: written tests, oral evaluations, homework, worksheets, end-of-semester evaluations, projects/portfolios, and debates. The variable ranges from 0 to 7. Source: Surveys administered to every teacher in each sampled classroom.

Designing Coherent Instruction. Description and scale: This variable is proxied through a dichotomous measure denoting whether a teacher uses student assessments to inform the design of her lesson plan. Teachers who consider specific student performance and needs while designing lessons are considered to put forth more effort, since creating tailored lesson plans takes more forethought than using a "one-size-fits-all" lesson plan. Source: Surveys administered to every teacher in each sampled classroom.
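As a minimal sketch of how the first measure could be scored, the response categories below paraphrase those in Table 2, and the aggregation across the 10 interviewed students (a simple average here) is an assumption for illustration:

    RESPONSE_SCORES = {
        # Responses conducive to a respectful, supportive environment.
        "rephrases_or_explains": 2,
        "encourages_retry": 2,
        "corrects_without_scolding": 2,
        # Neutral responses.
        "asks_another_student": 1,
        "repeats_same_question": 1,
        # Responses that create a negative environment.
        "scolds": 0,
        "sends_outside": 0,
        "hits": 0,
        "sends_to_corner": 0,
    }

    def respect_rapport_score(student_reports):
        """Average the 0-2 response scores across the students interviewed
        in a classroom to obtain a classroom-level effort measure."""
        return sum(RESPONSE_SCORES[r] for r in student_reports) / len(student_reports)

    # Example: seven supportive reports, two neutral, one negative -> 1.6.
    reports = ["encourages_retry"] * 7 + ["asks_another_student"] * 2 + ["scolds"]
    print(respect_rapport_score(reports))  # 1.6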
Providing Feedback to Students
Description and Scale: This variable is proxied through an ordinal measure of how many comments or corrections a teacher provides in each student's Arabic language copybook. The variable ranges from 0 to 4, with higher values representing higher teacher effort in providing comments or corrections more frequently, and lower values representing fewer to no marks in students' copybooks.
Source: Data for this variable were collected by an external observer who visited classrooms and examined the Arabic language copybooks of 10 randomly selected students per classroom.

Designing Student Assessments
Description and Scale: This variable is proxied through an ordinal measure denoting how many of the following assessment methods a teacher uses to monitor student learning and to provide a variety of performance opportunities for students: written tests, oral evaluations, homework, worksheets, end-of-semester evaluations, projects/portfolios, and debates. The variable ranges from 0 to 7.
Source: Data for this variable were collected through surveys administered to every teacher in each sampled classroom.

Designing Coherent Instruction
Description and Scale: This variable is proxied through a dichotomous measure denoting whether a teacher uses student assessments to inform the design of her lesson plan. Teachers who consider specific student performance and needs while designing lessons are considered to put forth more effort, since creating tailored lesson plans takes more forethought and effort than using a "one-size-fits-all" lesson plan.
Source: Data for this variable were collected through surveys administered to every teacher in each sampled classroom.

Variability in Teacher Effort

Box 3: Caveat for Teacher Effort Measures
It is to be noted that the Designing Student Assessment and Designing Coherent Instruction variables are measured through teacher surveys. As such, they are likely to be subject to social desirability bias –a tendency of survey respondents (i.e., teachers) to answer questions in a manner that will be viewed favorably by others. Yet the sizeable proportion of teachers in the sample who provided answers for these two measures that are viewed negatively for the purposes of this study may suggest only a modest interference of this bias with the interpretation of the study results.

It was previously noted that, given their small variability, elementary measures of teacher effort such as teacher absenteeism and classroom time allocation were unable to capture any meaningful differences in teacher quality in Jordan. By contrast, an important degree of variability exists in the four more substantive measures of effort among teachers in the sample.

i. Creating an Environment of Respect and Rapport

Interviews with students revealed important differences in the extent to which teachers strive to create an environment of respect and rapport in their classrooms. As shown in Figure 6, when a student is unable to answer a question, almost a fourth of teachers try to create a positive environment by explaining or rephrasing the question, encouraging the student to try again, or correcting the student without scolding her. However, as many as 70 percent of teachers are reported to simply repeat the exact same question to the same student, or to ask another student instead, while 5.5 percent of teachers are reported to scold students, hit them, send them outside of the classroom, or make them stand in a corner if they fail to give the right answer.

Figure 6: Creating an Environment of Respect and Rapport [bar chart of teachers' responses when a student is unable to answer a question, as reported by students: negative environment, 5.5 percent; somewhat negative environment, 70.7 percent; positive environment, 23.8 percent]
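As an illustration of how the Table 2 scale converts student reports into effort scores for this first measure, consider the minimal sketch below; the response labels are hypothetical shorthand rather than the survey's actual codes.

```python
# Minimal sketch of the 0-2 scoring rule for Creating an Environment of
# Respect and Rapport, following the scale described in Table 2.
# Response labels are hypothetical shorthand, not the survey's actual codes.
POSITIVE = {"rephrases_question", "encourages_retry", "corrects_without_scolding"}
SOMEWHAT_NEGATIVE = {"asks_another_student", "repeats_same_question"}
NEGATIVE = {"scolds", "sends_outside", "hits", "sends_to_corner"}

def respect_rapport_score(reported_response: str) -> int:
    """Map a student-reported teacher response to the ordinal 0-2 scale."""
    if reported_response in POSITIVE:
        return 2
    if reported_response in SOMEWHAT_NEGATIVE:
        return 1
    if reported_response in NEGATIVE:
        return 0
    raise ValueError(f"Unrecognized response: {reported_response}")

assert respect_rapport_score("encourages_retry") == 2
assert respect_rapport_score("repeats_same_question") == 1
```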
ii. Providing Feedback to Students

Variability is also observed in the effort put forth by teachers in providing feedback to their students, as shown in Figure 7. Roughly one-fifth of all teachers had marked all pages of their students' copybooks, and almost half of them had marked most pages. One-fourth of all teachers, however, had marked only a few pages of the copybook, and 3.4 percent of them had not marked even a single page.

Figure 7: Providing Feedback to Students [bar chart of comments or corrections a teacher provides in students' copybooks, based on an external observer's assessment: none, 3.44 percent; some (every few pages), 23.71 percent; many (most pages), 53.26 percent; all pages, 19.59 percent]

iii. Designing Student Assessments

Survey answers also recorded varying levels of effort exerted by teachers in designing assessment methods for their students. Almost two-thirds of teachers report using only one or two methods of student assessment, while around 20 percent report using three methods, and roughly 15 percent of teachers report using four or more methods to assess their students. Only 1.6 percent of teachers report not using any method of student assessment whatsoever (Figure 8).

Figure 8: Designing Student Assessments [bar chart of the number of student assessment methods used by each teacher, 0 through 7, self-reported]

iv. Designing Coherent Instruction

Finally, as seen in Figure 9, only one-fourth of teachers (24.3 percent) report using student assessments to inform their lesson planning, with the great majority (75.7 percent) reporting that they do not.

Figure 9: Designing Coherent Instruction [bar chart of whether a teacher uses student assessments for lesson planning, self-reported: yes, 24.26 percent; no, 75.74 percent]

Heterogeneity in the Observability of Teacher Effort

Discerning where each of the above four measures of teacher effort falls on the observability spectrum merits special consideration. At one extreme of the spectrum, Creating an Environment of Respect and Rapport is mainly determined by teachers' effort in interacting in a positive and supportive tone with their students, which can be observed within the classroom on a daily basis. As such, a strong and positive relationship is to be expected between principal monitoring and teacher effort in Creating an Environment of Respect and Rapport. Similarly, a strong and positive association is expected between principal monitoring and teacher effort in Providing Feedback to Students, as this requires teachers to have a "finger on the pulse" of a lesson and to monitor student learning on a daily basis. This daily imperative allows principals to conduct random classroom visits on any day and expect to see students' copybooks marked.

Somewhere in the middle of the spectrum, teacher effort in Designing Student Assessment can be more challenging for principals to observe. Teacher effort in this area is mostly exerted outside of the classroom at determined intervals throughout the academic term. For example, the use of a rich set of assessments that provide a variety of performance opportunities for students requires effort at the design stage of such assessments, before the start of the academic term and/or before determined assessment intervals during the term.
Further, it requires periodic effort in implementing the designed assessments, such as administering oral and written exams and assessing student debates, presentations, and projects. As this effort takes place mostly outside of the classroom, and is exerted intermittently throughout the academic term, it is difficult to observe in its full breadth through a limited number of classroom observations. As such, a positive but weaker relationship is to be expected between principal monitoring and teacher effort in Designing Student Assessment.

Lastly, at the other extreme of the observability spectrum, teacher effort in Designing Coherent Instruction is likely to be one of the hardest areas for principals to observe. If Designing –and implementing– Student Assessments is already difficult to observe, determining the extent to which the information gathered through these assessments is used by teachers to inform the design of a coherent lesson plan that responds to students' learning needs can be even more formidable. The effort that teachers put into planning their lessons happens almost exclusively outside of the classroom, and although principals can observe their teaching during a class or request to read their lesson planning records, determining just how much effort a teacher puts into designing a lesson plan that is actually relevant to her students' needs seems extremely difficult. Thus, principal monitoring should have only a very weak relationship with teacher effort in this realm.

Box 4: Bivariate Correlations Among Measures of Teacher Effort
Having identified these four different measures of teacher effort, it is important to consider the bivariate correlations between them. If the correlation between the dependent variables is low, then it is likely that the four different measures are accounting for distinct aspects of teacher effort. Having uncorrelated measures of teacher effort is important since it allows the analysis to determine how principal monitoring is associated with teacher effort on a variety of dimensions. The correlations between the different measures of teacher effort are provided in the table below. The strongest correlation (at 0.347) exists between the Designing Student Assessment variable and the Designing Coherent Instruction variable. The moderate correlation between these two variables is not surprising, since both of these measures proxy how much effort a teacher puts into planning and preparing for her lesson by designing student assessments that can help inform her instruction. Aside from this, most of the other correlations are weak. This set of results suggests that the different dependent variables are indeed measuring distinct aspects of teacher effort.

CORRELATION BETWEEN MEASURES OF TEACHER EFFORT
(columns follow the same order as rows)
Providing Feedback to Students:       1.000***
Environment of Respect and Rapport:   0.046**    1.000***
Designing Student Assessment:         0.080**    0.219***   1.000***
Designing Coherent Instruction:       0.107***  -0.048**    0.347***   1.000***
*p-value<0.10 **p-value<0.05 ***p-value<0.01.
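The correlation check in Box 4 is straightforward to reproduce. The sketch below uses simulated stand-in data, since the study's file and variable names are not public; the report also does not state which correlation coefficient it uses, so both the default Pearson and Spearman (better suited to ordinal scales) are shown.

```python
# Sketch of the bivariate correlation check in Box 4. The data here are
# simulated stand-ins; column names mirror the four effort measures.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 300
effort = pd.DataFrame({
    "feedback": rng.integers(0, 5, n),              # 0-4 scale
    "respect_rapport": rng.integers(0, 3, n),       # 0-2 scale
    "student_assessment": rng.integers(0, 8, n),    # 0-7 scale
    "coherent_instruction": rng.integers(0, 2, n),  # 0/1
})

# Pairwise correlations; small off-diagonal entries suggest the measures
# capture distinct dimensions of teacher effort.
print(effort.corr().round(3))
print(effort.corr(method="spearman").round(3))
```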
Control Variables

A variety of factors are likely to confound the relationship between principal monitoring and teacher effort. These factors are related to teachers' knowledge frontier, the socioeconomic status of each school and its students, other monitoring mechanisms used by actors other than the school principal, and specific school characteristics. As described in Table 3, measures for all of these factors have been included in the analysis to account for potential confounding effects. Descriptive statistics for all of the variables used in the analysis are presented in Table A.2 in Appendix A.

Table 3: Control Variables

Teachers' Knowledge Frontier
Description of Specific Variables: The analysis includes: (1) an ordinal measure of a teacher's highest level of education, (2) a dichotomous variable that denotes if a teacher received pre-service training in how to teach reading, and (3) an indicator variable that denotes if a teacher received pre-service training in how to teach math. The analysis is unable to control for a teacher's years of experience, as this information was not collected in the fieldwork.
Explanation of Potential Confounding Effects: Controlling for a teacher's level of knowledge is important since highly qualified teachers may self-select into schools with highly competent and motivated principals.

Socioeconomic Status of Schools and Students
Description of Specific Variables: The analysis includes: (1) a variable that denotes if a school receives government aid, (2) a variable measuring whether a student's family owns a computer, and (3) a variable that represents how wealthy a school is relative to other schools in Jordan.
Explanation of Potential Confounding Effects: Controlling for differences in socioeconomic status across students and schools is important since principals and teachers who exert higher levels of effort could self-select into better resourced schools and/or schools in higher-income neighborhoods.

Other Types of Monitoring Mechanisms
Description of Specific Variables: The analysis controls for "top-down" monitoring by including: (1) an ordinal variable that records how many times a school has been visited by a directorate inspector, as reported by the principal, and (2) an ordinal variable that measures how often a teacher has been observed teaching by a directorate supervisor, as reported by the teacher. It also controls for monitoring coming from the community and/or parents ("bottom-up") by including (3) an ordinal variable that captures how frequently the parent-teacher association met during the past school year, as reported by the principal.
Explanation of Potential Confounding Effects: It is essential to account for other types of monitoring because principals are likely to increase their monitoring activities when they perceive that other actors are highly concerned with teacher effort.

Other School/Class Characteristics
Description of Specific Variables: The analysis includes: (1) a dichotomous variable denoting if a school is located in a rural district, (2) an ordinal variable that records the gender of a school (i.e., all boys, all girls, or mixed), and (3) a variable that records the teacher-student ratio in each class.
Explanation of Potential Confounding Effects: It is standard to control for specific school/class characteristics.

2.3.3 Results

Results from the analysis are presented in Table A.3 in Appendix A, which provides estimates from four sets of models. Models 1-3 estimate the relationship between principal monitoring and teacher effort in Providing Feedback to Students. Models 4-6 test this relationship with Creating an Environment of Respect and Rapport as the dependent variable.
The relationships between principal monitoring and the Design of Student Assessment and the Design of Coherent Instruction are estimated in Models 7-9 and 10-12, respectively. Each set of estimates provides results from a number of models to test whether the results are robust to model specification. Findings from these models are discussed below.

Teachers who are frequently monitored are more likely to provide feedback to their students.

In Models 1-3, the coefficient for the Principal Monitoring Index variable is positive and statistically significant at the 95% confidence level, suggesting that teachers are more likely to provide frequent comments/corrections in their students' Arabic language copybooks when there is a high level of monitoring by the principal. Substantively, as shown in Table 4, a 1.0 SD increase in the Principal Monitoring Index variable corresponds to a 0.14 SD increase in teacher effort in providing feedback to students.

More effort in creating a positive learning environment is put forth by teachers who are monitored more often.

The results in Models 4-6 suggest that a strong and positive relationship exists between principal monitoring and teacher effort in creating a climate of respect and rapport. The coefficient for the Principal Monitoring Index variable is positive and statistically significant in all three models, such that a 1.0 SD increase in this index corresponds to a 0.09 SD increase in teacher effort in creating a positive learning environment for students.

As anticipated, higher principal monitoring is weakly predictive of teacher effort in designing student assessment and not predictive of designing coherent instruction.

In Models 7-9, the coefficient for the Principal Monitoring Index variable is positive but just misses statistical significance, with p-values of 0.111, 0.134, and 0.110, respectively. This set of results suggests that teachers may be more likely to put forth effort in designing and using a variety of student assessment methods when they are monitored frequently by their principal. In Models 10-12, the coefficient for the Principal Monitoring Index variable is positive but statistically significant in just one model, suggesting no robust empirical relationship between principal monitoring and the likelihood that a teacher will put forth effort to use student assessments to inform the design of her instruction.

Table 4: Substantive Effects – Principal Monitoring and Teacher Effort
(standardized effect of a 1.0 SD increase in the Principal Monitoring Index)
Providing Feedback to Students:       0.145  [0.046, 0.242]   (0.033, 0.261)
Environment of Respect and Rapport:   0.090  [0.000, 0.178]   (-0.022, 0.194)
Designing Student Assessment:         0.048  [-0.003, 0.099]  (-0.010, 0.108)
Designing Coherent Instruction:       0.082  [0.003, 0.159]   (-0.007, 0.174)
Note: 90% confidence intervals in brackets; 95% confidence intervals in parentheses. Estimates were produced from Models 3, 6, 9, and 12 in Table A.3 in Appendix A.
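For readers interested in how such standardized associations are typically computed, the sketch below z-scores the outcome and the index so that the slope reads directly in SD units. The report does not state its exact estimator; ordinary least squares with school-clustered standard errors, simulated data, and illustrative control names are used here purely as a sketch.

```python
# Sketch of a standardized association: z-score the effort measure and the
# index so the slope reads as "SD change in effort per 1.0 SD of monitoring."
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 600
df = pd.DataFrame({
    "school_id": rng.integers(0, 60, n),
    "monitoring_index": rng.integers(0, 13, n),
    "school_wealth": rng.normal(size=n),
    "rural": rng.integers(0, 2, n),
})
df["feedback_effort"] = 0.1 * df["monitoring_index"] + rng.normal(size=n)

for col in ["feedback_effort", "monitoring_index"]:
    df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()

# OLS with school-clustered standard errors; controls are stand-ins for
# the Table 3 variables, not the study's actual specification.
model = smf.ols(
    "feedback_effort_z ~ monitoring_index_z + school_wealth + rural", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})
print(model.params["monitoring_index_z"])  # standardized association
```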
Overall, principal monitoring is strongly associated with teacher effort, yet such association is a function of the extent to which different measures of teacher effort are observable to principals.

The empirical analysis suggests that higher levels of monitoring by principals have a strong and positive association with teachers' effort in Providing Feedback to Students and Creating an Environment of Respect and Rapport –both measures of teacher effort that fall on the more observable end of the observability spectrum. Furthermore, the results suggest that a weaker, but still positive, relationship exists between principal monitoring and teacher effort in designing student assessments. The inability of the Principal Monitoring Index variable to achieve statistical significance in Models 7-9 is not surprising, since observing just how rich a teacher's assessment methods are is difficult for a principal to do with a limited number of classroom observations. Finally, the analysis suggests that no robust relationship exists between principal monitoring and teacher effort in designing coherent instruction. This empirical result is also not surprising, as it is extremely difficult for principals to observe just how much effort a teacher puts into lesson plans. With this last measure of teacher effort falling on the less observable end of the observability spectrum, there is no reason to expect that frequent monitoring by principals will increase teacher effort in the context of lesson planning.

2.4 MONITORING, TEACHER EFFORT, AND STUDENT LEARNING IN JORDAN

Is stronger principal monitoring also associated with higher student learning?

The results presented thus far suggest that teachers are more likely to exert higher levels of effort (in areas where effort is actually observable) when they are monitored frequently by their principals. Yet the critical question is whether principal monitoring is actually associated with better student learning. Empirical research has shown that highly effective principals raise the achievement of a typical student in their schools by between two and seven months of learning in a single school year (Branch, Hanushek, and Rivkin 2013). But the pathways through which principals affect student outcomes have been underexplored thus far. This study posits that reducing teachers' know-do gap through higher levels of monitoring could potentially be the most direct mechanism through which principals can affect student outcomes. Specifically, the second hypothesis of this study is that principal monitoring is associated with better student learning and that this association is mediated by teacher effort. To test this hypothesis, a multilevel mediation analysis is conducted. The results suggest that principal monitoring is indeed strongly associated with student learning and that such association is mediated by those areas of teacher effort that are observable to the principal.

2.4.1 Empirical Strategy

To test the relationship between principal monitoring and student learning, and its potential mediation by teacher effort, a dataset is constructed by disaggregating all variables from the previous analysis at the student level. Next, a multilevel mediation analysis is conducted to account for the hierarchical nature of the data. Principal monitoring can affect student learning through two potential pathways. First, principal monitoring may have a positive impact on student academic performance because of its influence on increased teacher effort. In addition to this indirect effect, principal monitoring can directly influence student learning (Figure 10 illustrates these two pathways). The conducted mediation analysis considers the possibility that principal monitoring has both a direct and an indirect association with student learning.
Figure 10: Causal Pathways of Principal Monitoring on Student Learning [diagram: an indirect effect running from Principal Monitoring through Teacher Effort to Student Learning, and a direct effect running from Principal Monitoring to Student Learning]

Box 5: Multilevel Mediation Analysis
To conduct the multilevel mediation analysis, Hicks and Tingley's (2011) mediation package in R is used to calculate the average mediation and direct effects by simulating predicted values of the mediator or outcome variable, which are not observable, and then calculating the appropriate quantities of interest (average causal mediation effects, direct effects, and total effects). This allows for the implementation of Imai, Keele, and Tingley's (2010) four-step parametric algorithm: (i) fitting models for the observed outcome and mediator variables; (ii) simulating model parameters from their sampling distribution; (iii) simulating potential values of the mediator, calculating potential outcomes given simulated values of the mediator, and computing quantities of interest for each draw of model parameters; and (iv) computing summary statistics (Hicks and Tingley 2011).
The hierarchical nature of the data poses a challenge for mediation analysis. Computational limitations only allow for mediation analysis to be conducted with one random effect in the model (for either the teacher, school, or directorate level). Because the intra-class correlation is likely to be highest at the school level (Duflo, Hanna, and Ryan 2012; Imberman 2011; Lavy, Paserman, and Schlosser 2011), the mediation analysis is conducted while accounting for clustering at this level. As a robustness check, results from an analysis that accounts for clustering at the class level are also presented.
A final note concerns the sequential ignorability (SI) assumption –a necessary assumption to achieve identification in mediation analysis. The SI assumption comprises two parts: first, the independent variable is assumed to be statistically independent of potential outcomes and potential mediating variables; and second, the mediating variable is assumed to be exogenous conditional on pretreatment confounders and the independent variable of interest (Hicks and Tingley 2011). As these two assumptions are rarely satisfied in applied research, it is important to determine how sensitive estimates are to violations of SI. A discussion of how robust the mediation analysis results presented below are to violations of SI is provided in Appendix B.
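To make the Box 5 algorithm concrete, the sketch below implements a deliberately simplified, single-level version with linear mediator and outcome models: parameters are drawn from their estimated sampling distributions and the simulated indirect effects are averaged. The multilevel random effects, controls, and school-level clustering of the actual analysis are omitted, and all variable names and data are illustrative.

```python
# Simplified, single-level sketch of the parametric mediation algorithm in
# Box 5: fit linear mediator and outcome models, draw their parameters from
# the estimated sampling distributions, and average the simulated indirect
# effects. Multilevel structure and controls are omitted for brevity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def mediation_sketch(X, M, Y, n_draws=1000):
    """Average mediation effect of X on Y through M (linear models)."""
    # Step (i): fit the mediator model M ~ X and the outcome model Y ~ X + M.
    med = sm.OLS(M, sm.add_constant(X)).fit()
    out = sm.OLS(Y, sm.add_constant(np.column_stack([X, M]))).fit()
    # Step (ii): simulate parameters from their sampling distributions.
    a = rng.multivariate_normal(med.params, med.cov_params(), n_draws)
    b = rng.multivariate_normal(out.params, out.cov_params(), n_draws)
    # Steps (iii)-(iv): with linear models, each draw's mediation effect
    # reduces to (X -> M slope) * (M -> Y slope); summarize the draws.
    acme = a[:, 1] * b[:, 2]
    return acme.mean(), np.percentile(acme, [2.5, 97.5])

# Toy data: X = monitoring index, M = teacher effort, Y = test score.
n = 500
X = rng.normal(size=n)
M = 0.3 * X + rng.normal(size=n)
Y = 0.5 * M + 0.1 * X + rng.normal(size=n)
print(mediation_sketch(X, M, Y))  # mediation effect near 0.3 * 0.5 = 0.15
```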
Independent Variable: Principal Monitoring

As with the analysis in Section 2.3, the independent variable of interest for the mediation analysis is the Principal Monitoring Index –a composite measure of a principal's level of teacher monitoring. Using the Principal Monitoring Index variable, the analysis is able to determine specifically how principal monitoring is associated with student outcomes.

Measuring Student Outcomes

Principals' role in teacher accountability systems matters insofar as it contributes to ensuring that student outcomes are a product of increasing levels of teacher effort in desirable teaching practices that are conducive to actual student learning, and not of perverse teacher practices that simply promote rote memorization. As such, two pairs of measures of student outcomes are used as dependent variables for this analysis. The first pair are test scores measuring language skills, while the second pair are test scores assessing mathematics skills. Each pair includes one score that measures basic automaticity and, thus, is likely to be subject to rote memorization. This is paired with a second score that measures conceptual understanding and the application of key concepts to new situations, and that is likely to represent actual student learning.

a. Students' Language Skills

To measure students' language skills, the analysis uses two main variables. First, a variable that denotes the percent of letter sounds a student correctly identifies (Letter Sound Knowledge) is used to measure student outcomes in basic language automaticity that are likely to be subject to rote memorization. As seen in Figure 11, significant variance exists in student outcomes for this variable, with an average of 33.4 percent of questions correctly answered and an SD of 21.3. At one end of the distribution, roughly 15 percent of students correctly answered more than half of the questions, while at the other end, one-third of students provided correct answers to only 10 percent of the questions.

Figure 11: Letter Sound Knowledge [histogram: score (percent of letter sounds correctly identified, 0–100) against percentage of students]

Second, the Reading Comprehension variable is used to measure student outcomes in language skills that require conceptual understanding and are likely to represent actual student learning. Students were given a passage to read, and were then asked to answer questions related to the passage. Half of the students in the sample correctly answered no more than 20 percent of the questions they were asked –below the average of 26.1 percent– meaning the distribution was fairly skewed. In contrast, 7.5 percent of students correctly responded to all of the questions, and almost 20 percent of them provided accurate answers for 60-80 percent of the questions (Figure 12).

Figure 12: Reading Comprehension [histogram: score bands (0–10 through 91–100 percent) against percentage of students, with the mean indicated]

b. Students' Mathematics Skills

Two other variables are used to measure students' skills in mathematics. First, a variable that denotes the percent of a selection of one- to three-digit numbers a student correctly identifies (Number Identification) is used to measure student outcomes in basic number automaticity that are likely to be subject to rote memorization. Relative to the other variables, little variance exists in the data for this variable. As illustrated in Figure 13, half of the students in the sample accurately identified 80 percent or more of the digits they were presented with, and as few as a fifth of them correctly identified less than 50 percent of the digits.

Figure 13: Number Identification [histogram: score bands (0–10 through 91–100 percent) against percentage of students, with the mean indicated]

Second, the Word Problems variable is used to measure students' conceptual understanding of key mathematical concepts by presenting them with three situations in words and asking them to make a plan and solve the problems through any mathematical solution they can think of.
Contrary to the Number Identification variable, the Word Problems variable presents significant variance in the data, with almost a third of students unable to correctly answer even a single question, roughly 50 percent of students correctly answering one or two of the questions, and only 14 percent correctly responding to all three word problem questions (Figure 14).

Figure 14: Word Problems [bar chart: number of word problems answered correctly (0–3) against percentage of students, with the mean indicated]

The four selected measures of student outcomes are described in Table 5.

Table 5: Measures of Student Outcomes

Letter Sound Knowledge (language skills; measure of basic automaticity, likely to be subject to rote memorization)
Description and Scale: Students were shown a chart containing 10 rows each with 10 letters arranged randomly, yielding a total of 100 letters. Students were then asked to produce the sounds associated with each letter as quickly and accurately as they could within one minute, yielding a score of correct letters per minute. As such, the variable denotes the percent of letter sounds a student correctly identified.
Source: Data for this variable were collected through the Early Grade Reading Assessment (EGRA) administered by USAID examiners to the 10 randomly selected students in each sampled classroom.

Reading Comprehension (language skills; measure of conceptual understanding, likely to represent actual student learning)
Description and Scale: Students were given a passage to read, and after a minute, the passage was removed. Students were then orally asked questions that required them to answer basic factual or inferential questions based on the passage they read. The variable is the number of correct answers by the student, with a maximum possible score of 6.
Source: Data for this variable were collected through the Early Grade Reading Assessment (EGRA) administered by USAID examiners to the 10 randomly selected students in each sampled classroom.

Number Identification (math skills; measure of basic automaticity, likely to be subject to rote memorization)
Description and Scale: Students were given 30 seconds to orally identify one- to three-digit numbers arranged in order of increasing difficulty presented in a grid. Thus, the variable measures the percent of number identification questions answered correctly.
Source: Data for this variable were collected through the Early Grade Mathematics Assessment (EGMA) administered by USAID examiners to the 10 randomly selected students in each sampled classroom.

Word Problems (math skills; measure of conceptual understanding, likely to represent actual student learning)
Description and Scale: Students were presented with three situations in words, and asked to make a plan and solve the problems through any mathematical solution they could think of. The variable provides a three-point measure of a student's ability to correctly answer word problems in mathematics.
Source: Data for this variable were collected through the Early Grade Mathematics Assessment (EGMA) administered by USAID examiners to the 10 randomly selected students in each sampled classroom.
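As a small illustration of how the timed tasks in Table 5 yield percent scores, the sketch below scores the Letter Sound Knowledge task; the helper function is hypothetical and not part of the EGRA toolkit.

```python
# Minimal sketch of scoring the timed Letter Sound Knowledge task described
# in Table 5: the raw count of correct letter sounds produced within one
# minute, out of a 100-letter chart, expressed as a percent.
def letter_sound_score(correct_in_one_minute: int, total_letters: int = 100) -> float:
    """Percent of letter sounds correctly identified (0-100)."""
    if not 0 <= correct_in_one_minute <= total_letters:
        raise ValueError("count out of range")
    return 100.0 * correct_in_one_minute / total_letters

# A student producing 33 correct letter sounds in the minute scores 33.0,
# close to the sample mean of 33.4 percent reported for Figure 11.
print(letter_sound_score(33))
```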
Mediating Variable: Teacher Effort

The mediating variables are the four measures of teacher effort used in the analysis in Section 2.3: Creating an Environment of Respect and Rapport, Providing Feedback to Students, Designing Student Assessments, and Designing Coherent Instruction. Multiple teacher effort measures are again used to ensure that teacher effort is measured across a variety of areas. To ensure that the results of the mediation analysis are robust, a large number of tests are conducted by considering every unique combination of student outcome and teacher effort variables.

The heterogeneity in the observability of teacher effort has important implications for the mediation analysis. Since principal monitoring is very weakly associated with teacher effort in areas where effort is difficult to observe, monitoring is expected to have less of an indirect association with student learning when considered through such mediators (i.e., the Designing Student Assessments and Designing Coherent Instruction variables). In contrast, if principal monitoring has a strong and positive association with teacher effort in a given area, then higher levels of principal monitoring are expected to be associated with student outcomes through these mediators (i.e., the Creating an Environment of Respect and Rapport and Providing Feedback to Students variables).

Control Variables

A variety of factors are likely to confound the relationship between principal monitoring and student outcomes. These factors are related to students' and schools' socioeconomic status, and to the extent to which students receive academic support outside of school. As described in Table 6, measures for these factors were included in the analysis to account for potential confounding effects. Descriptive statistics for all of the variables used in the analysis are presented in Table A.4 in Appendix A.

Table 6: Control Variables Included in the Mediation Analysis

Socioeconomic Status of Students and Schools
Description of Specific Variables: Four variables were included in the analysis to control for a student's socioeconomic status: (1) a variable that denotes if a student has a radio in his household, (2) a variable that denotes if a student's family owns a car, (3) a variable that denotes if a student has a computer in his household, and (4) a variable that denotes if a student receives free meals at school.
Explanation of Potential Confounding Effects: Controlling for differences in socioeconomic status across students and schools is important since high-performing principals, teachers, and students could self-select into better resourced schools.

Academic Support Outside of School
Description of Specific Variables: Two variables were included in the analysis: (1) a variable that denotes whether a student receives help with her homework at home, and (2) a variable that denotes if a student receives private lessons after school.
Explanation of Potential Confounding Effects: Closely related to the first set of control variables, this set of variables stems from the strong and well-documented association between family background and student achievement (Sirin 2005; Bornstein and Bradley 2003; Brooks-Gunn and Duncan 1997; Coleman 1988). Namely, parents with higher socioeconomic status are more likely to provide their children with a stimulating home environment that promotes cognitive development and better school outcomes.

2.4.2 Results

Results from the mediation analysis are presented in Table A.5 of Appendix A and described below.

Students are more likely to learn math when their teacher is frequently monitored by the school principal.
The Principal Monitoring Index variable has a positive and statistically significant indirect effect on student math outcomes that is mediated through teachers' effort in Providing Feedback to Students and Creating an Environment of Respect and Rapport. In other words, teachers are more likely to put forth more effort in their teaching when frequently monitored by principals, and students tend to learn better when taught by teachers who exert higher levels of effort. Specifically, as mediated by teachers' effort in Providing Feedback to Students, increasing principal monitoring from an index of 2 –at the bottom of the distribution– to an index of 11 –at the top of the distribution– may increase student outcomes by an average of roughly 0.03 SD in math test scores that are prone to memorization and, more importantly, by approximately 0.02 SD in math scores that are suggestive of actual student learning. Similarly, when considered through teachers' effort in Creating an Environment of Respect and Rapport, the indirect effect of principal monitoring may be as high as 0.05 SD in math test scores that are suggestive of actual student learning, for an increase of 9 points in the Principal Monitoring Index.

Students are also more likely to learn language skills when their teacher is frequently monitored by the school principal.

The indirect effect of principal monitoring on students' language outcomes is no different than in mathematics. Increasing principal monitoring from an index of 2 –at the bottom of the distribution– to an index of 11 –at the top of the distribution– may increase student outcomes by an average of roughly 0.05 SD in language test scores that are prone to memorization, and by approximately 0.04 SD in language scores that are suggestive of actual student learning –as mediated by teachers' effort in Providing Feedback to Students. Further, the indirect effect of principal monitoring, when mediated by teachers' effort in Creating an Environment of Respect and Rapport, may be as high as 0.07 SD in language test scores that are suggestive of actual student learning, for an increase of 9 points in the index.

Finally, as expected, with principal monitoring having a very weak effect on teachers' effort in areas that are difficult to observe, the Principal Monitoring Index has a statistically significant indirect effect on only one of the student outcome measures (Reading Comprehension) mediated through teachers' effort in Designing Student Assessments. Further, the index has no statistically significant indirect effect on any of the four measures of student outcomes when mediated through teachers' effort in Designing Coherent Instruction.11

11 In terms of the direct effect, the mediation analysis generally suggests that principal monitoring has no direct effect on student outcomes. The Principal Monitoring Index is shown to be positively associated with only one measure of student outcomes (Reading Comprehension). Estimates for the direct effect of principal monitoring on student outcomes are available upon request.

Box 6: Robustness Check and Sensitivity Analysis
The mediation analysis presented in Table A.5 in Appendix A accounts for clustering at the school level. Table A.6 presents results from an additional mediation analysis that accounts for clustering at the class level. The latter suggests that the empirical conclusions herewith are relatively robust to changes in the way in which clustering is accounted for.
From another perspective, and as noted in Box 5, the SI assumption is necessary to achieve identification in mediation analysis. Since SI is likely to be violated in the data, a sensitivity analysis is presented in Appendix B to estimate the extent to which the estimates are robust to violations of SI. The analysis suggests that the results from the mediation analysis are somewhat sensitive to violations of SI.
2.5 COMPARATIVE CASE STUDY IN JORDANIAN SCHOOLS

The results of the empirical analyses in Sections 2.3 and 2.4 suggest that teachers are more likely to put forth effort in their teaching (in areas where effort is observable) when they are closely monitored by their principals. Moreover, the mediation analysis suggests that principal monitoring is positively associated with student learning, as mediated by teacher effort. Although the results are consistent with expectations, they are also vulnerable to endogeneity concerns and the potential of omitted variable bias. As the principal monitoring variable is constructed using observational data –and thus not randomly assigned– it is likely to be correlated with unobservable factors that also influence teacher effort and student learning. If this index is indeed correlated with such unobservable factors, then the estimates in the previous two sections cannot be interpreted causally and should rather be read as associations.

Aiming to gain insight into relevant causal mechanisms that could potentially add inferential leverage to the quantitative analyses in the previous sections, a comparative case study of six Jordanian schools was conducted using statistical matching for case selection, followed by a process-tracing procedure. The results of the comparative case study are highly complementary to the empirical analysis, suggesting that teachers do indeed put forth more effort in their teaching when their principals closely monitor them.

2.5.1 Methodology

To address potential endogeneity concerns, the comparative case study needs to rule out alternative causal mechanisms driving teacher effort. To do so, statistical matching is first used to identify most-similar cases, ensuring that the observed levels of teacher effort cannot be attributed to observable characteristics. Then, the remaining empirical variation among the selected cases is dealt with through process tracing, whereby the causal process by which teacher effort came about is examined within and across selected cases, as described below.

Case Selection via Matching

To guide the selection of cases, Mahalanobis Distance Matching (MDM) –a statistical matching technique that measures the distance between two observations in a set of covariates– is used to identify matched pairs of schools that resemble each other as closely as possible in observable directorate-, school-, teacher-, and student-level characteristics, while varying in the degree to which the principal monitors teachers. Towards this end, a dichotomous variable is created at the school level, taking the value of 1 if a school scores 8 or higher on the Principal Monitoring Index ("high principal monitoring"), or a value of 0 if it scores 7 or lower ("low principal monitoring").12 Through MDM, schools are then paired together such that each pair comprises a "high principal monitoring" school and a "low principal monitoring" school that are as close as possible in a similarity distance measure based on a vector of covariates that includes the school wealth index, households with a computer, rural/urban location, frequency of school visits by the directorate inspector, frequency of classroom visits by the directorate supervisor, frequency of PTA meetings, teacher-student ratio, and teacher educational attainment. This selection procedure produces one school pairing in each region of Jordan (North, Center, and South).
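A minimal sketch of this matching step follows, using simulated stand-in data; the covariate names mirror the list above, and the nearest-neighbor pairing rule is an assumption, as the report does not describe its exact MDM implementation.

```python
# Sketch of Mahalanobis Distance Matching for case selection: pair each
# "high monitoring" school (index >= 8) with its nearest "low monitoring"
# school (index <= 7) on the matching covariates. Data are simulated
# stand-ins; column names mirror the covariates listed in the text.
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

rng = np.random.default_rng(11)
n = 40
schools = pd.DataFrame({
    "monitoring_index": rng.integers(4, 12, n),
    "wealth_index": rng.normal(size=n),
    "households_with_computer": rng.uniform(0, 1, n),
    "rural": rng.integers(0, 2, n),
    "inspector_visits": rng.integers(0, 6, n),
    "supervisor_visits": rng.integers(0, 6, n),
    "pta_meetings": rng.integers(0, 10, n),
    "teacher_student_ratio": rng.uniform(0.02, 0.1, n),
    "teacher_education": rng.integers(1, 5, n),
})
covs = schools.columns.drop("monitoring_index")

high = schools[schools["monitoring_index"] >= 8]
low = schools[schools["monitoring_index"] <= 7]

# Mahalanobis distance weights covariates by their inverse joint covariance,
# handling correlated and differently scaled covariates.
VI = np.linalg.inv(np.cov(schools[covs].to_numpy(), rowvar=False))
D = cdist(high[covs].to_numpy(), low[covs].to_numpy(),
          metric="mahalanobis", VI=VI)
pairs = list(zip(high.index, low.index[D.argmin(axis=1)]))
print(pairs[:3])  # candidate high/low pairings before regional screening
```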
Each resulting pair of schools is very similar across the different covariates, while at the same time varying in the Principal Monitoring Index: the first school pairing has a difference of 4 points (9-5) in the index, while the other two school pairings each have a difference of 2 points (8-6). The three paired cases serve as mutual –imperfect– counterfactuals that rule out observable characteristics as confounders of the relationship of interest (Nielsen 2014).

12 The cutoff point is the sample mean for the index (x̄=7.22).

Process-Tracing Procedure

Once the three paired cases are selected, a process-tracing procedure is used to rule out potentially unobserved intervening variables (George and Bennett 2005). Specifically, and as motivated by the literature, the potential unobserved variables explored include the presence of teacher incentives that may be attached to principals' monitoring mechanisms, and teacher selection bias –whereby more motivated teachers who exert higher levels of effort self-select into schools headed by more motivated principals who conduct more monitoring. Towards this end, in-depth, semi-structured interviews are conducted with the principal and two second- or third-grade teachers at each selected school, tracing the causal process –if any– from principal monitoring to teacher effort. Within each school pairing, the causal process by which higher teacher effort came about in the "high principal monitoring" school is contrasted against its "low principal monitoring" pair. If a causal relationship is found, the analysis then contemplates whether this relationship is found repeatedly across the three school pairings (Collier 2011).

2.5.2 Results

Principals seldom rely on incentive mechanisms to elicit higher teacher effort.

The study investigated the presence of both financial and nonfinancial incentives as potential confounders in the relationship between principal monitoring and teacher effort. With regard to the former, every one of the interviewees confirmed the absence of financial incentives at the school level. This is consistent with the fact that, as in most countries where teacher pay is centralized, principals in Jordanian schools have no authority to reward high teacher performance through increased salary or bonuses. Further, when asked about punitive financial incentives, teachers across all selected schools agreed that these, too, were absent. They all mentioned docked pay as a potential punishment for unexcused absenteeism, yet none of them recalled being subject to docked pay or knowing another teacher whose pay had been docked.

Absent financial incentives, principals could resort to nonfinancial rewards and sanctions, over which they have considerable latitude, to elicit higher teacher effort. Yet out of the six selected schools, only one school principal –in a low principal monitoring pair– was found to systematically recognize her teachers' level of effort by organizing "teacher of the year" contests each academic year.
Interviews with teachers at this school, however, revealed their lack of awareness of the criteria used by the principal to award this recognition, with one teacher even questioning: "How can the principal recognize [them] at the end of the year if she did not know about [their] daily performance in the classroom?" Contrasting this school with its high principal monitoring pair exposes an important teacher effort gap in favor of the high principal monitoring school, suggesting that strong principal monitoring is a prerequisite for nonfinancial recognition intended to elicit higher teacher effort.

Principal and teacher interviews also revealed that in two schools (one high principal monitoring pair, and one low principal monitoring pair), symbolic gifts –such as the Holy Quran, flowers, thank-you cards, or pins– were at times used by principals. In neither case were these gifts tied to teachers' performance, however; rather, they were handed to all teachers as a gesture of appreciation. Teachers expressed their gratitude when asked about these tokens of appreciation, but were also candid in noting that poor-performing teachers were recognized equally with those who put significantly more effort into teaching. As such, the analysis was not able to trace these gestures of appreciation to teachers' level of effort in either school.

Turning to nonfinancial sanctions, the study found two schools (both high principal monitoring pairs) in which principals attached punitive consequences to their monitoring. These consequences took the form of verbal reprimands, in private and in the presence of colleagues, as a penalty for underperformance. In contrast to their low principal monitoring pairs, interviewed teachers in these schools were very certain that any underperformance would be noticed and sanctioned by the principal. In one teacher's own words: "The principal observes [their] teaching very often. [They] do not know when she may pay [them] a visit, and if [they] are not prepared she will be strict." This may indicate that increased teacher effort results from principal monitoring and the sanctions attached to it. Yet a word of caution is in order. Some teachers at these schools condemned the negative environment that these sanctions had created in the school, describing how they affected teacher morale. In this regard, comparing these schools with the third high principal monitoring pair (in which the use of sanctions was not prevalent) suggests that sanctions may actually not be a necessary condition for principal monitoring to elicit higher teacher effort.

Positive incentives could significantly enhance the effect of monitoring on teacher effort.

A recent strand of literature shows that nonfinancial rewards can be effective in settings where the power of financial incentives is limited (Ashraf, Bandiera, and Jack 2014). Certainly, recognizing teachers' effort and achievements can increase their motivation, incentivizing them to keep up the good work or increase their level of effort. Yet the evidence above suggests that principals in Jordan seldom rely on nonfinancial mechanisms to incentivize teacher effort, and when they do, they use mechanisms that sanction rather than reward. This is further corroborated by the two-thirds of interviewed teachers (in both high and low principal monitoring pairs) who expressed a very strong desire to be recognized in any way by the principal for their high effort, so as to motivate them to keep working hard.
Many even expressed their frustration at times when they overperformed at a particular task but were not recognized. For instance, a teacher recalled a time when she prepared an excellent lesson plan and the principal simply signed it and wrote "thank you" –as she had done with all other teachers' lesson plans. Another teacher lamented that while all of her mistakes were always pointed out by the principal in her periodic classroom observations, she was not recognized for all of her good work. This suggests that the use of positive nonfinancial incentives may be a promising strategy –currently underutilized in Jordan– that could enhance the effect of principal monitoring on teacher effort.

Teacher selection bias does not seem to drive higher teacher effort.

All interviewed teachers agreed that they had limited say over their school assignment. They were appointed by the central ministry to any existing vacancy within their governorate of residence. In most cases, when more than one vacancy was available, they asked to be assigned to the school closest to their home so as to better attend to their family-related obligations. When explicitly asked about the criteria they used to select the school they currently work at, in all cases they put a priority on proximity to their place of residence over school quality and reputation. This was consistent both within and across pairings of schools, suggesting a constraining environment for more motivated teachers to self-select into schools with more motivated principals.

A causal relationship was traced from principal monitoring to teacher effort, as evidenced by teachers in high principal monitoring school pairings constantly referring to the need to prepare their classes very well, given that the principal may visit their classroom at any time. Teachers pointed to the specific areas where principals monitored them, or key tasks for which they are always held accountable. For instance, a teacher emphasized periodic monitoring whereby her principal randomly selected students' copybooks to review the quality of the work and the teacher's feedback to the student, leading her to be particularly meticulous when correcting students' assignments. Another teacher highlighted that her principal quizzed students often to ensure that they fully understood the class, which required her to constantly check up on her students to ensure they followed the material. This pattern was systematically repeated across high principal monitoring school pairings, and seemed significantly weaker –if at all present– in the low principal monitoring pairs. This evidence, together with the weak incentives environment and the constraining environment for teacher self-selection in Jordan, suggests that teachers do indeed put forth more effort in their teaching when their principals closely monitor them.

2.6 CONCLUSIONS

Overall, the findings of this chapter reveal suboptimal levels of teacher effort across classrooms in Jordan, while underscoring the pivotal role of principals in increasing teacher effort. The results of the empirical analyses suggest that teachers are more likely to put forth effort in their teaching when they are closely monitored by their principals. Specifically, teachers who are frequently monitored are more likely to provide feedback to their students. Similarly, more effort in creating a positive learning environment is put forth by teachers who are monitored more often.
Further, the mediation analysis suggests that principal monitoring is positively associated with student learning, as mediated by teacher effort. Increasing principal monitoring from an index of 2 –at the bottom of the distribution– to an index of 11 –at the top of the distribution– may increase student outcomes by up to 0.07 SD and 0.05 SD in language and math test scores, respectively, on average. Evidence from a comparative case study across six Jordanian schools adds inferential leverage to the quantitative analyses, easing potential endogeneity concerns. Informed by these findings, Chapter IV explores key policy implications for the education sector in Jordan.

References

Angrist, J., and V. Lavy. 1999. "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." Quarterly Journal of Economics 114(2): 533-575.
Ashraf, N., O. Bandiera, and B. K. Jack. 2014. "No margin, no mission? A field experiment on incentives for public service delivery." Journal of Public Economics 120: 1-17.
Banerjee, A., S. Cole, E. Duflo, and L. Linden. 2007. "Remedying Education: Evidence from Two Randomized Experiments in India." Quarterly Journal of Economics 122(3): 1235-1264.
Barrera-Osorio, F., and L. Linden. 2009. "The use and misuse of computers in education: Evidence from a randomized controlled trial of a language arts program." Working Paper, Columbia University.
Bau, N., and J. Das. Forthcoming. "The Misallocation of Pay and Productivity in the Public Sector: Evidence from the Labor Market for Teachers." Working Paper. World Bank, Washington, DC.
Bornstein, M. C., and R. H. Bradley, eds. 2003. Socioeconomic status, parenting, and child development. Mahwah, NJ: Lawrence Erlbaum.
Branch, G., E. Hanushek, and S. Rivkin. 2013. "School Leaders Matter." Education Next 13(1): 62-69.
Brooks-Gunn, J., and G. J. Duncan. 1997. "The effects of poverty on children." The Future of Children 7(2): 55-71.
Bruns, B., and J. Luque. 2014. "Great teachers: how to raise student learning in Latin America and the Caribbean." Washington, DC: World Bank Group.
Chaudhury, N., J. Hammer, M. Kremer, K. Muralidharan, and H. Rogers. 2006. "Missing in Action: Teacher and Health Worker Absence in Developing Countries." Journal of Economic Perspectives 20(1): 91-116.
Coleman, J. S. 1988. "Social capital in the creation of human capital." American Journal of Sociology 94: S95-S120.
Collier, D. 2011. "Understanding Process Tracing." PS: Political Science and Politics 44(4): 823-830.
Cristia, J., P. Ibarrán, S. Cueto, A. Santiago, and E. Severín. 2012. "Technology and child development: Evidence from the One Laptop per Child program." IZA Discussion Paper No. 6401, Forschungsinstitut zur Zukunft der Arbeit GmbH, Bonn, Germany.
Duflo, E., R. Hanna, and S. Ryan. 2012. "Incentives Work: Getting Teachers to Come to School." American Economic Review 102(4): 1241-1278.
Gates Foundation. 2012. "Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains." MET Project Research Paper.
George, A., and A. Bennett. 2005. Case Studies and Theory Development in the Social Sciences. Cambridge, MA: MIT Press.
Glewwe, P., M. Kremer, S. Moulin, and E. Zitzewitz. 2004. "Retrospective vs. prospective analyses of school inputs: the case of flip charts in Kenya." Journal of Development Economics 74: 251-268.
Glewwe, P., N. Ilias, and M. Kremer. 2010. "Teacher incentives." American Economic Journal: Applied Economics 2(3): 205-227.
Hamre, B., and R. Pianta. 2007. "Learning opportunities in preschool and early elementary classrooms." In School Readiness and the Transition to Kindergarten in the Era of Accountability, ed. R. Pianta, M. Cox, and K. Snow, pp. 49-84. Baltimore: Brookes.
Hanushek, E., and J. Luque. 2003. "Efficiency and equity in schools around the world." Economics of Education Review 22: 481-502.
Hanushek, E., J. Kain, D. O'Brien, and S. Rivkin. 2005. "The Market for Teacher Quality." NBER Working Paper No. 11154. Cambridge, MA: National Bureau of Economic Research.
Hanushek, E., and S. Rivkin. 2006. "Teacher Quality." In Handbook of the Economics of Education, Volume 2, ed. E. Hanushek and F. Welch, pp. 1052-1075. Elsevier B.V.
Hanushek, E., and S. Rivkin. 2010. "Generalizations about Using Value-Added Measures of Teacher Quality." American Economic Review 100(2): 267-271.
Hicks, R., and D. Tingley. 2011. "Causal Mediation Analysis." Stata Journal 11(4): 609-615.
IEA. 2011. "Trends in International Mathematics and Science Study 2011 Results. Jordan Country Profile." Boston, MA.
Imai, K., L. Keele, and D. Tingley. 2010. "A general approach to causal mediation analysis." Psychological Methods 15(4): 309-334.
Imberman, S. 2011. "The Effect of Charter Schools on Achievement and Behavior of Public School Students." Journal of Public Economics 95(7-8): 850-863.
Kane, T. 2004. "The Impact of After-School Programs: Interpreting the Results of Four Recent Evaluations." Working Paper of the William T. Grant Foundation. New York, NY.
Kane, T., J. Rockoff, and D. Staiger. 2006. "What Does Certification Tell Us About Teacher Effectiveness? Evidence From New York City." NBER Working Paper No. 12155. Cambridge, MA.
Kane, T., and D. Staiger. 2008. "Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation." NBER Working Paper No. 14607. Cambridge, MA: National Bureau of Economic Research.
Krueger, A. 1999. "Experimental Estimates of Education Production Functions." Quarterly Journal of Economics 114(2): 497-532.
Krueger, A., and D. Whitmore. 2001. "The effect of attending a small class in the early grades on college-test taking and middle school test results: Evidence from Project STAR." The Economic Journal 111(468): 1-28.
Lavy, V., M. D. Paserman, and A. Schlosser. 2011. "Inside the Black Box of Ability Peer Effects: Evidence from Variation in the Proportion of Low Achievers in the Classroom." The Economic Journal 122(559): 208-237.
Muralidharan, K., and V. Sundararaman. 2011. "Teacher Performance Pay: Experimental Evidence from India." Journal of Political Economy 119(1): 39-77.
Nielsen, R. 2014. "Case Selection via Matching." Sociological Methods & Research. doi:10.1177/0049124114547054.
Nye, B., S. Konstantopoulos, and L. Hedges. 2004. "How Large Are Teacher Effects?" Educational Evaluation and Policy Analysis 26(3): 237-257.
OECD. 2012. "Programme for International Student Assessment 2012 Results. Jordan Country Profile." Paris, France.
RAND. 2012. "Teachers Matter: Understanding Teachers' Impact on Student Achievement." Santa Monica, CA: RAND Corporation.
Sirin, S. 2005. "Socioeconomic Status and Academic Achievement: A Meta-Analytic Review of Research." Review of Educational Research 75(3): 417-453.
Stallings, J. A. 1986. "Effective use of time in secondary reading programs." In Effective Teaching of Reading: Research and Practice, ed. J. Hoffman, pp. 85-106. Newark, Delaware: International Reading Association.
III. HEALTHCARE QUALITY, PROVIDER EFFORT, AND ACCOUNTABILITY

3.1 INTRODUCTION

An enormous academic and policy-focused literature has sought to identify and evaluate the inputs that affect the quality of healthcare service provision. This literature can be divided into research that analyzes the (i) structural and (ii) behavioral determinants of healthcare provision (Das and Hammer 2014). Structural determinants include factors that can readily be addressed with increased funding, such as the physical condition and availability of medical facilities, the quality of equipment, and the amount of medicine on shelves. Even the number of staff and the caseload of individual providers are structural determinants. It would seem reasonable to assume that, in developing countries, structural determinants are the far more important factor in improving healthcare provision. In fact, the international community has focused largely on improving these structural inputs (Das and Hammer 2014), with a specific emphasis on improving the availability of healthcare in developing countries (Das and Gertler 2007).

Evidence suggests, however, that these structural factors are not the most important in improving healthcare service provision in these countries. Even in very low-income countries, the equipment necessary to treat common health conditions seems to be abundantly available and does not pose an obstacle to healthcare service delivery (Das and Gertler 2007). A review of healthcare impact evaluations in developing countries finds no correlation between structural inputs and quality (Das and Hammer 2014). Further, while medical education, or lack thereof, is an important structural feature that is highly correlated with having the knowledge to correctly treat patients, improving the knowledge of medical providers may ultimately have minimal effect on improving patient outcomes. This is because the amount of effort exerted by providers is alarmingly low in many contexts (Das and Hammer 2014). Thus, while a lack of medical knowledge hinders the provision of high-quality care, it does not seem to be the main hindrance.13 Even when providers have the knowledge to correctly treat a patient, they very often fail to do so; this is not the result of a lack of other structural inputs, but rather of behavioral determinants.

Behavioral determinants are factors that describe what health providers do within a given level of structural determinants. This incorporates a number of different aspects, from the most basic, such as whether providers consistently show up to work on time, to the most critical, relating to providers' use of knowledge to correctly diagnose and prescribe for patients.
Most research has found that these behaviors are seldom practiced, underscoring a lack of effort on the part of healthcare providers to meet expected performance standards.

13 This is not to say that the knowledge of medical providers is not an important determinant of patient outcomes. Studies have found a serious lack of knowledge among healthcare providers in developing countries, and this lack of knowledge is often disastrous for patients. Furthermore, in their comprehensive review of healthcare in low-income countries, Das and Hammer (2014) find that all the studies they examined found a strong correlation between education and knowledge.

3.1.1 Healthcare Provider Effort

Just as in the previous chapter on education, absenteeism is a straightforward manifestation of low provider effort. Notwithstanding provider knowledge, or even the availability of high-tech equipment, if a health provider fails to show up to work, there is no chance to improve patient outcomes. Further, just as in the case of education, absenteeism is a chronic problem in many developing countries. In a study carried out in six countries across multiple developing regions, Chaudhury et al. (2006) find that, on average, 35 percent of health workers were not present during unannounced visits to health facilities. Banerjee, Deaton, and Duflo (2004) find that 36 percent of providers were absent on an average day in the larger urban healthcare centers in Rajasthan, India, and note an even higher absenteeism rate of 45 percent in smaller rural facilities. The latter is particularly problematic since many small rural centers have only one provider, so when s/he fails to show up to work, the clinic simply does not open. Callen et al. (2013) find an even higher absenteeism rate of 68.5 percent prior to a randomized intervention in Pakistani health centers. Banerjee, Duflo, and Glennerster (2008) find that the majority of rural government health centers were closed more often than not because the attending nurse failed to report to work.

These figures are consistently higher than those reported for teachers (Chaudhury et al. 2006). One explanation may be that healthcare providers have greater alternative options for income. Chaudhury et al. (2006) find that the only consistent predictor of provider absence is the type of health worker: doctors have higher absence rates than less qualified healthcare professionals (e.g., nurses) in all six countries in the study, as doctors have more lucrative options moonlighting at private facilities. For example, in Peru, 48 percent of doctors reported earning extra income in private facilities, whereas only 30 percent of other health professionals reported outside income. When doctors have the opportunity to moonlight, they devote less time to patients and have higher rates of absenteeism in public sector facilities (Ferrinho et al. 2004).

While absenteeism is clearly a major problem that results from a lack of provider effort as opposed to inadequate structural inputs, even when providers do show up to work they often fail to exert substantial effort. Providers frequently do not follow basic clinical practice guidelines (CPGs) (i.e., taking patient history, physical examinations, test ordering, diagnosis, and treatment), spend inadequate amounts of time with patients, and do not maximize the value of their medical training in interactions with patients.
Studies have found that providers spend remarkably little time with patients, even in facilities with high levels of excess capacity. For example, the average consultation time in urban and rural India is three minutes. Furthermore, providers average only three questions per consultation. One-third of consultations are over in less than a minute and involve only the question "What's wrong with you?" (Das, Hammer, and Leonard 2008). The same study finds similar figures in several other low-income countries. In contrast, the average consultation time in an OECD country is three to four times longer (Das and Hammer 2014).

A "know-do" gap in the provision of healthcare has been found in a number of studies on healthcare provision in the developing world: e.g., by Das and Hammer (2007) in the context of India, Leonard, Masatu, and Vialou (2007) in Tanzania, and Gertler and Vermeersch (2012) in Rwanda. In all of these studies, researchers have found that providers often have sufficient knowledge to address common medical problems, and yet fail to do so when they interact with real patients. Essentially, providers fail to exert the effort necessary to utilize their medical training.

The earlier finding that absenteeism rates are higher in public facilities carries over to the level of effort exerted even when providers are present: providers exert less effort at public healthcare centers than at private ones. Das and Hammer (2007) find that providers spend 30-50 percent less time with patients in public healthcare centers than in private ones. Das et al. (2013) compare private and public clinics in the Indian state of Madhya Pradesh and find that providers spend longer amounts of time in consultation, ask more questions, and perform more exams at private clinics than at public clinics. The results hold even for the same provider working at both public and private clinics. This result indicates that providers expend less effort at public healthcare facilities because fewer performance-based incentives exist there; it is not a function of structural disadvantages at public facilities. In fact, in the Das et al. (2013) study, the authors show that excess capacity exists in both private and public facilities and that public facilities have better equipment than private facilities.

None of these indicators of low provider effort can easily be explained by heavy caseloads either. If caseloads were heavy, it might be optimal for providers to exert low levels of effort so as to conserve energy and/or time to see a larger number of patients. However, the evidence does not support this idea. Studies have found large excess capacity in public clinics in Tanzania, Senegal, Kenya, and India (Das and Hammer 2014).14 Researchers found that providers rarely see more than 15 patients per day and on average spend 40 minutes per day with patients. Instead, providers do not exert effort because they are not incentivized to do so. If providers do not exert effort, then improving the structural determinants of healthcare provision is unlikely to improve patient outcomes. Thus, improving provider effort is key to improving healthcare provision, and the most promising avenue for doing so is to increase provider accountability.

14 Unfortunately, there are no empirical studies of this from the MENA region; indeed, this is one of the biggest contributions of the current study. However, given that these are studies of provider behavior, the findings should be transferable across settings, at least to some degree.
3.1.2 Holding Healthcare Providers Accountable to Increase Provider Effort

The evidence clearly indicates that lack of provider effort is a common problem, and that it is likely more important than many, if not all, of the structural factors that are often the focus of international interventions. As such, researchers have spent substantial time analyzing the best methods of improving provider effort. Nearly all of these methods involve increasing the accountability of providers. The rationale is that the lack of effort is a function of systems that do not incentivize effort because providers are compensated regardless of effort levels. Essentially, providers are not accountable for their performance and so choose to expend less effort. Thus, it may be expected that increasing accountability will increase provider effort. As discussed in previous chapters, improving accountability requires monitoring and incentivizing high levels of effort. Also as discussed previously, issues of "observability" and "farther outcome" emerge when trying to monitor and properly incentivize effort.

The Observability Challenge

Making providers accountable necessitates being able to observe and evaluate provider effort. Provider effort can be divided into two categories: effort to attend work and effort in interactions with patients. To ensure provider accountability, both aspects of provider effort must be observable.

Monitoring of provider absenteeism can be implemented using a top-down, bottom-up, or within-facility approach. A common top-down method of observing provider absenteeism is for governments to employ inspectors to randomly visit health centers and check for provider presence (Callen et al. 2013). This type of top-down approach has proven difficult in practice. A number of studies find that providers are able to pressure inspectors into giving favorable reviews even when they are absent at the time of the inspector's visit (Callen et al. 2013; Banerjee and Duflo 2006). Essentially, this means that the inspector covers for deficiencies in provider performance (Banerjee and Duflo 2006). Callen et al. (2013) also find that external control by government inspectors often fails because providers have political connections that they use to pressure inspectors.

Bottom-up approaches often involve forming community organizations that monitor provider absenteeism through direct observation (i.e., random or regular checks at provider facilities) or through patient complaints (Banerjee and Duflo 2006; Banerjee, Deaton, and Duflo 2004). This method has had varying levels of success according to impact evaluations. The reason for the varied outcomes seems to be that monitoring by local communities is subject to a severe collective action problem. The community would like to monitor and enforce high levels of provider effort, but each individual in the community would be better off if someone else did the work necessary to monitor and enforce (Banerjee and Duflo 2006). In Banerjee, Deaton, and Duflo (2004), individuals from the local community were paid to randomly check whether providers were present at their assigned health center.
The study found that local monitoring had no effect on provider absenteeism because the community was either unable or unwilling to create an enforcement mechanism. Björkman and Svensson (2010) find that factors that reduce the ability of a community to organize collectively (such as income inequality and ethnic fractionalization) reduce the effectiveness of community-based monitoring.

Within-facility monitoring of provider absenteeism occurs when a supervisor of a facility, often the chief medical officer (CMO), observes provider absenteeism. Mechanisms for promoting provider effort within facilities deserve more attention than they have received thus far in development research and may address some of the limitations of top-down and bottom-up accountability. An emphasis on within-facility accountability capitalizes on the technical knowledge of supervisors within health centers. This approach, like the other two mentioned above, can observe provider absenteeism if implemented properly.

The "observability" challenge really emerges in the monitoring of provider effort in consultations with patients. As discussed in the prior section, ample evidence shows that providers do not exert high levels of effort when interacting with patients. To make providers accountable for failing to exert effort, it is necessary to observe them interacting with patients. However, observing patient-provider interactions is inherently problematic due to the private nature of the interactions. Using a top-down approach would require an outside party, perhaps the same government inspector that monitors attendance, to be present in such interactions. This clearly breaches privacy, considered one of the primary components of rights-based healthcare provision (Leonard and Masatu 2006).

One means of overcoming this problem is to use patient surveys after provider service to evaluate the effort of providers, a form of bottom-up monitoring. This would not violate privacy, as patients could choose whether to respond; they would not need to disclose personal medical information while still providing insight into their provider's effort and overall performance. Unfortunately, patients do not appear capable of evaluating provider service. Banerjee and Duflo (2006) find that survey respondents reported that their last visit to healthcare centers made them feel better despite the extremely low levels of provider effort measured at the same healthcare centers. Similarly, Das and Sanchez-Paramo (2004) find large-N survey evidence indicating that individuals have essentially no ability to identify sources of quality care. Banerjee, Deaton, and Duflo (2004) suggest that patients' limited ability to evaluate provider service may be due to extremely low expectations: if patients are used to low levels of provider effort, then they will view low-effort providers as the norm.

Lastly, supervisors within health centers can act as monitors. A CMO within a health center should have the knowledge to act as an effective monitor and is not subject to the same privacy issues associated with top-down monitoring.

The Farther Outcome Problem

In the context of healthcare provision, the "farther outcome" challenge relates to the difficulty of directly evaluating provider performance and the resulting need to evaluate providers based on other measures.
Essentially, due to the "observability" challenge discussed in the prior section, evaluating provider effort must be based on outcomes other than actually observing the provider at work. This creates challenges in the measurement and choice of outcomes to be evaluated. In the case of absenteeism, measurement is not an issue; measuring whether a provider reports to work is a simple process. However, measuring a provider's performance while at work is a much more difficult task. Two types of evaluation methods are common: a top-down approach that focuses on health outcomes in a community, and a bottom-up approach that focuses on patient surveys.

A top-down approach would evaluate provider performance based on health outcomes in the community where the provider is located. For example, providers could be evaluated on the prevalence of illness or under-5 mortality rates in the community. The two problems with this type of evaluation are both related to the "farther outcome" problem. First, a host of factors unrelated to provider performance affect health outcomes in the community where the provider works. For example, Marmot and Wilkinson (2005) detail the importance of socioeconomic factors in determining health outcomes across societies. These include obvious ones such as average incomes in a local community, but also factors such as ethnic diversity and the type of housing in a community. Because of the importance of these factors, it becomes difficult, if not impossible, to accurately evaluate providers based on health outcomes in the communities where they work. The second issue with evaluation based on health outcomes is choosing the health indicators by which to evaluate providers. For example, if providers know that they are evaluated only on under-5 mortality rates, they may be incentivized to inefficiently allocate their resources to improving this outcome at the cost of a community's overall health.

A bottom-up approach to measuring provider performance would evaluate providers based on patient reports. Essentially, provider performance could be judged by satisfaction surveys. This is problematic due to patients' limited ability to evaluate provider performance, as discussed in the last section. Of particular relevance to the "farther outcome" challenge is that patients often desire treatments that are not in their own best interest. For example, Banerjee and Duflo (2006) report that patients in India preferred private providers because they were more likely to prescribe shots instead of pills. This was most likely true; government protocol recommends the use of pills when possible, as they are believed to be safer and more cost-effective. However, patients believed that pills were somehow inferior to shots. If providers are evaluated on patient satisfaction, then they will be incentivized to satisfy patients with treatments that are not in the patient's best interest. Das and Hammer (2014) find evidence of this in India, where providers prescribe an average of three different types of medicine in each consultation. It seems very unlikely that this is optimal. Instead, providers appear to be trying to satisfy patients while not exerting effort (the same study found an average consultation time of three minutes and three questions per patient).

Lastly, within-facility approaches can be used to measure provider performance.
This approach can potentially solve some, though not all, of the "farther outcome" problems in both top-down and bottom-up approaches. Within-facility evaluators have more knowledge of healthcare provision than patients, thus overcoming some of the problems with bottom-up evaluation. Furthermore, within-facility evaluators should have knowledge of the local communities where providers operate, and so be better able to judge which health indicators in a community are attributable to provider performance and which are not. Therefore, while within-facility accountability mechanisms cannot solve all issues related to the "farther outcome" problem, they should be able to address these issues in ways that top-down and bottom-up approaches cannot.

The Role of CMOs

The literature on improving levels of effort among providers has thus far focused on the role of inspectors and patients as the source of monitoring and evaluation of providers. In contrast, far less research has been conducted on the role of CMOs, who should be in a unique position to observe and evaluate providers. Unlike government inspectors and patients, CMOs have the professional knowledge to evaluate provider quality. They should be better prepared to overcome the "farther outcome" challenge because of their knowledge of medicine, the specific providers, and local patients. Furthermore, CMOs are proximate to providers. Rather than relying on random inspections, CMOs are able to monitor providers on a much more regular basis, which helps to overcome the "observability" challenge. If nothing else, having a CMO who is regularly present and properly trained to evaluate provider performance should lead to a much larger Hawthorne effect (improved performance simply from being observed) than sporadic observation by either inspectors or patients who are unable to evaluate quality healthcare provision. In addition, CMOs are often evaluated on the performance of the providers under their supervision; monitoring and evaluating provider performance is thus already part of their job description. Therefore, CMOs should be among the first avenues of research on improving healthcare provision.

3.1.3 Roadmap to the Chapter

This study is the first nationally representative study in Jordan to measure within-facility accountability and provider effort in primary healthcare facilities, and the first study in the MENA region to investigate the linkages between within-facility accountability and provider effort. It thereby provides novel, policy-relevant information on the accountability mechanisms, and their drivers, that contribute to good service delivery outcomes. The study in this chapter specifically provides new evidence on the role of CMOs in improving accountability. Section 3.2 provides an overview of Jordan's health system, which indicates the value of focusing on provider effort to improve health outcomes in the country. Section 3.3 uses evidence from a nationally representative sample of 122 primary healthcare centers (PHCCs), representing each of the 13 Directorates of Health in Jordan, to test the association between CMO15 monitoring and provider effort. Section 3.4 offers some conclusions.

15 Chief Medical Officers (CMOs) are referred to as Heads of Healthcare Centers (HOHCs) in Jordan. This report uses the conventional terminology of CMO to be consistent with the existing literature.
3.2 THE HEALTH SECTOR IN JORDAN

Jordan enjoys an advanced health system, with one of the most modern healthcare infrastructures in the MENA region, providing a range of both advanced medical services and basic primary care to most citizens at comparatively low direct costs. Over the past two decades, the country has achieved remarkable progress in improving the health status of the population. Life expectancy at birth increased from 69.9 years in 1990 to 73.7 years in 2012; maternal mortality declined from 86 per 100,000 live births in 1990 to 50 in 2013; infant mortality fell from 34 per 1,000 live births in 1990 to 17 in 2012; and under-5 mortality declined from 39 per 1,000 live births to 21 over the same period. With these improvements, particularly in maternal and child health, Jordan fares better than many other countries of similar income level, both within and outside the MENA region (Figure 15, Figure 16, and Figure 17). Despite these gains, Jordan's health indicators, especially infant and maternal mortality, suggest that considerable further improvement can be achieved relative to the investment. And since Jordan has reached almost universal coverage in terms of antenatal care, births attended by a skilled health professional, and child immunization, the problem is not one of access but of quality of services.

Figure 15: Life Expectancy: Jordan, MENA Average, and Selected Other Countries, 1980-2011. Source: World Bank 2014; World Health Organization 2014; HHC 2013.

Figure 16: Infant Mortality Versus Income and Total Health Spending, 2011. Source: World Bank 2014; World Health Organization 2014. Note: Both axes are in log scale.

Figure 17: Maternal Mortality Relative to Income and Spending, 2010. Source: World Bank 2014; World Health Organization 2014. Note: Both axes are in log scale.

In spite of its achievements in population health, Jordan, like many countries with similar economies, is experiencing an epidemiological transition, with a shift in disease burden from communicable to noncommunicable diseases (NCDs). Three out of every four deaths in Jordan are caused by NCDs (World Health Organization 2011), with cardiovascular and circulatory diseases the leading causes, accounting for about 37 percent of all deaths (IHME 2010). Cancers are the second leading cause of mortality in Jordan, having increased from 9 percent of all-cause mortality in 1990 to 15 percent in 2010 (IHME 2010). Diabetes is now the third leading cause of death in Jordan, responsible for 7 percent of all deaths in 2010 compared to 2 percent in 1990. Further, the top five conditions associated with the highest disability-adjusted life years (DALYs), a standard measure of morbidity, are related to NCDs. While Jordan's young population composition offers a unique opportunity to capitalize on the potential benefits of the so-called "demographic dividend," banking on this for future economic productivity may prove a remote possibility if NCDs remain unaddressed.

Addressing the NCD burden in Jordan requires a revitalized focus on primary healthcare while making use of readily available, cost-effective interventions that rely on inexpensive technologies for early detection and diagnosis (World Health Organization 2010b). International evidence on effective health systems and their ability to promote health, prevent disease, and manage chronic conditions shows that such activities are most cost-effectively performed at the community level through primary healthcare.
To address the emerging NCD challenge in Jordan, such a reorientation would at its core need to uphold primary, and to a lesser degree secondary, prevention strategies that assume a life-course approach (Demaio et al. 2014). It would need to ensure that service delivery is both patient-centered and community-based, and would need to be anchored in an environment conducive to delivering services of the highest quality.

Jordan's apparent public health system challenges, particularly as they relate to the quality of primary healthcare service delivery, generate a sense of mistrust in the health system on the part of the general public. While limited systematic evidence exists on the quality of healthcare in the country, a number of studies have pointed to perceived deficiencies in the level of primary and hospital care (Abu-Kharmeh 2012; Al-Qutob and Nasir 2008; Khatatbeh 2013; Khoury and Mawajdeh 2004; Otoom et al. 2002), perceptions that often also vary with geography (Abu-Kharmeh 2012). The drivers of these quality deficiencies appear to be related to a number of factors, some of which are bound up with provider effort (Khoury and Mawajdeh 2004; Otoom et al. 2002); one study finds that providers spend less than 30 percent of their clinic time directly providing care (Khoury and Mawajdeh 2004). Other drivers are inherently associated with managerial and supervisory performance (Al-Qutob and Nasir 2008) and, to a large extent, with the incentive environment in which providers operate (Khatatbeh 2013). It has been suggested that the latter two, in the absence of a merit-based system, have mostly resulted in high attrition rates and, in many cases, replacement with inexperienced providers, especially in rural settings, further impeding the quality of healthcare service delivery (Al-Qutob and Nasir 2008).

While it might be concluded that the underlying dynamics behind the perceived inadequate quality of services in Jordan are fueled by limited resources going into the system, the evidence suggests otherwise. In 2011, Jordan's public spending on health as a percentage of GDP stood at 6 percent, almost double the MENA average of 3 percent. This was mirrored in per capita health expenditures, which stood at US$392, well above the averages for low- and middle-income countries and for developing countries in the MENA region, although not the highest in the region. Jordan stands out within the region, and among countries of similar economies more generally, for its high levels of public health spending (Figure 18).

Figure 18: Total Health Expenditure as a Share of GDP and Income Per Capita, 2011. Source: World Bank 2014; World Health Organization 2014. Note: x-axis is in log scale.

Jordan's high spending on healthcare implies that the quality production function is not constrained by structural inputs, but rather by a limitation in practice. Whether this is related to provider knowledge or effort, the bottom line is that the issue is not one of investing more in health, but rather of addressing the core issues around what actually happens within healthcare settings; i.e., at the point of service where patient care is provided. Against this backdrop, it seems that Jordan has hit its input frontier, at least with respect to the large-scale allocation of financial resources to the system. Healthcare professionals are at the frontline of improving the quality of primary care.
Designing and implementing programs to boost their commitment and effort can help to advance healthcare quality in Jordan without the allocation of large budget outlays.

That said, the promotion of high-quality healthcare is not a new focus for Jordan, and recent initiatives attest that the government and nongovernmental partners are aware of the importance of human resource management, among other factors, for improving healthcare quality. The Jordan Healthcare Accreditation Program (JHAP) is the most significant recent program adopted by the Jordanian government, in cooperation with international and local partners, to promote quality improvements in the health sector. Initiated in June 2007 and officially completed in March 2013, the JHAP established the Healthcare Accreditation Council (HCAC), an independent, not-for-profit national accreditation agency for the health sector, and, in conjunction with the HCAC, created the National Quality and Safety Goals (NQSG) initiative in Jordan. The HCAC developed a comprehensive set of standards for healthcare facilities seeking accreditation. These relate to community integration, to assess community needs and partner with the community to meet those needs; management and leadership; information and records management; a variety of technical and nontechnical dimensions of the provision of care; the health education of clients and their families; patient safety; environmental safety, infection control, and employee health; and human resource management (HCAC 2011).

The accreditation program in Jordan has brought about some clear improvements in the primary health system thus far, even if many of its effects have yet to be assessed systematically. Findings from qualitative research on primary health centers in Jordan suggest that the mere preparation for accreditation results in substantive quality improvements, such as better medical recordkeeping, more effective human resource management practices, and improved oversight of equipment and consumables, among other outcomes (Rabie, Ekman, and Özçelik 2014). Accreditation also appears to have catalyzed increased community input and engagement with local health facilities (Rabie, Ekman, and Özçelik 2014), which in part derives from the requirement for healthcare facilities to establish community health committees (CHCs) to engage with community members and groups more extensively as part of the process (HCAC 2011). The durability of this community participation, however, remains to be seen.

Despite Jordan's efforts to promote quality healthcare through the accreditation program and other initiatives, certain features of the health system limit the effectiveness of efforts to improve quality. For example, the system of recruitment, pay, and licensing fosters low quality in primary health centers, while the use of public health facilities as the entry point for doctors trained abroad leads to lower-qualified staff and high turnover, as doctors leave public clinics after a short period to return to specialized medical training. Moreover, the large difference in compensation rates across the private and public sectors reduces the incentive for public service, results in high turnover, and encourages dual practice. Finally, the lack of requirements regarding relicensing and continuing education threatens the provision of high-quality, evidence-based care.
To further strengthen the quality of Jordan's primary healthcare services, serious consideration must be given to quality processes beyond accreditation, with a heightened focus on enhancing provider effort and accountability.

3.3 CMO MONITORING AND PROVIDER EFFORT

Is stronger CMO monitoring associated with higher provider effort? This study was designed to generate knowledge on the relationship between within-facility accountability and provider effort. Specifically, the study seeks to answer whether, in a nationally representative sample of PHCCs, CMOs' use of accountability mechanisms, namely monitoring practices and incentives, is linked to increased provider effort. The unit of analysis in the study is the PHCC. Within each PHCC in the sample, data were collected from patients, the CMO, doctors and nurses who work at the center, and, where available, a representative of the CHC. Phone interviews were also conducted with the Head of the Directorate of Health.

Findings from this study show variability in provider effort across PHCCs, but consistently high rights-based practice. In general, within-facility accountability mechanisms are characterized by high CMO monitoring coupled with limited nonfinancial rewards, nearly nonexistent financial rewards, and uniformity in the application of sanctions. The study also shows that CMO monitoring is highly correlated with high levels of provider effort, but not with absenteeism. Finally, in a high-sanctions environment, monitoring seems to be associated with greater provision of rights-based care.

3.3.1 Study Sample

The objective of the sampling strategy was to obtain a nationally representative sample of public primary health facilities from all 13 Directorates of Health. The Directorates of Health closely correspond to the 12 governorates in Jordan, with the exception of the Governorate of Irbid, which has two directorates. Sample size calculations used to estimate the number of patients and centers required to answer the research question are summarized in Appendix C. In summary, the study estimated that a sample of approximately 120 PHCCs, with 25 patients per PHCC, across all 13 Directorates of Health was needed to test the research question. The PHCC sample was chosen to be representative of all centers with an average daily utilization of at least 35 patients, selected with probability proportionate to district population size and stratified by Directorate of Health to ensure representation from all directorates (a sketch of this type of selection follows).
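To make the stratified, probability-proportionate-to-size design concrete, the following is a minimal Python sketch. It approximates PPS selection through sequential weighted draws without replacement within each stratum; the sampling frame, population figures, and stratum sizes are hypothetical illustrations, not the study's actual frame or procedure.

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)

# Hypothetical sampling frame: eligible centers (>= 35 patients/day)
# with the population of the district each center serves.
frame = pd.DataFrame({
    "center_id": range(1, 9),
    "directorate": ["Amman"] * 5 + ["Ajloun"] * 3,
    "district_pop": [90_000, 60_000, 45_000, 30_000, 15_000,
                     20_000, 12_000, 8_000],
})

def sample_stratum(group: pd.DataFrame, n: int) -> pd.DataFrame:
    # Within a directorate (stratum), draw centers with probability
    # proportionate to district population size.
    weights = group["district_pop"] / group["district_pop"].sum()
    idx = rng.choice(group.index, size=min(n, len(group)),
                     replace=False, p=weights)
    return group.loc[idx]

sample = pd.concat(
    sample_stratum(g, n=2) for _, g in frame.groupby("directorate")
)
print(sample)

Stratifying before drawing guarantees that every directorate is represented, which is why the design could accommodate special cases such as Tafileh, as described next.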
Study resources allowed only a one-day visit to each center. Therefore, health centers needed to have an average daily utilization of 35 or more patients to be included in the sample, so that 25 patients could be interviewed, assuming that some patients would be ineligible, too unwell, or unwilling to participate. One Directorate of Health, Tafileh, did not have a PHCC that met the average daily utilization minimum. To ensure representation of that directorate, one of the two clinics in Tafileh with the highest utilization was randomly chosen. In addition, one of the originally selected centers was inside a correctional facility and another was located in an area heavily guarded by the military, limiting accessibility for the study. These two facilities were replaced with the support of the Department of Statistics, for a final total PHCC sample of 122 centers (Table 7). This sample represents approximately 55 percent of all PHCCs with a daily utilization of at least 35 patients and about a third of all PHCCs nationally. At the request of the Ministry of Health (MOH), a sample of comprehensive health centers (CHCCs) (n=35) was also chosen, but given differences in size, staffing, and service lines, they are not included in the present analyses; descriptive statistics on CHCCs and Royal Medical Service (RMS) facilities are provided in Appendix D.

Table 7: Number of Primary Health Facilities Sampled by Directorate of Health

Governorate    Number of PHCCs
Amman          33
Ajloun         4
Aqaba          3
Balqa          9
Irbid          32
Jerash         5
Karak          5
Ma'an          3
Madaba         3
Mafraq         8
Tafileh        1
Zarqa          16
Total          122

3.3.2 Respondent Selection

Within each PHCC, the CMO and all health providers in pediatrics, family medicine, and general medicine were selected for participation in the study. Among facilities that had a local health committee, a committee representative was invited to participate. The committee chair was the preferred choice, but if he/she was unavailable on the day of data collection, or if the CMO served as the head of the committee, then another committee member was invited to participate.

Patients were selected for the study if they had received care on the day of the study visit from a clinician practicing internal medicine, pediatrics, family medicine, or general medicine. Respondents were 18 years of age or older, but eligible patients could be of any age. Patients reported on their own care, except in the case of minors or individuals with cognitive impairment, for whom the most knowledgeable adult 18 years of age or older was the respondent. If more than one member of a household received services on the day of the visit, the patient whose birthday was closest to the visit date was chosen, unless both an adult and a child received services, in which case the adult was chosen. Patients were ineligible for participation if there was no adult 18 years of age or older to respond, if the respondent was attending the clinic for nonclinical purposes (e.g., administrative issues only, or to visit a staff member for personal reasons), or if they received services outside of the clinical targets. Patients who were visibly crying or moaning were not approached, and those who reported that answering questions was overly burdensome given their poor health were not interviewed. The Head of the Directorate of Health was also interviewed.

3.3.3 Instruments and Measures

Study instruments were developed through an iterative, consultative process including study team members and the Governance and Service Delivery Technical Advisory Committee (TAC), which comprises stakeholders in Jordan representing the health and education sectors. Through this process, a set of instruments was developed, including: a patient exit interview guide; questionnaires for the center director, center health staff, and CHC representative; and a telephone interview guide used with the Head of the Directorate of Health. Table 8 summarizes the content of each instrument/data source.
Table 8: Contents of Data Collection Instruments (each row lists the respondents/data sources covering that content)

Socio-demographics: Patient; Chief Medical Officer; Center Health Providers; Health Committee Representative; Directorate of Health Representative
Health encounter details: Patient
Provider effort: Patient
Administrative information about the health center: Chief Medical Officer
Directorate-level monitoring and incentives: Center Health Providers; Directorate of Health Representative
Center-level monitoring and incentives: Chief Medical Officer; Center Health Providers
Community-level monitoring and incentives: Chief Medical Officer; Health Committee Representative

Given that provider effort is a complex, multi-faceted construct, a multi-component approach to measurement was taken. Drawing on prior research and measurement tools (Das, Hammer, and Leonard 2008; Das and Sohnesen 2007; Leonard 2008), this study operationalized effort as: (i) the percent of health facility doctors and nurses absent on the day of the visit, assessed through a review of clinic administrative records; (ii) time spent with the patient; (iii) the provision of rights-based care; and (iv) clinical effort, using a modified retrospective consultation review (Brock, Lange, and Leonard 2014; Leonard and Masatu 2006) with patients exiting the center on the day of the study visit. Table 9 displays each component of this construct, the specific items used to measure it, and its form for analysis.

Table 9: Measures of Provider Effort

Provider Absenteeism
  Instrument: CMO Survey
  Item(s): Review of clinic administrative records to ascertain the percent of doctors and nurses assigned to the center who were not present the day of data collection.
  Measurement: Percent absent at center i.16

Time with Provider
  Instrument: Patient Exit Interview
  Item(s): How much time did you spend with a provider?
  Measurement: Average response of patients (sum of time with doctor and nurse/midwife) at center i.

Rights-Based Practice
  Instrument: Patient Exit Interview
  Item(s):
    - Did a healthcare provider explain your/the patient's treatment plan?
    - Were you involved in the decision making of the treatment plan?
    - The provider explained things in a way that was easy to understand?
    - I/the patient could talk privately to the provider?
    - My/the patient's treatments/exams were conducted in private?
    - I/the patient was treated with respect?
    - I/the patient had time to ask questions?
  Measurement: Average number of "True" answers at center i.

Compliance with CPGs
  Instrument: Patient Exit Interview
  Item(s): Did a healthcare provider:
    - take notes while you/the patient was/were speaking;
    - listen to your/the patient's description of the illness [or reason for the visit];
    - ask you/the patient if there were other symptoms different from the main complaint [or reason for the visit];
    - take your/the patient's temperature;
    - take your/the patient's pulse;
    - check your/the patient's blood pressure;
    - measure your/the patient's height/length and weight;
    - conduct a bed examination for you/the patient?
  Measurement: Average number of "Yes" answers at center i.

16 All survey instruments were piloted prior to the main study and translated.

The study's primary independent variable was the CMO's use of accountability mechanisms (Brinkerhoff 2003), including performance monitoring, sanctions, and affiliated positive incentives. Indicators of these measures were broadly based on the service provision literature (Health Systems 20/20 2012; World Health Organization 2010a), with an emphasis on personnel management practices. Measurement of these activities was accomplished through surveys of health providers at each facility, and included an assessment of the degree of monitoring providers were subject to, as well as the specific types of positive incentives and sanctions used by CMOs to hold providers accountable.
The extent of monitoring was modeled as a latent factor of the frequency with which the CMO: monitors provider attendance (never (0) to daily (4)); joins healthcare providers for their clinics (never (0) to weekly or more frequently (7)); and holds staff and/or bilateral meetings (never (0) to daily (6)). Sanctions were assessed by asking providers if there were consequences (i.e., interrogation, verbal warning, written warning, report, deduction in payment) in their center for unexcused absences, tardiness, performing below expectations, and recurrent early departure from their assigned shift. Since less than 10 percent of providers indicated the presence of financial sanctions, financial and nonfinancial sanctions could not be separated for analysis. Therefore, the variable was modeled as binary, meaning that sanctions were either present or absent at the facility for each behavior. A sum across the behaviors was taken to create a sanctions score (range 0 to 4). Since most respondents reported a high degree of sanctions, a binary variable was created to represent either greater (score greater than 3) or lesser (score of 3 or less) use of sanctions.

The presence of financial rewards and recognition was also assessed for consistent attendance, timeliness, attendance during the entire shift, and performing up to or above expectations. Less than 1 percent of providers reported the presence of financial rewards, making them unsuitable for modeling and leaving only nonfinancial rewards for analysis. A sum across the distinct behaviors was taken to construct a rewards score (range 0 to 4). Nonfinancial rewards were rarely offered; therefore, as with the sanctions measure, a binary variable was created to represent either greater (score greater than 2) or lesser (score of 2 or less) use of nonfinancial rewards, based on the distribution of the variable. Table 10 displays each measure of within-facility accountability, its constituent items, and its form for analysis.

Table 10: Measures of Within-facility Accountability

Monitoring
  Instrument: Health Provider Survey
  Item(s):
    - How often does the CMO monitor your attendance?
    - How frequently does the CMO join you for your clinic?
    - How frequently does this center have staff meetings, or the CMO hold bilateral meetings with you?
  Measurement: Factor analysis, a method by which separate items measuring one underlying concept are summarized into a score, using the average response to each item at center i.

Sanctions
  Instrument: Health Provider Survey
  Item(s): Are there any repercussions [sanction by CMO] at your center for:
    - unexcused absences;
    - recurrent tardiness;
    - recurrent early departure from assigned shift; or
    - performing below expectations?
  Measurement: Average number of "Yes" responses among providers at center i, dichotomized at greater than a score of 3.

Nonfinancial Rewards
  Instrument: Health Provider Survey
  Item(s): Are there rewards [recognition by the CMO] at your center for:
    - consistent attendance;
    - consistent timeliness;
    - consistent performance of entire assigned shift; or
    - performing to or above expectations?
  Measurement: Average number of "Yes" responses among providers at center i, dichotomized at greater than a score of 2.

Financial Rewards
  Instrument: Health Provider Survey
  Item(s): Are there rewards [financial reward by the CMO] at your center for:
    - consistent attendance;
    - consistent timeliness;
    - consistent performance of entire assigned shift; or
    - performing to or above expectations?
  Measurement: Frequency of "Yes" responses tabulated individually due to infrequent use. Measure not used in regression analysis.
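As an illustration of the score construction just described, the following minimal Python (pandas) sketch builds the sanctions score and its dichotomized center-level form. The data frame and column names are hypothetical, and the example is illustrative only, not the study's actual code.

import pandas as pd

# Hypothetical provider-level survey data: one row per provider,
# with binary (0/1) indicators for each sanctioned behavior.
providers = pd.DataFrame({
    "center_id":        [1, 1, 2, 2],
    "sanction_absence": [1, 1, 1, 0],
    "sanction_tardy":   [1, 1, 0, 0],
    "sanction_early":   [1, 0, 1, 0],
    "sanction_perform": [1, 1, 0, 1],
})

sanction_items = ["sanction_absence", "sanction_tardy",
                  "sanction_early", "sanction_perform"]

# Sum the four binary items into a 0-4 sanctions score per provider.
providers["sanctions_score"] = providers[sanction_items].sum(axis=1)

# Average the score across providers at each center, then dichotomize:
# "greater use of sanctions" means an average score above 3.
center_score = providers.groupby("center_id")["sanctions_score"].mean()
high_sanctions = (center_score > 3).astype(int)
print(high_sanctions)

The nonfinancial rewards score follows the same logic with a cutoff of 2; dichotomizing at a distribution-based cutoff is a pragmatic response to the skewed responses described above.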
Finally, several factors (delineated in Table 11) at the patient, provider, CMO, and facility levels were considered potential alternative explanations (confounders) of the relationship between within-facility monitoring/incentives and provider effort.

Directorate- and community-level monitoring, positive incentives, and sanctions were assessed through the health provider and CMO surveys to understand the degree to which representatives from these two levels directly monitor, incentivize, and sanction staff behavior. The extent of top-down monitoring was modeled as the average frequency (from never (0) to weekly or more frequently (7)) with which a representative from the Directorate of Health, MOH, or RMS joins the healthcare providers for their clinics. Top-down sanctions were assessed by asking providers if there were consequences (i.e., interrogation, verbal warning, written warning, report, deduction in payment) meted out in their center by the Directorate, MOH, or RMS for unexcused absences, tardiness, performing below expectations, and recurrent early departure from their assigned shift. Response options were treated as binary: top-down sanctions were either present or absent at the facility for each behavior. A sum across the behaviors was taken to create a sanctions score (range 0 to 4). The presence of top-down financial rewards and recognition was also assessed for consistent attendance, timeliness, attendance during the entire shift, and performing up to or above expectations. Similar to within-facility rewards, top-down financial rewards were almost nonexistent (less than 1 percent of health providers reported any) and nonfinancial rewards were only somewhat more frequently reported (8 percent of health providers reported any). Therefore, only top-down sanctions could be modeled analytically. Bottom-up monitoring was assessed as whether or not the clinic had a CHC.

Table 11: Potential Confounding Factors

Top-Down Monitoring
  Instrument: Health Provider Survey
  Item(s): How often does a supervisor/representative from the Directorate of Health/RMS join you for your clinic?
  Measurement: Average response among providers at center i.

Top-Down Sanctions
  Instrument: Health Provider Survey
  Item(s): Are there any repercussions [sanction by Directorate of Health/MOH/RMS] at your center for:
    - unexcused absences;
    - recurrent tardiness;
    - recurrent early departure from assigned shift; or
    - performing below expectations?
  Measurement: Average number of "Yes" responses among providers at center i.

Top-Down Nonfinancial Rewards
  Instrument: Health Provider Survey
  Item(s): Are there rewards [recognition by Directorate of Health/MOH/RMS] at your center for:
    - consistent attendance;
    - consistent timeliness;
    - consistent performance of entire assigned shift; or
    - performing to or above expectations?
  Measurement: Frequency of "Yes" responses tabulated individually due to overall infrequent use. Measure not used in regression analysis.

Top-Down Financial Rewards
  Instrument: Health Provider Survey
  Item(s): Are there rewards [financial reward by Directorate of Health/MOH/RMS] at your center for:
    - consistent attendance;
    - consistent timeliness;
    - consistent performance of entire assigned shift; or
    - performing to or above expectations?
  Measurement: Frequency of "Yes" responses tabulated individually due to infrequent use. Measure not used in regression analysis.

Bottom-Up Monitoring
  Instrument: CMO Survey
  Item(s): Is there a Community Health Committee at this center?
  Measurement: Response by CMO at center i.

Self-rated Health
  Instrument: Patient Exit Interview
  Item(s): Overall, would you say that the patient's/your health is [poor (1) to excellent (5)]?
  Measurement: Average response among patients at center i.

Socioeconomic Status
  Instrument: Patient Exit Interview
  Item(s): What was your total household annual income before taxes last year [<50 JD (1) to 700 JD or more (9)]?
  Measurement: Average response among patients at center i.

Percent of Care Provided that is Preventive
  Instrument: Patient Exit Interview
  Item(s): What is the reason you/the patient visited the center today [routine visit or medical problem/concern]? Select type of routine visit [check-up, maternal and child health services, follow-up for chronic condition]. Responses from both items were combined to generate an indicator for whether the visit was for preventive care or not.
  Measurement: Average response among patients at facility i, yielding the percent of care received that was preventive in nature.

Receipt of Continuing Medical Education
  Instrument: Provider Survey
  Item(s): In the past three years, have you received any form of continuous medical/health training?
  Measurement: Average response among providers at center i.

Receipt of Postgraduate Medical Training
  Instrument: CMO Survey
  Item(s): Did you do any post-graduate training or fellowship?
  Measurement: Response by CMO at center i.

Facility Accreditation
  Instrument: CMO Survey
  Item(s): Has this center been accredited by the Health Care Accreditation Council [no/not yet, once, more than once]?
  Measurement: Response by CMO at center i.
To provide a more in-depth examination of provider effort and accountability, many of the items listed in the tables above were presented to more than one respondent (for example, the health provider and the CMO), using similar wording where possible. Where relevant, these additional items are described in the results section.

3.3.4 Administration of the Instruments

At each facility, enumerators administered the questionnaires to the CMO, health providers, and local health committee representative using a tablet computer. While this process was underway, another enumerator conducted the patient exit interviews with patients who had received services from the pediatric, family medicine, and general medicine clinics of the health facility.

3.3.5 Statistical Analyses

Summary statistics were generated to examine the distribution of the various measures of provider effort and the use of sanctions. In addition, the relationships among the provider effort variables were examined with a correlation matrix to understand how strongly the variables related to one another and to ensure that they were not so highly correlated as to be substitutes for one another. The relationship between within-facility accountability measures and accountability originating from the Directorate and the community was likewise examined with a correlation matrix. To examine the relationship between accountability and provider effort, multilevel linear regression models were constructed for each of the four measures of provider effort (absenteeism, compliance with CPGs, provision of rights-based care, and time with provider). For each outcome, an interaction between within-facility monitoring and sanctions was tested to determine whether the relationship between monitoring and provider effort differed in clinics in which sanctions were used to a greater extent compared to those in which sanctions were used less frequently. Tests for an interaction between monitoring and rewards were originally intended; however, the extremely skewed distribution of the nonfinancial incentives (i.e., the vast majority of clinics did not have a positive incentive environment) precluded such a test. Additional statistical detail is provided in Appendix C; a minimal sketch of the type of model estimated follows.
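The following Python sketch, using the statsmodels library, illustrates the general form of a multilevel (random-intercept) linear model with a monitoring-by-sanctions interaction. It assumes a center-level data frame with hypothetical column names and assumes centers grouped within directorates; it is an illustration of the modeling approach, not the study's actual specification.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical center-level data: one row per PHCC.
#   cpg_score: mean compliance with clinical practice guidelines
#   monitoring: standardized CMO monitoring factor score
#   high_sanctions: 1 if the center's sanctions score exceeds 3
df = pd.read_csv("phcc_centers.csv")

# Random-intercept linear model with centers nested within directorates.
# The monitoring:high_sanctions term (expanded by the * operator) tests
# whether the monitoring-effort association differs in high-sanctions clinics.
model = smf.mixedlm(
    "cpg_score ~ monitoring * high_sanctions + patient_ses + self_rated_health",
    data=df,
    groups=df["directorate"],
)
result = model.fit()
print(result.summary())

Analogous models would be fit for absenteeism, rights-based care, and time with provider, each with the confounders from Table 11 as covariates.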
3.3.6 Results

Study results are provided below, starting with descriptive findings on the measures of provider effort and accountability, followed by results of the regression analyses. These data stem from interviews with 2,101 patients and surveys of 772 healthcare providers, 122 CMOs, and 50 CHC representatives.

Patient, Provider, and Clinic Characteristics

Patients attending the clinics reported earning between JD 300-399 per year on average. Approximately 15 percent of patients surveyed were attending the clinic for preventive care as opposed to curative care, and average patient health was reported to be very good. Among the providers, over half reported receiving continuing medical education in the prior three years, and three-quarters of CMOs reported postgraduate training. Thirty percent of the centers were accredited.

Variability in Provider Effort

Provider effort was variable. Average absenteeism in the PHCCs was 17 percent (MOH records indicate that most absences were excused). Patients reported spending approximately 10 minutes with a provider (doctor and/or nurse), ranging from 4 to 24 minutes. On average, clinicians performed about half of the eight methods of clinical assessment, with notetaking and verbal assessments conducted much more frequently than measurement of vital signs (Table 12). A bed examination was performed in about half of the encounters on average.

Table 12: Percentage of Healthcare Providers Following CPGs (N=2,101)

                                           Yes     No      Refuse to Answer/Don't Know/NA
Take notes                                 68.92   29.41   1.67
Listen to the description of the illness   82.91   7.09    10.00
Ask about other symptoms                   85.25   14.75   0.00
Take temperature                           23.08   75.44   0.76
Take pulse                                 16.71   82.39   0.90
Check blood pressure                       21.94   77.49   0.57
Measure height/length and weight           12.95   86.44   0.62
Conduct a bed examination                  49.45   50.45   0.10

According to patient reports, healthcare providers delivered rights-based care, scoring an average of six out of the seven behaviors assessed. Table 13 describes the frequency of each of the items inquired about. The only item not overwhelmingly positively reported by patients was patient involvement in treatment plan decisions: only about half of the patients reported participating in treatment decisions.

Table 13: Percentage of Providers Practicing Rights-Based Care (N=2,101)

                                                                  Yes     No      Refuse to Answer/Don't Know/NA
Provider explained the treatment plan*                            84.36   15.06   0.58
Patient involved in deciding the treatment plan*                  47.39   52.24   0.37
Provider explained things in a way that was easy to understand    86.39   8.23    5.38
Patient could talk privately to the provider                      89.67   7.66    2.67
Exam was conducted in private                                     82.91   7.09    10.00
Patient was treated with respect                                  96.19   2.09    1.71
Patient had time to ask questions                                 86.96   5.95    7.09
*N=1,899, as 202 patients did not receive a treatment during the encounter.

Overall, the various measures of provider effort were related to one another (Table 14).
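For illustration, the following minimal Python sketch shows how such a center-level correlation matrix can be computed: patient-level measures are aggregated to center means, joined with administrative absenteeism data, and correlated. All column names and values are hypothetical, not the study's data.

import pandas as pd

# Hypothetical patient-level exit interview data.
patients = pd.DataFrame({
    "center_id":    [1, 1, 2, 2, 3, 3],
    "cpg_count":    [4, 5, 2, 3, 6, 5],    # of 8 CPG behaviors observed
    "rights_count": [6, 7, 5, 6, 7, 6],    # of 7 rights-based behaviors
    "minutes":      [8, 12, 5, 6, 15, 11], # time with provider
})

# Hypothetical center-level absenteeism from administrative records.
absent = pd.Series({1: 0.10, 2: 0.25, 3: 0.05}, name="absenteeism")

# Aggregate patient responses to center-level means, join absenteeism
# on the center index, and compute pairwise Pearson correlations.
centers = patients.groupby("center_id").mean()
centers = centers.join(absent)
print(centers.corr())

In the study itself, significance stars such as those in Table 14 would come from the corresponding hypothesis tests on each pairwise coefficient.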
Clinics in which a higher percentage of healthcare providers were absent were also more likely to have providers who, on average, exerted less effort in the clinical encounter; there was also a trend, although not statistically significant, toward lower provision of rights-based care and shorter clinical encounters in clinics in which absenteeism was higher. As expected, positive relationships were detected between greater time in the clinical encounter and both greater clinical effort and the provision of rights-based care.

Table 14: Correlations Between Indicators of Provider Effort (N=122)

                       Absenteeism   CPGs      Rights-Based   Time with
                                               Care           Provider
Absenteeism               1.00
CPGs                     -0.20**      1.00
Rights-Based Care        -0.15        0.40***   1.00
Time with Provider       -0.12        0.53***   0.24***        1.00

Note: Numbers represent correlation coefficients. Stars indicate p-values less than 0.05 (**) and less than 0.01 (***).

The correlations were not so high as to suggest that one form of provider effort substitutes for another. Therefore, all measures of provider effort were retained to generate a more comprehensive assessment than any one indicator alone could provide.

Within-Facility Accountability Mechanisms in PHCCs Characterized by a High Level of Monitoring, Limited Nonfinancial Rewards, Nonexistent Financial Rewards, and Uniformity in Sanctions

By design, the monitoring score had a mean of 0 and an SD of 1 (a sketch of one such construction appears at the end of this subsection). Among the score's components, according to the health providers, staff meetings were held monthly, the CMOs joined them for their clinics approximately twice weekly, and attendance was monitored daily. Nearly all CMOs (97 percent) reported tracking attendance. Most providers reported the presence of sanctions in their clinics for absenteeism (84 percent), tardiness (85 percent), and early departure from their shift (79 percent). A minority of healthcare providers reported recognition for regular attendance (30 percent), consistently arriving on time (29 percent), and consistently performing their entire shift (30 percent). Less than 1 percent of healthcare providers reported financial incentives for these same behaviors, but this is not surprising given that financing of public PHCCs is carried out centrally, with limited financial autonomy of CMOs over facility budgets. Overall, absenteeism is a regularly monitored, clearly sanctioned, and poorly positively incentivized behavior in the PHCC environment.

Almost all (95 percent) CMOs reported conducting observations or carrying out clinical record audits at least monthly. When asked specifically about actions taken to ensure adherence to CPGs, 70 percent of CMOs reported that they personally observe their providers' clinics, 45 percent reported conducting patient clinical audits, and 41 percent reported training their providers as a mechanism to ensure adherence. Fourteen percent, however, reported doing nothing to ensure adherence to CPGs. Adherence to CPGs was hampered in some clinics by a total lack of CPG use (22 percent), and 17 percent of centers had not been provided with guidelines, according to CMOs. Nearly 30 percent of healthcare providers reported that guidelines pertaining to their area of responsibility had not been provided to the clinic. Therefore, while a high degree of monitoring is reported by CMOs and healthcare providers, monitoring that is intentionally geared toward guideline adherence occurs less often, and the lack of guideline provision and use in some clinics effectively undermines adherence.
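The standardized monitoring score introduced at the start of this subsection (mean 0, SD 1 by construction) can be built along the following lines. This is a minimal sketch under assumed component names, not the study's exact construction or weighting.

```python
# Sketch: build a standardized within-facility monitoring index.
# Component column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("phcc_facility_level.csv")  # hypothetical; one row per PHCC
components = ["staff_meeting_freq", "cmo_clinic_visits", "attendance_checks"]

# Standardize each component, average them, then re-standardize the average
# so the final index has mean 0 and standard deviation 1 by construction.
z = (df[components] - df[components].mean()) / df[components].std()
raw_index = z.mean(axis=1)
df["monitoring_index"] = (raw_index - raw_index.mean()) / raw_index.std()

print(df["monitoring_index"].describe())  # mean ~0, std ~1
```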
Overall, most healthcare providers (65 percent) reported the presence of sanctions for performing below expectations; similar to the findings for attendance, less than a third (32 percent) reported the presence of recognition for performing beyond expectations. Financial incentives for performing well were rare (reported by less than 1 percent of healthcare providers), which again is unsurprising in light of the limited financial autonomy at the facility level noted previously. On balance, the average PHCC environment was one in which within-facility monitoring was present, nonfinancial rewards were infrequent, financial rewards were nearly nonexistent, and sanctions were almost uniformly in place.

Top-Down and Bottom-Up Accountability Mechanisms Are Linked to Within-Facility Monitoring and Sanctions Practice

The frequency of top-down monitoring and sanctions was broadly similar to within-facility monitoring and sanctions practice, as can be seen in the significant positive correlations between within-facility and top-down monitoring and sanctions presented in Table 15 and in health provider reports. According to health providers, representatives from the Directorate, MOH, or RMS joined them for their clinics quarterly, on average. This ranged from less than once a year to at least weekly. Similar to the within-facility findings, nearly all behaviors investigated were sanctioned, although with differing frequency. Absenteeism was the most consistently reported sanctioned behavior, with 76 percent of health providers reporting the presence of this sanction. Recurrent tardiness (73 percent) and recurrent early departure from the shift (69 percent) were nearly as frequently mentioned. Performing below expectations was mentioned by just over half of the providers (53 percent).

Top-down monitoring and sanctions were not associated with CMOs' use of nonfinancial rewards, although bottom-up monitoring was (Table 15), suggesting some role for CHCs in supporting the CMO's use of rewards for quality-enhancing behavior. Only 43 percent of clinics had a CHC, suggesting that many communities lack formal bottom-up mechanisms to monitor clinic performance. According to the CHC representatives surveyed, committees, where present, monitored overall clinic performance on average twice a year, suggesting at least a moderate level of bottom-up monitoring among clinics with a CHC. CHCs varied in their monitoring function, with some not monitoring clinic performance at all and others reporting monthly monitoring. Much more information on what this monitoring entailed is needed to better capitalize on the potential benefits of this accountability mechanism.

Table 15: Correlations Between Within-Facility and Top-Down and Bottom-Up Measures of Accountability (N=122)

                        Within-Facility   Within-Facility   Within-Facility
                        Monitoring        Sanctions         Nonfinancial Rewards
Top-Down Monitoring        0.358***          0.235***          0.029
Top-Down Sanctions         0.207**           0.768***          0.143
Bottom-Up Monitoring       0.055             0.060             0.289***

Notes: Numbers represent correlation coefficients. Stars indicate p-values less than 0.05 (**) and less than 0.01 (***).

CMO Monitoring Highly Correlated with High Provider Effort, but Not with Absenteeism

Results from the multilevel regression models are presented in Table C.1 of Appendix C and described below. Each within-facility accountability mechanism (monitoring, sanctions, and rewards) was examined for its independent relationship to provider effort.
Among the accountability mechanisms examined, monitoring proved to be the most consistent correlate of higher provider effort. In clinics in which the CMO monitored health providers more closely, health providers exerted greater clinical effort (p<.01), provided more rights-based and responsive care (p<.05), and spent more time with patients in clinical examinations (p<.05). Monitoring was not independently related to absenteeism, potentially because absenteeism is already so frequently monitored in centers and sanctioned by the CMO and the Directorate.

CMO sanctions, considered independently, were either not associated with provider effort (absenteeism, time spent with the provider) or were associated with poorer effort and rights-based practice: health providers exerted less clinical effort during exams and were less respectful of patients' rights when providing care. This may indicate that a high-sanctions environment arises in clinics in which provider effort is poor, or that sanctions are not producing the desired outcome; the two possibilities cannot be distinguished with cross-sectional data.

CMOs' use of nonfinancial rewards to recognize good behavior and excellent clinical practice was not associated with provider effort. However, this form of incentive was used very infrequently, limiting the study's ability to assess its effectiveness as a tool for enhancing accountability.

Monitoring Associated with Greater Provision of Rights-Based Care in a High-Sanctions Environment

In addition to the main effects of accountability mechanisms, this study tested whether the impact of monitoring on provider effort differed depending on the degree to which sanctions were also used to hold providers accountable. This effect was tested for each type of provider effort but was significant only for the provision of rights-based care (Figure 19). In an environment of low sanctions, monitoring has little association with the provision of rights-based care, perhaps because rights-based care was already provided at a high level at most facilities or because little monitoring specifically targeted this aspect of care. In environments where sanctions are present for nearly every behavior assessed, by contrast, monitoring is associated with greater provision of rights-based care.

Figure 19: Relationship Between Monitoring and Rights-Based Care by Sanction Level (90% CI)
[Figure: predicted rights-based care score (approximately 5 to 7) plotted against the monitoring index (approximately -0.36 to 0.39), with separate lines for low-sanctions and high-sanctions facilities.]

Among the top-down and bottom-up accountability mechanisms, a few stand out, specifically top-down sanctions, which are associated with providers spending more time with patients and with lower provider absenteeism. As noted above, absenteeism and other behaviors related to provider presence in the clinic, such as recurrent tardiness and recurrent early departure from the shift, were recognized by most health providers as sanctionable offenses. The cross-sectional nature of the data makes it difficult to know whether the presence of sanctions serves to deter effort lapses, to punish them, or both. However, directorate-level supervision carries additional weight and power, since the decision to terminate a staff member is made at the directorate/MOH level, not at the facility level. Therefore, sanctions may be used punitively within the facility but may serve as a deterrent when issued by the more powerful directorate.
This explanation, while plausible, does not account for the inverse association between top-down monitoring and both rights-based care and time with the provider. Monitoring may function differently when performed by the CMO than when performed by a representative from the directorate, but the study cannot discern such differences.

Bottom-up monitoring through a CHC was associated with greater provision of rights-based care. Similarly, in accredited facilities, providers spent more time on average with patients than in centers that were not accredited. Notably, neither the health provider's report of training in the prior three years nor the CMO's receipt of postgraduate training was related to the various measures of provider effort. In fact, having received postgraduate training was associated with less time spent per patient. While more detailed information is needed to understand potential links between different types of training and provider effort, as some differences may be obscured by considering all types of training together, the finding does suggest that training alone is not sufficient to enhance effort as measured by this study.

3.3.7 Study Limitations

More research is needed to examine some of the inconsistencies found across types of provider effort and levels of accountability, preferably using a mixture of data collection formats, since the present study relied almost entirely on self-reporting. The study attempted to limit bias from socially desirable responses by reviewing clinic administrative records for attendance and by analyzing data provided by health providers, patients, and CHC representatives in the regression analyses. To minimize the influence of employees or patients who might have a vested interest in responding overly positively or negatively, responses were sought from all health providers in a facility and from 25 patients per facility, and responses were averaged at the facility level for the primary analyses. Questionnaire items were worded to elicit as objective a response as possible, avoiding language suggesting the presence of a "correct" response. Still, socially desirable responses cannot be ruled out.

Further, given the asymmetry of information about the quality of care received and the inclusion of patients receiving a variety of preventive and curative care, only a narrow range of key clinical procedures could be assessed through patient reports. While it is beneficial to assess patients' experiences across the major primary care service lines, the breadth of clinical experiences limited the range of clinical procedures that could be assessed to those likely to be present across preventive and curative, adult and pediatric care. Additionally, more service-line- or disease-specific research is needed to ascertain whether accurate and appropriate care was provided.

Future work should examine these relationships over time to enable an assessment of cause and effect, which is confounded in the present study by its cross-sectional design. While this study includes a nationally representative sample of PHCCs, its findings pertain only to those with daily patient loads of at least 35 patients, and potential differences between urban and rural facilities could not be investigated. Finally, complex analyses were performed on a relatively small sample, which limits the study's power to detect relationships.
3.4 CONCLUSIONS

Overall, the findings from this study characterize the degree and type of accountability mechanisms operating in Jordan's PHCCs and their relationship to provider effort, with an emphasis on within-facility accountability. Results show that within-facility monitoring appears to improve provider effort. Bottom-up monitoring is potentially also beneficial, especially in encouraging rights-based clinical care, but more research is needed in this area. Sanctions at the facility level, considered independently of other accountability practices, likely serve more of a disciplinary function than a deterrent one. However, within-facility use of sanctions does seem to enhance the impact of monitoring on the provision of rights-based practice.

From another perspective, accreditation seems to support longer clinical encounters, and, as shown in prior research, accreditation is associated with improved health outcomes in Jordanian hospitals (Halasa et al. 2015). This strategy is currently underutilized, as only a minority of clinics are accredited. The accreditation process involves a range of quality-enhancing changes, including the establishment of a CHC. While it is challenging for citizens who are not medically trained to monitor provider effort in the same manner as a CMO, the committee can serve as a channel for community preferences and grievances and can leverage its power to incentivize greater provider effort.

References

Abu-Kharmeh, S. Suleiman. 2012. "Evaluating the Quality of Health Care Services in the Hashemite Kingdom of Jordan." International Journal of Business and Management 7(4): 195-205.

Al-Qutob, R., and L.S. Nasir. 2008. "Provider perceptions of reproductive health service quality in Jordanian public community health centers." Health Care for Women International 29(5): 539-50.

Banerjee, Abhijit, Angus Deaton, and Esther Duflo. 2004. "Health, health care, and economic development: Wealth, health, and health services in rural Rajasthan." American Economic Review 94(2): 326.

Banerjee, Abhijit, and Esther Duflo. 2006. "Addressing absence." Journal of Economic Perspectives 20(1): 117.

Banerjee, A.V., E. Duflo, and R. Glennerster. 2008. "Putting a Band-Aid on a Corpse: Incentives for Nurses in the Indian Public Health Care System." Journal of the European Economic Association 6: 487-500. doi:10.1162/JEEA.2008.6.2-3.487.

Björkman, Martina, and Jakob Svensson. 2010. "When is community-based monitoring effective? Evidence from a randomized experiment in primary health in Uganda." Journal of the European Economic Association 8(2-3): 571-581.

Brinkerhoff, D. 2003. "Accountability and health systems: overview, framework, and strategies." Partners for Health Reformplus. Abt Associates Inc., Bethesda, MD.

Brock, J.M., A. Lange, and K.L. Leonard. 2014. "Giving and promising gifts: experimental evidence on reciprocity from the field." Working Paper No. 165. European Bank for Reconstruction and Development, London, United Kingdom.

Callen, Michael Joseph, Saad Gulzar, Syed Ali Hasanain, and Muhammad Yasir Khan. 2013. "The political economy of public employee absence: Experimental evidence from Pakistan." Available at SSRN 2316245.

Chaudhury, Nazmul, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan, and F. Halsey Rogers. 2006. "Missing in action: teacher and health worker absence in developing countries." Journal of Economic Perspectives 20(1): 91-116.
Das, Jishnu, and Paul J. Gertler. 2007. "Variations in practice quality in five low-income countries: a conceptual overview." Health Affairs 26(3): w296-w309.

Das, Jishnu, and Jeffrey Hammer. 2007. "Money for nothing: the dire straits of medical practice in Delhi, India." Journal of Development Economics 83(1): 1-36.

Das, Jishnu, and Jeffrey Hammer. 2014. "Quality of primary care in low-income countries: facts and economics." Annual Review of Economics 6(1): 525-553.

Das, Jishnu, and Carolina Sanchez-Paramo. 2004. "Short But Not Sweet: New Evidence on Short Duration Morbidities from India." Working paper, Development Research Group. World Bank, Washington, DC.

Das, J., and T. Sohnesen. 2007. "Variations in doctor effort: evidence from Paraguay." Health Affairs (Millwood) 26: w324-w337.

Das, J., A. Holla, M. Kremer, A. Mohpal, and K. Muralidharan. 2013. "Quality and accountability in healthcare delivery: evidence from an audit study of healthcare providers in India." Unpublished manuscript. World Bank, Washington, DC.

Demaio, Alessandro R., Karoline Kragelund Nielsen, Britt Pinkowski Tersbol, Per Kallestrup, and Dan W. Meyrowitsch. 2014. "Primary Health Care: a strategic framework for the prevention and control of chronic non-communicable disease." Global Health Action 7: 24504. Accessed November 1, 2015, at http://dx.doi.org/10.3402/gha.v7.24504.

Ferrinho, P., W. Van Lerberghe, I. Fronteira, F. Hipólito, and A. Biscaia. 2004. "Dual practice in the health sector: review of the evidence." Human Resources for Health 2(1): 14.

Gertler, P.J., and C. Vermeersch. 2012. "Using performance incentives to improve health outcomes." Policy Research Working Paper 6100. World Bank, Washington, DC.

Halasa, Y.A., W. Zeng, E. Chappy, and D.S. Shepard. 2015. "Value and impact of international hospital accreditation: a case study from Jordan." Eastern Mediterranean Health Journal 21: 90-99.

Health Systems 20/20. 2012. The Health System Assessment Approach: A How-To Manual. Version 2.0. www.healthsystemassessment.org.

HCHC. 2011. "Improving Healthcare at the National Level: Insights from the Amman, Jordan International Policy Seminar." Amman, Jordan.

HHC. 2013. Jordan National Health Accounts 2010-2011. Technical Report 4. Amman: High Health Council.

IHME. 2010. Global Burden of Disease Study (2010) Dataset. Institute for Health Metrics and Evaluation, Seattle, WA.

Khatatbeh, Moawiah. 2013. "Factors associated with high turnover of Jordanian physicians in rural areas: a sequential exploratory mixed method study." Ph.D. dissertation, Centre for International Health, Curtin University. http://espace.library.curtin.edu.au/R?func=dbin.

Khoury, S.A., and S. Mawajdeh. 2004. "Performance of health providers in primary health care services in Jordan." Eastern Mediterranean Health Journal 10(3): 372-381.

Leonard, K.L. 2008. "Is patient satisfaction sensitive to changes in the quality of care? An exploration of the Hawthorne effect." Journal of Health Economics 27: 444-459.

Leonard, K., and M.C. Masatu. 2006. "Outpatient process quality evaluation and the Hawthorne Effect." Social Science & Medicine 63: 2330-2340.

Leonard, K.L., M.C. Masatu, and A. Vialou. 2007. "Getting doctors to do their best: the roles of ability and motivation in health care." Journal of Human Resources 42: 682-700.

Marmot, Michael, and Richard Wilkinson, eds. 2005. Social Determinants of Health. 2nd ed. Oxford: Oxford University Press.

Otoom, S., A. Batieha, H. Hadidi, M. Hasan, and K. Al-Saudi. 2002. "Evaluation of drug use in Jordan using WHO patient care and health facility indicators." Eastern Mediterranean Health Journal 8: 544-549.
Rabie, T., B. Ekman, and E. Özçelik. 2014. "Towards Universal Health Coverage: A Comprehensive Review of the Health Financing System in Jordan." World Bank, Washington, DC.

World Bank. 2014. World Development Indicators 2014. Washington, DC: World Bank.

World Health Organization. 2010a. "Monitoring the building blocks of health systems: a handbook of indicators and their measurement strategies." Geneva: World Health Organization.

World Health Organization. 2010b. "Package of Essential Noncommunicable (PEN) Disease Interventions for Primary Health Care in Low-Resource Settings." Geneva: World Health Organization.

World Health Organization. 2011. "NCD Country Profiles: Jordan." Geneva: World Health Organization.

World Health Organization. 2014. "Global Health Observatory Data Repository." Geneva. Accessed April 2014. http://apps.who.int/gho/data/?theme=main.

IV. CONCLUSIONS AND POLICY RECOMMENDATIONS

1. Effort put forth by teachers and healthcare providers in their jobs is seemingly low.

Across both service sectors in this study, provider effort was low on average. Among the many standards governing teachers' classroom instructional practice, teachers are expected to strive to provide continuous feedback to students, respond to students' questions in a way that is conducive to creating a respectful and emotionally supportive environment for learning, design a range of student assessment methods that provide a variety of performance opportunities for students, and consider specific student performance and needs while designing lessons. Yet an analysis of data collected by USAID through classroom observations, teacher questionnaires, and student surveys of a representative sample of second and third grade classrooms in Jordan reveals that the effort teachers put into meeting these standards is seemingly low. Only one in five teachers mark all pages of students' copybooks, roughly 25 percent of teachers mark only a few pages, and 3.4 percent do not mark even a single page. When a student is unable to answer a question, students report that as many as 70 percent of teachers simply repeat the exact same question to the same student, or ask another student instead, while 5.4 percent of teachers scold the student, send her outside the classroom, or make her stand in a corner. Moreover, almost two in three teachers report using only one or two methods of student assessment, and as few as one-fourth of all teachers report using these assessments to inform their lesson planning. While these findings pertain only to teachers in early primary grades, they may be indicative of a wider challenge present across education levels in the country.17

Similarly, in health centers, doctors and other healthcare staff are expected to deliver appropriate care that meets technical standards while respecting patients' rights. Doctors and other staff therefore must regularly come to work on time, remain in the clinic for their full shifts, abide by up-to-date clinical protocols, listen and respond to patients with respect and clarity, and spend sufficient time with patients to understand their health concerns, diagnose health conditions correctly, and prescribe appropriate treatments and, where applicable, medications. An analysis of original data collected in this study shows that provider effort is low in multiple areas. During field visits to health centers, 17 percent of health providers on average were reported absent.
While some clinics operated fully staffed, others were missing over half of their providers, suggesting a lack of access to care. Based on interviews conducted with patients exiting healthcare facilities, study findings highlight low provider effort during the clinical encounter. On average, health providers performed only half of key exam elements, suggesting that diagnoses and other health-related decisions are being made with limited clinical information. Further, these decisions occur during clinical encounters that last as little as 4 minutes. The average length of an encounter was 10 minutes, but thorough, high-quality, rights-based care is difficult to deliver in that span, let alone in 4 minutes. This was substantiated by the data: shorter encounters were associated with lower clinical effort and a lower likelihood of the provision of rights-based care, although on average, patients reported that they received respectful, responsive, rights-based care.

Across the two sectors, significant effort gains can be made. Given the strong evidence linking provider effort to higher-quality education and healthcare, findings from these studies highlight the potential quality gains to be made through policies incentivizing greater effort in both sectors.

17. The Classroom Observation Phase II Study, prepared in 2015 by the National Center for Human Resources Development (NCHRD) in Jordan, documents a positive trend in teacher practices in the country over the 2011-2014 period. Specifically, using classroom observations, the study reports improvements in classroom management, student-centered teaching, and student assessment, as measured by a standardized classroom observation tool. These positive trends are certainly encouraging. Yet, as elaborated in the present study, more is required to bring teachers' efforts (or practices) up to their knowledge frontier.

2. Increasing principal and CMO monitoring of providers may yield tangible improvements in teachers' and healthcare providers' effort in the workplace.

Being trained as teachers and medical doctors, having spent numerous years teaching in the classroom and providing clinical services, and sharing the same workspace as the teachers and healthcare providers they oversee, school principals and CMOs are well placed to identify low levels of provider effort when they see them. Indeed, findings from this study suggest that principals and CMOs in Jordan who leverage this position of visibility by continuously monitoring teachers and healthcare providers are helping providers exert the effort needed to provide quality services.

In the case of education, the analyses for this study suggest that teachers put forth more effort when principals conduct classroom observations and verify their lesson plans more frequently. Teachers who were better monitored provided more feedback to students and took more steps to create a positive learning environment. In turn, students tend to learn better when their school principals monitor teachers more frequently, as their teachers exert higher levels of effort. This is evidenced in this study by higher math and language test scores among students whose teachers were better monitored. Findings in the health sector mirror those in education: health providers exert greater effort in examining and treating patients and spend more time with patients when CMOs institute and carry out monitoring procedures at the facility level.

3. Effective monitoring in Jordan is a missed opportunity.
Teachers report that only 5 percent of school principals conduct weekly classroom observations. The majority of principals (57.3 percent) observe their teachers' classroom instruction once every one to three months. Alarmingly, 12.5 percent of principals visit classrooms only once a year, and 4.9 percent have never conducted a classroom observation, according to their teachers. Principals are more likely to verify teachers' lesson plans, with 71.5 percent of them conducting this verification once every week. Still, roughly 8 percent of principals carry out this verification only once every one to three months, and 2 percent have never verified their teachers' lesson plans.

Health providers seem to be monitored quite frequently. According to health providers, staff meetings are held monthly, the CMOs join them for their clinics approximately once every two weeks, and attendance is monitored daily. Nearly all CMOs (97 percent) report tracking attendance, and a similarly high percentage (95 percent) report conducting observations or carrying out clinical record audits at least monthly. Fourteen percent, however, report doing nothing to ensure adherence to CPGs. While a high degree of monitoring is reported by CMOs and healthcare providers, monitoring that is intentionally geared toward guideline adherence occurs less often. Given the effort-enhancing benefits of monitoring, quality gains can be made through more extensive monitoring in the education sector and better targeted monitoring in the health sector.

4. Reaping the highest value from principal and CMO monitoring is only possible in a strong incentives environment that rewards provider effort more than it penalizes it.

Despite the effort gains that are possible through appropriate monitoring, the accountability environment in Jordan's education and health sectors provides very few incentives for teachers and healthcare providers to dedicate the highest level of effort to their jobs. This applies to financial as well as nonfinancial incentives for providers at both the facility and central levels.

a. Financial incentives to encourage provider effort are absent.

Salary schemes for teachers and healthcare providers are tied only to providers' credentials and years of experience, providing no incentive for providers to perform to their knowledge frontier. Further, evidence from the comparative case study in education suggests a prevalent belief among teachers that they will receive an automatic promotion and salary increase after four to six years, regardless of how much effort they put forth in their jobs. The picture is no different at the facility level, where school principals and CMOs do not provide any kind of financial bonus to incentivize high effort. On the other hand, reductions in payment are possible according to civil service regulations, although docking pay is rarely practiced. Only 19.6 percent of healthcare providers report the possibility that the Directorate of Health may dock their pay in case of absenteeism. Anecdotal evidence in education also suggests that reduction in payment is rarely used, and when it is, it is only to penalize unjustified absenteeism.

b. The accountability environment in Jordan leans heavily toward sanctions as opposed to recognition.

Recognizing provider effort and achievements can increase motivation. Yet principals and CMOs in Jordan seldom rely on nonfinancial mechanisms to incentivize provider effort. And when they do, they mostly make use of mechanisms to sanction.
Out of the six schools visited for the comparative case study in education, only one school principal was found to be systematically recognizing her teachers' level of effort, by organizing "teacher of the year" contests each academic year. In the rest of the schools, two-thirds of interviewed teachers expressed a very strong desire to be recognized in any way by the principal for their high effort, so as to motivate them to keep up the good work. On the other hand, teachers in a third of the schools reported the use of verbal reprimands in the presence of colleagues as a penalty for underperformance. Teachers in all visited schools agreed on the lack of any formal nonfinancial mechanism by the Directorate of Education to reward or sanction teachers' effort.

Similarly, the evidence on the health sector suggests that less than a third of all CMOs use some form of nonfinancial reward to recognize healthcare providers' effort, while roughly two-thirds use sanctions, ranging from verbal admonitions to written warnings, to deter providers from being absent, arriving late, leaving their shift early, and underperforming.

In environments in which sanctions are in place for most effort-related transgressions, the impact of monitoring on provider effort is enhanced, at least for some types of provider effort. In a high-sanctions environment, better-monitored healthcare providers are more likely to provide rights-based care than more poorly monitored providers. However, the use of sanctions was shown to be unrelated to the linkage between monitoring and other forms of provider effort, and most clinics already operate in a high-sanctions environment, suggesting limited additional benefit from greater use of sanctions as an effort-enhancing strategy. The use of positive incentives, on the other hand, is a promising strategy that is currently underutilized in Jordan.

c. Greater managerial autonomy at the facility level could enhance the relationship between accountability and provider effort.

In the health and education sectors, CMOs and principals have limited managerial autonomy that could support more effective use of effort-enhancing accountability measures. In both sectors, extremely limited facility budgets preclude the use of financial incentives, while the inability to hire and fire staff limits the impact of efforts to bolster provider accountability. Providing greater managerial and financial autonomy to CMOs and principals would incentivize their use of accountability measures and potentially strengthen the impact of their monitoring and sanctioning efforts.

5. Increasing monitoring and strengthening the incentives environment will lead Jordan toward performance-based education and health systems.

Traditional education and health systems place seniority and education credentials at the center of their interaction with teachers and healthcare providers. These determinants inform the advancement of providers' rank in the organizational hierarchy and the consequent impact on salary raises. However, the imperative to improve the quality of education and health services has led many countries to instead put provider performance at the heart of this interaction. Moving toward such performance-based accountability systems requires countries to respond to four key questions, as follows.

i. What indicators will be used to measure provider performance?
The selection of appropriate indicators to measure provider performance is of paramount importance, as it guides teachers and healthcare providers in deciding where to allocate their effort. The adequacy of this selection rests on two main criteria. On the one hand, indicators need to have a direct impact on the broader system goals of improving the quality of education and healthcare services. On the other hand, countries should select indicators that providers can directly influence. In this regard, this study presents a set of indicators that lie within providers' span of control; in other words, providers can influence these indicators by increasing their level of effort.

In the education sector, these indicators include providing continuous feedback to students, responding to students' questions in a way that is conducive to creating a respectful and emotionally supportive environment for learning, designing a range of student assessment methods that provide a variety of performance opportunities for students, and considering specific student performance and needs while designing lessons. The analysis in Chapter II suggests that improvements in these indicators may also directly impact student learning in Jordan.18

In the health sector, indicators can track a variety of practices within facilities that can improve both the technical and nontechnical dimensions of care. These indicators include measures of: whether providers abide by clinical protocols and guidelines, based on clinical observation or periodic reviews of patient records; time spent with patients; provider compliance with the basic principles of rights-based care; recurrent absences, tardiness, and/or early departures from shifts by staff members; the frequency of staff meetings; and the implementation of regular performance evaluations and clear communication of professional rights and responsibilities.

ii. How will these indicators be collected?

Given their technical expertise, their daily proximity to providers, and their implicit responsibility to continuously monitor teachers and healthcare providers, principals and CMOs should be at the frontline of data collection. This study has identified relevant monitoring methods currently carried out by principals and CMOs in Jordan: school principals conduct classroom observations and verify teachers' lesson plans, and CMOs join providers' clinics, allowing them to directly observe different indicators of provider effort, such as the ones identified in this study.

In addition to these conventional methods, principals and CMOs should complement their monitoring efforts by gauging beneficiaries' perspectives on the quality of services they receive through student surveys and patient exit interviews. Beyond beneficiaries' satisfaction, these instruments should aim to capture what is happening in the beneficiary-provider interaction. When asked the right questions, in the right ways, students and patients can be an important source of information on what providers are doing in classrooms and clinics, helping to cross-verify principals' and CMOs' own observations (Gates Foundation 2012; Leonard 2008). The USAID student survey instrument and the patient exit interview instrument developed under this study provide good examples of this. Administrative records should also be used where possible to complement the information received from observation and client reports.
This may require changes in information systems or in how information is documented (electronically or on paper) to allow for the easy retrieval of information needed to bolster provider effort and accountability.

18. Careful consideration of these four areas is fundamental as the country develops teacher performance assessments based on the National Teacher Professional Standards under the Second Education Reform for the Knowledge Economy (ERfKE II). It is also highly informative for the Ministry of Education's ongoing endeavor to establish an accountability and quality assurance mechanism to incentivize stakeholders in the education system to improve learning in Jordan's public schools.

Yet for these monitoring methods to meaningfully contribute toward a performance-based system, they ought to be systematized in their frequency and standardized in their documentation. As mentioned above, evidence from this study reveals that some principals conduct classroom observations every single day, while others do so only once a year. Although CMOs and health providers reported high levels of monitoring, considerably less monitoring was directed at compliance with CPGs, an essential component of safe, high-quality healthcare. Frequent observation of these indicators, through a number of different methods, is critical, as it increases the likelihood of obtaining reliable indicators that produce similar results under consistent conditions. But the process should not stop there: these indicators need to be documented in a standard manner to provide a solid evidence base for providers' annual performance appraisals and communicated to the directorate level. Importantly, collection and documentation should occur across all facilities, and audits of CMOs' and principals' use of accountability mechanisms and of provider effort could be performed on a regular basis by the directorate as part of its top-down monitoring function.

Although principals and CMOs should be at the frontline of monitoring endeavors, the independent verification role of the Directorates of Education and Health is also key. Directorate inspectors should corroborate the indicators reported by principals through periodic (and, to the extent possible, unannounced) visits to providers. In this regard, evidence from this study indicates that roughly 60 percent of schools receive monthly visits from directorate supervisors, while nearly 23 percent of schools are visited only once a year or not at all. In the health sector, providers reported that representatives from the directorate joined them for their clinics quarterly on average, but this was not a uniform practice, with some providers reporting almost no direct monitoring and others reporting being joined by a directorate representative at least weekly. A considerable degree of inconsistency therefore exists in directorate-level monitoring of provider behavior. Beyond the need for periodic visits from directorate inspectors, the use of common metrics that mirror those used by principals is key to ensuring a quality verification process.

iii. What actions will be taken in light of these indicators?

With reliable indicators of provider performance in hand, the next question is what principals, CMOs, and ministries will do with this information. Reward and sanction schemes need to be devised and tied to performance indicators to incentivize a change in provider effort.
At the facility level, the use of nonfinancial rewards (such as recognition of good performance through "employee-of-the-month" or other awards, and opportunities for additional training tied to performance) is a promising course of action in Jordan that can be implemented in the short run at very little cost. At the central level, the need to tie promotions and salary increases of teachers and healthcare providers to performance indicators cannot be overemphasized, and doing so is well within reach in Jordan. With the largest share of Jordan's education and health expenditures devoted to salaries, ensuring that salary increases are merit-based has the potential to significantly increase efficiency in the allocation of public resources, while at the same time aligning system incentives toward the goal of improving the quality of education and healthcare.

In the medium to long term, more sophisticated pay-for-performance (P4P) schemes that are closely linked to quality of service delivery can be explored, tailored, and incrementally implemented in the Jordanian context, bringing Jordan to the forefront of performance-based systems along with some of the most advanced countries in the world. The design of P4P schemes should draw on the growing body of research on the use of these schemes in both sectors, including the appropriate size of incentives, strategies for mitigating potential unanticipated consequences, sources of funding and resource flows, individual versus group incentives, an orientation toward positive rather than punitive incentives, and the necessary implementation arrangements and monitoring and evaluation mechanisms around them.

iv. How can the above be addressed through a systems approach?

Performance-based accountability is one of the key links in performance management systems (PMS), but not the only one. As such, it needs to be fully incorporated into existing PMS, creating synergies with all other elements in the system. At the facility level, principals and CMOs ought to clearly communicate to providers specific expectations for teaching and clinical practice in light of the performance indicators against which they will be held accountable. Moreover, their role in creating a supportive environment that is conducive to eliciting the highest level of effort is essential. This includes ensuring that all necessary equipment and supplies are present at the facility and functioning well. In a constrained budget environment, principals and CMOs should be strategic in prioritizing the structural factors that are especially important for providers to achieve performance indicators.

Finally, paramount to an effective PMS, and a natural extension of their monitoring efforts, is the technical leadership role that principals and CMOs should play in their facilities (Education First and Gates Foundation 2015). Beyond providing teachers and health providers with a summative assessment through the annual performance appraisal, the provision of actionable, formative feedback should be built into the ongoing monitoring mechanisms and incentive schemes of principals and CMOs, ensuring that providers striving to improve their efforts in classrooms and clinics know how they can do so. Performance indicators are also highly valuable for strategic professional development planning.
At the central level, effective personnel management systems use two main pillars to ensure continuous improvement in provider performance. On the one hand, as discussed above, strong performance-based accountability systems are required to incentivize the highest level of effort from providers. On the other hand, building on this first pillar, ministries need to closely examine providers' performance indicators to refine and purposefully target teacher professional development and continuing medical education programs. Furthermore, beyond their clear relevance for on-the-job training programs, performance indicators provide ministries with a wealth of information to identify specific areas of strength, as well as areas for growth, among teachers and healthcare providers that can inform pre-service education and certification programs in Jordan.

Adequate accountability and training are required for principals and CMOs to champion such an important undertaking. If principals and CMOs in Jordan are to become the primary champions of a strong performance-based accountability system for teachers and healthcare providers, they should themselves be subject to an accountability system that ensures they carry out their monitoring functions and technical leadership roles to the best of their ability. The role of the Directorates of Education and Health in systematically monitoring and verifying principal and CMO practices, coupled with the provision of financial and nonfinancial incentives to motivate them, is key. Similarly important is the provision of the necessary training (both pre-service and in-service) to ensure that principals and CMOs are well equipped for this role.

In sum, this study has shown that Jordan's education and health sectors can greatly benefit from instituting more effective monitoring and incentive systems to enhance provider effort for better education and health outcomes. The role of school principals and CMOs in this respect cannot be overemphasized, given their knowledge of, and proximity to, the interactions that take place at the student-teacher and patient-provider interface. The move toward a performance-based system in both sectors is a sound overall policy reform that the Government of Jordan would be advised to pursue further. This calls for a reorientation of the system in a way that ensures more efficiency by linking pay to productivity and a focus on quality. Arrangements to achieve efficiency may also be seen as equitable if they fairly reward provider performance. To realize this, such systems need to uphold performance-based accountability and strongly integrate it within existing PMS.

The Government of Jordan would be advised to initially pilot the recommended course of action presented in this report on a small scale, which can then be scaled up contingent on positive outcomes as corroborated through rigorous impact evaluations. The design of such a pilot program and its specific features would be informed by consultations with stakeholders in both sectors in Jordan. As previously described, the pilot would need to carefully consider a number of key design features, including, inter alia: criteria to measure performance; specificities of performance appraisal systems; feedback mechanisms; the right mix of extrinsic and intrinsic rewards and sanctions; the appropriate quantum of pay subject to performance criteria; evaluation schemes; implementation arrangements; and overall governance mechanisms.
With the above said, while Jordan's overall education and health systems have fared well over the past two decades, the recommendations presented in this report, based on findings from the two sectoral studies, provide an even stronger impetus to push Jordan to the forefront in both sectors. It is high time that Jordan reaps the benefits of its investments in health and education. The focus on provider effort and quality under an effective accountability system is at the heart of reform and cannot be stressed enough.

References

Education First and Gates Foundation. 2015. "Giving Teachers the Feedback and Support They Deserve: Five Essential Practices." Seattle, WA.

Gates Foundation. 2012. "Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains." MET Project Research Paper. Seattle, WA.

Leonard, K.L. 2008. "Is patient satisfaction sensitive to changes in the quality of care? An exploration of the Hawthorne effect." Journal of Health Economics 27: 444-459.

APPENDIX A: EDUCATION SECTOR

Table A.1: Framework for Teaching (FFT)

Domain 1: Planning and Preparation
  1a Demonstrating Knowledge of Content and Pedagogy
  1b Demonstrating Knowledge of Students
  1c Setting Instructional Outcomes
  1d Demonstrating Knowledge of Resources
  1e Designing Coherent Instruction
  1f Designing Student Assessments

Domain 2: Classroom Environment
  2a Creating an Environment of Respect and Rapport
  2b Establishing a Culture for Learning
  2c Managing Classroom Procedures
  2d Managing Student Behavior
  2e Organizing Physical Space

Domain 3: Instruction
  3a Communicating with Students
  3b Using Questioning and Discussion Techniques
  3c Engaging Students in Learning
  3d Providing Feedback to Students
  3e Demonstrating Flexibility and Responsiveness

Domain 4: Professional Responsibilities
  4a Reflecting on Teaching
  4b Maintaining Accurate Records
  4c Communicating with Families
  4d Participating in the Professional Community
  4e Growing and Developing Professionally
  4f Showing Professionalism

Source: Danielson 1996.
Table A.2: Summary Statistics

Variable                                  Obs    Mean     Std. Dev.   Min      Max
Principal Monitoring Index                297    7.236    1.602        2       11
Creating an Environment of
  Respect and Rapport                     311    1.171    0.413        0        2
Providing Feedback to Students            291    1.830    0.699        0        3
Designing Student Assessment              305    2.243    1.421        0        6
Designing Coherent Instruction            305    0.243    0.429        0        1
Teacher Level of Education                305    2.089    0.665        1        5
Reading Pre-Service Training              305    0.377    0.485        0        1
Math Pre-Service Training                 305    0.384    0.487        0        1
Receipt of External Funding               311    0.174    0.379        0        1
Households with Computer                  311    0.644    0.223        0        1
School Wealth Index                       311   -0.001    0.720       -5.161    0.582
Directorate School Inspection             311    1.865    0.737        0        3
Directorate Supervisor Classroom Visit    304    1.174    1.074        0        4
PTA Meeting Frequency                     309    2.058    0.740        0        4
Rural                                     311    0.399    0.490        0        1
log(Teacher-Student Ratio)                304    3.175    0.469        1.099    3.892
School Gender                             311    0.900    0.647        0        2

Table A.3: Principal Monitoring and Teacher Effort in Jordan

Panel A. Dependent variable: Providing Feedback to Students
                                          Model 1            Model 2            Model 3
Principal Monitoring Index                0.058** (0.025)    0.062** (0.025)    0.064** (0.026)
School Gender                            -0.054 (0.057)     -0.064 (0.057)     -0.021 (0.059)
Rural                                    -0.052 (0.092)     -0.032 (0.092)     -0.059 (0.096)
Computer in Household                     0.353* (0.209)     0.319 (0.207)      0.369* (0.206)
School Wealth Index                      -0.126* (0.066)    -0.118* (0.065)    -0.129* (0.066)
Receipt of External Funding                                  0.127 (0.111)      0.141 (0.111)
Reading Pre-Service Training                                -0.211** (0.105)   -0.228** (0.105)
Math Pre-Service Training                                    0.196* (0.105)     0.197* (0.105)
Directorate Supervisor Classroom Visit                                          0.070 (0.043)
log(Teacher-Student Ratio)                                                     -0.100 (0.099)
Teacher Education                                                               0.094 (0.059)
Directorate School Inspection                                                  -0.020 (0.064)
PTA Meeting Frequency                                                          -0.054 (0.056)
Constant                                  1.234*** (0.259)   1.210*** (0.258)   1.346*** (0.421)
Directorates                              39                 39                 39
Schools                                   149                149                147
N                                         276                276                273

Panel B. Dependent variable: Creating a Climate of Respect
                                          Model 4            Model 5            Model 6
Principal Monitoring Index                0.024* (0.014)     0.025* (0.014)     0.023* (0.014)
School Gender                             0.090*** (0.032)   0.087*** (0.032)   0.114*** (0.033)
Rural                                     0.105*** (0.050)   0.105*** (0.051)   0.077 (0.051)
Computer in Household                    -0.035 (0.115)     -0.041 (0.115)     -0.280 (0.112)
School Wealth Index                       0.016 (0.034)      0.019 (0.034)      0.003 (0.034)
Receipt of External Funding                                  0.038 (0.063)      0.031 (0.061)
Reading Pre-Service Training                                -0.017 (0.060)     -0.001 (0.059)
Math Pre-Service Training                                   -0.017 (0.060)     -0.024 (0.059)
Directorate Supervisor Classroom Visit                                          0.076*** (0.023)
log(Teacher-Student Ratio)                                                     -0.012 (0.059)
Teacher Education                                                               0.050 (0.032)
Directorate School Inspection                                                  -0.099*** (0.035)
PTA Meeting Frequency                                                           0.001 (0.031)
Constant                                  0.891*** (0.143)   0.902*** (0.144)   0.923*** (0.232)
Directorates                              39                 39                 38
Schools                                   152                152                150
N                                         293                293                289

Panel C. Dependent variable: Designing Student Assessments
                                          Model 7            Model 8            Model 9
Principal Monitoring Index                0.043 (0.027)      0.040 (0.027)      0.043 (0.027)
School Gender                             0.084 (0.061)      0.082 (0.069)      0.092 (0.061)
Rural                                     0.101 (0.103)      0.097 (0.101)      0.102 (0.105)
Computer in Household                    -0.077 (0.222)     -0.065 (0.219)     -0.155 (0.219)
School Wealth Index                       0.091 (0.065)      0.068 (0.065)      0.048 (0.066)
Receipt of External Funding                                  0.140 (0.121)      0.133 (0.120)
Reading Pre-Service Training                                 0.195* (0.113)     0.182 (0.113)
Math Pre-Service Training                                    0.094 (0.112)      0.089 (0.111)
Directorate Supervisor Classroom Visit                                          0.046 (0.046)
log(Teacher-Student Ratio)                                                      0.047 (0.107)
Teacher Education                                                               0.066 (0.059)
Directorate School Inspection                                                  -0.032 (0.074)
PTA Meeting Frequency                                                           0.061 (0.060)
Constant                                  1.716*** (0.329)   1.592*** (0.331)   1.234** (0.480)
Directorates                              39                 39                 38
Schools                                   152                152                150
N                                         293                293                289

Panel D. Dependent variable: Designing Coherent Instruction
                                          Model 10           Model 11           Model 12
Principal Monitoring Index                0.017 (0.012)      0.016 (0.012)      0.022* (0.012)
School Gender                             0.012 (0.027)      0.010 (0.027)      0.014 (0.028)
Rural                                     0.087* (0.045)     0.084* (0.046)     0.094** (0.047)
Computer in Household                     0.074 (0.100)      0.072 (0.100)      0.075 (0.101)
School Wealth Index                      -0.030 (0.029)     -0.035 (0.030)     -0.032 (0.030)
Receipt of External Funding                                  0.047 (0.055)      0.048 (0.055)
Reading Pre-Service Training                                 0.047 (0.052)      0.047 (0.052)
Math Pre-Service Training                                   -0.009 (0.052)     -0.024 (0.052)
Directorate Supervisor Classroom Visit                                         -0.026 (0.021)
log(Teacher-Student Ratio)                                                      0.078 (0.049)
Teacher Education                                                               0.028 (0.028)
Directorate School Inspection                                                  -0.035 (0.033)
PTA Meeting Frequency                                                          -0.025 (0.028)
Constant                                  0.062 (0.131)      0.047 (0.132)     -0.158 (0.209)
Directorates                              39                 39                 38
Schools                                   152                152                150
N                                         293                293                289

Note: Standard errors in parentheses. *p-value<0.10 **p-value<0.05 ***p-value<0.01.
Table A.4: Summary Statistics of Variables Included in the Mediation Analysis

Variable                                     Obs     Mean     Std. Dev.   Min   Max
Letter Sound Knowledge                       3,063   26.167   21.344      0     100
Reading Comprehension                        2,832   33.439   31.513      0     100
Number Identification                        2,987   77.349   24.829      0     100
Word Problems                                3,063    1.224    1.039      0     3
Creating a Climate of Respect and Rapport    2,882    1.171    0.563      0     2
Providing Feedback to Students               2,582    1.871    0.838      0     3
Designing Student Assessments                3,003    2.241    1.426      0     6
Designing Coherent Instruction               3,003    0.246    0.431      0     1
Monitoring Index                             2,923    7.229    1.602      2     11
Help With Homework                           3,063    0.872    0.334      0     1
Private Tutoring Sessions                    3,052    0.336    0.916      0     3
Radio in Household                           3,063    0.473    0.499      0     1
Vehicle in Household                         3,063    0.708    0.455      0     1
Computer in Household                        3,043    1.584    2.072      0     5
Receive Free Meals                           3,063    0.645    0.479      0     1

Table A.5: The Indirect Effect of Principal Monitoring on Student Outcomes

Mediating Variable                 Letter Sound     Reading          Number           Word
                                   Knowledge        Comprehension    Identification   Problems
Providing Feedback to Students     0.009** [0.04]   0.007*** [0.01]  0.005* [0.08]    0.003* [0.09]
Creating an Environment of
  Respect and Rapport              0.003** [0.04]   0.012*** [0.01]  0.009*** [0.00]  0.008*** [0.01]
Designing Student Assessments      0.002 [0.16]     0.003 [0.15]     0.004** [0.03]   0.003** [0.03]
Designing Coherent Instruction    -0.001 [0.74]     0.001 [0.69]     0.001 [0.36]    -0.002 [0.62]

Note: p-values in brackets. *statistically significant at 90% **statistically significant at 95% ***statistically significant at 99%.

Table A.6: Robustness Checks on the Indirect Effect of Principal Monitoring on Student Outcomes

Mediating Variable                 Letter Sound     Reading          Number           Word
                                   Knowledge        Comprehension    Identification   Problems
Providing Feedback to Students     0.026*** [0.00]  0.028*** [0.00]  0.023*** [0.00]  0.018*** [0.00]
Creating an Environment of
  Respect and Rapport              0.004* [0.08]    0.007* [0.09]    0.007** [0.03]   0.005** [0.05]
Designing Student Assessments      0.003 [0.17]     0.004 [0.16]     0.005*** [0.01]  0.005* [0.07]
Designing Coherent Instruction    -0.013 [0.33]    -0.014 [0.38]    -0.017 [0.14]    -0.025* [0.09]

Note: p-values in brackets. *statistically significant at 90% **statistically significant at 95% ***statistically significant at 99%.

APPENDIX B: SENSITIVITY ANALYSIS

The sequential ignorability (SI) assumption is necessary to achieve identification in mediation analysis. SI comprises two assumptions: first, the independent variable is assumed to be statistically independent of potential outcomes and potential mediating variables; and second, the mediating variable is assumed to be exogenous conditional on pretreatment confounders and the independent variable of interest (Hicks and Tingley 2011; see Chapter II references). Since SI is likely to be violated in the data, a sensitivity analysis is presented in this appendix to determine the extent to which the estimates are robust to violations of SI.

The sensitivity parameter produced by the analysis, denoted by ρ ∈ [-1, 1], represents the correlation between the error terms in the mediation and outcome models; a nonzero correlation between the error terms denotes a violation of the SI assumption. The sensitivity analysis calculates the value of ρ at which the indirect effect is estimated to be zero, which indicates how robust the estimates are to violations of SI. (A schematic sketch of the underlying mediation estimation appears below.)
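To make the mediation machinery concrete, here is a minimal sketch of how an average causal mediation effect (ACME) of the kind reported in Tables A.5 and A.6 can be estimated with the statsmodels Mediation class. The formulas and column names are hypothetical placeholders; the report's own estimates follow the Stata implementation of Hicks and Tingley (2011), not this code.

```python
# Sketch: parametric causal mediation analysis (ACME estimation).
# Column names and formulas are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation

df = pd.read_csv("students.csv")  # hypothetical student-level file

# Mediator model: teacher effort (feedback) as a function of principal monitoring.
mediator_model = sm.OLS.from_formula("feedback ~ monitoring + school_wealth", data=df)

# Outcome model: test score as a function of the mediator and the treatment.
outcome_model = sm.OLS.from_formula(
    "letter_sound ~ feedback + monitoring + school_wealth", data=df
)

med = Mediation(outcome_model, mediator_model, exposure="monitoring", mediator="feedback")
result = med.fit(method="parametric", n_rep=500)  # simulation-based inference
print(result.summary())  # reports ACME, direct effect, and total effect
```

Sensitivity of the ACME to the error correlation ρ is not part of this sketch; the medsens routines accompanying Hicks and Tingley (2011) and the R mediation package implement that step.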
For example, Figure B.1 presents results from a sensitivity analysis that uses the Letter Sound Knowledge variable to measure student outcomes and the Providing Feedback to Students variable to proxy for teacher effort. The black line in the plot represents the estimated indirect effect (denoted ACME) at different values of ρ, and the gray band denotes the 95% confidence interval around the estimate. The analysis estimates that the indirect effect equals zero when ρ = 0.1577, which is where the black line crosses zero; only when ρ < 0.1 is the indirect effect estimated to be positive and statistically significant. The estimate of the indirect effect of principal monitoring is therefore somewhat sensitive to violations of SI. The figure is highly representative of the sensitivity analyses conducted for each of the 16 estimates presented in Table A.5, suggesting that all of the results in that table are somewhat sensitive to SI violations.

Figure B.1: Sensitivity Analysis Results

APPENDIX C: HEALTH SECTOR

Sample Size Calculations

In the absence of preliminary data, the sample size calculations relied on prior research examining the link between accountability and absenteeism (Banerjee, Glennerster, and Duflo 2008; D'Amuri 2011; Dhaliwal and Hanna 2014). The calculations assume an alpha of 0.05, a power of 0.80, and a linear regression model to test the primary relationship of interest, in which provider effort, y, is regressed on approximately 15 independent variables (x1-x15). The independent variable of interest is assumed to raise the model's R² from 0.10 to 0.25 when it is included, implying an effect size of f² = 0.15/(1 - 0.25) = 0.20. With an assumed intra-class correlation (ICC) of 0.2 and an average of 10 PHCCs per Directorate of Health in Jordan, these assumptions imply a required sample of approximately 120 PHCCs across all 13 Directorates of Health.

At the patient level, the calculations assume a continuous measure of provider effort built from dichotomous patient-level ratings, a 90% confidence level, a margin of error no larger than 0.15, and that approximately 70 percent of patients will report that their physician provided high effort. Under these assumptions, 25 patients should be surveyed at each facility, since 1.645 × √(0.70 × 0.30/25) ≈ 0.15. Budgetary constraints precluded a larger sample size, which would have reduced the margin of error.

Statistical Analyses

Descriptive statistics were generated to examine the distributions of the various measures of provider effort and of the use of sanctions. In addition, the relationships among the provider effort variables were examined with a correlation matrix, both to gauge how strongly the variables relate to one another and to ensure that none are so highly correlated as to be substitutes for one another.

To examine the relationship between accountability and provider effort, multi-level linear regression models allowed the intercept in the regression equation to vary randomly by Directorate of Health, since the clinics are nested within each directorate. Two models were constructed for each of the four measures of provider effort (absenteeism, compliance with CPGs, provision of rights-based care, and time with provider). In the first model (a) for each outcome, within-facility monitoring, sanctions, and rewards were modeled along with potential confounders; in the second model (b) for each outcome, an interaction between within-facility monitoring and sanctions was added.
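A minimal sketch of this specification in Python with statsmodels follows. The data file and column names (effort, monitoring, sanctions, rewards, directorate) are hypothetical, and the report's actual models include the full set of confounders shown in Table C.1.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical facility-level data: one row per PHCC, with the
# Directorate of Health as the grouping factor for the random intercept.
data = pd.read_csv("phcc_data.csv")

# Model (a): within-facility monitoring, sanctions, and rewards,
# with an intercept that varies randomly by directorate.
model_a = smf.mixedlm("effort ~ monitoring + sanctions + rewards",
                      data, groups=data["directorate"]).fit()

# Model (b): adds the monitoring-by-sanctions interaction.
model_b = smf.mixedlm("effort ~ monitoring * sanctions + rewards",
                      data, groups=data["directorate"]).fit()
print(model_b.summary())
```

Predicted values from model (b), evaluated at chosen percentiles of the monitoring variable, yield the estimated margins described next.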
Tests for an interaction between monitoring and incentives were originally intended, but the extremely skewed distribution of the nonfinancial incentives (the vast majority of clinics did not have a positive incentive environment) precluded such a test. To display significant interactions, estimated margins were calculated and graphed at the 25th, 50th, and 75th percentiles of the monitoring variable. To examine whether the relationship between each outcome variable and the continuous exposure and confounder variables was linear, restricted cubic spline functions were used (Desquilbet and Mariotti 2010), and quadratic terms were introduced where indicated.

Regression Results

Table C.1 presents the results of the multi-level regression analysis in which each within-facility accountability mechanism (monitoring, sanctions, and rewards) was examined for its independent relationship to provider effort. Statistical significance is indicated by asterisks, where (***) denotes a p-value below 0.01, (**) a p-value below 0.05, and (*) a p-value below 0.10. Standard errors for the estimated coefficients appear in parentheses beneath each estimate. For a verbal description of the results, see Chapter III.

Table C.1: Relationship Between Accountability Practices and Provider Effort (N=122)

                                 Absenteeism   Clinical     Rights-based   Time with a
                                               Effort       Practice       Provider
CMO Monitoring                   -0.008        0.510***     0.278**        1.314**
                                 (0.029)       (0.163)      (0.122)        (0.572)
CMO Sanctions                    0.013         -0.546**     -0.500***      -0.120
                                 (0.039)       (0.224)      (0.167)        (0.787)
CMO Rewards                      -0.014        -0.228       -0.131         -1.163
                                 (0.035)       (0.206)      (0.153)        (0.732)
Top-Down Monitoring              0.017         -0.075       -0.095*        -0.619**
                                 (0.013)       (0.073)      (0.054)        (0.259)
Top-Down Sanctions               -0.027**      -0.080       -0.054         1.891**
                                 (0.013)       (0.078)      (0.058)        (0.865)
Top-Down Sanctions²                                                        -0.570***
                                                                           (0.204)
Community Health Committee       -0.016        0.015        0.239*         0.773
                                 (0.030)       (0.173)      (0.129)        (0.600)
SES                              0.005         2.753**      1.415          0.592
                                 (0.020)       (1.253)      (0.934)        (0.421)
SES²                                           -0.236**     -0.113
                                               (0.115)      (0.086)
Preventive Care                  -0.211**      -0.192       0.388          2.653
                                 (0.102)       (0.570)      (0.425)        (2.035)
Patient Health                   -0.028        0.856***     0.386**        1.292*
                                 (0.034)       (0.203)      (0.153)        (0.719)
Provider Continuing Education    0.016         0.300        0.230          1.418
                                 (0.046)       (0.279)      (0.209)        (0.987)
CMO Postgraduate Training        0.015         0.109        -0.011         -1.211*
                                 (0.032)       (0.184)      (0.138)        (0.644)
Accreditation                    -0.004        0.221        0.093          1.255**
                                 (0.031)       (0.174)      (0.130)        (0.611)
Constant                         0.270         -6.491*      0.887          4.721
                                 (0.174)       (3.538)      (2.633)        (3.625)
N Directorates                   13            13           13             13
N PHCCs                          122           122          121            122

References

Banerjee, A. V., R. Glennerster, and E. Duflo. 2008. "Putting a Band-Aid on a Corpse: Incentives for Nurses in the Indian Public Health Care System." Journal of the European Economic Association 6: 487-500.

D'Amuri, F. 2011. "Monetary Incentives vs. Monitoring in Addressing Absenteeism: Experimental Evidence." Bank of Italy, Economic Research and International Relations Area, Rome.

Danielson, Charlotte. 2011. Enhancing Professional Practice: A Framework for Teaching. Alexandria, VA: ASCD.

Desquilbet, Loïc, and François Mariotti. 2010. "Dose-Response Analyses Using Restricted Cubic Spline Functions in Public Health Research." Statistics in Medicine 29 (9): 1037-1057.

Dhaliwal, I., and R. Hanna. 2014. "Deal with the Devil: The Successes and Limitations of Bureaucratic Reform in India." National Bureau of Economic Research, Cambridge, MA.