WPS4086

Patient Satisfaction, Doctor Effort and Interview Location: Evidence from Paraguay

Jishnu Das (World Bank)1
Thomas Pave Sohnesen (World Bank)

Abstract

To examine the relationship between patient satisfaction and doctor performance, we observed 2,271 interactions between 292 doctors and their patients in 98 clinics and hospitals in Paraguay and conducted an exit-survey with the same patients as they left the clinic. For a sub-sample of 64 facilities we also tracked down and interviewed patients who visited the facility in the week prior to the clinical observation date. There are three patterns in the data: (a) patient satisfaction is positively correlated with doctor effort, measured as a combination of time spent, questions asked and examinations performed, after controlling for observed doctor and patient characteristics; (b) however, accounting for unobserved doctor characteristics dramatically reduces the level of significance and size of the correlation between effort and satisfaction, showing that much of the positive relationship is driven by these unobserved doctor-specific factors; and (c) reported satisfaction is significantly lower for patients interviewed at home compared to those interviewed at the clinic. Even if patient satisfaction reflects some aspects of the doctor's performance, unobserved heterogeneity combined with survey biases limits the widespread applicability of patient satisfaction as an indicator of doctor performance.

World Bank Policy Research Working Paper 4086, December 2006

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent. Policy Research Working Papers are available online at http://econ.worldbank.org.

1 Corresponding Author: jdas1@worldbank.org. The data collection for this paper was conducted as part of a World Bank study on health in Paraguay, led by Daniel Dulitzky (World Bank), and the data were collected by CIRD, Asuncion. In addition we thank, without implicating, Ken Leonard for extensive discussions and Ariel Fiszbein and Jesko Hentschel for comments and suggestions.

Introduction

Measures of user-satisfaction are widely used as indicators of public sector performance and as a tool for improving accountability. They hold a particularly attractive promise for assessing quality in the delivery of health care, since health outcomes reflect multiple factors and take a long time to change. Clearly, the extent to which measures of user-satisfaction can be used to increase incentives for doctors depends on whether they reflect an aspect of performance that doctors can control.

Using data from Paraguay collected by the Centro de Información y Recursos para el Desarrollo (CIRD) in collaboration with the authors, this paper empirically assesses whether patient satisfaction responds to a measurable component of doctor performance that is also likely to improve the quality of medical advice. If so, satisfaction could be used as a measure of performance; further, if this is an aspect of performance that the doctor can control, incentive payments based on satisfaction measures could also lead to improvements in medical care.

The core of the empirical exercise is the simultaneous measurement of doctor performance and patient satisfaction. The measure of doctors' performance, which we call doctor "effort", is based on an easily observable set of actions undertaken by doctors in their interactions with patients.
To measure effort, interviewers observed doctor-patient interactions and noted, for every interaction, the time spent, the questions asked and the examinations performed. These three indicators were combined into a single index of effort using principal component analysis. To measure patient satisfaction, a second interviewer conducted an exit survey outside the doctor's office for every patient. This exit survey also included questions on patient's socio-economic background and health status. The results show that patient satisfaction is positively correlated with doctor effort within a multivariate regression context. However, the correlation between satisfaction and effort is small to begin with, and disappears once provider and patient characteristics are adequately controlled for through doctor fixed-effects (more on this below). In contrast, the interview location and/or the time since the last interaction with the doctor dramatically alters reported satisfaction, suggesting that factors other than a measurable aspect of the doctor's performance play a large role in the determination of patient satisfaction. There are several reasons for our focus on doctor effort as an indicator of performance. First, higher effort leads to better care--recent studies show that doctor effort is strongly correlated with the doctor's medical knowledge and with the quality of medical advice (Das and Hammer 2006, Leonard and others 2005). This suggests that effort and competence are complements--if they were substitutes so that less competent doctors exert higher effort, the relationship between effort and quality of care becomes unclear. Second, as these studies document, the problem is not that doctors do not know what to do, it's that they put in too little effort relative to what they do know. 
Das and Hammer (2006), for instance, show that increases in the quality of care due to a 5-year medical degree are completely offset by the difference in effort between the public and private sector in India. Third, if doctors are to be rewarded for their performance, the reward has to be based on an aspect of the doctor's practice that s/he can control. Although doctors could acquire training to improve their competence, in the short-run, all improvements in the quality of care are likely to arise from higher effort in clinical practice. If measures of user-satisfaction are then to be used as quality indicators in (say) pay-for-performance schemes, a positive correlation between satisfaction and effort is a reasonable minimum requirement.

This paper contributes to a growing literature suggesting that patient satisfaction depends on characteristics of the facility, the doctor and the patient, as well as aspects of the interaction between the doctor and the patient (Crow et al. 2002, Williams, S et al. 1998 and Avis et al. 1997). Given the multiple factors that affect patient satisfaction, there is little consensus about whether satisfaction measures can be used to evaluate the quality of care. For instance, Davies and Ware (1988) conclude that "consumers can provide a valid assessment of quality and that bias from personal characteristics is not strong enough to invalidate consumers' ratings", a finding that contrasts with those of Eriksen (1987) and Chang et al. (2006), which do not find support for a strong relationship between the quality of care and patient satisfaction.

We differ from previous studies in our methodological approach. Quantitative studies of patient satisfaction often examine correlations between reported satisfaction and particular aspects of the doctor-patient interaction, with and without the inclusion of patient and doctor characteristics as additional covariates.
As is well known from the empirical economics literature, these associations have limited importance as policy-tools if unobserved patient and doctor characteristics correlated both with effort and satisfaction could bias the results. For instance, if more courteous doctors also exert greater effort, and patients report higher satisfaction when they are treated courteously, a positive correlation between effort and satisfaction would reflect, in part, the positive association between courtesy and satisfaction rather than the impact of effort.2 Policy based on such associations could lead to results very different from those intended--for instance, bonuses for doctors based on reported satisfaction may lead to more courteous behavior without any impact on the quality of medical advice. To address this problem we collected doctor effort and patient satisfaction measures for multiple patients for every doctor in the sample. By comparing patient satisfaction for the same doctor when s/he exerts greater/less effort in the interaction, unobserved variables at the level of the doctor can be accounted for. This technique, known as the "fixed-effect" or "within" specification, has been used extensively in the economics literature and, as discussed below, using this methodology significantly alters the estimated impact of effort on satisfaction. Indeed, the results argue for a "sorting" interpretation whereby patients more likely to report high levels of satisfaction are also more likely to visit doctors who exert greater effort; effort in and of itself has no impact on reported satisfaction. The remainder of the paper is as follows. Section II introduces the data and the empirical framework. Section III presents the results and Section IV discusses the conclusions and caveats. 
Section II: Country Context, Data and Empirical Framework

The Country Context

Paraguay is a land-locked country in the center of South America with a fast-growing population (2.48 percent) of 6.3 million people, of which 52 percent are located in urban areas. Geographically, the country is divided into 18 "departments" that also serve as organizing units for health facilities. Three main institutions provide health services in the country. These are the public sector under the Ministry of Health (MoH), the semi-public sector under the Institute of Social Provision (IPS), funded through employee and employer contributions, and the fully private sector. The Ministry of Health covers 63 percent of the population, the IPS 17 percent and the private sector a further 10 percent (the remainder is covered by a variety of providers including the police, the army and the Paraguayan Red Cross). In contrast to MoH facilities, which can be visited by any individual, both the private sector (through higher prices) and the IPS (by treating only those with formal employment) are restricted in access.

2 Chang et al. (2006) report a significant association between communication and satisfaction with no relationship between quality and satisfaction. One problem with this study is the sample, which consists almost entirely of older patients who may be less able to differentiate the quality of care from interpersonal interactions (Cleary et al., 1997).

The Ministry of Health follows a "centre-periphery" model. Smaller health posts (called "puestos de salud") provide care for the peripheral (and more rural) populations. The size and complexity of facilities increase closer to the center, where health centers ("centro de salud"), district hospitals ("hospital districtal") and central hospitals ("hospital central") are responsible for medical care.
Finally, IPS facilities are predominantly in urban areas, where most of the population with regular employment lives, and are similar to MoH facilities located in the same areas (World Bank 2006).

The Data

The data are from a survey of 292 doctors in 98 facilities in four departments of the country--Asuncion, Cordillera, Misiones and Central. These facilities were picked randomly from a list of public facilities in these departments administered by the MoH and the IPS.

There were four different parts to the data collection. In every facility, CIRD completed a facility survey and a roster of all doctors. We then observed interactions between doctors and patients for a set of doctors randomly chosen from the roster--depending on the size of the facility, the total number of doctors observed varied between 1 and 10, yielding a total of 292 doctors who were observed at least once.3 For this sample of 292 doctors, there are 11 interactions on average for each doctor, yielding a total of 2,271 doctor-patient interactions. As the patient left the clinic, a second surveyor standing outside completed an exit-interview covering detailed socio-demographic information (education, wealth, age and employment), health status (self-reported health status and a health measure based on activities of daily living) and satisfaction with the service.

Finally, we were concerned that observing the doctor-patient interaction would itself change the way that the doctor behaved--this is the well-known Hawthorne effect, discussed for instance in Leonard (2006). To gauge the extent to which this happened, patients who had visited the facility in the last week were surveyed at home and completed a questionnaire similar to that administered in the exit survey. This part of the survey was not completed for all the facilities, but for a sub-sample of 64 facilities administered by the MoH. The households surveyed were predominantly within a 15-minute walk from the facility.
Although the patients interviewed through the exit survey are not the same as the patients interviewed at home, it is likely that the two samples are comparable: since we chose the day of the facility visit randomly, there is no a priori reason to believe that the patients interviewed at home are very different from those interviewed at the facility, once we restrict attention to those who came to the clinic from within a 15-minute radius.

Table 1 summarizes basic characteristics of doctors, patients, and reported satisfaction for the full sample (Table 1, Column A), the sub-sample of facilities where the additional household survey was administered (Table 1, Column B) and for respondents from the household survey itself (Table 1, Column C). On average, patients visiting the facilities are in their mid-thirties (parents were interviewed when the patient was a child) and report at least primary education, and just under 20 percent report arriving in very good health (SRHS). The doctors who see these patients tend to be female (55 percent), middle-aged and with an average of 17 years of experience on the job. As in most other surveys, patients report a high level of satisfaction: although the question on satisfaction allowed for 5 options ranging from "very dissatisfied" to "very satisfied", 90 percent responded that they were either "satisfied" or "very satisfied", a finding that is standard in the literature (Williams, B et al. 1998). For our analysis, we restrict ourselves to two options--"very satisfied" and "others"; for the full sample, 59 percent fall in the former category.4

3 A small number of doctors (42) were observed in multiple facilities as part of a study trying to understand the impact of contracting regimes on performance. These doctors were over-sampled from the random population and their inclusion (or exclusion) does not alter the results.
4 This categorization does not affect our results; in alternative specifications using 4 possible values for satisfaction and an ordered probit specification, we obtain very similar relationships. Collapsing the 4 (footnote continues below)

There are few significant differences between the sample of patients in all facilities and in the facilities where we also completed the household survey, although there are differences between patients interviewed at home and those interviewed through exit surveys at the facility. The former tend to live closer to the facility, are older and are more likely to be female. These differences are partially a result of the distance-restriction implicit in the survey design; the first because we only interviewed those who lived close to the facility, and the second because older patients are less likely to come from further away. Once we restrict ourselves to patients who come from similar distances, the differences between the two samples diminish, but patients interviewed at home are 20 percent more likely to be female and there are small but significant differences in health status, with those at home reporting slightly worse health (Appendix Table 1). In sharp contrast to these small differences, there is a large and significant drop of 23 percentage points in reported satisfaction in the household survey.

Table 2 presents summary statistics for the observed interactions between doctors and patients. For every interaction, information volunteered by the patient (number of days sick and presenting symptoms) and the tasks that the doctor completed were noted. The latter included the time spent with the patient, the number of questions asked and the number and type of examinations performed. In addition, we also noted some aspects of the information that the doctor gave to the patient.
These characteristics of the interaction--the time spent, the number of questions asked, and the number of physical examinations performed--were combined into a single "effort-index" using principal components analysis. The effort index is standardized with a mean of zero and a variance of one. On average, the effort exerted by the doctors is higher in Paraguay than in lower income countries, and is comparable to that observed in European countries (Table 2). Moreover, there are large differences in the nature of the interaction between doctors at different levels of the effort index. Doctors in the lowest tercile of the index spend around a third as much time, ask half as many questions, and perform a third as many examinations as those in the highest tercile. An increase of one standard-deviation in effort corresponds roughly to four extra minutes, three extra questions asked and 0.2 extra examinations performed by the doctor.

A second basic characterization of the effort-index and satisfaction variable is a decomposition of the total variation in these two variables into variation within and variation across doctors. That is, we ask whether most variation in effort and satisfaction is due to differences "across" doctors, so that some doctors always put in more effort, or "within" doctors, so that most doctors put in less effort with some patients and more with others. If some doctors consistently put in more effort and others less, variation across doctors will account for most of the overall variation in the data. ANOVA (Analysis of Variance) decomposition suggests that both components are substantial--58 percent of the variation in effort and 28 percent of the variation in satisfaction is across doctors while 42 percent (effort) and 72 percent (satisfaction) is "within" doctors.
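The construction of the effort index and the across/within decomposition can be illustrated with a small sketch. It is not the authors' code: the nine observations are invented, the first principal component is obtained by power iteration on the correlation matrix, and the function names (`effort_index`, `across_within_shares`) are hypothetical. The decomposition is the standard sum-of-squares identity, total = across + within.

```python
import math

# Hypothetical observations: (doctor id, minutes, questions asked, examinations).
# These numbers are invented for illustration; they are not the survey data.
OBS = [
    ("A", 12.0, 9, 3), ("A", 10.0, 8, 2), ("A", 6.0, 5, 2),
    ("B", 5.0, 4, 1), ("B", 8.0, 6, 2), ("B", 3.0, 2, 1),
    ("C", 9.0, 7, 2), ("C", 4.0, 3, 1), ("C", 7.0, 6, 1),
]


def standardize(values):
    """Rescale a list to mean 0 and (population) variance 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def first_component(columns, iterations=500):
    """Leading eigenvector of the correlation matrix, via power iteration."""
    n, k = len(columns[0]), len(columns)
    corr = [[sum(a * b for a, b in zip(columns[r], columns[c])) / n
             for c in range(k)] for r in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[r][c] * w[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


def effort_index(obs):
    """First principal component of the three indicators, standardized."""
    cols = [standardize([row[i] for row in obs]) for i in (1, 2, 3)]
    w = first_component(cols)
    scores = [sum(w[j] * cols[j][i] for j in range(3)) for i in range(len(obs))]
    return standardize(scores), w


def across_within_shares(doctors, values):
    """Shares of total variation that lie across and within doctors."""
    grand = sum(values) / len(values)
    groups = {}
    for d, v in zip(doctors, values):
        groups.setdefault(d, []).append(v)
    across = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    total = across + within
    return across / total, within / total


index, weights = effort_index(OBS)
across, within = across_within_shares([row[0] for row in OBS], index)
print("loadings:", [round(w, 3) for w in weights])
print("share across doctors:", round(across, 3), "within:", round(within, 3))
```

In this sketch a large "within" share means the same doctor's effort varies a lot from patient to patient, which is the variation a within-doctor comparison relies on.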
The substantial variation "within" doctors is reassuring and presages a promising empirical strategy--by focusing on the variation in effort and satisfaction "within" doctors, we can eliminate potential estimation biases arising from systematic variation in effort and satisfaction across doctors. We turn to this next.

(Footnote 4, continued) values into two allows us to present simpler probit specifications that can then be compared to the results from the linear probability model. This comparison is useful since, in later specifications, we estimate fixed-effect models only in the linear form.

Section II.1. Empirical Framework

The Empirical Model

The empirical framework relates patient satisfaction to the effort exerted by the doctor in the interaction. Since most of the variation in satisfaction is between those who report being "very satisfied" and those who report being "satisfied", we retain the binary classification of Table 1 in the econometric analysis. There is some evidence that this distinction resonates with patients' qualitative assessment of care and that the difference between the two categories corresponds to a very clear distinction in the patient's feelings about the interaction (Collins and O'Cathain, 2003). The dependent variable, $S_{ij}$, the satisfaction reported by patient i in her interaction with doctor j, is therefore coded as:

$S_{ij} = 0$ if reported satisfaction $\neq$ "very satisfied" and
$S_{ij} = 1$ if reported satisfaction = "very satisfied"

Since the use of fixed-effect estimations leads to substantially different results, two simple tables highlight the main contribution of this methodology. Tables 3a and 3b show the (hypothetical) effort exerted by Doctors A and B in their (hypothetical) interactions with four patients. Doctors can either exert high effort (1) or low effort (0.5); likewise patients can report either high satisfaction (1) or low satisfaction (0). In our example, Doctor A generically exerts greater effort while Doctor B exerts less.
In Table 3a, there is a strong association between effort and satisfaction across the two doctors but there is no association between effort and satisfaction "within" doctors--that is, neither for Doctor A nor for Doctor B do patients report more/less satisfaction when each doctor exerts more or less effort. The association between effort and satisfaction is driven by the relatively greater use of Doctor B by patients who always report low satisfaction and of Doctor A by patients who always report high satisfaction. In contrast, Table 3b depicts the situation where there is both a correlation across doctors and "within" doctors. In a normal OLS regression both Tables 3a and 3b would show the same positive correlation between effort and satisfaction due to the correlation across doctors. A fixed-effect specification would reveal that in Table 3a there is no correlation between effort and satisfaction within each doctor, whereas in Table 3b there is.

One requirement for "within" doctor estimations of the kind in Table 3b is that there is substantial variation in effort for every doctor; indeed, the regression with a "within" doctor estimator, which is akin to including a separate dummy variable for every doctor, will not be estimable when every doctor exerts the same effort for every patient. In this case, the doctor fixed-effects will be fully collinear with effort; fortunately, as we saw above, 42 percent of the variation in total effort is within rather than across doctors.

The estimation strategy can be presented formally as follows. The multivariate regression framework is given by an equation of the form:

$$S_{ij} = \alpha + \beta\,\mathrm{Effort}_{ij} + \gamma X_j + \delta X_i + (\eta_j + \mu_i + \varepsilon_{ij})$$ [Equation 1]

where $\mathrm{Effort}_{ij}$ is the effort for doctor j in her interaction with patient i, $X_j$ are observed doctor characteristics (such as age and gender), $X_i$ are observed patient characteristics (such as wealth, education and health-status) and the terms in the parenthesis (.) are doctor and patient characteristics that we did not measure but are correlated with reported satisfaction. The $\eta_j$ corresponds to unmeasured intrinsic attributes of the doctor (for example courtesy); $\mu_i$ are unmeasured characteristics of the patient (for example expectations) and $\varepsilon_{ij}$ is an idiosyncratic error associated with the interaction. The estimated impact of effort on satisfaction, $\beta$, is biased if unmeasured characteristics of doctors, such as courtesy, that are correlated with satisfaction are also correlated with effort, so that $\mathrm{cov}(\eta_j, \mathrm{Effort}_{ij}) \neq 0$, or if unmeasured characteristics of patients, such as expectations, that are correlated with satisfaction are also correlated with effort ($\mathrm{cov}(\mu_i, \mathrm{Effort}_{ij}) \neq 0$).

A fixed-effect specification compares reported satisfaction across patients visiting the same doctor and therefore addresses the first concern of unmeasured characteristics of the doctor. Suppose we observe two patients visiting the same doctor, so that:

$$S_{1j} = \alpha + \beta\,\mathrm{Effort}_{1j} + \gamma X_j + \delta X_1 + (\eta_j + \mu_1 + \varepsilon_{1j})$$ [Patient 1]

$$S_{2j} = \alpha + \beta\,\mathrm{Effort}_{2j} + \gamma X_j + \delta X_2 + (\eta_j + \mu_2 + \varepsilon_{2j})$$ [Patient 2]

Subtracting the second equation from the first:

$$\Delta S_{ij} = \beta\,\Delta\mathrm{Effort}_{ij} + \delta\,\Delta X_i + (\Delta\mu_i + \Delta\varepsilon_{ij})$$ [Equation 2]

Equation 2 (the "fixed-effect" specification) removes bias associated with unobserved doctor characteristics that may be correlated both with effort and with reported satisfaction ($\eta_j$ is eliminated in the difference, as are all $X_j$ variables). Note that the estimated $\beta$ is still biased if unmeasured patient characteristics that are correlated with satisfaction are also correlated with effort ($\mathrm{cov}(\Delta\mu_i, \Delta\mathrm{Effort}_{ij}) \neq 0$). This may be the case, for instance, if doctors also put in greater effort for patients who are harder to satisfy.
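The contrast between a pooled regression and the "within" specification can be sketched numerically. The data below are invented in the spirit of the two-doctor, four-patient example (effort of 1 or 0.5, satisfaction of 1 or 0); they are not the paper's Tables 3a and 3b, and the helper names are hypothetical. The within estimate is obtained by subtracting each doctor's mean from effort and satisfaction before computing the slope, which is equivalent to including a dummy for every doctor.

```python
def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(groups, values):
    """Subtract each group's mean from its members (the 'within' transform)."""
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


def pooled_and_within(doctor, effort, satisfied):
    """Return (pooled OLS slope, doctor fixed-effect slope)."""
    pooled = slope(effort, satisfied)
    within = slope(demean_by_group(doctor, effort),
                   demean_by_group(doctor, satisfied))
    return pooled, within


# Hypothetical data: Doctor A mostly exerts high effort, Doctor B mostly low.
doctor = ["A"] * 4 + ["B"] * 4
effort = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0]

# Case (a): satisfaction depends only on which doctor the patient visits.
sorting_only = [1, 1, 1, 1, 0, 0, 0, 0]
# Case (b): satisfaction also moves with effort for the same doctor.
effort_matters = [1, 1, 1, 0, 0, 0, 0, 1]

print(pooled_and_within(doctor, effort, sorting_only))    # pooled > 0, within = 0
print(pooled_and_within(doctor, effort, effort_matters))  # pooled > 0, within > 0
```

In case (a) the pooled slope is positive only because high-satisfaction patients are matched with the high-effort doctor; once doctor means are removed, nothing is left to explain.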
We address this concern by including a rich set of patient-level variables; below we also discuss results from a potential instrumental variables specification with similar results.5

The second basic empirical relationship we are interested in is the difference in reported satisfaction between patients interviewed at home and patients interviewed outside the facility, because of the potential Hawthorne effect of observing doctor-patient interactions in the facility. For both sets of patients, the data include reported satisfaction and the same set of patient characteristics, but for those interviewed at home, there are no measures of the effort-index.6 We thus implement the equivalent of Equation 2, comparing those who came to the same facility (thus eliminating facility-level characteristics) and focusing on the difference between those interviewed at home and those interviewed at the facility:

$$\Delta S_{i} = \theta\,\Delta\mathrm{Data\_Type}_{i} + \delta\,\Delta X_{i} + (\Delta\mu_{i} + \Delta\varepsilon_{ij})$$ [Equation 3]

where $\mathrm{Data\_Type}_i$ indicates whether patient i was interviewed at home or at the facility and $\Delta$ denotes differences between patients who visited the same facility.

The main concern with estimations based on [Equation 3] is that the sample of patients interviewed at home may be very different from those interviewed at the clinic. As Table 1 and Appendix Table 1 show, there are some differences in characteristics of patients in the two samples. The econometric analysis addresses this by using propensity-score matching techniques to exclude all individuals without matching counterparts in both samples and re-estimating Equation 3. That is, we compute the likelihood (propensity-score) of being in the household dataset as a function of observed individual characteristics, and drop those who look "very different" (in a sense made precise below) from the estimation.

5 Instrumental variables estimators use a variable (the "instrumental variable") that is correlated with the variable of interest (doctor effort), but is uncorrelated with the unobserved determinants of the dependent variable (satisfaction). We argue below that one possible instrumental variable is the order in which patients arrive, since doctors exert less effort with patients who arrive later in the day.

6 In the pilot phase of the survey, the recollections of patients interviewed at home were not accurate enough to reconstruct the effort-index. That is, patients did not know the time the doctor had spent, the number of questions asked, or the number of examinations performed.

Section III: Results

Effort and Satisfaction

Table 4 presents a series of specifications based on Equations 1 and 2. Columns (1) to (4) present the estimations arising from Equation (1), and Columns (5) and (6) present the fixed-effects estimate from Equation (2), where unobserved provider characteristics are removed by comparing differences across patients for the same doctor. The bivariate correlation between doctor-effort and patient satisfaction is highly significant and not affected by the choice of the functional form. In the linear (OLS) specification, a one-standard deviation increase in effort leads to a 7 percentage point increase in patients reporting that they were very satisfied with the interaction; the point-estimate is virtually the same in the non-linear probit specification, which explicitly accounts for the limited-dependent nature of the satisfaction variable (satisfaction is either 0 or 1) using a normally distributed error term.

Column 3 adds in observed patient characteristics as additional controls. The estimated coefficient on effort retains both its size and significance. Interestingly, the wealth, health (measured by 5 questions related to activities of daily living) and education level of the patient are unrelated to satisfaction, while age, gender and self-reported health status are significant at usual levels of confidence. Older and male patients tend to report higher satisfaction and those who report that their health status is excellent are also more satisfied.
These correlations agree with a large number of studies that find higher satisfaction levels among the elderly and among those in relatively better health (Crow et al. 2002).

Adding in observed doctor characteristics (Column 4, Table 4) again has no effect on the reported effect of effort on satisfaction. The age and experience of the doctor have no independent effect on satisfaction, while patients tend to report lower satisfaction for doctors who are male or in urban areas. The interaction between the patient's and the doctor's gender is highly significant and large relative to the impact of effort on satisfaction. Patients in an urban area are 15 percentage points more likely to be "very satisfied"; this increase corresponds to a 2.5 standard-deviation increase in the doctor's effort. Finally, there is some indication that patients report higher satisfaction when an index of treatment, measuring actions related to doctor-patient communication, is higher.

Columns 5 and 6 are the equivalent of Columns 1 and 4 in a fixed-effect specification (Equation 2). There is a large drop in the size and significance of the effect of effort on satisfaction. Once unobserved doctor characteristics are accounted for, a one standard-deviation increase in effort leads to only a one percentage point increase in the probability of being "very satisfied" with the interaction, an increase that is no longer significant. As before, most patient characteristics are uncorrelated with satisfaction, although older patients and those with excellent self-reported health status continue to report higher satisfaction levels. Thus, while patient characteristics are associated with satisfaction in the same manner as before, there is no longer any observed relationship between patient satisfaction and doctor effort, once we compare satisfaction reports by different patients for the same doctor. This situation corresponds to the hypothetical example discussed in Table 3a.
Satisfaction at home and at the facility

Table 5 compares satisfaction reported by patients interviewed at home and those interviewed at the facility. Columns 1 and 2 report the bivariate correlation between the data type (home or clinic) and satisfaction in OLS and probit specifications; Column 3 introduces the full set of patient controls; Column 4 repeats Column 3 in a fixed-effect specification. Columns 5 and 6 "trim" the sample by excluding individuals who do not have common support with the sample of patients observed in the facility.7

There are three basic messages. First, the relationship between satisfaction and individual characteristics is similar to that in the exit-survey: most individual characteristics are insignificant but, as in Table 4, older patients and those who report excellent health status report higher satisfaction. In addition, those who travel farther report lower satisfaction. Second, being interviewed at the clinic increases reported satisfaction by 22 to 27 percentage points, a result that is significant at the 1 percent level and stable across all specifications. Being interviewed at home, in terms of satisfaction, is similar to the effect of a doctor putting in 7 standard-deviations less effort in the clinic (this is the difference between the doctor spending 1 minute and 30 minutes with the patient). Third, there are no significant interaction effects between doctor and patient characteristics, so that the drop in satisfaction is similar across all patient and doctor characteristics; it is not the case that richer or healthier patients report differential drops in reported satisfaction across the household and exit surveys.

Section III.1. Further Results and Some Speculation

The large differences between the reported satisfaction of patients interviewed at home and those interviewed in the clinic raise the possibility that the difference was induced by the nature of the study--since we observed doctor-patient interactions for patients interviewed at the clinic, perhaps doctors put in higher effort levels than they normally would have. In this case, the difference reflects intrinsic differences in doctors' effort. Leonard's (2006) discussion of the Hawthorne effect in a parallel study in Tanzania points to this possibility; his study also suggests that the Hawthorne effect dissipates over time, so that interactions observed later on may be more similar to the "actual" effort put in by a doctor when s/he is not being observed.

Figure 1 indicates that patient order is important, suggesting either that doctors get used to the observers over time or that (independent of whether they are observed or not) effort naturally declines over time. Here, we plot the non-parametric relationship between the doctor's effort and the interaction order, where the 1st interaction we observed is coded as order = 1 and the 20th as order = 20. Since non-parametric plots are sensitive to the number of observations, we also overlay the total number of observations we have for every patient order; the percentages are labeled on the right-hand axis. Doctor effort is higher for the first interaction and drops off for later patients, steeply at first and gradually later on. A regression of doctor effort on patient order is highly significant in the linear and quadratic form, with and without additional patient controls and with and without patient fixed-effects (Columns 1-3, Table 6). However, the absolute magnitude of the drop-off is too small to explain the difference between household and exit surveys.
The estimated drop-off from the 1st to the 20th patient is around 0.25 standard deviations in effort, whereas accounting for the 22 percentage point difference would require a difference of 7 standard deviations, or 28 times what is actually observed in the data. Equally troubling is that there is no reported decline in patient satisfaction with the order of the interaction in the exit survey (Columns 4-6, Table 6). Thus, although doctor effort declines significantly with patient order, patient satisfaction itself remains unaffected.

7 The trimming is done by first fitting the likelihood of participation in either sample as a function of observed patient characteristics using a logistic regression model. Individuals with greater than a 90 percent or lower than a 10 percent probability of participation in the household sample (and who therefore do not have counterparts in the facility sample) are then excluded from the estimation. These results are not sensitive to alternative cutoffs for trimming the sample.

Returning to Equation 2, the estimated coefficient of effort on satisfaction remains biased in the fixed-effect specification if $\mathrm{cov}(I, \mathrm{Effort}_{ij}) \neq 0$, a possibility that arises, for instance, if doctors put in more effort for patients who are more likely to report being "very satisfied". One way to address such a bias is to look for exogenous variation in effort which is unrelated to the characteristics of the patient. If patient order is indeed unrelated to patient characteristics, the true effect of effort on satisfaction would be given by the instrumental variables specification, where patient order is used as an instrument for doctor effort.8 The regression of satisfaction on patient order, which is the "reduced-form" specification or the numerator of the instrumental variables estimate, yields an estimate that is no different from zero. This implies that the instrumental variables estimate is also no different from zero, or that doctor effort has no impact on patient satisfaction.
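The logic of the reduced-form argument can be sketched in a few lines. With a single instrument, the instrumental variables estimate is the ratio of the reduced-form covariance (instrument with outcome) to the first-stage covariance (instrument with the endogenous regressor), so a zero numerator forces a zero estimate. The data below are made up purely for illustration and ignore the controls and fixed effects used in the paper; `wald_iv` is a hypothetical helper name.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def wald_iv(instrument, regressor, outcome):
    """Single-instrument IV estimate: reduced form divided by first stage."""
    first_stage = covariance(instrument, regressor)
    reduced_form = covariance(instrument, outcome)
    if first_stage == 0:
        raise ValueError("instrument is unrelated to the regressor")
    return reduced_form / first_stage


# Hypothetical toy data: effort falls steadily with patient order.
order = [1, 2, 3, 4, 5, 6]
effort = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]

# Case 1: satisfaction is unrelated to order, so the reduced form is zero.
satisfied_flat = [1, 1, 0, 0, 1, 1]
iv_flat = wald_iv(order, effort, satisfied_flat)

# Case 2: satisfaction falls with order, so the IV estimate is positive.
satisfied_falling = [1, 1, 1, 0, 0, 0]
iv_falling = wald_iv(order, effort, satisfied_falling)

print(iv_flat, iv_falling)
```

In the first case the estimate is zero up to floating-point rounding even though effort varies strongly with order, which mirrors the pattern described in the text; in the second case the same first stage produces a positive estimate.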
We discuss below why this may be too strong a conclusion, but note that this may indeed be a distinct possibility for this dataset.

A second explanation for the difference in satisfaction between the household and facility samples is that the interviewed samples differ systematically. One possibility is that those interviewed at home were more likely to be the patients who did not benefit from the treatment given in the previous week (and were therefore at home when we visited them). There is an indication that this could be part of the explanation, since reported health status is lower among the household sample, even after controlling for distance from the facility. However, this difference in health status is very small (on the order of 0.03 standard deviations) and the estimated impact of interview location on satisfaction does not change when health-status variables are added to the regression.

If biases arising from sample selection are then ruled out, it must be either that satisfaction decreases as we move further from the date of the interaction or that individuals report lower levels of satisfaction at home than at the clinic. The literature reports both sorts of results, although there are also studies that have arrived at the opposite conclusion (Crow et al. 2002; Jackson et al. 2001).

Section IV: Conclusion, Caveats and a Brief Discussion

The main aim of this project was to ascertain whether patient satisfaction responds to a measure of doctor performance that can be altered in the short run. If so, satisfaction can be used to provide incentives and/or increase the accountability of service providers. Our primary contribution is the simultaneous measurement of doctor effort and patient satisfaction and the use of standard empirical economic tools in large samples to address concerns arising from unobserved heterogeneity in doctor-patient interactions.
We feel that these tools add to our understanding of the determinants of patient satisfaction. For instance, a naïve positive correlation between doctor effort (measured by consultation time) and patient satisfaction has been reported in a number of studies. The significant drop in the estimated impact of effort on satisfaction once doctor fixed-effects are included suggests that much of this relationship is driven by unobserved doctor characteristics. Overall, the results suggest that satisfaction is poorly correlated with doctor performance, and in ways that are hard to fix.

Several results point us to this negative conclusion. The estimated effect of doctor effort on satisfaction is small compared to that of other observable characteristics of patients and doctors. The location of the facility, the age of the patient, the self-reported health status and whether s/he saw a doctor of the same gender are all significantly associated with satisfaction, and their effects are large relative to the impact of effort. Some of these are (somewhat) under the control of the doctor: if satisfaction is to be used as a measure of performance, ensuring that male patients see male doctors will increase satisfaction by more than doubling the time and effort spent with the patient would; whether matching doctor-patient genders has as large an effect on the quality of care is debatable.

8 We thank Ken Leonard for this suggestion.

If observable characteristics alone determined satisfaction, "adjusted" satisfaction measures that present the level of satisfaction after accounting for differences in these characteristics would be a reliable measure. This is the equivalent, for instance, of "value-added" measures in school-performance cards that present student test scores after adjusting for student background and school characteristics. The results suggest, however, that unobserved rather than observed characteristics are important.
There is a large and significant decline in the estimated impact of effort on satisfaction in fixed-effect specifications where unobserved doctor characteristics are adequately accounted for. Furthermore, although we cannot fully account for unobserved patient characteristics, the reduced-form of the instrumental variables specification suggests that the lack of an association between satisfaction and doctor effort is robust to unobserved patient characteristics. We are somewhat reluctant to argue for this strong conclusion, since we do not know whether order itself is related to patient characteristics or has an independent effect on satisfaction (for instance, if patients later in the order also waited longer, this could affect their satisfaction level). However, in combination with the fixed-effects results, this suggests that reported satisfaction is severely contaminated by the systematic matching of patients to doctors, in ways that are unobserved by the researcher.

Finally, the differences in satisfaction reported by a comparable sample of patients interviewed at home compared to those interviewed in the clinic are an order of magnitude larger than those between doctors who put in more or less effort. Hawthorne effects are unlikely to explain these differences; it is likely that other behavioral survey issues, such as the "warm-glow" of being interviewed shortly after the interaction or the comfort of being interviewed at home rather than in the setting of the clinic, are at play. In either case, there is a strong case for exercising caution in interpreting satisfaction results either over time or across settings when survey conditions are not identical.

There are a number of caveats and potential modifications. In the household interview we did not record the date of the facility visit, although we ensured that the visit to the facility took place during the week before our visit. We therefore cannot ascertain the degree to which satisfaction drops off with time.
As discussed previously, unobserved patient characteristics could bias our results; the instrumental variables discussion suggests that the estimate in the fixed-effect specification may be too high. Finally, by design our interviews were highly structured. While this does afford us the luxury of a large sample and the use of econometric tools with some precision, a component of the survey that allowed for unstructured interviews and more qualitative information could have added to our understanding of the environment.

References

Avis M, Bond M, Arthur A. (1997) Questioning patient satisfaction: An empirical investigation in two outpatient clinics. Social Science and Medicine. Vol. 44, No. 1, pp 85-92.

Chang J.T., Hays R.D., Shekelle P.G., MacLean C.H., Solomon D.H., Reuben D.B., Roth C.P., Kamberg C.J., Adams J., Young R.T., and Wenger N.S. (2006) Patients' Global Ratings of Their Health Care Are Not Associated with the Technical Quality of Their Care. Annals of Internal Medicine. Vol. 144, pp 665-672.

Collins K, O'Cathain A. (2003) The continuum of patient satisfaction--from satisfied to very satisfied. Social Science and Medicine. Vol. 57, No. 12, pp 2465-2470.

Crow R, Gage H, Hampson S, Hart J, Kimber A, Storey L, et al. (2002) The measurement of satisfaction with healthcare: implications for practice from a systematic review of the literature. Health Technology Assessment. Vol. 6, No. 32.

Das J, Hammer J. (2006) Money for Nothing: The Dire Straits of Medical Practice in Delhi, India. Journal of Development Economics, forthcoming.

Das J, Sohnesen T.P. (2005) Practice-Quality Variation in Paraguay. The World Bank. Processed.

Davies A, Ware JJ. (1988) Involving consumers in quality of care assessment. Health Affairs. Vol. 7, pp 33-48.

Deveugele M, Derese A, van den Brink-Muinen A, Bensing J, De Maeseneer J. (2002) Consultation length in general practice: cross sectional study in six European countries. British Medical Journal. Vol. 325, No. 7362, p 472.

Eriksen L. (1986) Patient Satisfaction: An Indicator of Nursing Care Quality. Nursing Management. Vol. 18, No. 7, pp 31-35.

Hogerzeil H, Bimo D, Ross-Degnan D.G., Laing R.O., Ofori-Adjei D, Santoso B, Chowdhury A.K. (1993) Field tests for rational drug use in twelve developing countries. Lancet. Vol. 342, No. 8884, pp 1408-1410.

Jackson J, Chamberlain J, Kroenke K. (2001) Predictors of patient satisfaction. Social Science and Medicine. Vol. 52, pp 609-620.

Leonard K, Masatu M.C., Vialou A. (2005) Getting Doctors to do their Best: Ability, Altruism and Incentives. University of Maryland. Processed.

Leonard K. (2006) Exploiting the Hawthorne effect to test for information asymmetries in health care. University of Maryland. Processed.

Williams B, Coyle J, Healy D. (1998) The Meaning of patient satisfaction: An exploration of high reported levels. Social Science and Medicine. Vol. 47, No. 9, pp 1351-1359.

Williams S, Weinman J, Dale J. (1998) Doctor-patient communication and patient satisfaction: a review. Family Practice. Vol. 15, pp 480-492.

World Bank (2006) Health Service Delivery in Paraguay: A Review of Quality of Care and Policies on Human Resources and User Fees. Report No.
33416-PY

Figures and Tables

Table 1 - Means and Standard Deviations of Patient and Doctor Characteristics across Samples

Variable | Complete Sample (1) | Matched Sample (2) | Household Sample (3) | Difference and p-value (1 - 2) | Difference and p-value (2 - 3)
Patient characteristics
Asset index | 0.03 (0.92) | -0.11 (0.87) | -0.13 (0.92) | 0.15 (0.00) | -0.01 (0.77)
Health index | 0.01 (0.98) | 0.01 (0.99) | -0.04 (0.87) | 0.00 (0.95) | -0.05 (0.28)
Distance to facility in minutes | 31.41 (42.94) | 23.25 (26.34) | 9.29 (7.82) | 8.20 (0.00) | -13.96 (0.00)
Age of patient 1) | 35.25 (14.98) | 33.48 (14.35) | 38.65 (15.83) | 1.77 (0.00) | 5.17 (0.00)
Patient is male | 0.34 (0.47) | 0.34 (0.47) | 0.13 (0.33) | -0.01 (0.76) | -0.21 (0.00)
Primary education or higher dummy | 0.82 (0.39) | 0.79 (0.41) | 0.75 (0.43) | 0.03 (0.04) | -0.03 (0.11)
Household size | 5.33 (2.36) | 5.49 (2.41) | 5.08 (2.15) | -0.16 (0.06) | -0.41 (0.00)
Excellent self reported health status dummy | 0.19 (0.39) | 0.18 (0.38) | 0.13 (0.34) | 0.01 (0.40) | -0.05 (0.01)
Doctor characteristics
Doctor is male | 0.45 (0.50) | 0.45 (0.50) | | 0.01 (0.64) |
Experience in years | 17.29 (7.61) | 16.54 (7.61) | | 0.75 (0.01) |
Age of doctor | 43.79 (8.33) | 43.35 (8.50) | | 0.45 (0.14) |
Measures of effort and satisfaction
Number of examinations by doctor | 2.73 (1.96) | 3.19 (1.92) | | -0.46 (0.00) |
Number of questions asked by doctor | 8.02 (5.26) | 8.17 (5.28) | | -0.15 (0.43) |
Time spent in interaction | 8.01 (4.81) | 8.16 (5.17) | | -0.15 (0.40) |
Standardized index of doctor effort | 0.00 (1.00) | 0.11 (1.02) | | -0.11 (0.00) |
Satisfaction dummy | 0.59 (0.49) | 0.57 (0.50) | 0.35 (0.48) | -0.02 (0.27) | -0.22 (0.00)
Observations | 2271 | 1190 | 582 | |

Notes: Standard deviations are in parentheses for the sample means; p-values are in parentheses for the differences. Column (1) uses the entire sample; Column (2) uses only those facilities from which patients were also interviewed at home. Column (3) presents the attributes of patients interviewed at home; for these patients, we do not know the characteristics of the doctors with whom they interacted.
The index of physicians' effort is a composite index of time spent with patients, number of examinations and total number of questions asked. The effort index is normalized to have mean 0 and variance 1. The asset index is based on the family's access to water, sanitation at home and ownership of 16 durable assets. The health index is a composite index based on 5 questions related to Activities of Daily Living (ADL), in which the patients were asked about their ability to lift heavy objects, carry bags home from the market, take the stairs, kneel down, and walk one kilometer. 1) For patients younger than 15 we record the age of the parent; for older patients we record the patient's own age.

Table 2 - Mean Physician Effort in Paraguay and Internationally

Group | Effort index | Minutes spent with patient | Number of questions asked | Number of examinations | Polypharmacy 5)
Paraguay
Physicians in first tercile of effort index | -0.95 | 4.55 | 4.87 | 1.42 | 1.32
Physicians in second tercile of effort index | -0.14 | 7.17 | 7.24 | 2.96 | 1.52
Physicians in third tercile of effort index | 1.09 | 12.31 | 11.97 | 3.81 | 1.79
All physicians | 0.00 | 8.01 | 8.02 | 2.73 | 1.54
International comparison
Delhi 1) | | 3.80 | 3.20 | N.A. | 2.63
Tanzania (2003) 2) | | 6.95 | 3.57 | N.A. | N.A.
Tanzania (1991) 3) | | 3.0 | N.A. | N.A. | 2.2
Nigeria 3) | | 6.3 | N.A. | N.A. | 2.8
Malawi 3) | | 2.3 | N.A. | N.A. | 1.8
Germany 4) | | 7.6 (4.3) | N.A. | N.A. | N.A.
United Kingdom 4) | | 9.4 | N.A. | N.A. | N.A.

Notes: 1) Das and Hammer (2005). 2) Ken Leonard, private communication. 3) Hogerzeil et al. (1993). 4) Deveugele et al. (2002). 5) Total number of medicines given. Cells where no data are available are labeled N.A. (Not Available).
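The notes to Table 1 describe the effort index only as a composite of three components normalized to mean 0 and variance 1; they do not say how the components are combined. The sketch below shows one simple possibility, assumed here purely for illustration: average the three z-scores and re-standardize the result. The function names and the toy interactions are hypothetical, and the authors may have used a different weighting (for example, principal components).

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a sequence to mean 0 and (population) variance 1."""
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - centre) / spread for v in values]


def effort_index(minutes, questions, examinations):
    """Equal-weight composite of three standardized components (an assumption)."""
    components = [zscores(minutes), zscores(questions), zscores(examinations)]
    raw = [mean(triple) for triple in zip(*components)]
    # Re-standardize so the composite itself has mean 0 and variance 1.
    return zscores(raw)


# Hypothetical interactions: minutes spent, questions asked, examinations done.
index = effort_index(
    minutes=[2, 5, 8, 12, 15],
    questions=[3, 6, 8, 11, 14],
    examinations=[1, 2, 3, 4, 4],
)
print([round(v, 2) for v in index])
```

Whatever the weighting, the final standardization step is what makes a coefficient on the index interpretable as the effect of a one standard deviation change in effort.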
Table 3a - Positive Association between Effort and Satisfaction Across Doctors, Not Within Doctors

Patient | Doctor A: Effort | Doctor A: Satisfaction | Doctor B: Effort | Doctor B: Satisfaction
Patient 1 | 1.0 | 1 | 0.5 | 0
Patient 2 | 1.0 | 1 | 0.5 | 0
Patient 3 | 1.0 | 1 | 0.5 | 0
Patient 4 | 0.5 | 1 | 1.0 | 0

Table 3b - Positive Association between Effort and Satisfaction Across and Within Doctors

Patient | Doctor A: Effort | Doctor A: Satisfaction | Doctor B: Effort | Doctor B: Satisfaction
Patient 1 | 1.0 | 1 | 0.5 | 0
Patient 2 | 1.0 | 1 | 0.5 | 0
Patient 3 | 1.0 | 1 | 0.5 | 0
Patient 4 | 0.5 | 0 | 1.0 | 1

Table 4 - Determinants of Satisfaction

Variable | (1) | (2) | (3) | (4) | (5) | (6)
Estimation method | OLS | Probit | OLS | OLS | Fixed Effect | Fixed Effect
Dependent variable: Patient Satisfaction (all columns)
Physicians & Patients
Standardized effort | 0.067 [5.65]a | 0.071 [5.26]a | 0.071 [5.93]a | 0.054 [4.63]a | 0.012 [0.83] | 0.007 [0.46]
Treatment index | | | | 0.078 [5.30]a | | 0.024 [1.20]
Interaction male dummies | | | | 0.136 [2.79]a | |
Patients
Asset index | | | 0.003 [0.24] | 0.009 [0.72] | | 0.01 [0.82]
Age of interviewed | | | 0.006 [1.89]c | 0.007 [2.09]b | | 0.009 [2.83]a
Patient is male | | | 0.046 [1.79]c | -0.004 [0.12] | | 0.046 [1.83]c
Distance to facility in minutes | | | 0.00 [0.29] | 0.00 [0.13] | | 0.00 [0.17]
Squared distance to facility in minutes | | | 0.00 [1.79]c | 0.00 [1.45] | | 0.00 [1.00]
Excellent self reported health dummy | | | 0.235 [8.49]a | 0.224 [8.40]a | | 0.151 [5.21]a
Primary education or higher dummy | | | -0.02 [0.67] | -0.024 [0.82] | | -0.025 [0.81]
Health index | | | 0.001 [0.11] | -0.001 [0.10] | | 0.017 [1.35]
Physicians & Facilities
Doctor is male | | | | -0.061 [1.81]c | |
Age of doctors | | | | 0.002 [0.65] | |
Urban facility | | | | 0.144 [2.27]b | |
IPS administered facility | | | | -0.05 [0.85] | |
Constant | 0.588 [39.17]a | | 0.383 [4.63]a | 0.278 [2.00]b | 0.588 [62.60]a | 0.337 [4.14]a
Observations | 2271 | 2271 | 2172 | 2111 | 2271 | 2172
R-squared | 0.02 | | 0.08 | 0.1 | 0.00 | 0.05
Number of physician fixed effects | | | | | 292 | 291

Notes: a, b and c denote significance at the 1%, 5% and 10% levels respectively. Reported coefficients in Probit regressions are marginal effects calculated at the mean of the sample; robust t statistics are presented in brackets.
Additional controls include 8 dummies for presenting symptoms, an indicator for consultations paid for by a third party, household size and age squared. The doctor's experience and whether the doctor holds a specialization are also controlled for. The index of physicians' effort is a composite index of time spent with patients, number of examinations and total number of questions asked, normalized to have mean 0 and variance 1. The treatment index is a composite index of 5 actions taken. The actions recorded were: advised additional examinations, treated patient during consultation, wrote documents, gave verbal instructions and arranged follow-up consultation. The interaction variable for male dummies is a dummy for a male patient seeing a male doctor. The asset index is based on the family's access to water, sanitation at home and ownership of 16 durable assets. The health index is a composite index based on 5 questions related to Activities of Daily Living (ADL), in which the patients were asked about their ability to lift heavy objects, carry bags home from the market, take the stairs, kneel down, and walk one kilometer. IPS is a semi-public sector health provider funded through employee and employer contributions.
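The contrast between the across-doctor and within-doctor association illustrated in Tables 3a and 3b can be reproduced directly from the eight stylized observations in each table. The sketch below is a minimal illustration, not the authors' estimation code: it computes a pooled bivariate OLS slope and a within-doctor slope obtained by subtracting doctor means from both variables, which is the mechanism behind a doctor fixed-effect specification. The helper names are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """Bivariate OLS slope of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    group_means = {
        g: mean([v for v, h in zip(values, groups) if h == g]) for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


def pooled_and_within(effort, satisfaction, doctor):
    pooled = slope(effort, satisfaction)
    within = slope(
        demean_by_group(effort, doctor), demean_by_group(satisfaction, doctor)
    )
    return pooled, within


# Patients 1-4 of Doctor A followed by patients 1-4 of Doctor B.
doctor = ["A"] * 4 + ["B"] * 4
effort = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0]
satisfaction_3a = [1, 1, 1, 1, 0, 0, 0, 0]  # Table 3a
satisfaction_3b = [1, 1, 1, 0, 0, 0, 0, 1]  # Table 3b

table_3a = pooled_and_within(effort, satisfaction_3a, doctor)
table_3b = pooled_and_within(effort, satisfaction_3b, doctor)
print("Table 3a (pooled, within):", table_3a)  # (1.0, 0.0)
print("Table 3b (pooled, within):", table_3b)  # (2.0, 2.0)
```

In the Table 3a data the pooled slope is positive but the within-doctor slope is zero, because satisfaction does not vary across the patients of a given doctor; in the Table 3b data the association survives the within transformation.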
Table 5 - Patient Satisfaction at Home and at Facility

Variable | (1) | (2) | (3) | (4) | (5) | (6)
Estimation method | OLS | Probit | OLS | Fixed effects | OLS (Trimmed) | Fixed effects (Trimmed)
Dependent variable: Patient Satisfaction (all columns)
Facility sample dummy | 0.222 [7.84]a | 0.222 [7.59]a | 0.253 [9.18]a | 0.262 [9.07]a | 0.269 [8.57]a | 0.269 [8.74]a
Interaction between sample dummy and asset index | | | 0.018 [0.64] | 0.029 [1.03] | 0.000 [0.00] | 0.006 [0.19]
Interaction between sample dummy and health index | | | 0.041 [1.22] | 0.026 [0.94] | 0.04 [1.15] | 0.032 [1.01]
Patients
Asset index | | | -0.012 [0.55] | -0.022 [0.96] | -0.005 [0.21] | -0.013 [0.55]
Age of interviewed | | | 0.008 [1.94]c | 0.01 [2.55]b | 0.003 [0.71] | 0.005 [1.11]
Age squared | | | 0.00 [0.86] | 0.00 [1.64] | 0.00 [0.42] | 0.00 [0.30]
Male dummy | | | 0.021 [0.73] | 0.006 [0.22] | -0.011 [0.29] | -0.008 [0.20]
Household size | | | 0.00 [0.10] | 0.001 [0.26] | 0.001 [0.13] | 0.003 [0.54]
Distance to facility in minutes | | | -0.002 [2.28]b | -0.002 [1.84]c | -0.01 [2.16]b | -0.007 [1.74]a
Squared distance to facility in minutes | | | 0.00 [3.53]a | 0.00 [1.90]c | 0.00 [1.97]c | 0.00 [1.84]c
Excellent self reported health dummy | | | 0.264 [6.32]a | 0.228 [7.00]a | 0.282 [6.98]a | 0.233 [5.87]a
Third party paid for consultation dummy | | | -0.037 [0.62] | -0.031 [0.49] | -0.012 [2.09]b | -0.115 [1.45]
Primary or higher education dummy | | | -0.021 [0.61] | -0.025 [0.79] | 0.006 [0.14] | 0.00 [0.01]
Health index | | | -0.011 [0.43] | 0.015 [0.61] | -0.001 [0.02] | 0.018 [0.67]
Constant | 0.347 [13.88]a | | 0.119 [1.33] | 0.091 [1.04] | 0.222 [1.95]c | 0.18 [1.71]c
Observations | 1772 | 1772 | 1689 | 1689 | 1227 | 1227
R-squared | 0.04 | | 0.1 | 0.09 | 0.13 | 0.12
Number of facility fixed effects | | | | 64 | | 64

Notes: a, b and c denote significance at the 1%, 5% and 10% levels respectively. Reported coefficients in Probit regressions are marginal effects calculated at the mean of the sample; robust t statistics are presented in brackets. The asset index is based on the family's access to water, sanitation at home and ownership of 16 different assets.
The health index is a composite index based on 5 questions related to Activities of Daily Living (ADL), in which the patients were asked about their ability to lift heavy objects, carry bags home from the market, take the stairs, kneel down, and walk one kilometer. In Columns (5) and (6) observations are restricted to matched observations, defined as those with a likelihood of inclusion in the household survey between 0.1 and 0.9. This excludes the observations that are most unlikely to have a matching observation in the data based on patient characteristics.

Figure 1 - Doctor Effort Decreases with Patient-Order

[Figure: doctor effort (left axis) and density of observations (right axis) plotted against patient order, from 0 to 20. Series: relationship between doctor effort and order of patient; distribution of patients by the order seen by doctor.]

The non-parametric curve showing the relationship between doctors' effort and the order of patients is a smoothed value given by a locally weighted linear least squares regression. This allows us to show the relationship between doctor effort and patient order without assuming a particular functional form. Since this non-parametric plot is sensitive to the number of observations, the distribution of observations is overlaid and shown on the right axis.
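For readers unfamiliar with the smoother named in the figure note, the sketch below shows the core idea of a locally weighted linear fit: around each evaluation point, a weighted least-squares line is fitted in which nearby observations count more. The tricube kernel and the fixed bandwidth are assumptions made for this illustration; the paper does not report which kernel, bandwidth or software was used, and `local_linear_fit` is a hypothetical name.

```python
def local_linear_fit(x, y, x0, bandwidth):
    """Value at x0 of a tricube-weighted least-squares line fitted around x0."""
    weights = []
    for xi in x:
        u = abs(xi - x0) / bandwidth
        # Tricube kernel: weight falls smoothly to zero at the bandwidth.
        weights.append((1 - u**3) ** 3 if u < 1 else 0.0)

    total = sum(weights)
    if total == 0:
        raise ValueError("no observations within the bandwidth of x0")

    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        return mean_y  # all weight on a single x value: fall back to the mean
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    return mean_y + (sxy / sxx) * (x0 - mean_x)


# Hypothetical check: a local linear smoother reproduces an exactly linear
# relationship, whatever the bandwidth.
order = list(range(1, 21))
effort = [2 * o + 1 for o in order]
smoothed = [local_linear_fit(order, effort, o, bandwidth=5.0) for o in order]
print(smoothed[0], smoothed[9], smoothed[-1])
```

Because each fitted value uses only the observations near it, the curve becomes noisier where observations are sparse, which is why the figure overlays the distribution of observations by patient order.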
Table 6 - Relationship between Patient Order, Doctor Effort and Patient Satisfaction

Variable | (1) | (2) | (3) | (4) | (5) | (6)
Estimation method | OLS | OLS | Fixed Effect | OLS | OLS | Fixed Effect
Dependent variable | Doctor Effort | Doctor Effort | Doctor Effort | Patient Satisfaction | Patient Satisfaction | Patient Satisfaction
Order of patients | -0.053 [2.86]a | -0.05 [3.05]a | -0.035 [2.92]a | -0.01 [1.24] | -0.007 [0.98] | -0.004 [0.56]
Order of patients squared | 0.002 [1.76]c | 0.002 [1.88]c | 0.001 [1.69]c | 0.001 [1.33] | 0.001 [1.25] | 0.00 [0.97]
Asset index | | -0.063 [2.27]b | -0.029 [1.53] | | -0.001 [0.10] | 0.01 [0.79]
Age of interviewed | | 0.001 [0.11] | 0.003 [0.66] | | 0.006 [1.81]c | 0.01 [2.90]a
Age of interviewed squared | | 0.00 [0.16] | 0.00 [0.59] | | 0.00 [0.52] | 0.00 [1.42]
Male dummy | | -0.075 [1.27] | -0.011 [0.28] | | 0.042 [1.59] | 0.048 [1.90]c
Primary or higher education dummy | | -0.024 [0.31] | 0.004 [0.09] | | -0.021 [0.68] | -0.026 [0.85]
Distance to facility in minutes | | 0.00 [0.34] | 0.00 [0.38] | | 0.00 [0.10] | 0.00 [0.17]
Squared distance to facility in minutes | | 0.00 [0.16] | 0.00 [0.12] | | 0.00 [1.52] | 0.00 [0.99]
Excellent self reported health dummy | | 0.046 [0.72] | -0.038 [0.83] | | 0.234 [8.28]a | 0.149 [5.15]a
Health index | | 0.028 [1.04] | 0.033 [1.72]c | | 0.003 [0.22] | 0.017 [1.37]
Somebody else paid for consultation dummy | | -0.295 [2.80]a | -0.022 [0.15] | | -0.07 [1.79]c | -0.2 [2.06]b
Referred patient | | -0.057 [0.54] | 0.086 [1.19] | | 0.043 [1.03] | -0.041 [0.88]
Constant | 0.217 [2.61]b | 0.135 [0.79] | -0.043 [0.33] | 0.613 [20.47]a | 0.379 [4.52]a | 0.348 [4.19]a
Observations | 2271 | 2172 | 2172 | 2271 | 2172 | 2172
R-squared | 0.01 | 0.07 | 0.05 | 0.00 | 0.06 | 0.05
Number of doctor fixed effects | | | 291 | | | 291

Notes: a, b and c denote significance at the 1%, 5% and 10% levels respectively; robust t statistics are presented in brackets. The index of physicians' effort is a composite index of time spent with patients, number of examinations and total number of questions asked. The effort index is normalized.
The health index is a composite index based on 5 questions related to Activities of Daily Living (ADL), in which the patients were asked about their ability to lift heavy objects, carry bags home from the market, take the stairs, kneel down, and walk one kilometer. The asset index is based on the family's access to water, sanitation at home and ownership of 16 different assets.

Appendix Tables

Appendix Table 1 - Difference in Sample

Variable | (1) | (2) | (3) | (4) | (6)
Estimation method | Probit | OLS | Probit | OLS | Fixed Effect
Dependent variable: Indicator variable for whether the patient was observed in the facility (all columns)
Asset index | 0.001 [0.05] | 0.001 [0.06] | 0.036 [1.99]c | 0.03 [1.71]c | -0.009 [0.71]
Health index | 0.071 [4.11]a | 0.065 [4.32]a | 0.05 [3.11]a | 0.048 [3.47]a | 0.033 [2.77]a
Age of interviewed | -0.007 [4.97]a | -0.006 [4.99]a | -0.005 [3.76]a | -0.005 [4.26]a | -0.003 [4.20]a
Primary education or higher dummy | -0.039 [1.00] | -0.035 [0.94] | -0.004 [0.12] | -0.015 [0.46] | -0.026 [0.93]
Male dummy | 0.25 [7.53]a | 0.242 [7.75]a | 0.213 [6.87]a | 0.213 [8.18]a | 0.206 [8.82]a
Household size | 0.014 [2.49]b | 0.011 [2.32]b | 0.009 [1.70]c | 0.009 [1.95]c | 0.008 [1.81]
Excellent self reported health status | 0.053 [1.33] | 0.047 [1.30] | 0.05 [1.20] | 0.043 [1.20] | 0.054 [1.89]c
Distance to facility in minutes | | | 0.02 [7.13]a | 0.011 [11.24]a | 0.01 [11.87]a
Squared distance to facility in minutes | | | 0.000 [6.83]a | 0.000 [6.45]a | 0.000 [8.01]a
Constant | | 0.778 [9.64]a | | 0.554 [6.89]a | 0.54 [10.38]a
Observations | 1712 | 1712 | 1689 | 1689 | 1689
R-squared | | 0.09 | | 0.20 | 0.16
Number of facilities | | | | | 64

Notes: This table compares the attributes of patients observed in the facility and those who had visited the facility in the week prior to the observation (and were therefore interviewed at home). a, b and c denote significance at the 1%, 5% and 10% levels respectively. Reported coefficients in Probit regressions are marginal effects calculated at the mean of the sample; robust t statistics are presented in brackets.
The health index is a composite index based on 5 questions related to Activities of Daily Living (ADL), in which the patients were asked about their ability to lift heavy objects, carry bags home from the market, take the stairs, kneel down, and walk one kilometer. The asset index is based on the family's access to water, sanitation at home and ownership of 16 different assets.
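The trimming used for the common-support columns of Table 5 (fit a logistic model of sample membership on patient characteristics, then drop observations with a fitted probability below 0.1 or above 0.9) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single made-up covariate, a plain gradient-ascent fit written out by hand, and hypothetical function names, whereas the paper uses the full set of observed patient characteristics.

```python
import math


def fit_logit(xs, ys, steps=3000, learning_rate=0.1):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            grad0 += y - p
            grad1 += (y - p) * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


def trim_to_common_support(scores, lower=0.1, upper=0.9):
    """Indices of observations whose fitted probability lies in [lower, upper]."""
    return [i for i, p in enumerate(scores) if lower <= p <= upper]


# Hypothetical data: one standardized patient characteristic and an indicator
# for belonging to the household sample.
covariate = [-3, -3, -2, -1, -1, 0, 0, 1, 1, 2, 3, 3]
in_household_sample = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logit(covariate, in_household_sample)
scores = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in covariate]
kept = trim_to_common_support(scores)
print("kept observations:", kept)
```

Observations at the extremes of the covariate, which are almost certain to belong to one sample only, are the ones dropped; the estimation is then repeated on the retained observations.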