WPS 1357
POLICY RESEARCH WORKING PAPER 1357

Does Participation Improve Project Performance? Establishing Causality with Subjective Data

Jonathan Isham
Deepa Narayan
Lant Pritchett

Participation by the intended beneficiaries caused improved project performance in the rural water supply projects studied.

Background paper for World Development Report 1994

The World Bank
Office of the Vice President
Development Economics
September 1994

Summary findings

Development practitioners are coming to a consensus that participation by the intended beneficiaries improves project performance. But is there convincing evidence that this is true? Skeptics have three objections:

* "Participation" is not objective: project rankings are subjective.
* This subjectivity leads to "halo effects."
* Better project performance may have increased beneficiary participation rather than the other way around; a statistical association is not proof of cause and effect.

Isham, Narayan, and Pritchett show methodologically how to answer each of these objections. Subjectivity does not preclude reliable cardinal measurement. Halo effects do not appear to induce a strong upward bias in estimating the effect of participation. Finally, instrumental variables estimation can help establish a structural cause and effect relationship between participation and project performance, at least in the rural water supply projects they studied.

This paper, a product of the Office of the Vice President, Development Economics, is one in a series of background papers prepared for World Development Report 1994 on infrastructure. Copies of this paper are available free from the World Bank, 1818 H Street NW, Washington DC 20433. Please contact Michael Geller, room t7-079, extension 31393 (38 pages). September 1994.
Does Participation Improve Performance?: Empirical Evidence from Project Data

Jonathan Isham
Deepa Narayan
Lant Pritchett

Introduction

Development practitioners are coming to a consensus that participation by the intended beneficiaries improves project performance. Championed since the early 1970s by mostly non-economic social scientists and grassroots organizations (e.g. Freire 1973, Korten 1980), participatory development is increasingly advocated by the largest and most influential aid agencies (UNDP 1993, World Bank 1991). However, the existence of consensus (or advocacy) does not imply the existence of clear and convincing evidence.

Participation advocates have most often relied on case studies to document the link between participation and performance (e.g. Briscoe and Ferranti 1988, Korten and Siy 1989). However, these case studies are easily dismissed by skeptics as inconclusive, since the small number of cases and the informal method do not allow formal testing of the findings. In response, some studies used the systematic case study method to establish statistically the relationship between participation and project performance (Esman and Uphoff 1984, Finsterbusch and Van Wicklin III 1987).

Skeptics have raised three objections to the evidence for a causal impact of participation on project performance from this type of study. First, "participation" is not objective: hence project rankings are subjective and not appropriate for statistical analysis.
Second, the subjectivity in the ranking of projects will lead to "halo effects": if investigators believe participation is good, their subjective rankings will overstate the level of participation in highly successful projects and the level of success of highly participatory projects. Third, better project performance may cause increased beneficiary participation rather than vice versa: a mere statistical association is not evidence of a causal impact of participation on performance.

After a brief review of the construction of the data on project performance and beneficiary participation from 121 rural water supply projects and a presentation of the basic statistical results, we address, and overcome, each of these three objections.

The subjective nature of the data does not preclude inter-subjectively valid, cardinal measures of participation appropriate for statistical analysis. There is no necessary connection between objectivity and quantification with cardinal numbers: cardinal measurement using subjective criteria is common. Moreover, the cardinal rankings for each project were created by two different coders and the degree of inter-coder agreement is very high. This indicates that inter-subjective reliability can be achieved even for intrinsically subjective concepts like "participation." Finally, we show that assuming cardinality or imposing linearity is not necessary to establish the basic result.

The "halo effect" in coding performance and participation data from project documents is addressed in two ways. First, we show the results are the same if the first coder's performance indicators are regressed on the second coder's participation scores (and vice versa), which indicates a lack of a subjective halo effect by the coders. However, the primary danger of subjective measurement, and one that we cannot address, is that the same individual who assessed project success also assessed participation.
If that person has strong views about the relationship, this may induce a bias in the project documents themselves, as the performance of participatory projects (and the participation in successful projects) may be exaggerated. We do, however, show that the strength of the performance-participation relationship does not depend on the "objectiveness" of the success indicator.

The third objection, and the most difficult to answer, is that the existence of an association does not imply causation. While causality is nearly impossible to establish, we present three arguments. First, we use instrumental variables estimation, which allows the identification of the impact of exogenous changes in participation, eliminating the effect of reverse causation or simultaneity on the participation estimates. Second, data on the timing of participation show that participation at early stages improves project performance at every stage, from implementation to maintenance. Third, we describe case studies in which exogenous changes in participation in ongoing projects had strong impacts on performance.

Even for this limited set of projects, this paper is not intended as a comprehensive account. We focus on a narrow set of econometric issues involved in drawing inferences from project data. Narayan (1994), drawing on these data and more, discusses in more detail the relationship between performance and participation, the determinants of project success besides participation, the determinants of participation, and the mechanisms whereby participation increases overall effectiveness.

I) Data on rural water projects and basic results

The systematic case review method (Finsterbusch 1990) is used to transform varied, qualitative evaluations of a set of related phenomena into data suitable for statistical evaluation. To illustrate this approach, consider a hypothetical study of effectiveness in higher education.
A researcher wants to use statistical techniques to determine the most important factors that contribute to undergraduate learning. She does not have the time or resources to design and implement a new survey, but does have access to myriad evaluations by other agencies: in our hypothetical example, evaluations of 100 American universities, drawn from six different education organizations. A set of variables to be tested is then defined (e.g., quality of the faculty, research facilities) and each is given a simple numerical scale. Two independent coders then read the different evaluations and, based on their subjective analysis of that information, code the level of each variable. Each coder generates this numeric score from the quantitative and qualitative analysis in the original report. The final set of data, usually the average of the two coders' evaluations, is then used for statistical analysis.

The statistical data on the 121 rural water supply projects in this study were assembled from project documents using the systematic case review method [1]. Ex-post project assessments by development agencies typically combine limited quantitative evaluations with subjective judgements of project performance. Each project document was read by two independent readers. From this information, these two readers coded specific project variables (e.g., overall success of project) onto a scale from 1 to 7, creating 144 distinct variables. Appendix Table 1.1 shows the list of coded variables (along with the basic summary statistics). These can be divided into four groups:

* project performance indicators (e.g. "overall project effectiveness", "percent of water system in good condition");
* measures of participation (e.g. "overall beneficiary participation", "participation in construction");
* background or project characteristics which determine project performance (e.g. "size of project", "availability of spare parts");
* background or project characteristics which determine participation (e.g. "extent to which agency made participation a goal", "consensus among users").

[1] Finsterbusch and Van Wicklin III (1987) used the systematic case review approach for their study of USAID projects. While Esman and Uphoff (1979) did not exactly follow this methodology, their basic approach was similar: converting independent project evaluations to a numerical scale.

The "participation" variable merits some discussion. The measure of participation was not simply a measure of whether potential beneficiaries were surveyed about their preferences. Participation was scored on a continuum, ranging from information sharing, through consultation and shared decision-making, to control over decision-making. Participation of beneficiaries was considered at three different stages of the project cycle: in project design, in construction, and in operation and maintenance.

1) Basic model and results

We begin with the most general indicators, overall project effectiveness (OPE) and overall beneficiary participation (OBP), and specify and estimate a simple linear relationship between performance and participation. The bivariate relationship between OPE and OBP is simply:

1)  $OPE = \beta \cdot OBP + \epsilon_1$

which is functionally equivalent to the simple correlation of the two variables [2]. The usefulness of this bivariate relationship is limited since other non-participation determinants of project performance are excluded. In expanding the model, it proves useful to divide non-participation determinants into two groups: those that are fully exogenous and not affected by participation (e.g. "availability of spare parts"), denoted by a matrix Z, and those which are potentially affected by participation (e.g. "responsiveness of managers"), denoted by the matrix W. The multivariate equations are then:

2)  $OPE = \beta \cdot OBP + \beta_z \cdot Z + \beta_w \cdot W + \epsilon$

3)  $W = \gamma \cdot OBP + \gamma_z \cdot Z + \gamma_x \cdot X$
Both Z's and W's are potential determinants of project performance; however, Z's represent variables that are not influenced by participation, while W's represent variables which may be determined (in part) by participation. As indicated in equation 3, W's may also be determined in part by the Z's and by some other set of variables, X's (the non-participation determinants of the W's).

[2] In the bivariate linear regression the regression coefficient is $\beta = \sigma_{xy} / \sigma_x^2$ (where $\sigma_{xy}$ is the covariance of y and x), whereas the correlation coefficient is $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$. So the correlation coefficient $\rho = \beta \cdot (\sigma_x / \sigma_y)$ is a simple rescaling of the bivariate linear regression coefficient.

In summary, these multivariate equations state that performance of a water project depends on: beneficiary participation; a set of inputs (Z) not related to participation (e.g., "adequacy of facilities", "availability of spare parts"); and a set of performance determinants (W) that may in turn be determined by participation (e.g., "responsiveness of managers", "appropriateness of technology") as well as by other inputs.

The distinction between the Z's and the W's is important for maintaining the distinction between the partial and the total impact of participation on project performance. In the multivariate regression (equation 2), the coefficient $\beta$ gives the direct impact of increasing participation, holding all included variables constant:

4)  $\left. \frac{\partial OPE}{\partial OBP} \right|_{Z,W} = \beta$

But participation may also influence performance indirectly: the total impact of changing participation is the sum of the direct and indirect impacts:

5)  $\frac{dOPE}{dOBP} = \frac{\partial OPE}{\partial OBP} + \frac{\partial OPE}{\partial W} \cdot \frac{\partial W}{\partial OBP}$

or, in this particular specification,

6)  $\frac{dOPE}{dOBP} = \beta + \beta_w \cdot \gamma$

Thus, the simple partial coefficient with all controls understates the total impact of participation, while the bivariate coefficient (which excludes the Z's and W's) overstates the impact to the extent these determinants and participation are positively correlated.
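The bounding argument can be illustrated with a small simulation. The sketch below is not the authors' estimation; it generates synthetic data from equations 2 and 3 with made-up parameter values (all numbers, the function names `ols` and `simulate_effects`, and the assumed positive correlation `rho` between Z and OBP are illustrative assumptions), treating Z, W and X as single variables rather than matrices.

```python
import numpy as np


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_effects(n=20000, beta=0.25, beta_z=0.4, beta_w=0.5,
                     gamma=0.6, gamma_z=0.3, rho=0.5, seed=0):
    """Simulate equations 2 and 3 and compare the three specifications.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    obp = rng.normal(size=n)
    # Z is not caused by participation but is positively correlated with it.
    z = rho * obp + np.sqrt(1 - rho**2) * rng.normal(size=n)
    # Equation 3: W depends on participation, Z, and an unrelated input X.
    w = gamma * obp + gamma_z * z + rng.normal(size=n)
    # Equation 2: performance depends on participation, Z and W.
    ope = beta * obp + beta_z * z + beta_w * w + rng.normal(size=n)
    return {
        "bivariate": ols(ope, obp)[1],     # omits Z and W
        "limited": ols(ope, obp, z)[1],    # controls for Z only
        "full": ols(ope, obp, z, w)[1],    # controls for Z and W
        "total": beta + beta_w * gamma,    # equation 6
    }


if __name__ == "__main__":
    for name, value in simulate_effects().items():
        print(f"{name:>9}: {value:.3f}")
```

Under these assumptions the full multivariate coefficient sits below the total effect of equation 6 and the bivariate coefficient sits above it. In this particular setup the regression with Z alone lands close to the total effect, because the remaining variation in W is unrelated to participation; that need not hold in real project data.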
For each regression we report the linear regression coefficient on OBP in the OPE regression for each of the three specifications. Table 1 presents three estimates of the linear association of overall beneficiary participation (OBP) with overall project effectiveness (OPE): bivariate; limited multivariate (with Z's); and full multivariate (with Z's and W's). In all cases the results are strongly statistically significant and empirically quite large. The t-statistics range from 10.6 in the bivariate case to 3.8 with the full multivariate controls. The estimated impact of participation on project effectiveness ranges from .62 for the bivariate case to .24 in the full multivariate case.

How are these coefficients to be interpreted? The expected impact of increasing participation from a low level (OBP=2) to a high level (OBP=6) is to improve project performance by between 1 and 2.5 points (on a seven point scale). A one standard deviation increase in participation (s.d.=1.7, Appendix table 1.1) is associated with an increase in performance of between .41 (full multivariate) and 1.05 (bivariate) points. The interesting, and intuitively appealing, results of all the regressions (limited and full multivariate) are reported in Appendix table 1.2 and are discussed in Narayan (1994).

The multivariate estimate is naturally lower than the bivariate estimate, because the bivariate regression excludes positively correlated non-participation performance determinants, as discussed above. Since the bivariate effect is an upward-biased estimate of the total effect and the full multivariate estimate of the partial effect is likely biased downward for the total effect, these create reasonable bounds for the total effect.

Table 1: Basic results for participation (OBP) with OPE (Overall Project Effectiveness) as the dependent variable

|                    | Bivariate | Limited multivariate (Z variables) | Full multivariate [a] (Z and W variables) |
|--------------------|-----------|------------------------------------|-------------------------------------------|
| Coefficient on OBP | 0.62      | 0.28                               | 0.24                                      |
| t-stat.            | 10.6      | 5.3                                | 3.8                                       |
| N                  | 121       | 77                                 | 68                                        |
| R2                 | 0.49      | 0.86                               | 0.89                                      |

Notes: a) Regression results for other variables reported in Appendix table 2.1.

We note here that the choice of the Z's and W's in this study was neither entirely straightforward nor rules-driven. However, all the results have proved robust to a number of variations of the model, and we feel that the choice of control variables is not of primary interest [4]. We were generous in our inclusion of potential performance factors, including 18 non-participation variables. The participation variable thus easily passes this "kitchen sink" torture test of throwing all plausible variables into the regression. The danger of inadequate controls for other determinants of project performance is not nearly as serious a problem as the three we discuss below [5].

[4] We do, after all, face the difficulty of 144 coded variables with only 121 projects, which means that mechanical procedures for selecting variables will lack degrees of freedom and are unlikely to be of much help. Moreover, many of the variables are clearly overlapping and likely to be collinear. After some experimentation, we based variable inclusion on three criteria: decent inter-coder reliability, prior judgements about the best of collinear sets of variables, and impact on the estimate of participation (we never dropped any variable which seriously affected the estimate of the participation coefficient). In none of the experiments were the results on participation substantially different from the full multivariate case reported in table 1.

[5] Heteroskedasticity, a typical econometric problem which receives a fair bit of attention (one suspects because it is easy to fix), deserves slight mention here. It is not a problem with the present results for two reasons.
First, the White heteroskedasticity-consistent standard errors are roughly the same as the OLS standard errors: the t-statistics with White are 11.4 vs. 10.6 (bivariate) and 5.33 vs. 5.25 (full multivariate). Second, in scoring variables, each coder recorded their subjective assessment of the reliability of the score assigned. When these reliability measures were used to weight observations, the results were roughly the same. And when, because of a programming mistake, we weighted the observations by giving more weight to the less reliable observations, the coefficient point estimates were still roughly the same. This is perversely reassuring: under the assumptions for consistency of OLS, weighted least squares estimation is also consistent for any set of weights, even the exactly wrong weights.

II) Subjective cardinal data

The first objection to studies, and results, of this kind is that the data generated by the systematic case review method are subjective. According to this skeptical view, subjective data are unreliable and/or ordinal and therefore inappropriate for statistical analysis. In this section we show our data are subjective, yet reliable and cardinal. First, we argue that the automatic association of subjective phenomena with ordinal data is incorrect. Second, the degree of inter-coder agreement on the scoring of the major variables reveals that the subjective measurement error, while present, was a minor source of variation. Third, using techniques appropriate for ordinal data does not dramatically change the results, and the constraints imposed by linear regression analysis are also not rejected by the data.

Subjective and objective, ordinal and cardinal

Economic theory presumes that objective phenomena (e.g. numbers of oranges, relative prices) have a natural cardinal metric (e.g. real numbers) whereas intrinsically subjective phenomena (e.g. consumer utility) allow only ordinal comparisons (e.g. better or worse), especially intersubjectively [6]. This distinction is critical. While both cardinal and ordinal data can rank phenomena, only with cardinal data can numbers be tabulated and values of the phenomena being measured be compared directly. Most common statistical techniques (like correlations or linear regression) cannot be applied to ordinal data [7].

[6] Ordinality stems directly from the basic theory of mapping a binary preference relation into a utility index. The restrictions imposed on the preference relation (complete, reflexive, transitive) imply that once a given utility index is derived, any monotone transform of that index equally well represents the same preferences.

[7] For instance, if x were an ordinal measure of participation, then estimating the linear model $y = \beta \cdot x$ could produce different results than estimating $y = \beta \cdot f(x)$, with $f()$ a monotone transform of x, even though x and f(x) would represent exactly the same information. Therefore any statistical procedure that relied on summing observations (or any comparison of the magnitude of the distance between observations) would be invalid for ordinal data.

However, the data used in this analysis, created by the systematic case review method, are subjective yet cardinal. Our data on rural water projects are doubly subjective: the original project evaluator subjectively assessed and described the amount of participation in each project; a coder later read the evaluator's report and subjectively assigned a level of participation to that project. If this process generated ordinal data, empirical analysis would be difficult.

But note that in everyday life we observe many events which generate subjective, cardinal data. Contests are the most obvious example. When hogs, figure skaters, or bodybuilders compete, judges assign cardinal scores to subjective criteria: "quality of coat" for hogs; "artistic impression" for figure skaters; and "muscle tone" for bodybuilders. Grades for academic papers are another familiar example: a professor's subjective evaluation of a humanities paper is given a cardinal score. In each case, these subjectively assigned scores are added, averaged, and tabulated in ways only appropriate to cardinal data. Of course, the judging and grading criteria are created to achieve inter-subjective consensus [8]. Thus, the question for this data set on characteristics of water projects is not whether the data are subjective, but whether the subjectively cardinal scores are reliable.

[8] This does mean that judging requires training to achieve this level of inter-subjective agreement. For instance, there are contests for the judges in which those training to be livestock contest judges are themselves judged on the degree to which their subjective judgements conform to those of established judges (Herren, 1984).

Inter-subjective agreement

Since project variables were scored from the same documents by two independent coders, the coherence of their separate scores illuminates the overall reliability of the variables. Table 2 presents two measures of the cross-coder agreement. The correlations for the two major variables (column 1) are strikingly high: over .9 [9]. The average absolute value of the difference in the scores (column 2) is quite small: most scores either agree or differ by just 1 point. For each of the two major variables, the coders disagreed by 2 or more points on only one project.

[9] The reliability was much less for some other variables in the data set.

Table 2: Cross coder reliability

|                                   | Correlation between coder A and B | Average absolute difference of scores (1 to 7 scale) |
|-----------------------------------|-----------------------------------|------------------------------------------------------|
| Overall project effectiveness     | .95                               | .36                                                  |
| Overall beneficiary participation | .92                               | .55                                                  |

This high degree of inter-subjective consensus has two important implications. It creates a prima facie case that the characteristics of the project could be reliably gauged from the project documents. The high correlation also places a relatively tight bound on the magnitude of measurement error: a correlation coefficient of .9 implies that the noise from measurement error is roughly 10 percent of the variance of the observed variable [10].

[10] That is, if two observations differ by only measurement error, then the correlation between the two observations is $\rho = \sigma_x^2 / \sqrt{(\sigma_x^2 + \sigma_A^2)(\sigma_x^2 + \sigma_B^2)}$, where $\sigma_x^2$ is the variance of the "true" variable and $\sigma_{A(B)}^2$ is the measurement error variance for coder A(B). If both coders' measurement error variances are equal, $\sigma_A^2 = \sigma_B^2 = \sigma^2$, a correlation of .9 implies the ratio of measurement error to true variance is about .1.

Testing linearity or cardinality

We examine the implications of cardinality in two ways. We treat the data as if they were ordinal, using dummy variables for each participation category. We also test linearity of the relationship, which imposes even stronger conditions than cardinality. Of course, these techniques do not "prove" cardinality of the data; but they both show that the basic results on beneficiary participation are not affected either by allowing for the possibility that the data are ordinal or by our functional form assumptions.

A first procedure treats both the performance and the participation data as ordinal. For both project performance and participation, a binary variable takes a value of 1 if the score is high (> 3.5) and 0 otherwise. This procedure is valid even if the data were ordinal; binary variables would be unaffected by monotone transforms [11]. The final column of table 3 reports the results of this linear probability regression [12]. The performance-participation effect remains evident with this transformation of the data.

[11] Although this procedure is valid if the data are ordinal, if the data are in fact cardinal this procedure is very inefficient, as it throws away all of the information about variation within each of the two performance categories.

[12] We could have used an estimator more efficient for binary dependent variables (such as logit or probit), but the linear probability model produces a consistent estimate, which is sufficient for our purposes.

A second approach argues that if the participation data were in fact ordinal, one would not expect to find a linear relationship between the two variables: the "true" underlying ordinal relationship between the variables would not be invariant with respect to arbitrary transformations (e.g., squaring) of the data. Table 3 presents one test of linearity: allowing for a slope shift depending on the value of participation. When participation is low (<= 3.5) the slope would be $\beta_1$, while when participation is high (> 3.5) the slope is $\beta_1 + \beta_2$. Although the estimates do suggest the incremental impact of participation is larger at higher levels (slope of .466 vs. .781), this difference is not statistically significant (a low t-statistic and a declining adjusted R-squared). Increases in participation have roughly the same impact along the range from low to high participation.

Table 3: Tests of linearity and functional form

|                             | Linear | Linear with kink | Binary variable for each OBP category | Binary variables (both OPE and OBP) [a] |
|-----------------------------|--------|------------------|---------------------------------------|-----------------------------------------|
| Participation coeff.        | .623   | .466             |                                       | .552                                    |
| (t-statistic)               | (10.6) | (3.007)          |                                       | (7.34)                                  |
| Change in slope (OBP > 3.5) |        | .315             |                                       |                                         |
| (t-statistic)               |        | (1.33)           |                                       |                                         |
| OBP <= 1.5                  |        |                  | 2.55                                  |                                         |
| 1.5 < OBP <= 2.5            |        |                  | 3.06                                  |                                         |
| 2.5 < OBP <= 3.5            |        |                  | 3.59                                  |                                         |
| 3.5 < OBP <= 4.5            |        |                  | 4.25                                  |                                         |
| 4.5 < OBP <= 5.5            |        |                  | 5.16                                  |                                         |
| 5.5 < OBP                   |        |                  | 5.74                                  |                                         |
| R2                          | .481   | .480             | .459                                  | .306 [b]                                |

Note: a) The binary model is estimated as a linear probability model.
b) The R-squared is not comparable between the linear regressions and the binary model.

A third technique treats participation as if it were ordinal while treating performance as cardinal. Each discrete level of the participation variable is entered into the performance equation as a dummy variable: D1 = 1 if OBP = 1, 0 otherwise; D2 = 1 if OBP = 2; etc. [13] This functional form imposes no a priori constraints on the effect of the independent variable. The results in table 3 show a strong participation effect (performance increases for each dummy variable) without any strong indication that this statistically unconstrained fit is tremendously superior to the imposition of linearity. The incremental impact from category to category (from the differences in the coefficients) ranges from .5 to .75, roughly equivalent to the overall linear slope of .623.

[13] Since the averages of the coders' responses were used, the numbers are not always integers, so ranges for the variables were specified to generate these dummy variables.

Thus, the subjective nature of the data per se appears to have no impact on the result: high inter-subjective reliability of the cardinal measures was achieved, and the results appear to be broadly consistent with a simple linear model.

III) Halo effects

A potentially more serious problem than the data's intrinsically subjective nature is that either the initial evaluator of the projects or the coders themselves succumbed to the plausible assumption that all good things go together: the "halo effect" [14]. The halo effect occurs when the measurement of one variable is affected by the observed state of the other variable. This systematic measurement error will induce an association between two subjectively measured variables even in the absence of any "true" relationship between the underlying variables.

In our study, the halo effect may occur at two stages.
The evaluators may have falsely attributed participation to successful projects (or vice versa), or the coders, reading the project documents searching for evidence of project participation, may have been affected by their simultaneous assessment of project success (despite their efforts to remain objective). The second is particularly dangerous: in this study the two coders knew the purpose of the empirical exercise and may have had some strong prior beliefs as to the expected outcome.

[14] This psychological tendency to assume that all good things go together has been discussed in a number of fields. In particular, there is a large literature in human resource management about the halo effect problem in performance assessments, in which outstanding performance in one dimension or characteristic (even a potentially irrelevant characteristic such as physical attractiveness) may tend to bias upwards the evaluation of other dimensions or characteristics. Hamermesh and Biddle (1993) find that plain people make about 5 percent less and attractive people 5 percent more than persons of average attractiveness. However, for a recent dissenting view on the importance of halo effects in performance evaluation, see Murphy, Jako, and Anhalt, 1993.

There is nothing we can do about the potential "halo effect" of the original evaluations. We do know that the project reports were regular parts of the institutional evaluation cycle and that it is doubtful that the financing agencies had a particular stake in promoting participation. It could also be expected that the many different individuals writing the project documents would have had widely different beliefs about the importance of participation, so that a uniform bias in the first hand assessments would be unlikely.

As for the potentially serious "halo effect" in the coding process, we explore three avenues to address the problem. Note that the results in table 2 are based on the average of the two coders' assessments.
First, we estimated the same models using only data from coder A and only data from coder B. Differences in these two assessments may reflect differences in the "halo effect" between the coders. Second, in the same models, we used coder A's assessment of the explanatory variables (including OBP) with coder B's assessment of the dependent variable (OPE). Since coder A's assessments of participation and the other potential determinants are not affected by coder B's performance assessment, this should reduce the halo effect bias (although the confounding effect of pure measurement error in the coders' assessments will be important). Third, the coders created some project performance indicators that, by their nature, are more objective than others. If they were present, halo effects would be more likely for the more subjective indicators.

Results by and across coders

Table 4 shows the regression results with the average values, only the values of coder A, and only the values of coder B (columns 1, 2, and 4 respectively). The differences in both the bivariate and multivariate results are very small. Table 4 also shows the corresponding results of the cross-coder tests: column 3 shows A's outputs on B's inputs; column 5 shows B's outputs on A's inputs. Again the results are substantially the same. In the bivariate and multivariate models the coefficient does not systematically change, whether we use the average of the coders' scores, each coder's own scores, or one coder's dependent variable scores on the other coder's independent variable scores.
Table 4: Basic results by average coder value, coders A and B respectively; regressing overall project effectiveness (OPE) on overall beneficiary participation (OBP)

| OPE on OBP        | Averages of A and B | A on A | A on B | B on B | B on A |
|-------------------|---------------------|--------|--------|--------|--------|
| Bivariate         | 0.62                | 0.60   | 0.62   | 0.60   | 0.57   |
| (t-statistic)     | (10.6)              | (10.1) | (10.3) | (9.7)  | (9.3)  |
| N                 | 121                 | 111    | 116    | 111    | 116    |
| R2                | 0.49                | 0.49   | 0.48   | 0.46   | 0.43   |
| Full multivariate | 0.24                | 0.23   | 0.26   | 0.21   | 0.25   |
| (t-statistic)     | (3.8)               | (2.6)  | (2.1)  | (2.0)  | (2.7)  |
| N                 | 68                  | 37     | 46     | 46     | 37     |
| R2                | 0.89                | 0.94   | 0.85   | 0.89   | 0.94   |

How reassuring are these cross-coder results? The following equations are helpful. Let's say that A's observation on project performance is the "truth" plus some random noise ($e$), plus an upward bias based on A's observation of participation:

7)  $OPE_A = OPE + \delta_A \cdot OBP_A + e$

Coder A's observation on participation is just the "truth" plus random error:

8)  $OBP_A = OBP + \eta_A$

In this case, the coefficient from regressing performance on participation will be biased by the "halo effect". If the true structural relationship were:

9)  $OPE = \beta \cdot OBP + \epsilon$

then, with errors that are independent of OBP and of each other, the estimated coefficient would converge to:

10)  $\operatorname{plim} \hat{\beta} = \lambda \cdot \beta + \delta_A, \quad \lambda = \sigma_{OBP}^2 / (\sigma_{OBP}^2 + \sigma_{\eta}^2)$

so that even if there were no structural relationship between the "true" variables ($\beta = 0$), the estimate of the participation effect could be spuriously positive due to halo effects.

Given this background, why does having another coder matter? If the observations, and scoring, of participation were completely objective, using a second coder's data would have no effect: A's and B's observations on participation would be identical ($OBP_A = OBP_B$) and, if the degree of halo effect was similar ($\delta_A \approx \delta_B$), the bias on performance would be equivalent. If the observations of participation are subjective, then the "halo effect" bias should be less using cross-coder data, because the pure subjective component of B's assessment does not affect the bias in the performance measure.
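The interaction between halo bias and measurement error in equations 7 to 9 can be explored with a short Monte Carlo sketch. This is an illustration in the spirit of the simulation design reported with Table 5, not the authors' code. The function names are hypothetical, and the reading of `k` as the ratio of measurement error variance to true variance is an assumption (it is the reading under which the averages come out near the values reported in Table 5).

```python
import numpy as np


def slope(y, x):
    """Bivariate OLS slope of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def cross_coder_slopes(delta, k, beta=0.5, n=120, reps=1000, seed=0):
    """Average slopes for 'A on A' and 'A on B' regressions.

    delta: strength of the halo effect (equation 7).
    k:     assumed ratio of measurement error variance to true variance.
    """
    rng = np.random.default_rng(seed)
    same, cross = [], []
    for _ in range(reps):
        x = rng.normal(size=n)                      # true participation
        y = beta * x + rng.normal(size=n)           # true performance (eq. 9)
        x_a = x + np.sqrt(k) * rng.normal(size=n)   # coder A's participation (eq. 8)
        x_b = x + np.sqrt(k) * rng.normal(size=n)   # coder B's participation
        y_a = y + delta * x_a                       # A's performance with halo (eq. 7)
        same.append(slope(y_a, x_a))                # A's outputs on A's inputs
        cross.append(slope(y_a, x_b))               # A's outputs on B's inputs
    return float(np.mean(same)), float(np.mean(cross))


if __name__ == "__main__":
    for delta in (0.0, 0.25, 0.5, 1.0):
        for k in (0.0, 0.25, 0.5):
            a_on_a, a_on_b = cross_coder_slopes(delta, k)
            print(f"halo={delta:4.2f} error={k:4.2f} "
                  f"A on A={a_on_a:.2f} A on B={a_on_b:.2f}")
```

Under these assumptions the same-coder slope is centered on $\beta/(1+k) + \delta$ and the cross-coder slope on $(\beta + \delta)/(1+k)$. Crossing coders therefore changes nothing when there is no measurement error, and lowers the estimate only when both a halo effect and measurement error are present.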
However, to the extent the performance is truly subjective this argues against the inter-subjective reliability above and induces downward bias in the estimates due to classical measurement error. Table 5 shows Monte Carlo simulations of the combined effects of the "halo effect" bias and of pure measurement error, using different assumptions about the relative strengths of the two effects. Unfortunately, these simulations show that both underestimation and overestimation are possible when coders' scores are crossed, depending on the ratio of the two effects. With large measurement error (the final columns), crossing the coder rankings should produce lower estimates than using the rankings of a single coder for all possible strengths of the "halo effect". For inputs and outputs with no measurement error (the first columns) and high degrees of the "halo effect", crossing the coder rankings does not help: it produces the same estimates (with a similar upward bias).

Evidence from the relatively high inter-subjective reliability (as well as the instrumental variable results below) suggests low but non-zero measurement error: the ratio of measurement error variance to true variance is from .1 to .25. In this range of measurement error (columns 4 and 5), one would expect a modest but significant change in estimates when crossing coders' OPE and OBP rankings if the "halo effect" was strong. The lack of a consistent downward movement of the estimates (in the multivariate case they actually rise) suggests at least that the "halo effect" is not dominating the results.

Table 5: Results of Monte Carlo simulation of the combined effects of measurement error and "halo effects" using cross coder information.
Degree of          Degree of measurement error
"halo effect"      0                          .25                  .5
                   A on A       A on B        A on A    A on B     A on A    A on B
(1)                (2)          (3)           (4)       (5)        (6)       (7)
0                  .5 (true)    .5 (true)     .40       .40        .33       .33
.25                .75          .75           .65       .60        .58       .50
.5                 1            1             .90       .80        .83       .67
1                  1.5          1.5           1.4       1.2        1.3       1.0

Note: The results are the average estimates from 1,000 replications of 120 observations each of the model $y = \beta x + \epsilon$, with $\beta = .5$ and $x, \epsilon \sim N(0,1)$. A(B)'s observations on the x variable are subject to measurement error of the form $x_{A(B)} = x + k \cdot \eta_{A(B)}$, where A(B) indicates that A and B have different random measurement error of proportion k. The observations on the dependent variable y are determined by $y_{A(B)} = y + \delta \cdot x_{A(B)}$, so that the measurement error of A or B influences the measurement of y by a common "halo" factor of $\delta$.

Results by performance indicator

A second technique to evaluate the halo effect is to examine the impact of participation on project performance indicators that vary in their objectivity. In addition to OPE, several more objective indicators of project success were coded, including "percent of water system in good condition" and "percent of target population reached." To the extent that these more objective phenomena are relatively less susceptible to halo overestimation, the possible halo effect bias should be reduced. If the true coefficient were equal in the two models (which is not clear; see below), the estimate for the more objective indicator should be systematically lower than the estimate for the upwardly-biased subjective indicator. Table 6 presents these results. There is no evidence that the more subjective indicators (such as OPE) have systematically larger estimated impacts.
Table 6: Impact of participation by various indicators of performance

                     Overall Project    Objective Value    % of Water System    % of Target
                     Effectiveness      of Benefits        in Good Condition    Population Reached
Bivariate
  β on OBP           0.62               0.53               0.54                 0.29
  t-statistic        10.6               10.3               6.4                  5.3
  N                  121                120                98                   118
  R-squared          0.49               0.47               0.29                 0.19
Full multivariate
  β on OBP           0.24               0.27               0.29                 0.25
  t-statistic        3.8                3.6                2.4                  2.5
  N                  68                 68                 60                   68
  R-squared          0.89               0.79               0.77                 0.47

IV) Joint determination and Causality

The prior two sections have answered possible skepticism about the strong statistical association between performance and participation. Another line of skepticism may accept the statistical association between participation and performance but deny that this association reveals cause and effect. In this skeptical view, the data do not show that greater participation causes better project performance, simply that the two happen to be related. Indeed, there are at least two good reasons that a performance-participation association may not be causal.

First, there could be "reverse causation": projects that are exogenously better might induce greater beneficiary participation. This is sensible especially when performance and participation depend on a sequence of actions. Once it is clear that a project is failing, each potential beneficiary may be less likely to participate because she perceives a relatively low benefit from participation, which is unlikely to alter the project outcome.

Second, joint determination of project success and beneficiary participation may be driven by a third local or project attribute. For example, if dynamic leaders induce both project performance and participation, performance and participation data will be strongly associated, even without an independent causal effect of participation on performance.
While we have tried to address this concern over spurious correlation with the "kitchen sink" inclusion of possible performance determinants, it would not take a very clever skeptic to name a large list of excluded variables (some unobservable even in principle) that could affect both performance and participation. We use three approaches to resolve the problems of reverse causation and joint association and to demonstrate a causal impact: instrumental variables estimation techniques, timing, and case studies.

Instrumental variables

One econometric solution to the problem of identifying a structural relationship is estimation with instrumental variables. Instrumental variables estimation avoids the problem of the joint determination of the independent and dependent variables by eliminating, in the estimation of the coefficients, that part of the variation in the independent variable which is due to variation in the dependent variable. The vehicle to eliminate that variation is a third variable (the instrument) which affects only the independent variable and not the dependent variable.

To apply instrumental variables estimation, we need a variable that does affect participation but which does not directly affect, nor is affected by, performance. This variable is used as our instrument to purge the participation variable of any performance-related component. When the participation effect is estimated using only the part of participation variation that is correlated with the variation in the instrument, the resulting estimate is free of reverse causation: since better performance does not affect the instrument, the reverse effect of better performance on participation is eliminated and cannot bias the results. Expressed in equations, if the model is:

11) $OPE_i = \beta \cdot OBP_i + \delta_Z \cdot Z_i + \delta_W \cdot W_i + \delta_V \cdot V_i + \epsilon_i$

12) $OBP_i = \alpha_Z \cdot Z_i + \alpha_W \cdot W_i + \alpha_V \cdot V_i + \eta_i$

All V's which are included in the participation equation ($\alpha_V \neq 0$) but excluded from the performance equation ($\delta_V = 0$) are legitimate instruments.
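The way an instrument removes reverse causation can be illustrated on simulated data. The sketch below is an illustration under stated assumptions, not the estimation used in the paper: the helper names (`ols`, `two_stage_least_squares`, `simulate_reverse_causation`) and the parameter values are hypothetical, there are no Z or W controls, and performance is allowed to feed back on participation through an assumed coefficient `gamma`. A single variable `v` plays the role of V in equations 11 and 12: it enters the participation equation but not the performance equation.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(y, endog, instruments):
    """2SLS slope on one endogenous regressor, with a constant in both stages."""
    const = np.ones(len(y))
    first_stage_X = np.column_stack([const, instruments])
    # First stage: keep only the variation in the regressor explained by the instruments.
    fitted = first_stage_X @ ols(endog, first_stage_X)
    # Second stage: regress the outcome on the fitted values.
    second_stage_X = np.column_stack([const, fitted])
    return float(ols(y, second_stage_X)[1])


def simulate_reverse_causation(n=5000, beta=0.3, gamma=0.5, alpha=1.0, seed=0):
    """Return (OLS slope, IV slope) when performance also raises participation."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)      # instrument: affects participation only
    eps = rng.standard_normal(n)    # performance shock
    eta = rng.standard_normal(n)    # participation shock
    # Solve the simultaneous system
    #   perf = beta * part + eps
    #   part = gamma * perf + alpha * v + eta
    part = (gamma * eps + alpha * v + eta) / (1.0 - beta * gamma)
    perf = beta * part + eps
    beta_ols = float(ols(perf, np.column_stack([np.ones(n), part]))[1])
    beta_iv = two_stage_least_squares(perf, part, v)
    return beta_ols, beta_iv


if __name__ == "__main__":
    b_ols, b_iv = simulate_reverse_causation()
    print(f"OLS: {b_ols:.2f}  IV: {b_iv:.2f}  (true beta = 0.30)")
```

In this simulated setting the OLS slope overstates the structural coefficient because the performance shock feeds back into participation, while the instrumented slope stays close to the true value. That is the sense in which reverse causation alone would make IV estimates fall relative to OLS.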
The V's provide a source of variation in participation that is exogenous to performance. On the other hand, neither the Z's nor the W's are valid instruments: they directly affect both participation and performance.

To choose appropriate instruments, we need a positive model for participation. Hypotheses based upon the larger study of these water projects (Narayan 1994), as well as the theoretical literature on the determinants of participation, generated a set of equations to estimate the effects of participation (Appendix table 1.3 shows the full "first stage" regressions). We identify four variables as legitimate instruments: "extent to which participation was a project goal"; "percent of investment costs borne by users"; "beneficiaries' overall net benefits of participation"; and "extent to which organization is based on local collectives." We hypothesize that each of these phenomena may directly affect participation, but should have no independent, direct effect on project performance after controlling for participation.

In table 7, the OLS and IV results are compared for both the bivariate and the limited multivariate case [15]. The estimated impact of participation increases with IV estimates. For instance, when "extent to which participation was a project goal" is used as an instrument, the bivariate impact rises from .63 to .70; the multivariate rises from .28 to .34. Similar results, with the IV producing a higher and statistically significant estimate, are observed for each of the instruments used singly [16]. When all instruments are used together the impact rises to .86 in the bivariate case and to .37 in the limited multivariate case.

[15] The full multivariate case loses too many degrees of freedom. So while the results are empirically similar they are less precise.

What do these IV estimates tell us?
The basic statistical relationship (high correlation between participation and performance) would also occur if better project performance caused greater participation: as clean water is delivered in the early stages of a project, more potential beneficiaries may want to get involved. But if this were the causal story, the IV technique would cause the estimates to fall by removing this upward simultaneity bias. The rising coefficients reported in Table 7 are consistent with causality running from higher participation to better project performance in the presence of some measurement error.

The IV results allow us to compute an independent estimate of the magnitude of pure measurement error. Even if the OLS estimator is inconsistent because of measurement error, the IV estimator is consistent, and the ratio of the two estimates converges to

$\operatorname{plim}\left(\hat{\beta}_{OLS} / \hat{\beta}_{IV}\right) = \frac{\sigma^2_{x^*}}{\sigma^2_{x^*} + \sigma^2_{\eta}}$

where $\sigma^2_{x^*}$ is the variance of true participation and $\sigma^2_{\eta}$ is the variance of the measurement error. With the reported estimates, this ratio is between .8 and .9 (e.g. .63/.70 (bivariate) or .28/.36). The ratio of measurement error to true variance, $\sigma^2_{\eta} / \sigma^2_{x^*}$, is between .11 and .25: this is consistent with (although somewhat higher than) the estimates of measurement error from cross coder reliability correlations in table 2 above.

[16] Except for the multivariate case when "organization based on local collectives" is used, where the coefficient drops to .15 and statistical insignificance (likely due to the low power of the instrument).

In using this technique, one would like to test whether the assumptions made in obtaining the IV results are valid. Note that since one variable (beneficiary participation) may be endogenous, at least one instrument must be used to identify the model. But if one instrument can be unambiguously accepted (that is, if without argument it directly affects only the independent variable), then the validity of any other instruments can be tested. Indeed, the entire set of instruments can be tested [17].
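The measurement-error calculation described above is simple arithmetic, and a short Python sketch makes the two steps explicit. The function names are hypothetical; the only inputs are the OLS and IV coefficients, and the formulas assume the classical errors-in-variables setting in which the OLS/IV ratio estimates the share of true variance in observed variance.

```python
def reliability_ratio(beta_ols: float, beta_iv: float) -> float:
    """Ratio of OLS to IV estimates: an estimate of var(true) / (var(true) + var(error))."""
    return beta_ols / beta_iv


def noise_to_signal(reliability: float) -> float:
    """Implied ratio of measurement-error variance to true variance."""
    return 1.0 / reliability - 1.0


if __name__ == "__main__":
    r = reliability_ratio(0.63, 0.70)   # bivariate OLS and IV estimates
    print(f"reliability ratio: {r:.2f}, noise-to-signal: {noise_to_signal(r):.2f}")
    for bound in (0.8, 0.9):            # the range of ratios cited in the text
        print(f"reliability {bound}: noise-to-signal {noise_to_signal(bound):.2f}")
```

A ratio of .9 maps to a noise-to-signal ratio of about .11 and a ratio of .8 maps to .25, which is how the range quoted in the text is obtained.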
[17] Heuristically the problem is that we need to test the exclusion of the instrument from the performance equation. However, one cannot test the exclusion directly (say by a t-test of the inclusion of the instrument) because in the presence of endogeneity the coefficient on the potentially endogenous variable is inconsistent when not instrumented and hence the t-test on the instrument would be biased. As a simple example, say the model is $y = \beta x + \delta z + \epsilon$ and x is endogenous. Say there is a single potential instrument z, with $x = \pi z + \eta$. Then z is a valid instrument only if $\delta = 0$. However, this hypothesis can only be tested if there is a consistent estimate for $\beta$. But if z is used as an instrument for x, then the "instrumented" x is perfectly collinear with z (since the instrumented x is just x projected onto z). One therefore cannot use z both to recover a consistent estimate of $\beta$ and to estimate $\delta$ to test the exclusion restriction, because using only z the two cannot be identified separately. Therefore sufficient "exclusion restrictions" (such as $\delta = 0$) must be imposed a priori and the "just identifying" assumptions cannot be tested.

Table 7: Instrumental variables estimates of the participation-performance relationship, using various instrument sets

Estimation technique / instrument set:
(1) OLS
(2) "Extent part. a goal"
(3) "% of investment costs by users"
(4) "Net benefits of participation"
(5) "Organization based on local collectives"
(6) "Prior commitment of clients"
(7) All

                              (1)       (2)       (3)       (4)       (5)        (6)       (7)
Bivariate
  Coeff                       .63       .70       .59       .77       .74        .97       .86
  (t stat)                    (10.6)    (10.2)    (7.3)     (10.6)    (6.3)      (7.54)    (10.4)
  N                           120       120       113       120       98 (b)     105       90
  R-squared                   .488      .482      .476      .453      .507 (b)   .378      .521
  First stage R-squared (a)   -         .763      .573      .701      .364       .326      .816
Limited multivariate
  Coeff                       .28       .34       .32       .36       .15        .39       .37
  (t stat)                    (5.25)    (5.2)     (3.6)     (5.4)     (1.28)     (3.00)    (3.57)
  N                           77        77        75        77        66         72        63
  R-squared                   .862      .860      .861      .858      .855       .863      .865
  First stage R-squared (a)   [.401] (c) .826     .643      .803      .719       .559      .857

Notes:
a) Unadjusted R-squared of the "first stage" regression of participation on the instruments (which in the multivariate case includes all variables in the performance equation).
b) Since the sample sizes are not the same the results are not strictly comparable in all columns. In particular the IV R-squared are less than OLS R-squared when run for the same sample.
c) This is the R-squared of participation regressed on all the Z variables which are included in the performance equation. The increment to the R-squared for each instrument can be calculated as the difference with this column.

We believe that "extent to which participation was a goal" is the most plausibly exogenous variable among the individual instruments, as there is no reason to believe that participation as a goal should by itself lead to better performance, except insofar as it actually raised participation. When each of the other instruments is tested, conditional on the validity of this variable (using a Hausman-Taylor test), we fail to reject the exogeneity of the other instruments in every case. When the entire set of instruments is tested, we do not reject the validity of the instrument set in either the bivariate or multivariate case [18]. Our set of instruments does stand up to the available tests for instrument validity [19].
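The overidentification statistics reported in the accompanying footnote (7.03 and 5.24) can be converted to significance levels with a few lines of Python. The sketch below rests on an assumption the text does not state: that the statistic is chi-square distributed with 4 degrees of freedom (the five instruments listed in Table 7 less one endogenous variable). The helper name `chi2_sf_even_df` is hypothetical; it uses the closed-form upper tail of the chi-square distribution, which is available for even degrees of freedom.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with an even number of df.

    For df = 2m, P(X > stat) = exp(-stat/2) * sum_{j=0}^{m-1} (stat/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive, even df")
    half = stat / 2.0
    term, total = 1.0, 1.0          # j = 0 term
    for j in range(1, df // 2):
        term *= half / j            # builds (stat/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # df = 4 is an assumption: five instruments minus one endogenous regressor.
    for label, stat in (("bivariate", 7.03), ("multivariate", 5.24)):
        print(f"{label}: statistic {stat}, p-value {chi2_sf_even_df(stat, 4):.3f}")
```

With 4 degrees of freedom the computed p-values land close to the reported significance levels of .133 and .263, which supports reading the test as one with four overidentifying restrictions.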
[18] The value of the Sargan test with the full set of instruments is 7.03 (significance level .133) in the bivariate and 5.24 (significance level .263) in the multivariate estimates.

[19] Of course, the major objection to these tests is that they tend to be of very low statistical power (that is, these tests will often fail to reject a hypothesis that is false). Therefore a "failure to reject" the instruments cannot be taken as compelling evidence to accept the instruments.

Timing

Evidence on causality can also be observed from the timing of the project cycle. If the association between participation and project performance were not causal, we would see no association between events that occur before project completion (the proximate determinants of project performance) and beneficiary participation. Table 8 reports the impact of participation on quality of implementation, effectiveness of operations and maintenance (O&M), and maintenance after 1 year.

Table 8: Impact of beneficiary participation on the proximate determinants of project performance

                              Bivariate    Limited          Full
                                           multivariate     multivariate
Quality of implementation     0.53         0.17             0.21
                              (9.3)        (2.7)            (2.7)
Effectiveness of O&M          0.49         0.14             0.11
                              (7.4)        (2.0)            (1.1)
Maintenance after 1 year      0.43         0.16             0.18
                              (6.6)        (2.0)            (1.8)

We find that in all but one (multivariate) case, beneficiary participation is a statistically significant input to these proximate determinants. If project effectiveness were causing participation rather than vice versa, we would not expect to see this result.

Case Studies

Studies of individual cases help to further resolve questions of causality, particularly when exogenous shifts in participation change project outcomes. Narayan (1994) documents two such case studies. Phase I of the Aguthi Rural Water Supply Project in Kenya was implemented without community participation.
The project was so plagued with problems (construction delays, cost over-runs and disagreements over consumer payment methods) that it came to a standstill. At this point, the project was redesigned. The Aguthi Water Committee, working with local leaders and project staff, mobilized the community: after public stakeholder conferences, community members organized and began contributing to the project. Phase II of the project was completed on schedule and within budget; the communities continued to pay monthly tariffs for the new water service, and operations and maintenance of the system were handled successfully, in cooperation with the government parastatal.

The WAS (Wanita, Air dan Sanitasi) program in Indonesia assisted community groups to launch and manage their own water systems. A water group in the village of Silla was formed in 1986 as WAS began. Initially, the members relied heavily on the arrival of a government team to dig a bore hole, but none came. When they realized that they could not rely on immediate government assistance, the members increased their participation. They negotiated water rights with a neighboring group, collected building material, and built three water tanks with only a small amount of outside assistance. By 1988 a new well was under construction, financed by their own contributions. Eggplant and chilies, watered from the new tanks, were flourishing in people's yards.

Conclusion

We began by showing the existence in project level data of a strong association between project performance and beneficiary participation. We then addressed and answered the three econometric objections to these results. The subjectivity of the data is not an overwhelming problem. The "halo effect" does not appear to induce a strong upward bias. Most importantly, there are strong arguments that the participation and project performance relationship is cause and effect.
This paper, especially together with the more comprehensive work of Narayan (1994), does provide development practitioners, including early and recent converts to the participatory approach, with strong statistical findings that increasing participation directly causes better project performance.

Three questions which are important for practice and policy are not explored here. First, does participation directly cause better project performance across all sectors? One cannot blindly extrapolate the results in this study across all sectors, since these data are only from rural water supply projects. The economic characteristics of rural water as a good would seem to promote the importance of direct beneficiary participation; these economic characteristics vary across goods provided by projects in other sectors.

Second, what policy instruments help to achieve more effective participation? The behavior of project beneficiaries, staff in project agencies, and other suppliers responds to incentives, but there is little documented experience on creating incentives in public sector agencies for promoting and incorporating participation.

Finally, can experiences with participation help to clarify the analysis of the deficiencies inherent in either a purely individualistic "market" or a purely statist "government" approach to development? An analytic approach that incorporates participation might examine the various mechanisms whereby cooperative action by groups can overcome the inefficiency of individualistic solutions (e.g., from "free riding" or strategic (mis)revelation of private information) while avoiding the limitations of centralized government. These "informal" methods of cooperation have been explored by a number of authors (Ostrom, Schroeder and Wynne 1993, De Soto 1989, Wade 1988) but much remains to be learned.

Bibliography

Briscoe, John, and David de Ferranti, 1988. Water for Rural Communities: Helping People Help Themselves. The World Bank, Washington, DC.
De Soto, Hernando, 1989. The Other Path. New York: Harper and Row.

Esman, Milton J., and Norman T. Uphoff, 1984. Local Organizations: Intermediaries in Rural Development. Ithaca, NY: Cornell University Press.

Finsterbusch, Kurt, 1990. "Studying Success Factors in Multiple Cases Using Low Cost Methods." University of Maryland, Department of Sociology, College Park, MD.

Finsterbusch, Kurt, and Warren Van Wicklin III, 1987. "The contribution of beneficiary participation to development project effectiveness," Public Administration and Development, Vol. 7, pp. 1-23.

Freire, Paulo, 1973. Education for Critical Consciousness. New York: Seabury Press.

Gerson, Philip, 1993. "Popular Participation in Economic Theory and Practice," HRO Working Paper No. 18, World Bank, Washington, DC.

Hamermesh, Daniel S., and Jeff E. Biddle, 1993. "Beauty and the Labor Market," NBER Working Paper 4518. Cambridge, MA.

Herren, R., 1984. "Factors associated with the success of participants in the National Future Farmers of America livestock judging contest," Journal of American Association of Teaching and Education in Agriculture, Vol. 25, No. 1, pp. 12-19.

Korten, David C., 1980. "Community Organization and Rural Development: A Learning Process Approach," Public Administration Review, September-October 1980, pp. 480-510.

Korten, Frances F., and Robert Siy, Jr., 1989. Transforming Bureaucracy: The Experience of the Philippine National Irrigation Administration. Kumarian Press.

Murphy, Kevin R., Robert A. Jako, and Rebecca Anhalt, 1993. "Nature and consequences of halo error: A critical analysis," Journal of Applied Psychology, Vol. 78, No. 2, pp. 218-225.

Narayan, Deepa, 1994. "Contribution of People's Participation: Evidence from 121 Rural Water Supply Projects," ESD Occasional Paper Series, No. 1, World Bank.

Ostrom, Elinor, Larry Schroeder, and Susan Wynne, 1993. Institutional Incentives and Sustainable Development: Infrastructure Policies in Perspective. Boulder: Westview.
United Nations Development Programme, 1993. The Human Development Report 1993. The United Nations, New York.

Wade, Robert, 1988. Village Republics: Economic Conditions for Collective Action in South India. Cambridge: Cambridge University Press.

World Bank, 1991. World Development Report 1991. Washington, DC: Oxford University Press.

Appendix Table 1.1:

Variable   Label                                          N      Mean     Std. Dev.

Performance Indicators
V24        Overall Project Effectiveness                  121    4.09     1.6
V90        Percentage of Water System in Good Condition   98     4.8      1.8
V44        Objective Value of Benefits                    120    4.2      1.3
V33        Percentage of Target Population Reached        118    4.9      1.1

Participation Variable
V105       Overall Participation (OBP)                    121    3.7      1.7

Fully exogenous performance determinants (Z)
V1         GNP/Capita                                     114    519.8    389.3
V5         Project Complexity                             121    3.3      1.2
V47        Total Cost (LN)                                104    15.4     1.5
V126       Adequacy of Facilities                         121    4.5      1.3
V127       Difficulties in Staff Recruiting               92     3.8      1.7
V94        Availability of Parts                          115    4.2      1.5
V130       Objectives, Target                             121    4.4      1.2

Other performance determinants (W)
V134       Appropriateness of Technology                  121    4.5      1.3
V66        Support of Government                          118    4.6      1.1
V71        Agency Understanding                           118    2.8      0.9
V61        Conduciveness of Political Context             121    3.2      0.7
V62        Conduciveness of Economic Context              121    3.22     0.7
V63        Conduciveness of Social/Cultural Context       121    3.5      0.7
V64        Conduciveness of Geol/Environmental Context    121    3.2      0.9
V72        Average Number of Users                        117    3.2      1.1
V69        Competition From Other Sources                 109    3.4      1.5
V128       Skill of Staff                                 111    4.6      1.2
V129       Overall Quality of Management                  120    4.2      1.3