Policy Research Working Paper 9194
Demand for Information on Environmental Health Risk, Mode of Delivery, and Behavioral Change: Evidence from Sonargaon, Bangladesh
Alessandro Tarozzi, Ricardo Maertens, Kazi Matin Ahmed, Alexander van Geen
Development Economics, Knowledge and Strategy Team
March 2020

Abstract
Millions of villagers in Bangladesh are exposed to arsenic by drinking contaminated water from private wells. Testing for arsenic can encourage switching from unsafe wells to safer sources. This study describes results from a cluster randomized controlled trial conducted in 112 villages in Bangladesh to evaluate the effectiveness of different test selling schemes at inducing switching from unsafe wells. At a price of about USD0.60, only one in four households purchased a test. Sales were not increased by informal inter-household agreements to share water from wells found to be safe, or by visual reminders of well status in the form of metal placards mounted on the well pump. However, switching away from unsafe wells almost doubled in response to agreements or to placards relative to the one in three proportion of households who switched away from an unsafe well with simple individual sales.

This paper is a product of the Knowledge and Strategy Team, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at alessandro.tarozzi@upf.edu. The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team

Alessandro Tarozzi (corresponding author) is associate professor at Universitat Pompeu Fabra and Barcelona Graduate School of Economics, Barcelona; his email is alessandro.tarozzi@upf.edu. Ricardo Maertens is a post-doctoral research fellow at Harvard University; his email is maertensodria@fas.harvard.edu. Kazi Matin Ahmed is professor at University of Dhaka; his email is kazimatin@yahoo.com. Alexander van Geen is Lamont Research Professor at Columbia University, New York; his email is avangeen@ldeo.columbia.edu. We are very grateful to Prabhat Barnwal for the idea of forming groups to share risk that started this project. We acknowledge financial support from the Earth Clinic at the Earth Institute, Columbia University (this is LDEO publication number 8384), NIEHS (grant P42 ES010349), NSF (grant ICER1414131), and the Ministerio de Economía y Competitividad of the Spanish Government (grant ECO2015-69869-R). We thank Dr. Zahed Masud of the Arsenic Mitigation and Research Foundation for obtaining the necessary approvals in Bangladesh and for managing the finances of the project. We are also grateful to the survey team, and in particular Ershad Bin Ahmed, for their work during the data collection, and to Mr. Saifur Rahman from the Department of Public Health Engineering for his insights throughout the project concerning arsenic mitigation.
The paper benefited from constructive comments and suggestions from the Editor, anonymous referees, and many colleagues at Lund University, Helsinki Center of Economic Research, the Workshop on Health Economics and Health Policy (Heidelberg), Université Catholique de Louvain, University of Bristol, King's College (London), Venezia Ca' Foscari, CEPREMAP India-China Conference (PSE, Paris), the VI Navarra Center for International Development (Fundación Ramón Areces, Madrid), Oxford (CSAE), Stanford University, Université de Genève, Uppsala, Università di Torino, Waikato University, and the 14th CEPAR Summer Workshop in the Economics of Health and Ageing at University of New South Wales, Sydney. The study was approved by the Columbia University IRB (Protocol AAAN9900) and the Government of Bangladesh NGO Affairs Bureau. All errors are our own.

1 Introduction

Poor health stands out as a common feature of life in less developed countries (LDCs). Several factors contribute to the persistence of the problem, including the poor availability and high cost of good-quality health care, insufficient investment in prevention, and frequent reliance on ineffective and sometimes unnecessarily expensive treatments (see Dupas 2012, Dupas and Miguel 2017, and Tarozzi 2016 for recent reviews). Information campaigns on health risks are sometimes seen as an appealing tool in environmental and health policy because they can be relatively inexpensive to run compared with alternatives such as investments in infrastructure or the public health measures needed to eliminate the risk at its root. Some health conditions are in fact preventable if appropriate risk-mitigating behavior is adopted.
However, governments in LDCs may lack the resources or the political will to carry out even simple information campaigns (let alone campaigns that provide reports specific to each household), and information alone is often not sufficient to promote positive changes in behavior. This paper describes the results of a randomized controlled trial (RCT) carried out in the Sonargaon sub-district of Bangladesh to examine the impact of different ways of selling a contaminant test on risk-avoiding behavior. Households were offered tests that measure the contamination of tube well water with arsenic, a common occurrence in the area. The primary objective was to determine whether a novel mode of test delivery, leveraging within-village solidarity networks, could increase health-protective behavioral responses relative to the standard delivery of private information to well users. In a first group of 49 randomly selected villages, field tests were offered at a (subsidized) price of BDT45 (about USD0.60 at current nominal exchange rates, close to the price of one kg of rice in Dhaka), an amount estimated to be just enough to cover the salary of the surveyors hired for the project.1 In an additional subset of 48 villages, surveyors received incentives to offer tests, at the same price of BDT45, to groups of buyers: group members were asked to sign an informal agreement according to which those with safe wells would share their well with others in the group whose well water was found to be unsafe. The agreement was not legally binding, but the prior was that it would increase rates of switching away from unsafe sources through two mechanisms: first, by making sharing more likely through a form of soft commitment and, second, by facilitating the spread of information about the safety of wells, thereby making it easier to identify safe options within the village.
While a large literature documents the importance of village networks in coping with shocks, including health shocks (see Fafchamps 2011 for a review), we are not aware of other work studying how informal networks can help create opportunities to reduce environmental health risk.2 The study also examines the impact of a second mode of information delivery, in the form of metal placards attached to the well spout to convey test results. Budget limitations, however, only allowed the inclusion of 15 villages in this experimental arm, reducing statistical power. In these villages, individuals who purchased a test at the same price of BDT45 were also given a metal placard whose color depended on the arsenic level: blue for arsenic up to 10 ppb (parts per billion, or micrograms per liter), the World Health Organization guideline for arsenic in drinking water; green if above 10 and up to 50 ppb; and red if 'unsafe', that is, above the national government standard of 50 ppb. Similar metal placards have been used before in some testing campaigns (Opar et al. 2007, van Geen et al. 2014) as a more durable alternative to the routine strategy, adopted also during the past nationwide testing campaign in Bangladesh, of painting the well spout red or green, a marking that often becomes invisible within a year (Pfaff et al. 2017). Such visible indicators serve as a reminder of the arsenic status of specific tube wells and can spread this information throughout the village.

1 Throughout the paper, Bangladeshi Taka (BDT) are converted into United States Dollars (USD) using a nominal exchange rate of 80 BDT/USD and a PPP exchange rate of 23.145, as indicated in World Bank (2015, Table 2.1).
2 In broadly related work, Goldberg et al. (2018) show that peer networks can be leveraged to improve screening for tuberculosis in Indian urban areas.
In different contexts, other researchers have found large impacts of reminders on health-related behavior, for instance through SMS messages (Pop-Eleches et al. 2011; Raifman et al. 2014). However, the cost of the placards (about BDT80) is high enough to significantly increase the total cost of testing. It was therefore important to determine whether the placards made any difference relative to the cheaper alternative (adopted in the two experimental arms described earlier) of informing the household with a laminated card indicating the test result and encouraging the household to keep the card in the house.

Despite much progress on numerous health indicators (Chowdhury et al. 2013), Bangladesh remains in the midst of a severe health crisis due to the widespread presence of naturally occurring arsenic (As) in shallow aquifers (see Ahmed et al. 2006, Johnston et al. 2014, and Pfaff et al. 2017). The problem, which stems from geological conditions conducive to the accumulation of arsenic in groundwater across much of the country, is compounded by the reliance of millions of rural households on water from privately owned, unregulated shallow tube wells for drinking and cooking. Using nationwide data from 2009, Flanagan et al. (2012) estimated that, in a country of more than 150 million people, about 20 million were likely exposed to arsenic levels above the official Bangladesh standard of 50 ppb, while almost one third of the population was likely exposed to levels above the significantly lower WHO guideline of 10 ppb. The most visible health consequences of chronic exposure to arsenic from drinking tube well water in South Asia, such as cancerous skin lesions and loss of limb, were recognized in the state of West Bengal, India, in the mid-1980s (Smith et al. 2000).
It has since been shown, on the basis of long-term studies in neighboring Bangladesh, that arsenic exposure increases mortality due to cardiovascular disease, and that it may inhibit intellectual development in children and be detrimental to mental health (Wasserman et al. 2007, Argos et al. 2010, Rahman et al. 2010, Chen et al. 2011, Chowdhury et al. 2016). These health effects are accompanied by significant economic impacts: exposure to arsenic has been estimated to reduce household labor supply by 8% (Carson et al. 2011) and household income by 9% for every exposed earner (Pitt et al. 2015), while Flanagan et al. (2012) calculated that a predicted arsenic-related mortality rate of 1 in every 18 adult deaths represents an additional economic burden of USD13 billion in lost productivity alone over the next 20 years. Piped water from regulated and monitored supplies would likely be the most effective policy answer, but such a solution would require immense investments in infrastructure that may not be sustainable or cost-effective for the foreseeable future, so identifying short-term mitigation strategies remains essential. The consensus view is that household-level water treatment, dug wells, and rainwater harvesting are not viable alternatives for lowering arsenic exposure because of the cost and logistics of maintaining such systems in rural South Asia (Ahmed et al. 2006; Howard et al. 2006; Johnston et al. 2014; Sanchez et al. 2016). In contrast, despite being the main source of arsenic exposure, tube wells remain the most effective way of providing safe drinking water to the rural population of Bangladesh in the short to medium term (Krupoff et al. 2020). With the exception of the most severely affected areas of the country, high- and low-arsenic wells are highly intermixed spatially, even over small distances. At the same time, whether a well is contaminated with arsenic or not rarely changes over time (van Geen et al. 2007; McArthur et al.
2010). Therefore, exposure among users of arsenic-contaminated wells can often be avoided by switching to a nearby safe well, be it a shallow private well or a deeper (and usually safer) community well (van Geen et al. 2002; van Geen et al. 2003). Using data from Araihazar, a sub-district bordering the location of this study, Jamil et al. (2019) estimate that blanket testing campaigns that inform households about the arsenic contamination of all private wells were significantly more cost-effective at reducing arsenic exposure than the provision of piped water or the construction of deep wells by the government. Previous campaigns aimed at testing tube well water for arsenic have only partially succeeded at promoting risk-avoiding behavior, highlighting the need to devise novel strategies to achieve this goal. Between 1999 and 2005, the Bangladesh Department of Public Health Engineering (DPHE), with support from the World Bank, DANIDA, UNICEF, and a number of non-governmental organizations (NGOs), coordinated the Bangladesh Arsenic Mitigation and Water Supply Program (BAMWSP) testing campaign. The campaign tested close to 5 million wells using field kits and identified them as 'safe' or 'unsafe' (according to the Bangladesh standard of 50 ppb) by painting the well spout green or red, respectively. Several studies have documented that between one-third and three-quarters of users of unsafe wells switched to a safe well after testing, with higher switching rates in trials that provided information campaigns on arsenic health risks and repeat visits, in some cases combined with objective measures of exposure taken in the form of urine samples (Chen et al. 2007; Madajewicz et al. 2007; Opar et al. 2007; George et al. 2012; Bennear et al. 2013; Balasubramanya et al. 2014; Inauen et al. 2014; Pfaff et al. 2017).
Despite these partial successes, a substantial fraction of households continues to use unsafe wells today, and it is thus important to identify mechanisms that increase risk-mitigating responses. In addition, millions of new wells have been installed in the country, and in most cases users do not know the arsenic level of the water, because campaigns such as BAMWSP have not been replicated and a market for tests barely exists. There are a few commercial laboratories in Dhaka with the capability to test wells for arsenic, but few rural households are aware of these services.3 The use of field kits, which have become increasingly reliable and easy to use (George et al. 2012; van Geen et al. 2014), greatly reduces the cost of well testing and simplifies the logistics, but even these tests are rarely available in the villages.

3 In addition, the cost of the laboratory analysis is as high as USD25-40, not including the cost of the kits necessary for the collection of the water sample.

In the context of this study, only about one in four households purchased a test, regardless of the offer type, despite the low (and subsidized) sale price, widespread awareness of the arsenic problem, and little prior knowledge about the safety status of individual wells. This is consistent with a growing literature that documents low demand in developing countries for a variety of health-protecting technologies, ranging from insecticide-treated nets (Cohen and Dupas 2010, Dupas 2014, Tarozzi et al. 2009, Tarozzi et al. 2014) to deworming drugs (Kremer and Miguel 2007) and water disinfectant (Ashraf et al. 2010). However, while the offer type barely affected demand, it did affect how households responded to the information. Intent-to-Treat estimates (not conditional on purchase) show that while standard individual sales led 3.7% of households to switch water source, the fraction was 4.3% with group sales (14% higher, 95% C.I.
[−0.0137, 0.0339], p-value 0.404) and 6.4% (75% higher, 95% C.I. [0.003, 0.072], p-value 0.031) when metal plates were attached to the well spout in case of purchase, although the difference is statistically significant only in the latter case. In addition, the estimates show very substantial differences between arms in the response of households who received 'bad news' about the safety of their well water. Among households informed that their well is high in arsenic, switching rates almost doubled, from 30 to 56%, with group sales relative to standard individual sales (95% confidence interval of the difference adjusted for baseline covariates: [0.011, 0.362]). Switching rates more than doubled when metal plates were attached to the pump head (from 30 to 72%, 95% C.I. of the adjusted difference [0.188, 0.609]). This work complements the literature on the demand for health-protecting technologies by looking at demand for health-related information that households can exploit to reduce risk. The focus here is on information that is specific to the buyer (the test measures arsenic contamination in the water from a specific well), in contrast to general information (for instance, on the likelihood of arsenic contamination, or on the health risks associated with unsafe water). While this article studies demand for information on environmental factors, earlier work has considered the demand for information on health status; see in particular Cohen et al. (2015), Bai et al. (2017), Thornton (2008), and Gong (2015). These studies suggest that even among households willing to pay for information, behavioral responses may not be optimal from a public health perspective, so it is important to study whether the mode of delivery of information can help achieve desirable policy objectives. This article is related to Barnwal et al.
(2017), who estimate a very steep demand curve for arsenic tests in Bihar, India, another location with a groundwater arsenic problem. They found that uptake was 25% at a price of INR40 (about BDT49), about the same as what this study estimates at the very similar price of BDT45. Unlike Barnwal et al. (2017), this study does not analyze how demand changes with price; instead, it examines the role of non-price factors in demand and in behavioral responses to information. The results show that demand was not sensitive to the introduction of informal agreements or the use of placards, but conditional on demand, these nudges led to large and significant increases in switching among users of unsafe wells relative to simpler, private sales.

The paper proceeds as follows. The next section provides additional background information on the extent of the arsenic problem in the study area and describes the experimental design. Section 3 describes the data collection protocol, presents selected summary statistics, and shows that, by chance, the means of some covariates were not balanced at baseline, highlighting the importance of controlling for baseline characteristics in our estimates (the adjusted and unadjusted estimates remain qualitatively similar). Section 4 presents the conceptual framework that guided the study design and the interpretation of the results provided in Section 5. The cost-effectiveness of the interventions is evaluated in Section 6. Finally, Section 7 concludes and highlights limitations of the results.

2 Program Description and Study Design

This study was carried out in Sonargaon, a 171 km² sub-administrative unit (or upazila) of Narayanganj district, located approximately 25 kilometers south-east of the capital Dhaka. According to the 2011 Census of Bangladesh, Sonargaon had a population of about 400,000, and at the time of the study it included 365 villages.
According to the BAMWSP blanket testing, conducted locally in 1999-2000, about 40% of villages in Sonargaon had more than 90% of wells that did not meet the national standard of 50 ppb.4

2.1 Study Area and Program Description

The study area for this trial was initially formed by the list of all 128 villages in Sonargaon with more than 10 wells and a 40-90% share of unsafe wells according to the BAMWSP testing conducted years earlier. The lower bound was chosen to focus on areas where a sizeable fraction of new untested wells were likely to be unsafe, while the upper bound was designed to avoid areas where switching to safe wells was not likely to be a viable option for most households. Between January and June 2016, surveyors conducted home visits to identify all wells in the selected villages, regardless of whether they had been tested before. Privately owned wells were linked to the household that owned them, while public wells were linked to the main caretaker or user. Almost all wells (98.6%) were privately owned, and for simplicity the rest of the paper uses the term 'owner' to refer to the household that owned the well, or to the household that was the primary user of a community well. During the home visits, surveyors explained the risk of consuming arsenic-contaminated tube well water to an adult (typically the most senior woman) and offered to test the well for a fee. When a test was purchased, tube well water was tested in the field using the Arsenic Econo-Quick (EQ) test kit, which has been shown to be reliable and can deliver results within ten minutes (see George et al. 2012 for details). The result was immediately communicated to the buyer. The kit's test strip is evaluated visually, and the result is classified in the following sequence (in ppb): {0, 10, 25, 50, 100, 200, 300, 500, 1000}.
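Because the EQ strip can only be read at the discrete levels listed above, the color coding used for the laminated cards and placards in this study (blue up to the WHO guideline of 10 ppb, green above 10 and up to the national standard of 50 ppb, red above 50 ppb) reduces to a simple lookup. The minimal Python sketch below illustrates this mapping; the constant and function names are hypothetical and are not part of any software used in the study.

```python
from enum import Enum

# Discrete readings (ppb) to which the EQ kit's test strip is classified.
EQ_LEVELS_PPB = (0, 10, 25, 50, 100, 200, 300, 500, 1000)

WHO_GUIDELINE_PPB = 10   # WHO guideline for arsenic in drinking water
NATIONAL_STD_PPB = 50    # Bangladesh national standard


class Color(Enum):
    BLUE = "blue"    # at or below the WHO guideline
    GREEN = "green"  # above the WHO guideline, at or below the national standard
    RED = "red"      # above the national standard ("unsafe")


def classify_reading(reading_ppb: int) -> Color:
    """Map one EQ strip reading to the card/placard color (hypothetical helper)."""
    if reading_ppb not in EQ_LEVELS_PPB:
        raise ValueError(f"{reading_ppb} ppb is not an EQ kit reading level")
    if reading_ppb <= WHO_GUIDELINE_PPB:
        return Color.BLUE
    if reading_ppb <= NATIONAL_STD_PPB:
        return Color.GREEN
    return Color.RED


if __name__ == "__main__":
    for level in EQ_LEVELS_PPB:
        print(f"{level:>5} ppb -> {classify_reading(level).value}")
```

On the EQ scale only the 25 and 50 ppb readings fall in the green band, which is why the cards and placards are described as green for "25 or 50 ppb".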
The tests cost USD0.30 each for volume purchases, although the total cost per test is higher, at about USD2.4, based on a testing campaign that also covered the costs of trained personnel and of metal placards displaying the test result and attached to the body of the hand-pump (van Geen et al. 2014).

4 Blanket testing in Sonargaon was carried out by BRAC, a partner NGO of BAMWSP. A total of 25,048 tube wells were tested for arsenic.

Surveyors also administered a short household questionnaire and distributed color-coded laminated cards with the hand-written test result (in case of purchase) and the well identification number. All cards included the following information: (i) arsenicosis is not a communicable disease, (ii) arsenic cannot be removed by boiling water, (iii) testing tube well water for arsenic is important, and (iv) the Bangladesh safety standard for arsenic concentration in water is 50 ppb. Black cards with these messages were given to households who did not buy a test, while in case of purchase the laminated card was blue if arsenic was up to 10 ppb, green if 25 or 50 ppb, and red if above 50 ppb (i.e. unsafe according to the national standard). Owners of unsafe ('red') wells were encouraged face-to-face to switch to a safe (blue or green) well, while owners of wells with concentrations up to 10 ppb were encouraged to share their well water with their neighbors. Owners of green wells were encouraged both to share their water and to switch to a safer (blue) well, if possible.

Experimental variation came from three different ways of selling arsenic tests across villages. In a first group of villages, referred to as group A, surveyors offered to test tube well water for a fee of BDT45 (about USD0.60). This fee was expected to cover the salary of the testers and their supervisor.
Of the BDT45 charged per test, testers kept BDT30 to cover their transportation expenses and salary, and handed over the remaining BDT15 to their supervisor. The price was determined assuming that a field worker would test about 15 wells/day for 20 days/month, leading to a monthly salary of BDT9,000 (USD112.5/month), which is roughly what village health workers were paid for blanket testing in neighboring Araihazar in 2012-2013 (van Geen et al. 2014). Under the same scenario, the supervisor of 10-15 workers would earn BDT45,000-67,500 (USD563-844) per month, a range that spans what he earned while supervising the testing in Araihazar in 2012-2013. Across all experimental arms, the cost of the field kits (USD0.30/test) was covered by the project.5

In a second group of villages (B), surveyors were asked to sell the tests to groups of buyers rather than to individual households. When a well owner was identified, surveyors would propose the formation of a group of at least three and up to ten buying households; individual sales were not allowed.6 Surveyors would help with group formation, for instance by proposing a sale to all households within the same compound (or bari), and then coordinating the inclusion of additional buyers via mobile phones. After a group was formed and an informal well-sharing agreement was reached by all group members, each household would pay BDT45. The agreement had no legal standing and was meant to serve as a soft commitment device. The study design called for a written agreement, but in practice most buyers were uncomfortable about signing a document, so in a large majority of cases a verbal agreement took place instead. All members were informed of the test results for all wells within the group.
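The fee split and the planned monthly earnings described above follow from simple arithmetic. The Python sketch below reproduces it using only figures stated in the text, including the nominal rate of 80 BDT/USD from footnote 1; the constant and function names are illustrative, not part of the study's materials.

```python
# Figures taken from the text; names are illustrative only.
BDT_PER_USD = 80              # nominal exchange rate (footnote 1)
PRICE_BDT = 45                # test price in every arm
TESTER_SHARE_BDT = 30         # kept by the tester per test sold
SUPERVISOR_SHARE_BDT = PRICE_BDT - TESTER_SHARE_BDT  # BDT15 per test to the supervisor
TESTS_PER_DAY = 15            # planning assumption stated in the text
DAYS_PER_MONTH = 20


def tests_per_month() -> int:
    """Tests one field worker is assumed to sell in a month."""
    return TESTS_PER_DAY * DAYS_PER_MONTH


def tester_monthly_bdt() -> int:
    """Planned monthly earnings of one tester."""
    return tests_per_month() * TESTER_SHARE_BDT


def supervisor_monthly_bdt(n_workers: int) -> int:
    """Planned monthly earnings of a supervisor paid BDT15 per test by n_workers testers."""
    return n_workers * tests_per_month() * SUPERVISOR_SHARE_BDT


def to_usd(amount_bdt: float) -> float:
    """Convert BDT to USD at the paper's nominal rate."""
    return amount_bdt / BDT_PER_USD


if __name__ == "__main__":
    t = tester_monthly_bdt()
    print(f"Tester: BDT{t:,} per month (USD{to_usd(t):.2f})")
    for n in (10, 15):
        s = supervisor_monthly_bdt(n)
        print(f"Supervisor of {n} testers: BDT{s:,} per month (USD{to_usd(s):.2f})")
```

Running it gives BDT9,000 (USD112.50) for a tester and BDT45,000-67,500 (USD562.50-843.75) for a supervisor, matching the rounded figures in the text. These are planned earnings at the assumed volume; footnote 5 notes that actual demand was lower.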
In a third group of villages (C), households were again assigned to receive individual test offers at BDT45 (as in group A), but in case of purchase a color-coded stainless steel placard was attached to the well's pump-head. Placards displayed, both in text and in color, whether the arsenic concentration was up to 10 ppb (blue), 25 or 50 ppb (green), or above 50 ppb (red). As shown in Figure 1, depending on the arsenic concentration, the placards displayed two hands holding drinking cups, one hand holding a drinking cup, or a large cross over a hand holding a drinking cup. The split of the test fee between the tester and the supervisor in groups B and C was the same as in group A, but in B the project gave testers an additional bonus of BDT12 per sale to compensate for the additional effort needed to coordinate group formation.7

5 In practice, demand (and thus testers' compensation) was lower than expected; see footnote 15.
6 Most groups included 7-10 buyers, although some had as few as three and some had 11.

2.2 Power Calculation and Study Design

The trial was not registered and was not based on a pre-analysis plan. Comparison of demand for and responses to tests between arms A and B was the primary objective of the study. Data from earlier work in neighboring Araihazar sub-district (see Bennear et al. 2013) were used to estimate an intra-village correlation of ρ = 0.09 in switching decisions among owners of unsafe wells and a switching rate of about 30%, at the lower end of the rates observed in previous studies. It was determined that 50 clusters (villages) would be sufficient to detect meaningful impacts with high probability.
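The paper does not state which formula or software routine produced the cluster counts reported in this subsection. One textbook approach, assumed here, computes the per-arm sample size for a two-sided comparison of two proportions (pooled variance under the null, unpooled under the alternative, no continuity correction), inflates it by the design effect 1 + (m − 1)ρ, where m is the number of unsafe wells per village, and divides by m. The Python sketch below is a reconstruction under these assumptions, not the authors' calculation; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def effect_size(p0: float, delta: float) -> float:
    """Standardized difference delta / sqrt(p0 * (1 - p0))."""
    return delta / math.sqrt(p0 * (1 - p0))


def clusters_per_arm(p0: float, delta: float, icc: float, m: int,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Villages per arm to detect p0 vs p0 + delta (assumed method, see lead-in).

    p0    -- switching rate in the comparison arm
    delta -- difference in switching rates to detect
    icc   -- intra-village correlation (rho)
    m     -- unsafe wells identified per village
    """
    p1 = p0 + delta
    pbar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)
    # Per-arm sample size under individual randomization.
    n_ind = (z_a * math.sqrt(2 * pbar * (1 - pbar))
             + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / delta ** 2
    design_effect = 1 + (m - 1) * icc
    return math.ceil(n_ind * design_effect / m)


if __name__ == "__main__":
    scenarios = [(0.15, 0.12, 5), (0.10, 0.09, 20), (0.10, 0.12, 40)]
    for delta, icc, m in scenarios:
        print(f"delta={delta}, rho={icc}, m={m}: "
              f"effect size {effect_size(0.30, delta):.2f}, "
              f"{clusters_per_arm(0.30, delta, icc, m)} clusters per arm")
```

With a 30% baseline switching rate, 80% power and a 5% two-sided test, the three scenarios return 49, 49 and 51 clusters per arm, and effect sizes of about 0.33 and 0.22, in line with the figures in the text. Agreement on these cases does not establish that this is the exact method the authors used.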
In particular, assuming a 30% switching rate from unsafe wells with individual sales (arm A), a conservative intra-village correlation of 0.12 in switching rates, and only five unsafe wells identified through test sales per village, 49 clusters per arm would have been necessary to have 80% power to detect a 15 percentage point difference in switching rates between arms with a two-sided test of equality at the 5% significance level (effect size $= 0.15/\sqrt{0.3 \times 0.7} \approx 0.33$). The same number of clusters would have achieved the same power for a smaller 10 percentage point difference (effect size $= 0.10/\sqrt{0.3 \times 0.7} \approx 0.22$) with a less conservative ρ = 0.09 and with higher demand leading to the discovery of 20 unsafe wells per cluster, while 51 clusters per arm would have been necessary with 40 unsafe wells per village and the more conservative ρ = 0.12. Because the available funding allowed the inclusion of a larger number of villages, it was possible to include the additional experimental arm C, whose sample size was, however, dictated by budget constraints rather than by power calculations.

The principal investigators assigned villages to treatment arms at random, using the statistical software Stata, after stratification. Strata were determined by whether the share of unsafe wells in the BAMWSP testing campaign carried out years earlier was below or above the median, and by union (an administrative unit).8

7 The experimental design also included two exploratory arms, with only six villages each, where tests were sold individually at a village-level price of either BDT45 or BDT90, but with payment required only in case of 'good news', that is, in case of an arsenic level no higher than 50 ppb. The inclusion of these proof-of-concept conditional sales was motivated by the aversion expressed by several members of focus groups to the idea of 'paying for bad news'. Sales conditional on the results may thus have increased demand (a prediction strongly supported by the observed purchase rates), although the conditional payment also reduces the (expected) price and generates a different selection into purchase depending on beliefs about the safety of the well water. Because of these confounding factors and because of the very small number of villages assigned to these sales, the results are not discussed in detail in this article, but they are available upon request from the authors.

There were two deviations from the experimental protocol. First, while programming the mobile application used for data collection, 27 villages were assigned by mistake to a treatment different from the original one. The partial re-assignment of treatments was thus due to a programming error and not to an incorrect implementation of the protocol in the field. In addition, the checks for balance in covariates are very similar whether based on the originally assigned or the actual treatment (see below). For this reason, treatment status is defined as actual treatment in the analysis. The second deviation is that in four cases surveyors were unable to distinguish a village from the one adjacent to it. While data were collected from households in these four villages and in the ones adjacent to them, only the pairs of villages (rather than the individual villages) could be identified, and both villages in each pair received the same treatment. In the statistical analysis there are thus effectively 112 clusters divided into experimental arms A (49 clusters), B (48), and C (15). For simplicity, in the rest of the paper these clusters will be referred to as 'villages'.9

It should be emphasized that, while surveyors completed a census of wells in study villages, our data do not represent a census of households.
The choice to survey only owners, who were in any case a majority, was due to budgetary constraints, but an implication is that one cannot study whether the choice of the primary source of drinking water also changed among non-owners.
3 Data
A team of testers who had at least completed secondary education was recruited locally. During the home visits when sale offers were made, testers also administered a short household baseline survey and recorded information on sales and, in case of purchase, the result of the test. Additionally, surveyors recorded the GPS coordinates of all wells and noted whether any visible label attached to the well indicated its status with respect to arsenic. The baseline questionnaire included a household roster, basic questions on socio-economic status, detailed questions on the well, and a number of questions on knowledge and practice related to drinking water and arsenic risk. Testers also recorded whether, according to the respondent, the arsenic status of the well water was safe, unsafe, or unknown. Information about a total of 12,606 wells was recorded, and the household survey was completed for all but three well owners. A limitation of the data is that, in the case of group sales in arm B, surveyors did not keep accurate records of who belonged to which group. In other words, while the data indicate for which wells a test was purchased, one cannot study the characteristics of buyers belonging to the same group, or the extent to which well sharing actually took place within each group as a consequence of the test results. The endline survey was completed between August 2016 and January 2017.10 The average time between baseline and endline surveys was 7.7 months, and 86% of households had their follow-up interview between seven and nine months after the baseline interview.
8 Unions are the third smallest administrative unit in Bangladesh and are formed by several mouzas, which in turn are composed of two to three villages. The 128 study villages belong to nine unions.
9 The Appendix shows that the randomization led to large spatial variation in treatments; see Figure A.1.
10 Unlike the baseline survey, where the wages of surveyors and the supervisor were covered mainly from test fees, the cost of the follow-up survey was paid for by the project.
During the endline survey, surveyors were instructed to return to the wells identified during the test sales and record whether the household owning the well was still using it as its primary source of water for cooking and drinking. If not, the surveyor asked the respondent to accompany him or her to the actual source and recorded the new GPS location and the presence of any visible indicators of arsenic status (for instance, one of the metal placards distributed during our intervention). The surveyor also asked the respondent about the perceived safety of the new source and about the primary reason for switching to a different source. Switching behavior was thus self-reported, but earlier work in the neighboring Araihazar sub-district found that switching behavior recorded in a way similar to this study was consistent with urinary arsenic concentrations, an objective biomarker of exposure (Chen et al. 2007). Unfortunately, the records on the location of the new source of drinking water do not make it possible to measure precisely the extent to which switching was associated with a reduction in arsenic risk. The smartphone’s GPS sensor, with a precision of 10 m at best, cannot uniquely identify a specific well among those surveyed because of the density of wells within a typical village in Bangladesh. The GPS data were nonetheless used to estimate the distance from the old to the new well used for drinking.
3.1 Summary Statistics at Baseline
Table 1 shows selected summary statistics measured at baseline. Throughout this paper, unless otherwise noted, the analysis is restricted to the large majority of households (91%) that used their own well at baseline, as this is the sample for which the baseline water source and post-intervention switching can be determined. The baseline source was not recorded for non-users, and during the endline survey they were only asked again whether they used their own well for drinking and cooking.11 All summary statistics except those on the first row of Table 1 are thus calculated for households who used water from their own well for drinking and cooking. Household heads had low levels of educational attainment on average, and most households were poor, with only 17% of houses having a concrete roof (an indicator of wealth), while the rest had tin or (in rare cases) mud roofs. The average well owner in our study area lived in a village where 75% of the wells tested by BAMWSP between 1999 and 2000 were unsafe with respect to arsenic. Despite the BAMWSP blanket testing campaign, a large majority of respondents (76%) did not know whether their well was safe or unsafe with respect to arsenic. Only 7% thought that their well was unsafe, while the remaining 17% believed that their well was safe. Although about a quarter of respondents indicated that they knew the status of their well, more than 99% of wells lacked any visible sign of their status with respect to arsenic, such as red or green paint. Information on the safety of the minority of wells that had been tested was thus not immediately observable by other households, although in principle this knowledge could have been shared with others privately.
11 Wells not used for drinking were on average significantly shallower, and thus more likely to be contaminated with high levels of arsenic. Of the 1,193 wells not used for drinking, only 14 (about 1%) were believed to be safe by the owner.
Using geographic coordinates, we estimate that before our pay-for-test campaign the average well owner had about 0.02 wells labeled as safe within 50 m, out of an average of nearly 12 wells within that distance. The immense public health challenge posed by widespread arsenic contamination of well water has been widely discussed and publicized in Bangladesh, and this is reflected in the data. Almost all respondents replied ‘yes’ to the question “[h]ave you ever heard about arsenic in tube well water?” Similarly, all but a handful of respondents replied ‘yes’ when asked “[a]re you aware of the health risks of drinking tube well water containing arsenic?” On average, wells were shallow (179 feet, or 55 meters) and about nine years old, consistent with many wells having been installed after the BAMWSP blanket testing. The average reported installation cost of wells in the sample was BDT7,560, or about USD100 (USD323 using the PPP exchange rate from World Bank 2015). Well depth is a key predictor of installation costs: in the data, the elasticity of cost with respect to depth is 0.72 (s.e. 0.04). The BDT45 price charged for the test in this study represented slightly more than half of one percent of the average installation cost. Well sharing was already common in the study area: while the average household had fewer than four members, the average number of individuals using water from a well for drinking was 8.8, and in more than half of the sample wells the number of users was larger than household size.12
3.2 Variable Balance across Experimental Arms
Column 5 of Table 1 shows the p-value for the null hypothesis of equality of means across the three treatment arms. The null is rejected in 5 of 26 cases at the 10% level. The differences are due to chance; because baseline data were collected while offering well tests, balance on variables measured at baseline could not be enforced through stratification or re-randomization.
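As a rough, illustrative check on the number of rejections, one can ask how often 5 or more rejections out of 26 tests at the 10% level would occur if every null hypothesis were true. The sketch below assumes, unrealistically, that the 26 tests are independent (covariates such as schooling and wealth are correlated), so it is only indicative; the helper name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Figures from the text: 26 balance tests, 10% level, 5 rejections.
n_tests, alpha, n_rejected = 26, 0.10, 5

# Illustrative assumption: the 26 tests are independent (in practice they are not).
expected_rejections = n_tests * alpha
p_tail = prob_at_least(n_rejected, n_tests, alpha)
print(f"expected rejections: {expected_rejections:.1f}")
print(f"P(at least {n_rejected} rejections): {p_tail:.3f}")
```

Under this independence assumption about 2.6 rejections are expected and the probability of 5 or more is roughly 0.11, so the observed count would not be unusual; correlation among covariates makes the true tail probability differ from this figure.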
Some of the differences are substantively important. While overall 19% of household heads in the study area had no schooling, this number drops to 5% in arm C and is close to 26% in arm A. The fraction of respondents who did not know the status of their well ranged from 68% in arm A to 90% in arm C, while the fraction of wells perceived as safe ranged from 4% in C to 22% in A. Both the group-specific means and the tests of significance are very similar if the estimation is repeated using treatment as initially randomized rather than actual treatment (the two differ in some villages because of errors in programming the smartphones used for surveying).13
12 Recall that households who did not own a well were not surveyed.
13 In terms of statistical significance, the only differences are that “Wells within 50 m labeled safe” becomes significant at the 10% level (instead of 5%) and the fraction of privately owned wells becomes significant at the 1% level, despite very similar means across arms that range from 97.8 to 99.7%. The full results are available upon request from the authors.
The overlap in the distribution of covariates between arms can be gauged more systematically following the approach described in Imbens and Rubin (2015, Ch. 14). First, for each covariate in Table 1, differences between means for each pair of experimental arms, normalized by a measure of dispersion, are estimated (see Appendix A.1 for details). While the usual t-statistics used to construct the tests for balance have a denominator that shrinks to zero as the sample size grows (because the standard errors become smaller), this is not the case for the normalized differences, where the denominator is the simple average of two arm-specific standard deviations. Imbens and Rubin argue that these latter statistics are more relevant than t-statistics for assessing whether simple adjustment methods, such as controlling for covariates or matching estimators, can adequately remove bias due to covariate imbalance.
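A minimal sketch of the pairwise normalized difference for a single covariate, assuming the values for each arm are available as a list. By default the denominator is the simple average of the two arm-specific standard deviations, as described above; with `pooled_variance=True` it is instead the square root of the average of the two variances, the form given by Imbens and Rubin (2015). The function name and the 0/1 example data are hypothetical; the exact definition used in this paper is in Appendix A.1.

```python
import math
from statistics import mean, stdev


def normalized_difference(x_a, x_b, pooled_variance=False):
    """Difference in means between two arms, scaled by a dispersion measure.

    Unlike a t-statistic, the denominator does not shrink as the sample grows.
    """
    s_a, s_b = stdev(x_a), stdev(x_b)
    if pooled_variance:
        scale = math.sqrt((s_a**2 + s_b**2) / 2)  # Imbens-Rubin form
    else:
        scale = (s_a + s_b) / 2  # simple average of the two standard deviations
    return (mean(x_a) - mean(x_b)) / scale


# Hypothetical 0/1 indicator ("head has no schooling") in two arms.
arm_a = [1, 0, 0, 1, 0, 0, 0, 1]
arm_c = [0, 0, 0, 1, 0, 0, 0, 0]
d = normalized_difference(arm_a, arm_c)
print(f"normalized difference: {d:.2f}")  # 0.57, above the informal 0.3 benchmark
```

The 0.3 and 0.5 cut-offs mentioned in the text are informal benchmarks from Imbens and Rubin's illustration, not formal test thresholds.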
One can also calculate, for each pair of arms, a ‘multivariate difference’ based on a Mahalanobis distance that aggregates all the individual differences. Although Imbens and Rubin do not propose formal tests based on these statistics to gauge balance, they argue that balance is excellent in an empirical illustration where all standardized differences are smaller than 0.3 and the multivariate measure is 0.44. In contrast, simple regression adjustments are deemed likely to be inadequate to eliminate bias when some standardized differences are larger than 0.50 and the multivariate measure is 1.5 or above. The normalized differences, reported in Appendix Table A.1, show that overall there is good balance between arms A and B: no variable has a standardized difference larger than 0.3, and the aggregate measure of balance is 0.604. In contrast, lack of balance is more problematic when either arm A or arm B is compared to arm C. In comparing A and C, consistent with the formal tests of equality, the differences are particularly large for schooling of the head and beliefs about well safety, with standardized differences larger than 0.5 in absolute value. The multivariate difference is also relatively large, at 1.1. The comparison between B and C also shows that four of 22 standardized differences are larger than 0.3, with a multivariate difference of 0.720. Because some characteristics that may affect behavior, such as beliefs or schooling, are unbalanced, results that control for observed covariates are also analyzed. The estimates are qualitatively robust to the inclusion of controls, although the point estimates are affected in some cases, and the standardized differences described above suggest that some caution should be exercised, in particular when making comparisons that involve group C. Attrition is analyzed at the bottom of Table 1.
Overall, 8.8% of households could not be matched to the endline data, either because of true attrition (6 percentage points) or because errors in identifiers, which appear as duplicates in the data, did not allow the match. The null of equality among the three arms is not rejected at conventional levels for any of the attrition measures.
4 Conceptual Framework
Before discussing the results, it is useful to consider the main factors likely to influence purchase choices and, conditional on test results, risk-mitigating behavior. This section does not describe a formal model but rather offers a simple conceptual framework to interpret the results in terms of expected differences between experimental arms and in terms of mechanisms. Willingness to pay for a test likely requires three conditions: first, that the test provides new information; second, the perception that there are health and/or economic costs associated with continued use of arsenic-contaminated water; third, that in case of ‘bad news’ mitigation strategies will be available (e.g., the possibility of switching to a nearby safer well). All these conditions were present in the empirical context of this study. The first condition is satisfied for the large majority of respondents (76%) who did not know whether their well water was safe or unsafe to drink (Table 1). In addition, very few wells had visible signs of safety status such as paint, so even households with strong priors about the safety of their well water may have valued the possibility of demonstrating water quality to others by displaying test results.14 That the tests could provide new information also required trust, as there is growing evidence that lack of trust in health-related information may hinder the adoption of behavior that could reduce health risks (Cohen et al. 2015, Bennett et al. 2017, Alsan and Wanamaker 2017, Martinez-Bravo and Stegmann 2019).
Although the data do not include measures of trust, water tests were not a novelty, because many wells in Sonargaon had been tested in the past. In addition, earlier work carried out in the neighboring Araihazar sub-district found that, after testing, between one-third and three-quarters of households with unsafe wells switched to safe wells (see Section 1 for references), consistent with a high degree of trust in the results. The second condition (relevance of the information) is also satisfied, because virtually all respondents knew in a general sense about the presence and health risks of arsenic in tube well water. Data on risk perceptions collected in neighboring Araihazar in 2008 indicated that a majority of respondents were aware not only of the serious nature of arsenic risk, but also that the risk becomes more severe with prolonged exposure (see Tarozzi 2016, Figure 4, for details). Finally, the third condition (perceived availability of alternative sources of drinking water) was also likely satisfied, given that well sharing was already practiced in the area (see Section 3). To some extent, households were therefore already aware of the possibility of using their neighbors’ well water for drinking. Data from neighboring Araihazar show that the BAMWSP testing campaign conducted about 10 years earlier led to a substantial degree of well sharing (Balasubramanya et al. 2014). On the one hand, the figures in Table 1 show that about three quarters of wells in the study area had been found to be unsafe by BAMWSP, and this may have reduced the perceived chance of having safe options nearby in case of bad news. On the other hand, the average household had more than 10 other wells within a short 50 m radius and about 30 within 100 m, and such a dense network of wells likely increased the perceived likelihood of having safe wells close by.
Overall, differences in switching rates between any two experimental arms could have emerged either from different selection into purchase or from the way information was provided (in which case even identical buyers may have reacted differently). The decision to change source was still likely to depend primarily on any new information made available by the test results. Hence, regardless of the experimental arm, very little switching was expected to take place from untested wells (driven perhaps by ‘free riders’ who moved to nearby wells found to be safe), and even less from wells that were found to be safe. The prior was also that, because arsenic contamination is difficult to predict without a test, the likelihood of having an unsafe well, even conditional on purchase, would be uncorrelated with the mode of sale and thus similar between groups.
14 In principle, such value need not be positive, for instance if knowledge of a high-arsenic well is perceived as lowering land value or signaling poor health among household members, or is more generally stigmatized. In Bihar, India, Barnwal et al. (2017) find that, two years after installation, placards indicating unsafe arsenic levels were more likely to have been removed by households than those indicating low levels of arsenic, although such behavior may also have been driven by the desire not to be reminded constantly about the health risks. However, earlier research has shown that households very rarely refuse testing when it is offered for free, even when the results are posted on the wells (see for instance Madajewicz et al. 2007, Opar et al. 2007, and Bennear et al. 2013).
Finally, conditional on finding out that one’s water is unsafe, the expectation was that the soft commitment and easier access to within-group information on safe alternatives (in group B), and the salience and visibility of the tags posted on wells (in group C), would lead a larger proportion of households to stop drinking from the tested well relative to the individual sales in group A. In contrast, priors about switching rates in B relative to C were less clear. However, the effectiveness of C at inducing switching from unsafe wells rested at least in part on households not removing the metal plates from the pump heads, so that the plates continued to make safety status visible and to discourage drinking from unsafe wells. Regardless of the sale method, switching from unsafe wells should be more likely when safer alternatives are available nearby. If households recognize that health risks increase with the arsenic concentration in the water, then switching from unsafe wells should increase with the arsenic level, a desirable pattern that has been observed elsewhere (see Madajewicz et al. 2007). The method of information delivery may also affect the choice of the new source conditional on switching. In particular, the labeling of wells in C could make safe wells easier to identify for the whole village relative to B, possibly leading to the choice of safer wells. Finally, it is possible that buyers who were not using their well for drinking and cooking at the time of the sales started doing so if they learned that the water was safe. Predictions for differences in demand between arms were not as sharp: while factors leading to a higher willingness to react to information were expected to also lead to a higher willingness to pay for it, key factors such as perceived health risks and the availability of alternative sources were likely to become more salient after the test result was revealed.
However, consistent with the conceptual framework described above, households that reported knowing the safety of their well water were expected to be less likely to purchase a test.
5 Results
This section first describes the estimated effect of the selling schemes on the demand for testing. Next, it describes the information on arsenic levels revealed by the tests, and finally it discusses to what extent such information changed household behavior in terms of the choice of water source for cooking and drinking. In describing the results, the focus is primarily on households who used the well as their primary water source for cooking and drinking at baseline, given that for those who did not, the baseline records do not indicate what the main source was.
5.1 Demand
Of the 11,410 households who used their own well for cooking and drinking at baseline and who were offered an arsenic test, 2,829 (25%) bought a test under one of the selling schemes. To estimate the average treatment effect of selling schemes B and C relative to A, the following equation is estimated using a linear probability model:
$$\mathit{buy}_{svh} = \beta^B B_v + \beta^C C_v + \gamma X_{svh} + \delta_s + \varepsilon_{svh}, \qquad (1)$$
where $\mathit{buy}_{svh}$ is equal to one if household $h$ in village $v$ and stratum $s$ bought a test at baseline and zero otherwise, $B_v$ and $C_v$ are village-level indicator variables for the respective treatments, $X_{svh}$ is a set of predetermined household and tube well characteristics, and $\varepsilon_{svh}$ is an error term. To account for the stratified design, regressions also include strata fixed effects ($\delta_s$). Recall that treatment was stratified by the prevalence of unsafe wells based on BAMWSP data and by union. The coefficients of interest are $\beta^B$ and $\beta^C$, which capture the causal impact on demand of selling schemes B and C, respectively, relative to A. All standard errors and statistical inference are robust to intra-village correlation of residuals.
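For intuition about the coefficients in equation (1): in the special case without controls and without strata fixed effects, an OLS regression of the purchase indicator on a constant and dummies for arms B and C reproduces the arm means, so the two dummy coefficients equal the take-up rates in B and C minus the take-up rate in A. The sketch below computes these differences in plain Python on hypothetical data (the function name `arm_effects` is illustrative). It does not reproduce the paper’s specification, which also includes strata fixed effects, controls, and village-clustered standard errors; those would require a regression library (for example, statsmodels OLS fitted with `cov_type="cluster"`).

```python
from statistics import mean


def arm_effects(buy, arm, base="A"):
    """Take-up in each arm minus take-up in the base arm.

    Equals the dummy coefficients from OLS of buy on a constant and arm
    indicators when there are no controls and no strata fixed effects.
    """
    rates = {g: mean(b for b, a in zip(buy, arm) if a == g) for g in sorted(set(arm))}
    return {g: rates[g] - rates[base] for g in rates if g != base}


# Hypothetical data: 1 = bought a test, 0 = did not; one arm label per household.
buy = [1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
arm = list("AAAABBBBCCCC")
print(arm_effects(buy, arm))  # {'B': 0.25, 'C': 0.0}
```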
Figure 2 shows graphically the simple comparison of take-up rates across arms without the inclusion of controls or strata fixed effects. A first clear result is that neither the group sales nor the addition of the metal placard made any appreciable difference for demand. A second finding is that demand was overall quite low, with about one quarter of households purchasing the test in each of the three experimental arms. As in many earlier studies of demand for health-related preventive products, even a small fee led to low demand, despite the potentially vital information provided by the tests.15 Table 2 displays the corresponding regression results. Column 1 shows that, as expected, the small differences in demand between arms A, B and C are not statistically significant. Column 2 shows that the results are quite robust to the inclusion of controls. Because missing values in one or more of the controls lead to the loss of about 20% of observations, in column 3 the model is estimated without controls but using only the observations with complete data used in column 2. In this case, $\hat{\beta}^B$ barely changes, consistent with the overall good balance between arms A and B suggested by the Imbens and Rubin approach. In contrast, $\hat{\beta}^C$ doubles in magnitude from 3 to 6 percentage points (s.e. 0.034) and becomes significant, although only at the 10% level. Recall that arm C appears to be different mostly because, on average and relative to the other arms, (a) household heads had better education and (b) the fraction of wells whose safety was unknown at the time of the test sales was higher and the fraction believed to be safe was lower. In column 2 it can be seen that low schooling predicts lower demand, while the higher prevalence of wells of unknown safety is positively associated with it, conditional on other observed characteristics. Both these factors suggest that the omission of controls may have biased demand upwards in arm C, although the point estimates remain very close. The coefficient estimates for the controls in column 2 cannot be interpreted causally, but beliefs about the safety status of the well strongly predict demand: well owners who thought their well was safe had little to gain from buying a test, and indeed they were 12 percentage points less likely to purchase it (p-value < 0.01). The belief that the water was unsafe also decreased the probability of purchase, although by less than half as much ($\hat{\beta} = -0.049$, p-value < 0.01). This is overall consistent with the conceptual framework, which highlighted that a key factor for willingness to pay for the test is that its result will provide new information.
15 The low demand also raises concerns about the sustainability of a test-for-fee selling scheme at this price. On an average work day, surveyors visited 25 well owners to offer arsenic tests (with surveyor-specific averages ranging from 12 to 36 visits), but sold only six tests (with surveyor-specific averages ranging from 3.5 to 10.8 tests). Recall that the test price was chosen so that surveyors would earn a wage similar to that earned in the neighboring Araihazar sub-district for similar work by selling 15 tests a day. The low demand thus implies that the actual wage fell short of the expected one.
5.2 Test Results
Although the purchase rate was low, the intervention generated a large increase in the number of tested wells in Sonargaon. Before looking at the responses to the information made available by the tests, it is useful to first describe that information. The test results are summarized in Table 3, which also includes the detailed figures on switching behavior described later; the statistics are therefore calculated for the 10,412 households (91.3% of the total) that could be tracked in the endline survey. Of these, 2,417 purchased tests during the intervention.
Overall, 19% (455/2,417) of the tested wells that had been used for cooking and drinking at baseline had unsafe arsenic levels relative to the national standard of 50 ppb.16 Recall that these results are conditional on demand, so that randomization across treatments does not guarantee similar distributions across arms, even in large samples. However, the distribution of arsenic was overall similar between arms A and B, the two largest arms. Arm C had more unsafe wells (27%, versus 19% and 16% in arms A and B, respectively), although the null of equality among the three arms cannot be rejected at standard levels (p-value = 0.32). Consistent with the existence of a degree of awareness about arsenic risk, at baseline group C was by far the one with the smallest fraction of respondents thinking that their well was safe, although the fraction believing that the well was unsafe was fairly similar between groups (see Table 1). The larger share of unsafe wells in arm C may thus have been the result of lack of balance at baseline arising by chance, possibly due to the small number of clusters (15) in this treatment arm. Overall, at the time of the endline survey, of all the wells found to be unsafe, 30% had at least one well identified as safe within 25 m after the testing, 57% had at least one within 50 m, and 78% had at least one within 100 m. This confirms that, in principle, switching to a nearby safe well was a feasible strategy to mitigate arsenic risk for the large majority of households. In addition, and consistent with the similarity across arms in the prevalence of purchases and unsafe results, the different testing strategies produced very similar frequencies of safe alternatives in the vicinity of high-arsenic wells. At distances of 25, 50, and 100 meters, such frequencies ranged across arms from 27 to 33%, from 55 to 60%, and from 73 to 85%, respectively, and the null of equality is never rejected at standard levels. Note also that these figures may substantially underestimate the potential role of switching in reducing arsenic risk, given that they do not take into account the likely presence of safe wells that were not tested.
16 The prevalence of unsafe wells was much lower than the 40-90% observed at the time of the BAMWSP testing campaign, about 10 years earlier. This is consistent with a degree of learning over time about local arsenic risk and how to avoid it, in particular by installing deeper wells, which are more expensive but perhaps made more affordable by economic development. Indeed, the data indicate that a majority of wells were of recent construction and were deeper than the older ones. The data also show that, among the minority of respondents who reported knowing the safety of their well water, beliefs were strongly correlated with actual test results. Detailed results for these findings are available upon request from the authors.
5.3 Responses to Test Results
Figure 3 shows the raw switching rates, and corresponding confidence intervals, observed in each experimental arm, without any controls or strata fixed effects. Corresponding regression results are shown in Table 4. Consistent with the conceptual framework, little well switching was found among households who did not buy a test, and among those who found that the well they used for drinking was safe. The estimates in Table 3 show that barely anyone moved away from a well identified as safe (12/1,962), while less than 3% of untested wells (224/7,995) stopped being used for drinking. The two bottom bar charts in Figure 3 show that for these two groups the switching rates were similarly very small in all arms.
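The counts quoted in this section fit together: the 2,417 tested wells split into 455 unsafe and 1,962 safe, and the 10,412 tracked households include 7,995 whose well was not tested. A short bookkeeping sketch recovers these denominators from the totals quoted in the text and computes the corresponding switching rates; the variable names are illustrative, and only the counts come from the text.

```python
# Counts quoted in the text for households tracked at endline (Table 3).
TRACKED = 10_412   # households tracked in the endline survey
TESTED = 2_417     # of which bought a test
UNSAFE = 455       # tested wells above the 50 ppb national standard

SAFE = TESTED - UNSAFE          # wells found safe
UNTESTED = TRACKED - TESTED     # wells not tested

# Households that stopped drinking from their baseline well, by test status.
switched = {"safe": 12, "untested": 224}
totals = {"safe": SAFE, "untested": UNTESTED}

for status, n in switched.items():
    print(f"{status}: {n}/{totals[status]} = {n / totals[status]:.1%}")
print(f"unsafe share among tested wells: {UNSAFE / TESTED:.1%}")
```

This prints switching rates of 0.6% (safe) and 2.8% (untested), and an unsafe share of 18.8%, which the text rounds to 19%.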
The primary outcome of interest in this study was the response of users of unsafe wells, but because such responses are conditional on the choice to purchase a test and on the test result, the intent-to-treat (ITT) estimates are presented first, to describe unconditional switching rates. Figure 3.1 shows that while standard individual sales (A) led 3.7% of households to switch water source, the fraction was 4.3% with group sales (16% higher) and 6.4% (73% higher) when metal plates were attached to the well spout in case of purchase. The regression results are displayed in columns 1-2 of Table 4, where the estimated models are as in equation (1). When controls are included (column 2), the difference in switching rates between B and A is 0.01 (95% C.I. [−0.012, 0.03]), while the difference between C and A is 0.03 (95% C.I. [0.003, 0.065]). Consistent with Figure 3.1, both estimates suggest that group sales and especially placards increased switching rates relative to individual sales, but the point estimates are small, and the null of equality is rejected (at the 5% level) only for arm C. Overall, the unconditional switching rates were small in each arm, reducing the statistical power of between-arm comparisons. This is in large part because about three quarters of households did not purchase a test, and a large majority of tested wells were found to be safe. Turning next to the switching rates among households who purchased a test and found their well water to have unsafe arsenic levels, individual sales (arm A) led to a 30% switching rate (60/200), while both group sales (arm B) and metal placards posted on the pump head (arm C) led to substantially higher responses. In B, 56% of households switched (89/160), while in C the rate was even higher, at 72% (68/95). Column 3 of Table 4 shows the corresponding regression results. Both estimated differences are large and significant at the 5% level or below, with $\hat{\beta}^B = 0.27$ (95% C.I. [0.061, 0.484]) and $\hat{\beta}^C = 0.46$ (95% C.I. [0.231, 0.696]). The difference in switching rates between B and C is substantively important (19 percentage points) but is not estimated precisely (p-value = 0.163). The estimated differences become smaller but remain large and significant when baseline controls are included (column 4). For both arms B and C the coefficients are almost identical (column 5) when the model does not include controls but uses only the complete observations used in the regression with controls in column 4. This suggests that the impacts are not substantively biased by differences in the levels of observed confounders. Contrary to expectations, among unsafe wells higher levels of arsenic do not predict more switching. Using 100 ppb as the omitted category, the dummies for arsenic levels of 200, 300, or 500/1000 ppb are actually negative and in some cases very large and statistically significant. This finding is consistent with most households gauging safety primarily in a binary way, an unfortunate possibility given that in reality arsenic health risk is, to first order, proportional to arsenic concentration.17 Figure 4 indeed shows that overall switching rates were well approximated by a step function with a jump at 100 ppb. Note also that very high arsenic levels were not rare in the sample: although less than 30% of tested wells were unsafe, more than half of these had arsenic levels above 100 ppb. The estimates in column 6 also include as a regressor a dummy for the (endogenous) presence of a safe well within 50 meters, where a neighboring well is defined as safe when it was identified as such by the research team. Recall that, at baseline, very few wells could be identified as safe by visible signs such as placards or paint on the well spout. In this model some observations are lost due to implausible entries for the geo-location of some wells.
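Both the proximity shares reported in Section 5.2 and the 50-meter indicator used in column 6 rely on distances between wells computed from GPS coordinates. A minimal sketch, assuming coordinates in decimal degrees and a spherical Earth (haversine great-circle distance); the paper does not state which distance formula was used, and the function names and coordinates below are hypothetical. Given the stated GPS precision of about 10 m at best, the classification of wells lying close to a radius threshold is uncertain.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def safe_well_within(well, safe_wells, radius_m=50.0):
    """1 if any well identified as safe lies within radius_m of `well`, else 0."""
    lat, lon = well
    return int(any(haversine_m(lat, lon, s_lat, s_lon) <= radius_m for s_lat, s_lon in safe_wells))


# Hypothetical coordinates (decimal degrees), for illustration only.
safe = [(23.6504, 90.6000)]
unsafe = [(23.6500, 90.6000), (23.6600, 90.6000)]
print([safe_well_within(w, safe, 50) for w in unsafe])  # [1, 0]
print([safe_well_within(w, safe, 25) for w in unsafe])  # [0, 0]
```

In the example the first unsafe well lies about 44 m from the safe one, so it counts as having a safe alternative within 50 m but not within 25 m; averaging such indicators over unsafe wells gives shares like those reported for 25, 50, and 100 m.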
Controls are also included for the total number of wells within a 50-meter radius, and the dummy for safe wells is interacted with the treatment indicators. Among owners of unsafe wells in arm A, as expected, having a safe alternative nearby increases switching. The coefficient is large (27 percentage points) and significant at the 1% level. In group B this association is weaker: the interaction (−0.19) is negative, with a magnitude of about two-thirds of the coefficient in arm A, although it is imprecisely estimated and not significant at standard levels. This is consistent with informal group agreements leading some households to share wells with other group members with less concern for geographical distance, which may have happened if geographical proximity was a poor proxy for sorting into the same risk-sharing group. This remains, however, a conjecture, given that the data do not allow determining with certainty whether the well being used at endline belonged to a group member. The interaction between the safe-well dummy and the treatment C dummy is again negative (−0.08) but smaller and not significant at standard levels. Of the 217 ‘switchers’, almost all (214) listed safety concerns as the primary reason for their decision. Although the arsenic level in the new source of drinking water for switchers cannot be determined, about one third of these (79/217) had switched to a different well that was itself perceived by the respondent as unsafe, while 88 had switched to a well reported as safe, and the remaining 50 households did not know the status of the well. In principle, even a switch to an unsafe well can reduce exposure to arsenic if the new well is less contaminated, but this finding suggests that in the study area a degree of arsenic exposure remained even among a sizeable fraction of households who reacted to the new information by switching to a different water source for drinking and cooking.
17 In an RCT carried out in 2008 in the neighboring Araihazar sub-district, Bennear et al. (2013) showed that attempts to highlight the existence of such a gradient did not increase switching, with some evidence that they actually decreased it.
5.3.1 Mechanisms
These results are consistent with the conceptual framework, according to which group signing or metal placards would lead to more switching relative to privately provided information. This section provides evidence on possible mechanisms behind the findings, although the arguments are tentative because it is not possible to separate conclusively the role of the increase in information about alternatives from that of the soft commitment (in arm B) or the added salience of the placards (in arm C). There are two key limitations. First, the data include respondents’ beliefs about the well used by their household, as well as the arsenic levels of all tested wells, but the latter were not necessarily known to households. As a consequence, one cannot measure how each intervention changed the whole information set of each household. Second, in arm B the data do not include group composition. It is thus possible to examine neither the nature of the specific groups nor whether households whose water turned out to be unsafe were allowed to drink water from wells belonging to other members of the same group. Despite these limitations, the data suggest that the added salience provided by the placards in arm C played a role in explaining the higher switching rates. In principle, owners who did not want to be reminded that their well water was unsafe, or did not want the information to be known to others, could have removed the placards, although detaching the metal wire holding them to the pump head would have required some effort. However, this behavior was rare.
At the time of the return visits, the vast majority of the 348 placards installed on the well spout at the time of the test were still in place, regardless of their color. Of the 95 red placards installed on unsafe wells, 90 were still visible, while no placard was visible on two wells and a 'black' placard (perhaps a data entry error) was recorded on the remaining three. Almost all blue and green placards similarly remained in place during the study period. This suggests that the testing campaign led to a persistent increase in the salience and visibility of information in villages included in arm C. This result stands in contrast with Barnwal et al. (2017), who found that placards indicating unsafe arsenic levels in Bihar, India, were significantly more likely to be removed by households, although such removals were observed two years after installation, a much longer interval than the average of eight months in this study. The data also indicate that the placards were more effective than result cards alone (as in arm A) at reminding users of the arsenic status of their well water, again suggesting increased salience. At the time of the endline survey, almost 90% of buyers correctly reported whether their water had been found to be unsafe; but while recall of unsafe results was similar in arms A and B, it appeared to be better in C, consistent with the role of placards as reminders. Among respondents whose well water was found to be unsafe, the fraction who correctly identified it as such was 83% in arm A (135/162), 88% in B (122/139), and a remarkable 98% in arm C (83/85). There is also evidence that the placards allowed switchers to make better choices, while group sales, despite inducing more switching relative to individual sales, may have induced households to share wells within the group despite the existence of better options outside the group.
Of the 217 households who stopped drinking from their unsafe well, only 88 (41%) had switched to a well that they believed to be safe. By arm, the fraction was 47%, 27% and 53% in arms A, B and C, respectively. It is possible that switchers from unsafe wells in B started drinking water from wells safer than the original one (unfortunately, it cannot be checked whether this was the case), but the fact that switchers in C were almost twice as likely as those in B to change to a safe well suggests that the placards played a role in enabling better choices. This is also consistent with data on the distance to the new well. Recall that when a respondent reported a change in the main source of drinking water since baseline, the surveyor would ask to be accompanied to the new source, whose GPS location would then be recorded. Unfortunately, such GPS records were clearly incorrect for about 40% of the 217 switchers (this was evident because the new source was located too far from the household residence, usually even outside the village borders). Among the 124 observations with likely correct records, distances to the new source were almost identical in arms A and B (on average 68 and 80 meters, respectively; p-value of the difference = 0.518), while the distance was substantially larger in C (190 meters; p-value of the difference with respect to A < 0.01).18 In sum, the data suggest that placards (C) likely made households more aware of the risks associated with drinking from their own unsafe wells and allowed better choices, sometimes at the cost of longer distances traveled to fetch drinking water. In contrast, less can be said about the mechanisms that made group sales (B) relatively successful, although the results are consistent with the key factors delineated in the conceptual framework.
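Screening GPS records for implausible distances between a household and its new water source, as described above, can be sketched with the haversine great-circle formula. The paper does not report the exact rule used to flag incorrect records, so the cutoff `MAX_PLAUSIBLE_M`, the record layout and the helper names below are hypothetical.

```python
import math
from statistics import mean

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Hypothetical cutoff: records placing the new source farther than this from the home are dropped.
MAX_PLAUSIBLE_M = 1_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def plausible_distances(records, max_m: float = MAX_PLAUSIBLE_M) -> list:
    """records: iterable of (home_lat, home_lon, source_lat, source_lon) tuples.
    Returns the distances of records that pass the plausibility screen."""
    distances = (haversine_m(*r) for r in records)
    return [d for d in distances if d <= max_m]


def mean_distance_by_arm(records_by_arm: dict) -> dict:
    """Mean plausible distance to the new source, per experimental arm.
    Raises statistics.StatisticsError if an arm has no plausible record."""
    return {arm: mean(plausible_distances(recs)) for arm, recs in records_by_arm.items()}
```

A per-arm comparison of this kind, applied to the records that survive the screen, yields averages analogous to the 68, 80 and 190 meters reported above; the results naturally depend on the cutoff chosen.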
5.3.2 Responses Among Non-Users

The results discussed so far relate to the large majority of households who used their well for cooking and drinking at baseline. Perhaps not surprisingly, demand was significantly lower among 'non-users' (12%, vs. 25% among users), and for these households there is no record of their main source of drinking water. Switching behavior is thus harder to analyze, also because the sample is small. However, the data show that many of these households reacted to 'good news' by switching to the well they were not initially using. In the sample, 139 non-users purchased a test, and 126 (91%) of these were re-interviewed at endline. Of these, exactly half (63) found out that their well was safe (As ≤ 50 ppb), and all but 3 reported that they were using the well for drinking and cooking at the time of the endline.19 In contrast, only nine of the 63 with unsafe wells reported that they were using the well. However, for them it is not known whether the well they were using at baseline had been found to have an arsenic level even higher than their own.

18 This finding also suggests that the results are not driven by courtesy bias in reporting switching behavior. In principle, safety information provided publicly (as in arm C) or to a group (as in B) may have induced some respondents to over-report switching behavior if this was perceived as socially desirable. If the higher switching rates in B and C relative to A had been driven by courtesy bias, one would have expected to observe shorter distances in the latter two arms relative to A (the opposite of what the data show), given that the respondent had to accompany the interviewer to the new source of drinking water, and a nearby well would likely have been an easier and faster choice if the only purpose was to back up cheap talk.

19 Of these 63 households, 11 were in arm A, 39 in arm B, and 13 in arm C. Because of the very small numbers involved, the results by arm are not analyzed in detail.

6 Cost Effectiveness

This section evaluates the cost-effectiveness of the different sale strategies. Because the RCT did not include an arm with free provision (unlike earlier studies that only included free provision; see for instance Madajewicz et al. 2007 or Bennear et al. 2013), the merit of free provision relative to the sales strategies used here is gauged by assuming a range of switching rates consistent with earlier studies.20 In addition, recall that the change in arsenic exposure for 'switchers' cannot be estimated reliably, given that the data only include the arsenic level of the initial source. Consistent with figures from this study, the calculations assume that tests cost USD0.30 (or BDT24, using an exchange rate of BDT80 per USD), and that personnel are paid BDT45 per test delivered, with an additional bonus of BDT12 for group sales. An amount of BDT80 per test is added in arm C to account for the cost of the placards. Consistent with the experimental results, a take-up rate of 25% is set in arms A, B and C, while, consistent with results from earlier blanket testing, a 100% testing rate is used when tests are offered for free. Again using estimates from the RCT, the calculations use a 30% switching rate among users of unsafe wells in arm A, while for arms B and C they use the estimates adjusted for controls and strata fixed effects in column 4 of Table 4. That is, switching rates are assumed to be 0.49 in arm B and 0.70 in arm C. In the case of free provision, switching rates are varied from 0.3 to 0.75, consistent with findings from earlier work that evaluated switching after free provision. In the study area, the fraction of unsafe wells varied in the 16-27% range, while the earlier BAMWSP figures for these same villages varied from 40 to 90%.
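The budget arithmetic behind Table 5 can be reproduced in a few lines. This is a sketch under the assumptions stated above (USD10,000 budget, BDT80 per USD, BDT24 per test kit, BDT80 per placard), evaluated at the three fractions of unsafe wells considered in Table 5; the labor cost entries follow the columns of Table 5 (BDT45 under free provision, BDT12 under group sales, none in arms A and C), and the names `STRATEGIES` and `evaluate` are illustrative.

```python
BDT_PER_USD = 80
BUDGET_USD = 10_000
BUDGET_BDT = BUDGET_USD * BDT_PER_USD
TEST_KIT_BDT = 24  # USD0.30 per test

# (take-up, labor cost per test in BDT, placard cost per test in BDT, switching rate among unsafe-well users)
STRATEGIES = {
    "Free provision (low switching)": (1.00, 45, 0, 0.30),
    "Free provision (high switching)": (1.00, 45, 0, 0.75),
    "A: Individual sales": (0.25, 0, 0, 0.30),
    "B: Group sales": (0.25, 12, 0, 0.49),
    "C: Individual sales+placards": (0.25, 0, 80, 0.70),
}


def evaluate(take_up, labor_bdt, placard_bdt, switch_rate, unsafe_fraction):
    """Outcomes of one strategy when the whole budget is spent on it."""
    cost_per_test = TEST_KIT_BDT + labor_bdt + placard_bdt
    tested = BUDGET_BDT / cost_per_test
    found_unsafe = tested * unsafe_fraction
    # Wells reached = tests / take-up; unsafe wells among all wells reached.
    unsafe_in_area = tested / take_up * unsafe_fraction
    switches = found_unsafe * switch_rate
    return {
        "cost_per_test_bdt": cost_per_test,
        "tested": tested,
        "found_unsafe": found_unsafe,
        "unsafe_not_tested": unsafe_in_area - found_unsafe,
        "switches": switches,
        "usd_per_switch": BUDGET_USD / switches,
    }


for fraction in (0.20, 0.40, 0.80):
    for name, params in STRATEGIES.items():
        r = evaluate(*params, fraction)
        print(f"{fraction:.2f} {name:<34} tests={r['tested']:>8,.0f} "
              f"switches={r['switches']:>7,.0f} USD/switch={r['usd_per_switch']:.2f}")
```

Because the unsafe fraction scales the number of unsafe wells found and the number of switches in the same way for every strategy, it shifts the cost per switch proportionally but leaves the ranking of strategies unchanged.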
To cover a wide range of possibilities, calculations are shown using a fraction of unsafe wells that is either low (20%), medium (40%) or high (80%). The results are summarized in Table 5, assuming that a policy maker is deciding how best to allocate a fixed total budget of USD10,000. While free provision maximizes switching opportunities within a given locality, charging a fee allows for larger coverage, at the cost of reducing uptake among those with low willingness or ability to pay. Given the fixed budget, the total number of tests ranges from a maximum of 33,333 with individual sales to a minimum of 7,692 with sales of tests supplied with a metal placard, so that the placards make arm C even more expensive (per test) than free provision without placards. Under the simplifying assumption that the probability of uncovering an unsafe well does not depend on the mode of supply, these figures imply that individual sales (A) would maximize the number of unsafe wells uncovered, followed by group sales (B) and free provision, while sales with placards (C) would perform worst in this respect. However, the relative performance of the strategies changes once the different switching rates are taken into account. In particular, given the high switching rates observed in B and C, and the relatively low cost of group sales (B), it is group sales that maximize

20 Jamil et al. (2019), using data from blanket free testing in Araihazar, estimates a total cost of 50 ppb), respectively.

[Figure 2 plot: % of households purchasing a test (vertical axis) by RCT treatment arm (A: 45, B: 45+group, C: 45+placards); bar labels, from left to right: .246, .252, .241]

Figure 2: Demand for Tests of Arsenic Concentration in Well Water
Source: Authors' estimations from baseline data (January to June 2016). Each bar is labeled with the arm-specific purchase rate. The vertical intervals represent 95% confidence intervals, estimated allowing for intra-village correlation of residuals.
The numbers of observations by arm are, from left to right, n = 5,164 (A), 4,697 (B), and 1,549 (C).

[Figure 3 plot: four panels showing the % of households switching by arm (A: 45, B: 45+group, C: 45+placards). (1) All wells (Intent-to-Treat): A .037, B .043, C .065. (2) Test buyers, well unsafe (As above 50ppb): A .3, B .556, C .716. (3) Test buyers, well safe (As 50ppb or below): A .002, B .01, C .004. (4) Untested wells: A .031, B .026, C .024]

Figure 3: Switching rates
Source: Authors' estimations from endline data (August 2016 to January 2017). Each panel shows the fraction of households who stopped using the baseline well for drinking and cooking and switched to a different water source, by experimental arm. Switching rates are shown separately for all wells (panel 1, top left), wells that tested unsafe (2, top right), wells that tested safe (3, bottom left), and wells that were not tested because the test had not been purchased (4, bottom right). The vertical intervals within each bar are 95% confidence intervals robust to intra-village correlation. The numbers of wells $n_T$, $T \in \{A, B, C\}$, used in each bar are as follows: all wells: $n_A$ = 4,679, $n_B$ = 4,281, $n_C$ = 1,452; unsafe wells: $n_A$ = 200, $n_B$ = 160, $n_C$ = 95; safe wells: $n_A$ = 840, $n_B$ = 869, $n_C$ = 253; untested wells: $n_A$ = 3,639, $n_B$ = 3,252, $n_C$ = 1,104.

[Figure 4 plot: one panel per arm (A: 45, B: 45+group, C: 45+placards); horizontal axis: arsenic level (0, 10, 25, 50, 100, 200, 300, 500/1000 ppb); bars: % at given arsenic level and % switching; legend: Very safe (Blue), Safe (Green), Unsafe (Red)]

Figure 4: Switching rates from tested wells, by arsenic level and experimental arm
Source: Authors' estimations from baseline (January to June 2016) and endline (August 2016 to January 2017) data.
The figures show, for each experimental arm and for tested wells, the prevalence of each arsenic level as identified by the test (light grey bars; in ppb, or micrograms per litre) and the fraction of households who were no longer using the well at endline (dark grey bars). The field tests identified the arsenic level as a value in the set As ∈ {0, 10, 25, 50, 100, 200, 300, 500, 1000}. The values on the horizontal axis are not drawn to scale. Results of As = 1000 were rare, and hence 500 and 1000 were pooled together. Wells with arsenic below the thick vertical line were the safest, while those with arsenic above the second, thin vertical line were labeled unsafe. A household was described as having switched if, at the time of the endline survey, the respondent stated that the main source of water used for drinking and cooking was no longer the well used at baseline.

Table 1: Baseline Summary Statistics and Balance across Treatment Arms

| Variable | Obs. (1) | Mean, A (2) | Mean, B (3) | Mean, C (4) | p-value, H0: A=B=C (5) |
|---|---|---|---|---|---|
| Drink from well at baseline | 12603 | 0.930 | 0.884 | 0.891 | 0.2520 |
| Household head is male | 11410 | 0.843 | 0.850 | 0.860 | 0.9210 |
| Household head wage worker | 10890 | 0.247 | 0.298 | 0.378 | 0.4260 |
| Household head self-employed | 10890 | 0.440 | 0.438 | 0.285 | 0.3390 |
| Household head no schooling | 10890 | 0.262 | 0.163 | 0.054 | <0.001*** |
| Household head primary school | 10890 | 0.317 | 0.343 | 0.319 | 0.7900 |
| Heard about As in well water | 11410 | 0.996 | 0.996 | 0.997 | 0.7430 |
| Aware of health risks of As | 11410 | 0.997 | 0.998 | 0.999 | 0.2950 |
| House has concrete roof | 11252 | 0.175 | 0.184 | 0.133 | 0.4060 |
| Household members | 11045 | 3.630 | 3.570 | 3.590 | 0.8270 |
| Number of Children | 11045 | 1.480 | 1.430 | 1.470 | 0.6920 |
| Well As status unknown (belief) | 10515 | 0.684 | 0.789 | 0.903 | 0.0005*** |
| Well As status unsafe (belief) | 10515 | 0.097 | 0.042 | 0.057 | 0.2590 |
| Well As status safe (belief) | 10515 | 0.220 | 0.168 | 0.041 | <0.001*** |
| Well labeled safe | 10515 | 0.003 | 0.001 | 0.001 | 0.1020 |
| Wells within 50m | 10260 | 10.400 | 14.500 | 12.100 | 0.2530 |
| Wells within 50m labeled safe | 10260 | 0.028 | 0.003 | 0.007 | 0.0302** |
| Share unsafe wells (BAMWSP) | 11410 | 0.759 | 0.720 | 0.782 | 0.4120 |
| Well is privately owned | 11410 | 0.980 | 0.991 | 0.992 | 0.1690 |
| Well depth (×100 feet) | 11410 | 1.830 | 1.770 | 1.730 | 0.8800 |
| Well age (years) | 11410 | 8.860 | 9.490 | 9.000 | 0.0105** |
| Well cost (×10000 BDT) | 11410 | 0.763 | 0.764 | 0.707 | 0.7340 |
| Persons drinking from well | 11343 | 8.880 | 8.490 | 9.750 | 0.5710 |
| Attrition | 11410 | 0.094 | 0.089 | 0.063 | 0.3150 |
| Lost after baseline | 11410 | 0.058 | 0.055 | 0.045 | 0.5990 |
| Duplicate I.D. at baseline | 11410 | 0.036 | 0.033 | 0.018 | 0.4170 |

Notes: Authors' calculations from baseline data (January to June 2016). The unit of observation is the primary household attached to a specific well. The numbers of clusters (villages) in the three arms are 49 (arm A, n = 5,550 wells), 48 (B, n = 5,314) and 15 (C, n = 1,739). Except for the first variable ("Drink from well at baseline"), all variables are summarized for households who used the specific well for cooking and drinking at baseline.
Differences in the number of observations across these variables are explained by missing entries during the data collection. The p-values in column 5 are for tests of the null of equal means across treatment arms (robust to intra-village correlation). Asterisks denote test significance: *** p<0.01, ** p<0.05, * p<0.1. Table A.1 in the appendix shows the detailed results for the normalized differences described in the text.

Table 2: Demand for tests

Dependent variable: Indicator = 1 if household purchased test

| | (1) | (2) | (3) |
|---|---|---|---|
| B: 45+group | -0.004 (0.037) | -0.012 (0.018) | -0.001 (0.018) |
| C: 45+placards | 0.034 (0.032) | 0.030 (0.028) | 0.059* (0.034) |
| Household head is male | | -0.070*** (0.018) | |
| Household head works for wage | | 0.005 (0.016) | |
| Household head self-employed | | 0.037** (0.015) | |
| Household head has no schooling | | -0.126*** (0.014) | |
| Household head has primary only | | -0.066*** (0.012) | |
| Concrete house roof | | 0.073*** (0.015) | |
| No. household members besides children | | 0.039*** (0.009) | |
| No. of children of head in household | | 0.052*** (0.006) | |
| No. of wells within 50m | | -0.003*** (0.001) | |
| No. of visibly safe wells within 50m | | 0.006 (0.037) | |
| Fraction unsafe wells in village (BAMWSP) | | 0.093 (0.086) | |
| Well depth ('00 feet) | | 0.033*** (0.009) | |
| Well age (years) | | -0.001 (0.001) | |
| Well cost ('0000 BDT) | | 0.027* (0.014) | |
| Believes well is safe | | -0.115*** (0.018) | |
| Believes well is unsafe | | -0.049*** (0.015) | |
| Observations | 11,410 | 8,892 | 8,892 |
| R-squared | 0.130 | 0.111 | 0.040 |
| Controls | No | Yes | No |
| Strata FE | Yes | Yes | Yes |
| Mean in A | 0.246 | 0.246 | 0.246 |
| Clusters | 112 | 102 | 102 |

Source: Authors' estimations from baseline data (January to June 2016). The dependent variable is binary and equals 1 if the household purchased the test at baseline. All regressions are estimated with OLS, and group A is the omitted category, as in Model (1). Strata fixed effects include union fixed effects and a dummy = 1 in villages where the % of unsafe wells in the village (estimated by BAMWSP) was below the median. Standard errors (in parentheses) are clustered at the village level.
The smaller sample size in columns 2 and 3 relative to column 1 is due to missing values in one or more controls: column 2 includes the controls, while column 3 excludes them but uses only the observations with complete data. Significance: *** p<0.01, ** p<0.05, * p<0.1.

Table 3: Number of wells by safety status and switching decision

Columns (2)-(7) refer to tested wells, columns (8)-(9) to wells that were not tested.

| | Total wells used for drinking and cooking (1) | Tested: total (2) | Tested: unsafe (proportion) (3) | Safe, switched: Yes (4) | Safe, switched: No (5) | Unsafe, switched: Yes (6) | Unsafe, switched: No (7) | Not tested, switched: Yes (8) | Not tested, switched: No (9) |
|---|---|---|---|---|---|---|---|---|---|
| A: BDT45 | 4,679 | 1,040 | 0.192 | 2 | 838 | 60 | 140 | 113 | 3,526 |
| B: BDT45+Group | 4,281 | 1,029 | 0.155 | 9 | 860 | 89 | 71 | 85 | 3,167 |
| C: BDT45+Placards | 1,452 | 348 | 0.273 | 1 | 252 | 68 | 27 | 26 | 1,078 |
| Total | 10,412 | 2,417 | 0.188 | 12 | 1,950 | 217 | 238 | 224 | 7,771 |
| Tests of equality (p-value), H0: A=B=C | | | 0.3174 | | | | | | |

Notes: Authors' calculations using information from a total of 10,412 wells that were used at baseline for drinking and cooking purposes. Excluded from the analysis are 768 wells used by households that could not be re-contacted at endline, and 406 wells with a duplicate ID at baseline, which thus cannot be matched to endline data on switching decisions.
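The rates plotted in Figure 3 can be recovered from the counts in Table 3. A minimal sketch, assuming the counts are transcribed as shown; the dictionary `TABLE3` and the helper names are illustrative. It reproduces, for example, the switching rates of 0.300, 0.556 and 0.716 among unsafe wells and the 0.0374 intent-to-treat mean in arm A reported in Table 4.

```python
# Counts from Table 3, as (switched, did not switch), by arm and test outcome.
TABLE3 = {
    "A": {"safe": (2, 838), "unsafe": (60, 140), "untested": (113, 3526)},
    "B": {"safe": (9, 860), "unsafe": (89, 71), "untested": (85, 3167)},
    "C": {"safe": (1, 252), "unsafe": (68, 27), "untested": (26, 1078)},
}


def switching_rate(arm: str, group: str) -> float:
    """Share of wells in a test-outcome group that were no longer used at endline."""
    switched, stayed = TABLE3[arm][group]
    return switched / (switched + stayed)


def itt_switching_rate(arm: str) -> float:
    """Intent-to-treat rate: switchers over all wells in the arm, tested or not."""
    switched = sum(cell[0] for cell in TABLE3[arm].values())
    total = sum(sum(cell) for cell in TABLE3[arm].values())
    return switched / total


def unsafe_share_of_tested(arm: str) -> float:
    """Column (3): fraction of tested wells that were found unsafe."""
    safe = sum(TABLE3[arm]["safe"])
    unsafe = sum(TABLE3[arm]["unsafe"])
    return unsafe / (safe + unsafe)


for arm in TABLE3:
    print(arm,
          f"unsafe share {unsafe_share_of_tested(arm):.3f}",
          f"switch|unsafe {switching_rate(arm, 'unsafe'):.3f}",
          f"ITT {itt_switching_rate(arm):.4f}")
```

These are raw shares; the regression estimates in Table 4 additionally include strata fixed effects, so their implied differences need not match the raw gaps exactly.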
Table 4: Choice of water source at endline

Dependent variable: Indicator = 1 if switched (no longer uses the same well used at baseline). Columns (1)-(2): all wells (ITT). Columns (3)-(6): conditional on purchase and unsafe well (As > 50ppb).

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| B: BDT45+group | 0.010 (0.012) | 0.009 (0.010) | 0.273** (0.106) | 0.186** (0.088) | 0.191* (0.103) | 0.344*** (0.097) |
| C: BDT45+placards | 0.038** (0.017) | 0.034** (0.016) | 0.463*** (0.117) | 0.398*** (0.106) | 0.390*** (0.121) | 0.389*** (0.121) |
| Household head is male | | -0.020** (0.009) | | -0.063 (0.078) | | -0.068 (0.091) |
| Household head works for wage | | 0.002 (0.006) | | -0.033 (0.056) | | 0.012 (0.058) |
| Household head self-employed | | 0.029*** (0.010) | | 0.233*** (0.070) | | 0.208*** (0.057) |
| Household head has no schooling | | -0.007 (0.009) | | 0.076 (0.097) | | 0.151 (0.118) |
| Household head has primary only | | -0.009 (0.007) | | 0.031 (0.059) | | 0.027 (0.062) |
| Concrete house roof | | 0.005 (0.006) | | 0.028 (0.079) | | 0.079 (0.078) |
| No. household members besides children | | -0.002 (0.002) | | -0.052 (0.032) | | -0.055 (0.045) |
| No. of children of head in household | | 0.004** (0.002) | | 0.013 (0.028) | | 0.013 (0.030) |
| Well depth ('00 feet) | | -0.015** (0.006) | | 0.011 (0.032) | | -0.034 (0.041) |
| Well age (years) | | 0.001 (0.000) | | -0.004 (0.004) | | -0.002 (0.004) |
| Well cost ('0000 BDT) | | -0.016*** (0.006) | | -0.090*** (0.034) | | -0.078** (0.038) |
| Believes well is unsafe | | 0.029 (0.020) | | 0.060 (0.081) | | 0.028 (0.076) |
| Believes well is safe | | -0.022*** (0.007) | | -0.113 (0.108) | | -0.123 (0.123) |
| As = 200ppb | | | | -0.039 (0.062) | | 0.012 (0.066) |
| As = 300ppb | | | | -0.151** (0.061) | | -0.157** (0.066) |
| As = 500 or 1000ppb | | | | -0.148* (0.077) | | -0.114 (0.084) |
| Number of wells within 50m | | | | | | -0.002 (0.005) |
| There is at least one safe well within 50m | | | | | | 0.270*** (0.085) |
| B × at least one safe well within 50m | | | | | | -0.188 (0.123) |
| C × at least one safe well within 50m | | | | | | -0.076 (0.121) |
| Observations | 10,412 | 9,385 | 455 | 407 | 407 | 355 |
| R-squared | 0.012 | 0.028 | 0.169 | 0.261 | 0.176 | 0.298 |
| Controls | No | Yes | No | Yes | No | Yes |
| Strata FE | Yes | Yes | Yes | Yes | Yes | Yes |
| Mean in A | 0.0374 | 0.0374 | 0.300 | 0.300 | 0.300 | 0.300 |
| Test of equality B = C, p-value | 0.124 | 0.136 | 0.163 | 0.0676 | 0.140 | 0.728 |
| Clusters | 112 | 105 | 76 | 71 | 71 | 66 |

Notes: Authors' estimations from baseline and endline data. The Intent-to-Treat results in columns 1 and 2 show switching rates not conditional on purchase or test result, including all households who used the well at baseline and who could be matched between the baseline and endline surveys. All regressions in columns 3-6 include only observations for which the well was used for cooking and drinking purposes at baseline, a test was purchased, and the test indicated unsafe levels of arsenic in the water (As > 50 ppb). In column 2 (relative to column 1) and in column 4 (relative to column 3), the decrease in the number of observations is due to missing values in controls, and in column 6 some additional observations are lost because the GPS location was not recorded correctly. The model in column 5 is the same as in column 3 but uses only the observations with complete data used in column 4. All regressions are estimated using a linear probability model where the dependent variable is a dummy equal to one if the well was no longer used for cooking and drinking at endline. Standard errors (in parentheses) are clustered at the village level. Asterisks denote statistical significance: *** p<0.01, ** p<0.05, * p<0.1.
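Two quantities used in the text follow from Table 4 by simple arithmetic: the switching rates assumed for arms B and C in the cost-effectiveness exercise (the arm A mean plus the column 4 coefficients), and the arm-specific association between a nearby safe well and switching in column 6 (main effect plus interaction). The sketch below works with point estimates only and ignores standard errors; in particular, the B and C interactions are not statistically significant. Names are illustrative.

```python
# Selected point estimates from Table 4.
MEAN_SWITCH_A = 0.300  # switching among unsafe-well buyers in arm A (columns 3-6)
ARM_COEF_COL4 = {"A": 0.0, "B": 0.186, "C": 0.398}  # column 4, with controls

SAFE_NEARBY_COL6 = 0.270  # at least one safe well within 50m (arm A, column 6)
INTERACTION_COL6 = {"A": 0.0, "B": -0.188, "C": -0.076}


def implied_switch_rate(arm: str) -> float:
    """Arm A mean plus arm coefficient: rates used in the cost-effectiveness section."""
    return MEAN_SWITCH_A + ARM_COEF_COL4[arm]


def safe_nearby_association(arm: str) -> float:
    """Main effect plus interaction: association of a nearby safe well with switching."""
    return SAFE_NEARBY_COL6 + INTERACTION_COL6[arm]


for arm in ("A", "B", "C"):
    print(arm, f"implied rate {implied_switch_rate(arm):.2f}",
          f"safe-well association {safe_nearby_association(arm):.3f}")
```

Rounded to two decimals, the implied rates are 0.30, 0.49 and 0.70, and the implied safe-well associations are 0.270, 0.082 and 0.194 in arms A, B and C.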
Table 5: Cost Effectiveness of Different Sale Strategies

| Strategy | Demand (share) | Labor cost per test (BDT) | Unit cost for placard (BDT) | Total cost per test (BDT) | Wells tested given budget | Wells found to be unsafe | Total number of unsafe wells in area | Switch rate | Wells no longer used | Cost per switch (USD) | Unsafe wells not tested |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fraction with unsafe wells 0.20 | | | | | | | | | | | |
| Free provision | 1 | 45 | 0 | 69 | 11,594 | 2,319 | 2,319 | 0.30 | 696 | 14.38 | 0 |
| Free provision | 1 | 45 | 0 | 69 | 11,594 | 2,319 | 2,319 | 0.75 | 1,739 | 5.75 | 0 |
| A: Individual sales | 0.25 | 0 | 0 | 24 | 33,333 | 6,667 | 26,667 | 0.30 | 2,000 | 5.00 | 20,000 |
| B: Group sales | 0.25 | 12 | 0 | 36 | 22,222 | 4,444 | 17,778 | 0.49 | 2,178 | 4.59 | 13,333 |
| C: Individual sales+placards | 0.25 | 0 | 80 | 104 | 7,692 | 1,538 | 6,154 | 0.70 | 1,077 | 9.29 | 4,615 |
| Fraction with unsafe wells 0.40 | | | | | | | | | | | |
| Free provision | 1 | 45 | 0 | 69 | 11,594 | 4,638 | 4,638 | 0.30 | 1,391 | 7.19 | 0 |
| Free provision | 1 | 45 | 0 | 69 | 11,594 | 4,638 | 4,638 | 0.75 | 3,478 | 2.88 | 0 |
| A: Individual sales | 0.25 | 0 | 0 | 24 | 33,333 | 13,333 | 53,333 | 0.30 | 4,000 | 2.50 | 40,000 |
| B: Group sales | 0.25 | 12 | 0 | 36 | 22,222 | 8,889 | 35,556 | 0.49 | 4,356 | 2.30 | 26,667 |
| C: Individual sales+placards | 0.25 | 0 | 80 | 104 | 7,692 | 3,077 | 12,308 | 0.70 | 2,154 | 4.64 | 9,231 |
| Fraction with unsafe wells 0.80 | | | | | | | | | | | |
| Free provision | 1 | 45 | 0 | 69 | 11,594 | 9,275 | 9,275 | 0.30 | 2,783 | 3.59 | 0 |
| Free provision | 1 | 45 | 0 | 69 | 11,594 | 9,275 | 9,275 | 0.75 | 6,957 | 1.44 | 0 |
| A: Individual sales | 0.25 | 0 | 0 | 24 | 33,333 | 26,667 | 106,667 | 0.30 | 8,000 | 1.25 | 80,000 |
| B: Group sales | 0.25 | 12 | 0 | 36 | 22,222 | 17,778 | 71,111 | 0.49 | 8,711 | 1.15 | 53,333 |
| C: Individual sales+placards | 0.25 | 0 | 80 | 104 | 7,692 | 6,154 | 24,615 | 0.70 | 4,308 | 2.32 | 18,462 |

The estimates show the responses to different sale strategies, assuming a total budget of USD10,000. Each test is assumed to cost USD0.30 (BDT24), while testers are assumed to be paid BDT45 per test delivered, with an additional bonus of BDT12 for group sales. BDT80 per test are added in arm C to account for the cost of the metal placards. The take-up rate is assumed to be 25% in arms A, B and C, and 100% when tests are offered for free.
Switching rates among users of unsafe wells are assumed to be 30%, 49% and 70% in arms A, B, and C, respectively. In the case of free provision, switching rates are varied from 0.3 to 0.75, consistent with earlier studies in neighboring areas.

A Appendix (not for publication)

A.1 Details about Calculation of Normalized Differences

Here we describe the details of the calculations of the "normalized differences" among arms, following approaches described in Imbens and Rubin (2015, Ch. 14). The results are in Appendix Table A.1, which is an extended version of Table 1. First, for each covariate X in the table, and for each pair of experimental arms $a, a' \in \{A, B, C\}$, $a \neq a'$, columns 8-10 show normalized differences calculated as

$$\hat{\Delta}(X)_{a,a'} = \frac{\bar{X}_a - \bar{X}_{a'}}{\sqrt{\left(s_a^2 + s_{a'}^2\right)/2}} \qquad (2)$$

where $\bar{X}_a$ and $s_a^2$ are the sample mean and sample variance of the variable X in arm a, respectively. We also report, for each pair of arms a and a′, a 'multivariate difference' estimated with a Mahalanobis distance calculated as

$$\hat{\Delta}_{a,a'} = \sqrt{\left(\bar{\mathbf{X}}_a - \bar{\mathbf{X}}_{a'}\right)^{\top} \left(\frac{\hat{\Sigma}_a + \hat{\Sigma}_{a'}}{2}\right)^{-1} \left(\bar{\mathbf{X}}_a - \bar{\mathbf{X}}_{a'}\right)} \qquad (3)$$

where $\bar{\mathbf{X}}_a$ and $\hat{\Sigma}_a$ are the vector of means and the covariance matrix for all variables in arm a. In calculating these multivariate differences we exclude, because there is barely any variation in the data, the dummy for awareness of arsenic in well water, the dummy for awareness of associated health risks, and the dummy for the well being privately owned. Although Imbens and Rubin do not propose formal tests based on these statistics to gauge balance, they argue that balance is excellent in an empirical illustration where all standardized differences are smaller than 0.3 and the multivariate measure is 0.44. In contrast, simple regression adjustments are deemed likely to be inadequate to eliminate bias in cases where some standardized differences are larger than 0.50 and the multivariate measure is 1.5 or above.
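Equations (2) and (3) can be computed directly from arm-level data. A minimal pure-Python sketch, assuming each arm's data are given as lists of observations (one row per household, one column per covariate) with no missing values, and assuming sample variances and covariances with an n − 1 denominator; the function names are illustrative, and the small linear solver stands in for what a linear-algebra library would normally do.

```python
import math
from statistics import mean, variance


def normalized_difference(xs_a, xs_b):
    """Equation (2): difference in means over the root of the average sample variance."""
    pooled = (variance(xs_a) + variance(xs_b)) / 2
    return (mean(xs_a) - mean(xs_b)) / math.sqrt(pooled)


def _mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [mean(r[j] for r in rows) for j in range(len(rows[0]))]


def _covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of equal-length rows."""
    n, k = len(rows), len(rows[0])
    mu = _mean_vector(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def _solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (aug[r][k] - sum(aug[r][c] * x[c] for c in range(r + 1, k))) / aug[r][r]
    return x


def multivariate_difference(rows_a, rows_b):
    """Equation (3): Mahalanobis distance between arm means using the average covariance."""
    diff = [ma - mb for ma, mb in zip(_mean_vector(rows_a), _mean_vector(rows_b))]
    cov_a, cov_b = _covariance(rows_a), _covariance(rows_b)
    avg_cov = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(cov_a, cov_b)]
    weights = _solve(avg_cov, diff)  # (average covariance)^(-1) times the mean difference
    return math.sqrt(sum(d * w for d, w in zip(diff, weights)))
```

With a single covariate, the multivariate measure reduces to the absolute value of the univariate normalized difference, which is a convenient consistency check between equations (2) and (3).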
Figure A.1: Study area and treatment assignment
Notes: Authors' illustrations from the geo-location of study villages recorded at baseline. Each village is placed at the mean latitude and longitude of all well-owners interviewed at baseline in the village.

Table A.1: Baseline Summary Statistics and Balance across Treatment Arms

| Variable | Obs. (1) | Overall mean (2) | Overall St.Dev. (3) | Mean, A (4) | Mean, B (5) | Mean, C (6) | p-value, H0: A=B=C (7) | Std. diff. A/B (8) | Std. diff. A/C (9) | Std. diff. B/C (10) |
|---|---|---|---|---|---|---|---|---|---|---|
| Drink from well at baseline | 12603 | 0.905 | 0.293 | 0.930 | 0.884 | 0.891 | 0.2520 | 0.161 | 0.139 | -0.022 |
| Household head is male | 11410 | 0.848 | 0.359 | 0.843 | 0.850 | 0.860 | 0.9210 | -0.018 | -0.047 | -0.028 |
| Household head wage worker | 10890 | 0.285 | 0.452 | 0.247 | 0.298 | 0.378 | 0.4260 | -0.115 | -0.287 | -0.171 |
| Household head self-employed | 10890 | 0.419 | 0.493 | 0.440 | 0.438 | 0.285 | 0.3390 | 0.004 | 0.327 | 0.323 |
| Household head no schooling | 10890 | 0.193 | 0.395 | 0.262 | 0.163 | 0.054 | <0.001*** | 0.245 | 0.596 | 0.355 |
| Household head primary school | 10890 | 0.328 | 0.470 | 0.317 | 0.343 | 0.319 | 0.7900 | -0.055 | -0.005 | 0.05 |
| Heard about As in well water | 11410 | 0.996 | 0.062 | 0.996 | 0.996 | 0.997 | 0.7430 | -0.007 | -0.029 | -0.022 |
| Aware of health risks of As | 11410 | 0.998 | 0.044 | 0.997 | 0.998 | 0.999 | 0.2950 | -0.018 | -0.047 | -0.031 |
| House has concrete roof | 11252 | 0.173 | 0.378 | 0.175 | 0.184 | 0.133 | 0.4060 | -0.024 | 0.117 | 0.142 |
| Household members | 11045 | 3.600 | 1.300 | 3.630 | 3.570 | 3.590 | 0.8270 | 0.048 | 0.037 | -0.014 |
| Number of Children | 11045 | 1.460 | 1.040 | 1.480 | 1.430 | 1.470 | 0.6920 | 0.054 | 0.016 | -0.041 |
| Well As status unknown (belief) | 10515 | 0.758 | 0.428 | 0.684 | 0.789 | 0.903 | 0.0005*** | -0.242 | -0.562 | -0.319 |
| Well As status unsafe (belief) | 10515 | 0.069 | 0.253 | 0.097 | 0.042 | 0.057 | 0.2590 | 0.214 | 0.151 | -0.066 |
| Well As status safe (belief) | 10515 | 0.173 | 0.379 | 0.220 | 0.168 | 0.041 | <0.001*** | 0.13 | 0.553 | 0.427 |
| Well labeled safe | 10515 | 0.002 | 0.040 | 0.003 | 0.001 | 0.001 | 0.1020 | 0.049 | 0.05 | 0.002 |
| Wells within 50m | 10260 | 12.300 | 15.600 | 10.400 | 14.500 | 12.100 | 0.2530 | -0.239 | -0.222 | 0.14 |
| Wells within 50m labeled safe | 10260 | 0.015 | 0.128 | 0.028 | 0.003 | 0.007 | 0.0302** | 0.19 | 0.157 | -0.046 |
| Share unsafe wells (BAMWSP) | 11410 | 0.746 | 0.132 | 0.759 | 0.720 | 0.782 | 0.4120 | 0.295 | -0.193 | -0.46 |
| Well is privately owned | 11410 | 0.986 | 0.117 | 0.980 | 0.991 | 0.992 | 0.1690 | -0.087 | -0.103 | -0.018 |
| Well depth (×100 feet) | 11410 | 1.790 | 1.080 | 1.830 | 1.770 | 1.730 | 0.8800 | 0.054 | 0.096 | 0.045 |
| Well age (years) | 11410 | 9.130 | 7.570 | 8.860 | 9.490 | 9.000 | 0.0105** | -0.084 | -0.019 | 0.062 |
| Well cost (×10000 BDT) | 11410 | 0.756 | 0.642 | 0.763 | 0.764 | 0.707 | 0.7340 | -0.001 | 0.119 | 0.095 |
| Persons drinking from well | 11343 | 8.840 | 11.100 | 8.880 | 8.490 | 9.750 | 0.5710 | 0.036 | -0.071 | -0.101 |
| Attrition | 11410 | 0.088 | 0.283 | 0.094 | 0.089 | 0.063 | 0.3150 | | | |
| Lost after baseline | 11410 | 0.055 | 0.228 | 0.058 | 0.055 | 0.045 | 0.5990 | | | |
| Duplicate I.D. at baseline | 11410 | 0.032 | 0.177 | 0.036 | 0.033 | 0.018 | 0.4170 | | | |
| Multivariate standardized differences | | | | | | | | 0.604 | 1.103 | 0.720 |

Notes: Authors' calculations from baseline data (January to June 2016). The unit of observation is the primary household attached to a specific well. The numbers of clusters (villages) in the three arms are 49 (arm A, n = 5,550 wells), 48 (B, n = 5,314) and 15 (C, n = 1,739). Except for the first variable ("Drink from well at baseline"), all variables are summarized for households who used the specific well for cooking and drinking at baseline. Differences in the number of observations across these variables are explained by missing entries during the data collection. The p-values in column 7 are for tests of the null of equal means across treatment arms (robust to intra-village correlation). Asterisks denote test significance: *** p<0.01, ** p<0.05, * p<0.1. The normalized differences in columns 8-10 are calculated as in equation (2), while the multivariate standardized differences in the last row are calculated as in equation (3); see Section 3.1 for details.