Policy Research Working Paper 10052

Refugees, Diversity and Conflict in Sub-Saharan Africa

Luisito Bertinelli
Rana Comertpay
Jean-François Maystadt

Social Sustainability and Inclusion Global Practice
May 2022

Abstract

Despite mixed empirical evidence, refugees have been blamed for spreading conflict in the countries that receive them. This paper hypothesizes that such a relationship largely depends on the resulting change in ethnic composition of refugee-hosting areas. To test this, this paper investigates changes in diversity in refugee-hosting areas across 23 countries in sub-Saharan Africa between 2005 and 2016. The paper then assesses the likelihood of conflict in relation to the changing level of ethnic fractionalization and ethnic polarization. Ethnic fractionalization measures the probability that two individuals drawn at random from a society will belong to two different ethnic groups and thus increases with the number of ethnic groups present. Ethnic polarization captures antagonism between individuals and is maximized when the society is divided into two equally sized and distant ethnic groups. Refugee polarization is found to exacerbate the risk of conflict, with a one standard deviation increase in the polarization index increasing the incidence of violent conflict by 5 percentage points. Such an effect corresponds to a 10 percent increase at the mean. The opposite effect is found for the fractionalization index. Additional analyses are also conducted based on individual data. Ethnic polarization increases the likelihood of experiencing physical assault by 2.1 percentage points. Inversely, the equivalent change in the ethnic fractionalization index decreases the likelihood of experiencing physical assault by 1.9 percentage points. Similar effects are found for interpersonal crime. The results should not be interpreted as evidence that refugees per se impact the likelihood of violence. Indeed, there is no evidence of a significant correlation between the number of refugees and the occurrence of conflict. Instead, the analysis points to the risk of conflict when refugees exacerbate ethnic polarization in the hosting communities. In contrast, a situation where refugee flows increase the level of ethnic fractionalization is likely to see an attenuated risk of violence. This certainly calls for specific interventions in polarized refugee-hosting communities.

This paper is a product of the Social Sustainability and Inclusion Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at j.maystadt@lancaster.ac.uk.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team

Refugees, Diversity and Conflict in Sub-Saharan Africa*

Luisito Bertinelli†  Rana Comertpay‡  Jean-François Maystadt§

Keywords: Refugees, Diversity, Conflict, Migration, Africa
JEL-Classification: D74, F22, J15, O15, Q34

* This paper was commissioned by the World Bank Social Sustainability and Inclusion Global Practice as part of the activity “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts.” The activity is task managed by Audrey Sacks and Susan Wong with assistance from Stephen Winkler. This work is part of the program “Building the Evidence on Protracted Forced Displacement: A Multi-Stakeholder Partnership”. The program is funded by UK aid from the United Kingdom’s Foreign, Commonwealth and Development Office (FCDO); it is managed by the World Bank Group (WBG) and was established in partnership with the United Nations High Commissioner for Refugees (UNHCR). The scope of the program is to expand the global knowledge on forced displacement by funding quality research and disseminating results for the use of practitioners and policy makers. This work does not necessarily reflect the views of FCDO, the WBG or UNHCR.
† Department of Economics and Management, University of Luxembourg, L-1359 Luxembourg
‡ Department of Economics and Management, University of Luxembourg, L-1359 Luxembourg
§ IRES/LIDAM, UCLouvain; FNRS - Fonds de la Recherche Scientifique, Belgium

1 Introduction

Over the last decade, the number of refugees has more than doubled (UNHCR 2020) to over 70 million. Although forced displacement in the Middle East (e.g., from Syria) and Latin America (e.g., from Venezuela) has attracted considerable attention, the African continent hosted 6.3 million refugees at the end of 2019, a number that has almost tripled since 2010. Most are accommodated in camps, mainly in neighboring countries and in so-called protracted situations.
Over the same period, there has been a boom in the economic literature investigating the impact of refugees on the hosting economies (Ruiz and Vargas-Silva, 2013; Maystadt et al., 2019; Verme and Schuettler, 2021). The general consensus is that refugees do not necessarily constitute a burden but may induce important distributional changes among the refugee-hosting population. However, the focus on labor markets may not help in fully understanding the structural changes induced by these population flows. For instance, a standard narrative has been that refugees may induce non-cooperative behaviors that lead to the emergence of conflict between (ethnic) groups (Brzoska and Frohlich, 2016; Burrows and Kinney, 2016; Mach et al., 2019, 2020). This study aims to assess the consequences of forced migration on ethnic diversity and conflict in Sub-Saharan Africa. We combine a unique dataset on refugee camps with individual data from the Afrobarometer Surveys across 23 African countries for the 2005—2016 period. We construct two standard measures of ethnic diversity: indices of ethnic fractionalization (EF) and ethnic polarization (EP). Ethnic fractionalization measures the probability that two individuals drawn from the society at random will belong to two different ethnic groups and thus increases with the number of ethnic groups present. Ethnic polarization captures antagonism between individuals and is maximized when the society is divided into two equally sized and distant ethnic groups. Although these indices have been widely used, little variation over time has been found, making causal inference difficult. The innovative aspect of our analysis is that we use data on the precise locations of refugee camps, their yearly size, and—most importantly—their annual composition in terms of countries of origin. Combined with the Ethnic Power Relations - Ethnicity of Refugees 2019 dataset, we are able to predict changes in ethnic diversity induced by refugee inflows. 
We then assess the relationship between refugee diversity and the likelihood of conflict. In an additional analysis, we also assess how refugee-induced changes in diversity affect the incidence of theft and violence, participation in protests, and perceptions of ethnic attachment, inter-personal trust, and institutional trust.

Other studies have investigated the links between displacement and social conflict or social cohesion in contexts as diverse as Greece (Murard, 2021) and Germany (Albarosa and Elsner, 2021). Closer to our context, Betts et al. (2021), Zhou et al. (2021), Sedova et al. (2021) and Pham et al. (2021) study the impact of displacement on conflict and social cohesion in Uganda, Nigeria, and the Democratic Republic of Congo. Betts et al. (2021) find that in Uganda, the intensity of refugee interactions with hosts does not strongly affect their perceptions of refugees. A similar finding is reported by Zhou and Lyall (2021) in Afghanistan. Also in Uganda, Zhou et al. (2021) provide strong evidence that proximity to refugees improves access to aid and public service delivery and does not affect attitudes towards migration. In contrast, Sedova et al. (2021) find that the presence of internally displaced people (IDP) in Nigeria exacerbates local conflict, although this is due to increased inequality and not eroded social cohesion. Finally, Pham et al. (2021) find that displacement is negatively correlated with perceptions of social cohesion as a whole in the Democratic Republic of Congo but that at the individual level, those who report hosting refugees tend to also have higher perceptions of social cohesion. Beyond the specificities of the Ugandan refugee policy or the situation in Eastern DRC, these papers all build on some version of contact theory, which holds that contact between groups can reduce prejudice and increase tolerance. We take a different perspective, focusing on one particular channel, i.e., changes in ethnic diversity.
In line with the theoretical work of Esteban and Ray (2011), we find a positive effect of ethnic polarization on the occurrence of various types of conflict. A one standard deviation increase in the polarization index increases the incidence of violent conflict by 5 percentage points. This corresponds to a 10 percent increase at the mean. The opposite effect is found for the fractionalization index. Our results should not be interpreted as evidence that refugees per se impact the likelihood of violence. Indeed, we do not find any significant correlation between the number of refugees and the occurrence of conflict, confirming previous results from Zhou and Shaver (2021).1 Coniglio and Vurchio (2021) find that refugees do exacerbate the risk of conflict, but only in the initial two years of presence. The null effect in Zhou and Shaver (2021) or the short-term nature of the effect in Coniglio and Vurchio (2021) can be explained by compensating economic effects (Maystadt et al., 2019; Verme and Schuettler, 2021; Alix-Garcia et al., 2018) materializing, for example, through labor markets (Buscher and Vlassenroot, 2009; Maystadt and Verwimp, 2014; Ruiz and Vargas-Silva, 2016, 2015), improved consumption (Maystadt and Verwimp, 2014; Kreibaum, 2016; Taylor et al., 2016; Alloush et al., 2017; Fotz and Shibuya, 2021), improved infrastructure (Maystadt and Duranton, 2019), or the provision of local public goods (Maystadt and Verwimp, 2014; Kreibaum, 2016; Zhou et al., 2021).2 Without contradicting these compensating economic effects, our results offer an alternative explanation: it is not the size of the refugee flows that matters but the way they alter the ethnic composition of the refugee-hosting areas. More specifically, the risk of conflict increases when refugees increase ethnic polarization in the hosting communities. In contrast, a situation where refugee flows raise the level of ethnic fractionalization is likely to see an attenuated risk of violence.3 Our results are robust to alternative definitions of conflict, constructions of the diversity indices, specifications, and samples. Overall, our results call for particular attention to be paid to the reception of refugees in highly polarized communities. Specific interventions aiming to reduce prejudice and strengthen cooperation between groups need to be promoted. Additional analyses suggest that the unemployed might be particularly sensitive to the effect of changing diversity in highly polarized refugee-hosting communities.

1 Zhou et al. (2021) review earlier reports of social tensions arising between refugees and local citizens through resource competition or ethnic rivalry (Salehyan, 2006; Ruegger, 2017), but the validity of this literature has been largely questioned in Zhou and Shaver (2021).

The remainder of our paper is structured as follows. Section 2 provides the context of our study. Section 3 introduces our theoretical motivations. Section 4 presents our research design. First, we present our identification strategy (Section 4.1). We then describe our data to offer a clear understanding of the sample used in our study (Section 4.2). Finally, we propose an instrumental variable approach (Section 4.3). In Section 5, we first present our main results (Section 5.1). Then we extend the analysis to non-violent conflict (Section 5.2), conduct some robustness tests (Section 5.3), present results from our instrumental variable approach (Section 5.4), and finally, we discuss the level of aggregation used to build our diversity indices (Section 5.5).
Section 6 discusses alternative explanations, heterogeneity, and the limits of our analysis. Section 7 concludes with policy implications.

2 We purposefully focus on studies of the African context, but such observations can be extended to other contexts (Maystadt et al., 2019).
3 Note that these results are not necessarily in contradiction with the finding that ethnolinguistic proximity between refugee and host populations is associated with more positive attitudes (Betts et al., 2021). If similar results were to be found for our 23 countries, our paper would indicate the need for further investigation in terms of area-based fractionalization and polarization.

2 Context

The global number of refugees increased from about 10 million in 2010 to 20.4 million at the end of 2019 (United Nations High Commissioner for Refugees, 2020).4 Although refugees have been found to travel longer distances than in the 1980s (Devictor et al., 2021), the majority of refugees are still hosted in neighboring countries, which themselves often face challenging socioeconomic conditions. United Nations High Commissioner for Refugees (2020) estimates that 73 percent of refugees reside in neighboring countries, and developing countries host about 85 percent of the world’s refugees. The number of refugees under the United Nations High Commissioner for Refugees (UNHCR) mandate and residing in Sub-Saharan Africa increased from 2.2 to 6.3 million over the same period (United Nations High Commissioner for Refugees, 2020). Other regions have seen two significant changes that took place more abruptly. The war in Syria translated into a significant increase in refugees arriving in Europe and the Middle East (the latter is encompassed in “Asia and Pacific” in Figure B.1) after 2011. A more recent rise in Latin America has been driven by a surge in Venezuelan refugees. In contrast to these recent events, Africa has seen a continuous increase of displaced people between 2005 and 2020. These population movements have been largely driven by civil wars and political instability in countries like South Sudan, the Democratic Republic of Congo, the Central African Republic, Somalia, Burundi, and Eritrea (Figure B.2). Until 2016, the majority of refugees were hosted in neighboring countries, but we confirm the general trend observed by Devictor et al. (2021) of greater geographical dispersal over time (Figure B.3). Chad, the Democratic Republic of Congo, Ethiopia, Rwanda, South Sudan, Sudan, the United Republic of Tanzania, and Uganda are among both the least developed countries and those hosting the largest numbers of refugees (Figure B.4). Finally, forced displacement in Sub-Saharan Africa is further characterized by the protracted nature of refugee situations (Verwimp and Maystadt, 2015). Figure B.6 shows that the number of protracted refugee situations is not only higher in Africa but has also been increasing sharply over the last decade.5

4 A refugee is defined as “a person who has been forced to flee his or her country because of persecution for reasons of race, religion, nationality, political opinion or membership in a particular social group” (1951 Convention Relating to the Status of Refugees and the 1967 Protocol, Art 1.A.2.) and includes people in refugee-like situations.
5 Although it recognizes the statistical limits of such a definition, the United Nations High Commissioner for Refugees (2020, 24) “defines a protracted refugee situation as one in which 25,000 or more refugees from the same nationality have been in exile for at least five consecutive years in a given host country” (excluding Palestinian refugees under the United Nations Relief and Works Agency (UNRWA) mandate).

3 Theoretical motivation

Ethnic diversity has been argued to be strongly linked to the non-cohesiveness of a society (Alesina et al., 2016; Arbatli et al., 2020), potentially leading to the most extreme outcome of organized violence (Esteban and Ray, 1994, 1999; Collier and Hoeffler, 1998; Fearon and Laitin, 2003; Esteban et al., 2012b; Amodio and Chiovelli, 2018; Bazzi and Gudgeon, 2021). Others have investigated intermediary outcomes such as the prevalence of mistrust (Robinson, 2017), suboptimal provision of public goods (Habyarimana et al., 2007; Desmet et al., 2020), a lower quality of institutions (Alesina et al., 1999; Alesina and Zhuravskaya, 2011), and increases in socioeconomic inequality and associated grievances. In contrast to a long-standing cross-country literature on the role of ethnic diversity (Easterly and Levine, 1997; Alesina et al., 2003; Miguel and Gugerty, 2005; Habyarimana et al., 2007), we follow a more recent approach and investigate similar research questions but at the local level (Desmet et al., 2020; Gomes, 2020b; Montalvo and Reynal-Querol, 2020).

Ethnic fractionalization is the probability that two individuals drawn at random from the society will belong to two different ethnic groups (Esteban and Ray, 1994; Esteban et al., 2012b) and is the most commonly used index to describe the ethnic structure of a society (Ray and Esteban, 2017). Assuming groups of equal size, the index increases with the number of groups. It captures differences in identification between groups and “reaches a maximum when everyone belongs to a different group” (Esteban et al., 2012b, 859).
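The verbal definition of ethnic fractionalization above can be illustrated with a minimal sketch. It assumes the common Herfindahl-based form, EF = 1 − Σ s_i², where s_i is the population share of group i; the function names and toy data are hypothetical, and the authors' own construction is the one described in Section 4.2.

```python
from collections import Counter


def group_shares(ethnicities):
    """Population share of each ethnic group in a list of individual labels."""
    counts = Counter(ethnicities)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def fractionalization(shares):
    """Probability that two random draws (with replacement) fall in different groups.

    Assumed form: EF = 1 - sum(s_i^2). This is an illustration, not
    necessarily the exact index used in the paper.
    """
    return 1.0 - sum(s * s for s in shares)


# With equally sized groups the index equals 1 - 1/n, so it rises with n.
equal_size_ef = {
    n: fractionalization([1.0 / n] * n) for n in (2, 4, 10)
}

# Toy sample of individual ethnicity labels (hypothetical data).
sample = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
sample_ef = fractionalization(group_shares(sample))
```

With equal-sized groups the index is 1 − 1/n, which is why it approaches its maximum only when everyone belongs to a different group.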
Given the lack of theoretical and empirical support to relate conflict and fractionalization as a measure of diversity, Esteban and Ray (1994) introduced the polarization index, which is “defined as an aggregation of all interpersonal antagonisms” (Esteban et al., 2012b, 859). This index captures the existence of deep divides between ethnic groups; it ranges from zero to one and reaches its maximum when the society is divided into two equally sized but highly distant ethnic groups (Esteban and Ray, 2011). These indexes are further described in Section 4.2.

We argue that the presence of refugees and the resources that usually accompany such population flows are associated with a risk of conflict over these additional resources. There is indeed a large literature suggesting that refugees and their hosts compete over existing resources and services (Maystadt et al., 2019). In this context, we expect ethnic polarization to matter much more for conflict than ethnic fractionalization.6

6 We build our empirical investigation on the theoretical implications of Esteban and Ray (2011)’s game-theoretic model. Esteban and Ray (2011) present a contest model between different groups. In the model, the winner can enjoy two sorts of prizes, a private prize and a public prize. “Private prizes are diluted by group size. Examples include access to oil or other mineral deposits (or the revenue from them), specific material benefits ... The larger the group, the smaller is the return per capita” (Ray and Esteban, 2017, 282). Public prizes are not diluted by the recipient’s own group size. According to the unique Nash equilibrium of the game, conflict is more likely to occur when the population is highly polarized in the presence of so-called public payoffs. On the contrary, group fractionalization is expected to matter much less in this case. The intuition behind the theoretical prediction is that in the presence of a public prize, fractionalization increases the coordination costs for collective action. As described by Esteban et al. (2012b), it is different with polarization. With polarized groups, the public payoff accrues to a large number of people in the winning group; these people internalize that accrual and therefore have more incentives to contribute to the conflict efforts. With private payoffs, group size matters less, since private prizes are diluted by group size. Conceptually, our theoretical framework is in line with an instrumentalist view on the role of ethnic diversity in conflict, in contrast with a primordial approach (Brubaker and Laitin, 1998; Ray and Esteban, 2017). The primordial view sees ethnic cleavages as deeply cultural, biological or psychological in nature (Ignatieff, 1993; Huntington, 1996). Yet ethnic lines are known to be multiple and highly malleable; examples include recent work by Gaikwad and Nellis (2017) on India. By contrast, the instrumentalist view considers that ethnic identities can be used to facilitate strategic coordination and enforcement for collective action (Brubaker and Laitin, 1998; Esteban et al., 2012b).

Our contribution is twofold. First, drawing causal inference is challenging since most of the literature only uses a time-constant measure of diversity (Montalvo and Reynal-Querol, 2005; Desmet et al., 2012). The problem is that ethnic diversity may be correlated with many unobserved characteristics. As far as we know, two other papers have recently exploited time-varying changes in ethnic diversity. First, Bazzi et al. (2019) investigate how changes in inter-group diversity affect national identity, social capital, public goods, and ethnic conflict in Indonesia. To that end, they analyze a resettlement program in Indonesia and shed light on the distinction between ethnic fractionalization and polarization. The former is associated with a greater sense of national identity (in contrast to ethnic attachment), and vice versa for the latter. Polarization is also associated with adverse effects on social capital, as reflected by lower intergroup tolerance and trust, lower community engagement, and preferences for redistribution. Second, Amodio and Chiovelli (2018) investigate the variation in ethnic diversity induced by migration flows resulting from the repeal of apartheid segregation laws in South Africa. They show that stronger polarization among the Black population at the district level is associated with a higher number of armed confrontations. Our paper differs by specifically investigating changing ethnic diversity induced by international movements of refugees in Africa. By focusing on the changes in diversity induced by non-voting migrants, we abstract from mechanisms that rely on the median voter theorem or the seizing of power.

Second, we contribute to another strand of the literature assessing the impact of forced migration on hosting societies (Ruiz and Vargas-Silva, 2013; Maystadt et al., 2019; Becker and Ferrara, 2019; Verme and Schuettler, 2021). More specifically, we investigate the impact of refugees in low-income countries. While the literature has mainly focused on labor and goods markets (Maystadt et al., 2019; Verme and Schuettler, 2021), little is known about long-term effects on the hosting population, including on trust and identity formation. Exceptions are provided by Zhou (2018, 2019). Zhou (2018) analyzes how the presence of refugees affects local citizen opposition to citizenship inclusion in Sub-Saharan Africa. Zhou (2019) also shows how the presence of refugees affects national identity formation in Tanzania, as a way for local citizens to distance themselves from a new migrant out-group. We differ from these insightful studies in that we focus on the way refugees affect those outcomes
through a specific channel: the change in ethnic diversity. Another difference is that we do not assume refugees to be a homogeneous group by exploiting their likely ethnic attachment. Although different, these studies highlight the need to control for the direct effect of the presence of refugees in our research design. Finally, Zhou and Shaver (2021) have revisited a long-standing claim that refugees are fueling conflict across borders (Salehyan, 2006, 2008). We qualify their results by showing that refugees may increase conflict if their composition exacerbates between-group antagonism in polarized communities.

4 Research design

4.1 Identification strategy

Our aim is to assess how changes in ethnic diversity affect the incidence and intensity (i.e., the number) of conflict events. Our first empirical exercise consists in regressing conflict on refugee-induced ethnic fractionalization and ethnic polarization in the following way:

$$
\mathit{Conflict}_{jt} = \alpha_j + \tau_t + \beta_1 \mathit{REF}_{jt-1} + \beta_2 \mathit{REP}_{jt-1} + \beta_3 \mathit{Refugees}_{jt-1} + \beta_4 Q_{jt} + \epsilon_{jt}, \tag{1}
$$

where $\mathit{Conflict}_{jt}$ represents the incidence or the intensity of conflict in location $j$ in year $t$. We define locations using information from the Afrobarometer Enumeration Areas.7 In the remainder of this paper, we refer to the Afrobarometer Enumeration Areas as clusters. The number of conflict events is transformed into the inverse hyperbolic sine to ease interpretation (Bellemare and Wichman, 2020). $\mathit{REF}_{jt-1}$ and $\mathit{REP}_{jt-1}$ refer to revised refugee ethnic fractionalization and ethnic polarization, respectively. They capture the change in ethnic diversity induced by refugee inflows at time $t-1$ and are constructed by revising standard measures of diversity based on the Afrobarometer with the changes in ethnic diversity induced by the refugee flows within a certain buffer around each cluster $j$.
This is the main divergence with the existing literature, since we revise the standard ethnic diversity indices to include the annual variation in refugee ethnicities.8 We then construct a measure of proximity between the clusters in the host country and refugees in surrounding camps by defining an 80-km buffer around each cluster.9 To control for unobserved heterogeneity and changes within a given cluster, we introduce cluster and year fixed effects, $\alpha_j$ and $\tau_t$. To minimize the risk of confounding the refugee-induced changes in diversity with the annual changes in refugee numbers, we also control for the presence of refugees based on the same buffer as the one used to construct the refugee-induced change in diversity. More specifically, the variable $\mathit{Refugees}_{jt-1}$ counts the number of refugees present within the predefined buffer around cluster $j$ in year $t-1$. The variable is also transformed into an inverse hyperbolic sine to ease interpretation. Finally, $Q_{jt}$ controls for yearly shocks at the cluster level, such as weather shocks. In particular, we control for rain and temperature anomalies. Standard errors are clustered at the Afrobarometer cluster level.

7 These correspond to location classes (administrative regions, e.g., states or provinces; populated places, e.g., cities or villages; structures, e.g., buildings, bridges or roads; and other topographical features, e.g., rivers, mountains, or national parks) with exact or approximate geographic information. These are designated by a precision code that allows the user to choose the desired level of geographical unit. We restrict our analysis to observations with a maximum precision code of 2, thus covering locations that are defined at any level smaller than administrative regions. We conduct robustness checks with a maximum precision code of 3 (therefore also covering administrative regions) and with no restriction on the precision code (Section 5.3).
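The buffer-based refugee count and its inverse hyperbolic sine transformation can be sketched as follows. This is a minimal illustration under stated assumptions: distances are great-circle (haversine) distances on a spherical Earth, and the coordinates, camp sizes, and function names are hypothetical rather than taken from the paper's data or code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def refugees_in_buffer(cluster, camps, radius_km=80.0):
    """Sum the refugee counts of all camps within radius_km of a cluster."""
    lat, lon = cluster
    return sum(
        camp["refugees"]
        for camp in camps
        if haversine_km(lat, lon, camp["lat"], camp["lon"]) <= radius_km
    )


# Hypothetical cluster and camps located on the equator.
cluster = (0.0, 30.0)
camps = [
    {"lat": 0.0, "lon": 30.5, "refugees": 12000},  # roughly 56 km away
    {"lat": 0.0, "lon": 31.5, "refugees": 40000},  # roughly 167 km away
]

total_80km = refugees_in_buffer(cluster, camps)          # only the first camp
total_40km = refugees_in_buffer(cluster, camps, 40.0)    # no camp in range

# The inverse hyperbolic sine is defined at zero (unlike the logarithm)
# and behaves like log(2x) for large x.
ihs_refugees = math.asinh(total_80km)
ihs_zero = math.asinh(total_40km)
```

The `radius_km` argument makes it straightforward to rerun the count with the 40-km and 120-km buffers mentioned as robustness checks.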
8 We explain the construction of these indices in Section 4.2.
9 We test the robustness of our results with a smaller (40 km) and a larger (120 km) radius in Section 5.3. This choice of buffer size ensures that between 75 percent and virtually all refugee camps fall within a cluster buffer. Other studies relying on Afrobarometer data construct buffers ranging from 25 km (e.g., Michaelopoulos and Papaioannou (2011), investigating ethnic-specific pre-colonial institutional structures) to 100 km (e.g., McGuirk and Burke (2020a), analyzing the impact of food-price shocks on conflict).

4.2 Data and descriptive statistics

Our analysis combines various sources of data: Afrobarometer, UNHCR refugee camp data, Armed Conflict Location and Event Data (ACLED), Uppsala Conflict Data (UCDP), and the Ethnic Power Relations - Ethnicity of Refugees (EPR-ER) 2019 dataset. Using Afrobarometer’s geocoded surveys, we focus on clusters as our unit of observation.10 Our sample consists of 7,547 such locations and 76,518 individuals in 23 countries in Sub-Saharan Africa. “The sampling universe normally includes all citizens age 18 and older. As a standard practice, they [we] exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes.” (Afrobarometer, https://afrobarometer.org/surveys-and-methods/sampling-principles) Since the sampling frame is based on recent censuses, with the aim of representing all citizens of voting age in a given country, the Afrobarometer samples are unlikely to include refugees. Note also that “the sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location. Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analyzed.”

10 Afrobarometer is a pan-African research network conducting public attitude surveys on democracy, governance, the economy, and society in African countries that are repeated on a regular basis (Afrobarometer, 2020).

Afrobarometer provides geocoded data for 6 rounds, which correspond to the 1991–2016 period, with the information on an individual’s ethnicity available from round 3 (corresponding to 2005–2006). We therefore restrict our analysis to the 2005–2016 period. The selection of countries is driven by data availability. Among the 33 countries with available Afrobarometer data, we exclude Botswana, Cape Verde, Lesotho, Madagascar, Mauritius, Sao Tome and Principe, South Africa, and Swaziland, for which no data is available on refugee camps or from the EPR-ER. We also exclude Sudan since the question on individual ethnicity is not asked in this country’s survey. The countries in our sample are Benin, Burkina Faso, Burundi, Cameroon, Gabon, Ghana, Guinea, Ivory Coast, Kenya, Liberia, Malawi, Mali, Mozambique, Namibia, Niger, Nigeria, Senegal, Sierra Leone, Tanzania, Togo, Uganda, Zambia, and Zimbabwe. As described in Table B.1, we also incorporate information on the quality of our refugee data, which is determined by comparison with official UNHCR bilateral data. Below we describe how these data have been used to define our main variables of interest and present some descriptive statistics in Table B.2.11

Conflict. In Equation 1, we first relate variation in ethnic diversity with data on conflict from ACLED (Linke et al., 2010). Two main definitions are used: the incidence of conflict and the intensity of conflict.
Incidence is captured by an indicator equal to one if conflict occurred in a particular year within a pre-defined buffer around cluster j. Intensity is measured by summing the number of conflict events occurring in a particular year within the same buffer area. A conflict event is defined as a single altercation wherein force is used by one or more groups for a political end (Linke et al., 2010). We further describe events (non-exclusively) as violent events, non-violent events, violence against civilians, and riots. In our main analysis, we focus on violent conflicts (Section 5.1) and report results for other outcomes as robustness tests (Section 5.3). In doing so, we follow a recent and large literature that has combined the ACLED dataset with geographically disaggregated data in Africa (Besley and Reynal-Querol, 2014; Berman and Couttenier, 2015; Michaelopoulos and Papaioannou, 2016; Berman et al., 2017; Harari and Ferrara, 2018; Eberle et al., 2020; McGuirk and Burke, 2020b).

11 Panel A of Table B.2 shows descriptive statistics for the data from refugee-hosting areas specifically, whereas panel B of Table B.2 shows descriptive statistics for our data in all covered areas.

As a further robustness check, we also use data on conflict incidence and intensity from the UCDP, which uses a more conservative definition of conflict. The UCDP dataset is manually curated and compiled with automated computer assistance (Sundberg and Melander, 2013). The UCDP defines an armed conflict event as “an incident where armed force was used by an organized actor against another organized actor, or against civilians, resulting in at least one direct death at a specific location and a specific date” (Pettersson et al., 2020).
We extract daily event observations from the UCDP dataset if the location of the actual event is exactly known, the event location is within a radius of less than 25 km around a known point, or at least the administrative district where the event happened is known. As pointed out by Eberle et al. (2020), the UCDP events are more likely to capture violence between large-scale and more structured groups.

Table B.2 shows that, on average, conflict events seem to occur more often in refugee-hosting areas. This is of course a simple correlation and not a causal relationship. As can be seen from both panel A and panel B, non-violent conflicts seem to occur slightly more often than violent conflicts. On average, the likelihood of violent conflict stands at about 48%, while this figure increases to 52% in refugee-hosting areas. Conflicts among more structured and large groups, as captured by the UCDP data, appear to be less frequent.

UNHCR refugee data. To exploit the variation in ethnic diversity induced by the annual variation in refugees (and also to control for the direct effect of refugees on our outcomes), we use data on refugee camps provided by the UNHCR. The dataset contains detailed time-series information on the locations and sizes of 1,453 refugee camps across the world and 821 refugee camps in Sub-Saharan Africa over the 2000–2016 period. To the best of our knowledge, the UNHCR currently provides the most comprehensive information available on refugees at the subnational level, allowing us to assess the ethnic composition of camps, which is key to our research question. First, we use the country of origin of refugees recorded for each year at the camp level to approximate the ethnic composition of each camp. Second, we restrict the data on refugees to those aged 18 and above in order to make it comparable to the Afrobarometer-based individual data. Third, we only use data on refugees hosted within the boundaries of the host country.
Merging data on refugee camps with the Afrobarometer, we end up with information on 172 camps that are at a maximum distance of 80 km from the 7,547 clusters.[12] Figure 1 shows the locations of these refugee camps and clusters. Clusters are represented in green, while clusters in the vicinity of a refugee camp are represented in red. Refugee camps are designated with a red + sign.

There are some important limitations associated with these data. First, the data only provide information on refugees residing in camps monitored by the UNHCR. In Figure B.7, we combine the UNHCR refugee camp data on the annual number of refugees and the UNHCR official statistics on refugees (which include people in refugee-like situations) at the country level.[13] Although the overall trends match, our constructed dataset clearly underestimates the true refugee population in Africa, which is not surprising since our camp-specific data do not contain dispersed refugees or refugees living outside of camps. While our data seem to represent quite fairly the number of refugees in camps, there is significant heterogeneity across countries. Based on the visual inspection of Figure B.8, the refugee data appear to be less reliable for the following countries in our sample: Gabon, Mali, Senegal, and Togo. We conduct robustness tests excluding these 4 countries from our sample in Section 5.3.

[12] There are 189 camps at a distance of ≤ 120 km and 113 camps at a distance of ≤ 40 km from these clusters.
[13] https://www.unhcr.org/refugee-statistics/download/?url=R1xq

Figure 1: Data and Descriptive Statistics: Clusters, Refugee Camps, and Conflicts

Revised refugee diversity indices. We first use Afrobarometer data to construct standard indices of diversity, namely the EF and the EP indices (Bazzi et al., 2019; Esteban and Ray, 1994).
The EF index describes the probability that two randomly selected individuals from a given location belong to two different ethnic groups (Alesina et al., 2003, 2016; Gomes, 2020b). The EF index can be defined as

$$ EF_{jt} = \sum_{e=1}^{N_{jt}} g_{et}\,(1 - g_{et}), \tag{2} $$

where $N_{jt}$ is the number of ethnic groups in cluster j at time t and $g_{et}$ is the population share of ethnic group e at time t. It can also be expressed as one minus the Herfindahl index (Alesina et al., 2016).

The EP index gives more weight to intergroup differences at the expense of within-group homogeneity. It can be defined as (Esteban and Ray, 1994, 1999; Montalvo and Reynal-Querol, 2005)[14]

$$ EP_{jt} = \sum_{e=1}^{N_{jt}} g_{et}^{2}\,(1 - g_{et}). \tag{3} $$

We compute this index for each cluster at the time of each Afrobarometer survey to assess how refugee-induced changes in diversity differ from standard indices of diversity.

In order to construct the revised refugee diversity indices according to ethnicity e, we first combine information about the country of origin of refugees hosted in refugee camps c in year t with the data from the EPR-ER 2019 dataset. The EPR-ER records the ethnic composition of refugee stocks originating from neighboring countries and countries in proximity to each other (maximal distance between country borders ≤ 950 km) with at least 2,000 refugees (Vogt and Girardin, 2015). More specifically, the EPR-ER dataset gives us the share of refugees from ethnic group e moving from country o to country d at year t, for the three main ethnic groups. The number of refugees of ethnic group e in camp c at year t is therefore approximated by the following formula:

$$ \text{Ref}_{cet} = \text{Ref}_{ocdt} \times \text{Share}_{odet}. \tag{4} $$

We thus obtain the number of refugees $\text{Ref}_{cet}$ of ethnicity e per camp c at time t. We can then sum the number of refugees by group e at year t for each cluster j within a buffer of, for example, 80 km.
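The index definitions in Equations 2 and 3 and the allocation rule in Equation 4 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names, group labels, and numbers are invented, and the host counts are assumed to be already expressed on the same (population) scale as the refugee counts.

```python
import math


def fractionalization(shares):
    """EF index (Equation 2): sum over groups of g * (1 - g)."""
    return sum(g * (1.0 - g) for g in shares)


def polarization(shares):
    """EP index as written in Equation 3: sum over groups of g^2 * (1 - g)."""
    return sum(g ** 2 * (1.0 - g) for g in shares)


def shares_from_counts(counts):
    """Turn a {group: headcount} mapping into a list of population shares."""
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def refugees_by_ethnicity(camp_refugees, ethnic_shares):
    """Equation 4: split a camp's refugees from one origin country by ethnic shares."""
    return {group: camp_refugees * share for group, share in ethnic_shares.items()}


def pooled_counts(hosts, refugees):
    """Add refugee headcounts to host headcounts, group by group."""
    pooled = dict(hosts)
    for group, n in refugees.items():
        pooled[group] = pooled.get(group, 0.0) + n
    return pooled


# Hypothetical host population around one cluster (already on population scale).
hosts = {"A": 3000.0, "D": 1000.0}
# Hypothetical camp with 1,000 refugees and origin-destination ethnic shares.
refugees = refugees_by_ethnicity(1000.0, {"A": 0.6, "B": 0.3, "C": 0.1})

standard = shares_from_counts(hosts)
revised = shares_from_counts(pooled_counts(hosts, refugees))

print(fractionalization(standard), polarization(standard))
print(fractionalization(revised), polarization(revised))
```

In this toy case the revised EF index is higher than the standard one while the revised EP index is slightly lower, which illustrates that the same inflow can move the two indices in opposite directions.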
To restrict the count of refugees to those within the boundaries of the host country, this buffer does not cross country borders. Figure 2 shows, for each cluster of interest (i.e., ⊕) in our data, the refugee camps (i.e., red +) within a buffer of 40 km, 80 km, and 120 km in the host country.

[14] As pointed out by Desmet et al. (2020, footnote 13), Esteban and Ray (1994) offer a slightly more general index, which is the formulation proposed by Reynal-Querol (2002).

Figure 2: Data and Descriptive Statistics: Correcting Ethnic Diversity Indices
[Map not reproduced. Legend: country borders; ethnic borders; clusters (Afrobarometer); cluster of interest; refugee camps; conflicts (ACLED); buffers (40 km, 80 km, 120 km); countries in sample. Scale bar: 0 to 80 kilometers.]

In order to obtain a sample-based number of hosts comparable to a population-based number of refugees, we multiply the number of respondents in the Afrobarometer surveys by the ratio N/n, where N is the total population of the surveyed country at year t and n is the sample size of the survey collected at year t. Intuitively, we need to make sure that one refugee is comparable to a host in the computation of the revised diversity indices. Based on the World Development Indicators, the country population aged 18 and above at t − 1 is considered equivalent to the most recent official national census used as the sampling frame for the Afrobarometer surveys. As individuals in the Afrobarometer are aged 18 and above, for comparison purposes we also restrict our analysis to refugees aged 18 and above.[15]

[15] On average, 45% of refugees in the camps are aged 18 and above.

There is one limit to our approximation in Equation 4.
The ethnic composition of refugees obtained from the EPR-ER database for a given origin–destination pair of countries in year t is assumed to be homogeneous across all camps of that origin–destination pair in year t. This may seem to be a strong assumption; however, the risk of misallocating refugees is reduced as the annual variation in the EPR-ER is generated by just a few dominant groups for a given origin–destination pair and the geographical distribution of refugees by country of origin is highly influenced by the proximity to their countries of origin.[16]

As can be seen from panel A of Table B.2, in refugee-hosting areas, on average, both the EF and the EP indices increase substantially when they are revised by incorporating the number of refugees in an 80-km buffer: the mean value of the standard EF index is 25.58%, while the mean value of the revised refugee EF index is 37.90%. The mean value of the standard EP index is 10.11%, while the mean value of the revised refugee EP index is 14.07%. Figure 3 also shows that there is considerable variation in both indices within our sample when averaging them at the regional level over the period of investigation.

[16] It is possible that our approximation is noisy and could potentially induce non-random measurement errors. In Section 4.3, we propose an instrumental variable approach and estimate $\text{Share}_{odet}$ from the EPR-ER data using a gravity model. Our findings concerning the number of ethnic groups across time for a given origin–destination pair are in line with the EPR-ER data. It seems that refugees of a given origin–destination pair mainly belong to two major ethnic groups. This also means that the variation in diversity in refugee-hosting areas is coming from the refugee composition at the camp level. Figure B.5 shows the movements of refugees from origin to destination countries under scrutiny.
Somalia, the Democratic Republic of Congo, Liberia, South Sudan, and Sudan are major source countries for refugees, while Kenya, Tanzania, Uganda, Zambia, and Ghana appear to be the countries hosting most refugees. Representing refugees in camps per ethnic group for the top 5 asylum countries over the sample period, Figure B.9 shows that there is considerable variation in ethnic composition across camps.

Figure 3: Ethnic Fractionalization and Ethnic Polarization

Ethnicity. A major task for the construction of our dataset is the combining of data on ethnicity from various sources. Indeed, linking ethnic groups is challenging as ethnic identities are socially constructed and there are different definitions, categorizations, and even conceptual approaches when it comes to identifying ethnicities in various databases or scientific disciplines. This makes the task of treating, combining, and analyzing ethnicities extremely daunting as it requires substantial background knowledge on hundreds of ethnicities and a manual treatment would inevitably lead to inconsistencies, errors of manipulation, and/or subjective choices. Fortunately, we can rely on the Linking Ethnic Data from Africa (LEDA) open-source software package constructed by Müller-Crepon et al. (2020), which contains a full pipeline to link ethnic datasets from Africa in a consistent and replicable way. We obtain ethnicities of refugees from the EPR-ER dataset, while the ethnicities of individuals in the hosting areas stem from the Afrobarometer. These ethnicities are not systematically reported at the same level with a similar categorization process; instead, the information can relate to an individual’s linguistic ethnicity, dialect, or an ethnic group encompassing several languages.
In our main analysis, we use LEDA’s binary linking at the “dialect” level, based on the minimum linguistic distance to link these ethnic groups.[17] This involves computing a value corresponding to the shortest path (see Equation A.1) between ethnic groups using a language tree. In our case, “dialect” is the level defined to match the two groups (see Figure A.1 from Müller-Crepon et al. (2020) for a Ghanaian case).[18] We further describe the use of the LEDA software package in Appendix A.1.

4.3 An instrumental variable approach

In Section 4.1, we acknowledged that non-random measurement errors might be a concern. Another major identification challenge is the risk that our revised measures of diversity are biased due to the selection of hosting areas by refugees. We should first acknowledge that the ability of refugees to select their places of residence is much more limited than that of economic migrants. However, we cannot exclude the possibility that refugees would sort non-randomly into areas with particular ethnic characteristics.[19]

In order to address this potential endogeneity, we implement an instrumental variable (IV) approach. We are particularly concerned about certain ethnic groups from certain countries of origin moving to destination countries with similar ethnic characteristics. Such endogenous selection would be reflected in the EPR-ER data. To deal with the plausibly endogenous nature of the resulting refugee EF and EP indices, we implement a gravity model to predict the number of refugees of a certain ethnic group e moving from country o to d at time t, based on EPR-ER data. The predicted (and plausibly exogenous) number of refugees by ethnic group e is then used to create other (plausibly exogenous) diversity indices to be used as instrumental variables.
More specifically, we estimate the following gravity model:

$$ \text{REF}_{odet} = \alpha_{od} + \gamma_{e} + \tau_{t} + \beta_1\,\text{Conflict}_{o,t-1} + \beta_2\,\text{Conflict}_{e,t-1} + \beta_3\,\text{Distance}_{ed} + \epsilon_{odet}, \tag{5} $$

where $\text{REF}_{odet}$ is the stock of refugees of ethnic group e from country o in country d at year t. As we have data on yearly refugee stocks and would like to estimate the changes in these stocks over time using a gravity model, we include origin–destination fixed effects $\alpha_{od}$ so that identification is based only on changes in stock over time (Zylkin, 2019).[20] We also include time fixed effects $\tau_t$ and ethnic group fixed effects $\gamma_e$. Here we obtain data on the ethnicity of refugees from Murdock’s Atlas, which provides a map of ethnographic regions for Africa and the historical homelands of refugees (Murdock, 1967). To match ethnic groups across datasets, we again use LEDA[21] to link data on ethnicity from Murdock’s Atlas with data on ethnicity from the EPR-ER dataset and, later, with data from Afrobarometer.

[17] We also use this method to link data from EPR-ER on the ethnicities of refugees with data from the Murdock Atlas on their historical homeland (Section 4.3).
[18] As a robustness check (Section 5.3), we use an alternative linkage based on the relations between sets of language nodes associated with two groups.
[19] Another source of selection may come from the fact that ethnic groups are more likely to be displaced when they share territory with regime supporters in their countries of origin (Lacina et al., 2017). Since similar ethnic groups are likely to share common borders (Michaelopoulos and Papaioannou, 2016), it is not impossible to think conflict might spill over through this channel.
We use the sum of conflict events occurring in the historic homeland of ethnic group e in the previous year t − 1, denoted as $\text{Conflict}_{e,t-1}$, and we use the mean distance between the historic homeland of ethnic group e and the border of country d to predict the number of refugees of a certain ethnic group e moving from country o to d at time t.[22]

In order to be consistent with EPR-ER data construction, we restrict our analysis to all origin–destination country pairs that are at a maximum distance ≤ 950 km from each other. Predicted numbers of refugees are then transformed into predicted shares for the three largest groups to follow the logic used by the EPR-ER dataset. We then plug in these predicted shares in the following way:

$$ \text{PredictedRef}_{cet} = \text{Ref}_{ocdt} \times \text{Share}_{odet}. \tag{6} $$

The predicted shares of refugees per camp c are then used to compute (as documented above) refugee diversity indices to be used as instrumental variables. The first-stage equations corresponding to the 2SLS-equivalent of Equation 1 can be expressed as

$$ \text{REF}_{jt} = \alpha_j + \tau_t + \delta_1\,\text{PredictedEF}_{jt} + \delta_2\,\text{PredictedEP}_{jt} + \delta_3\,\text{Refugees}_{jt} + \delta_5\,Q_{jt} + \epsilon_{1,jt} \tag{7} $$

and

$$ \text{REP}_{jt} = \alpha_j + \tau_t + \delta_1\,\text{PredictedEF}_{jt} + \delta_2\,\text{PredictedEP}_{jt} + \delta_3\,\text{Refugees}_{jt} + \delta_5\,Q_{jt} + \epsilon_{2,jt}. \tag{8} $$

[20] We conduct a robustness check on Equation 5, replacing the dyadic origin–destination fixed effects with separate origin and destination fixed effects (Section 5.4).
[21] More information on LEDA can be found in Appendix A.1.
[22] The construction of the IV follows a long tradition in using the gravity model to predict bilateral migration flows (Ravenstein, 1985, 1989; Crozet, 2004; Mayda, 2010; Garcia et al., 2015; Beine et al., 2016). In our analysis, a major difference is that we have an additional dimension: the ethnic group e.

5 Results

In this section, we discuss the results of our benchmark analysis (Section 5.1), of a number of robustness tests with alternative outcome variables and alternative specifications (Section 5.3), and of our IV approach, in which we obtain diversity indices using a gravity model (Section 5.4).

5.1 Main Results

In Table 1, we present the results of a linear probability model with the incidence of violent conflict as the dependent variable. As suggested by the theoretical framework proposed by Esteban and Ray (2011), we introduce both indices, i.e., EF and EP, in the same specification. Columns (1) and (2) show the standard diversity indices and column (3) shows our revised refugee diversity indices. In columns (2), (4), and (6), the presence of refugees within a distance of 80 km is introduced. Indeed, despite the recent literature rejecting the conflictive impact of refugees in hosting areas (Zhou and Shaver, 2021), the magnitude of our coefficients might be explained by the confounding presence of refugees. Columns (5) and (6) further introduce climatic controls. Column (6) corresponds to Equation 1 and refers to our benchmark specification.

Columns (1) and (2) show that without incorporating the changes in ethnic diversity induced by refugees we would not be able to identify a relationship between diversity and violent conflicts. In column (3), the revised refugee fractionalization index has a negative and significant coefficient, while the revised refugee polarization index has a positive and significant effect on the incidence of violent conflicts. In columns (2), (4), and (6), our coefficients of interest are of the same order of magnitude when the number of refugees is controlled for. Our results are not altered by incorporating rainfall and temperature anomalies (columns (5) and (6)), but the estimates become slightly more precise.
According to our benchmark specification in column (6), a one standard deviation rise in the revised refugee EF index decreases the incidence of violent conflict by 5 percentage points. At the mean, this corresponds to a fall of about 10 percent. In contrast, a similar increase in the revised refugee EP index raises the likelihood of violent conflict by 5 percentage points. The magnitude of our results appears to be relatively large, and they are in line with the findings of Esteban et al. (2012a), Esteban et al. (2012b), and Bazzi et al. (2019). Although contrasting with Esteban et al. (2012a) and Esteban et al. (2012b), the negative effect found for the fractionalization index is again consistent with Bazzi et al. (2019). Our decrease of 5 percentage points following a 1 standard deviation change in ethnic fractionalization is close to their corresponding 3.7 percentage points. As pointed out by these authors, the negative sign is still consistent with Esteban and Ray (2011) in a situation where conflict materializes around public resources. Esteban et al. (2012a) and Esteban et al. (2012b) indeed interpret the positive coefficient found for fractionalization as evidence that private components, such as the existence of natural resources, matter. In our context, where refugees are likely to be accompanied by contestable public resources, intergroup contact with many small groups (high fractionalization) is likely to reduce the risk of conflict.

We should stress that our results do not contradict other relevant papers commissioned by the World Bank (Betts et al., 2021; Zhou et al., 2021; Pham et al., 2021; Coniglio and Vurchio, 2021). According to our results, refugees per se do not directly affect the likelihood of conflict; rather, it is the way they alter ethnic diversity in refugee-hosting areas that matters.
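As a rough arithmetic check on how a 5 percentage-point effect translates into "about 10 percent at the mean", one can divide the effect by the mean incidence of violent conflict. The two means used below (about 48% overall and 52% in refugee-hosting areas) are the descriptive figures quoted in Section 4; which of the two underlies the authors' calculation is an assumption here.

```latex
\[
\frac{0.05}{0.48} \approx 0.104
\qquad\text{and}\qquad
\frac{0.05}{0.52} \approx 0.096 ,
\]
% i.e. a relative change of roughly 10 percent under either mean.
```

Under either mean, the relative change is close to 10 percent, consistent with the magnitude reported in the text.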
Finally, the magnitude of our results (a 5 percentage-point change due to a 1 standard deviation change in diversity, or a 10 percent change at the mean) can be compared with other major determinants of conflict in Africa, namely the role of economic shocks (often associated with climatic shocks), natural resources, and price shocks.[23] First, one of the most robust findings is the link between economic shocks and conflict in Africa (Blattman and Miguel, 2010). The seminal paper by Miguel et al. (2004) shows that a 1 standard deviation increase in economic growth leads to a drop in the likelihood of conflict of more than 16 percentage points.[24] Although the income effect is not the only possible interpretation (Mach et al., 2019), economic shocks in Africa have often been associated with climatic shocks (Harari and Ferrara, 2018), and the meta-analysis by Hsiang et al. (2013) indicates that a 1 standard deviation change in climate towards warmer temperature or more extreme rainfall increases the frequency of conflict by 14 percent at the mean.

[23] Since Collier and Hoeffler (1998) and Fearon and Laitin (2003), there has been a booming literature on the economics of conflict. We limit our comparison to time-varying factors. It is indeed difficult to compare with long-term (time-constant) determinants of conflict such as ethnic partitioning (Michaelopoulos and Papaioannou, 2016) or historical conflict (Besley and Reynal-Querol, 2014).
[24] We should acknowledge that the validity of these results beyond 1999 has been questioned (Ciccone, 2011; Miguel and Satyanath, 2011).

A second major determinant of violence in Africa is the presence of natural resources. For instance, Berman et al. (2017) found that a 1 standard deviation increase in the price of minerals translates into a 5.6 percentage-point rise in conflict. Other price shocks matter too, however.
For instance, a 1 standard deviation increase in producer prices reduces the incidence of conflict by 17 percent at the mean (McGuirk and Burke, 2020b). The equivalent change in consumer prices exacerbates conflict by 8 percent. According to Berman and Couttenier (2015), a 1 standard deviation increase in world demand for agricultural commodities also increases conflict by 1 to 3 percentage points. Overall, although we do not claim to provide an exhaustive review of the literature, our estimated effect sizes are clearly comparable with existing studies, placing ethnic diversity on par with other major determinants of conflict in Africa.

Table 1: Benchmark Analysis: Diversity and Violent Conflict

Dependent variable: Violent Conflict, Incidence

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| EF | -0.1090 | -0.1096 | | | | |
| | (0.0716) | (0.0718) | | | | |
| EP | 0.2263 | 0.2278 | | | | |
| | (0.2103) | (0.2108) | | | | |
| Refugees (80 km, IHS) | | 0.0004 | | 0.0012 | | 0.0008 |
| | | (0.0028) | | (0.0030) | | (0.0030) |
| REF (80 km, min. ling. dist.) | | | -0.1593* | -0.1686** | -0.1717** | -0.1780** |
| | | | (0.0814) | (0.0856) | (0.0815) | (0.0858) |
| REP (80 km, min. ling. dist.) | | | 0.4181* | 0.4238* | 0.4450** | 0.4487** |
| | | | (0.2180) | (0.2196) | (0.2186) | (0.2202) |
| Rain anomalies (80 km) | | | | | -0.0008** | -0.0008** |
| | | | | | (0.0004) | (0.0004) |
| Temp. anomalies (80 km) | | | | | -0.0834*** | -0.0832*** |
| | | | | | (0.0230) | (0.0230) |
| Observations | 14,441 | 14,441 | 14,441 | 14,441 | 14,441 | 14,441 |
| R-squared | 0.661 | 0.661 | 0.661 | 0.662 | 0.662 | 0.662 |
| Year FE | Y | Y | Y | Y | Y | Y |
| PSU FE | Y | Y | Y | Y | Y | Y |

Notes: Estimated equation: Equation (1) using OLS, presented in column (6). Columns (1) and (2) introduce the standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between the Afrobarometer and EPR-ER datasets. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
∗ More information on LEDA in Appendix A.1.

5.2 Non-violent conflict

Ray and Esteban (2017, 264) define “social conflict as within-country unrest, ranging from peaceful demonstrations, processions, and strikes to violent riots and civil war. In whatever form it might take, the key feature of social conflict is that it is organized.”

In Table 2, we replicate our main results using the incidence of non-violent conflict as the dependent variable. Similar to Table 1, without incorporating the annual variation in refugee flows we find no relationship between our diversity indices and the likelihood of non-violent conflict. It is only when we introduce the revised refugee indices that a relationship emerges. Changes in ethnic polarization induced by refugee inflows increase the likelihood of non-violent conflict (whether or not we control for the presence of refugees). However, the magnitude of the effect is lower compared to the same effect in Table 1. We do not obtain a significant coefficient for the refugee-induced fractionalization index. Thus, our results indicate that violence is not an automatic outcome in polarized refugee-hosting communities. However, they support the call to pay more attention to changes in ethnic diversity in refugee-hosting areas and identify specific interventions in polarized communities.

Table 2: Diversity and Non-Violent Conflict, Incidence

Dependent variable: Non-Violent Conflict, Incidence

| | (1) | (2) | (3) | (4) | (5) | (6)a |
|---|---|---|---|---|---|---|
| EF | 0.0493 | 0.0443 | | | | |
| | (0.0672) | (0.0675) | | | | |
| EP | 0.1136 | 0.1273 | | | | |
| | (0.2040) | (0.2041) | | | | |
| Refugees (80 km, IHS) | | 0.0033 | | 0.0019 | | 0.0018 |
| | | (0.0025) | | (0.0027) | | (0.0027) |
| REF (80 km, min. ling. dist.) | | | -0.0395 | -0.0549 | -0.0395 | -0.0536 |
| | | | (0.0760) | (0.0802) | (0.0754) | (0.0797) |
| REP (80 km, min. ling. dist.) | | | 0.3656* | 0.3750* | 0.3568* | 0.3654* |
| | | | (0.2114) | (0.2136) | (0.2106) | (0.2127) |
| Rain anomalies (80 km) | | | | | -0.0018*** | -0.0018*** |
| | | | | | (0.0003) | (0.0003) |
| Temp. anomalies (80 km) | | | | | 0.0375* | 0.0380* |
| | | | | | (0.0198) | (0.0197) |
| Observations | 14,441 | 14,441 | 14,441 | 14,441 | 14,441 | 14,441 |
| R-squared | 0.704 | 0.704 | 0.704 | 0.704 | 0.705 | 0.705 |
| Year FE | Y | Y | Y | Y | Y | Y |
| PSU FE | Y | Y | Y | Y | Y | Y |

Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable, the incidence of non-violent conflict, presented in column (6). Columns (1) and (2) introduce the standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between the Afrobarometer and EPR-ER datasets. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
a Results for revised ethnic fractionalization (REF) and revised ethnic polarization (REP) in column (6) presented in line C of Table 3.
∗ More information on LEDA can be found in Appendix A.1.

5.3 Robustness

We now examine the sensitivity of these main results to a battery of robustness checks. First, we assess the robustness of our results to the use of alternative outcomes (Table 3). Second, we conduct robustness tests with some alternative specifications (Table 4). Each line corresponds to the same specification as in the previous Table 1, with an alternative outcome.
For reasons of space, only the results for revised refugee EF and EP (our variables of interest) from column (6), corresponding to Equation 1, are presented.[25] Line A presents results from our benchmark estimation in column (6) of Table 1. Where variables are transformed using the inverse hyperbolic sine, the corresponding estimates can be interpreted as quasi-elasticities. As can be seen from Table 3, the revised refugee EP significantly impacts other conflict measures, with the exception of non-violent conflict intensity and protest intensity (lines D and H). Our estimates point to a lack of relationship between REF and REP and the intensity of non-violent events and protests. The magnitudes of the coefficients for the intensity of violent conflicts and civilian conflicts are particularly high (lines B and F). The revised EF also seems to have a significant and negative impact on the incidence and intensity of civilian conflict (lines E and F). Interestingly, our results do not hold for violence perpetrated by larger-scale and more structured groups, as captured by the UCDP data (lines I and J).

As can be seen from Table 4, our results do not depend on the choices made in the construction of our main variables of interest. Here, each line brings a modification to our benchmark specification corresponding to Equation 1 and is conducted in the same way as in Table 1. Again, only the results for revised refugee EF and EP (our variables of interest) from column (6) are presented for the sake of space.[26] Line A presents the results from our benchmark estimation in column (6) of Table 1. Using a more restrictive linking between ethnic groups from the Afrobarometer and the EPR-ER data, i.e., binary linking based on the relations of sets of language nodes associated with two groups (line B), also obtains a negative coefficient (significant at the 10% level) for REF and a positive coefficient for REP (significant at the 10% level).
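The inverse hyperbolic sine (IHS) transformation referred to above, which also appears in the "Refugees (80 km, IHS)" rows of the tables, can be sketched as follows. This is a generic illustration of the transformation, not the authors' code; the function name `ihs` is an assumption for the example.

```python
import math


def ihs(x):
    """Inverse hyperbolic sine: asinh(x) = ln(x + sqrt(x^2 + 1))."""
    return math.asinh(x)


# Unlike the logarithm, the IHS is defined at zero, which matters for
# count variables such as refugee numbers or conflict events that are often zero.
print(ihs(0.0))

# For large values, asinh(x) is close to ln(2x) = ln(x) + ln(2), so a coefficient
# on an IHS-transformed variable reads approximately like one on a log variable.
for x in (10.0, 1000.0, 100000.0):
    print(x, ihs(x), math.log(2.0 * x))
```

The closeness to ln(2x) for large values is what motivates the quasi-elasticity reading; the approximation is poor for values near zero, so that interpretation should be treated with caution there.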
The size of the buffer used to capture the number of refugees in the vicinity of clusters and their contribution to ethnic composition is rather arbitrary, and thus we tested other buffer sizes. Our results are robust to using either a smaller (40 km, line C) or a larger (120 km, line D) buffer, although they are less significant and the coefficient for REF is no longer significant with the larger buffer. So far, as in our benchmark analysis, we have used a linear probability model. We also conduct robustness tests implementing a non-linear conditional logit model (line E). The essence of our results does not change using this non-linear estimation. Our results are also robust to alternative variable constructions and specifications but rely on strong identifying assumptions; we may thus need to be concerned about the existence of confounding trends. We therefore augment our main specification with country-specific time trends (line F), leading to an even slightly higher magnitude of the coefficients for REF and for REP. As discussed in Section 4.2, the quality of data on refugee camps for some countries in Sub-Saharan Africa is quite low.[27] These countries are Gabon, Mali, Senegal, and Togo. When we exclude these 4 countries (line G), both the magnitude and significance of the results improve. We perform another check by relaxing the criteria on the exactness of the geographic information provided by Afrobarometer, using a precision code ≤ 3 (line H) or completely disregarding it (line I). Our results are overall similar when implementing such modifications to the sample.

[25] Notes below Table 3 include cross-references to tables with the full set of specifications corresponding to each line.
[26] Notes below Table 3 include cross-references to tables with the full set of specifications corresponding to each line.
[27] See Figure B.8.

Table 3: Summary Table: Alternative Outcomes

| | (1) REF (80 km, min. ling. dist.) | (2) REP (80 km, min. ling. dist.) |
|---|---|---|
| A. Benchmark results (N=14,441)a | -0.1780** | 0.4487** |
| | (0.0858) | (0.2202) |
| B. Violent conflict, intensity (N=14,441)b | -0.3088 | 0.9170* |
| | (0.2086) | (0.5183) |
| C. Non-violent conflict, incidence (N=14,441)c | -0.0536 | 0.3654* |
| | (0.0797) | (0.2127) |
| D. Non-violent conflict, intensity (N=14,441)d | -0.2402 | 0.8250 |
| | (0.2092) | (0.5588) |
| E. Civilian conflict, incidence (N=14,441)e | -0.1460* | 0.4032* |
| | (0.0847) | (0.2169) |
| F. Civilian conflicts, intensity (N=14,441)f | -0.3762** | 1.1916** |
| | (0.1892) | (0.4660) |
| G. Protest, incidence (N=14,441)g | -0.0853 | 0.4754** |
| | (0.0809) | (0.2160) |
| H. Protest, intensity (N=14,441)h | -0.1891 | 0.8323 |
| | (0.2062) | (0.5536) |
| I. Conflict (UCDP), incidence (N=14,441)i | 0.0642 | -0.2532* |
| | (0.0638) | (0.1514) |
| J. Conflict (UCDP), intensity (N=14,441)j | -0.0143 | -0.1200 |
| | (0.1624) | (0.3972) |

Notes: Estimated equation: Equation (1) using OLS with alternative dependent variables. Level of analysis, period, LEDA function: similar to Table 1. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
See column (6) in the following tables: a Table 1; b Table B.3; c Table 2; d Table B.4; e Table B.5; f Table B.6; g Table B.7; h Table B.8; i Table B.9; j Table B.10.
∗ More information on LEDA can be found in Appendix A.1.

Table 4: Summary Table: Alternative Specifications

| | (1) REF | (2) REP |
|---|---|---|
| A. Benchmark results (N=14,441)a | -0.1780** | 0.4487** |
| | (0.0858) | (0.2202) |
| B. Alternative ethnicity linking (N=14,441)b | -0.1597** | 0.3202* |
| | (0.0690) | (0.1939) |
| C. Buffer at 40 km (N=14,441)c | -0.1188* | 0.3145* |
| | (0.0711) | (0.1827) |
| D. Buffer at 120 km (N=14,441)d | -0.1090 | 0.3311* |
| | (0.0755) | (0.1999) |
| E. Non-linear model (N=5,761)e | -0.2259** | 0.5649** |
| | (0.1030) | (0.2486) |
| F. Incl. country–time trends (N=14,441)f | -0.1887** | 0.4343** |
| | (0.0855) | (0.2194) |
| G. Excl. countries with low-quality camp data (N=12,397)g | -0.2316*** | 0.6038*** |
| | (0.0891) | (0.2323) |
| H. Precision code ≤ 3 (N=22,415)h | -0.1469** | 0.2874* |
| | (0.0655) | (0.1635) |
| I. Incl. all geocoded locations (N=23,256)i | -0.1675*** | 0.3608** |
| | (0.0631) | (0.1584) |

Notes: Estimated equation: Equation (1) using OLS with alternative specifications, except for line E, estimated using LOGIT, and line F, which included country–time trends. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests.
See column (6) in the following tables: a Table 1; b Table B.11; c Table B.12; d Table B.13; e Table B.14; f Table B.15; g Table B.16; h Table B.17; i Table B.18.

5.4 Results with instrumented diversity indices

Despite the plausible nature of our identifying assumptions, we cannot exclude the possibility that refugees sort ethnically. As an additional analysis, we therefore implement the 2SLS approach, using the results of a gravity equation to predict where refugees would go based only on plausibly exogenous factors.

In Table 5, we first report results from our gravity model. Column (1) of Table 5 corresponds to Equation 5. Column (2) follows the same specification, with the exception that the dyadic origin–destination fixed effects are replaced by separate origin and destination fixed effects. Conflict in the origin country and distance between the origin and destination countries have the expected effects on the predicted number of refugees (positive and negative, respectively). Conflicts in the ethnic group’s historical homeland and their distance to the destination country seem not to have an impact on this prediction.

We report our results in Table 6.[28] Panel A presents second-stage results, while panel B and panel C present first-stage results. All columns include climatic controls.
Columns (2) and (4) include the number of refugees within a distance of 80 km from the clusters. Columns (3) and (4) further include country-specific time trends. The 2SLS equivalent of Equation 1 and the first-stage results from Equation 7 and Equation 8 are presented in column (2) of panels A, B, and C, respectively. Our results are confirmed: in column (2), the second-stage coefficients are -0.2090 for REF and 0.5722 for REP, both significant at the 5 percent level. The first-stage results in panels B and C indicate that our main variables of interest are almost perfectly correlated with the plausibly exogenous instrumental variables, with own-instrument coefficients of roughly 0.95 to 0.97. Our main variables of interest can thus be considered quasi-random.²⁹

28 Our results using an IV obtained by estimating Equation 5 but replacing the dyadic origin–destination fixed effects by separate origin and destination fixed effects (column (2) of Table 5) are reported in Table B.19.
29 The results remain robust to the use of alternative IVs, but at a less significant (10%) level for the REP index.

Table 5: Instrumental Variable Approach: Gravity Model

                                              (1)                       (2)
                                              Stock of refugees per ethnic group
Conflict events at origin                      0.0008***   (0.0003)      0.0008***   (0.0003)
Distance, origin–destination                   -           -            -0.0034***   (0.0011)
Conflict events in hist. ethnic homeland      -0.0002      (0.0002)     -0.0002      (0.0002)
Distance, hist. ethnic homeland–destination   -0.0001      (0.0005)     -0.0014**    (0.0007)
Destination FE                                 N                         Y
Ethnic Group FE                                Y                         Y
Origin FE                                      N                         Y
Origin–Destination FE                          Y                         N
Year FE                                        Y                         Y
Observations                                   4,068                     4,140
Pseudo R-squared                               0.667                     0.607

Notes: Estimated equation: Equation (5) using PPML, presented in column (1). Equation (5) with alternative fixed effects presented in column (2). No. of countries: 23. Period: 2005–2016. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between the Murdock Atlas and the EPR-ER data. Robust standard errors clustered at the origin and destination are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.

Table 6: Instrumental Variable Approach: Diversity and Violent Conflict, Incidence

                                (1)         (2)         (3)         (4)
                                Violent Conflict, Incidence
Panel A: Second-stage
REF (80 km, min. ling. dist.)   -0.1983**   -0.2090**   -0.1968**   -0.2100**
                                (0.0888)    (0.0923)    (0.0885)    (0.0920)
REP (80 km, min. ling. dist.)    0.5595**    0.5722**    0.5546**    0.5702**
                                (0.2572)    (0.2601)    (0.2563)    (0.2590)
R-squared                        0.0018      0.0019      0.0029      0.0030
Kleibergen–Paap rk Wald F        885.8       916.2       888         918.6
Root MSE                         0.290       0.290       0.290       0.290
Panel B: First-stage (REF)
Predicted REF                    0.9552***   0.9616***   0.9542***   0.9603***
                                (0.0072)    (0.0086)    (0.0072)    (0.0084)
Predicted REP                    0.0400**    0.0330*     0.0367**    0.0304
                                (0.0181)    (0.0189)    (0.0183)    (0.0186)
Panel C: First-stage (REP)
Predicted REF                    0.0125***   0.0064**    0.0124***   0.0066**
                                (0.0018)    (0.0029)    (0.0018)    (0.0028)
Predicted REP                    0.9663***   0.9731***   0.9661***   0.9720***
                                (0.0050)    (0.0061)    (0.0052)    (0.0060)
Observations                     14,441      14,441      14,441      14,441
Year FE                          Y           Y           Y           Y
PSU FE                           Y           Y           Y           Y
Country–time trends              N           N           Y           Y
Refugees (80 km, IHS)            N           Y           N           Y
Climatic controls                Y           Y           Y           Y

Notes: Estimated equation in panel A: Equation (1) using 2SLS. Predicted EF in panel B: Equation (7). Predicted EP in panel C: Equation (8). Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. Refugee camps in an 80 km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between the Afrobarometer, Murdock Atlas, and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
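As an illustration only, the two-stage logic behind the instrumented estimates can be sketched on simulated data. Everything in the sketch is an assumption made for exposition: the variable names (`z_ref`, `z_rep`, `ref`, `rep`, `u`), the data-generating process, and the coefficient values are hypothetical and are not taken from the paper's data or code. The outcome is continuous rather than binary, and fixed effects, climatic controls, and clustered standard errors are omitted. The sketch shows that when an unobserved factor drives both the diversity indices and the outcome, OLS is biased while 2SLS with strong instruments recovers the assumed coefficients.

```python
import numpy as np


def two_stage_least_squares(y, endog, instruments, exog):
    """Return 2SLS coefficients for [endog, exog] and the first-stage matrix."""
    Z = np.column_stack([instruments, exog])  # excluded instruments + included controls
    X = np.column_stack([endog, exog])        # endogenous regressors + included controls
    # First stage: regress every column of X on Z and keep the fitted values.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Second stage: regress the outcome on the first-stage fitted values.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta, first_stage


rng = np.random.default_rng(0)
n = 20_000
const = np.ones(n)

# Hypothetical instruments, standing in for indices predicted from a gravity model.
z_ref = rng.normal(size=n)
z_rep = rng.normal(size=n)
# Hypothetical unobserved factor affecting both the indices and the outcome.
u = rng.normal(size=n)

# Assumed data-generating process (illustrative values only).
ref = 0.95 * z_ref + 0.3 * u + 0.1 * rng.normal(size=n)
rep = 0.95 * z_rep + 0.3 * u + 0.1 * rng.normal(size=n)
y = -0.2 * ref + 0.5 * rep + u + rng.normal(size=n)

endog = np.column_stack([ref, rep])
instruments = np.column_stack([z_ref, z_rep])

beta_iv, first_stage = two_stage_least_squares(y, endog, instruments, const)
beta_ols, *_ = np.linalg.lstsq(np.column_stack([endog, const]), y, rcond=None)

print("2SLS (REF, REP):", beta_iv[:2])
print("OLS  (REF, REP):", beta_ols[:2])
print("First-stage own-instrument coefficients:", first_stage[0, 0], first_stage[1, 1])
```

In this simulated setting the 2SLS estimates sit close to the assumed values of -0.2 and 0.5, whereas the OLS estimates are pulled away from them by the unobserved factor. A full replication would additionally require the fixed effects, controls, and cluster-robust inference described in the table notes.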
5.5 Ethnic diversity at different levels
Despite the use of sampling weights in the construction of the diversity indices, we have no guarantee that our diversity indices are representative at the local level. Although similar ethnic diversity indices have been used at the local level (Nunn and Wantchekon, 2011; Rohner et al., 2013; Robinson, 2017; Desmet et al., 2020; Gomes, 2020b,a; Hodler et al., 2020), we cannot exclude the possibility that a lack of representativeness at the local level introduces some noise into our estimates. Ideally, we would have liked to construct our local diversity indices based on census data. However, such data are not available on an annual basis, and only a minority of African countries include ethnicity questions in their censuses (Robinson, 2017). Robinson (2017) highlights other benefits of survey data but also warns that non-random samples or small sample sizes can introduce significant errors. As summarized by Robinson (2017), “fortunately, Afrobarometer respondents comprise stratified random samples at all levels, making population estimates based on them unbiased: thus, the major concern with using Afrobarometer sample data to construct demographic measures is unbiased measurement error.” Based on a comparison of census-based and survey-based diversity indices across five African countries, Robinson (2017, 224) found that a “sample-based measure tends to underestimate the overall degree of diversity compared to census data”. In theory, this should make it more difficult to observe the true relationship between ethnic diversity and some outcomes at the local level. Diversity indices are more likely to be measured with noise in highly diverse communities at the local level. We nonetheless argue that such a concern should not be overestimated, for three reasons.
First, such noise cannot easily explain the contrast between the coefficients corresponding to the pre-revised and revised indices and the opposite results found for the revised refugee fractionalization and the revised polarization. This set of results can be explained by the fact that our identification comes from the annual changes in refugee flows. Second, the IV approach is likely to deal with the measurement errors if they are correlated with our main variables of interest. Our IV estimates therefore capture a local average treatment effect coming from the plausibly exogenous increase in annual refugee flows of particular ethnic groups. The similarity of the IV results to the OLS results supports this interpretation. Third, at the cost of introducing attenuation bias,³⁰ we also aggregate the number of conflict events at the regional level. Lines B and C of Table 7 confirm the negative and positive effects found for the revised fractionalization and polarization indexes, respectively, whether or not an instrumental variable approach is used.³¹ Similar signs are found for non-violent events, but these results are poorly estimated.

30 Another risk highlighted by Robinson (2017) is the fact that ethnic diversity may also capture different theoretical mechanisms at aggregated levels.
31 We focus on conflict intensity in Table 7. At the aggregated level, it does not make statistical sense to use conflict incidence. Given the high persistence of conflict at the aggregated level, one of the most established stylized facts in the conflict literature (Blattman and Miguel, 2010; Besley and Reynal-Querol, 2014), the incidence of conflict will feature very little time variation in aggregated data. Table B.21 confirms the signs of the coefficients of interest but also the lack of efficiency of the estimates when using conflict incidence as the dependent variable.
Furthermore, the regional aggregation implemented in Table 7 uses the GADM2 classification, which corresponds to the second sub-national administrative division. The name of this type of administrative unit may vary from country to country but can be seen as corresponding to the district level. It has an average size of 4,651 km², with a minimum of 0.636 km² (Southern Ijaw in Nigeria) and a maximum of 345,145 km² (Tombouctou in Mali). Alternative aggregation units are GADM1 and GADM3. The former corresponds to the first sub-national administrative level and is sometimes referred to as “province” or “governorate”. The average size is 27,352 km², with a minimum of 55 km² (Littoral in Benin) and a maximum of 643,078 km² (Agadez in Niger). To put the size into perspective, the largest GADM1 unit is roughly the size of France (632,734 km²). We are very much concerned by attenuation effects in this case. We nonetheless replicate Table 7 at the GADM1 level in Table B.22. Similar signs are found but the coefficients are not precisely estimated (with the exception of those explaining non-violent conflict). For the latter (GADM3), aggregation makes little sense as a way to reduce the noise associated with the size of the units on which the diversity indices are constructed. The average size is 419 km², with a minimum of 0.006034 km² (Kanabugire-Mubuga in Burundi) and a maximum of 301,358 km² (Tombouctou-Central-Alafia in Mali). In many instances, the GADM3 size is smaller than the area defined by our 80-kilometer buffer (approximately 20,000 km², since π × 80² ≈ 20,106).

Table 7: Summary Table: Aggregation at the GADM2 Level

                                                            (1) REF                   (2) REP
                                                            (min. ling. dist.)        (min. ling. dist.)
A. Benchmark results, intensity (N=14,441)                  -0.3088      (0.2086)      0.9170*     (0.5183)
B. Violent conflict, intensity, OLS, GADM2 (N=1,565)        -1.8531*     (1.0836)      5.0279*     (3.0463)
C. Violent conflict, intensity, IV, GADM2 (N=1,565)         -1.8515*     (1.0829)      5.0235*     (3.0443)
D. Non-violent conflict, intensity, OLS, GADM2 (N=1,565)    -0.9374      (1.1385)      0.4003      (3.3325)
E. Non-violent conflict, intensity, IV, GADM2 (N=1,565)     -0.9360      (1.1377)      0.3957      (3.3304)

Notes: Estimated equation: Equation (1) using OLS with alternative dependent variables. Level of analysis: GADM2. Period, LEDA∗ function: similar to Table 1. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
∗ More information on LEDA can be found in Appendix A.1.

6 Discussion
In this section, we provide some implications for policy by discussing some alternative explanations and sociodemographic aspects available from the Afrobarometer data.
Alternative explanations. The theoretical framework used to guide this analysis is mainly driven by a competition over resources between (ethnic) groups. Social conflict is mainly driven by a combination of intergroup differences and within-group cohesion. Such a theory implies that polarization is more likely to capture intergroup antagonism and competition between a few large groups. In this way, our results indicate that refugee inflows affect group sizes, the diversity indices, incentives to compete over resources and, in turn, conflict. Alternative explanations may question the fixed nature of the groups and the distances between them. For instance, Bazzi et al. (2019) show that polarization increases ethnic attachment. Others have highlighted the reduction in trust, either interpersonal trust or institutional trust (Alesina and Ferrara, 2002; Beugelsdijk and Klasing, 2016). To assess the importance of alternative explanations, we first replicate our analysis using individual data on violence.
In addition to participation in protests, we follow McGuirk and Burke (2020b) in using the Afrobarometer survey data on interpersonal crime and physical assault. We then assess the relationship between the revised refugee diversity indices and alternative individual outcomes such as ethnic vs. national identity, generalized trust, trust in neighbors, and institutional trust (trust in government). The questions from the Afrobarometer mentioned below are used as a proxy for these outcomes:³²

32 These questions are available in rounds 3–6 of our analysis, with the exception of “General Trust” and “Neighborhood Trust”, which are available in rounds 3 and 5.

1 Attack: Over the past year, how often (if ever) have you or anyone in your family: Been physically attacked?
2 Crime: Over the past year, how often (if ever) have you or anyone in your family: Feared crime in your own home?
3 National identity: Let us suppose that you had to choose between being a [Ghanaian/Kenyan/etc.] and being a [respondent’s identity group]. Which of these two groups do you feel most strongly attached to? Ethnic or national identity
4 Protest: Here is a list of actions that people sometimes take as citizens. For each of these, please tell me whether you, personally, have done any of these things during the past year. If not, would you do this if you had the chance: Attended a demonstration or protest march?
5 Theft: Over the past year, how often (if ever) have you or anyone in your family: Had something stolen from your house?
6 General trust: Generally speaking, would you say that most people can be trusted or that you must be very careful in dealing with people?
7 Neighborhood trust: How much do you trust each of the following types of people: Your neighbors?
8 Institutional trust: How much do you trust each of the following, or haven’t you heard enough about them to say: The President/Prime Minister?
To do so, we adopt a similar specification to our benchmark estimation at the individual level:

$$Y_{ijt} = \alpha_j + \tau_t + \gamma_1 \mathrm{REF}_{jt-1} + \gamma_2 \mathrm{REP}_{jt-1} + \gamma_3 \mathrm{Refugees}_{jt-1} + \gamma_4 X_{ijt} + \gamma_5 Q_{jt} + \nu_{ijt}, \qquad (9)$$

where $Y_{ijt}$ represents one of a number of outcomes for individual $i$ in cluster $j$ surveyed in year $t$: the likelihood of experiencing attacks, crimes, or theft, of participating in a protest, ethnic (vs. national) attachment, and interpersonal, neighborhood, and institutional trust. The other variables are similar to Equation 1, with the exception of $X_{ijt}$, a vector of individual control variables such as age, education, gender, marital status, and rural/urban status. To assess the risk of inappropriate controls, we introduce these control variables progressively. Sampling weights are used to render our estimates representative at the country level.³³
As shown in Table B.23, the results regarding physical assault, and to some extent interpersonal crime, confirm our main results. We find that a one standard deviation increase in the revised ethnic polarization index raises the likelihood of experiencing physical assault by 2.1 percentage points. Such a change represents an increase of about 18.2 percent at the mean, which implies a mean incidence of physical assault of roughly 11.5 percent. Inversely, a similar change in the revised ethnic fractionalization index decreases the risk of physical assault by 1.9 percentage points, i.e., 16.3 percent at the mean (although this is not precisely estimated). For interpersonal crime, the equivalent change in fractionalization translates into a fall of 4.2 percentage points, or 13 percent at the mean. However, although similar in magnitude, the estimated coefficient for the revised ethnic polarization index is not statistically different from zero. Table B.23 also indicates that none of our coefficients of interest has a statistically significant effect on the other individual outcomes of ethnic attachment, generalized trust, trust in neighbors, and institutional trust.
Heterogeneity.
Another way to identify possible entry points for policy interventions is to exploit the sociodemographic heterogeneity provided by the individual data. The conflict literature indeed indicates that the likelihood of participating in violence is negatively correlated with age, being female, wealth, and employment. This is generally thought to be due to a higher opportunity cost among the older, female, and wealthier segments of the population (Blattman and Miguel, 2010). To shed light on particular vulnerabilities, we assess the effect of the revised ethnic fractionalization and polarization indices on samples grouped according to these individual characteristics. Our results are presented in Figure B.10. Our analysis certainly lacks power in detecting clear-cut heterogeneous effects. However, it seems that the effects of our revised refugee polarization and fractionalization indices on the likelihood of experiencing physical assault and theft are stronger in magnitude if the respondent is unemployed. The same is true for the likelihood of participating in a protest. Other heterogeneous effects are much less clear. We remain cautious about the lack of power of this analysis, but the unemployed might be a particular target group for interventions seeking to reduce prejudice and strengthen cooperation between groups.

33 We provide the descriptive statistics of these variables in Table B.20.

Overall, our results question previous results on the impact of refugees on conflict, as they indicate that the relationship largely depends on the way diversity changes as a result of refugee inflows. When polarization increases, the risk of conflict rises. For fractionalization, it is the opposite. It is therefore important to consider that the risk of conflict increases when refugees tend to strengthen polarization between a few large groups.
In that case, fostering intergroup interactions will not necessarily reduce intergroup prejudice and strengthen cooperation as it does in other contexts (Finseraas and Kotsadam, 2017; Corno et al., 2018). Given the lack of results for alternative outcomes such as trust or ethnic attachment, competition between polarized groups is the most likely driver of our results. Implementing specific interventions in refugee-hosting and polarized communities is therefore strongly recommended.³⁴

34 See Paluck (2012) for a review of possible interventions to reduce prejudice and conflict. For instance, intergroup sports have been shown to help rebuild intergroup social cohesion among displaced people in northern Iraq (Moussa, 2020).

Limits. It should be noted that many other factors may affect conflict, and we cannot definitively exclude the possibility that they are correlated with our main variables of interest. Regarding the most established determinants of conflict, such as the role of economic shocks (often associated with climatic variation) and the presence of natural resources (Blattman and Miguel, 2010), we doubt that these would move simultaneously with the variation in diversity induced by refugee flows. Although the inflow of new resources associated with the inflow of refugees may exacerbate the public nature of the prize to be fought over, beyond its effect through the diversity indices, any increased incentive will be captured by the variable related to the size of the refugee population. We are more concerned about the role of economic inequality, migration, or price shocks. Following prominent writers like Karl Marx or Montesquieu, class struggle (or, more generally, economic inequality) has been argued to be a major driver of conflict (Gurr and Harff, 1991). Even Sen (1997, 1) suggests that “the relation between inequality and rebellion is indeed a close one, and it runs both ways.
That a perceived sense of inequality is a common ingredient of rebellion in societies is clear enough.” However, this relation lacks empirical support (Russett, 1964; Muller, 1985; Midlarski, 1988; Lichbach, 1989). More recently, Collier and Hoeffler (2004) and Fearon and Laitin (2003) found that income inequality does not systematically affect the risk of conflict. Empirically, this may be explained by the fact that economic inequality does not change enough over time to explain variation in violence. Theoretically, the lack of relationship between income inequality and conflict can be explained by the fact that the have-nots do not have the means to organize violence (Esteban et al., 2012a; Ray and Esteban, 2017). As summarized by Ray and Esteban (2017, 276), “the rich have the means but not the motive to express this conflict, while the poor have the motive but lack the means”. However, this does not mean that economic inequality is irrelevant: it can interact with ethnic diversity in two ways. On the one hand, economic inequality between groups is likely to exacerbate the distance between groups, explaining why “horizontal inequalities” have been found to be strongly correlated with conflict (Stewart, 2000; Ostby, 2008; Cederman et al., 2011). On the other hand, it has been demonstrated that inequality within groups may affect the likelihood of violence. Theoretically, groups with a higher degree of within-group inequality have been shown to be more effective in conflict (Esteban and Ray, 2011; Esteban et al., 2012a). Due to the lack of socioeconomic information on refugees, we cannot test these theoretical predictions. Next, anecdotal evidence suggests that refugee camps have often attracted internal migrants from surrounding areas (Maystadt and Verwimp, 2014; Maystadt and Duranton, 2019).
Although migration has received little empirical support as a driver of conflict (Fearon and Laitin, 2011; Mach et al., 2019, 2020), a link between in-migration and conflict could potentially affect our results. While the direct effect of refugees is accounted for by controlling for the number of refugees in our empirical specification, internal migrants can certainly affect the ethnic diversity in the refugee-hosting area. Such an indirect effect is likely to be captured by our pre-revised fractionalization and polarization indexes, but we cannot distinguish the role of internal migrants in the effects of the revised refugee diversity indices. Finally, refugee flows have been found to have an inflationary effect in hosting areas (Alix-Garcia and Saah, 2010; Maystadt et al., 2019). While certainly a key policy dimension, there is no reason to believe such price effects are correlated with our diversity indices.
In addition, the external validity of our results is constrained by data availability: we only exploit UNHCR-monitored camps. Our results thus may not be generalizable to other forms of forced displacement such as internally displaced people or so-called dispersed refugees in informal settlements or urban areas.

7 Conclusions
Refugees have often been blamed for propagating social conflict in their hosting countries. Previous research has rejected a causal effect of hosting refugees on violence (Zhou and Shaver, 2021). We offer further insight by highlighting a particular channel through which refugees may impact the level of violence in their hosting communities, namely the resulting changes in ethnic composition. We use annual variation in the presence of refugees to approximate the resulting changes in diversity in refugee-hosting areas in 23 countries in Sub-Saharan Africa between 2005 and 2016. Our results point to the risk of conflict when refugees exacerbate ethnic polarization in the hosting communities.
A one standard deviation increase in the polarization index raises the incidence of violent conflict by 5 percentage points, representing 10 percent at the mean. In contrast, a situation where refugee flows increase the level of ethnic fractionalization is likely to see a lower risk of violence, with a similar order of magnitude. Our estimated effect sizes are comparable in magnitude to other determinants of conflict such as economic, price, and climatic shocks. It is therefore important for policymakers and practitioners to consider that the risk of violence increases when refugees exacerbate polarization between a few large groups. In this case, fostering intergroup interactions will not necessarily reduce intergroup prejudice and strengthen cooperation, as it has been found to do in other contexts (Finseraas and Kotsadam, 2017; Corno et al., 2018). Identifying specific interventions in polarized refugee-hosting communities is therefore much needed.
Others such as Betts et al. (2021) have identified programs seeking to improve refugee–host social cohesion, such as facilitating opportunities for refugee–host interactions, promoting intragroup attitude changes, focusing on the perceived “winners” and “losers” among the hosts, or designing social cohesion programs in urban and camp-like contexts. There is also an emerging literature seeking to assess the role of cash transfers, educational programs, and targeted aid in mitigating social tensions between displaced and host populations, but so far with mixed results (Aguero and Fasola, 2021; Ferguson et al., 2021; Lehmann and Masterson, 2020). What our paper shows is that these efforts should be primarily targeted towards highly polarized hosting areas.
A mapping exercise combining fractionalization and polarization indexes at the local level with information on the ethnic composition of new refugee flows could provide valuable information for policymakers and organizations seeking to implement initiatives to improve refugee–host relations. Such an exercise would also require systematically collecting ethnic information in future UNHCR and World Bank surveys.
The results obtained employing individual data also highlight the importance of ethnic diversity in refugee-hosting situations. For instance, we find that a one standard deviation increase in polarization raises the likelihood of experiencing physical assault by 2.1 percentage points. Such a change represents an increase of about 18.2 percent at the mean. Inversely, a similar change in fractionalization decreases the risk of physical assault by 1.9 percentage points, i.e., 16.3 percent at the mean. Of particular interest is the fact that the effects of our polarization and fractionalization indices on the likelihood of experiencing physical assault are stronger in magnitude if the respondent is unemployed. The unemployed might therefore be a particular target for interventions seeking to reduce prejudice and strengthen cooperation between groups. We know, for example, that cash-transfer programs have been particularly effective (compared to skill training and micro-finance, for example) in stimulating employment and social stability in poor and fragile states (Blattman and Ralston, 2015). The increased cooperation between the UNHCR and other development and peace-building actors since the Global Compact on Refugees offers a suitable framework for supporting such interventions.

References
Afrobarometer (2020).
Afrobarometer Data, [Algeria, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Egypt, Gabon, Gambia, Ghana, Guinea, Ivory Coast, Kenya, Lesotho, Liberia, Morocco, Madagascar, Malawi, Mali, Mauritius, Mozambique, Namibia, Niger, Nigeria, São Tomé and Príncipe, Senegal, Sierra Leone, South Africa, Swaziland, Tanzania, Togo, Uganda, Zambia and Zimbabwe], 1-6, 1991-2016, available at http://www.afrobarometer.org.
Aguero, J. and E. Fasola (2021). Distributional Policies and Social Cohesion in a High-Unemployment Setting. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group.
Albarosa, E. and B. Elsner (2021). Forced Migration, Social Cohesion and Conflict: The 2015 Refugee Inflow in Germany. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group.
Alesina, A., R. Baqir, and W. Easterly (1999). Public goods and ethnic divisions. The Quarterly Journal of Economics 114 (4), 1243–1284.
Alesina, A., A. Devleeschauwer, W. Easterly, S. Kurlat, and R. Wacziarg (2003). Fractionalization. Journal of Economic Growth 8 (2), 155–194.
Alesina, A. and E. L. Ferrara (2002). Who trusts others? Journal of Public Economics 85 (2), 207–234.
Alesina, A., S. Michalopoulos, and E. Papaioannou (2016). Ethnic Inequality. Journal of Political Economy 124, 428–488.
Alesina, A. and E. Zhuravskaya (2011). Segregation and the Quality of Government in a Cross-Section of Countries. American Economic Review 101, 1872–1911.
Alix-Garcia, J., A. Bartlett, H. Onder, and A. Sanghi (2018). Do refugee camps help or hurt hosts? The case of Kakuma. Journal of Development Economics 130, 66–83.
Alix-Garcia, J. and D. Saah (2010). The Effect of Refugee Inflows on Host Communities: Evidence from Tanzania.
World Bank Economic Review 24 (1), 148–170.
Alloush, M., J. Taylor, A. Gupta, R. Rojas Valdes, and E. Gonzales-Estrada (2017). Economic life in refugee camps. World Development 95, 334–347.
Amodio, F. and G. Chiovelli (2018). Ethnicity and violence during democratic transitions: Evidence from South Africa. Journal of the European Economic Association 16 (4), 1234–1280.
Arbatli, C. E., Q. H. Ashraf, O. Galor, and M. Klemp (2020). Diversity and conflict. Econometrica 88 (2), 727–797.
Bazzi, S., A. Gaduh, A. Rothenberg, and M. Wong (2019). Unity in Diversity? How Intergroup Contact can Foster Nation Building. American Economic Review 109 (11), 3978–4025.
Bazzi, S. and M. Gudgeon (2021). The Political Boundaries of Ethnic Division. American Economic Journal: Applied Economics.
Becker, S. O. and A. Ferrara (2019). Consequences of forced migration: A survey of recent findings. Labour Economics 59, 1–16.
Beine, M., S. Bertoli, and J. F.-H. Moraga (2016). A practitioners’ guide to gravity models of international migration. The World Economy 39 (4), 496–512.
Bellemare, M. and C. Wichman (2020). Elasticities and the inverse Hyperbolic Sine Transformation. Oxford Bulletin of Economics and Statistics 82 (1), 0305–9049.
Berman, N. and M. Couttenier (2015). External shocks, internal shots: The geography of civil conflicts. Review of Economics and Statistics 97 (4), 758–776.
Berman, N., M. Couttenier, D. Rohner, and M. Thoenig (2017). The Mine is Mine! How minerals fuel conflicts in Africa. American Economic Review 107 (6).
Besley, T. and M. Reynal-Querol (2014). The legacy of historical conflict: Evidence from Africa. American Political Science Review 108 (2), 319–336.
Betts, A., M. F. Stierna, N. Omata, and O. Sterck (2021). Social Cohesion and Refugee-Host Interactions: Evidence from East Africa. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series.
Washington, DC: World Bank Group . Beugelsdijk, S. and M. Klasing (2016). Diversity and trust: The role of shared values. Journal of Comparative Economics 44, 522–540. Blattman, C. and E. Miguel (2010). Civil War. Journal of Economic Literature 48 (1), 3–57. Blattman, C. and L. Ralston (2015). Generating Employment in Poor and Fragile States: Evidence from Labor Market and Entrepreneurship Programs. Mimeo . Brubaker, R. and D. Laitin (1998). Ethnic and nationalist violence. Annual Review of Sociology 24, 423–452. Brzoska, M. and C. Frohlich (2016). Climate change, migration and violent conflict: vulnerabilities, pathways and adaptation strategies. Migration and Development 5 (2), 190–210. Burrows, K. and P. Kinney (2016). Exploring the climate change, migration and conflict nexus. Environmental Research and Public Health 13 (443), 1–17. 44 Buscher, K. and K. Vlassenroot (2009). Humanitarian Presence and Urban Development: New Opportunities and Contrasts in Goma, DRC. Disasters 34 (2), 256–273. Cederman, L., N. Weidmann, and K. Gledisch (2011). Horizontal inequalities and ethno-nationalist civil war: a global comparison. American Political Science Review 105, 478–495. Ciccone, A. (2011). Economic shocks and civil conflict: A comment. American Economic Journal: Applied Economics 3 (4), 215–227. Collier, P. and A. Hoeffler (1998). On the Economic Causes of Conflict. Oxford Economic Papers 50, 563–573. Collier, P. and A. Hoeffler (2004). Greed and Grievance in Civil War. Oxford Economic Papers 56, 563–595. Coniglio, N. D. V. P. and D. Vurchio (2021). The geography of displacement, refugees’ camps and social conflicts. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Corno, L., E. L. Ferrara, and J. Burns (2018). Interaction, Stereotypes and Performance: Evidence from South Africa. Working Paper . Crozet, M. (2004). 
Do Migrants Follow Market Potentials? An Estimation of a New Economic Geography Model. Journal of Economic Geography 4 (4), 439–458. Desmet, K., J. Gomes, and I. Ortuno (2020). The geography of linguistic diversity and the provision of public goods. Journal of Development Economics 143, 102384. Desmet, K., I. Ortuno-Ortin, and R. Wacziarg (2012). The political economy of linguistic cleavages. Journal of Development Economics 97, 322–338. Devictor, X., Q.-T. Do, and A. Levchenko (2021). The globalization of refugee flows. Journal of Development Economics 150, 102605. Easterly, W. and R. Levine (1997). Africa’s growth tragedy: Policies and ethnic divisionsn. Quarterly Journal of Economics 112 (4), 1203–1250. 45 Eberle, U., D. Rohner, and M. Thoenig (2020). Heat and Hate: Climate security and farmer-herd conflicts in Africa . CEPR Discussion paper 15542. Esteban, J.-M., L. Mayoral, and D. Ray (2012a). Ethnicity and conflict: An empirical study. Amer- ican Economic Review 102 (4), 1310–1342. Esteban, J.-M., L. Mayoral, and D. Ray (2012b). Ethnicity and conflict: Theory and facts. Sci- ence 336, 858. Esteban, J.-M. and D. Ray (1994). On the Measurement of Polarization. Econometrica 62 (4), 819–851. Esteban, J.-M. and D. Ray (1999). Conflict and Distribution. Journal of Economic Theory 87, 379–415. Esteban, J.-M. and D. Ray (2011). Linking conflict to inequality and polarization. American Economic Review 101 (4), 1345–1374. Fearon, J. D. and D. D. Laitin (2003). Ethnicity, Insurgency, and Civil War. American Political Science Review 97 (1), 75–90. Fearon, J. D. and D. D. Laitin (2011). Sons of the Soil, Migrants, and Civil War. World Develop- ment 39 (2), 199–211. Ferguson, N., R. W. anf Laila Amine, E. Ramadi, and L. Shahin (2021). Building Stability Between Host and Refugee Communities: Evidence from a TVET Program in Jordan and Lebanon. Un- published Working paper. 
Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Finseraas, H. and A. Kotsadam (2017). Does personal contact with ethnic minorities affect anti- immigrant sentiments? Evidence from a field experiment. European Journal of Political Re- search 56 (3), 703–722. Fotz, J. and S. Shibuya (2021). The Effects of Internally Displaced Peoples on Income and Inequality in Mali. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . 46 Gaikwad, N. and G. Nellis (2017). The Majority-Minority Divide in Attitudes toward Internal Mi- gration: Evidence from Mumbai. American Journal of Political Science 61 (2), 456–472. Garcia, A. J., D. K. Pindoliay, K. K. Lopiano, and A. J. Tatemzz (2015). Modeling internal migration flows in sub-Saharan Africa using census microdata. Migration Studies 3 (1), 89–110. Gomes, J. (2020a). Linguistic fractionalization and health information in Sub-Saharan Africa. World Bank Economic Review 34 (1), S20–S25. Gomes, J. (2020b). The health costs of ethnic distance: evidence from sub-Saharan Africa. Journal of Economic Growth 25, 195–226. Gurr, T. R. and B. Harff (1991). Ethnic Conflict in World Politics, Dilemmas in World Politics. Oxford University Press, Oxford. Habyarimana, J., M. Humphreys, D. N. Posner, and J. M. Weinstein (2007). Why does ethnic diversity undermine public goods provision? American Political Science Review 101 (4), 709–725. Harari, M. and E. L. Ferrara (2018). Conflict, climate, and cells: A disaggregated anlysis. Review of Economics and Statistics 100 (4), 594–608. Hodler, R., S. Srisuma, A. Vesperoni, and N. Zurlinden (2020). Measuring ethnic stratification and its effect on trust in Africa. Journal of Development Economics 146, 102475. Hsiang, S., M. Burke, and E. Miguel (2013). 
Quantifying the influence of climate on human conflict. Science 342(6151), 1235367. Huntington, S. (1996). The Clash of Civilizations and the Remaking of World Order. New York: Simon and Schuster. Ignatieff, M. (1993). Blood and Belonging. London: Noonday Press. Kreibaum, M. (2016). Their suffering, our burden? How Congolese refugees affect the Ugandan population. World Development 78, 262–87. Lacina, B., K. Albert, and E. VanMeter (2017). Shared Territory, Regim Alignment, and Forced Displacement. mimeo . 47 Lehmann, C. and D. Masterson (2020). Does Aid Reduce Anti-Refugee Violence? Evidence from Syrian Refugees in Lebanon. American Political Science Review 114 (4), 1335–1342. Lichbach, M. I. (1989). An Evaluation of “Does economic inequality breed political conflict?” Studies. World Politics 41 (4), 431–470. Linke, Raleigh, C. A., H. Hegre, and J. Karlsen (2010). Introducing ACLED-Armed Conflict Location and Event Data. Journal of Peace Research 47 (5), 651–660. Mach, K., N. Adger, H. Buhaug, M. Burke, J. Fearon, C. Field, C. Hendrix, C. Kraan, J. Maystadt, J. O’Loughlin, P. Rossler, J. Scheffran, K. Schultz, and N. von Uexkull (2020). Directions for reserach on climate and conflict. Earth’s Future 8, 1–7. Mach, K., C. Kraan, N. Adger, H. Buhaug, M. Burke, J. Fearon, C. Field, C. Hendrix, J. Maystadt, J. O’Loughlin, P. Rossler, J. Scheffran, K. Schultz, and N. von Uexkull (2019). Climate as a risk factor for armed conflict. Nature 571, 193–197. Mayda, A. M. (2010). International migration: A panel data analysis of the determinants of bilater- alflows. Journal of Population Economics 23 (4), 1249–1274. Maystadt, J.-F. and G. Duranton (2019). The Development Push of Refugees. Journal of Economic Geography 19 (2), 299–334. Maystadt, J.-F., K. Hirvonen, A. Mabiso, and J. Vandercasteelen (2019). Impacts of Hosting Forced Migrants in Poor Countries. Annual Review of Resource Economics 11, 439 – 459. Maystadt, J.-F. and P. Verwimp (2014). 
Winners and Losers Among a Refugee-Hosting Population. Economic Development and Cultural Change 62 (4), 769–809. McGuirk, E. and M. Burke (2020a). The economic origins of conflict in africa. Journal of Political Economy 128 (10), 3940–3997. McGuirk, E. and M. Burke (2020b). The economic origins of conflict in Africa. Journal of Political Economy 128 (10), 3940–3997. Michaelopoulos, S. and E. Papaioannou (2011). Divide and rule or the rule of the divided? Evidence from Africa. NBER Working Paper Series 17184, 1–77. 48 Michaelopoulos, S. and E. Papaioannou (2016). The long-term effects of the scamble for Africa. American Economic Review 106 (7), 1802–1848. Midlarski, M. (1988). Rulers and the ruled: patterned inequality and the onset of mass political violence. American Political Science Review 82 (2), 491–509. Miguel, E. and M. Gugerty (2005). Ethnic diversity, social sanctions, and public goods in Kenya. Journal of Public Economics 89 (11), 2325–2368. Miguel, E. and S. Satyanath (2011). Re-examining economic shocks and civil conflict. American Economic Journal: Applied Economics 3 (4), 228–232. Miguel, E., S. Satyanath, and E. Sergenti (2004). Economic Shocks and Civil Conflict : An Instru- mental Variable Approach. Journal of Political Economy 112 (4), 725. uller-Crepon, C., Y. Pengl, and N.-C. Bormann (2020). Linking ethnic data from africa (leda), M¨ accepted for publication. Journal of Peace Research . Montalvo, J. G. and M. Reynal-Querol (2005). Ethnic polarization, potential conflict and civil war. American Economic Review 95 (3), 796–816. Montalvo, J. G. and M. Reynal-Querol (2020). Ethnic diversity and growth: Revisiting the evidence. Review of Economics and Statistics . Moussa, S. (2020). Building social cohesion between Christians and Muslims through soccer in post- ISIS Iraq. Science 369, 866–870. Muller, E. N. (1985). Income Inequality, Regime Represiveness, and Political Violence. American Sociological Review 50 (1), 47–61. Murard, E. (2021). 
Mass refugee inflow and social cohesion in the long-run: Evidence from the Greek population resettlement. Unpublished Working paper. Commissioned as part of the “Pre- venting Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Murdock, G. P. (1967). Ethnographic atlas. Nunn, N. and L. Wantchekon (2011). The Slave Trade and the Origins of Mistrust in Africa. American Economic Review 101 (7), 3221–3252. 49 Ostby, G. (2008). Polarization, horizontal inequalities and violent civil conflict. Journal of Peace Research 45, 143–162. Paluck, E. (2012). Interventions aimed at the reduction of prejudice and conflict. The Oxford Hand- book of Intergroup Conflict . Pettersson, T., S. H¨ ¨ ogbladh, and M. Oberg (2020). Organized Violence. Journal of Peace Research. 1989-2019. 57 (4). Pham, P., T. O’Mealia, C. Wei, K. K. Bindu, A. Makoond, and P. Vinck (2021). Hosting New Neighbors: Perspective of host communities on displacement and social cohesion. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Ravenstein, E. (1985). The Laws of Migration. Journal of the Statistical Society of London 48 (2), 167–235. Ravenstein, E. G. (1989). The laws of migration – second paper. Journal of the Royal Statistical Society 52 (2), 241–305. Ray, D. and J.-M. Esteban (2017). Conflict and Development. Annual Review of Economics 9, 263–293. Reynal-Querol, M. (2002). Political Systems, Stability and Civil wars. Defense and Peace Eco- nomics 13, 465–483. Robinson, A. L. (2017). Ethnic Diversity, Segregation and Ethnocentric Trust in Africa. British Journal Political Sciences 50, 217–239. Rohner, D., M. Thoenig, and F. Zilibotti (2013). Seeds of trust: Conflict in Uganda. Journal of Economic Growth 18, 217–252. Ruegger, S. (2017). Refugees and Conflict Diffusion. Peace and Conflict , 142–153. 
Ruiz, I. and C. Vargas-Silva (2013). The Economics of Forced Migration. Journal of Development Studies 49 (6), 772–784. Ruiz, I. and C. Vargas-Silva (2015). The Labor Market Impacts of Forced Migration. American Economic Review: Papers & Proceedings 105 (5), 581–586. 50 Ruiz, I. and C. Vargas-Silva (2016). The labour market consequences of hosting refugees. Journal of Economic Geography 16 (3), 667–694. Russett, B. M. (1964). Inequality and Instability: The Relation of Land Tenure to Politics. World Politics 16 (3), 442–454. Salehyan, I. (2006). Refugees and the Spread of Civil War. International Organization 60 (2), 335. Salehyan, I. (2008). The Externalities of Civil Strife: Refugees as a Source of International Conflict. American Journal of Political Science 52 (4), 787–801. Sedova, B., L. Ludolph, and M. Talevi (2021). Inequality and security in the aftermath of internal population displacement shocks: evidence from Nigeria. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Sen, A. (1997). On Economic Inequality. Oxford: Clarendon Press. Stewart, F. (2000). Crisis Prevention: Tackling Horizontal Inequalities. Oxford Development Stud- ies 28 (3), 245–262. Sundberg, R. and E. Melander (2013). Introducing the UCDP Georeferenced Event Dataset. Journal of Peace Research. 1989-2019. 50 (4). Taylor, J., M. Filipski, and M. Alloush (2016). Economic impact of refugees. Proceedings of the National Academy of Sciences 113 (27), 7449–53. United Nations High Commissioner for Refugees (2020). Global Trends: Forced Displacement in 2019. Geneva. Verme, P. and K. Schuettler (2021). The impact of forced displacement on host communities a review of the empirical literature in economics. Journal of Development Economics 102606. Verwimp, P. and J. Maystadt (2015, December). 
Forced Displacement and Refugees in Sub-Saharan Africa.An Economic Inquiry. World Bank Policy Research Working paper 7517. Vogt, Manuel, N.-C. B. S. R. L.-E. C. P. H. and L. Girardin (2015). Integrating Data on Ethnicity, Geography, and Conflict: The Ethnic Power Relations Data Set Family. 51 Zhou, Y.-Y. (2018). Refugee Proximity and Support for Citizenship Exclusion in Africa. Unpublished . Zhou, Y.-Y. (2019). How Refugee Resentment Shapes National Identity and Citizen Participation in Africa. Unpublished . Zhou, Y.-Y., G. Grossman, and S. Ge (2021). When Refugee Exposure Improves Local Development and Public Goods Provision: Evidence from Uganda. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Zhou, Y.-Y. and J. Lyall (2021). Prolonged Contact Does Not Improve Locals Relations with Migrants in Wartime Settings. Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group . Zhou, Y.-Y. and A. Shaver (2021). Reexamning the Effect of Refugees on Civil Conflict: A Global Subnational Analysis. Amerian Political Science Review 115 (4), 1175–1196. Zylkin, T. (2019, Mar). ppml with negative values. Statalist.. 52 Tables and figures for: Refugee, Diversity, and Conflict in Sub-Saharan Africa February 10, 2022 Abstract This document contains a set of appendices with supplemental material. Appendix A Data Appendix A.1 Linking Ethnic Data from Africa (LEDA) LEDA offers an interface—a language tree—to flexibly link ethnic groups from different databases to each other and calculate the linguistic distances between them. 
LEDA is currently structured around lists of ethnic groups from 12 original datasets, which are the following:
• Afrobarometer Surveys
• All Minorities at Risk (AMAR)
• Census data from IPUMS
• Ethnic Power Relations (EPR) dataset
• Ethnologue languages
• Politically Relevant Ethnic Groups from Posner (2004)
• Ethnic groups in Francois, Trebbi & Rainer (2015)
• Ethnic groups from Fearon (2003)
• GREG Data (based on the Russian Atlas Narodov Mira)
• Demographic and Health Surveys
• Murdock Atlas
• Spatially Interpolated Data on Ethnicity (SIDE)

These lists are structured in LEDA’s interface by data source, country, year, or in the case of survey data, survey rounds. In our analysis, we use Afrobarometer, EPR, and Murdock Atlas data; therefore, we can use LEDA functions to link the different ethnic groups to each other.

LEDA consists of three main linkage types: binary linking based on the relations of sets of language nodes associated with two groups; binary linking based on linguistic distances; and a full computation of dyadic linguistic distances.

In our main analysis, we use the second type of linkage, binary linking based on linguistic distances, and set the level of linking to “dialect”. This is done using the “mindistlink” function of LEDA, which computes the minimum linguistic distance between two ethnic groups and, therefore, provides the closest linguistic neighbor for each given ethnic group (see Figure A.1). This function computes a variable called distance, which measures the linguistic distance between two ethnic groups. Mathematically, these distances are calculated as

$$
D_{L_1 L_2} = 1 - \left( \frac{2\, d\big(\omega(L_1,\ldots,O) \cap \omega(L_2,\ldots,O)\big)}{d\big(\omega(L_1,\ldots,O)\big) + d\big(\omega(L_2,\ldots,O)\big)} \right)^{\delta}, \tag{A.1}
$$

where $d(\omega(L_1,\ldots,O))$ is the length of the path from the first language to the tree’s origin and $d(\omega(L_1,\ldots,O) \cap \omega(L_2,\ldots,O))$ is the length of the intersection of the paths from the first and second language to the origin.
δ is an exponent to discount distances further away from the root of the tree; it is typically set to 0.5.

Figure A.1: Linking Ethnic Data from Africa
Source: Müller-Crepon et al., 2020.

As a robustness check, we also use the first type of linkage: binary linking based on the relations between sets of language nodes associated with two groups. This is done using the “setlink” function of LEDA. With this function, two groups are linked to each other as soon as they share any language node at the level of the language tree specified by the link level.

Concretely, we first use LEDA to obtain linkage tables between Afrobarometer and EPR data for our main analysis and between the Murdock Atlas and EPR data for our IV strategy, using the “mindistlink” function. We also obtain the same tables using the “setlink” function for robustness. We choose “dialect” as our link level, thereby adopting a strict definition of ethnic similarity (vs. difference). We also obtain these tables choosing “language” as our link level for robustness. We therefore end up with 4 linkage tables. Note that in using the “setlink” function, choosing “dialect” or “language” as our link level provides exactly the same linkage table between the Afrobarometer and EPR data.

As a reminder, we use the data on ethnicity from the Afrobarometer to obtain the diversity indices in the host country (data on the ethnicity of the surveyed individuals in the host country), while we use the data on ethnicity from the EPR dataset to define the revised refugee diversity indices (data on the ethnicity of refugees in the camps in the host country). Finally, we use data on ethnicity from the Murdock Atlas to obtain the historical homeland of refugees that we use in our IV approach.

These linkage tables between the different databases from LEDA do not provide one-to-one links. Indeed, because ethnicities are identified at different levels in these databases, a given ethnicity may be linked to several others.
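
As an illustration of Equation (A.1) and of the minimum-distance linkage described above, the following Python sketch computes the distance between two languages and picks the closest linguistic neighbor of a group. It is only a sketch under stated assumptions, not LEDA's own interface: each language is represented by a hypothetical list of tree nodes from just below the origin down to the language, the path length is the number of nodes in that list, and the function names, the toy tree, and the group labels are all invented for the example.

```python
from typing import Dict, List, Sequence


def shared_path_length(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Length of the common part of two origin-to-language paths."""
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    return shared


def linguistic_distance(path_a: Sequence[str], path_b: Sequence[str],
                        delta: float = 0.5) -> float:
    """Distance in the form of Equation (A.1): 1 - (2 * shared / (d_a + d_b)) ** delta."""
    total = len(path_a) + len(path_b)
    if total == 0:
        # Assumption: two empty paths both sit at the origin, so distance is zero.
        return 0.0
    shared = shared_path_length(path_a, path_b)
    return 1.0 - (2.0 * shared / total) ** delta


def closest_neighbor(group: str, candidates: List[str],
                     paths: Dict[str, List[str]], delta: float = 0.5) -> str:
    """Candidate with the smallest distance to `group` (a minimum-distance link)."""
    return min(candidates,
               key=lambda c: linguistic_distance(paths[group], paths[c], delta))


# Hypothetical toy language tree (all labels are illustrative only).
toy_paths = {
    "group_A": ["family_1", "branch_1", "language_A"],
    "group_B": ["family_1", "branch_1", "language_B"],
    "group_C": ["family_2", "branch_3", "language_C"],
}

d_ab = linguistic_distance(toy_paths["group_A"], toy_paths["group_B"])
d_ac = linguistic_distance(toy_paths["group_A"], toy_paths["group_C"])
link = closest_neighbor("group_A", ["group_B", "group_C"], toy_paths)
```

In this toy tree, group_A and group_B share two of three nodes while group_A and group_C share none, so the minimum-distance link for group_A is group_B. With δ below 1, the ratio inside the parentheses is pushed toward 1, which is how splits far from the root are discounted.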
Because these links are not one-to-one, we still need an approach that yields a single definition of ethnicity in our analysis. The Afrobarometer and the Murdock Atlas ethnicities are overall defined at a more disaggregated level than the EPR ethnicities. As we can aggregate the disaggregated ethnicities but cannot disaggregate the aggregated ones, where possible we rename the Afrobarometer and the Murdock Atlas ethnicities based on the EPR ethnicities.

We merge these tables with the data on ethnicity we have from rounds 3–6 of the Afrobarometer and the UNHCR refugee camps data for the corresponding period, 2005–2016. We drop all pairs of links between ethnicities when they do not occur in the Afrobarometer and the UNHCR refugee camps data simultaneously. In other words, we only keep the data on the link between the ethnicities that are present in our database.

We isolate one-to-one (injective) relations between ethnicities in the Afrobarometer and the UNHCR refugee camps data. These are trivial to handle (see Figure A.2).

Figure A.2: Injective relations

We also isolate many-to-one (bijective) relations. In this case, we have to aggregate the Afrobarometer ethnicities with their unique and more aggregated correspondence in the UNHCR refugee camps data (see Figure A.3).

Figure A.3: Bijective relations

The remaining correspondences are either (i) one-to-many (bijective) but opposite to Figure A.3 (i.e., many ethnicities from the UNHCR refugee camps data correspond to one ethnicity from the Afrobarometer) or (ii) many-to-many relations. For both cases, we apply a more pragmatic approach:
a. In both cases, we disregard ethnicities that do not appear either in the Afrobarometer or in the UNHCR refugee camps data. This means that for the remaining ethnicity that has no counterpart in either the Afrobarometer or the UNHCR refugee camps data, we simply keep the name of the ethnicity as such, i.e., this information is not dropped.
b. Then, after ignoring ethnicities that have no occurrence in our datasets, we check whether the one-to-many or many-to-many relation has reduced to a one-to-one or many-to-one relation, respectively. If so, we treat it as above.
c. For the remaining one-to-many relations, we keep these ethnicities in the Afrobarometer as such and consider them as a single ethnic group. Some manual treatment can further improve the correspondence.
d. For the few remaining many-to-many relations, we consider the ethnicities on either side as separate ethnicities. Here also, some further manual treatment can improve the correspondence.

Appendix B Tables and Figures

Table B.1: Summary Table for Data Availability and Quality for Countries in Sub-Saharan Africa^I
Columns (1)–(7), The Afrobarometer: Round 1^II (1999–2001), Round 2^II (2002–2003), Round 3 (2005–2006), Round 4 (2008–2009), Round 5 (2012–2013), Round 6 (2014–2015), Round 7^III (2016–2017). Column (8): UNHCR refugee camps (2000–2016). Column (9): EPR-ER (1975–2017).
Country | The Afrobarometer, columns (1)–(7) | UNHCR refugee camps (8) | EPR-ER (9)
Benin | 1,198 1,200 1,200 1,200 1,200 | Good Quality | Available
Botswana | 1,200 1,200 1,200 1,200 1,200 1,200 1,198 | Bad Quality | Not Available
Burkina Faso | 1,200 1,200 1,200 1,200 | Good Quality | Available
Burundi | 1,200 1,200 1,200 | Good Quality | Available
Cape Verde | 1,268 1,256 1,264 1,208 1,200 1,202 | Not Available | Not Available
Cameroon | 1,200 1,182 1,200 | Good Quality | Available
Gabon | 1,198 1,200 | Bad Quality | Available
Gambia | 2,400 | Bad Quality | Available
Ghana | 2,004 1,200 1,197 1,200 2,400 2,400 1,194 | Good Quality | Available
Guinea | 1,200 1,200 1,599 | Good Quality | Available
Ivory Coast | 1,200 1,199 1,200 | Good Quality | Available
Kenya | 2,398 1,278 1,104 2,399 2,397 1,200 | Good Quality | Available
Lesotho | 1,177 1,200 1,161 1,200 1,197 1,200 1,200 | Not Available | Available
Liberia | 1,200 1,199 1,199 1,200 | Good Quality | Available
Madagascar | 1,350 1,350 1,200 1,200 1,200 | Not Available | Not Available
Malawi | 1,208 1,200 1,200 1,200 2,407 2,400 1,200 | Good Quality | Available
Mali | 2,089 1,283 1,244 1,232 1,200 1,200 1,200 | Bad Quality | Available
Mauritius | 1,200 1,200 1,200 | Not Available | Not Available
Mozambique | 1,400 1,198 1,200 2,400 2,400 1,200 | Good Quality | Available
Namibia | 1,183 1,199 1,200 1,200 1,200 1,200 1,200 | Good Quality | Available
Niger | 2,363 1,199 1,200 1,600 | Good Quality | Available
Nigeria | 3,603 2,428 2,324 2,400 2,400 1,200 | Good Quality | Available
Sao Tome and Principe | 1,196 1,200 | Not Available | Not Available
Senegal | 1,200 1,200 1,200 1,200 1,200 1,200 | Bad Quality | Available
Sierra Leone | 1,190 1,191 1,840 | Good Quality | Available
South Africa | 2,200 2,400 2,400 2,400 2,399 2,390 1,200 | Not Available | Available
Sudan^IV | 1,199 1,200 2,400 | Good Quality | Available
Swaziland | 1,200 1,200 1,199 | Bad Quality | Not Available
Tanzania | 2,198 1,223 1,304 1,208 2,400 2,386 1,200 | Good Quality | Available
Togo | 1,200 1,200 1,199 | Bad Quality | Available
Uganda | 2,271 2,400 2,400 2,431 2,400 2,400 1,200 | Good Quality | Available
Zambia | 1,198 1,198 1,200 1,200 1,200 1,199 1,200 | Good Quality | Available
Zimbabwe | 1,200 1,104 1,048 1,200 2,400 2,400 1,200 | Good Quality | Available
^I Quality refers to the exactness of UNHCR refugee camps data for a given country, which is determined by comparison with UNHCR official bilateral data in Figure B.8.
^II There is no data on ethnicity in rounds 1 and 2 of the Afrobarometer.
^III There is no geocoded data for round 7 of the Afrobarometer.
^IV The question on an individual’s ethnicity is not asked in Sudan.

Table B.2: Descriptive Statistics
Panel A: Refugee-Hosting Areas, columns (1)–(5). Panel B: All Areas, columns (6)–(10).
Variable | (1) Obs. | (2) Mean | (3) Std. Dev. | (4) Min. | (5) Max. | (6) Obs. | (7) Mean | (8) Std. Dev. | (9) Min. | (10) Max.
Diversity Indices
EF | 2,327 | 0.2558 | 0.2529 | 0 | 0.8791 | 14,441 | 0.2897 | 0.2705 | 0 | 0.8828
EP | 2,327 | 0.1011 | 0.0911 | 0 | 0.2500 | 14,441 | 0.1072 | 0.0905 | 0 | 0.25
REF (80 km, min. ling. dist.) | 2,327 | 0.3790 | 0.2446 | 0 | 0.8494 | 14,441 | 0.2818 | 0.2637 | 0 | 0.8664
REP (80 km, min. ling. dist.) | 2,327 | 0.1407 | 0.0782 | 0 | 0.2500 | 14,441 | 0.1072 | 0.0911 | 0 | 0.25
Refugees (80 km, IHS) | 2,327 | 6.9173 | 4.6184 | 0 | 13.7611 | 14,441 | 1.1146 | 3.1471 | 0 | 13.7611
Conflict Events
Violent conflict, incidence | 2,327 | 0.5170 | 0.4998 | 0 | 1 | 14,441 | 0.4802 | 0.4996 | 0 | 1
Violent conflict, intensity (IHS) | 2,327 | 0.8992 | 1.0807 | 0 | 5.6131 | 14,441 | 0.9743 | 1.2658 | 0 | 6.5367
Non-violent conflict, incidence | 2,327 | 0.6162 | 0.4864 | 0 | 1 | 14,441 | 0.5728 | 0.4947 | 0 | 1
Non-violent conflict, intensity (IHS) | 2,327 | 1.2188 | 1.1382 | 0 | 4.8521 | 14,441 | 1.4067 | 1.5436 | 0 | 5.7808
Civilian conflict, incidence | 2,327 | 0.4083 | 0.4916 | 0 | 1 | 14,441 | 0.3968 | 0.4892 | 0 | 1
Civilian conflict, intensity (IHS) | 2,327 | 0.6370 | 0.8938 | 0 | 4.1591 | 14,441 | 0.7444 | 1.1157 | 0 | 6.5309
Protest, incidence | 2,327 | 0.5548 | 0.4971 | 0 | 1 | 14,441 | 0.5363 | 0.4987 | 0 | 1
Protest, intensity | 2,327 | 1.0622 | 1.1148 | 0 | 4.7708 | 14,441 | 1.3011 | 1.5196 | 0 | 5.7746
UCDP conflicts, incidence | 2,327 | 0.0812 | 0.2732 | 0 | 1 | 14,441 | 0.1322 | 0.3387 | 0 | 1
UCDP conflicts, intensity (IHS) | 2,327 | 0.1309 | 0.5543 | 0 | 5.4468 | 14,441 | 0.2371 | 0.7017 | 0 | 5.5373
Climate Data
Rain anomalies | 2,327 | 1.0191 | 10.1544 | -48.2476 | 44.6816 | 14,441 | 0.2015 | 10.4917 | -57.7804 | 44.8193
Temperature anomalies | 2,327 | 0.1001 | 0.2171 | -0.5537 | 1.2996 | 14,441 | 0.1138 | 0.2206 | -0.5938 | 1.2996
Notes: EF, EP: standard diversity indices. REF (80 km, min. ling. dist.), REP (80 km, min. ling. dist.): revised refugee diversity indices, based on refugee camps in an 80 km buffer around each cluster and on the “minimum linguistic distance” function from LEDA.

Table B.3: Diversity and Violent Conflict, Intensity
Dependent variable: Violent Conflict (IHS), Intensity. Columns: (1) (2) (3) (4) (5) (6)a. Each row lists its coefficients in column order, with standard errors in parentheses.
EF: 0.0940 (0.1954); 0.1214 (0.1959)
EP: -0.1074 (0.5490); -0.1827 (0.5502)
Refugees (80 km, IHS): -0.0179*** (0.0058); -0.0170*** (0.0065); -0.0170*** (0.0065)
REF (80 km, min. ling. dist.): -0.4386** (0.1980); -0.3022 (0.2072); -0.4447** (0.1994); -0.3088 (0.2086)
REP (80 km, min. ling. dist.): 0.9754* (0.5133); 0.8923* (0.5151); 0.9987* (0.5168); 0.9170* (0.5183)
Rain anomalies (80 km): 0.0017** (0.0008); 0.0017** (0.0008)
Temp. anomalies (80 km): -0.0842* (0.0451); -0.0891** (0.0450)
Observations: 14,441 14,441 14,441 14,441 14,441 14,441
R-squared: 0.806 0.807 0.806 0.807 0.807 0.807
Year FE: Y Y Y Y Y Y
PSU FE: Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: violent conflict intensity, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
a Results for REF and REP in column (6) presented in line B of Table 3.
∗ More information on LEDA can be found in Appendix A.1.

Table B.4: Diversity and Non-Violent Conflict, Intensity
Dependent variable: Non-Violent Conflict (IHS), Intensity. Columns: (1) (2) (3) (4) (5) (6)a. Each row lists its coefficients in column order, with standard errors in parentheses.
EF: 0.2519 (0.1980); 0.2758 (0.1966)
EP: -0.8126 (0.5905); -0.8784 (0.5880)
Refugees (80 km, IHS): -0.0157*** (0.0056); -0.0154** (0.0062); -0.0152** (0.0062)
REF (80 km, min. ling. dist.): -0.3714* (0.2109); -0.2478 (0.2101); -0.3619* (0.2101); -0.2402 (0.2092)
REP (80 km, min. ling. dist.): 0.9248 (0.5720); 0.8494 (0.5607); 0.8981 (0.5699); 0.8250 (0.5588)
Rain anomalies (80 km): -0.0006 (0.0006); -0.0007 (0.0006)
Temp. anomalies (80 km): 0.0900** (0.0447); 0.0855* (0.0445)
Observations: 14,441 14,441 14,441 14,441 14,441 14,441
R-squared: 0.862 0.862 0.862 0.862 0.862 0.862
Year FE: Y Y Y Y Y Y
PSU FE: Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: intensity of non-violent conflict, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
a Results for REF and REP in column (6) presented in line D of Table 3.
∗ More information on LEDA can be found in Appendix A.1.

Table B.5: Diversity and Civilian Conflict, Incidence
Dependent variable: Civilian Conflict, Incidence. Columns: (1) (2) (3) (4) (5) (6)a. Each row lists its coefficients in column order, with standard errors in parentheses.
EF: 0.0348 (0.0711); 0.0422 (0.0715)
EP: -0.0612 (0.2115); -0.0816 (0.2132)
Refugees (80 km, IHS): -0.0049* (0.0027); -0.0043 (0.0029); -0.0048* (0.0029)
REF (80 km, min. ling. dist.): -0.1730** (0.0824); -0.1384 (0.0845); -0.1843** (0.0826); -0.1460* (0.0847)
REP (80 km, min. ling. dist.): 0.4101* (0.2163); 0.3890* (0.2163); 0.4262** (0.2167); 0.4032* (0.2169)
Rain anomalies (80 km): -0.0025*** (0.0004); -0.0025*** (0.0004)
Temp. anomalies (80 km): -0.0404* (0.0217); -0.0418* (0.0216)
Observations: 14,441 14,441 14,441 14,441 14,441 14,441
R-squared: 0.674 0.674 0.674 0.674 0.676 0.676
Year FE: Y Y Y Y Y Y
PSU FE: Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: incidence of civilian conflict, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
a Results for REF and REP in column (6) presented in line E of Table 3.
∗ More information on LEDA can be found in Appendix A.1.

Table B.6: Diversity and Civilian Conflict, Intensity
Dependent variable: Civilian Conflict (IHS), Intensity. Columns: (1) (2) (3) (4) (5) (6)a. Each row lists its coefficients in column order, with standard errors in parentheses.
EF: 0.2347 (0.1821); 0.2607 (0.1827)
EP: -0.4596 (0.5107); -0.5309 (0.5128)
Refugees (80 km, IHS): -0.0170*** (0.0050); -0.0162*** (0.0056); -0.0163*** (0.0056)
REF (80 km, min. ling. dist.): -0.5070*** (0.1843); -0.3767** (0.1895); -0.5061*** (0.1839); -0.3762** (0.1892)
REP (80 km, min. ling. dist.): 1.2741*** (0.4685); 1.1947** (0.4667); 1.2696*** (0.4677); 1.1916** (0.4660)
Rain anomalies (80 km): -0.0005 (0.0007); -0.0005 (0.0007)
Temp. anomalies (80 km): 0.0167 (0.0392); 0.0119 (0.0391)
Observations: 14,441 14,441 14,441 14,441 14,441 14,441
R-squared: 0.804 0.804 0.804 0.804 0.804 0.804
Year FE: Y Y Y Y Y Y
PSU FE: Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: intensity of civilian conflict, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
a Results for REF and REP in column (6) presented in line F of Table 3.
∗ More information on LEDA can be found in Appendix A.1.

Table B.7: Diversity and Protests, Incidence
Dependent variable: Protests, Incidence. Columns: (1) (2) (3) (4) (5) (6)a. Each row lists its coefficients in column order, with standard errors in parentheses.
EF: 0.0494 (0.0706); 0.0423 (0.0710)
EP: 0.1070 (0.2108); 0.1264 (0.2113)
Refugees (80 km, IHS): 0.0046* (0.0025); 0.0033 (0.0027); 0.0033 (0.0027)
REF (80 km, min. ling. dist.): -0.0590 (0.0763); -0.0853 (0.0808); -0.0593 (0.0763); -0.0853 (0.0809)
REP (80 km, min. ling. dist.): 0.4597** (0.2129); 0.4757** (0.2157); 0.4597** (0.2133); 0.4754** (0.2160)
Rain anomalies (80 km): -0.0001 (0.0004); -0.0001 (0.0004)
Temp. anomalies (80 km): 0.0005 (0.0211); 0.0015 (0.0211)
Observations: 14,441 14,441 14,441 14,441 14,441 14,441
R-squared: 0.696 0.697 0.697 0.697 0.697 0.697
Year FE: Y Y Y Y Y Y
PSU FE: Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: incidence of protests, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
a Results for REF and REP in column (6) presented in line G of Table 3.
∗ More information on LEDA can be found in Appendix A.1.

Table B.8: Diversity and Protests, Intensity
Dependent variable: Protests (IHS), Intensity. Columns: (1) (2) (3) (4) (5) (6)a. Each row lists its coefficients in column order, with standard errors in parentheses.
EF: 0.3196* (0.1890); 0.3411* (0.1875)
EP: -0.8972 (0.5683); -0.9561* (0.5658)
Refugees (80 km, IHS): -0.0140*** (0.0051); -0.0149*** (0.0056); -0.0143*** (0.0055)
REF (80 km, min. ling. dist.): -0.3195 (0.2071); -0.2002 (0.2057); -0.3036 (0.2076); -0.1891 (0.2062)
REP (80 km, min. ling. dist.): 0.9316* (0.5633); 0.8588 (0.5524); 0.9011 (0.5642); 0.8323 (0.5536)
Rain anomalies (80 km): 0.0019*** (0.0007); 0.0019*** (0.0007)
Temp.
anomalies (80 km) 0.0897** 0.0855* (0.0452) (0.0449) Observations 14,441 14,441 14,441 14,441 14,441 14,441 R-squared 0.854 0.854 0.854 0.854 0.854 0.854 Year Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: intensity of protests, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line H of Table 3. ∗ More information on LEDA can be found in Appendix A.1. 13 Table B.9: Diversity and UCDP Major Conflicts, Incidence (1) (2) (3) (4) (5) (6)a UCDP Major Conflicts, Incidence EF 0.1075* 0.1017* (0.0580) (0.0574) EP -0.2880* -0.2719* (0.1601) (0.1587) Refugees (80 km, IHS) 0.0038*** 0.0043*** 0.0044*** (0.0013) (0.0015) (0.0015) REF (80 km, min. ling. dist.) 0.0937 0.0594 0.0993* 0.0642 (0.0589) (0.0641) (0.0587) (0.0638) REP (80 km, min. ling. dist.) -0.2592* -0.2383 -0.2743* -0.2532* (0.1503) (0.1524) (0.1494) (0.1514) Rain anomalies (80 km) -0.0002 -0.0002 (0.0003) (0.0003) Temp. 
anomalies (80 km) 0.0503*** 0.0516*** (0.0151) (0.0151) Observations 14,441 14,441 14,441 14,441 14,441 14,441 R-squared 0.703 0.703 0.703 0.703 0.703 0.703 Year Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: UCDP major conflict incidence, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line I of Table 3. ∗ More information on LEDA can be found in Appendix A.1. 14 Table B.10: Diversity and UCDP Major Conflicts, Intensity (1) (2) (3) (4) (5) (6)a Dependent variable: UCDP Major Conflicts (IHS), Intensity EF 0.1119 0.0994 (0.1808) (0.1778) EP -0.3252 -0.2908 (0.4983) (0.4908) Refugees (80 km, IHS) 0.0082** 0.0096** 0.0099** (0.0037) (0.0044) (0.0044) REF (80 km, min. ling. dist.) 0.0544 -0.0224 0.0644 -0.0143 (0.1499) (0.1630) (0.1493) (0.1624) REP (80 km, min. ling. dist.) -0.1444 -0.0976 -0.1673 -0.1200 (0.3930) (0.3988) (0.3915) (0.3972) Rain anomalies (80 km) 0.0004 0.0004 (0.0005) (0.0005) Temp. 
anomalies (80 km) 0.0726** 0.0755*** (0.0282) (0.0280) Observations 14,441 14,441 14,441 14,441 14,441 14,441 R-squared 0.728 0.728 0.728 0.728 0.728 0.728 Year Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS and an alternative dependent variable: UCDP major conflict intensity, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnic- ities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line J of Table 3. ∗ More information on LEDA can be found in Appendix A.1. 15 Table B.11: Diversity and Violent Conflict, Incidence (1) (2) (3) (4) (5) (6)a Violent Conflict, Incidence EF -0.1090 -0.1096 (0.0716) (0.0718) EP 0.2263 0.2278 (0.2103) (0.2108) Refugees (80 km, IHS) 0.0004 0.0018 0.0014 (0.0028) (0.0029) (0.0029) REF (80 km, SetLink) -0.1404** -0.1526** -0.1500** -0.1597** (0.0666) (0.0689) (0.0666) (0.0690) REP (80 km, SetLink) 0.2949 0.3017 0.3149 0.3202* (0.1930) (0.1933) (0.1935) (0.1939) Rain anomalies (80 km) -0.0008** -0.0008** (0.0004) (0.0004) Temp. 
anomalies (80 km) -0.0833*** -0.0829*** (0.0230) (0.0230) Observations 14,441 14,441 14,441 14,441 14,441 14,441 R-squared 0.661 0.661 0.661 0.662 0.662 0.662 Year FE Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS, presented in column (6). The “set link” function from LEDA∗ is used as an alternative to link ethnicities between Afrobarometer and EPR-ER data. Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line B of Table 4. ∗ More information on LEDA can be found in Appendix A.1. 16 Table B.12: Diversity and Violent Conflict, Incidence (1) (2) (3) (4) (5) (6)a Violent Conflict, Incidence EF -0.0049 -0.0056 (0.0579) (0.0578) EP 0.0437 0.0455 (0.1655) (0.1651) Refugees (40 km, IHS) 0.0008 0.0015 0.0014 (0.0034) (0.0035) (0.0035) REF (40 km, min. ling. dist.) -0.1055 -0.1139 -0.1113 -0.1188* (0.0683) (0.0710) (0.0684) (0.0711) REP (40 km, min. ling. dist.) 0.2929 0.3008* 0.3074* 0.3145* (0.1803) (0.1823) (0.1806) (0.1827) Rain anomalies (40 km) -0.0005 -0.0005 (0.0003) (0.0003) Temp. anomalies (40 km) -0.0404** -0.0402* (0.0205) (0.0205) Observations 14,441 14,441 14,441 14,441 14,441 14,441 R-squared 0.696 0.696 0.696 0.696 0.696 0.696 Year FE Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS, presented in column (6). 
Refugee camps in a 40-km buffer around each cluster. Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line C of Table 4. ∗ More information on LEDA can be found in Appendix A.1. 17 Table B.13: Diversity and Violent Conflict, Incidence (1) (2) (3) (4) (5) (6)a Violent Conflict, Incidence EF -0.0408 -0.0469 (0.0652) (0.0651) EP 0.1837 0.1938 (0.2025) (0.2017) Refugees (120 km, IHS) 0.0040* 0.0040 0.0034 (0.0023) (0.0025) (0.0025) REF (120 km, min. ling. dist.) -0.0583 -0.0937 -0.0792 -0.1090 (0.0725) (0.0751) (0.0726) (0.0755) REP (120 km, min. ling. dist.) 0.2824 0.2954 0.3205 0.3311* (0.1987) (0.1989) (0.1996) (0.1999) Rain anomalies (120 km) -0.0019*** -0.0019*** (0.0004) (0.0004) Temp. anomalies (120 km) -0.1376*** -0.1363*** (0.0227) (0.0227) Observations 14,441 14,441 14,441 14,441 14,441 14,441 R-squared 0.648 0.648 0.648 0.648 0.650 0.650 Year FE Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS, presented in column (6). Refugee camps in a 120-km buffer around each cluster. Columns (1) and (2) introduce standard diversity indices. From col- umn (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. 
Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line D of Table 4. ∗ More information on LEDA can be found in Appendix A.1. 18 Table B.14: Diversity and Violent Conflict, Incidence (1) (2) (3) (4) (5) (6)a Violent Conflict, Incidence EF -0.1305 -0.1343 (0.0970) (0.0960) EP 0.2502 0.2583 (0.2602) (0.2570) Refugees (80 km, IHS) 0.0016 0.0027 0.0024 (0.0019) (0.0022) (0.0024) REF (80 km, min. ling. dist.) -0.1642* -0.2049** -0.1912** -0.2259** (0.0872) (0.0948) (0.0958) (0.1030) REP (80 km, min. ling. dist.) 0.4744** 0.5148** 0.5315** 0.5649** (0.2244) (0.2278) (0.2463) (0.2486) Rain anomalies (80 km) -0.0008* -0.0008* (0.0004) (0.0004) Temp. anomalies (80 km) -0.0741** -0.0718** (0.0327) (0.0325) Observations 5,761 5,761 5,761 5,761 5,761 5,761 Number of cluster id 1,835 1,835 1,835 1,835 1,835 1,835 Year FE Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using logit presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. An 80-km buffer around each cluster is used to revise standard ethnic diversity measures with the number and ethnic composition of refugees in the camps within this distance. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. 
FE: fixed effects. a Results for REF and REP in column (6) presented in line E of Table 4. ∗ More information on LEDA can be found in Appendix A.1.

Table B.15: Diversity and Violent Conflict, Incidence
Dependent variable: Violent Conflict, Incidence
(1) (2) (3) (4) (5) (6)a
EF -0.1473** -0.1500**
(0.0721) (0.0717)
EP 0.2882 0.2952
(0.2098) (0.2092)
Refugees (80 km, IHS) 0.0039 0.0050 0.0048
(0.0030) (0.0031) (0.0031)
REF (80 km, min. ling. dist.) -0.1493* -0.1842** -0.1554* -0.1887**
(0.0834) (0.0853) (0.0833) (0.0855)
REP (80 km, min. ling. dist.) 0.4071* 0.4231* 0.4192* 0.4343**
(0.2194) (0.2190) (0.2196) (0.2194)
Rain anomalies (80 km) -0.0008* -0.0007*
(0.0004) (0.0004)
Temp. anomalies (80 km) -0.0947*** -0.0940***
(0.0237) (0.0236)
Observations 14,441 14,441 14,441 14,441 14,441 14,441
R-squared 0.671 0.671 0.671 0.671 0.672 0.672
Year FE Y Y Y Y Y Y
PSU FE Y Y Y Y Y Y
Country-Time Trends Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS, presented in column (6). Includes country-specific time trends. Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line F of Table 4. ∗ More information on LEDA can be found in Appendix A.1.
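
The table notes above describe revising the standard diversity indices with the number and ethnic composition of refugees in camps within the buffer. As a rough illustration only, the sketch below recomputes fractionalization and polarization after adding refugees to a host population. It assumes the textbook formulas EF = 1 − Σ s_i² and, with all groups treated as equally distant, EP = Σ s_i²(1 − s_i), whose maximum of 0.25 is consistent with the maxima reported for EP and REP in Table B.20. The paper's own indices may weight groups by linguistic distance, which is not modeled here. The inverse hyperbolic sine (IHS) used for refugee counts is the standard asinh transform. All function names and population figures are hypothetical.

```python
import math
from collections import Counter


def shares(counts):
    """Population shares by ethnic group (groups with zero members are dropped)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("population must be positive")
    return {group: n / total for group, n in counts.items() if n > 0}


def fractionalization(counts):
    """Assumed formula: EF = 1 - sum_i s_i^2."""
    return 1.0 - sum(s * s for s in shares(counts).values())


def polarization(counts):
    """Assumed formula with binary distances: EP = sum_i s_i^2 * (1 - s_i)."""
    return sum(s * s * (1.0 - s) for s in shares(counts).values())


def add_refugees(hosts, refugees):
    """Combine host and refugee head counts group by group (hypothetical helper)."""
    merged = Counter(hosts)
    merged.update(refugees)  # Counter.update adds counts rather than replacing them
    return dict(merged)


def ihs(x):
    """Inverse hyperbolic sine: ln(x + sqrt(x^2 + 1)); defined at zero."""
    return math.asinh(x)


# Hypothetical case 1: refugees enlarge an existing minority group.
hosts_1 = {"A": 9000, "B": 1000}
revised_1 = add_refugees(hosts_1, {"B": 8000})

# Hypothetical case 2: refugees belong to a group not present locally.
hosts_2 = {"A": 5000, "B": 5000}
revised_2 = add_refugees(hosts_2, {"C": 5000})

for label, pop in [("hosts 1", hosts_1), ("revised 1", revised_1),
                   ("hosts 2", hosts_2), ("revised 2", revised_2)]:
    print(label, round(fractionalization(pop), 4), round(polarization(pop), 4))
```

In this toy setting, refugees who enlarge an existing minority push the area toward two equally sized groups, so polarization reaches its ceiling. Refugees from a third group raise fractionalization while lowering polarization.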
Table B.16: Diversity and Violent Conflict, Incidence
Dependent variable: Violent Conflict, Incidence
(1) (2) (3) (4) (5) (6)a
EF -0.1746** -0.1710**
(0.0713) (0.0720)
EP 0.4468** 0.4328**
(0.2173) (0.2188)
Refugees (80 km, IHS) -0.0044 -0.0036 -0.0042
(0.0030) (0.0033) (0.0033)
REF (80 km, min. ling. dist.) -0.2419*** -0.2163** -0.2613*** -0.2316***
(0.0857) (0.0890) (0.0858) (0.0891)
REP (80 km, min. ling. dist.) 0.5712** 0.5619** 0.6137*** 0.6038***
(0.2301) (0.2313) (0.2311) (0.2323)
Rain anomalies (80 km) -0.0004 -0.0004
(0.0004) (0.0004)
Temp. anomalies (80 km) -0.0990*** -0.1014***
(0.0238) (0.0239)
Observations 11,909 11,909 11,909 11,909 11,909 11,909
R-squared 0.665 0.666 0.666 0.666 0.666 0.666
Year FE Y Y Y Y Y Y
PSU FE Y Y Y Y Y Y
Notes: Estimated equation: Equation (1) using OLS, presented in column (6). Countries with low-quality refugee camp data excluded. Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 19. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line G of Table 4. ∗ More information on LEDA can be found in Appendix A.1.

Table B.17: Diversity and Violent Conflict, Incidence
Dependent variable: Violent Conflict, Incidence
(1) (2) (3) (4) (5) (6)a
EF -0.0794 -0.0826
(0.0597) (0.0597)
EP 0.2246 0.2330
(0.1621) (0.1622)
Refugees (80 km, IHS) 0.0024 0.0037* 0.0035
(0.0020) (0.0022) (0.0022)
REF (80 km, min. ling. dist.)
-0.1220* -0.1454** -0.1248** -0.1469** (0.0629) (0.0653) (0.0630) (0.0655) REP (80 km, min. ling. dist.) 0.2774* 0.2826* 0.2826* 0.2874* (0.1624) (0.1631) (0.1628) (0.1635) Rain anomalies (80 km) -0.0013*** -0.0013*** (0.0003) (0.0003) Temp. anomalies (80 km) -0.0444** -0.0439** (0.0183) (0.0183) Observations 22,415 22,415 22,415 22,415 22,415 22,415 R-squared 0.640 0.640 0.640 0.640 0.640 0.640 Year FE Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS, presented in column (6). Level of precision set to ≤3 to increase number of geocoded locations. Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line H of Table 4. ∗ More information on LEDA can be found in Appendix A.1. 22 Table B.18: Diversity and Violent Conflict, Incidence (1) (2) (3) (4) (5) (6)a Violent Conflict, Incidence EF -0.0668 -0.0694 (0.0579) (0.0580) EP 0.2230 0.2301 (0.1579) (0.1580) Refugees (80 km, IHS) 0.0022 0.0035 0.0033 (0.0020) (0.0022) (0.0022) REF (80 km, min. ling. dist.) -0.1435** -0.1654*** -0.1468** -0.1675*** (0.0605) (0.0629) (0.0607) (0.0631) REP (80 km, min. ling. dist.) 0.3516** 0.3551** 0.3575** 0.3608** (0.1573) (0.1579) (0.1578) (0.1584) Rain anomalies (80 km) -0.0013*** -0.0013*** (0.0003) (0.0003) Temp. 
anomalies (80 km) -0.0502*** -0.0499*** (0.0175) (0.0175) Observations 23,256 23,256 23,256 23,256 23,256 23,256 R-squared 0.642 0.642 0.642 0.642 0.643 0.643 Year FE Y Y Y Y Y Y PSU FE Y Y Y Y Y Y Notes: Estimated equation: Equation (1) using OLS, presented in column (6). All geocoded locations are included. Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in line I of Table 4. ∗ More information on LEDA can be found in Appendix A.1. 23 Table B.19: Instrumental Variable Approach: Diversity and Violent Conflict, Incidence (1) (2) (3) (4) Violent Conflict, Incidence Panel A: Second Stage REF (80 km, min. ling. dist.) 0.9463*** 0.9536*** 0.9456*** 0.9542*** (0.0086) (0.0101) (0.0084) (0.0094) REP (80 km, min. ling. dist.) 
0.4712* 0.4768* 0.4674* 0.4757*
(0.2603) (0.2629) (0.2593) (0.2618)
Refugees (80 km, IHS) 0.0005 0.0007
(0.0029) (0.0029)
Observations 14,441 14,441 14,441 14,441
R-squared 0.0019 0.0019 0.0029 0.0030
Kleibergen–Paap rk Wald F 868.9 890.1 871.2 892.8
Root MSE 0.290 0.290 0.290 0.290
Panel B: First Stage (REF)
Predicted REF 0.9463*** 0.9536*** 0.9456*** 0.9542***
(0.0086) (0.0101) (0.0084) (0.0094)
Predicted REP 0.0501** 0.0426* 0.0447** 0.0367*
(0.0218) (0.0227) (0.0211) (0.0212)
Panel C: First Stage (REP)
Predicted REF 0.0095*** 0.0054* 0.0091*** 0.0056*
(0.0017) (0.0029) (0.0018) (0.0029)
Predicted REP 0.9687*** 0.9729*** 0.9686*** 0.9719***
(0.0052) (0.0064) (0.0054) (0.0064)
Year FE Y Y Y Y
PSU FE Y Y Y Y
Country–time trends N N Y Y
Climatic controls Y Y Y Y
Notes: Estimated equation in panel A: Equation (5). Estimated equation in panel B: Equation (7). Estimated equation in panel C: Equation (8). Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA is used. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. REF and REP predicted using a gravity model presented in column (2) of Table 5.

Table B.20: Descriptive Statistics: Individual Data
Panel A: Refugee-Hosting Areas, columns (1)–(5). Panel B: All Areas, columns (6)–(10).
Columns within each panel: Obs., Mean, Std. Dev., Min., Max. Panels are separated by "|".

Diversity Indices
EF                                   7,893  0.2666  0.2580  0  0.8791  |  46,528  0.3131  0.2842  0  0.8828
EP                                   7,893  0.1029  0.0900  0  0.2500  |  46,528  0.1078  0.0873  0  0.2500
REF (80 km, min. ling. dist.)        7,893  0.3818  0.2481  0  0.8494  |  46,528  0.3038  0.2773  0  0.8664
REP (80 km, min. ling. dist.)        7,893  0.1385  0.0768  0  0.2500  |  46,528  0.1075  0.0876  0  0.2500
Refugees (80 km, IHS)                7,893  6.9647  4.6902  0  13.7611  |  46,528  1.1815  3.2503  0  13.7611

Sociodemographic Variables
Age                                  7,893  36.6117  13.9844  18  100  |  46,528  35.7502  13.8922  18  105
Basic education                      7,893  0.3532  0.4780  0  1  |  46,528  0.2911  0.4543  0  1
Secondary education                  7,893  0.3248  0.4683  0  1  |  46,528  0.3664  0.4818  0  1
Tertiary education                   7,893  0.0817  0.2740  0  1  |  46,528  0.1279  0.3339  0  1
Female                               7,893  0.4994  0.5000  0  1  |  46,528  0.5000  0.5000  0  1
Urban/rural status                   7,893  0.3739  0.4839  0  1  |  46,528  0.4235  0.4941  0  1
Employment                           7,893  0.4208  0.4937  0  1  |  46,528  0.3746  0.4840  0  1
Marital status                       7,893  0.0552  0.2285  0  1  |  46,528  0.0679  0.2516  0  1

Outcome Variables
Attacks                              7,893  0.0889  0.2847  0  1  |  46,528  0.1139  0.3177  0  1
Crime                                7,893  0.2951  0.4561  0  1  |  46,528  0.3199  0.4664  0  1
Identity: Ethnicity vs. Nationality  7,893  0.5110  0.4999  0  1  |  46,528  0.4665  0.4989  0  1
Protest                              7,893  0.2272  0.4190  0  1  |  46,528  0.3206  0.4667  0  1
Theft                                7,893  0.3012  0.4588  0  1  |  46,528  0.3055  0.4606  0  1
Trust: general                       4,784  0.2255  0.4180  0  1  |  22,593  0.1954  0.3965  0  1
Trust: government                    7,893  0.6264  0.4838  0  1  |  46,528  0.6127  0.4871  0  1
Trust: neighborhood                  4,784  0.6166  0.4863  0  1  |  22,593  0.5993  0.4901  0  1

Climate Data
Rain anomalies                       7,893  -0.9722  11.4333  -48.2476  28.5457  |  46,528  -0.8006  11.0475  -57.7804  41.6399
Temperature anomalies                7,893  0.0575  0.2409  -0.5414  1.2996  |  46,528  0.0811  0.2561  -0.5938  1.2996

Notes: EF, EP: standard diversity indices. REF (80 km, min. ling. dist.), REP (80 km, min. ling. dist.): revised refugee diversity indices. Level of analysis: cluster. No. of countries: 23. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer surveys and EPR-ER data. ∗ More information on LEDA can be found in Appendix A.1.

Table B.21: Summary Table: Aggregation at the GADM2 Level
Columns: (1) REF (min. ling. dist.); (2) REP (min. ling. dist.)
A. Benchmark results, incidence (N=14,441) -0.1780** 0.4487**
(0.0858) (0.2202)
B. Violent conflict, incidence, OLS, GADM2 (N=1,565) -0.3952 1.1009
(0.5440) (1.4174)
C. Violent conflict, incidence, IV, GADM2 (N=1,565) -0.3947 1.0999
(0.5436) (1.4164)
D. Non-violent conflict, incidence, OLS, GADM2 (N=1,565) -0.0381 -0.2136
(0.2848) (0.9435)
E. Non-violent conflict, incidence, IV, GADM2 (N=1,565) -0.0376 -0.2155
(0.2846) (0.9429)
Notes: Estimated equation: Equation (1) using OLS with alternative dependent variables. Level of analysis: GADM2. Period: 2005–2016. LEDA function: similar to Table 1. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. ∗ More information on LEDA can be found in Appendix A.1.

Table B.22: Summary Table: Aggregation at the GADM1 Level
Columns: (1) REF (min. ling. dist.); (2) REP (min. ling. dist.)
A. Benchmark results, intensity (N=14,441) -0.3088 0.9170*
(0.2086) (0.5183)
B. Violent conflict, intensity, OLS, GADM1 (N=1,283) -0.5196 3.5058
(1.1462) (3.0826)
C. Violent conflict, intensity, IV, GADM1 (N=1,283) -0.9360 0.3957
(1.1377) (3.3304)
D. Non-violent conflict, intensity, OLS, GADM1 (N=1,283) -2.2781** 4.8473*
(0.8922) (2.5191)
E. Non-violent conflict, intensity, IV, GADM1 (N=1,283) -2.2416** 4.8300*
(0.8793) (2.5133)
Notes: Estimated equation: Equation (1) using OLS with alternative dependent variables. Level of analysis: GADM1. Period: 2005–2016. LEDA function: similar to Table 1. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects.
∗ More information on LEDA can be found in Appendix A.1.

Table B.23: Discussion: Diversity and Individual Outcomes
Columns: (1) Attack(a); (2) Crime(b); (3) National Id.; (4) Protest; (5) Theft(c); (6) Gen. Trust; (7) Neigh. Trust; (8) Gov. Trust

Panel A: Excl. countries with low-quality camp data (19 countries)
REF (80 km, min. ling. dist.) -0.0668 -0.1494* -0.0280 -0.0544 -0.1193* -0.0577 -0.2156 0.0166
(0.0452) (0.0852) (0.0852) (0.0584) (0.0693) (0.1515) (0.1562) (0.0817)
REP (80 km, min. ling. dist.) 0.2368* 0.2337 -0.0396 0.0191 0.1664 -0.0496 0.1815 0.0896
(0.1264) (0.2333) (0.2730) (0.1739) (0.2022) (0.4397) (0.5312) (0.2197)
Observations 46,528 46,528 46,528 46,528 46,528 22,592 22,592 46,528
R-squared 0.161 0.199 0.236 0.496 0.176 0.249 0.276 0.276

Panel B: Incl. all countries (23 countries)
REF (80 km, min. ling. dist.) -0.0559 -0.1142 -0.0511 -0.0480 -0.0837 -0.0553 -0.1834 0.0003
(0.0414) (0.0819) (0.0890) (0.0532) (0.0694) (0.1416) (0.1455) (0.0764)
REP (80 km, min. ling. dist.) 0.1954* 0.1603 0.1474 -0.0364 0.1291 0.1891 0.4807 0.1574
(0.1118) (0.2179) (0.2406) (0.1565) (0.1957) (0.4072) (0.4573) (0.2042)
Observations 56,706 56,706 56,706 56,706 56,706 27,126 27,126 56,706
R-squared 0.160 0.195 0.225 0.496 0.175 0.229 0.295 0.263

Year FE Y Y Y Y Y Y Y Y
PSU FE Y Y Y Y Y Y Y Y
Controls: climate, Ref. (80 km, IHS), Ind. Y Y Y Y Y Y Y Y
Notes: Estimated equation: Equation (9) using OLS. Individual controls: age, age squared, education, gender, marital and rural/urban status. Level of analysis: cluster. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. (a) See column (6) in Table B.24. (b) See column (6) in Table B.25. (c) See column (6) in Table B.26. ∗ More information on LEDA can be found in Appendix A.1.

Table B.24: Diversity and Attacks
Dependent variable: Attack
(1) (2) (3) (4) (5) (6)a
EF 0.0025 -0.0042
(0.0434) (0.0437)
EP 0.0259 0.0435
(0.1226) (0.1236)
Refugees (80 km, IHS) 0.0037** 0.0038** 0.0033*
(0.0018) (0.0018) (0.0019)
REF (80 km, min. ling. dist.) -0.0513 -0.0644 -0.0563 -0.0668
(0.0451) (0.0455) (0.0446) (0.0452)
REP (80 km, min. ling. dist.) 0.2145* 0.2267* 0.2293* 0.2368*
(0.1287) (0.1274) (0.1274) (0.1264)
Rain anomalies 0.0002 0.0003
(0.0006) (0.0006)
Temp. anomalies -0.0361 -0.0303
(0.0273) (0.0280)
Observations 46,528 46,528 46,528 46,528 46,528 46,528
R-squared 0.161 0.161 0.161 0.161 0.161 0.161
Year FE Y Y Y Y Y Y
PSU FE Y Y Y Y Y Y
Individual Controls Y Y Y Y Y Y
Notes: Estimated equation: Equation (9) using OLS, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Individual controls: age, age squared, education, gender, marital and rural/urban status. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in column (1) of Table B.23. ∗ More information on LEDA can be found in Appendix A.1.

Table B.25: Diversity and Crime
Dependent variable: Crime
(1) (2) (3) (4) (5) (6)a
EF -0.0888 -0.0952
(0.0750) (0.0743)
EP 0.1626 0.1794
(0.2179) (0.2173)
Refugees (80 km, IHS) 0.0035 0.0048 0.0046
(0.0049) (0.0050) (0.0049)
REF (80 km, min. ling. dist.) -0.1310 -0.1476* -0.1350 -0.1494*
(0.0824) (0.0843) (0.0835) (0.0852)
REP (80 km, min. ling. dist.)
0.2108 0.2261 0.2234 0.2337
(0.2284) (0.2293) (0.2331) (0.2333)
Rain anomalies -0.0001 -0.0001
(0.0010) (0.0010)
Temp. anomalies -0.0221 -0.0142
(0.0531) (0.0523)
Observations 46,528 46,528 46,528 46,528 46,528 46,528
R-squared 0.198 0.198 0.198 0.199 0.199 0.199
Year FE Y Y Y Y Y Y
PSU FE Y Y Y Y Y Y
Individual Controls Y Y Y Y Y Y
Notes: Estimated equation: Equation (9) using OLS, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Individual controls: age, age squared, education, gender, marital and rural/urban status. Level of analysis: cluster. No. of countries: 20. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in column (2) of Table B.23. ∗ More information on LEDA can be found in Appendix A.1.

Table B.26: Individual Outcomes: Diversity and Theft
Dependent variable: Theft
(1) (2) (3) (4) (5) (6)a
EF -0.0518 -0.0592
(0.0708) (0.0702)
EP -0.0042 0.0153
(0.1908) (0.1904)
Refugees (80 km, IHS) 0.0041 0.0054 0.0053
(0.0046) (0.0047) (0.0047)
REF (80 km, min. ling. dist.) -0.1065 -0.1250* -0.1024 -0.1193*
(0.0672) (0.0703) (0.0663) (0.0693)
REP (80 km, min. ling. dist.) 0.1709 0.1881 0.1543 0.1664
(0.1999) (0.2034) (0.1994) (0.2022)
Rain anomalies 0.0013 0.0013
(0.0008) (0.0008)
Temp. anomalies -0.0087 0.0005
(0.0386) (0.0378)
Observations 46,528 46,528 46,528 46,528 46,528 46,528
R-squared 0.176 0.176 0.176 0.176 0.176 0.176
Year FE Y Y Y Y Y Y
PSU FE Y Y Y Y Y Y
Individual controls Y Y Y Y Y Y
Notes: Estimated equation: Equation (9) using OLS, presented in column (6). Columns (1) and (2) introduce standard diversity indices. From column (3), revised refugee diversity indices are introduced. Individual controls: age, age squared, education, gender, marital and rural/urban status. Level of analysis: cluster. No. of countries: 20. Period: 2005–2016. Refugee camps in an 80-km buffer around each cluster. The “minimum linguistic distance” function from LEDA∗ is used to link ethnicities between Afrobarometer and EPR-ER data. Robust standard errors clustered at the cluster level are reported in parentheses. *** denotes statistical significance at the 1 percent level (p < 0.01), ** at the 5 percent level (p < 0.05), and * at the 10 percent level (p < 0.10), all for two-sided hypothesis tests. FE: fixed effects. a Results for REF and REP in column (6) presented in column (5) of Table B.23. ∗ More information on LEDA can be found in Appendix A.1.

Figure B.1: UNHCR Aggregated Refugee Data by Region of Asylum
[Figure: four panels (Africa, Asia and Pacific, Europe, Americas). x-axis: Year, 2000–2020. y-axis: Displaced population (in millions), 0–35.]

Figure B.2: People Displaced Across Borders by Country of Origin, UNHCR, End of 2019
[Figure: "People displaced by country of origin in Africa". Countries shown: South Sudan, Somalia, Dem. Rep. of the Congo, Sudan, Central African Rep., Eritrea, Burundi, Nigeria, Rwanda, Mali, Western Sahara. Axis: People displaced (in thousands), 0–2,500.]
Note: Countries with more than 100,000 refugees at origin.

Figure B.3: UNHCR Share of Refugees in Neighboring Countries
[Figure: one panel (Africa). x-axis: Year, 2000–2020. y-axis: Share of refugees in neighboring countries, 0.0–1.0.]

Figure B.4: People Displaced Across Borders by Country of Asylum, UNHCR, End of 2019
[Figure: "People displaced by country of asylum in Africa". Countries shown: Uganda, Sudan, Ethiopia, Dem. Rep. of the Congo, Chad, Kenya, Cameroon, South Sudan, Egypt, United Rep. of Tanzania, Niger, Rwanda. Axis: People displaced (in thousands), 0–1,500.]
Note: Countries hosting more than 100,000 refugees.

Figure B.5: Refugee Flows between Source and Asylum Countries
[Figure not recoverable from the extracted text.]

Figure B.6: UNHCR Number of Protracted Refugee Situations
[Figure: four panels (Africa, Asia and Pacific, Europe, Americas). x-axis: Year, 2005–2020. y-axis: Nbr of protracted situations, 0–35.]
Note: Data are directly aggregated from UNHCR camp-level data.

Figure B.7: UNHCR Official Refugee Statistics vs. UNHCR Refugee Camps: Aggregated Data
[Figure: "Refugee data (2000-2016)". x-axis: years 2000–2015. y-axis: #refugees, 500,000–2,500,000. Legend text: out-of-sample; Refugees in camps; EPR-ER; Refugees (UNHCR official figures).]

Figure B.8: UNHCR Official Refugee Statistics vs. UNHCR Refugee Camps: Country-Level Data
[Figure: one panel per country, each annotated with pairwise correlations between the camps, EPR, and UNHCR series.]
Benin: corr camps EPR . ; corr camps UNHCR .94 ; corr UNHCR EPR .
Burkina_Faso: corr camps EPR . ; corr camps UNHCR . ; corr UNHCR EPR .
30000 40000 30000 20000 #refugees #refugees 20000 10000 10000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Burundi Cameroon corr camps EPR .719 corr camps EPR . corr camps UNHCR .716 corr camps UNHCR . corr UNHCR EPR 1 corr UNHCR EPR .999 60000 400000 300000 40000 #refugees #refugees 200000 20000 100000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Gabon Ghana corr camps EPR .993 corr camps EPR .99 corr camps UNHCR .971 corr camps UNHCR .993 corr UNHCR EPR .992 corr UNHCR EPR .999 20000 50000 15000 40000 #refugees #refugees 10000 30000 20000 5000 10000 0 0 5 0 5 0 5 0 5 0 0 1 1 0 0 1 1 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) 38 Figure B.8 (cont.): UNHCR Official Refugee Statistics vs. 
UNHCR Refugee Camps: Country-Level Data Guinea Ivory_Coast corr camps EPR .99 corr camps EPR .944 corr camps UNHCR .99 corr camps UNHCR .945 corr UNHCR EPR 1 corr UNHCR EPR 1 150000 400000 300000 100000 #refugees #refugees 200000 50000 100000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Kenya Liberia corr camps EPR .971 corr camps EPR .93 corr camps UNHCR .974 corr camps UNHCR .93 corr UNHCR EPR .999 corr UNHCR EPR 1 600000 150000 500000 100000 #refugees #refugees 400000 50000 300000 200000 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Malawi Mali corr camps EPR . corr camps EPR -.405 corr camps UNHCR .875 corr camps UNHCR -.76 corr UNHCR EPR . corr UNHCR EPR .848 10000 20000 8000 15000 #refugees #refugees 6000 10000 4000 5000 2000 0 0 0 5 0 5 0 5 0 5 0 0 1 1 0 0 1 1 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) 39 Figure B.8 (cont.): UNHCR Official Refugee Statistics vs. UNHCR Refugee Camps: Country-Level Data Mozambique Namibia corr camps EPR . corr camps EPR .987 corr camps UNHCR . corr camps UNHCR .974 corr UNHCR EPR . corr UNHCR EPR .996 6000 30000 4000 20000 #refugees #refugees 2000 10000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Niger Nigeria corr camps EPR . corr camps EPR -.09 corr camps UNHCR . corr camps UNHCR .998 corr UNHCR EPR . 
corr UNHCR EPR -.034 200000 10000 8000 150000 #refugees #refugees 6000 100000 4000 50000 2000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Senegal Sierra_Leone corr camps EPR -.332 corr camps EPR .981 corr camps UNHCR -.227 corr camps UNHCR .981 corr UNHCR EPR .244 corr UNHCR EPR 1 40000 80000 30000 60000 #refugees #refugees 20000 40000 10000 20000 0 0 0 5 0 5 0 5 0 5 0 0 1 1 0 0 1 1 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) 40 Figure B.8 (cont.): UNHCR Official Refugee Statistics vs. UNHCR Refugee Camps: Country-Level Data Tanzania Togo corr camps EPR .881 corr camps EPR -.145 corr camps UNHCR .882 corr camps UNHCR -.057 corr UNHCR EPR 1 corr UNHCR EPR .99 800000 25000 20000 600000 #refugees #refugees 15000 400000 10000 200000 5000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Uganda Zambia corr camps EPR 1 corr camps EPR .878 corr camps UNHCR 1 corr camps UNHCR .895 corr UNHCR EPR 1 corr UNHCR EPR .998 1000000 300000 800000 200000 #refugees #refugees 600000 400000 100000 200000 0 0 00 05 10 15 00 05 10 15 20 20 20 20 20 20 20 20 out-of-sample Refugees in camps out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) EPR-ER Refugees (UNHCR official figures) Zimbabwe corr camps EPR .808 corr camps UNHCR .896 corr UNHCR EPR .995 15000 10000 #refugees 5000 0 0 5 0 5 0 0 1 1 20 20 20 20 out-of-sample Refugees in camps EPR-ER Refugees (UNHCR official figures) 41 Figure B.9: Refugees in Camps per Ethnic Group 42 43 Figure B.10: Heterogeneity Analysis attack crime Refugee EF (OLS estimates) Refugee EP (OLS estimates) Refugee EF 
(OLS estimates) Refugee EP (OLS estimates) −0.073 0.243 −0.148 0.218 All All All All −0.100 0.286 −0.196 0.301 Out of 18−25 age range Out of 18−25 age range Out of 18−25 age range Out of 18−25 age range 0.037 0.062 −0.005 −0.021 aged 18−25 aged 18−25 aged 18−25 aged 18−25 −0.059 0.120 −0.045 −0.102 No sec. or tertiary educ. No sec. or tertiary educ. No sec. or tertiary educ. No sec. or tertiary educ. −0.080 0.275 −0.171 0.322 Sec. or tertiary educ. Sec. or tertiary educ. Sec. or tertiary educ. Sec. or tertiary educ. −0.088 0.312 −0.039 −0.186 Male Male Male Male −0.069 0.193 −0.259 0.636 Female Female Female Female −0.129 0.467 −0.128 0.221 Unemployed Unemployed Unemployed Unemployed 0.019 0.012 −0.179 0.249 Employed Employed Employed Employed −0.053 0.308 −0.096 0.142 Rural Rural Rural Rural −0.069 0.154 −0.124 0.150 Urban Urban Urban Urban −0.20 −0.10 0.00 0.10 0.20 −0.50 0.00 0.50 0.40 1.00 −0.40 −0.20 0.00 0.20 −0.50 0.00 0.50 0.40 1.00 theft protest Refugee EF (OLS estimates) Refugee EP (OLS estimates) Refugee EF (OLS estimates) Refugee EP (OLS estimates) −0.145 0.289 −0.050 0.053 All All All All −0.160 0.263 0.013 −0.176 Out of 18−25 age range Out of 18−25 age range Out of 18−25 age range Out of 18−25 age range −0.013 0.273 −0.166 0.678 aged 18−25 aged 18−25 aged 18−25 aged 18−25 −0.207 0.288 −0.162 0.293 No sec. or tertiary educ. No sec. or tertiary educ. No sec. or tertiary educ. No sec. or tertiary educ. −0.130 0.266 0.026 −0.116 Sec. or tertiary educ. Sec. or tertiary educ. Sec. or tertiary educ. Sec. or tertiary educ. 
−0.162 0.351 −0.061 0.131 Male Male Male Male −0.131 0.218 −0.035 −0.029 Female Female Female Female −0.174 0.444 −0.131 0.256 Unemployed Unemployed Unemployed Unemployed −0.096 −0.087 0.059 −0.183 Employed Employed Employed Employed −0.048 0.156 0.127 −0.251 Rural Rural Rural Rural −0.135 0.173 −0.055 0.014 Urban Urban Urban Urban −0.40 −0.20 0.00 0.20 0.40 −0.50 0.00 0.50 0.40 1.00 −0.40 −0.20 0.00 0.20 0.40 −1.00 −0.50 0.00 0.50 0.40 1.00 44
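The notes to Tables B.24 to B.26 define the significance stars by two-sided p-value thresholds (1, 5 and 10 percent). As a reading aid, the following minimal sketch shows how a reported coefficient and its clustered standard error map to those stars. It assumes a standard normal approximation to the test statistic, which is an illustrative simplification and not necessarily the reference distribution used by the authors; the function names `two_sided_p` and `stars` are hypothetical.

```python
import math


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coefficient = 0.

    Assumption: the ratio coef / se is treated as standard normal.
    erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for z >= 0.
    """
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))


def stars(p: float) -> str:
    """Map a p-value to the star convention stated in the table notes."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Coefficient and standard-error pairs copied from Table B.24 (Attack).
examples = [
    ("Refugees (80 km, IHS), first reported estimate", 0.0037, 0.0018),
    ("REP (80 km, min. ling. dist.), last reported estimate", 0.2368, 0.1264),
    ("REF (80 km, min. ling. dist.), last reported estimate", -0.0668, 0.0452),
]

for label, coef, se in examples:
    p = two_sided_p(coef, se)
    print(f"{label}: z = {coef / se:.3f}, p = {p:.3f}, stars = '{stars(p)}'")
```

Under this approximation the three examples fall in the same significance bands as the stars printed in Table B.24. Because the published coefficients and standard errors are rounded, estimates close to a threshold could in principle land on the other side of it.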