Building Stability Between Host and Refugee Communities: Evidence from a TVET Program in Jordan and Lebanon

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


Policy Research Working Paper 10101
The resettlement of refugees in host communities increases (perceived) competition for scarce economic and non-economic resources, which can contribute to tensions between the communities. This study tests the impact of a TVET program in Jordan and Lebanon that aims to tackle stresses associated with competition, particularly in the labor market. The authors test the impact of the program on economic outcomes, economic and life optimism, experience and perception of economic competition and ingroup-outgroup discrimination using a range of survey measures and behavioral experiments. They also conduct heterogeneity analyses to assess whether the intervention affects host and refugee communities similarly. The authors show that by the end of the training, the program has not yet achieved its employment aims for either hosts or refugees. However, for refugees, there are significant improvements in optimism and decreases in the experience of short-term economic stress. There are also improvements in inter-group behavior for refugees. These results provide insights on how to better tailor labor market programs to hosts and refugees while being conflict-sensitive. This paper is a product of the Social Sustainability and Inclusion Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at ferguson@isdc.org.

Introduction:
The Syrian war is now more than a decade old. In the ten years since violence broke out, over six million people have fled the country with little prospect of returning. While some refugees have travelled well beyond the region, most have stayed within the Middle East, with Jordan, Lebanon and Turkey absorbing the vast majority. Although all three countries are in the upper middle-income bracket, they have struggled with economic issues and other structural and institutional weaknesses. Consequently, there was widespread concern that the influx of so many people, both nominally and proportionally, could put too much pressure on already fragile systems. Host governments and donors feared that tensions between the communities could spark violence (e.g. Tan, 2015), further destabilizing the region. The source of such concerns is fairly well established, with episodes of displacement correlated with tensions and even conflict onset in host regions (Harari and Ferrara, 2018; Theisen et al., 2013; Hendrix and Salehyan, 2012), particularly when refugees increase ethnic diversity (Bertinelli et al., 2021). We note, however, that recent studies challenge the relationship between refugees and violence to some degree (Masterson and Lehmann, 2020; Shaver and Zhao, 2020).
The policy response to mitigate tensions and reduce the risk of violence in the region was to fund programs that support both hosts and refugees in a conflict-sensitive manner (e.g. Ghreiz, 2020). In such a context, as well as others where there is a significant influx of refugees, it is reasonable that attention turns to understanding the policy interventions that can reduce tensions between host and refugee populations (e.g. Valli et al., 2019; Hangartner et al., 2019; Adida et al., 2018; Wike et al., 2016). In this article, we evaluate one potential approach to tension reduction: increasing employability (Date-Bah, 2003). Brück et al. (2021) argue that pro-employment interventions can affect social outcomes through both "employment effects" and "program effects", which suggests that interventions can stimulate social impacts, even when they fail in their narrower economic aims. Program effects may have social impacts as a direct result of how the intervention is designed (e.g., training people of different social groups together; changing perceptions of the future). Employment effects, by contrast, require that the program first deliver its economic outcomes.
Despite the theoretical promise, empirical observations are mixed (Ferguson et al., 2019; Lyall et al., 2019) and, to our knowledge, the effectiveness of such programs has not yet been analyzed in the context of host-refugee tensions. The evidence is somewhat clearer when tensions are economically motivated (Blattman and Annan, 2016). Thus, employment programs might be well-placed to tackle these tensions, not least because competition for scarce (economic) resources is often cited as a driver of tensions between host and refugee communities (e.g. Alsharabati and Nammour, 2015). In this sense, we hypothesize that jobs-based programming should reduce real and perceived competition for these scarce (economic) resources, which should in turn improve group-based relationships between the communities.
In this article, we investigate this potential by analyzing the impacts of a set of vocational training (TVET) interventions, implemented by Mercy Corps, an international humanitarian and development organization, for mostly mixed groups of host and refugee participants in both Jordan and Lebanon. We assign treatment status from oversubscribed application lists. We rebalance the data using probability weights to account for potential biases resulting from non-random treatment assignment and use difference-in-difference based estimators to determine impacts of the training on employment, optimism, experience of economic scarcity and inter-group behavioral indicators immediately after the training has been completed.
These analyses show little sign that by the end of the training the intervention had notable effects on employment status, optimism of its participants or on their experience of and attitudes towards economic scarcity. By contrast, the program does seem to have had some important behavioral impacts. Individuals who went through the training exhibit lower ingroup-outgroup bias in the dictator game.
To see if the program affected hosts and refugees differently, we conduct a set of heterogeneity tests. The training has a positive, albeit small, impact on optimism amongst refugees but not the host community. Ability to meet current needs improves for refugees, too, and is higher among the treatment than the control group. No comparable effect arises for hosts. Even the behavioral impacts divide in this way. The reduction in ingroup-outgroup bias is driven entirely by refugees, with no significant change among the host community. By contrast, optimism among members of the host community actually goes down as a consequence of acceptance into the program. Despite this, we see no change in employment status for either group, showing that social impacts are driven by something in the program itself, and not via its employment effects.
Given that tensions between hosts and refugees are often driven by negative perceptions and fears among host communities (Fajth et al., 2019; Kheireddine et al., 2021), our results, while narrowly positive for the perceptions and behaviors of refugees, suggest major limitations in the achievements of the intervention. A lack of employment effects is perhaps to be expected at the time of our endline data collection. Training required a full-time commitment from participants, which should reduce capacity to undertake job searches and work. Despite this, the results show notable, significant and positive changes for the refugee community. Experience of short-term economic scarcity appears to decline (although we do not see improving expectations of long-term improvement, suggesting a potentially limited time horizon for these effects) and optimism increases. Both, in part, could drive the improved behavior towards hosts hinted at by the behavioral games but do not explain the full effect. At the same time, it is striking that these results do not extend to the host community. This suggests that host communities might require differing forms of intervention to shift their behavior towards refugees. These results, therefore, raise questions about the capacity of joint, single-input programming to meet the needs of both communities.
This work contributes to the literature on which policy interventions can build cohesion and reduce tensions between hosts and refugees, even in severe and prolonged episodes of forced displacement. We also show some key limitations of this approach, particularly for the host community. There is a tendency for programming for hosts and refugees to look identical even when tensions might not run in both directions, or when the reasons for tensions between communities might not be symmetrical. We also contribute to wider debates on whether or not it is possible for jobs programming to deliver on social, as well as economic, outcomes. This work is complementary to a range of other articles within this World Bank paper series. For example, the general lack of adverse group-based behaviors at the baseline stage overlaps with Aksoy and Ginn (2021), who show that refugee arrivals do not correlate with adverse attitudes in host communities, at least in the short term; with Albarosa and Elsner (2021), who show no impacts on self-reported social cohesion in Germany; and with Murard (2021), who shows that refugee inflows do not affect political fragmentation. More generally, our findings fit within a more complex set of findings within the series. Pham et al. (2021) show that in Eastern DRC, overall, people had negative perceptions related to social cohesion. However, those with experiences hosting the displaced, particularly IDPs, had more positive views of social cohesion. Ruiz and Vargas-Silva (2021) suggest that refugee return can undermine social cohesion. Our analysis of the impacts of the TVET program is, similarly, complementary to other work in the series. Agüero and Fasola (2021) show limited social cohesion impacts from a cash transfer in South Africa, but note positive outcomes in other attitudinal and behavioral domains. Betts et al. (2021) show positive impacts from intergroup contact in urban areas, which is potentially replicable in mixed host-refugee training groups.
The remainder of this article is structured as follows: in the next section, we discuss the background context and design of the intervention. In Section 3, we discuss the theoretical motivation of our work, including a review of the literature and the derivation of the theories and hypotheses that inform our work. In Section 4, we discuss the data and methods used to identify the impacts of the program. In Section 5, we present our results. We offer reflections from these findings in Section 6.

Study Location:
The conflict in Syria has resulted in almost 6.6 million refugees fleeing the country, with over 5 million residing in neighboring countries (UNHCR, 2021a). Jordan and Lebanon had already absorbed millions of refugees due to various conflicts in the region, with both countries taking in millions of Palestinians, and Jordan continues to host 67,000 Iraqis (UNHCR, 2019).
In Jordan, there are 657,000 registered Syrian refugees and as many as 1.4 million unregistered (ACAPS, 2021), although not all in this estimate consider themselves refugees. Either way, this has increased the population of Jordan by up to 10%. Since the Syrian conflict, economic growth in Jordan has been sluggish, and unemployment has increased from a low of 11% in 2014 to 24.7% in 2021 (World Bank 2021), and the youth unemployment rate is estimated at 50% (World , suggesting the situation has further worsened since the last rounds of official statistics. The Covid crisis has only put additional pressures on the economy. Many of the Syrian refugees in Jordan are from southern Syria, near the border, share tribal affiliations with their hosts and are largely Sunni. These shared ties help to minimize potential tensions amongst groups, with most Jordanians reporting being accepting of Syrian refugees (Alrababa'h et al., 2020). Additionally, the Jordanian government delineated sectors where Syrian refugees could receive work permits, to avoid tensions with citizens regarding job opportunities, while at the same time recognizing the need for Syrian refugees to work. These include construction, agriculture, food and textiles (UNHCR, 2017).
In Lebanon, there are 1.5 million Syrian refugees, close to 1 million of whom are registered, accounting for almost a quarter of Lebanon's population (UNHCR, 2021b). In addition to the pressures that the influx of refugees has presented, other economic and financial crises, including cycles of protests, Covid, and the explosion in the Port of Beirut, have contributed to plummeting growth rates and the devaluation of the Lebanese pound (World Bank 2020). Youth unemployment has been estimated by the Ministry of Labor at 37% and general unemployment estimated in the media to be as high as 25% as of August 2019 (Hamadi, 2019), suggesting a worsening picture. While there are no updated statistics on youth unemployment, the general unemployment rate is currently estimated at 40% (World . The combination of the previous financial crises, protests and Covid has put more than half of Lebanon's population below the poverty line (World Bank, 2020). Similar to Jordan, Lebanon has restricted the sectors that can legally hire refugees. These are construction, agriculture, and environmental/cleaning services. Unemployment rates are considerably higher for refugees, particularly women (VASYR 2020). Moreover, the Covid crisis has contracted the construction industry, one of the few sectors where refugees could find legal employment (VASYR 2020).
With this overall deteriorating economic landscape, the presence of a large number of refugees in Lebanon contributes to an underlying fragility. In a poll conducted in 2019, Lebanese cited resource constraints related to public services and jobs as contributing to both intra-Lebanese and host-refugee tensions (UNDP and ARK, 2019). While tensions related to employment decreased after restrictions on refugee employment were put in place, it is unclear how recent economic crises may have changed tensions, though there is widespread agreement that risks of social and civil unrest are growing due to this combination of crises (World Bank 2021).

Description of the programs:
To address the risk that economic pressures could increase instability, Mercy Corps implemented the Fostering Resilience by Strengthening Abilities program in Lebanon and the Access to Justice and Jobs program in Jordan, funded by the Dutch Ministry of Foreign Affairs. This analysis also includes data from the 3Amaly program in Lebanon, funded by Global Affairs Canada. All three programs focus on increasing employment through skill building, targeting hosts and refugees who are largely 18-34 years old. These interventions are targeted in locations where a significant influx of refugees could affect labor markets. In Jordan, Mercy Corps implemented the program in Irbid and Mafraq Governorates, which host over 47% of the Syrian refugees in the country. In Lebanon, the programs were implemented in Zahle, West Bekaa, Chouf, Jezzine and Saida, where Syrian refugees make up one-third of the population.
Participants enrolled in courses that aligned with their personal interests as well as market demand and sectors in which refugees could legally work. Courses were implemented by local training providers and lasted two to eight weeks. Topics included aluminum fabrication and installation, woodworking and carpentry, food and dairy processing, electrical repair, beautician, light construction rehabilitation, mechanical repair, artisanal manufacturing, greenhouse maintenance, and drip irrigation installation and repair. Although a small number of sessions trained only members of one nationality, partially due to employment restrictions, a majority comprised mixed host-refugee groups. On average, each group contained approximately 65% hosts and 35% refugees.

Theoretical Motivation:
Literature Review:
Jobs programs are often utilized to promote not only economic outcomes, but also social cohesion goals. As delineated in the World Development Report 2013, there are two main pathways for jobs to promote social cohesion. One pathway is indirect. When jobs are scarce, the heightened competition can reduce prosocial behaviors, like altruism, cooperation or trust (Grosch et al., 2017; Holmström, 2017; Lazear, 1989) and increase antisocial ones, like willingness to harm (Falk and Szech, 2013). These tendencies are magnified in the context of inter-group competition, which is associated with an increase in willingness to discriminate against members of outgroups (Sääksvuori et al., 2011; Abbink et al., 2010). Jobs programs may alleviate economic insecurity through the acquisition of work or greater optimism about finding employment. This in turn alleviates related feelings of competitiveness in the job market, reducing societal tensions. Additionally, employment reduces the ability of elites to use financial incentives for recruitment.
The other pathway between jobs and social cohesion is more direct. The job itself may promote social cohesion through contact and interaction with people from other backgrounds (Okunogbe, 2016). Those involved learn about people from different backgrounds, realizing there may be more similarities than differences between them. Jobs also provide people with a sense of purpose and status, elevating their social identity, and reducing the need to find meaning elsewhere, such as in violent groups (Pixley, 2019;Herriot and Scott-Jackson, 2002).
Evidence on whether jobs programs, as opposed to having a job, alleviate societal tensions is limited and mixed (Brück et al., 2021). For one, jobs programs in fragile environments have shown limited results, largely due to labor market demand constraints (Blattman and Ralston, 2015). If the job program does not produce economic effects, the effects on social cohesion may be constrained. Additionally, much of the work examining jobs programs and societal tensions and stability, as opposed to jobs programs aimed at reducing crime, has focused on participation in and attitudes towards political violence. These studies largely show that jobs programs, while improving some economic outcomes, had limited effects on stability (Blattman et al., 2014; Kurtz, 2015) except in the presence or perception of stronger governance (Fetzer, 2020; Dasgupta et al., 2017; Kurtz et al., 2018; Lyall et al., 2019). Additionally, job programs seem to have added dividends when the motivation for fighting was primarily economic (Blattman and Annan, 2016).
While the ability of jobs programs to alleviate economic stress, and therefore reduce societal tensions, may be constrained in the weak labor markets that are often found in fragile states, jobs programs themselves may alleviate societal tensions regardless of the economic outcomes. In workplaces and educational facilities, jobs programs provide an opportunity for people to interact with people from different backgrounds. For example, in a study of a computer training program in Northern Nigeria, those who participated in mixed Muslim-Christian classrooms showed more cooperative behavior than those who participated in either all-Christian or all-Muslim classrooms (Scacco and Warren, 2018). Jobs programs often include technical and relational (soft) skills, the latter of which help people manage social interactions more productively (Darvas and Palmer, 2014).
Although not the only source of tension between hosts and refugees, challenges with refugee integration exemplify a situation where perceptions of economic scarcity often drive anti-refugee sentiments and discriminatory behavior. Jobs programming might help alleviate those tensions. Refugee flows and the perception of the effect they have on access to jobs and other forms of economic infrastructure are commonly cited sources of tensions between hosts and refugees (Adida et al., 2018;Alsharabati and Nammour, 2015;Hangartner et al., 2019;Wike et al., 2016). This is despite the fact that refugees often have a positive impact on local economies (Taylor 2016).
Yet research on perceptions of refugees in the Global South raises the question of whether negative views of refugees are largely about competitiveness or are instead related to identity and cultural preservation. In sub-Saharan Africa, countries where the leadership had ethnic ties to refugees were more likely to have more generous policies towards refugees (Blair et al., 2021). In Jordan, while the influx of refugees raised worries about economic scarcity, empathetic attitudes based on common cultures seemed to dominate Jordanians' perceptions and attitudes (Alrababa'h et al., 2020). While these two paths are distinct in the literature review, we recognize that perceptions of economic scarcity and identity may not be orthogonal. If refugees do not share cultural ties with significant proportions of the population, this could activate group-based threat and competition related to economic resources (Craig and Richeson, 2014).
If economic scarcity and competition for a scarce resource (jobs) are driving attitudes among hosts and refugees, any change in social cohesion or pro-social behavior will be dependent on improvement in economic outcomes. However, if changes in cohesion occur despite little increase in employment, the interaction within the training is likely driving any shifts. For example, in a study of host and refugee children in Turkey, contact plus explicit perspective-taking exercises led to more prosocial behaviors between groups (Alan et al, 2021). However, as children do not feel the economic competition, at least not acutely, a jobs training program provides an apropos context to try to disentangle these different pathways between jobs and social cohesion.

Hypothesis Development:
From this literature, we deduce a series of potentially relevant routes through which this program can deliver social, as well as economic, change. We discuss these hypotheses and their derivation below.

Economic Change:
First, we consider the economic potential of the program. Broadly speaking, we would anticipate that a TVET program should increase the competitiveness of its graduates in any given labor market. However, we also note that the trainings require a hefty time commitment, which can restrict both the capacity to undertake work and the capacity to undertake job search. This is not true for the control group. Consequently, when we collect data from beneficiaries immediately following the completion of the training, we actually anticipate no positive, and potentially negative, economic effects.
From this, we deduce: H1: At endline, employment indicators for, and the economic status of, the treatment group are unlikely to have improved above those for the control group. It is possible that economic indicators might actually have worsened for the treatment group.
However, we expect that beneficiaries will anticipate improvements in their economic situation in the future, as they enter into the labor market with new skills. In this regard: H2: At endline, beneficiaries of the program will exhibit improved levels of optimism and expectations about capacity to meet future needs, relative to the control group.

Social Change:
Especially because we do not anticipate economic improvements by endline, we consider our analyses on social change to focus on the program effect of the intervention. Both directly and via H2, the program should reduce the anticipated experience of excess competition in the labor market in the future. As a result, participants may become less biased toward their own group, feeling less need to protect them or to give them an advantage. Specifically: H3: At endline, those in the training group will exhibit a set of behaviors that indicate reductions in bias towards members of one's group, relative to the control group.

Research Design:
Data Collection:
Data were collected, subject to voluntary participation, from all individuals included in Mercy Corps' initial outreach to participants. All individuals self-selected their willingness to participate in the training program. The treatment and control groups were assigned from this oversubscribed list. Selection into the training group was based on a "vulnerability score" that gave priority to younger, female and unemployed individuals. Participants were ordered by their vulnerability score, with the most vulnerable entering until capacity was reached. Despite this approach, intake was "fuzzy": in some intakes, individuals with comparatively high scores were not taken into the program; in others, individuals with comparatively low scores were included. We construct our treatment and control groups from these intake decisions. Data were collected from members of the host and refugee communities in each country. In both Jordan and Lebanon, the intervention was implemented on a rolling basis. As soon as one training cycle was completed, another would begin. Data were collected in three waves during each training cycle. First, during an "outreach" phase, where data were collected in order to assign treatment status. Second, at "baseline", which occurred before the training had begun but after treatment assignment was known. Third, data were collected at "endline", immediately following the end of the training. Data collection for those assigned to the treatment and control followed the same pattern. Outreach and baseline data collection took place less than a week apart and were collected between July 2018 and September 2019. Across outreach and baseline, only one full survey round was collected, owing to potential survey fatigue and on the understanding that nothing of importance would likely change in such a short period. Basic demographic information, such as age, gender, marital status and employment status, was collected at outreach.
At baseline, additional indicators were collected, relating to the behavior, attitudes, opinions and personalities of the participants. The only exception to this is data on optimism, which were collected at both outreach and baseline. This allowed us to test whether or not the intake decision had effects, even before the training began. Endline data were collected between July 2018 and November 2019 and repeated the combined outreach and baseline surveys and experiments.

Variables:
We collected a range of survey and experimental indicators in order to assess our key research questions and associated hypotheses.

Economic and life optimism:
We collected two survey questions about optimism at outreach, baseline and endline. These questions ask individuals to rank their expectation that their life and economic situation will be better in one year than it is now. Answers are scored on a Likert scale running from 0 (significantly worse) to 10 (significantly better).

Employment status:
Due to slight differences in access to labor markets for refugees in Jordan and Lebanon and differences in how we were able to ask about employment status, we tabulate employment status as whether or not an individual is employed. Participants were asked at outreach about their employment status, and, in subsequent rounds, whether or not this had changed. This variable is coded 0 for not currently employed and 1 for employed.

Economic scarcity:
We collect survey questions on individual perceptions on: ability to meet current needs; ability to meet future needs; expectation that access to jobs is fair; expectation that salaries are fair; and belief that unfair access to labor markets fuels tensions. Ability to meet current and future needs are coded on a Likert scale running from 1 (completely unable) to 5 (fully able). The "fairness" indicators are coded: 0 (unfair) or 1 (fair). Whether or not competition around employment contributes to tensions is captured on a 1 (not at all) to 5 (absolutely) Likert scale.

Intergroup behaviors:
We collect data from two one-shot incentivized behavioral games: the dictator game (a division game where players choose how to split a prize) and stag hunt (which gives players the chance to cooperate). In each wave of data collection, players were randomized, at the session-level, to play either with a partner from the host or refugee community in that country. For example, a Lebanese player could be paired with a Lebanese or a Syrian resident in Lebanon but not with a Jordanian or a Syrian resident in Jordan. Partner identities were re-randomized between the waves so that not all players played with a partner of the same identity in both rounds. We made clear that partners were not individuals in the same room and, at endline, that the partner was not the same partner from baseline.
A hint was given about the partner's identity based on dialectal differences in the words for common foods, along with a small amount of innocuous information (approximate age, favorite hobby and marital status). Sample intakes and partner assignments by data collection wave are shown in Table 1. This prime relies on a minor, and subtle, difference in dialects in settings with an otherwise high degree of cultural similarity. Standard hints, such as names or language, would not sufficiently differentiate nationalities. More direct ones, such as stating an individual's nationality, risk interviewer demand biases. This "prime" relies on three things. First, that due to cultural similarities between the countries, many dishes are commonly eaten in both origin and host countries. Second, that dialectal differences mean that some of the same dishes are called (slightly) different things across the region. Third, again due to cultural closeness, the nature of these variations is known across the region.

Additional measures:
We collected usual socio-economic and demographic information, including: age, gender, marital status and education. In addition, we collected data on self-reported risk preference and a short-form personality survey to ascertain GRIT among participants. Noting that a small number of participants did not answer all survey questions, we undertake a regression-based data interpolation process to complete the dataset. We present summary statistics of demographic data and other covariates for the baseline (Top) and endline (Bottom) for Jordan in Table 2 and for Lebanon in Table 3.
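One simple way to summarize dictator-game behavior as an ingroup-outgroup bias is sketched below. This is an illustrative assumption rather than the paper's stated construction: the function name `ingroup_outgroup_bias` and the input layout are hypothetical, and the bias is taken to be the gap between the average amount sent to a partner from one's own community and the average sent to a partner from the other community.

```python
def ingroup_outgroup_bias(decisions):
    """Mean amount sent to ingroup partners minus mean sent to outgroup partners.

    decisions: iterable of (amount_sent, partner_is_ingroup) pairs.
    The definition of bias used here is an assumption for illustration.
    """
    decisions = list(decisions)
    ingroup = [amount for amount, same_group in decisions if same_group]
    outgroup = [amount for amount, same_group in decisions if not same_group]
    if not ingroup or not outgroup:
        raise ValueError("need at least one ingroup and one outgroup pairing")
    return sum(ingroup) / len(ingroup) - sum(outgroup) / len(outgroup)


# Hypothetical data: two players paired with ingroup partners, two with outgroup partners.
example = [(5, True), (3, True), (2, False), (4, False)]
print(ingroup_outgroup_bias(example))
```

A positive value would indicate more generous giving to one's own group; a program that reduces bias would move this gap toward zero for treated participants relative to controls.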
[TABLES 2 AND 3 ABOUT HERE]

Identification:
The "fuzzy" treatment intake is not random. In addition, as can be seen in Table 1, there is some attrition from the sample, which decreases by about 10% from baseline to endline. We therefore first examine whether or not there is structure both to selection into the treatment group and to attrition, either of which could undermine our econometric approach, which relies on difference-in-difference estimators. Imbalance between treatment and control groups could undermine the key assumption of parallel trends. For example, given that men and women face different barriers in the labor market, we should not expect employment to evolve in the same way for men and women after the treatment. We would therefore expect to observe a non-zero difference-in-differences for a treatment group where women are more common than in the reference group, even without the program.
To test for imbalances, we run a simple regression of treatment and attrition indicators at baseline on the socio-economic and demographic controls, GRIT indicators, self-reported optimism, employment status and risk. Table 4 (Column 1 for the treatment analysis, Column 2 for the attrition analysis) shows some signs of structure. In particular, host status and risk preferences are significantly different between treatment and control, with employment status and education level also significant at 10%. Marital status and education are important predictors of attrition. As we might expect, these imbalances suggest some threats that, if left uncorrected, could undermine the parallel trends assumption of difference-in-difference estimators. That said, we see no sign of differences between treatment and control, or attritors and non-attritors, over the key GRIT personality features. This suggests that members of the treatment group are not, for example, more motivated to succeed than members of the control group.
To account for these biases, we generate a series of inverse probability weights to balance the data. These weights are derived from the probability of an individual with particular characteristics (e.g. host or refugee status) being in each of the treatment and control groups at baseline and endline and are used to rebalance the data in order to better support the parallel trends assumption. Results are shown in Column 3 of Table 4. Following weighting, the data balance on all key factors, including nationality. This suggests that the parallel trends assumption is more reasonable under the weighted dataset than in the raw treatment/control data. Based on these analyses, we conclude that it is safe to use weighted OLS-based approaches.
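The reweighting step can be illustrated with a minimal sketch. It assumes a single binary characteristic (host status) and estimates the probability of treatment as the treated share within each stratum; the weights used in the paper condition on more characteristics and on the data collection wave. The function names and the records below are hypothetical and are not study data.

```python
from collections import defaultdict


def ipw_weights(records):
    """Return one inverse probability weight per record.

    Each record is (is_host, is_treated), both coded 0/1. The propensity
    score is estimated as the treated share within each host/refugee
    stratum, which is an assumption made for this sketch only.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_control, n_treated]
    for is_host, is_treated in records:
        counts[is_host][is_treated] += 1
    weights = []
    for is_host, is_treated in records:
        n_control, n_treated = counts[is_host]
        p_treated = n_treated / (n_control + n_treated)
        # Treated units are weighted by 1/p, control units by 1/(1 - p).
        weights.append(1 / p_treated if is_treated else 1 / (1 - p_treated))
    return weights


def host_share(records, weights, treated):
    """Weighted share of hosts within the treatment or control group."""
    num = sum(w for (h, t), w in zip(records, weights) if t == treated and h == 1)
    den = sum(w for (h, t), w in zip(records, weights) if t == treated)
    return num / den


# Hypothetical sample in which hosts are over-represented among the treated.
records = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 40
raw = [1.0] * len(records)
w = ipw_weights(records)

print(host_share(records, raw, 1), host_share(records, raw, 0))
print(host_share(records, w, 1), host_share(records, w, 0))
```

With these made-up counts, the host share is 0.6 in the treatment group and 0.2 in the control group before weighting, and approximately 0.4 in both groups afterwards, which is the kind of balance that Column 3 of Table 4 is described as checking.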
As pre-specified, we then use difference-in-difference based estimators.
First, for the survey-based indicators, we analyze:

$$Y_{it} = \alpha + \beta_1 T_i + \beta_2 L_t + \beta_3 (T_i \times L_t) + X_{it}\gamma + \varepsilon_{it} \qquad (1)$$

where: $Y_{it}$ is the variable of interest for individual $i$ at time $t$; $\alpha$ is the regression constant; $T_i$ is a binary indicator taking the value of 1 if an individual is assigned to the treatment group; $L_t$ is a binary indicator taking the value of 1 if the data is observed in the second of the two waves; $T_i \times L_t$ is the interaction of these two variables and captures the impact of the program. $X$ is an $n \times k$ matrix of control variables, comprising: age, gender, host status, marital status, education level, risk, and GRIT indicators, as well as optimism indicators when these are not the outcome of interest. $\gamma$ is a $k \times 1$ vector of regression coefficients; and $\varepsilon_{it}$ is the idiosyncratic error.
For the games-based indicators, we analyze:

$$Y_{it} = \alpha + \beta_1 T_i + \beta_2 L_t + \beta_3 (T_i \times L_t) + \beta_4 S_{it} + \beta_5 (T_i \times S_{it}) + \beta_6 (L_t \times S_{it}) + \beta_7 (T_i \times L_t \times S_{it}) + X_{it}\gamma + \varepsilon_{it} \qquad (2)$$

where: $Y_{it}$ is the outcome variable of interest for individual $i$ at time $t$; $\alpha$ is the regression constant; $T_i$ and $L_t$ are as they are in Equation (1); $S_{it}$ is a binary variable taking the value of 1 if an individual is assigned to play with a partner of his / her own nationality and 0 if with a partner of a different nationality.
$T_i \times L_t$ shows the general impact of the program on prosocial behavior; $T_i \times L_t \times S_{it}$ shows the degree of group bias. Thus, should the program reduce bias, $\beta_7 < 0$. $X$ is the same $n \times k$ matrix of control variables. $\gamma$ is a $k \times 1$ vector of regression coefficients; and $\varepsilon_{it}$ is the idiosyncratic error.
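As a minimal sketch of what the two interaction terms measure: without controls or weights, the coefficient on the treatment-by-wave interaction equals a difference of four cell means, and the coefficient on the triple interaction equals the difference between two such quantities (own-nationality partner minus other-nationality partner). The function names and the rows below are hypothetical and are not study data; the estimates in the paper additionally use controls, weights and clustered standard errors.

```python
from statistics import mean


def did(rows):
    """Difference-in-differences from cell means.

    rows: list of (treated, late, outcome), with treated and late in {0, 1}.
    Equals the OLS coefficient on treated * late when there are no controls.
    """
    def cell(t, w):
        return mean(y for tr, la, y in rows if tr == t and la == w)

    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def triple_diff(rows):
    """Triple difference for the games data.

    rows: list of (treated, late, same_nationality, outcome).
    A negative value means giving to own-nationality partners fell relative
    to giving to other-nationality partners, i.e. less ingroup bias.
    """
    same = [(t, w, y) for t, w, s, y in rows if s == 1]
    other = [(t, w, y) for t, w, s, y in rows if s == 0]
    return did(same) - did(other)


# Hypothetical dictator-game transfers, one row per cell for brevity.
games = [
    # treated, late, same_nationality, amount given
    (1, 0, 1, 5.0), (1, 0, 0, 4.0),
    (1, 1, 1, 4.5), (1, 1, 0, 4.5),
    (0, 0, 1, 5.0), (0, 0, 0, 4.0),
    (0, 1, 1, 5.0), (0, 1, 0, 4.0),
]
print(triple_diff(games))
```

In this made-up example the treated group closes its ingroup-outgroup gap while the control group does not, so the triple difference is negative, the sign that would indicate reduced bias.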
We employ two small deviations from these approaches to produce the full set of results. First of all, as we do not have two sets of control variables from outreach to baseline, we run a fixed effects analysis to understand the impact of assignment to treatment status on life and economic optimism. Second, due to a data collection error in the field, indicators of economic scarcity were not collected from all of the control group at baseline. Instead, we seek to approximate the effect of treatment on these indicators by triangulating comparisons in two dimensions. First, we test whether or not these indicators improved for the treatment group from baseline to endline. Second, we test whether or not there are differences between the treatment and control groups at endline. This stops short of causality but still reveals interesting information about the dynamics at play.
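Why the two triangulating comparisons stop short of causality can be seen in a short worked decomposition. It assumes, purely for illustration, an uncontrolled model $Y_{it} = \alpha + \beta_1 T_i + \beta_2 L_t + \beta_3 (T_i \times L_t) + \varepsilon_{it}$, where $T_i$ marks the treatment group, $L_t$ marks the endline wave and $\beta_3$ is the program effect. The symbols are assumed notation rather than estimates.

$$E[Y \mid T=1, L=1] - E[Y \mid T=1, L=0] = \beta_2 + \beta_3$$

$$E[Y \mid T=1, L=1] - E[Y \mid T=0, L=1] = \beta_1 + \beta_3$$

The first comparison mixes the program effect with any time trend common to both groups ($\beta_2$), and the second mixes it with any pre-existing gap between the groups ($\beta_1$). Without baseline data for the full control group, neither nuisance term can be differenced out, so agreement between the two comparisons is suggestive rather than conclusive.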
We produce five outputs for each analysis, with the exception of the economic scarcity indicators. First, we use uncontrolled OLS. Second, we introduce control variables. Third, we remove the controls but add inverse probability weights. Fourth, we include controls and weights. Finally, we cluster our standard errors. Each cluster is a single data session, which delineates across treatment and control groups in each wave of data collection (denoted: treatment-session-line). Together, these outputs show the impact of the control variables and of the weights, separately and in combination. Due to the large number of economic scarcity indicators, we present only the final specification for these analyses for parsimony.
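The effect of clustering at the session level can be sketched for the simplest possible case, a regression of an outcome on a treatment dummy alone. The code below implements a standard cluster-robust sandwich variance without small-sample correction (often called CR0); whether the paper applies such a correction is not stated, and the function name and data are hypothetical.

```python
def clustered_se_diff(y, d, cluster):
    """Cluster-robust (CR0) standard error of b in y = a + b*d.

    y: outcomes; d: 0/1 treatment indicators; cluster: session labels.
    No small-sample correction is applied (an assumption of this sketch).
    """
    n, n1 = len(y), sum(d)
    n0 = n - n1
    mean1 = sum(yi for yi, di in zip(y, d) if di) / n1
    mean0 = sum(yi for yi, di in zip(y, d) if not di) / n0
    resid = [yi - (mean1 if di else mean0) for yi, di in zip(y, d)]
    # Per-cluster score sums: (sum of e, sum of d * e).
    scores = {}
    for e, di, g in zip(resid, d, cluster):
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += e
        s[1] += di * e
    # Second row of (X'X)^{-1} for X = [1, d] is [-1/n0, 1/n0 + 1/n1].
    r0, r1 = -1 / n0, 1 / n0 + 1 / n1
    var_b = sum((r0 * s0 + r1 * s1) ** 2 for s0, s1 in scores.values())
    return var_b ** 0.5


# Hypothetical outcomes from four sessions of two participants each.
y = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0, 6.0, 6.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
session = ["a", "a", "b", "b", "c", "c", "d", "d"]

se_clustered = clustered_se_diff(y, d, session)
# Treating every participant as their own cluster gives the unclustered
# heteroskedasticity-robust (HC0) standard error.
se_unclustered = clustered_se_diff(y, d, list(range(len(y))))
print(se_clustered, se_unclustered)
```

In this made-up example outcomes are identical within a session, so the clustered standard error is larger than the unclustered one, which is the situation that clustering is meant to guard against.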

Results:
In this section, we present a set of results for the analysis of the entire dataset, then present heterogeneity tests where we explore results separately for the host and refugee samples. We present results as coefficient plots. Accompanying regression tables can be found in Appendix 2.

Presence of Group-Based Bias at Baseline:
First, we assess the extent to which group biases are observable at baseline. Figure 1 shows no sign of a significant difference between giving to members of one's own group and giving to members of one's outgroup. While this indicates that there is not much bias in behavior, it does not strictly mean that behavior towards groups cannot shift due to training (Barriga et al., 2020), nor that tensions and biases between the communities do not exist (Berge et al., 2019). That there is little to no bias nonetheless remains a plausible explanation.

Main Results:

Impact of Treatment Assignment on Optimism:
We test whether or not acceptance into the program has any impact on individuals' optimism about their lives in general, or about their economic situation. As can be seen in Figure 2, the coefficients of these analyses are very close to zero and are statistically insignificant. From this, we conclude that the designation to treatment or control status has no impact on individuals' optimism about their future.

Impact of Treatment on Optimism:
Similarly, Figure 3 shows a range of strongly insignificant coefficients close to zero, suggesting no effect of the training program on the optimism of its beneficiaries, either.

Impact of Treatment on Employment Status:
Next, we test whether the treatment impacts employment status. Figure 4 shows a highly insignificant coefficient very close to zero. The treatment appears to have little impact on employment status by the end of the training. This is to be expected and is predicted by H1. That we do not see a negative effect emerge is, perhaps, more surprising, as it suggests that a control group who were free to undertake a job search and work have been unsuccessful in doing so. This is, perhaps, suggestive of the difficulties of the labor market into which beneficiaries have graduated, and might temper longer-term expectations of the program.

Impact of Treatment on Perception of Economic Scarcity:
In Figure 5, we present two slightly different analyses. On the left-hand side, we look at whether or not perceptions of economic scarcity have improved between baseline and endline for the treatment group. On the right-hand side, we look at whether or not there are differences in perceptions of scarcity between the treatment and control groups at endline. Figure 5 shows an increase in ability to meet current needs in the treatment group. However, the other indicators (ability to meet future needs, perceptions that access to employment and salaries are fair, and perceptions that inequalities in access to employment drive tensions) do not move. We see no sign of differences between treatment and control at endline in any indicator other than tensions surrounding employment. Here, the belief that tensions arise from inequalities in access to employment is greater among the treatment group. Overall, there is little sign that going through the treatment program has led to notable impacts on individuals' perception of scarcity.

Impact of Treatment on Group-Based Biases:
Finally, we look at whether or not there are impacts of the program on group-based biases. In Figure 6, we look at choices made in the dictator game (left) and the stag hunt game (right). We are interested in two outcomes: first, whether or not there are changes in overall behaviors and, second, whether or not there are changes in relative behaviors towards members of ingroups and outgroups. In neither game do we see any sign of increases in overall giving. Coefficients are close to zero and strongly insignificant. We also see no sign of group-based changes in behavior in the stag hunt game. However, in the dictator game the amount given to ingroups declines relative to that given to outgroups, suggesting a reduction in ingroup-outgroup discrimination. That this occurs in the dictator game but not the stag hunt suggests that members of the treatment group have become relatively more generous towards their outgroups but are not more willing to cooperate with them.

Figure 6: Impact of Training on Choices in Dictator Game (Left) and Stag Hunt Game (Right)
Heterogeneity Tests:
In this sub-section, we seek to understand the degree to which these results hold in both the host and the refugee communities, again using (newly) weighted OLS analyses.

Impact of Treatment Assignment on Optimism:
In Figure 7, we see very small (about 0.1 to 0.15 points on an 11-point scale) but statistically significant impacts on life optimism for both the host (left-hand side) and the refugee community (right-hand side). We also see a significant impact on economic optimism for members of the host community. Additionally, we see significant variation in the nature of the effect between the communities. Members of the host community seem to become more pessimistic about their future as a result of intake into the program. Given the potential for tensions to run from the host community to refugees, this finding is particularly concerning. The reason why intake into such a program makes matters worse for hosts merits further exploration. By contrast, refugees appear more optimistic. This suggests that the null findings in the main analysis are, in fact, the result of these counteracting effects across communities cancelling each other out.

Impact of Treatment on Optimism:
In Figure 8, we see no further impacts on life or economic optimism for hosts in the treatment group, beyond those associated with the treatment assignment. This suggests that the adverse effects of the treatment assignment are sustained at endline, reinforcing how concerning this finding is. By contrast, the results suggest further increases in life optimism among the refugee community, now matched with a (marginally) significant increase in economic optimism. This suggests that the program has had major impacts, both through treatment assignment and through delivery of the training, on optimism among the refugee community.

Impact of Treatment on Employment Status:
We find no employment impacts for either community. In both cases, coefficients are close to zero and statistically insignificant. At the same time, we note a robust difference in the sign of the effect. This suggests that the overall insignificant finding is not driven by either population being specifically excluded from the potential economic benefits the program might have offered.

Impact of Treatment on Perception of Economic Scarcity:
Although we see differences in the degree of statistical significance between hosts and refugees, a similar set of outcomes emerges in the analysis of the evolution of the treatment group's perceived ability to meet its current needs. In the host community, the finding is strongly significant. In the refugee community, the finding is larger in absolute terms but only marginally significant. The other indicators are strongly statistically insignificant. Perhaps of greater interest is that, for refugees, perceived ability to meet current needs has improved in the treatment group from baseline to endline and is also higher in the treatment group than in the control group at endline. Whether this derives directly from the program, which covered travel and subsistence costs for its participants, or from a more general sense of personal, economic or even psychological wellbeing is unclear. As with the main analysis, we also see that perceptions that tensions surround employment are significantly greater among the treatment group for the refugee community and (with marginal significance) for the host community.

Impact of Treatment on Group-Based Biases:
In Figure 11, we see significant heterogeneities across hosts and refugees and notable differences from the main analyses. In the dictator game, while neither hosts nor refugees in the treatment group become more generous overall, the degree of ingroup-outgroup discrimination goes down among refugees. In the host community, the coefficient is close to zero and insignificant, suggesting no behavioral change. This shows that it is the refugee community that is responsible for the overall shift in the main analyses. In the stag hunt game, we again see no sign of changes in the behavior of members of the host community. However, while there does not appear to be a group-based aspect, members of the refugee community are, with marginal significance, more likely to choose to coordinate with their partner, regardless of whether the partner is a host or a refugee. These results suggest that the program has had an array of significant behavioral impacts on the refugee community that, disappointingly, are not replicated within the host community.

Discussion:
We find that the intervention has uneven effects among hosts and refugees. While we see no changes in employment for either group, we do see a change in behavior. However, this overall change in behavior is driven by the refugee population admitted to the training. Relatively speaking, they give more to hosts after the training than those in the control group. We do not see similar behavioral changes in the host community. Additionally, we see that admission into the program actually worsens the level of optimism of members of the host community. As host communities often have negative attitudes and behavior towards refugees (Adida et al., 2018; Hangartner et al., 2019; Valli et al., 2019; Wike et al., 2016), a failure to stimulate positive change among the host community indicates that the intervention does not address their underlying concerns regarding employment and refugees.
Given that our endline survey was collected right after the completion of the training, it is perhaps not surprising that economic impacts failed to materialize. Indeed, we anticipated such an outcome in our pre-analysis plan. Endline data was collected, literally, on the last day of a set of trainings that required significant time commitment from participants. That the control group, who were free to pursue and undertake employment, did not manage to do so more than the treatment group suggests significant labor market constraints. This should temper expectations about the potential longer-term impacts of this program, a concern reinforced by more recent macro-economic problems in the region. Both countries have suffered economically due to Covid, and Lebanon has had additional political and financial crises. Broadly speaking, these concerns are reflected by null findings in terms of optimism and experience / perceptions of economic scarcity in the host community.
Similarly, the program did not assist refugees in finding employment. However, in contrast to the host community, the program stimulated a range of positive attitudinal and behavioral outcomes among the refugee community. In particular, the program made refugees in the treatment group more optimistic, left them feeling more capable of meeting their current needs and resulted in a reduction of relative group bias in the dictator game. These social outcomes appear to be driven by a direct "program effect" as identified by Brück et al. (2021) as opposed to an indirect effect via employment gains. For these specific outcomes, we imagine, simply, that involvement in the program sent a positive signal to refugee participants about their future competitiveness in the labor market, which improved optimism. Ability to meet current needs might have derived from something more prosaic: the program provided small stipends to cover necessities for the training period. These small amounts may have been sufficient to help refugees but not hosts, which is why we see shifts in ability to meet current needs for refugees and not hosts.
The major differences in outcomes across the host and refugee communities suggest not only differential effects of the program itself but also differential needs and expectations of the two communities, which may limit the ability of the same program to achieve similar results across both populations. The heterogeneity tests go some way to suggesting that improvements in general economic optimism and optimism about meeting basic needs (i.e., economic scarcity) are linked to wider interpersonal behaviors among the refugee community, given that the results move in the same direction. At the same time, since we control for optimism in our analyses, we note that other factors must also be contributing to behavioral change amongst the refugee community. In turn, it is also clear that, whatever these additional factors might be, they are not stimulating similar change in the host community.
While our findings are consistent with the predictions of contact theory (as most trainings took place in mixed nationality groups) and of reduced perceived competition, it is not fully clear whether they represent a meaningful reduction in intergroup tensions. Changes among the refugee community occurred in the apparent absence of meaningful baseline group-based biases in either the treatment or control group. This could indicate that what is driving the positive results in the refugee community is not reflective of reduced tensions, but some entirely different mechanism, such as gratitude, as a consequence of the program. Such a mechanism does not, necessarily, arise due to the specific design of the program but simply as a consequence of any program having been made available to the refugees who entered the training, or at least, of the availability of a program that matched (self-perceived) needs among this community. Our results are consistent with the idea that refugees become grateful, indirectly, to the host community for any stimulus that might improve their situation and their behavior follows accordingly. One potential interpretation of our results is that refugees became more generous towards their hosts as a reflection of this gratefulness. This is consistent both with the significant finding in the dictator game (which captures other-regarding preferences) and with the insignificant finding in the stag hunt (where cooperation enriches both players). Refugees may feel gratitude for access to the program, even if work is limited, and want to give back to their hosts for this potential opportunity.
That the effects do not move in both directions (that is, also from hosts to refugees) suggests limitations to the program's achievements and provides additional support for alternative interpretations of the results. Hosts may have higher expectations about what benefits they should receive. In both Jordan and Lebanon, refugees are only allowed to work in certain sectors, largely low status ones. To be in a training for work in these sectors may be perceived as a slight by members of host communities, which may explain why acceptance into the program reduces optimism in the host community. Certainly, it is less clear why hosts would be grateful towards refugees for receiving those benefits; rather, they might be more grateful towards their government, or the international community.
The lack of evidence of group-based bias at baseline among either hosts or refugees, and the fact that the overall reduction in group-based bias is largely driven by refugees, raise questions about the amount of bias that exists between these groups. While in both Lebanon and Jordan, host community attitudes towards refugees are not as negative as policy makers feared at the beginning of the Syrian crisis, negative attitudes do exist, especially among sub-populations (Alrababa'h et al., 2020; UNDP and ARK, 2019). Our results indicate that in the domains we measured, with a youth population, biases between groups may not be strong. However, with a different population, or if we measured bias through different measures or in different dimensions, we might have seen more evidence of group bias (Berge et al., 2019).
Our results speak to three literatures: the social impacts of employment programs, contact theory, and reciprocation. With regard to the social impacts of employment programs, we see that in some populations, in this case the lower power group (i.e. refugees), social and psychological gains from the program arise. While this seems to be more related to economic optimism and less to actual employment, we can nevertheless see how a jobs program, by reducing perceptions of economic scarcity, can increase some forms of social stability when economic issues drive tensions.
That the results are largely driven by the refugee participants rather than hosts contradicts much of the work on contact between groups with unequal power (Ditlmann and Samii, 2018; Gubler, 2013). Typically, contact is more beneficial to higher power groups as they are able to learn about the lower power group. However, given the length of time refugees have been in their respective host communities, the history between groups (for example, Syrians having had significant presence and even influence in Lebanon for years) and the high degree of economic and social contact outside of the program, knowledge about each other was likely not a factor. We do note that this is speculative given our inability to compare single identity vs mixed training sessions, which limits our ability to examine the role of contact.
One reason we may have seen more movement from refugees is that they were feeling grateful that, after 7 years of conflict, they were still being afforded opportunities. Hosts would not feel the same gratitude for the program, at least not towards refugees. Indeed, our results fit broader outcomes from the reciprocation literature, where individuals seek to "repay" kindness, not just to those individuals from whom they have received kindness (Whatley et al., 1999; Burger et al., 2006) but also towards society as a whole, or particular groups within it (Nowak and Sigmund, 2005; Hugh-Jones and Leroch, 2017). In our case, it is potentially important that some structure is imposed on such reciprocation: refugees are not being more generous generally, but specifically to hosts. That results only arise in the dictator game, and not the stag hunt game, suggests that some measure of other-regarding action underpins the behavioral changes we see. In the dictator game, individuals choose between their own outcome and that of another person, and their choices can, therefore, be interpreted in some way as generosity. In the stag hunt game, by contrast, an individual seeks to maximize private outcome through coordination, which cannot be so easily attributed to these motives.
Our results also point to some potential difficulty in sustaining these effects over a longer time horizon. Unless the anticipated economic benefits of the program (e.g. employment, higher or more secure incomes) are realized, it is unclear how or why optimism about one's future, or perceptions about abilities to meet immediate needs, would endure. Should refugees become more pessimistic about their futures, it is possible that at least some component of the observed changes in interpersonal behavior will follow suit. Given the observed and known difficulties in the labor markets in Jordan and Lebanon, for both hosts and refugees, it is unclear how likely it is that these positive outcomes will be sustained into the medium or longer term. Similarly, it is important to recognize that these findings do not arise, in any sense, for members of the host community. In the immediate term, this suggests that the program has failed to improve social relationships or mitigate tensions running from hosts to the refugee community. In a more general sense, this could reflect that while employment-based interventions for host communities are important in an economic sense, it does not follow that they also bring social or attitudinal change. Other interventions, focused on perspective-taking (Adida et al., 2018; Alan et al., 2020) or understanding similarities between groups (Williamson et al., 2021), may provide a more direct way of shifting host perceptions of refugees and mitigating tensions.

Policy and Program Implications:
Numerous donor-funded programs (not only jobs programs, but also infrastructure and educational programs) aim to address challenges between host communities and refugees by providing the same intervention to both communities and conducting activities jointly. However, our results raise questions about whether that is in fact "conflict-sensitive" in situations where the two communities have different needs and expectations. In the case presented here, jobs open to refugees might be considered low status jobs and undesirable for many Jordanians and Lebanese. This is why these sectors were open to refugees. While, fortunately, the program did not harden host communities' attitudes, it did not address their hopes for employment, and therefore we see little change in optimism and behavior among host communities compared to refugees.
Two components underlie tensions between hosts and refugees. One is related to scarcity. Host communities often worry that the influx of refugees may make it harder to gain employment or receive public services. In these cases, development programs, such as job training or infrastructure, can address this scarcity. The second is related to stereotypes and lack of knowledge about each other. In these cases, programs that facilitate contact can address and overcome these stereotypes and knowledge gaps.
For efficiency, donors and implementers have tried to address these two components in one program. However, when the two communities' expectations and needs related to scarcity differ, as we see here, it may not be possible to achieve both development and social outcomes in the same program and be truly conflict sensitive at the same time.
Policy makers and program designers need to better understand the expectations and needs of both communities when addressing scarcity and see if the same program will address the goals for both communities. If not, targeted programs related to the specific needs of each community may be more successful, at least in achieving development objectives. Then additional programs can focus on the social and psychological aspects of integration.
However, if the needs and expectations of the two communities are similar (e.g., basic education if students from the different communities are at similar levels or if labor markets are completely open), then combining the two objectives may make better sense. By being more intentional about these two objectives-development and cohesion-and identifying when it makes sense to combine objectives versus keeping them separate, programs may be more successful at addressing the economic and social underpinnings of tensions between hosts and refugees.
Additional Tables:
OLS regression of selection into treatment group (Column 1) and attrition (Column 2) on a range of key control variables. Column 3 shows the determinants of selection into the treatment group following the application of inverse probability weights on the demographic variables (marriage and host status) that determine selection into treatment or into attrition. Standard errors in parentheses. ***, ** and * = p<0.01, p<0.05 and p<0.10 respectively.