Policy Research Working Paper 10111

Informal Microenterprises in Senegal: Performance Outcomes and Possible Avenues to Boost Productivity and Jobs

İzak Atiyas
Mark A. Dutz

Africa Region
Office of the Chief Economist
June 2022

Abstract

This paper explores differences and similarities across formal and informal microenterprises in Senegal. It uses a new national sample of more than 500 firms, of which two-thirds are informal and over 95 percent are micro-size, employing five or fewer full-time employees. The analysis finds that formal firms have average performance outcomes that are in the range of three to five times higher than informal firms. Formal firms are also more likely than informal firms on average to possess “good” characteristics, namely assets and uses of digital technologies that are positively correlated with productivity, sales, exporting, and employment. Despite these average differences, informal firms are highly heterogeneous, with a sizable number similar to formal firms in terms of both performance outcomes and good characteristics: the share of informal firms in the top productivity and sales deciles having good characteristics is substantial, and one-third of all firms in the high-performance cluster based on a data-driven combination of the four performance variables are informal firms. Importantly, several characteristics that are correlates of better performance (being in the top two clusters) for informal firms are identical to those for all firms in the high-performance cluster: having electricity, having had a loan, and in terms of uses of digital technologies, having a smartphone and using a mobile phone to communicate with suppliers and customers. However, a sizable number of high-performance informal firms are lagging in terms of good characteristics. That roughly half of formal firms and no informal firm had a loan implies that it is possible to be in the top performance cluster even without having access to such formal financing. That over half of formal firms in the top cluster as well as in the top decile of productivity and sales use inventory control/point of sales software as a management tool while only one informal firm does is both indicative of the small number of informal firms that use these technologies and suggestive of the potential for performance improvements if such technologies were used more widely.

This paper is a product of the Office of the Chief Economist, Africa Region. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at mdutz@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Keywords: microenterprise, informal, 2G/2.5G feature phone, smartphone, digital technologies, productivity, jobs, inclusion.

JEL codes: D22, J24, L25, L86, O14, O33, O55.

Affiliations: Atiyas (Economic Research Forum and TUSIAD-Sabanci University Competitiveness Forum), Dutz (World Bank). Corresponding author: mdutz@worldbank.org. This research has received funding from the joint Africa Chief Economist-Digital Development Research Program on Technology Transformation for Jobs in Africa (project ID: P170151). The authors are grateful to Alison Gillwald and Onkokame Mothobi for making available the RIA (Research ICT Africa) ICT Access business data for 2017-18 and for their support in the use of these data.

1. Introduction

This paper seeks to better understand the heterogeneity of microenterprises in Senegal, defined as firms with five or fewer full-time employees. It examines both differences and similarities among microenterprises, including the performance gaps between the average informal and the average formal firm. It asks to what extent some informal microenterprises are similar to “good” formal firms, where “good” firms are defined as those that have higher productivity and sales levels, including sales abroad, and that generate more full-time jobs. It explores whether there are observable characteristics associated with these desirable performance outcomes, whether correlates of good performance apply equally to formal and informal firms, and whether any of these correlates could be candidates for appropriate policy support that enables and empowers informal firms to improve their performance outcomes.
A key set of hypotheses that the paper explores is the extent to which specific characteristics of higher-performance microenterprises, both formal and informal, reflect the willingness, abilities, and opportunities of employers and workers to learn about better practices. This includes learning as they work, through access to productivity-related information facilitated by the use of specific digital technologies (DTs). These learning channels are coupled with access to financing, both to take advantage of the learning opportunities and to invest in associated better technologies, and with access to markets, to expand production and sales in line with new learning, transition to more complex tasks, and earn higher incomes over time.

The paper begins with a simple framework that seeks to clarify the choices that firm owners or entrepreneurs make in deciding whether to remain informal or to become formal enterprises, and the constraints and opportunities that they face in their informal status. Formal firms are defined as firms that have legally registered at the national level, the local level, or both. Firms with sufficiently low productivity are presumed to self-select into informality whenever the net benefits of informality outweigh the net benefits of formality. However, there may also be non-selection effects at play, namely that firms have low productivity precisely because they are informal. These firms may face an informality trap in which they cannot improve productivity because they lack sufficient access to technologies, financing, and markets, as well as to effective government support programs. The paper seeks to understand, to the extent possible from the available data, whether the performance gaps of informal firms are due more to selection or to informality itself.
This line of analysis could provide insights into how best to enable and empower informal firm owners and workers who are willing and able to generate better productivity and jobs outcomes.

The paper explores firm-level heterogeneity linked to performance outcomes among both informal and formal firms in several ways. It focuses on four key performance variables: labor productivity, sales, employment,[2] and a binary variable capturing whether the firm exports. An initial exploration compares the entire distributions of three key variables, namely labor productivity, sales, and employment, for formal versus informal firms, as well as the productivity, sales, exporting, and employment outcomes of the median and mean formal versus informal firm. Quantile regressions are then carried out for productivity and sales to explore whether the performance gaps between formal and informal firms change across the distributions of these variables or remain constant.

[2] Productivity and sales are continuous variables. Employment should be treated as a discrete variable since firms in the sample are very small in terms of employment.

The next section explores why the average informal firm performs worse than the average formal firm. It does this through two complementary approaches. First, propensity score methods are used to explore, albeit tentatively, the extent to which performance gaps may be due to selection effects (firms are informal because they have low performance) or to informality itself (firms have low performance because they are informal). Second, probit regressions are run to identify the main statistically significant correlates of (in)formality.
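The propensity score approach mentioned above can be illustrated with a minimal inverse-probability-weighting (IPW) sketch. It assumes that propensity scores, i.e., estimated probabilities of being formal given observed firm characteristics (for example from a logit or probit), are already available. The function name `ipw_gaps` and the toy data are hypothetical and are not the authors' code or data. If the raw formal-informal gap shrinks substantially once firms are reweighted to balance observed characteristics, the raw gap is attributed mainly to selection; this reading rests on the assumption that selection operates only through observed characteristics.

```python
from math import fsum


def ipw_gaps(outcome, formal, pscore):
    """Formal-minus-informal gaps: raw, IPW-weighted ATE, and IPW-weighted ATT.

    Hypothetical helper. outcome: firm outcomes (e.g., log labor productivity);
    formal: 0/1 indicators, 1 = formal firm; pscore: estimated
    P(formal | observed characteristics), strictly in (0, 1).
    Weighted means are normalized by the sum of weights (Hajek estimator).
    """
    if any(not 0 < p < 1 for p in pscore):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    treated = [(y, p) for y, d, p in zip(outcome, formal, pscore) if d == 1]
    control = [(y, p) for y, d, p in zip(outcome, formal, pscore) if d == 0]

    def weighted_mean(pairs, weight):
        weights = [weight(p) for _, p in pairs]
        return fsum(w * y for w, (y, _) in zip(weights, pairs)) / fsum(weights)

    mean_treated = fsum(y for y, _ in treated) / len(treated)
    mean_control = fsum(y for y, _ in control) / len(control)
    raw_gap = mean_treated - mean_control

    # ATE: formal firms weighted by 1/p, informal firms by 1/(1 - p).
    ate_gap = (weighted_mean(treated, lambda p: 1 / p)
               - weighted_mean(control, lambda p: 1 / (1 - p)))

    # ATT: informal firms reweighted by p/(1 - p) so that their observed
    # characteristics resemble those of formal firms.
    att_gap = mean_treated - weighted_mean(control, lambda p: p / (1 - p))
    return raw_gap, ate_gap, att_gap


if __name__ == "__main__":
    # Hypothetical toy data: the outcome depends only on a characteristic x
    # (x = 0 -> outcome 1, p = 0.2; x = 1 -> outcome 3, p = 0.8), so formal
    # status itself has no effect on the outcome.
    y = [1, 3, 3, 3, 3, 1, 1, 1, 1, 3]
    d = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    p = [0.2, 0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2, 0.8]
    print(ipw_gaps(y, d, p))
```

On the toy data the raw gap is about 1.2, while both weighted gaps are (numerically) zero: the whole raw gap reflects differences in the characteristic x rather than formal status itself.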
The final section explores differences between formal and informal firms in terms of underlying firm-level characteristics and highlights the large degree of heterogeneity across informal firms. A first sub-section presents and discusses comparisons of unconditional means and medians across the main groups of available firm-level characteristic variables. A second sub-section introduces the notion of “good” characteristic variables, namely those positively correlated with performance outcomes. It explores the extent to which informal firms are “similar” to good formal firms in terms of having these characteristics, considering both the overall shares of informal firms with these characteristics and the extent to which informal firms within the top decile of firms in terms of productivity and sales have them. A third sub-section broadens the exploration by creating three “clusters” of firms, namely high-, medium-, and low-performance firms, based on a data-driven mix of the four performance variables, and examines the share of informal firms in the high-performance cluster relative to formal firms and the other clusters. The heterogeneity within the group of high- and medium-performance informal firms provides several possible entry points for policy to support those informal firms willing and able to upgrade their performance over time.

The main findings of this paper include:

• On average, formal microenterprises have higher performance outcomes than informal microenterprises. Formal firms dominate informal firms in terms of the overall distribution of labor productivity, sales, and number of jobs created: more precisely, formal firms’ density distributions of these performance variables lie to the right of those of informal firms. For both the median and mean firms, formal firms have productivity, sales, exporting, and employment outcomes that are in the range of 3 to 5 times higher than those of informal firms.
Quantile regressions further show that the productivity and sales gaps between formal and informal firms remain roughly similar across all quantiles.

• Exploratory analysis using propensity score weighting methods suggests three distinct findings. First, the effect of informality per se on productivity is not substantial on average, so gaps in productivity seem to be mostly explained by the fact that low-productivity firms select into informality to start with. According to this tentative finding, firms are informal because they have low productivity. Second, the analysis suggests that informality itself may play a larger role in constraining employment. According to this tentative finding, firms have low employment because they are informal, with informality serving as a trap that prevents jobs growth. Third, the gap in sales between informal and formal firms on average seems to be explained both by selection and by the causal impact of informality itself. These findings are only weakly supported by complementary (and simpler) regression analyses and would need to be revisited with more in-depth analyses using panel data. However, if confirmed with better data, they would suggest that to address productivity shortfalls, policy may best be directed towards the underlying constraints impeding productivity across formal and informal firms. In addition, to address the small number of jobs generated by micro-size informal firms, it may also be important to address the stigma and incentives that accompany informality, as many informal firms may be unwilling to expand employment so as to stay below the regulatory radar or to avoid possible harassment.
• Based on available data, the strongest average correlates of selection into formality are productivity, having electricity, being located in an urban area, being a transformational entrepreneur (namely an owner who chose entrepreneurship not because s/he could not find a job but because of the profit-making opportunity that owning a business provides), being a young female owner, and having large formal suppliers. The strong positive association between formality and productivity is aligned with the exploratory finding of the propensity score weighting approach that informality is largely a result of low productivity rather than its cause. Both findings support the view that policy should strive to improve the productivity of those informal firms willing and able to do so, as this should also enable and empower informal firms to choose formal markets over time.

• Formal firms are more likely than informal firms on average to possess “good” performance-linked characteristics, namely characteristics that are statistically significantly and positively correlated with productivity, sales, exporting, and employment.
The main performance-linked characteristic variables are the following. Regarding innate features: having electricity (for productivity and sales). Regarding assets: having vocational training (for employment); having had a loan and a line of credit from suppliers (for productivity, sales, and exporting); and having large suppliers, foreign suppliers, large buyers, and non-local buyers, opportunities that are all presumed to support learning and market access (for productivity, sales, and exporting, with the latter only for sales and exporting). Regarding DTs for access: having a 2G/2.5G phone relative to not having any mobile phone (for productivity and sales), a smartphone (for productivity, sales, and exporting), and a computer (for productivity and sales). Regarding DTs for external-to-the-firm GBFs (general business functions): communicating with suppliers and customers through mobile phone (for productivity and sales); paying suppliers and receiving payments from customers using mobile money (for productivity and sales, and also for exporting in the case of receiving payments from customers); and using the internet to look for suppliers, understand customers, and engage in e-commerce (for exporting). Regarding DTs for internal-to-the-firm GBFs: using inventory control/POS (point of sales) software (for productivity and sales).

• In addition to insights based on averages and other summary statistics of the entire distribution, a key finding is that informal microenterprises are highly heterogeneous, with a sizable number that “look” like formal firms with respect to indicators of performance and that possess “good” performance-linked characteristics. First, the share of informal firms among all firms with a given good characteristic is substantial, ranging between 17 and 72 percent.
Second, the share of informal firms in the top productivity and sales deciles having good characteristics is also substantial: 37 percent of top-decile productivity firms (19 out of 51) and 25 percent of top-decile sales firms (11 out of 44) are informal firms. Third, one-third of all firms (16 out of 50) in a high-performance cluster based on a data-driven combination of the four performance variables are informal firms. Importantly, a number of characteristics that are statistically significant correlates (at least at the 5% level) of better performance (being in the top two clusters) for informal firms are identical to those for formal firms in the high-performance cluster: having electricity, having had a loan, and, in terms of uses of DTs, having a smartphone and using a mobile phone to communicate with suppliers and customers. Hence, for these characteristics, high-performing informal firms are similar to high-performing formal firms.

• There is also a sizable number of informal firms that look like formal firms in terms of performance variables (that is, they are in the top deciles of productivity and sales, and in the high-performance cluster) but that lag behind in terms of having many good characteristics (such as use of DTs associated with performance outcomes). The largest differences between informal and formal firms in the top cluster, aligned with those for firms in the top decile of productivity and sales, are in whether the firm has had a loan, has supplier credit, and has a computer, as well as whether the manager has vocational training and whether the firm uses inventory control/POS software: the differences are in the order of 40 to 60-plus percentage points.
Interestingly, having had a loan, having a computer, and using inventory control/POS software are all highly statistically significant correlates of being in the top cluster (in addition to having electricity, having upstream and downstream value chain relationships with more knowledgeable or market-access-facilitating suppliers and buyers, and using a mobile phone to communicate with suppliers and customers). Nevertheless, it is possible to be in the top performance cluster without access to this form of financing: roughly 55% of formal firms in the top cluster have not had a loan, and no informal firm in the top cluster has had a loan, yet these firms are still top performers. Using inventory control/POS software is an important attribute of formal firms in the top cluster as well as in the top decile of productivity and sales, with over one-half of these firms using this management tool. In contrast, only one informal firm uses this type of DT. This is both indicative of the small number of informal firms that use computers and smartphones (such management software cannot be accessed without a computer or smartphone) and suggestive of the potential for further performance improvements if appropriate applications of this type were used more widely by informal enterprises, provided that the correlations reported above between use of DTs and performance variables partly reflect a causal impact. Giving informal firms better access to such DTs, especially those that allow employers and workers to learn as they work, is expected to further improve their performance.

• Conditional on being informal, being a transformational entrepreneur is a highly statistically significant positive correlate of higher performance outcomes (being in the top two performance clusters, relative to being in the lowest cluster).
Being a profit-seeking entrepreneur is strongly associated with better-performing informal firms; conversely, the roughly half of all informal firms that are in the low-performance cluster are likely there because they are mostly subsistence entrepreneurs, who are owners only because of the absence of alternative wage jobs.

• Young women entrepreneurs pose a special challenge and opportunity. Regression results show that, controlling for industry and urban location, firms owned by young women have on average a 22 percentage point higher probability of being formal. Those that are formal are more likely to have good characteristics, but they remain behind in terms of key performance outcomes. Good characteristics are most prevalent among formal firms owned by young women: 52 percent have a smartphone vs. 29 and 28 percent for firms owned by young and older men, 54 percent use mobile money to pay suppliers vs. 42 and 33 percent, and 26 percent use inventory control/POS software vs. 11 and 9 percent, respectively. In contrast, women-owned informal firms, and especially young women-owned informal firms, are largely under-represented in the high-performance cluster of firms: the share of women-owned firms is only 7% in the high-performance cluster but 59% in the low-performance cluster, while the share of young women-owned firms is only 2% versus almost 10 times more (19%) in the low-performance cluster. And conditional on being informal, being a young woman-owned firm is a statistically significant negative correlate of higher performance outcomes, relative to being in the lowest performance cluster. In addition, on two key performance indicators, namely median productivity and the share of firms that export, formal firms with young women owners lag behind formal firms owned by young or older men. These findings likely reflect biases that prevent young women owners from taking advantage of the opportunities available to men-owned firms.
Policies that would enable them to better leverage their assets, supported by equal access to the range of performance-enhancing DTs that allow them to learn as they work, to financing, and to markets for the purchase of required inputs and the sale of outputs, are important to enhance overall productivity, sales, exporting, and jobs outcomes.

2. Literature review and conceptual framework

Informality and the associated lower average productivity levels are assumed to result both from the selection choices of firm owners and from the inherent constraints of being informal. In a recent analytical framework developed by Ulyssea (2018), firms with sufficiently low productivity are presumed to self-select into informality when the net benefits of informality outweigh the net benefits of formality. Firms are assumed to face a number of costs and benefits associated with informality and formality, and to choose whether to legally register based on a cost-benefit analysis. Examples of costs of formality given in the literature include costs of entry (e.g., costs of registration) plus the tax, regulatory, and harassment/corruption burdens associated with formality. Some of these costs could be due to purely wasteful or rent-seeking regulations (e.g., licensing costs that are maintained to preserve privileges of existing formal enterprises in industries or professions), and some may be useful from a social welfare point of view (e.g., reasonable phytosanitary or environmental regulations). As benefits of formality, the literature has emphasized, among others, the ability to access financial markets (i.e., the ability to get loans) and to access government support programs to improve productivity. Costs of informality often include the expected costs of being caught, including associated fines (which would increase as enforcement increases), and the opportunity costs of not having access to the modern economy, including access to financial markets.
The benefits of informality likely include increased flexibility (e.g., easier responsiveness to changing market circumstances, a greater ability to change product lines, etc.) as well as the ability to avoid tax, regulatory, and harassment burdens. Based on such a model, Ulyssea proposes three types of informal firms, corresponding to complementary views of informality. The first type corresponds to the “De Soto view” (or “held-back entrepreneurs”), namely enterprises that would switch to formality if entry costs were reduced to zero. For these firms, the net present value (NPV) of remaining informal is higher than that of becoming formal once entry costs are paid, but the difference is smaller than the entry or registration costs. Relative to other informal firms, these firms have higher productivity; they are kept out of formality solely by high regulatory costs. To the extent that these entry costs are due to non-welfare-improving regulations and associated onerous licensing procedures and red tape, eliminating them would be clearly desirable. However, recent empirical evidence suggests that these costs are lower than economists used to believe and/or that reducing them does not substantially change incentives to register. Based on an estimated structural model and matched employer-employee data on formal and informal firms in Brazil, Ulyssea infers that these firms make up 9 percent of informal firms. The second group (the “parasitic view”, 42 percent of informal firms) consists of firms that would remain informal even if entry costs were reduced to zero. These enterprises would still have a positive NPV if they switched to formality, but the NPV of informality is higher: they are productive enough to survive in the formal sector but choose to remain informal to earn higher profits. Their productivity lies in a range below that of held-back entrepreneurs but above that of the third group, namely “survival” or “subsistence” firms (49 percent of informal firms).
The third group contains firms that would have a negative NPV if they were formal and have a positive NPV only if they remain informal; they are too unproductive to ever become formal. Other things being equal, firms sort into these categories based on their productivity, with the most productive firms choosing formality.

Most studies find that the average level of productivity among informal firms is lower than that of formal firms. In principle, this could reflect two dynamics. It could reflect selection, as emphasized by Ulyssea, that is, the simple fact that low-productivity firms are more likely to choose informality and not register. It could also reflect an “informality trap”, or a dynamic inefficiency of remaining informal: by being informal, firms may forgo the benefits of access to assets (finance, technologies, capabilities such as vocational training, and effective government support programs) and to markets that are necessary to improve productivity. To the extent that these non-selection effects are important, informality itself causes a lack of productivity upgrading, a situation that is not captured in the Ulyssea framework. Under this alternative perspective, non-selection effects result in firms having low productivity because they are informal, and they entail social costs to the extent that such firms stay smaller and create fewer jobs than they would if they became more productive. For purposes of policy formulation, it would be useful to understand the extent to which any productivity gaps between informal and formal firms are primarily due to selection or to dynamic inefficiencies associated with informality itself.[3]

More generally, in the Ulyssea framework, firm-level productivity is given and is not influenced by the economic environment or policy choices.
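The three types of informal firms in the Ulyssea framework can be summarized as a decision rule on net present values. The sketch below is a hypothetical, stylized illustration of that rule, not Ulyssea's estimated structural model: `npv_formal` is assumed to be the value of operating formally before paying the one-off entry cost, `npv_informal` the value of operating informally, and an “exit” outcome is added for firms for which neither status is profitable.

```python
def classify_firm(npv_formal, npv_informal, entry_cost):
    """Classify a firm under a stylized Ulyssea-type cost-benefit rule.

    Hypothetical illustration, not Ulyssea's estimated structural model.
    npv_formal:   NPV of operating formally, before paying the entry cost
    npv_informal: NPV of operating informally
    entry_cost:   one-off registration (entry) cost, assumed non-negative
    """
    if entry_cost < 0:
        raise ValueError("entry_cost must be non-negative")
    net_formal = npv_formal - entry_cost
    if max(net_formal, npv_informal) <= 0:
        return "exit"                 # added case: neither status is profitable
    if net_formal >= npv_informal:
        return "formal"
    # The firm stays informal at the current entry cost. Which type is it?
    if npv_formal > npv_informal:
        return "held-back (De Soto)"  # would formalize if entry_cost were zero
    if npv_formal > 0:
        return "parasitic"            # viable if formal, but informality pays more
    return "survival"                 # viable only while informal


if __name__ == "__main__":
    # Hypothetical NPVs (arbitrary units), entry cost of 10 in every case.
    for npvs in [(100, 50), (100, 95), (60, 80), (-5, 20), (-5, -1)]:
        print(npvs, classify_firm(*npvs, entry_cost=10))
```

For example, with an entry cost of 10 the firm with NPVs (100, 95) is classified as held-back: it stays informal because 100 - 10 < 95, but it would formalize if the entry cost fell to zero.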
In the Ulyssea framework, therefore, the main targets of policy are the costs and benefits that firms include in their cost-benefit calculations regarding whether to remain informal. By contrast, policy discussions surrounding informality often explore whether firm productivity can be enhanced by various types of support programs or mechanisms that enhance learning, which in turn improves productivity (and perhaps consequently increases the likelihood of formalization). Some recent studies draw attention to the fact that in some countries there may exist a “middle range” of informal firms that in some respects are “similar” to formal firms and may be induced to choose formality over time through productivity-upgrading policies.[4]

While this study relies on a much more restricted data set than the one used by Ulyssea, it nevertheless adopts a broader analytical framework. Under this expanded framework, the design of policies depends on the characteristics of the country and the heterogeneous distribution of enterprises therein. For large informal firms, especially those that have access to a variety of assets (e.g., informal firms that have bank loans appear to be not far from the domestic frontier in terms of technology adoption), continued informal status may reflect particularly low levels of enforcement, in turn possibly driven by political connections and the ability to avoid regulations and to access rents. For such firms, higher levels of enforcement should likely be the primary policy instrument. Senegal does have such firms.
For relatively high-productivity smaller informal firms that still lack some critical external resources, the best policy approach may be to increase the benefits that are traditionally associated with formality, such as targeted productivity-upgrading support programs, including facilitating participation in productive supply chains, so that these firms can better access markets and learn from larger formal downstream buyer firms and/or upstream supplier firms. Ideally, formalization would not be required in the short to medium run until any inefficient costs associated with formality, such as high registration costs, restrictive product licensing requirements, and harassment/corruption costs, are removed, with more productive growing informal firms likely to choose formality over time as the benefits to them become more valuable.

[3] For example, Demenet et al. (2015) find that firms that switched from informal to formal status in Vietnam substantially increased their value added.
[4] See, among others, Aga et al. (2021).

Policy design should therefore be informed by an assessment of the private benefits and costs of formality and informality, so as to influence the cost-benefit calculations of firms. If unnecessary regulatory costs are a main component of firms’ cost-benefit calculations, addressing these would be a priority.[5] If informality has generated an informality trap, support programs increasing access to resources that are critical for productivity improvements could induce firms to become more productive, grow, and formalize over time. Policy design should also be informed by an assessment of the social costs of informality and the welfare benefits of formalization, i.e., those that are not captured by private business calculations. The main social benefits include a reduction of forgone tax revenues, an increase in allocative efficiency (i.e.,
reducing the degree to which low-productivity informal firms have larger market shares than would be warranted by their level of productivity, while enabling growth in the earnings and jobs of those informal firms willing and able to upgrade their productivity), an elimination of “unfair competition” facing formal firms, and a strengthening of rule-based norms of behavior that would make compliance with welfare-improving regulations more widespread. The relative weight of each of these sources of potential benefits will depend on country circumstances.

3. Formal versus informal firms and their relationship with performance variables

3.1 Formal versus informal firms

The Senegal data were collected through a Business Survey of microenterprises compiled by Research ICT Africa (RIA) in 2017-18.[6] Formal firms are defined as firms that have legally registered at the national registrar general or country-level revenue authority, at the local authority or municipality, or both. This definition aligns with what Ulyssea (2020) refers to as the dominant approach in the literature to defining informality, namely a legalistic approach: he defines informal firms as those that do not register and pay entry fees to achieve formal status.[7] Informality of the enterprise is referred to as the extensive margin of informality.[8]

[5] As mentioned above, the recent empirical literature has emphasized that unnecessary regulatory costs, in particular unduly high registration costs, do not seem to play a major role in firms’ decisions and that their elimination does not seem to generate widespread formalization. In particular, Benhassine et al. (2018) find that an effort in Benin to induce firms to formalize (while at the same time decreasing costs of formalization) resulted in a significant increase in formalization, but that this formalization did not bring firms higher sales or profits. De Giorgi and Rahman (2013) find that an information campaign in Bangladesh made firms more aware but had no impact on registration. More generally, in their review of the literature, Bruhn and McKenzie (2014) conclude that “evidence on the effects of entry reforms and related policy actions to promote firm formalization … result in only a modest increase in the number of formal firms, if there is any increase at all. Most informal firms appear to not benefit on net from formalizing”. See also World Bank Group (2021).
[6] Because RIA did not have independent information on the distribution of microenterprises, sampling for the business survey was done in parallel to the sampling for the RIA household survey examined in Atiyas and Doğanoğlu (2020). Specifically, the national census sample frames were split into urban and rural Enumeration Areas (EAs). EAs were then sampled within each stratum using probability proportional to size. For each EA, two listings were compiled, one for households and one for firms; these listings served as the sampling frames. From each EA, 24 households and 10 firms were selected randomly. The data contain sampling weights, yielding EA-level representative data. See Mothobi et al. (2020) for a descriptive analysis of these data.
[7] Legal requirements for any natural or legal person(s) wishing to exercise a commercial activity in Senegal include registration with the local business registry (RCCM, Registre du commerce et du crédit mobilier) and with the national tax authorities (DGID, Direction Générale des Impôts et des Domaines, Senegal’s revenues and customs authority). Although the RIA data also include variables on whether the business pays local or municipal taxes and whether the business is registered for VAT or sales taxes, these are separate legal compliance issues for ongoing operation rather than entry registration: firms with annual turnover
Out of 517 firms in the full sample, 163 firms or 32 percent are formal according to this definition. As shown in Table 1, it is more common for firms to register locally: out of 517 firms in the full sample, 158 firms (31%) are registered locally while only 59 (11%) are registered at the central or national level. In terms of dual registration, out of these 59 firms that have registered centrally, 90 percent (53) have also registered locally, but out of the 158 firms that have registered locally, only 34 percent (53) registered also centrally. 8 While most of the paper focuses on this type of informality, section 3.4 briefly discusses the intensive margin of informality, namely informality of employment. Table 1: Distribution of firm (in)formality variables Registered at registrar general? No Yes Total Registered No 353 6 359 locally? Yes 105 53 158 Total 458 59 517 Note: Unweighted data. The questions are: “Is your business registered with any local authority / municipality?” and “Is your business registered with the country revenue authority (or national registrar general)?” 3.2 Formal versus informal firms and performance variables: Descriptive statistics An initial question is whether most informal firms are associated with lower performance outcomes than formal firms. The main performance variables are labor productivity, sales, employment, and exporting. Labor productivity is measured as value added (total sales minus raw materials & intermediate inputs plus water & electricity used in production) divided by the total number of full-time working people including owners. The value of sales is asked in three complementary ways in the survey: total sales, turnover, and “revenues money received by the business.” Employment is the number of full-time employees plus the number of owners. 
Exporting is a non-continuous indicator variable taking the value of 0 or 1 depending on whether the firm reports having customers located in other countries or selling goods or services abroad; it is typically reported as a share of the total number of firms in a particular sub-sample. The graphs in Figure 1 present conditional kernel densities (residuals of OLS regressions on urban and industry dummies) for formal and informal firms of the three main performance variables: log labor productivity, log sales, and log employment. The unconditional kernel densities of productivity and sales variables look similar. The densities of all three performance variables for formal firms are to the right of those for informal firms, although there are large areas of overlap. The second humps in the log productivity and log sales densities reflect the existence of a higher concentration of firms at relatively below FCFA 50 million are eligible to pay a unique global contribution or CGU, and thereby are exempt from separate VAT or sales taxes in Senegal. If formal firms that are required to pay local and municipal taxes do not pay them, this makes them non- tax-compliant rather than informal. And while the keeping of formal accounts is often included in national definitions of formality, the relevant RIA survey question does not allow a direct correspondence with Senegal’s approach. The national statistical agency, ANSD, labels firms as informal if they do not have a formal accounting system according to the West African Accounting System or SYSCOA standards or an alternate formal harmonized accounting system. 8 Before deciding on this two-group classification of enterprises, we explored whether the category of “semi-formal” firms, namely firms that are either registered at the local or national level but not both, is sufficiently different from firms registered at both local and national levels, to warrant defining three separate groups. 
Since the kernel densities of both groups were almost identical, since they were both almost identically statistically significantly associated with performance variables relative to informal firms, and since their quantile regression coefficients along the entire productivity and sales distributions were almost identical relative to informal firms, they are considered part of the same “formal” category. 9 high levels of productivity and sales. The density graph of log employment (total number of working people including owners) does not look very smooth because the underlying data are discrete. The density of employment for informal firms has a high value at 0 reflecting the large number of informal firms with self-employed owners and zero full-time employees. 9 Figure 1: Formal vs informal firms - Kernel densities of productivity, sales, and employment Note: Weighted data. Table 2 presents comparisons of unconditional medians and means for the four performance variables. Formal firms have substantially higher performance indicators than informal firms based on these summary statistics, in the range of 3 to 5 times higher. The productivity of the median and average formal firms are three times higher than those of the median and average informal firms, with the median informal firm having a monthly value added per working person of 61,250 FCFA vs 175,000 FCFA for the median formal firm. In terms of monthly sales, the median formal firm’s monthly turnover is more than four times higher than the median informal firm’s monthly turnover, 120,000 FCFA vs 500,000 FCFA; the difference for the average firm is seven times, skewed by the much higher sales of the top performing formal firms. The share of formal firms that have customers located in other countries is almost five times higher than for informal firms, roughly 22% vs 4.5%. 
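The formality definition in section 3.1 and the labor productivity measure in section 3.2 can be expressed as a small Python sketch. The `Firm` record and its field names are hypothetical (they are not RIA survey variable names); all money values are assumed to be monthly FCFA, and value added is assumed to subtract both intermediate inputs and water and electricity from sales. The last lines recompute the number of formal firms from the cells of Table 1.

```python
from dataclasses import dataclass


@dataclass
class Firm:
    """Hypothetical firm record; field names are illustrative, not RIA variable names."""
    registered_locally: bool     # local authority / municipality
    registered_nationally: bool  # registrar general or revenue authority
    sales: float                 # monthly sales, FCFA
    inputs: float                # raw materials and intermediate inputs, FCFA
    utilities: float             # water and electricity used in production, FCFA
    full_time_employees: int
    owners: int


def is_formal(firm: Firm) -> bool:
    """Formal = registered locally, nationally, or both (legalistic definition)."""
    return firm.registered_locally or firm.registered_nationally


def labor_productivity(firm: Firm) -> float:
    """Monthly value added per full-time working person (employees plus owners)."""
    value_added = firm.sales - firm.inputs - firm.utilities
    return value_added / (firm.full_time_employees + firm.owners)


# Table 1 cells keyed by (registered_locally, registered_nationally); unweighted counts.
TABLE_1 = {(False, False): 353, (False, True): 6, (True, False): 105, (True, True): 53}
FORMAL_COUNT = sum(n for (local, national), n in TABLE_1.items() if local or national)
TOTAL = sum(TABLE_1.values())

if __name__ == "__main__":
    print(FORMAL_COUNT, TOTAL, round(100 * FORMAL_COUNT / TOTAL))  # 164 517 32
```

Any firm in the three non-(No, No) cells of Table 1 counts as formal, which is why semi-formal firms (registered at only one level) fall into the formal group.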
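The conditional densities in Figure 1 are described as kernel densities of residuals from OLS regressions on urban and industry dummies. A minimal NumPy sketch of that two-step idea follows. The Gaussian kernel and the normal-reference bandwidth (1.06 × sd × n^(-1/5)) are assumptions, since the paper does not state its kernel or bandwidth, and the optional `weights` argument only stands in for the survey’s sampling weights.

```python
import numpy as np


def residualize(y, urban, industry):
    """Residuals from an OLS regression of y on a constant, an urban dummy, and industry dummies."""
    y = np.asarray(y, dtype=float)
    industry = np.asarray(industry)
    categories = np.unique(industry)
    # Omit the first industry as the base category to avoid collinearity with the constant.
    dummies = [(industry == c).astype(float) for c in categories[1:]]
    X = np.column_stack([np.ones_like(y), np.asarray(urban, dtype=float), *dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def gaussian_kde(values, grid, weights=None):
    """Weighted Gaussian kernel density estimate evaluated on `grid`.

    Bandwidth uses the normal-reference rule 1.06 * sd * n**(-1/5) (an assumption).
    """
    values = np.asarray(values, dtype=float)
    w = np.ones_like(values) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # weights sum to one
    mean = np.sum(w * values)
    sd = np.sqrt(np.sum(w * (values - mean) ** 2))    # weighted standard deviation
    h = 1.06 * sd * len(values) ** (-1 / 5)
    z = (np.asarray(grid, dtype=float)[:, None] - values[None, :]) / h
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return (kernel * w[None, :]).sum(axis=1) / h


# Usage (pooled regression, then one density per group):
# r = residualize(log_productivity, urban, industry)
# f_formal = gaussian_kde(r[formal], grid, weights=w[formal])
# f_informal = gaussian_kde(r[~formal], grid, weights=w[~formal])
```

Running the regression on the pooled sample and only then splitting the residuals by formality status keeps both densities net of the same industry and urban effects.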
The difference in mean employment between informal and formal firms in Table 2 is also more than fourfold: while the mean informal firm employs less than one full-time employee, the mean formal firm employs more than 3 full-time employees. The difference remains large for the median informal versus formal firm: the median informal firm has a self-employed owner and zero full-time employees, while the median formal firm has 2 full-time employees.

9 With weighted data, the share of formal enterprises in the total number of firms with self-employed owners and zero full-time employees is 33%, 38% for 1-2 employee firms, and 49% for 3+ employee firms. Firms with self-employed owners and zero full-time employees are counted as firms with one working person, namely the owner.

Table 2: Formal vs informal firm performance outcomes – unconditional medians and means
            No. of   Labor productivity     Sales                     Exporting   Employment
            firms    median     mean        median     mean           share (%)   median   mean
informal    353      61,250     202,174     120,000    384,401        4.6         0.0      0.7
formal      164      175,000    622,135     500,000    2,686,768      21.6        2.0      3.1
TOTAL       517      90,000     335,650     200,000    1,114,746      10.3        1.0      1.5
Note: All values are based on weighted data. Labor productivity is measured as value added (total sales minus the cost of raw materials and intermediate inputs and of water and electricity used in production) divided by the sum of full-time employees and the number of owners. The value of total sales is asked in three complementary ways in the survey: total sales = turnover = “revenues money received by the business.” Exporting reflects shares of firms active in each sectoral area that report having international customers or non-zero exports (in response to the yes/no question “Does the business have customers located in other countries (selling goods or services abroad)?”). Labor productivity and total sales are monthly values (with total sales based on annual audited statements if available and divided by 12), in local currency (FCFA).
Employment is the number of full-time employees.

How significant are these differences in means? Simple ordinary least squares (OLS) regressions of the performance variables on a dummy variable capturing formality show that the coefficients are highly significant (at the 1 percent level). They remain significant when additional controls are introduced for industry and urban effects.

3.3 Formal versus informal firms and performance variables: Quantile regressions

The analysis in the preceding sub-section suggests that there are statistically significant gaps in mean performance between formal and informal firms. This sub-section explores whether the performance gaps change across the distribution of firms or remain constant, that is, whether comparing formal and informal firms at other points in the distribution of the performance variables yields different results. For example, the difference between formal and informal firms may be smaller at the top of the distribution. This could be the case if highly productive informal firms choose informality not because of characteristics that correlate with productivity but simply because informality is profit-maximizing for them, presumably because they face weaker enforcement than their formal counterparts.

Figure 2: Plots of quantile regression coefficients of log productivity and log sales on formal vs informal firms
Panel A: Log productivity; Panel B: Log sales
Note: Weighted data.

To explore such possibilities, we carry out quantile regressions of log productivity and log sales on a formality dummy, controlling for industry and urban location. Additional quantile regressions with finer and coarser partitions were explored, and the results remain similar.
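The OLS comparison at the end of section 3.2 can be checked directly: with only a constant and the formality dummy on the right-hand side, the dummy’s coefficient equals the formal-minus-informal difference in mean outcomes. The sketch below is unweighted and uses conventional (homoskedastic) standard errors on toy data; the paper’s estimates use sampling weights, which this sketch does not reproduce.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of the estimates
    return beta, np.sqrt(np.diag(cov))


# Toy data (not from the survey): a log outcome and a formality dummy.
y = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
formal = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(y), formal])  # constant + dummy, no controls
beta, se = ols(y, X)
# beta[1] equals y[formal == 1].mean() - y[formal == 0].mean(), i.e. 4.0 here.
```

Adding industry and urban dummies as extra columns of `X` turns the same coefficient into a conditional difference in means, which is the comparison behind “They remain significant when additional controls are introduced.”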
Figure 2 presents plots of the coefficients on the dummy for formal firms (relative to informal firms) from quantile regressions of log productivity (Panel A) and log sales (Panel B) on formality status and controls for industry and urban location. The horizontal lines in the figure are OLS coefficients (measuring conditional correlations at the mean of the distribution), while the green lines plot the quantile regression coefficients. The shaded areas are the conditional confidence intervals. The graphs suggest that both coefficients are positive all along the distribution. The figures might suggest that the coefficients are larger at the low end of the distribution for log productivity and at the top of the distribution for log sales. To assess whether this is the case, several tests of equality of coefficients were carried out:
• For log productivity, equality of the coefficients at the 10th and 50th percentiles is rejected at the 5 percent level. However, this conclusion needs to be qualified because the number of formal firms in the lowest decile is very small (3 to be exact). Equality of the coefficients at the 10th and 90th percentiles, at the 25th and 75th, and at the 25th and 90th cannot be rejected.
• For log sales, none of the tests of equality can be rejected. 10
Overall, the evidence therefore suggests that the productivity and sales gaps between formal and informal firms are quite uniform across the distributions of productivity and sales. It is interesting to note that the formality/informality composition of firms changes sharply across deciles of productivity and sales. For example, 62 percent of the 51 firms in the top productivity decile are formal (32 firms), whereas only 6 percent (3 of 53) of those in the bottom decile are. Further evidence on the average gaps in performance between formal and informal firms is provided in section 4.
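Without controls, a quantile regression of a log outcome on a constant and a formality dummy is saturated, so its coefficient at quantile τ is simply the gap between the τ-th quantiles of the formal and informal groups. The sketch below uses that shortcut to trace a coefficient profile like those in Figure 2. It is illustrative only: the paper’s regressions also control for industry and urban location, which requires minimizing the check loss over all regressors jointly (for example with statsmodels’ `QuantReg`). The `method=` argument of `np.quantile` requires NumPy 1.22 or later.

```python
import numpy as np

QUANTILES = (0.10, 0.25, 0.50, 0.75, 0.90)


def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def dummy_quantile_gaps(y, formal, quantiles=QUANTILES):
    """Formal-minus-informal gap at each quantile for a regression on a constant and a dummy.

    method="inverted_cdf" returns an actual sample value, which is one of the
    minimizers of the summed check loss within each group.
    """
    y = np.asarray(y, dtype=float)
    formal = np.asarray(formal, dtype=bool)
    return {
        tau: float(np.quantile(y[formal], tau, method="inverted_cdf")
                   - np.quantile(y[~formal], tau, method="inverted_cdf"))
        for tau in quantiles
    }
```

A flat profile of these gaps across τ corresponds to the “quite uniform” gaps described above, while a profile that falls with τ would indicate a smaller formal-informal difference at the top of the distribution.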
3.4 Formal versus informal employment

Informal workers are typically defined as those who do not have a formal employment contract. The RIA data allow a description of the prevalence of firms employing only such workers, as well as a characterization of their performance relative to firms that offer at least some of their workers a written employment contract. Out of the 517 firms in the full sample, just under one-half (249, or 48 percent) employ at least one full-time employee. Of these, as highlighted in Table 3, about 13 percent (33 out of 249) offer a written employment contract; these are denoted “formal worker-employing firms.” Interestingly, roughly 20 percent of these firms (7 out of 33, or 21 percent) are informal firms, that is, unregistered at both the local and national levels.

Table 3: Distribution of employment (in)formality variable across worker-employing firms
Firms with 1+ workers     Has at least 1 employee with formal contract?
                          No      Yes     Total
informal firms            110     7       117
formal firms              106     26      132
Total                     216     33      249
Note: Unweighted data. The question is: “How many of your employees have a written employment contract?”

10 Quantile regressions were carried out at finer intervals as well. For log productivity, when regressions are carried out at 5-percentile intervals of the distribution (20 intervals), results suggest that the productivity gap between formal and informal firms is larger at the bottom of the distribution than in the middle and at the top. We have more confidence in the results reported in the text because, as the intervals get smaller, the number of formal firms at the bottom becomes very small.

The graphs in Figure 3 present conditional kernel densities (residuals of OLS regressions on urban and industry dummies) of the three main performance variables, log labor productivity, log sales, and log employment, for formal and informal worker-employing firms.
As in the comparison between formal and informal firms, the unconditional kernel densities of the productivity and sales variables look similar to the conditional ones. Similarly, the densities of all three performance variables for formal worker-employing firms lie to the right of those for informal worker-employing firms, with large areas of overlap.

Figure 3: Formal vs informal employment - Kernel densities of productivity, sales, and employment
Note: Weighted data.

Table 4 presents comparisons of unconditional medians and means for all four performance variables for the subset of worker-employing firms. Because this sub-sample excludes firms with self-employed owners and no full-time employees, and because larger firms have higher productivity and sales, are more likely to export, and by definition have higher employment, the values are generally higher than the corresponding ones in Table 2 comparing formal with informal firms. And as in the case of formal versus informal firms, formal worker-employing firms have substantially higher performance indicators than informal worker-employing firms based on these summary statistics, roughly 2.5 to 5 times higher.

Table 4: Formal vs informal worker-employing firm performance outcomes – unconditional medians and means
            No. of   Labor productivity     Sales                     Exporting   Employment
            firms    median     mean        median     mean           share (%)   median   mean
informal    216      105,000    331,798     375,000    1,238,338      11.1        1.0      2.1
formal      33       258,333    865,805     1,000,000  6,508,652      39.4        2.0      6.4
TOTAL       249      125,000    400,702     450,000    1,936,813      14.9        2.0      2.7
Note: All values are based on weighted data. Labor productivity is measured as value added (total sales minus the cost of raw materials and intermediate inputs and of water and electricity used in production) divided by the sum of full-time employees and the number of owners.
The value of total sales is asked in three complementary ways in the survey: total sales = turnover = “revenues money received by the business.” Exporting reflects shares of firms active in each sectoral area that report having international customers or non-zero exports (in response to the yes/no question “Does the business have customers located in other countries (selling goods or services abroad)?”). Labor productivity and total sales are monthly values (with total sales based on annual audited statements if available and divided by 12), in local currency (FCFA). Employment is the number of full-time employees.

4. Differences between average informal versus formal firms

4.1 Does informality per se reduce performance?

To what extent do the average gaps in performance between informal and formal firms reflect selection, and to what extent are these differences the consequence of informality itself (a treatment effect)? The available data do not allow us to provide a rigorous and definitive answer to this question, since they form a single cross section. Panel data are required to better understand the causal effect of informality, especially if such data allowed observing some firms from the time they are founded and some firms upgrading their technologies and capabilities and switching from informal to formal (or the reverse). Nevertheless, the data can be used for exploratory purposes to obtain some insights into the likely effects of selection and informality.

A simple way to examine the likely existence of selection effects is to explore, in a regression framework, to what extent controlling for covariates that could play a role in selection affects the conditional association between performance variables and informality. Table 5 reports some results. For each performance variable, the table reports the regression coefficient on a dummy variable that takes the value 1 if the firm is formal and zero otherwise, under the following specifications.
In the case of “no controls,” the regression is on the dummy variable only. In the case of “controls A,” the following variables are added to the right-hand side: gender-age combinations, urban location, schooling of the owner, whether the owner had vocational training, whether the owner is transformational, whether the firm had a loan, had a credit line from a supplier, has large firms as customers, has non-local buyers, has large formal suppliers, and has suppliers abroad, as well as controls for industry. These covariates are similar to the ones used in the discussion of correlates of informality (see section 4.2). In the case of “controls B,” log productivity is added to the list of controls in A. 11 We include log productivity in control set B because in most theoretical models it is a key determinant of firms’ choices between formality and informality (see, for instance, the description of the Ulyssea framework in section 2, where low-productivity firms select informality by not registering). To help interpretation, normalized regression (beta) coefficients are reported in parentheses. 12

11 Below, a similar set of variables is used to explore correlates of (in)formality.
12 Normalized or standardized regression coefficients are a simple transformation of the regression coefficients, equivalent to running the regression after transforming each variable by subtracting its mean and dividing by its standard deviation. The normalized variables have mean zero and a standard deviation of 1, which facilitates comparison across different explanatory variables. As such, the coefficients measure by how many standard deviations the left-hand side variable changes in response to a one standard deviation change in a right-hand side variable.

Table 5: Conditional associations between (in)formality and performance with and without controls
Dependent variable                 Controls   Coefficient   Beta        Std. error
log productivity                   None       1.289***      (0.369)     [0.144]
                                   A          0.500***      (0.141)     [0.151]
log sales                          None       1.824***      (0.469)     [0.151]
                                   A          0.829***      (0.211)     [0.154]
                                   B          0.362***      (0.0935)    [0.0712]
log total no. of working people    None       0.639***      (0.430)     [0.0593]
                                   A          0.344***      (0.230)     [0.0601]
                                   B          0.375***      (0.251)     [0.0603]
customers abroad                   None       0.155***      (0.254)     [0.0261]
                                   A          0.0593**      (0.0957)    [0.0268]
                                   B          0.0500*       (0.0814)    [0.0270]
Note: Coefficients on the formality dummy. Normalized beta coefficients in parentheses. Standard errors in square brackets. *** p<0.01, ** p<0.05, * p<0.1.

The results show substantial decreases in the regression coefficients when additional controls are included. In the case of log productivity, the specification with no controls shows a gap of 1.29 log points between the mean productivity of formal and informal firms; adding controls A reduces the gap to 0.5 log points. 13 In the case of log sales, a gap of 1.8 log points is reduced to 0.83 log points under controls A and to 0.36 log points under controls B. In the case of the total number of working people, the difference is reduced from 0.64 log points to 0.34-0.38 log points, depending on whether log productivity is included among the controls. In the case of customers abroad, the difference in the probability of exporting drops from 16 percentage points to 5 percentage points, and in fact the latter (under controls B) is significant only at the 10 percent level. These results suggest significant selection effects in the case of all performance variables.

13 The magnitude of the reduction is assessed in the following way. The coefficient on the formality dummy is equal to the difference in the means of log productivity of formal and informal firms. The exponentiated value of the coefficient in turn is equal to the ratio of the geometric means of productivity in formal and informal firms. This means that without any controls, the (unconditional) ratio of the geometric mean of productivity among formal firms to that of informal firms is 3.63 (exp(1.289) = 3.63). Once controls A are introduced, the (conditional) ratio of geometric means for the two groups is reduced to 1.65 (exp(0.50) = 1.65). Hence the presence of covariates reduces the ratio by almost 55 percent ((3.63 - 1.65)/3.63 = 0.55). Similar calculations show that for sales, the ratio of geometric means is reduced by 77 percent when controls B are introduced, relative to the case with no controls. For the total number of working people, the reduction is lower, about 24 percent. The effect of covariates on the share of firms with customers abroad is more straightforward to calculate: the difference in the shares is reduced from 16 percentage points to about 5 percentage points; this reduction of about 10 percentage points corresponds to about 68 percent of the gap with no controls ((15.5 - 5.0)/15.5 = 0.68).

A more direct way to improve our understanding of the relative weight of selection effects versus a causal effect of informality (i.e., the existence of an informality trap) is to create samples of formal and informal firms that are “similar” and to compare the performance variables across these otherwise similar samples. The propensity score inverse probability weighting (IPW) estimator is used to assess the degree to which the performance gaps may be due to informality itself. The procedure first estimates a probit model in which covariates are used to predict informality, and the predicted probability (the propensity score) is used to weight observations to make the two groups similar. 14 Then, the weighted samples of formal and informal firms are used to calculate the mean difference in the performance variables. The covariates used in the probit model are similar to the ones used in the discussion of correlates of informality below (see section 4.2). Log productivity is both a key performance variable and a key driver of selection into formality, as emphasized by Ulyssea. The use of an outcome variable in constructing a propensity score is clearly problematic in cross-section data. Since the purpose here is exploratory, two models are used to predict propensity scores. In the first model, log productivity is not used as a covariate; in the second model, it is. The average treatment effect on the treated (ATET) for log productivity is calculated only on the basis of the first model. The results are presented in Table 6. 15 Panel A reports the difference between the means of formal and informal observations for the performance variable listed in the column (the share, in the case of customers abroad). Panels B and C report the difference between the two groups when the observations have been reweighted using the propensity score. In Panel B, log productivity has not been used in the estimation of the propensity score, while in Panel C, log productivity is included as a covariate in the probit equation.

14 Specifically, if the estimated propensity score (the predicted probability of being informal) for observation i is p_i, then to calculate the average treatment effect on the treated (ATET), observations of formal firms are weighted by p_i/(1 - p_i) and those of informal firms are weighted by 1.

Table 6: Inverse probability weighting results
                                                      log            log        log no. of        customers
                                                      productivity   sales      working people    abroad
A - Difference in raw data                            1.29***        1.82***    0.64***           0.16***
                                                      (0.144)        (0.151)    (0.059)           (0.026)
B - Difference with weighted sample (log              0.18           0.72***    0.57***           0.01
    productivity not used as a covariate in the       (0.247)        (0.215)    (0.099)           (0.081)
    estimation of the propensity score)
C - Difference with weighted sample (log              -              0.48***    0.61***           -0.03
    productivity used as a covariate in the                          (0.166)    (0.100)           (0.086)
    estimation of the propensity score)
Note: Standard errors in parentheses (*** p<0.01, ** p<0.05, * p<0.1). The table reports the difference between the means of the performance variables of formal and informal firms. Panel A reports the difference in the raw data (without propensity score reweighting). Panels B and C report the difference between the two groups when the observations have been reweighted using the propensity score. For Panel B, propensity scores are estimated through a probit equation with the following covariates: a categorical variable capturing the gender and age of the owner of the firm, whether the firm is located in an urban area, whether the firm has electricity, schooling of the owner, whether the owner had vocational training, whether the owner is “transformational”, whether the firm has a loan, whether it has obtained a credit line from suppliers, whether it has large formal enterprises as customers, whether its customers are non-local buyers, whether its suppliers are large and formal, and whether the firm has suppliers from abroad, plus controls for industry. In Panel C, log productivity is also included in the list of covariates. Sampling weights were used in the calculations.

In the case of log productivity, the table shows that the difference between formal and informal firms is reduced from 1.29 log points in the raw data to 0.18 when the data are weighted with the propensity score.
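Footnote 12’s normalization amounts to a rescaling: the beta coefficient of a regressor x_j is b_j × sd(x_j) / sd(y), the same number obtained by standardizing every variable before running the regression. A minimal unweighted NumPy sketch on toy data follows; with a single regressor, the beta coefficient reduces to the correlation between x and y, which gives a quick check.

```python
import numpy as np


def beta_coefficients(X, y):
    """Standardized (beta) coefficients for the non-constant columns of X.

    X must include a constant in column 0; each slope b_j is rescaled by
    sd(x_j) / sd(y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1:] * X[:, 1:].std(axis=0, ddof=1) / y.std(ddof=1)


def standardize(v):
    """Subtract the mean and divide by the standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)
```

Because a formality dummy is a 0/1 variable, its beta coefficient scales the raw gap by the dummy’s standard deviation, which is why a large log-point gap such as 1.289 translates into a beta coefficient of roughly a third of a standard deviation.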
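The weighting scheme in footnote 14 can be sketched as follows, treating informality as the “treatment” as in the text: informal firms keep weight 1 and formal firms receive p/(1 - p), where p is the estimated probability of being informal. The propensity scores are taken as given here (the paper estimates them with a probit, which is not shown), the optional `sample_weight` argument is an assumption about how sampling weights might enter, and the sketch does not reproduce Stata’s standard errors. Because the treated group is the informal one, the result is an informal-minus-formal difference; Table 6 reports formal-minus-informal differences, so the sign flips.

```python
import numpy as np


def ipw_atet(y, treated, pscore, sample_weight=None):
    """ATET by inverse probability weighting.

    treated: 1 for the treated group (here, informal firms), 0 otherwise.
    pscore:  estimated probability of being treated given covariates.
    Treated observations get weight 1; controls get p / (1 - p), which
    reweights the controls toward the covariate mix of the treated group.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(treated, dtype=bool)
    p = np.asarray(pscore, dtype=float)
    s = np.ones_like(y) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    w = np.where(t, 1.0, p / (1.0 - p)) * s
    mean_treated = np.sum(w[t] * y[t]) / np.sum(w[t])
    mean_control = np.sum(w[~t] * y[~t]) / np.sum(w[~t])
    return float(mean_treated - mean_control)
```

Formal firms that “look informal” (high p) receive large weights, so the reweighted formal group resembles the informal group on observed covariates; a gap that shrinks toward zero after reweighting, as for log productivity in Panel B, is what the text reads as evidence of selection.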
The reduction in the productivity gap is substantial, and in fact the resulting gap is not significantly different from zero. In the case of log sales, the difference is reduced from 1.82 log points to 0.72 when log productivity is not used in the derivation of the propensity scores and to 0.48 log points when it is included in the set of covariates. 16 In the case of the log of the total number of working people, the reduction in the difference in means is very small. In the case of customers abroad, the gap is practically reduced to zero. What this exercise suggests is that, again, selection plays an important role in explaining the differences in productivity, sales, and exports. However, in contrast to what one would have expected on the basis of the comparison of firms that have at least one formal employment contract and those that have none (Table 3 above), selection seems to play a smaller role in the case of log employment.

15 Stata provides a test to examine whether, after the weighting procedure, the covariates of the formal and informal groups of firms are sufficiently similar (“tebalance overid”). Test results show that the hypothesis that the inverse probability weighting model balanced all covariates cannot be rejected for both models B and C.
16 If one were to take these results at face value and carry out calculations similar to those reported in footnote 13 above, one could conclude that reweighting the data reduces the raw ratio of geometric means of productivity by about 67 percent ((exp(1.29) - exp(0.18))/exp(1.29) = 0.67), that is, about 67 percent of the raw ratio can be attributed to the selection effect. Given that the resulting gap in the reweighted data is not significantly different from zero, one might also conclude that the gap in the raw data is entirely due to the selection effect. Moving from row A to row B, the reduction in the ratio of the geometric means of sales is again 67 percent, and that of employment is 7 percent.
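The arithmetic in footnotes 13 and 16 rests on the fact that a gap of b log points corresponds to a ratio of geometric means of exp(b), so the share of the raw ratio removed by controls or reweighting is (exp(b0) - exp(b1))/exp(b0) = 1 - exp(b1 - b0). The short Python sketch below reproduces the rounded figures quoted in those footnotes from the coefficients in Tables 5 and 6.

```python
import math


def geometric_mean_ratio(gap_log_points: float) -> float:
    """Ratio of geometric means implied by a difference in mean logs."""
    return math.exp(gap_log_points)


def share_of_ratio_removed(gap_before: float, gap_after: float) -> float:
    """(exp(b0) - exp(b1)) / exp(b0), which equals 1 - exp(b1 - b0)."""
    before = geometric_mean_ratio(gap_before)
    after = geometric_mean_ratio(gap_after)
    return (before - after) / before


# Gaps in log points, taken from Tables 5 and 6.
examples = {
    "productivity, no controls -> controls A": (1.289, 0.500),  # footnote 13: 0.55
    "sales, no controls -> controls B": (1.824, 0.362),          # footnote 13: 0.77
    "productivity, row A -> row B": (1.29, 0.18),                # footnote 16: 0.67
    "sales, row A -> row B": (1.82, 0.72),                       # footnote 16: 0.67
    "employment, row A -> row B": (0.64, 0.57),                  # footnote 16: 0.07
}
for label, (b0, b1) in examples.items():
    print(f"{label}: {share_of_ratio_removed(b0, b1):.2f}")
```

Working with the ratio rather than the log gap makes the reductions comparable across outcomes measured in different units, which is the point of the footnotes’ calculations.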
The small reduction in the employment gap suggests that the difference in employment levels between informal and formal firms may be more closely related to informality status itself than to selection. Put differently, if the evidence were taken at face value, one would conclude that informality does not cause low productivity, causes lower sales to some extent, and seriously constrains jobs. This suggests that there may be factors in the business environment associated with informality that discourage or prevent informal firms from employing more people relative to formal firms. For instance, informal firms may prefer to keep a low profile and stay below the enforcement radar, avoiding additional hires that could jeopardize their informal status. It might also indicate that firms face more pressure to formalize if they hire full-time employees: with weighted data, the share of formal firms among firms with self-employed owners and no full-time employees is only 33 percent, whereas it rises to 38 and 48 percent for firms with 1-2 full-time employees and 3 or more employees, respectively. It could also suggest high costs of formality, especially those associated with labor regulations (e.g., restrictions on firing). As indicated above, these insights, especially the suggestion that selection may be playing a limited role in the case of employment, should be treated as tentative conclusions that need to be further explored with better data. 17

4.2 Correlates of (in)formality

What types of firms choose to be formal? With a single cross section, it is only possible to carry out an exploratory analysis to identify variables that are correlated with formality, since inference about causal effects is not possible.
17 The results discussed in the text were obtained using the sampling weights in the RIA data set. The IPW approach was also applied to the data without sampling weights, and the results were overall similar. The main differences were that the formal-informal gap for log productivity was reduced less, to 0.38, which is significantly different from zero only at the 10 percent level. Also, the final ATET for employment of 2.31 is a bit lower than the gap in the unweighted data. A further robustness check was carried out using a propensity score matching approach rather than inverse probability weighting, using the “teffects psmatch” command of Stata. The results are as follows. For log productivity, the gap is reduced to 0.41 log points and is still significant at the 5 percent level. The gap in log sales is reduced to 0.37. The gap in log employment does not change much (it is reduced to 0.62). The gap in customers abroad is reduced to 0.08 when log productivity is not included in the estimation of the propensity score and to 0.03 when it is included (not significantly different from zero). We conclude that qualitatively the results are similar to those in Table 6: they suggest that selection is an important driver of the observed performance differences between informal and formal firms for productivity, sales, and exports, but not for employment, where it seems to play a smaller role. If the evidence were taken at face value, one would conclude that informality has a negative effect on employment and exports, but the effect on productivity seems smaller or even non-existent, depending on the methodology.

Table 7 reports the results of probit regressions where the dependent variable is a dummy that takes the value 1 if the firm is legally registered either at the national registrar general or country-level revenue authority, at the local authority or municipality, or both. The table reports average marginal effects. The baseline model is reported in column 1: log productivity, having electricity, and having a young woman owner (relative to having an older woman owner) are positively correlated with formality, with coefficients significant at the 1 percent level. In the baseline specification, having a young woman owner is associated with a 22 percentage point higher probability of being formal, relative to having an older woman owner. These variables remain significant when additional covariates are included in the model. Being in an urban area is positively associated with formality, but only at the 10 percent level of significance. The second column adds having customers abroad as a covariate. Its coefficient is positive but again statistically significant only at the 10 percent level and, as can be seen in columns 12 and 13, becomes insignificant when other covariates are added. Column 3 adds a dummy for owners who are “transformational.” This variable has a positive coefficient that is significant at the 1 percent level and remains significant in most of the specifications presented in the table. Columns 4 and 5 include variables that capture the quality of the human capital of the owner. Column 4 includes a variable that captures whether the owner has a certificate of vocational training. Its coefficient is positive but significant only at the 10 percent level and loses significance when additional covariates are added to the specification. Column 5 includes the number of years of schooling of the owner, and its coefficient is not statistically significant. Column 6 adds variables capturing access to finance, namely having a loan and having a credit line from a supplier; neither is statistically significant. Columns 7-10 test whether having sophisticated buyers is associated with a higher probability of being formal.
Having big enterprises or non-local buyers as customers, individually or together, is not significantly associated with the probability of being formal at the 5 percent level (columns 7, 8, 9 and 10). Columns 11 and 12 test whether having sophisticated suppliers may be associated with formality. Column 11 shows that having large formal enterprises as suppliers is significantly positively associated with formality. This could reflect the fact that large formal suppliers prefer dealing with formal enterprises. It could also reflect the fact that small firms jump a quality hurdle and are more willing to become formal when they deal with large formal enterprises as suppliers. The coefficient of this variable remains significant when it is used together with variables representing sophisticated buyers (column 12). Column 13 tests whether size is an important correlate of formality, and it turns out that it is. Relative to firms with zero full-time employees, having a positive number of full-time employees is associated with a roughly 24 to 32 percentage point higher probability of being formal. Possibly because of multicollinearity, adding size to the regression equation modifies some of the other coefficients as well. In particular, urban becomes significant at the 1 percent level, and having suppliers located abroad becomes negative and significant, but only at the 10 percent level. Finally, column 14 reports our preferred specification, where most of the variables with insignificant coefficients are excluded from the equation. In summary, the following are positively associated with formality: having higher productivity, having more full-time employees, having a young woman owner, having electricity, being in an urban location, having a transformational owner, and having large formal firms as suppliers.
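Table 7 reports average marginal effects rather than raw probit coefficients. The following minimal Python sketch shows how such effects are usually obtained once probit coefficients are available. The firm data and coefficients are hypothetical; they are not estimates from the paper. For a dummy such as having electricity, the average marginal effect is the sample average of the change in the predicted probability of being formal when the dummy is switched from 0 to 1. For a continuous regressor such as log productivity, it is the sample average of φ(x'β)·β_k.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z: float) -> float:
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def index(row, beta, const):
    """Probit linear index x'b for one firm."""
    return const + sum(b * x for b, x in zip(beta, row))

def ame_dummy(X, beta, const, k):
    """Average change in P(formal = 1) when dummy k is switched from 0 to 1 for every firm."""
    total = 0.0
    for row in X:
        on, off = list(row), list(row)
        on[k], off[k] = 1.0, 0.0
        total += norm_cdf(index(on, beta, const)) - norm_cdf(index(off, beta, const))
    return total / len(X)

def ame_continuous(X, beta, const, k):
    """Average derivative of P(formal = 1) with respect to continuous regressor k."""
    return sum(norm_pdf(index(row, beta, const)) * beta[k] for row in X) / len(X)

# Hypothetical firms: [log productivity, has electricity]
X = [[11.5, 1.0], [12.8, 0.0], [13.2, 1.0], [10.9, 0.0]]
beta, const = [0.18, 0.55], -2.9  # hypothetical probit coefficients
print(round(ame_dummy(X, beta, const, k=1), 3))
print(round(ame_continuous(X, beta, const, k=0), 3))
```

Applying these functions to every regressor yields one column of a table like Table 7. In Stata, this is typically the kind of quantity `margins, dydx(*)` reports after `probit`, with discrete changes for factor variables.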
The strong positive association between formality and productivity is consistent with the finding from the exploratory propensity score weighting analysis that low productivity is not a result of informality itself. Informality does not appear to cause low productivity, nor does it necessarily constrain firms to remain at very low productivity levels. Both findings support the view that policy should strive to improve the productivity of those informal firms willing and able to do so. This should also lead informal firms to choose formality over time, as they grow in productivity, sales, jobs and earnings and as the benefits of becoming formal become more valuable to them.

18 When the equation is run only with having a loan, the coefficient is not significant. When the equation is run only with having a credit line from a supplier, the coefficient of this variable is significant only at the 10 percent level. The coefficients remain insignificant when the covariate set excludes having electricity or being transformational.
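The selection argument above rests on comparing raw formal-informal gaps with gaps adjusted by propensity score weighting. Consider a minimal sketch that assumes propensity scores P(formal = 1 | x) have already been estimated, for instance with a probit like those in Table 7. An inverse probability weighting estimate of the average treatment effect on the treated (ATET) keeps formal firms at weight 1 and reweights informal firms by p/(1 - p). The data are hypothetical, and the sketch ignores the sampling weights used in the paper.

```python
def ipw_atet(y, d, p):
    """IPW estimate of the average treatment effect on the treated (ATET).

    y: outcome (e.g., log productivity); d: 1 if formal ("treated"), 0 if informal;
    p: estimated propensity scores P(formal = 1 | x), strictly between 0 and 1.
    Informal firms get weight p / (1 - p) so that their covariate mix resembles
    that of formal firms; formal firms keep weight 1.
    """
    treated = [yi for yi, di in zip(y, d) if di == 1]
    controls = [(yi, pi / (1.0 - pi)) for yi, di, pi in zip(y, d, p) if di == 0]
    mean_treated = sum(treated) / len(treated)
    weight_sum = sum(w for _, w in controls)
    mean_controls = sum(yi * w for yi, w in controls) / weight_sum
    return mean_treated - mean_controls

def raw_gap(y, d):
    """Unadjusted formal-informal difference in mean outcomes."""
    t = [yi for yi, di in zip(y, d) if di == 1]
    c = [yi for yi, di in zip(y, d) if di == 0]
    return sum(t) / len(t) - sum(c) / len(c)

# Hypothetical data: the informal firm that looks more like a formal firm
# (higher p) gets more weight, so here the adjusted gap is below the raw gap.
y = [3.0, 5.0, 1.0, 2.0]
d = [1, 1, 0, 0]
p = [0.8, 0.8, 0.2, 0.8]
print(raw_gap(y, d), ipw_atet(y, d, p))
```

If all propensity scores are equal, every informal firm gets the same weight and the estimate equals the raw gap. The adjusted gap differs from the raw gap only to the extent that formal and informal firms differ in the observables entering the propensity score.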
Table 7: Correlates of (in)formality

Probit estimates. The dependent variable in all 14 columns is formal. Entries are average marginal effects, with standard errors in parentheses, listed in column order from (1) to (14). Variables with fewer than 14 entries are included only in some of the specifications.

lnlp2: 0.0643*** (0.0154); 0.0534*** (0.0157); 0.0608*** (0.0153); 0.0571*** (0.0153); 0.0567*** (0.0153); 0.0489*** (0.0156); 0.0480*** (0.0156); 0.0474*** (0.0156); 0.0406*** (0.0157); 0.0360** (0.0157); 0.0411*** (0.0154); 0.0320** (0.0154); 0.0397** (0.0161); 0.0501*** (0.0155)
size = 2, 1_or_2: 0.237*** (0.0502); 0.238*** (0.0481)
size = 3, 3_or_more: 0.317*** (0.0745); 0.324*** (0.0678)
gender_age = 2, old male: 0.100* (0.0565); 0.112** (0.0552); 0.0904 (0.0566); 0.100* (0.0563); 0.0997* (0.0559); 0.108* (0.0553); 0.0942* (0.0552); 0.107* (0.0547); 0.100* (0.0544); 0.106** (0.0542); 0.0804 (0.0566); 0.0825 (0.0562); 0.0460 (0.0599); 0.0400 (0.0592)
gender_age = 3, young female: 0.223*** (0.0801); 0.216*** (0.0810); 0.237*** (0.0821); 0.225*** (0.0819); 0.226*** (0.0821); 0.219*** (0.0810); 0.218*** (0.0811); 0.227*** (0.0822); 0.218*** (0.0813); 0.212*** (0.0807); 0.202** (0.0791); 0.199** (0.0801); 0.141* (0.0781); 0.154** (0.0760)
gender_age = 4, young male: 0.135* (0.0703); 0.137* (0.0702); 0.141** (0.0711); 0.127* (0.0693); 0.134* (0.0700); 0.134* (0.0691); 0.132* (0.0700); 0.171** (0.0731); 0.169** (0.0732); 0.168** (0.0724); 0.108 (0.0701); 0.141* (0.0736); 0.0832 (0.0751); 0.0471 (0.0714)
urban = 1: 0.0872* (0.0487); 0.0836* (0.0485); 0.0904* (0.0486); 0.0812* (0.0478); 0.0836* (0.0481); 0.103** (0.0474); 0.0890* (0.0475); 0.0669 (0.0488); 0.0751 (0.0483); 0.0909* (0.0483); 0.0914** (0.0461); 0.0875* (0.0469); 0.124*** (0.0443); 0.120*** (0.0424)
have_electricity = 1: 0.207*** (0.0520); 0.203*** (0.0522); 0.190*** (0.0532); 0.176*** (0.0533); 0.178*** (0.0538); 0.168*** (0.0541); 0.174*** (0.0536); 0.184*** (0.0545); 0.180*** (0.0542); 0.173*** (0.0547); 0.170*** (0.0529); 0.179*** (0.0536); 0.120** (0.0560); 0.118** (0.0542)
schooling_owner: 0.00488 (0.00334); 0.00369 (0.00335); 0.00383 (0.00336); 0.00446 (0.00326); 0.00356 (0.00341); 0.00432 (0.00331); 0.00437 (0.00333); 0.00414 (0.00327); 0.00461 (0.00328); 0.00472 (0.00329)
vocational = 1: 0.128* (0.0666); 0.116* (0.0671); 0.0948 (0.0685); 0.0919 (0.0678); 0.0707 (0.0682); 0.0494 (0.0687); 0.0361 (0.0684); 0.0779 (0.0674); 0.0382 (0.0681); -0.0153 (0.0633)
transform = 1: 0.121*** (0.0456); 0.109** (0.0458); 0.109** (0.0459); 0.106** (0.0452); 0.105** (0.0458); 0.133*** (0.0463); 0.129*** (0.0462); 0.127*** (0.0456); 0.112** (0.0445); 0.126*** (0.0451); 0.0991** (0.0420); 0.0891** (0.0415)
loan = 1: 0.0499 (0.0930); 0.0389 (0.0931); 0.0216 (0.0926); 0.00210 (0.0937); -0.0335 (0.0769)
credit_line_suppliers = 1: 0.0955* (0.0580); 0.0772 (0.0637); 0.0802 (0.0582); 0.0770 (0.0636); 0.0359 (0.0514)
customers_abroad = 1: 0.151* (0.0882); 0.0681 (0.0874); 0.0658 (0.0778)
customers_big_entrp = 1: 0.160* (0.0897); 0.141 (0.0944); 0.121 (0.0956); 0.0537 (0.0918); 0.0227 (0.0767)
non_local_buyers = 1: 0.133 (0.0822); 0.113 (0.0845); 0.0979 (0.0869); 0.0343 (0.0895); 0.0203 (0.0885)
suppliers_formal = 1: 0.312*** (0.0739); 0.267*** (0.0787); 0.241*** (0.0733); 0.284*** (0.0702)
suppliers_abroad = 1: -0.108 (0.0738); -0.122 (0.0761); -0.128* (0.0655); -0.112* (0.0656)
manuf = 1: 0.124 (0.101); 0.0932 (0.0984); 0.152 (0.0978); 0.125 (0.0995); 0.128 (0.0998); 0.122 (0.103); 0.111 (0.102); 0.138 (0.100); 0.123 (0.102); 0.121 (0.105); 0.140 (0.106); 0.142 (0.108); 0.102 (0.0921); 0.0962 (0.0891)
trade = 1: 0.0765 (0.0529); 0.0772 (0.0519); 0.0887* (0.0529); 0.112** (0.0518); 0.104** (0.0528); 0.108** (0.0524); 0.101* (0.0525); 0.115** (0.0528); 0.112** (0.0523); 0.116** (0.0516); 0.0929* (0.0521); 0.105** (0.0518); 0.135*** (0.0475); 0.124*** (0.0463)
service = 1: 0.143** (0.0570); 0.129** (0.0568); 0.118** (0.0571); 0.121** (0.0580); 0.112* (0.0578); 0.111* (0.0575); 0.105* (0.0576); 0.112** (0.0569); 0.106* (0.0573); 0.106* (0.0570); 0.0910 (0.0579); 0.0905 (0.0580); 0.0666 (0.0528); 0.0803 (0.0533)
Observations: 502; 502; 502; 502; 502; 502; 502; 489; 489; 489; 502; 489; 489; 502
Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

5. Differences between informal versus formal firms: Heterogeneous firm-level characteristics

Based on the statistics presented in the previous section, informal firms have lower performance outcomes, both on average and across the probability distribution, though there are large areas of overlap with formal firms. To better understand what may underlie these areas of overlap as well as the areas of difference, this section explores differences between formal and informal firms in terms of underlying firm-level characteristics. The first sub-section presents and discusses comparisons of unconditional means and medians across the main available groups of firm-level characteristic variables. A second sub-section introduces the notion of "good" characteristic variables, namely those positively correlated with performance outcomes. It explores the extent to which informal firms are "similar" to good formal firms in terms of having these good characteristics. It does so both in terms of overall shares of informal firms and in terms of the extent to which informal firms within the top decile of firms by productivity and sales have these characteristics. A third sub-section then broadens the exploration by creating three "clusters" of firms, namely high-, medium-, and low-performance firms, based on a data-driven mix of the four performance variables. It then examines the share of informal firms in the top-performance cluster.

5.1 Unconditional associations of (in)formality with firm-level characteristics

This sub-section explores how different the unconditional means and medians of formal vs informal firms are for a broad range of firm-level characteristics, covering innate features, firm assets, and use of DTs. A first group of characteristics covers relatively innate features of firm owners, their firms, and their environment.
As shown in Table 8, informal firms on average have owners that are older and more likely to be women. The underlying firms are younger, 19 more likely to be rural, significantly less likely to have electricity, and more likely to be involved in agricultural activities. Across these innate characteristics, the largest absolute differences between informal and formal firms concern three characteristics: having electricity (a difference in shares of 34 percentage points), being engaged in other services (23 percentage points), and gender (16 percentage points). In terms of relative differences (as a multiple of the lower group), formal firms are almost twice as likely as informal firms to be involved in manufacturing and other services activities. Informal firms, for their part, are more than three times as likely as formal firms to be involved in agriculture. 20

Table 8: Innate features by formal vs informal

         | owner age | young owners | women owners | firm age | urban | have electricity | agri | manuf | trade | other services
informal | 37.1 | 0.28 | 0.40 | 5.5 | 0.60 | 0.57 | 0.18 | 0.04 | 0.65 | 0.28
formal   | 34.7 | 0.40 | 0.24 | 6.6 | 0.78 | 0.91 | 0.05 | 0.07 | 0.62 | 0.51
TOTAL    | 36.3 | 0.32 | 0.35 | 5.9 | 0.66 | 0.69 | 0.13 | 0.05 | 0.64 | 0.35

Note: The table reports the shares (as fractions) of all firms within each category, except for owner and firm age, which are means in years. All figures are based on weighted data. Young owners are owners aged 30 years or younger. By convention, the owner's age is the age of the youngest owner if there are multiple owners, so there is a downward bias inherent in the owner's age variable.

19 Youth-owned firms are here defined as those whose youngest owner (if multiple) is aged 30 years or younger.
20 The total numbers for manufacturing are low, with only 12 informal and 11 formal firms involved, versus 98 informal and 83 formal firms for other services.
The only other category with fewer than 20 positive responses is formal firms involved in manufacturing, with 8 firms involved.

A second group of characteristics covers firm assets broadly defined, including the aptitudes and capabilities of the business owners as well as their available soft assets. Regarding the aptitudes of business owners, whether the owners are "transformational" entrepreneurs is a proxy for their inherent aptitude for productive entrepreneurship. This corresponds to owners selecting themselves as entrepreneurs because of the profit-making opportunity that owning a business provides. The alternative is a necessity or subsistence choice, made to supplement earnings or because no preferred wage job is available. The number of years of formal schooling and whether the owner holds any vocational training certificates are proxies for the capabilities of the owner. Regarding the firm's available soft assets, two indicators reflect access to financing: whether the firm ever had a business loan from a bank, and whether it has a line of credit or credit facility from any supplier. Finally, four indicators cover the firm's relationships with value chain partners: whether the firm has large formal businesses as main suppliers, has its most important suppliers located abroad, has big enterprises as customers, and has its most important customers located all over the country (rather than in surrounding towns and villages). All four are indicators of soft assets that reflect opportunities for learning and market access from likely more knowledgeable business partners upstream and downstream in the firm's value chain. Table 9 presents a breakdown of unconditional averages disaggregated by informal vs formal firms across a range of firm assets.
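The comparisons of Table 8, and the similar comparisons for Tables 9 and 10, use two simple metrics. The first is the absolute difference in shares, in percentage points. The second is the relative difference, i.e., the larger share as a multiple of the smaller one. Below is a minimal sketch using the published (rounded) weighted shares from Table 8.

```python
def absolute_gap_pp(informal: float, formal: float) -> float:
    """Absolute difference between two shares, in percentage points (1 decimal)."""
    return round(abs(formal - informal) * 100, 1)

def relative_gap(informal: float, formal: float) -> float:
    """Larger share expressed as a multiple of the smaller share."""
    low, high = sorted((informal, formal))
    return high / low

# Weighted shares (informal, formal) as published in Table 8
table8 = {
    "have electricity": (0.57, 0.91),
    "other services": (0.28, 0.51),
    "women owners": (0.40, 0.24),
    "manuf": (0.04, 0.07),
    "agri": (0.18, 0.05),
}
for name, (inf, frm) in table8.items():
    print(f"{name}: {absolute_gap_pp(inf, frm)} pp, {relative_gap(inf, frm):.2f}x")
```

Because the published shares are rounded, multiples computed from them can differ noticeably from those reported in the text, which may be based on unrounded shares or on counts. For instance, the rounded Table 10 shares for inventory control/POS software (0.01 vs 0.13) imply a multiple of 13 rather than 9.1. The relative metric is also undefined when the smaller share is zero.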
As reported in Table 9, informal firms on average are led by owners that are less likely to be transformational and that have lower capabilities (fewer years of schooling and fewer vocational certifications). The underlying firms are less likely to have access to finance (having had a loan or having a line of credit from suppliers). They are also much less likely to have opportunities for learning and access to markets from both upstream suppliers and downstream buyers. Across these asset-related variables, the largest absolute differences between informal and formal firms concern two variables. The first is whether the firm is likely to buy from and learn from large suppliers, where the difference in shares is 26 percentage points. The second is whether the firm is likely to be led by a transformational entrepreneur, where the difference in shares is 23 percentage points. In terms of relative differences (the count as a multiple of the lower group), the largest differences between informal and formal firms concern buying from and learning from large suppliers, and selling to and learning from large customers. There are 6 and 5 times as many formal as informal firms involved, respectively. 21

Table 9: Firm assets by formal vs informal

         | transform. | school (mean) | school (median) | vocational | had loan | supplier credit | large suppliers | foreign suppliers | large buyers | non-local buyers
informal | 0.31 | 5.2 | 0 | 0.09 | 0.03 | 0.17 | 0.04 | 0.05 | 0.04 | 0.06
formal   | 0.54 | 5.6 | 6 | 0.28 | 0.11 | 0.38 | 0.30 | 0.17 | 0.22 | 0.20
TOTAL    | 0.39 | 5.3 | 0 | 0.15 | 0.06 | 0.24 | 0.13 | 0.09 | 0.10 | 0.11

Note: The table reports shares (as fractions) of all firms within each category. The exception is the manager's years of schooling, which are means, with medians reported in the subsequent column. All figures are based on weighted data. Entrepreneurs are labeled "transformational" if they answered "My own business pays more than being employed" to "What was the main reason to start a business for you?"
while the remaining (subsistence) entrepreneurs answered either "To make additional money to my salary" or "Otherwise I would have been unemployed". For vocational training, respondents answered yes to "Do business owners have vocational training certificates?" Had loan is in response to "Has the business ever had a business loan from a bank?" Supplier credit is in response to "Does the business have a line of credit/ credit facility with suppliers?" Large suppliers are firms that responded "large formal businesses" to "Who are your main suppliers?" Foreign suppliers are firms that responded "abroad" to "Where are your most important suppliers located? (if more than one response and they are at different locations, take the furthest away)". The other possible responses were "locally (surrounding towns and villages)" and "from all over the country". Large buyers are firms that responded "big enterprises" to "Who are your customers?". Non-local buyers are firms that responded "from all over the country" to "Where are the most important customers of the business located". The other possible response was "locally (surrounding towns and villages)." There are 81 firms that have indicated they have zero suppliers. For the variables "supplier credit", "large suppliers" and "foreign suppliers", the shares are expressed as ratios to the total number of firms.

21 The only categories with fewer than 20 positive responses are: for informal firms, had loan (11), large buyers (13), large suppliers (15), and foreign suppliers (17); for formal firms, had loan (18).

A final group of characteristics covers the use of DTs by businesses.
A first set of variables relates to access technologies, namely what type of handset the firm has. The firm may have a basic 2G or 2.5G feature phone, or it may access the internet via a mobile broadband-enabled smartphone (3G or 4G). This set also includes whether the firm uses one or more computers. A second set of variables covers more specialized uses of DTs for external-to-the-firm GBFs (general business functions) enabled by access technologies. These uses are largely driven by lower search costs, cheaper and expanded market coordination, and lower transportation costs enabled by these DTs. This set includes eight uses of DTs for upstream transactions with suppliers and downstream transactions with customers in product markets:
- using a mobile phone to communicate with suppliers;
- using the internet to look online for suppliers;
- using mobile money to pay suppliers (which does not require a smartphone);
- using a mobile phone to communicate with customers;
- using mobile/SMS to advertise;
- using the internet to better understand customers;
- using e-commerce to sell goods and services online to customers; and
- using mobile money to receive payments from customers.

It also includes the use of mobile banking. A third set of variables relates to more specialized uses of DTs to undertake internal-to-the-firm GBFs that reduce costs, create efficiencies, and allow users to enhance their capabilities. This set covers management functions, namely whether the firm uses accounting software and inventory control/point of sale (POS) software. It also covers communication and payment transactions with the firm's workers, through the use of mobile money to pay employees.
DTs facilitating external-to-firm GBFs require not only the using firm but also its upstream or downstream counterpart in product market transactions to use these digital tools. By contrast, specialized DTs facilitating internal-to-firm GBFs linked to business management have the advantage of not requiring any external party to use them in order to yield their full impact for the user firm. This is not the case for DTs enabling the digital payment of the firm's workers, as these require the workers also to have a digital device to receive payments.

Table 10: Use of DTs by formal vs informal

Column groups:
- ACCESS TECHNOLOGIES: 2G, s-phone, computer.
- EXTERNAL-TO-FIRM GBFs (General Business Functions): upstream and downstream transactions (communicate with supplier, find supplier, pay supplier, communicate with customer, advertise SMS, understand customer, use e-commerce, receive payment) and finance (mobile banking).
- INTERNAL-TO-FIRM GBFs: management (accounting software, inventory control) and workers (pay employee).

         | 2G | s-phone | computer | communicate with supplier | find supplier | pay supplier | communicate with customer | advertise SMS | understand customer | use e-commerce | receive payment | mobile banking | accounting software | inventory control | pay employee
informal | 0.75 | 0.11 | 0.03 | 0.69 | 0.03 | 0.19 | 0.53 | 0.16 | 0.06 | 0.05 | 0.20 | 0.03 | 0.02 | 0.01 | 0.03
formal   | 0.64 | 0.33 | 0.20 | 0.96 | 0.12 | 0.38 | 0.88 | 0.29 | 0.26 | 0.15 | 0.38 | 0.14 | 0.17 | 0.13 | 0.12
TOTAL    | 0.71 | 0.18 | 0.09 | 0.78 | 0.06 | 0.26 | 0.64 | 0.20 | 0.13 | 0.08 | 0.26 | 0.07 | 0.07 | 0.05 | 0.06

Note: All responses are shares of firms (as fractions) based on weighted data. Use 2G/2.5G mobile phone is based on responses to "Does the business manager have a mobile phone?", subtracting those reporting using a 3G/4G phone.
Smartphone users answered "yes" to "How does the business access the internet: Mobile broadband (3G/4G, wireless)." Use computer is a non-zero response to "How many computers does your business have?" Communicate with suppliers and communicate with customers are "mobile phone" answers to the question "How does the business usually communicate with its suppliers/customers?" These firms could therefore be using 2G, 2.5G, 3G or 4G phones. Advertise SMS are "mobile/SMS" answers to the question "How does the business advertise?" Mobile banking is in response to "Have you used mobile phone banking for business?" Reported answers to "What do you use the internet for?" include "looking for suppliers online" (here "find supplier") and "e-commerce (selling products and services online)". Reported answers to "Does the business use mobile money for…" include "paying suppliers", "receiving payments from customers", and "paying employees". Understand customers is an "agree" response (as opposed to "not sure" or "disagree") to the question "Regarding the internet/social media use, it helps to understand our customers better". The management-related questions are "Does your company use accounting software?" and "Does your company make use of inventory control/point of sale (POS) software?" Both are asked in the computer section of the questionnaire. There are 81 firms that have indicated they have zero suppliers. For the variable "communicate with supplier", the shares are expressed as ratios to the total number of firms with a positive number of suppliers.

As shown in Table 10, informal firms on average are less likely to use smartphones (and conversely more likely to use 2G/2.5G mobile phones). They are also less likely to use any of the DTs associated with both external and internal GBFs.
Across these uses of DTs, the two largest absolute differences between informal and formal firms involve using mobile phones to communicate with downstream customers and upstream suppliers. The differences in shares are 35 and 25 percentage points, respectively. Neither of these DT uses requires a smartphone. It is interesting that the absolute gap in the share of firms using a mobile phone for business communication is so large, even though three-quarters of informal firms report owning a 2G or 2.5G mobile phone for business purposes. The next largest absolute difference concerns smartphone ownership, where the difference in shares is 22 percentage points. In terms of relative differences (as a multiple of informal use), the largest differences between informal and formal firms involve using inventory control/POS software and accounting software. Formal firms use these DTs 9.1 and 8.2 times as often, respectively. 22

5.2 Associations of (in)formality with performance-linked characteristics

The previous sub-section highlighted large differences between informal and formal firms for a range of firm-level characteristics. Do these differences matter for performance outcomes? More specifically, how many informal firms look like those formal firms with desirable performance outcomes? This sub-section introduces the notion of "good" characteristic variables, namely those positively correlated with desirable performance outcomes. It then explores the extent to which informal firms are "similar" to good formal firms in terms of having these good characteristics. This is assessed both through the counts or overall shares of informal firms with these characteristics and through the extent to which informal firms within the top decile of firms by productivity and sales have these characteristics.
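The notion of "good" characteristics amounts to a screening rule applied across several outcome regressions. The minimal sketch below simplifies that rule in three ways. First, it runs one bivariate OLS regression per outcome, with no other controls; the paper's specifications may include further covariates. Second, it treats a t-statistic above 1.645 as a stand-in for significance at the 10 percent level, using a normal approximation. Third, it uses hypothetical data. The second criterion, that a characteristic could plausibly be acted on to help informal firms, is a judgment call and is not coded.

```python
import math

def ols_slope_t(x, y):
    """Slope and conventional t-statistic from OLS of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, slope / se

def is_good(characteristic, outcomes, t_crit=1.645):
    """True if the characteristic has a positive, significant slope for any outcome."""
    return any(ols_slope_t(characteristic, y)[1] > t_crit for y in outcomes.values())

# Hypothetical data: a 0/1 characteristic (e.g., has electricity) and two outcomes
has_char = [0, 0, 0, 0, 1, 1, 1, 1]
outcomes = {
    "log_productivity": [1.0, 1.2, 0.9, 1.1, 2.0, 2.2, 1.9, 2.1],
    "workers": [2, 1, 3, 2, 2, 3, 1, 2],
}
print(is_good(has_char, outcomes))  # prints True
```

A characteristic passes if it clears the threshold for at least one outcome, mirroring the "any of the regression equations" rule. Here it passes through log productivity, even though it is unrelated to the number of workers.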
To determine which variables should be used as measures of "good characteristics," separate OLS regressions were used to explore the statistical significance of the association between the broad range of firm-level characteristics and key performance variables. These variables are log productivity, log sales, whether firms have customers abroad, and total number of workers including owners. A variable was included if it met two conditions: it has a significant and positive correlation in any of the regression equations, and it could empower or enable less productive informal firms to boost their performance. The results of these regressions are presented in the Annex (Tables A1-A4). Three significant correlates are set aside:
- formal status (significant across all four performance variables);
- being a male owner (significant across all specifications for productivity and sales); and
- being involved in trading activities (significantly negative for employment, as retail/wholesale activities tend to employ fewer workers than agricultural activities, often involving a self-employed vendor without any full-time employees).

The main "good" characteristic variables are as follows:
- Innate features: having electricity (for productivity and sales).
- Assets: having vocational training (for employment); having had a loan and a line of credit from suppliers (for productivity, sales, and exporting); and having large suppliers, foreign suppliers, large buyers and non-local buyers as opportunities for learning and market access (for productivity, sales, and exporting, with the latter only for sales and exporting).
- DTs for access: having a 2G/2.5G phone relative to not having any mobile phone (for productivity and sales), a smartphone (for productivity, sales and exporting), and a computer (for productivity and sales).
- DTs for external-to-firm GBFs: communicating with suppliers and customers through mobile phone (for productivity and sales); paying suppliers and receiving payments from customers using mobile money (for productivity and sales, and also for exporting in the case of receiving payments from customers); and using the internet to look for suppliers, understand customers, and for e-commerce (for exporting).
- DTs for internal-to-the-firm GBFs: using inventory control/POS software (for productivity and sales).

22 The number of informal users is quite low for these DTs: only 5 informal firms use inventory control/POS software (relative to 21 formal) and only 6 informal firms use accounting software (relative to 27 formal). The positive responses for all other uses of DTs are above 20, except: for informal firms, "find suppliers using internet" (9), "use mobile banking" (11), "use computer" (12), "pay employees using mobile money" (12), and "using internet for ecommerce" (16); for formal firms, "pay employees using mobile money" (19).

Table 11: Share of informal in total firms having "good" characteristics

  | No. firms in dataset | Share of informal (%)
innate features
  have electricity | 354 | 57
assets
  had vocational training | 80 | 41
  had loan | 29 | 36
  have supplier credit | 125 | 49
  have large formal businesses as main suppliers | 68 | 22
  have most important suppliers abroad | 47 | 37
  have big enterprises as customers | 51 | 25
  have most important customers non-locally | 55 | 39
DTs for access
  have 2G phone | 368 | 72
  have smartphone | 95 | 40
  have computer | 47 | 25
DTs for external GBFs
  use mobile to communicate with suppliers | 405 | 60
  use mobile to communicate with customers | 333 | 56
  pay suppliers with MM | 132 | 51
  receive payment from customers with MM | 134 | 52
  use internet to look for suppliers | 30 | 29
  use internet to understand customers | 66 | 32
  use internet for e-commerce | 42 | 38
DTs for internal GBFs
  use inventory control/POS software | 27 | 17

Note: Weighted data. The first column shows the number of firms that respond positively to the question about the availability of the respective characteristic.
The second column shows the share of informal firms in that group; note that the share of informal firms in the overall sample is 68%. Having electricity is based on a yes/no answer to "Does the business premises have electricity?" For vocational training, respondents answered yes to "Do business owners have vocational training certificates?" Large suppliers are firms that responded "large formal businesses" to "Who are your main suppliers?" Had loan is in response to "Has the business ever had a business loan from a bank?" Have supplier credit is in response to "Does the business have a line of credit/ credit facility with suppliers?". Foreign suppliers are firms that responded "abroad" to "Where are your most important suppliers located? (if more than one response and they are at different locations, take the furthest away)". The other possible responses were "locally (surrounding towns and villages)" and "from all over the country". Large buyers are firms that responded "big enterprises" to "Who are your customers?". Non-local buyers are firms that responded "from all over the country" to "Where are the most important customers of the business located". The other possible response was "locally (surrounding towns and villages)."
The DT-related characteristics are defined as follows:
- "Communicate with suppliers through mobile phone" is a "Yes" answer to the option "Mobile phone" in the question "How does the business usually communicate with its suppliers".
- "Pay suppliers through mobile money" is the response "Yes" to the question "does this business use mobile money for paying suppliers".
- "Communicate with customers through mobile phone" is a "Yes" answer to the option "Mobile phone" in the question "How does the business usually communicate with its customers".
- "Receive payment from customers through mobile money" is the response "Yes" to the question "Does the business use mobile money for receiving payments from customers".
- Firms that use social media are those that respond "yes" to the question "are you using internet/social media for business purposes?".
- Firms that use "e-commerce" are those that respond yes to the question "What do you use the internet for: e-commerce (selling products and services online)".
- "Use inventory control software" is a positive response to "Does your company make use of inventory control/point of sale (POS) software?".

The number of informal firms using inventory control software is the lowest across all these characteristics, at only 5.

A first way to assess the extent to which there are informal firms that look "similar" to "good" formal firms is to explore the share of informal firms among all firms that have "good" characteristics. Table 11 presents the results of this exercise. Overall, the table shows that the share of informal firms with good characteristics is substantial, ranging between 17 and 72 percent. A high share of informal firms can be interpreted as stronger evidence of similarity with good formal firms. This is especially so for characteristics that are relatively scarce, that is, where a smaller number of firms in total have such characteristics.
The highest shares of informal firms are associated with having a 2G/2.5G mobile phone (72%), using a mobile phone to communicate with suppliers and customers (60% and 56%), and having electricity (57%). The characteristics that are relatively scarce across all firms include using inventory control/POS software, having had a loan, using the internet to look for suppliers and for e-commerce, having a computer, and having the most important suppliers located abroad. About 36 percent of firms that have had a loan are informal. Only 17 percent of firms that use inventory control software are informal, though in this case the number of observations may be too low to allow a robust assessment.

Table 12: Share of (in)formal firms in the top productivity and sales decile for "good" characteristics

  | Productivity: informal (19 firms) | Productivity: formal (32 firms) | Sales: informal (11 firms) | Sales: formal (33 firms)
innate features
  have electricity | 97 | 96 | 100 | 96
assets
  had vocational training | 10 | 49 | 23 | 49
  had loan | 4 | 43 | 4 | 42
  have supplier credit | 19 | 61 | 8 | 64
  have large formal businesses as main suppliers | 24 | 48 | 14 | 62
  have most important suppliers abroad | 22 | 41 | 14 | 42
  have big enterprises as customers | 19 | 53 | 6 | 52
  have most important customers non-locally | 29 | 37 | 9 | 37
DTs for access
  have 2G phone | 58 | 43 | 69 | 46
  have smartphone | 42 | 57 | 32 | 54
  have computer | 8 | 58 | 4 | 52
DTs for external GBFs
  use mobile to communicate with suppliers | 91 | 98 | 89 | 100
  use mobile to communicate with customers | 69 | 97 | 68 | 100
  pay suppliers with MM | 29 | 36 | 33 | 33
  receive payment from customers with MM | 43 | 44 | 28 | 47
DTs for internal GBFs
  use inventory control/POS software | 4 | 52 | 4 | 52

Note: Weighted data. Each cell shows the number of firms having the row characteristic as a percentage of the total number of firms indicated at the top of the column. For example, 19 percent of informal firms and 61 percent of formal firms in the top productivity decile have supplier credit.
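Tables 11 and 12 both report weighted shares within a subgroup; they differ only in which variable defines the subgroup. The following is a minimal sketch that assumes each firm carries a sampling weight. The weights here are hypothetical; with unit weights, the calculation is unweighted.

```python
def weighted_share_within(condition, target, weights):
    """Weighted percentage of firms with `target` among firms satisfying `condition`.

    Table 11 style: condition = has the characteristic, target = is informal.
    Table 12 style: condition = informal (or formal) and in the top decile,
                    target = has the characteristic.
    """
    numerator = sum(w for c, t, w in zip(condition, target, weights) if c and t)
    denominator = sum(w for c, w in zip(condition, weights) if c)
    if denominator == 0:
        raise ValueError("no firms satisfy the condition")
    return 100.0 * numerator / denominator

# Unweighted check: 19 informal firms among 51 top-decile productivity firms
in_top_decile = [True] * 51
informal = [True] * 19 + [False] * 32
print(round(weighted_share_within(in_top_decile, informal, [1.0] * 51)))  # prints 37
```

The unweighted check reproduces the composition of the top productivity decile given in the text: 19 informal firms out of 51, or about 37 percent.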
A second, complementary way to assess the extent to which informal firms look similar to good formal firms is to examine the share of informal firms with good characteristics in the top (10th) decile of all firms ranked by productivity and by sales, respectively. Table 12 shows these shares for informal and formal firms, for the variables that are significantly associated with productivity and/or sales.[23] Importantly, informal firms make up a substantial part of both top deciles: 37 percent of top-decile productivity firms (19 out of 51) and 25 percent of top-decile sales firms (11 out of 44) are informal. The results are quite similar for the top deciles of productivity and sales, reflecting the strong positive association between productivity and sales in the Senegal microenterprise data. In terms of both the smallest absolute and the smallest relative (as a multiple of the lower group) differences in shares between informal and formal firms, the five most similar characteristics include having electricity and using mobile phones to communicate with suppliers (almost all top-decile firms have these characteristics, making them effectively a prerequisite for being in the top decile of both productivity and sales), as well as using mobile money to pay suppliers (roughly a third of both informal and formal firms use this DT). The largest differences between informal and formal firms in the top deciles concern whether the firm had a loan, has supplier credit, and has a computer: the gaps are in the order of 40 to 50 percentage points. Intriguingly, this suggests that it is possible to be in the top deciles of productivity and sales without access to these forms of financing: roughly 60 percent of formal firms and 95 percent of informal firms in the top deciles have not had a loan, yet they are still top performers.[24]

Using inventory control/POS software is an important attribute of formal firms in the top productivity and sales deciles, with just over one-half of them using this management tool. In contrast, only one informal firm uses this type of DT. This both reflects the small number of informal firms that use computers and smartphones and suggests the potential for further productivity and sales improvements if appropriate applications of this type were used more widely by informal enterprises.

5.3 Cluster-level associations based on combined performance outcomes

This sub-section broadens the exploration of the extent to which informal firms are “similar” to good formal firms by creating three clusters of high-, medium-, and low-performance firms, based on a data-driven combination of the four performance variables (productivity, sales, customers abroad, and employment). It then examines the share of informal firms in the high-performance cluster relative to formal firms and relative to the other clusters. The clustering procedure creates groups that minimize within-group variation according to a specific measure of distance.[25] The results suggest that three clusters adequately describe the data.[26]

[23] The only characteristic variables not included, relative to Table 6, are using the internet to look for suppliers, to understand customers, and for e-commerce, as these were statistically significantly associated only with exporting.

[24] The mean labor productivity of firms without a loan is 190,000 FCFA for informal firms and 524,000 FCFA for formal firms. The corresponding values for firms that did have a loan are 606,000 and 1,524,000 FCFA, respectively.

[25] The k-medians command of Stata was used to create 2 to 7 clusters using the Gower distance measure. Because the result of this clustering procedure is not unique, for each number of clusters the process was run 200 times, and for each run the Calinski-Harabasz pseudo-F index was calculated.
This index is the ratio of the between-cluster to the within-cluster sum of squares, each divided by its degrees of freedom, so higher values indicate that the sum of squares between clusters is large relative to the sum of squares within clusters, that is, a more distinct cluster structure. The four highest values of the index were attained when the number of clusters was specified as 3. Moreover, of the 100 runs with the highest values of the index, 59 had 3 clusters.

[26] We also tried using the k-means command, following a similar procedure. In that case, the highest and most frequent index values were reached with four groups. The difference is due to a few observations with large numbers of full-time employees, and to the fact that k-means is more sensitive than k-medians to such extreme values. Both classifications make sense, depending on the objectives of the analysis. In the k-means classification, there are two “high productivity” groups: one group of 6 (high-employment) firms, all of them formal, and a group of 46 firms, 18 of them formal. The medium labor productivity group consists of 307 firms, 194 of which are informal. The lowest productivity group consists of 153 firms, only 16 of which are formal. Regarding the presence of informal firms in clusters of relatively high-performance firms, the two approaches yield the same qualitative result: in both cases, clusters of “high-performance firms” include a sizeable share of informal firms.

The main performance indicators of the three groups constructed through the clustering process are presented in Table 13. The high-performance group consists of 50 firms, 16 of which (32 percent) are informal. There are 261 firms in the medium-performance group, 60 percent of which are informal. The low-performance group consists of 201 firms, only one-eighth of which are formal. The productivity gap between formal and informal firms in the high- and medium-performance groups is negligible. Even though clusters comprise similar-looking firms by construction, some gaps between formal and informal firms within each cluster persist. For example, in each group formal firms are larger in terms of sales, and they create more full-time employment. Even though some extreme values play a role in generating the employment gaps between formal and informal firms in the medium- and high-performance groups, the gap is not only due to these extreme values: in the high-performance group, the median number of full-time employees is 1 for informal firms and 3 for formal firms. Finally, in each group a larger percentage of formal firms have customers abroad: in the high-performance group, the shares are 48 and 25 percent for formal and informal firms, respectively.

Table 13: Performance indicators by clusters of firms

| Cluster | Status | No. of firms | Mean log productivity | Mean log sales | Mean FT employees | Share of firms with customers abroad (%) |
|---|---|---|---|---|---|---|
| High performance | informal | 16 | 14.5 | 14.8 | 0.2 | 24.9 |
| | formal | 34 | 14.4 | 16.0 | 6.7 | 48.4 |
| Medium performance | informal | 157 | 12.0 | 12.7 | 1.0 | 5.9 |
| | formal | 104 | 12.0 | 13.1 | 1.7 | 15.3 |
| Low performance | informal | 176 | 9.8 | 10.6 | 0.5 | 1.0 |
| | formal | 25 | 10.3 | 11.7 | 4.4 | 3.2 |

Note: Unweighted data. The clusters were created through the k-medians command of Stata.

Which firms are in the high-performance cluster, how different are they from firms in the medium- and low-performance clusters, and what are the main and missing “good” characteristics of informal firms in the high-performance cluster, relative to formal firms? Table 14 provides information on general features, as well as on the heterogeneity across clusters of the “good” characteristics that are statistically significantly associated with performance outcomes, for both informal and formal firms. An important finding, aligned with the earlier findings that informal firms account for a substantial share both of all firms having good characteristics and of the firms in the top productivity and sales deciles having good characteristics, is that one-third of all firms (16 out of 50) in the high-performance cluster are informal. These 16 firms represent about 5 percent of all informal firms, with the remaining 45 and 50 percent of informal firms in the medium- and low-performance clusters, respectively.

Table 14: Share of (in)formal firms across clusters for general and “good” characteristics (%)

| Group | Characteristic | High: informal (16 firms) | High: formal (34) | Medium: informal (157) | Medium: formal (104) | Low: informal (176) | Low: formal (25) |
|---|---|---|---|---|---|---|---|
| General (non-performance-linked) characteristics | young owners | 29 | 40 | 25 | 38 | 29 | 45 |
| | women owners | 7 | 23 | 26 | 26 | 59 | 23 |
| | young women owners | 2 | 13 | 4 | 14 | 19 | 15 |
| | urban | 74 | 77 | 60 | 80 | 57 | 70 |
| | agriculture | 0 | 7 | 16 | 5 | 22 | 2 |
| | manufacturing | 0 | 15 | 4 | 6 | 3 | 0 |
| | trade | 87 | 65 | 60 | 61 | 68 | 55 |
| | other services | 35 | 42 | 33 | 52 | 22 | 51 |
| Innate features | have electricity*** | 97 | 96 | 77 | 88 | 31 | 97 |
| | had vocational training** | 12 | 50 | 11 | 24 | 7 | 16 |
| Assets | had loan*** | 5 | 44 | 6 | 2 | 0 | 5 |
| | have supplier credit* | 0 | 67 | 27 | 27 | 9 | 29 |
| | have large formal businesses as main suppliers*** | 30 | 60 | 3 | 18 | 3 | 27 |
| | have most important suppliers abroad*** | 27 | 43 | 7 | 8 | 0 | 5 |
| | have big enterprises as customers*** | 19 | 54 | 5 | 13 | 0 | 7 |
| | have most important customers non-locally*** | 23 | 41 | 7 | 14 | 3 | 17 |
| DTs for access | have 2G phone** (negative) | 62 | 45 | 78 | 71 | 75 | 66 |
| | have smartphone** | 38 | 55 | 14 | 24 | 4 | 34 |
| | have computer*** | 9 | 56 | 5 | 11 | 1 | 13 |
| | use mobile to communicate with suppliers*** | 89 | 100 | 82 | 94 | 54 | 95 |
| | use mobile to communicate with customers*** | 62 | 97 | 73 | 87 | 31 | 75 |
| DTs for external GBFs | pay suppliers with MM | 19 | 32 | 33 | 39 | 6 | 35 |
| | receive payment from customers with MM | 36 | 42 | 32 | 37 | 5 | 34 |
| | use internet to look for suppliers | 0 | 23 | 5 | 8 | 1 | 5 |
| | use internet to understand customers** | 22 | 49 | 8 | 17 | 2 | 26 |
| | use internet for e-commerce | 19 | 27 | 6 | 12 | 1 | 4 |
| DTs for internal GBFs | use inventory control/POS software*** | 5 | 53 | 2 | 3 | 1 | 4 |

Note: Each cell shows the share of firms having the row characteristic as a percentage of the total number of firms indicated at the top of the column. Hence about 29 percent of informal firms in the high-performance cluster are owned by young owners. Sampling weights are used in the calculations.

Regarding general features, the distribution of owners by age and gender is a striking feature of microenterprises: women-owned informal firms, and especially young women-owned informal firms, are largely under-represented in the high-performance cluster. The share of young (under-30) owners, relative to older owners, is roughly constant across clusters at approximately 25 to 30 percent for informal firms and 40 percent for formal firms, and the share of women-owned firms (relative to men-owned firms) is roughly constant across clusters for formal firms at about 25 percent. By contrast, the share of women-owned firms is only 7 percent for informal firms in the high-performance cluster but 59 percent in the low-performance cluster. This pattern is repeated, and even accentuated, for young women-owned firms: whereas their share is roughly constant across clusters for formal firms at approximately 15 percent, only 2 percent (1 firm) of the 16 informal firms in the high-performance cluster are owned by young women, versus almost 10 times more (19 percent) of the 176 informal firms in the low-performance cluster. In terms of other general features, firms in the top-performance cluster are overwhelmingly urban, with about three-quarters (74 percent of informal and 77 percent of formal firms) having an urban location. In terms of sectors, interestingly, there are no informal agricultural firms in the top cluster, whereas the share of formal firms active in agriculture is highest in the top cluster, indicating a potential for informal firms in agriculture to boost their performance.
Only 5 percent of all microenterprises are in manufacturing (Table 4), so it is not surprising that most formal firms in this activity are in the top cluster. Finally, most of the high-performance informal firms are active in trading, which is not surprising given that 65 percent of all informal firms are in trading, and that being a top performer in trading likely requires less investment in capabilities and technologies than in agriculture or manufacturing.

As with firms in the top deciles of productivity and sales, the five characteristics that are most similar for firms in the top cluster (in terms of the smallest absolute differences in shares between informal and formal firms) include having electricity and using mobile phones to communicate with suppliers (almost all firms have these characteristics, again making them effectively a prerequisite for being in the top cluster), as well as using mobile money to receive payments from customers and to pay suppliers (used by roughly 20 to 40 percent of both informal and formal firms). The largest differences between informal and formal firms in the top cluster, again in line with firms in the top deciles of productivity and sales, concern whether the firm had a loan, has supplier credit, and has a computer, in addition to having had vocational training and using inventory control/POS software: the differences are in the order of 40 to more than 60 percentage points. This suggests once again that it is possible to be in the top performance cluster without access to these forms of financing: roughly 55 percent of formal firms have not had a loan, and no informal firm has had a loan, yet they are still top performers. Using inventory control/POS software is again an important attribute of formal firms in the top cluster, with just over one-half (53 percent) of them using this management tool.
In contrast, only one informal firm (5 percent of the 16 firms, weighted) uses this type of DT. This again reflects how few informal firms use the computers and smartphones required to run such software, and suggests the potential for further productivity and sales improvements if appropriate applications of this type were used more widely by informal enterprises.

Two sets of probit regressions explore the extent to which these performance-linked characteristics have a statistically significant association with higher performance outcomes. The first set, reported in Annex Table A2, shows the statistically significant correlates of being in the high-performance cluster, relative to being in the lower two performance clusters, for all firms, irrespective of whether they are formal or informal. Stars next to the variables in Table 14 denote levels of statistical significance. None of the general characteristics are statistically significant, with one exception: being an older male-owned firm is a statistically significant correlate, at least at the 5 percent level, in 15 of the 17 specifications. The characteristics that are statistically significant (at least at the 5 percent level) are having electricity, having had vocational training, having had a loan, having upstream and downstream value chain relationships with more knowledgeable or market-access-facilitating suppliers and buyers, and, in terms of uses of DTs, having a smartphone and a computer, using a mobile phone to communicate with suppliers and customers, using the internet to understand customers, and using inventory control/POS software as an internal management tool. Given that a larger share of both informal and formal firms in the lower two performance clusters than in the top cluster have only the more basic 2G phones, it is not surprising that having a 2G phone is a negative correlate of being in the high-performance cluster.
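Membership of the high-performance cluster, the outcome variable in these probits, comes from the clustering step described in footnote 25: k-medians with the Gower measure, 2 to 7 clusters, 200 random starts per number of clusters, and selection by the Calinski-Harabasz pseudo-F. The Python sketch below illustrates that selection logic only; it is not the authors' Stata code and does not replicate Stata's cluster kmedians (initialization, tie-breaking, and the treatment of binary variables may differ). It assumes all performance variables are numeric, uses range-scaled absolute differences as the Gower dissimilarity, coordinate-wise medians as cluster centres, and computes the pseudo-F on range-scaled variables; all function names are hypothetical.

```python
import random
import statistics


def gower_ranges(data):
    """Range of each variable; a zero range is replaced by 1 to avoid dividing by zero."""
    return [(max(col) - min(col)) or 1.0 for col in zip(*data)]


def gower(a, b, ranges):
    """Gower dissimilarity for numeric variables: mean of range-scaled absolute differences."""
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(ranges)


def k_medians(data, k, rng, ranges, max_iter=100):
    """One k-medians run started from k randomly drawn observations; returns a label per row."""
    centres = [list(row) for row in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: gower(row, centres[c], ranges)) for row in data]
        if new == labels:  # assignments are stable: converged
            break
        labels = new
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centre
                centres[c] = [statistics.median(col) for col in zip(*members)]
    return labels


def calinski_harabasz(data, labels, ranges):
    """Pseudo-F = [B / (g - 1)] / [W / (n - g)] on range-scaled data, g = non-empty groups."""
    scaled = [[x / r for x, r in zip(row, ranges)] for row in data]
    groups = list(dict.fromkeys(labels))  # first-appearance order: equal partitions score identically
    n, g = len(scaled), len(groups)
    if g < 2 or g >= n:
        return float("-inf")
    grand = [statistics.fmean(col) for col in zip(*scaled)]
    between = within = 0.0
    for grp in groups:
        members = [row for row, lab in zip(scaled, labels) if lab == grp]
        centre = [statistics.fmean(col) for col in zip(*members)]
        between += len(members) * sum((m - o) ** 2 for m, o in zip(centre, grand))
        within += sum(sum((x - m) ** 2 for x, m in zip(row, centre)) for row in members)
    return float("inf") if within == 0 else (between / (g - 1)) / (within / (n - g))


def choose_k(data, k_values=range(2, 8), runs=200, seed=0):
    """Best pseudo-F over `runs` random starts for each k; returns (best k, {k: best score})."""
    rng = random.Random(seed)
    ranges = gower_ranges(data)
    best = {k: max(calinski_harabasz(data, k_medians(data, k, rng, ranges), ranges)
                   for _ in range(runs))
            for k in k_values}
    return max(best, key=best.get), best


# Toy data: three tight groups of four points each (stand-ins for performance variables).
points = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)] for dx in (0, 1) for dy in (0, 1)]
best_k, scores = choose_k(points, k_values=range(2, 6), runs=40, seed=1)
print(best_k, round(scores[3], 1))
```

Keeping only the maximum pseudo-F per k follows footnote 25 loosely; the paper also checks how often each k appears among the 100 highest-scoring runs. Replacing `statistics.median` with `statistics.fmean` in the centre update gives a mean-based (k-means-style) variant which, as footnote 26 notes for k-means, is more sensitive to the few firms with extreme employment values.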
The second set of regressions, reported in Annex Table A3, shows the statistically significant correlates of being in the top two clusters, relative to being in the lowest performance cluster, for informal firms only. A first finding is that, conditional on being informal, being a young woman-owned firm is a statistically significant negative correlate of higher performance outcomes (at least at the 5 percent level in 15 of the 17 specifications). This result brings statistical rigor to the descriptive finding that the share of young women-owned firms is only 2 and 4 percent in the top two clusters versus 19 percent in the low-performance cluster. A second finding is that, conditional on being informal, being a transformational entrepreneur (an owner who chose entrepreneurship not because they could not find a job but because of the profit-making opportunity that owning a business provides) is a highly statistically significant positive correlate of higher performance outcomes (at the 1 percent level in all specifications). Profit-seeking motivation is thus strongly associated with better-performing informal firms; conversely, the roughly half of all informal firms that are in the low-performance cluster are likely there because their owners are mostly subsistence entrepreneurs, who own a business only because alternative wage jobs are lacking. In terms of characteristics that are statistically significant (at least at the 5 percent level), several are identical to those for all firms in the high-performance cluster: having electricity, having had a loan, and, in terms of uses of DTs, having a smartphone and using a mobile phone to communicate with suppliers and customers. However, there are interesting differences as well.
First, not all four dimensions of upstream and downstream value chain relationships with more knowledgeable or market-access-facilitating suppliers and buyers remain statistically significant: only having big enterprises as customers does. Second, too few informal firms have a computer or use inventory control/POS software, across clusters, for any statistically significant relationship with performance to emerge. Third, the use of mobile money to pay suppliers and to receive payments from customers, while not a statistically significant correlate of being in the top performance cluster across all firms, is a statistically significant correlate (at the 1 percent level) of performance conditional on being informal. Finally, having supplier credit is statistically significantly associated with performance.

5.4 Young women-owned enterprises: Characteristics and performance outcomes

Firms owned by young women pose both a special challenge and an opportunity. One of the findings from section 4.2 is that firms with young women owners are more likely to be formal. Table 15 provides more disaggregated data on performance outcomes and characteristics of firms by owner age and gender. There is a large performance gap between informal and formal firms owned by young women. The median labor productivity of formal firms owned by young women is about 6 times that of informal firms owned by young women, in contrast to smaller gaps in median labor productivity for the other age-gender groups (for example, a factor of 2 or less among firms owned by older or young men); similarly large gaps exist for mean sales and mean employment.
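The gaps described above are ratios of formal to informal median labor productivity within each age-gender group. The Python sketch below computes them from the Table 15 medians and also encodes the labor productivity definition given in the note to Table 15; reading that note as deducting water and electricity together with raw materials and intermediate inputs is an assumption, and the function name is hypothetical.

```python
def labor_productivity(sales, intermediate_inputs, utilities, ft_employees, owners):
    """Monthly value added per worker (FCFA).

    Assumes value added = sales - raw materials/intermediate inputs - water/electricity,
    and workers = full-time employees + owners, following the note to Table 15.
    """
    workers = ft_employees + owners
    if workers <= 0:
        raise ValueError("need at least one worker (employee or owner)")
    return (sales - intermediate_inputs - utilities) / workers


# Median monthly labor productivity (FCFA) from Table 15, as (informal, formal).
medians = {
    "young women": (25_000, 150_000),
    "older women": (35_000, 100_000),
    "young men": (90_000, 166_667),
    "older men": (100_000, 200_000),
}
# Formal-to-informal ratio of medians within each age-gender group.
gaps = {group: formal / informal for group, (informal, formal) in medians.items()}
for group, gap in gaps.items():
    print(f"{group}: formal/informal median productivity = {gap:.1f}x")
```

This reproduces the roughly sixfold gap for young women owners, against about 2.9 for older women and about 2 or less for young and older men.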
Alongside this performance gap, formal firms owned by young women are proportionately more likely to have a broad range of good characteristics: 52 percent have a smartphone, versus 29 and 28 percent of formal firms owned by young and older men, respectively; 54 percent use mobile money to pay suppliers, versus 42 and 33 percent; and 26 percent use inventory control/POS software, versus 11 and 9 percent. Likewise, the shares of firms that have had vocational training, have had a loan, have large formal businesses as main suppliers, use mobile money to pay suppliers or receive payments from customers, and use the internet to look for suppliers, understand customers, and do e-commerce are higher among formal firms owned by young women than in the other three age-gender groups. In contrast, women-owned informal firms, and especially young women-owned informal firms, are largely under-represented in the high-performance cluster (Table 14), consistent with the statistically significant negative association between being a young woman-owned firm and being in the top two performance clusters, conditional on being informal. And on two key performance indicators, median productivity and the share of firms that export, formal firms with young women owners lag behind formal firms owned by young or older men; for median sales, they lag formal firms with older men owners while performing at the same level as formal firms with young men owners.[27] Hence young women-owned formal firms are quite sophisticated in terms of characteristics such as assets and use of DTs. However, they seem to face limitations in translating these good characteristics into good performance outcomes, especially in terms of productivity and exports. This may reflect biases and discrimination in the Senegalese markets in which they operate, or other constraints.
Policies that help young women entrepreneurs make full use, in the market, of their sophisticated assets and uses of DTs may yield significant returns in both productivity and employment, as well as inclusion benefits.

Table 15: Performance outcomes and share of good characteristics for young women-owned firms relative to other age-gender groups (%)

| Group | Indicator | Young women: informal (37 firms) | Young women: formal (20) | Older women: informal (104) | Older women: formal (16) | Young men: informal (52) | Young men: formal (42) | Older men: informal (155) | Older men: formal (82) |
|---|---|---|---|---|---|---|---|---|---|
| Performance outcomes | mean labor productivity (FCFA) | 81,712 | 484,683 | 88,902 | 335,731 | 354,388 | 620,866 | 310,425 | 642,159 |
| | median labor productivity (FCFA) | 25,000 | 150,000 | 35,000 | 100,000 | 90,000 | 166,667 | 100,000 | 200,000 |
| | mean sales (FCFA) | 269,264 | 4,434,114 | 168,885 | 3,013,794 | 635,831 | 3,647,975 | 597,332 | 2,424,789 |
| | median sales (FCFA) | 55,000 | 450,000 | 70,000 | 375,000 | 250,000 | 450,000 | 180,000 | 600,000 |
| | exporting (%) | 10 | 17 | 3 | 29 | 9 | 29 | 3 | 20 |
| | mean employment | 0.6 | 5.8 | 0.4 | 5.0 | 1.3 | 2.7 | 0.8 | 2.2 |
| | median employment | 0.0 | 1.0 | 0.0 | 2.0 | 1.0 | 1.0 | 0.0 | 2.0 |
| Innate features | have electricity | 32 | 86 | 40 | 86 | 73 | 93 | 69 | 92 |
| | had vocational training | 7 | 48 | 6 | 30 | 23 | 29 | 7 | 23 |
| Assets | had loan | 4 | 16 | 4 | 15 | 3 | 11 | 2 | 10 |
| | have supplier credit | 10 | 44 | 12 | 43 | 36 | 44 | 17 | 32 |
| | have large formal businesses as main suppliers | 0 | 35 | 1 | 30 | 10 | 33 | 6 | 29 |
| | have most important suppliers abroad | 7 | 23 | 8 | 28 | 8 | 19 | 1 | 13 |
| | have big enterprises as customers | 1 | 18 | 0 | 15 | 11 | 19 | 4 | 26 |
| | have most important customers non-locally | 0 | 19 | 2 | 48 | 15 | 21 | 8 | 13 |
| DTs for access | have 2G phone | 68 | 48 | 76 | 48 | 65 | 71 | 80 | 69 |
| | have smartphone | 12 | 52 | 5 | 46 | 25 | 29 | 10 | 28 |
| | have computer | 4 | 28 | 2 | 34 | 3 | 23 | 5 | 16 |
| | use mobile to communicate with suppliers | 63 | 100 | 50 | 88 | 88 | 94 | 80 | 96 |
| | use mobile to communicate with customers | 46 | 85 | 38 | 75 | 74 | 88 | 57 | 90 |
| DTs for external GBFs | pay suppliers with MM | 8 | 54 | 14 | 20 | 19 | 42 | 26 | 33 |
| | receive payment from customers with MM | 10 | 49 | 14 | 35 | 33 | 35 | 21 | 36 |
| | use internet to look for suppliers | 0 | 25 | 3 | 12 | 6 | 17 | 2 | 8 |
| | use internet to understand customers | 11 | 44 | 2 | 35 | 18 | 26 | 3 | 20 |
| | use internet for e-commerce | 3 | 19 | 4 | 18 | 12 | 14 | 3 | 15 |
| DTs for internal GBFs | use inventory control/POS software | 4 | 26 | 0 | 27 | 0 | 11 | 2 | 9 |

Note: Weighted data. Labor productivity is measured as value added (total sales minus the cost of raw materials, intermediate inputs, and water and electricity used in production) divided by the sum of full-time employees and the number of owners. The value of total sales is asked in three complementary ways in the survey: total sales = turnover = “revenues money received by the business.” Exporting reflects the share of firms that report having international customers or non-zero exports (in response to the yes/no question “Does the business have customers located in other countries (selling goods or services abroad)?”). Labor productivity and total sales are monthly values (with total sales based on annual audited statements, if available, divided by 12), in local currency (FCFA). Employment is the number of full-time employees.

[27] The mean number of full-time employees for formal firms with young women owners is high, but this is driven by only two firms with very high levels of full-time employment (25 and 39 full-time employees).

Informal firms led by young women owners, on the other hand, generally lag behind not only in performance but also in good characteristics. On most characteristics, they are behind firms with young and older men owners, and sometimes, but not always, ahead of informal firms with older women owners. Taken together, informal women-owned firms exhibit low levels of good characteristics as well as low performance, relative to informal firms owned by older and especially young men. This is consistent with roughly 60 percent (59 percent) of informal firms in the low-performance cluster being owned by women (Table 14). This is the demographic group that appears to be most populated by subsistence entrepreneurs in Senegal.
This pattern could also reflect a selection effect: because women are more likely to become formal, those who remain informal may have even weaker performance indicators and fewer good characteristics. In any case, this group would likely benefit most from alternative employment opportunities as wage workers in expanding, more productive formal and informal firms. Given the evidence above, the greater likelihood that young women are formal may reflect a number of factors. The costs of informality may be especially high for women, for example because of lower access to assets under informality, or women may benefit more from the legal status of formality (for example, through better protection against crime). Women owners may also be subject to stricter enforcement of formalization requirements. More detailed data would be needed to establish which of these factors is most relevant.

6. Conclusion

This paper has explored various types of heterogeneity among micro-sized firms in Senegal, paying special attention to differences in the performance of formal and informal firms. While on average the performance indicators of formal firms are higher than those of informal firms, there are sizable areas of overlap. Under the significant caveat that the results are mainly suggestive rather than definitive, because they are based only on the analysis of a single cross-section, the paper finds evidence that selection is a main driver of informality: informal firms seem to choose informality. This finding supports the view that informality is not mainly due to, for example, the costs of registering firms. Rather, informality appears to be a profit-maximizing choice made by low-productivity firms. Importantly, the analysis suggests that informality per se does not seem to reduce productivity: there are significant shares of informal firms among high-performing firms, including some of the most productive firms.
While these “good” informal firms are “similar” to good formal firms in terms of performance and some characteristics correlated with good performance (such as having electricity and having a smartphone), they lag behind formal firms in the use of various kinds of more sophisticated DTs. There is also evidence that employment may be significantly affected by informality per se; that is, informal firms seem to generate less employment. Again, this need not be the case, as some informal microenterprises generate as much employment as the larger formal microenterprises.

Informality is clearly a widespread phenomenon in low-income countries. Low average levels of productivity among informal firms relative to formal firms have often led to the policy advice that formality can and should be enhanced through a combination of stronger enforcement and lower bureaucratic costs of formality. Our findings, based on heterogeneity beyond average levels, suggest that a concerted effort to increase the productivity of low-performing firms, especially those willing and able to improve, may be a more promising approach.

References

Aga, Gemechu, Francisco Campos, Adriana Conconi, Elwyn Davies, and Carolin Geginat. 2021. “Informal Firms in Mozambique: Status and Potential.” Policy Research Working Paper 9712, World Bank, Washington, DC.
Atiyas, Izak, and Toker Doğanoğlu. 2020. “Using the RIA Data Set to Explore Correlates of Mobile Internet Use in Senegal.” Mimeo.
Atiyas, Izak, and Mark A. Dutz. 2021. “Digital Technology Uses among Informal Micro-Sized Firms: Productivity and Jobs Outcomes in Senegal.” Policy Research Working Paper 3942, World Bank, Washington, DC.
Bruhn, Miriam, and David McKenzie. 2014. “Entry Regulation and the Formalization of Microenterprises in Developing Countries.” World Bank Research Observer 29: 186–201.
De Giorgi, Giacomo, and Aminur Rahman. 2013. “SMEs’ Registration: Evidence from an RCT in Bangladesh.” Economics Letters 120: 573–78.
La Porta, Rafael, and Andrei Shleifer. 2008. “The Unofficial Economy and Economic Development.” Brookings Papers on Economic Activity 105: 473–522.
La Porta, Rafael, and Andrei Shleifer. 2014. “Informality and Development.” Journal of Economic Perspectives 28: 109–26.
Levy, Santiago. 2008. Good Intentions, Bad Outcomes: Social Policy, Informality, and Economic Growth in Mexico. Washington, DC: Brookings Institution Press.
Loayza, Norman A. 1996. “The Economics of the Informal Sector: A Simple Model and Some Empirical Evidence from Latin America.” Carnegie-Rochester Conference Series on Public Policy 45: 129–62.
Mothobi, Onkokame, Alison Gillwald, and Pablo Aguera. 2020. “A Demand Side View of Informality and Financial Inclusion.” Research ICT Africa Policy Paper 9, Series 5: After Access, February.
Ulyssea, Gabriel. 2018. “Firms, Informality, and Development: Theory and Evidence from Brazil.” American Economic Review 108 (8): 2015–47.
Ulyssea, Gabriel. 2020. “Informality: Causes and Consequences for Development.” Annual Review of Economics 12: 525–46.
World Bank Group. 2021. “Country Private Sector Diagnostic – Informality Knowledge Note.” March. IFC, FCI, and MTI, Washington, DC.

Annex Tables

Table A1.1: Correlates of log productivity
Table A1.2: Correlates of log sales
Table A1.3: Correlates of exports (having customers abroad). Note: OLS results.
Table A1.4: Correlates of employment. Note: The dependent variable is the number of full-time employees plus the number of owners.
Table A2: Correlates of being in the top cluster (all firms)
Table A3: Correlates of being in the top two clusters (informal firms only)