Improving Management with Individual and Group-Based Consulting: Results from a Randomized Experiment in Colombia *

Leonardo Iacovone, World Bank
William Maloney, World Bank
David McKenzie, World Bank

October 22, 2018

Abstract

Differences in management quality are an important contributor to productivity differences across countries. A key question is then how to best improve poor management in developing countries. We test two different approaches to improving management in Colombian auto parts firms. The first uses intensive and expensive one-on-one consulting, while the second draws on agricultural extension approaches to provide consulting to small groups of firms at approximately one-third of the cost of the individual approach. Both approaches lead to improvements in management practices of a similar magnitude (8-10 percentage points), so that the new group-based approach dominates on a cost-benefit basis. Moreover, we find some evidence that the group-based intervention led to increases in firm size over the next 1.5 years, including a statistically significant increase in employment, while the impacts on firm outcomes are smaller and statistically insignificant for the individual consulting. The results point to the potential of group-based approaches as a pathway to scaling up management improvements.

Keywords: Management, Employment, Scaling-Up Interventions, Colombia.
JEL codes: O14, O32, L2, M2

* The authors gratefully acknowledge the collaboration of Paula Toro Santana, the staff of CNP and DNP in Colombia, and project management and research assistance provided by Cosma Gabaglio, Camilo Andrés Gutiérrez Silva, Pablo Villar, María Aránzazu Rodríguez Uribe and Innovations for Poverty Action Colombia. Funding is gratefully acknowledged from the DIME i2i Trust Fund, the Knowledge for Change Program (KCP), the World Bank and the IPA SME Initiative, as well as intervention funding from SENA. This study is registered in the AEA RCT registry AEARCTR-0000528.
Since no identifying information was collected on human subjects, the study was exempted from the Innovations for Poverty Action IRB. Comments from participants in seminars at the IDB, the Management Practices in the Private and Social Sectors conference, MIT Sloan, and Williams College are greatly appreciated.

1. Introduction

There are large differences in the management practices used by firms within and across countries (Bloom and Van Reenen, 2007). These differences are strongly correlated with productivity, with Bloom et al. (2014) estimating that differences in management can account for 30 percent of cross-country productivity differences. An experiment with 17 textile firms in India provides a proof-of-concept that intensive individualized consulting can deliver lasting improvements in the practices of badly managed firms, resulting in productivity improvements of 17 percent (Bloom et al., 2013; Bloom et al., 2018). However, the intervention was implemented by an international consulting company under close supervision from researchers, and cost $75,000 per treated firm. This high cost is likely to be prohibitive for many small and medium enterprises (SMEs) to finance themselves, and for governments seeking to scale this approach up to assist large numbers of firms.

This paper seeks to test two approaches that governments can use to scale up management improvements. The first is to use a very similar intervention of intensive individualized consulting, but to use local teams of consultants to deliver the intervention at a lower cost of approximately $30,000 per firm. The second, more novel, intervention is a group-based approach that aims to deliver improvements at lower cost (around $10,000 per firm), and to leverage group-learning dynamics, inspired by the approach used in the delivery of agricultural extension services. We conduct an experiment to measure the impact of these two competing interventions on SMEs in the Colombian auto parts manufacturing sector.
Our sample of 159 firms with an average size of 58 employees, randomized into three groups of 53 firms, is an order of magnitude larger than that used in Bloom et al. (2013) and enables us to measure the impact of such a program when implemented at a multi-million dollar scale by a government.

We show that the Colombian auto parts sector has similar levels of management practices to start with as the average Colombian manufacturing firm, which is low by global standards. Both the individual and group-based interventions lead to improvements in management of similar magnitudes of 8 to 10 percentage points. This improvement is broad-based, with improvements in just over half of a detailed set of 141 practices measured. We then track firms for 1.5 to 2.5 years post-implementation. We find evidence that the group-based intervention has grown the treated firms, with a statistically significant 3 to 7 worker (8 to 15 percent) increase in employment relative to the control group; 8 to 9 percent growth in sales, which is not statistically significant when compared to the control (p=0.12), but is statistically greater than the impact of the individual treatment; and higher energy input usage. In contrast, we find smaller and statistically insignificant impacts of the individual-based treatment. Neither treatment produces a statistically significant increase in productivity, but, given that the improvement in management practices is approximately one-third that in India, we also cannot rule out that productivity improved by the 5 or 6 percent that would be predicted by extrapolating from the Indian case. The group-based intervention clearly dominates the individual intervention on a cost-benefit basis, and, although there is considerable uncertainty associated with this estimate, we estimate that the group-based intervention is likely to pay for itself in terms of higher firm profits within the first year.

This work contributes to at least three literatures.
The first is a general literature on improving business practices and management in firms. Most of this literature has focused on short training courses and microenterprises (see McKenzie and Woodruff, 2014 for a review). However, several studies show the potential of more intensive individualized consulting to improve management in small and medium enterprises. In addition to the work of Bloom et al. (2013) in India, this includes Bruhn et al. (2018) with firms averaging 14 workers in Mexico, and Higuchi et al. (2017) with firms averaging 20 workers in Vietnam. Secondly, while we are not aware of other studies that directly test group-based versus individual consulting, a recent literature has highlighted the ability of firms to improve their business practices when formed into groups or paired with other firms that can serve as role models (e.g. Cai and Szeidl 2018, Chatterji et al. 2018, Dalton et al. 2018, Lafortune et al. 2018). Finally, this paper contributes to a broader literature on how to scale up policies from promising researcher pilot studies (e.g. Banerjee et al., 2017, Bold et al., 2018). Our results show the promise of group-based consulting as a pathway to greater scale.

2. Context and Sample

2.1 Choosing the Industry and Sample

Labor productivity in Colombia is low: it takes around four Colombian workers to produce what one worker produces in the United States (Londoño, 2017). As a result, improving productivity is a priority for government policy. The Government of Colombia was interested in testing whether the productivity improvements from better management demonstrated in India by Bloom et al. (2013) could be achieved at a larger scale in Colombia.
To test different approaches, the government wanted a sector that was thought to have a sufficient number of firms, to have production in a number of locations throughout the country, to have some potential for growth, and to be similar enough to other industrial sectors in the country that the results from this pilot could be applicable to other industries. These criteria led to the selection of the auto parts sector. This sector consists largely of second-tier suppliers to large car manufacturers, producing parts like fenders, tires, suspension parts, plastic parts, paints, etc. that are sold to the assemblers that directly supply national and international car manufacturers, as well as to retailers of spare parts. Appendix 1 provides some examples of the products. The auto parts sector in Colombia employs approximately 25,000 workers, and sells to car and bus manufacturers within Colombia as well as exporting approximately US$500 million each year, with Ecuador, Venezuela and Brazil the main export markets (Proexport Colombia, 2012).

Public announcement of the program was made in April 2012 (Appendix 2 contains the full timeline), and firms were also informed of the program through car manufacturers such as Sofasa (which assembles Renault cars in Colombia), General Motors, and Busscar (which manufactures buses). To be eligible, firms had to be legally registered, in business for at least two years, be a first or second-tier supplier to the automobile industry, and be located in one of the four provinces Antioquia, Cundinamarca, Valle and Eje Cafetero. The firms were told the program would offer assistance in improving production practices in order to improve profitability, productivity and competitiveness, and that the program would not require any payment by the firms, but that they would need to commit time and effort of their workforce to supply information required and to implement suggestions made.
Public provision of the program to firms was justified both with reference to the overall policy objective of improving productivity, and by the presence of several market failures that prevent firms from improving management on their own. A first issue is that of information: many badly managed firms do not know they are badly managed, with data from the World Management Survey showing that Colombian managers perceive their firms to be slightly better managed than U.S. firms, when in reality their management is substantially worse. 1 Secondly, even if firms know they need to improve, they may be unable to identify which providers can offer good services, may lack the financial resources to pay for consulting, and a lack of insurance may prevent them from investing in an activity with uncertain payoffs.

1 Colombian firms had an average WMS score of 2.50 in 2014 (described below), but an average perceived score of 3.76. In contrast, U.S. firms had an average WMS score of 3.32, and perceived score of 3.57.

218 firms applied for the program. 180 of these were accepted in the preliminary step, with the remainder rejected for being too small, or for only being distributors rather than manufacturers of parts. 11 firms then dropped out, so 169 firms formed the group to take part in the first, diagnostic, phase of the project. Following the diagnostic, we dropped firms with fewer than 10 workers, to leave a sample of 159 firms for the experiment.

2.2 Random assignment and firm characteristics

Firms were randomly assigned to three groups of 53 firms each. Since the number of firms in each group would be small, we aimed to improve balance on observables by forming matched triplets of firms, choosing this grouping in a way to minimize the Mahalanobis distance between firms in a triplet in terms of their geographic location, size, labor productivity, and management practices. 2 This took place in November 2013, after the diagnostic phase (described below).
Then within each triplet, firms were randomly allocated to a control group and two treatment groups: an individual-consulting treatment group and a group-consulting treatment group.

2 Location consisted of Cundinamarca and Valle regional dummies; firm size consisted of dummies for small (10 to 50 workers) and medium size (51 to 310 workers), as well as for the number of employees; management practices consisted of indices for practices in human resources, production, logistics, marketing and finance; as well as for seven individual management practices identified as priority areas in many diagnostic plans.

Table 1 provides some summary characteristics of the firms, along with their means by treatment group status. The mean (median) firm has been in business for 24 (23.5) years, with only 20 percent having been in business for fewer than 10 years. A key feature of the data is that firms are heterogeneous in terms of size and products produced. Firms had a mean of 59 and median of 40 employees at the time of application, with 59 percent of the firms classified as small (10-50 workers), and the remainder as medium (51 or more workers), with the maximum being 310 and the 10-90 percentile range being from 13 to 119 workers. Mean sales were approximately USD2.7 million in 2013, with a 10th percentile of USD280,000 and 90th percentile of USD6.3 million showing the large variation in firm size. These are almost all single plant firms, with the main subsectors being metal products (60%) and plastic products (18%). The sample also includes firms making rubber products (5%), chemical products such as injection molding (4%), electronic components (4%), as well as firms working with leather, wood, and glass. 94 percent are tier 2 firms in the value chain, with 6 percent tier 1. 3 Forty-five percent of firms had exported in at least one month of 2013.
Half the firms are located in the Cundinamarca region, which includes Bogota, with the region of Valle, which includes Cali, the next biggest. Management practices were measured in terms of 141 individual practices, developed by the Colombian National Productivity Center, classified into five areas: financial practices (made up of 29 individual practices), human resource practices (20), logistics practices (31), marketing practices (22), and production practices (39). Each practice was scored on a five point scale, where 1 indicates that the practice is not used, and 5 that it is implemented and under control. Scores were then aggregated and calculated as a percentage of the maximum possible score for that index. Appendix 3 provides more details of the specific measures. At baseline average scores for these practices range from 43 (human resources) to 51 (financing practices), indicating that firms have significant room to improve on these practices. We refer to this as the Anexo K management practices measure, with this terminology referring to the form used to collect this data. Table 1 shows that while the random assignment was able to achieve balance on most baseline variables, there are a couple of imbalances. These reflect the difficulty of balancing many variables in a relatively small sample of heterogeneous firms. For example, the control group is more likely to be in metal products than either treatment group, and starts with lower labor productivity. In our analysis we use firm fixed effects or controls for the baseline value of interest to make the firms more comparable and reduce the effect of this heterogeneity. 
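
The matched-triplet assignment described in this section can be illustrated with a minimal sketch. The paper states only that triplets were chosen to minimize the Mahalanobis distance between firms on baseline covariates and that arms were then randomized within each triplet; the greedy nearest-neighbor grouping below, the function names, and the simulated covariate matrix are all assumptions made for illustration and are not the authors' procedure or data.

```python
import numpy as np

ARMS = ("control", "individual", "group")

def form_triplets(X):
    """Greedily group rows of X (firms x covariates) into triplets.

    Assumption: a simple greedy heuristic, not necessarily the
    optimization the authors used. Requires at least two covariates.
    """
    n = X.shape[0]
    if n % 3 != 0:
        raise ValueError("number of firms must be divisible by 3")
    # Pseudo-inverse of the covariance guards against collinear covariates.
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]
    # Squared Mahalanobis distance between every pair of firms.
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    remaining = list(range(n))
    triplets = []
    while remaining:
        seed = remaining.pop(0)
        # Take the two remaining firms closest to the seed firm.
        nearest = sorted(remaining, key=lambda j: d2[seed, j])[:2]
        for j in nearest:
            remaining.remove(j)
        triplets.append((seed, *nearest))
    return triplets

def assign_within_triplets(triplets, rng):
    """Randomly allocate the three arms across the firms of each triplet."""
    assignment = {}
    for triplet in triplets:
        for firm, arm in zip(triplet, rng.permutation(ARMS)):
            assignment[firm] = str(arm)
    return assignment

# Simulated covariates for 159 hypothetical firms (not the study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(159, 4))
triplets = form_triplets(X)
assignment = assign_within_triplets(triplets, rng)
```

By construction each arm receives exactly one firm from every triplet, so the three groups are of equal size (53 firms each with 159 firms), which is the property the design relies on to improve balance in a small, heterogeneous sample.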
2.3 External validity and comparison to Bloom Van Reenen Management Practices

In 2013, prior to the interventions, we commissioned the LSE survey team responsible for the Bloom and Van Reenen (2007) World Management Surveys (WMS) to apply their methodology to a random sample of 180 firms representative of the Colombian manufacturing sector, as well as to a sub-sample of 72 companies from our sample with 40 or more employees. Appendix 7 summarizes this survey process, and provides three key results. First, the mean and distribution of WMS management practices scores for our auto parts firms are similar to those of the overall manufacturing sector in Colombia (2.38 versus 2.54). Second, Colombia’s average management practices score shows firms are, on average, poorly managed in global terms, but similar to many other developing countries. The average score is just below that of firms in India and just above that in Kenya in the WMS. The auto parts sector in Colombia is thus a fairly typical sector for both the country, and for developing countries as a whole, in terms of management practices.

3 Tier 1 means that the firm directly supplies the original equipment manufacturer (e.g. Ford, Suzuki, etc.), while tier 2 means the firm supplies a tier 1 supplier without supplying the vehicle manufacturer directly.

A final use of this baseline WMS data is to compare the Anexo K management measure, our main measure of management used in this paper, to the WMS. Appendix 5 shows that the two are significantly correlated in the cross-section at baseline, with a correlation of 0.26 between the two overall indices. The Anexo K has a stronger correlation (0.44) with the monitoring subcomponent of the WMS, reflecting a greater emphasis on measurement and monitoring than on other management practices.
2.4 Macroeconomic context

Sales in the Colombian auto parts sector grew by an average of 5.4 percent per year over the 2002 to 2012 period leading up to our experiment (Reina et al., 2014). 4 At the start of our study, imports averaged 68 percent of total sales in the sector, and were the main source of competition for most firms in our study. However, the country was hit by a combination of external and internal shocks starting in late 2013, which resulted in a large depreciation of the peso, from an average of 1930 COP to the USD in 2013 to approximately 3000 COP to the USD in each of 2015, 2016, and 2017. Domestic new vehicle sales fell from 326,000 units in 2014 to 238,000 units in 2017, a 27% drop (BBVA Research). Export sales of auto parts fell 51 percent in dollar terms over the 2013-2016 period, driven by weak economies in the main export destinations of Venezuela, Ecuador and Brazil. The aggregate context is thus one of weakening overall demand for the sector, but where the weakened currency increased competitiveness against imports. Real sales of domestic production were then fairly flat over our study period, falling 0.12 percent between 2013 and 2016. 5

4 The report notes a nominal growth rate of 11.2 percent, which we deflate using the Colombian inflation rate taken from the World Development Indicators.

5 Export and sales data are from DANE and are for the CIIU sector 2930 “Manufacturing of parts, pieces, and accessories for automobiles and their motors”.

3. The Intervention

The program was implemented by the National Productivity Center (Centro Nacional de Productividad, CNP), which is a Colombian non-profit institution with the mandate to contribute to increasing the productivity, innovation and competitiveness of Colombian businesses. CNP originally was funded and supported by Japanese technical cooperation and has been the recipient of training and in-house technical assistance to develop capabilities in implementing managerial consulting services such as Lean, Six-Sigma, etc. During its 15 years of experience CNP has developed a model of operation that has allowed it to support more than 4,000 Colombian companies in different areas of management, innovation, productivity and competitiveness.

CNP used two types of consultants for the intervention. The first were lead consultants, who were long-term employees of CNP with more than 10 years of experience, including experience managing teams. They led area consultants, who had to have at least 5 years of experience, and specialized in a particular focus area such as logistics or finance. The direct cost of implementation of this program was approximately US$2.4 million.

3.1 Diagnostic phase

All firms, including the control, received a diagnostic as the first phase. This was implemented on a rolling basis between June and October 2013. The diagnostic was carried out by a team of 6 consultants, consisting of the leader and five specialists, one for each area (Logistics, Human Resources, Finance, Marketing and Sales, and Production). The diagnostic began with an opening meeting with top and middle management, and then each area specialist would have five days of meetings with the responsible manager in the firm for their area to evaluate the 141 individual management practices that form Anexo K. This forms the baseline management practices measure. The consultants would also examine the firm’s key performance indicators for the last three years (to the extent records existed), and work with the leader to finish with a report (improvement plan) that analyzed managerial practices for each area, the key performance indicators for each area, and recommended practices to prioritize. This diagnostic phase lasted 2 full-time weeks and cost 8,426,550 COP (US$3,553) per firm.
6 We use the average exchange rate over 2014-15 of 2372 COP = 1 USD for all currency conversions in this paper. Cost numbers are implementation costs, and exclude initial costs of intervention design, and additional costs of data collection for the impact evaluation. To the extent this data collection process also helps firms improve management, it could be considered another part of the intervention, and averaged a further US$20,000 per firm (including the control group).

The diagnostic identified priority practices to be improved together with the firm. These practices were intended to be ones which required minimal capital investment, and which could be implemented reasonably quickly and were expected to lead to relatively rapid improvements in the firm. While these priority practices were individualized by firm, some of the priority areas for improvement in each of the five areas were common to many firms. These included implementing master budgets across areas, improving systems for tracking costs, defining explicitly the strategic objectives of each position in the plant, implementing plans to improve the skills of people in management roles, lining up sales and marketing plans with business strategy, and analyzing machine downtime and quality problems daily across different supervisors.

3.2 Individual Consulting Treatment

Assignment to treatment took place after the diagnostic phase, in November 2013. Firms assigned to the individual consulting treatment group then received individual support for a period of 6 months, in the time window between March and November 2014. They were assigned a team of five consultants, one for each of the five processes (logistics, human resources, finance, marketing and sales, and production), along with a leader.
The intervention began with an opening meeting that brought together the leaders within the firm responsible for each of these five processes, along with the six consultants to define the different roles and responsibilities and set out a work plan. Then each of the five area consultants would visit the firms and provide 20 hours of training to the person in the firm in charge of their respective area. This would involve a theoretical part with the goal of familiarizing the firm’s management with modern management concepts and methods, complemented with practical exercises to apply these concepts to their firm. This was then followed by individual consulting to help the firms implement the improvement plan developed during the diagnostic phase. Every area would be covered by different consultants and with different schedules, but would typically involve weekly meetings for four hours per visit, spread over three to five months. Once per month, the team would meet with the whole firm’s management to discuss improvements and re-define priorities and next actions. The total consultant time was 500 hours, consisting of 100 hours of providing training, and then approximately 100 four-hour sessions per firm of individual consulting. The cost of this individual intervention was US$28,950 per firm receiving treatment.

Based on our discussions with firms and our own observations of the process, the implementation appears to have involved an emphasis on teaching firms how to measure and monitor key performance indicators, and on providing firms with the set of tools needed to better understand how their business is performing. It appears that there was less direct implementation from the consultants. For example, the consultants might go through the financial and performance data from the firm and suggest the need for the firm to consider new product lines or develop new markets abroad, but seldom make more direct recommendations (e.g.
you should try exporting product X to Ecuador, or you should start using this production technology).

3.3 Group Consulting Treatment

The idea behind the group consulting treatment was to test whether the same gains in management improvements could be achieved more efficiently through working with small groups at a time, motivated in part by the way agricultural extension services are often implemented. The group treatment arm aimed to lower costs in two key ways. First, by working with multiple firms at once, and potentially having them also learn from one another, each consultant’s time was spread over more firms. Secondly, rather than consultants having to travel to the firms, most of the meetings took place in central meeting places such as conference rooms, cutting down on consultant travel time.

Groups were formed of 3 to 8 firms located in the same region, such that members are not direct competitors to one another, but are instead producing complementary products with similar management problems. 7 These groups were formed after the randomization, in November 2013. Unfortunately, a different government budgetary entity was designated to pay for this treatment arm than the one paying for the individual treatment. This entity significantly delayed the payment, meaning that the group intervention was unable to start until over a year after the individual intervention, running six months from September 2015 to May 2016 (with different groups starting and stopping at different times, and a break over the Christmas period).

7 The composed groups are 1 group of 8 firms, 4 groups of 7 firms, 2 groups of 6 firms, 1 group of 4 firms, and 1 group of 3 firms.

Leaders from the firms in a group signed an agreement to work together and help each other improve. Like the individual treatment, the group treatment began with training classes that covered theoretical aspects of management. The difference is that these classes were delivered to the group in a classroom setting, instead of one-on-one in the firm. Each firm would send the staff in charge of a particular area or production process along to that training session. For example, when financial training was performed, firms would send the people responsible for the firm’s financial components to the training. These sessions lasted for a total of 40 hours per group, including a session on the topic of cooperation among firms.

This was then followed by group consulting sessions, designed to help firms implement the management improvements in their diagnostics and action plans. In any given week, a group would discuss two areas, having one or two meetings focusing on a single area (for a maximum of four meetings a week per group). Only management with responsibilities over the area being discussed would participate in the meetings. The same two areas would be covered at the same time over about eight weeks. After a break over Christmas, the remaining three areas would be covered the same way. The order in which areas were discussed was not the same for each group. The group meetings would focus on the implementation of the actions agreed in the action plans of each company. Within each group, each firm had to work on the improvement of the topic that had been prioritized for a number of firms in the group, unless the firm excelled already in that topic. Therefore, each firm would still be focused on the issues that had been prioritized in the Improvement Plan but its Action Plan would be updated to include relevant issues taken from the other firms’ Improvement Plans. If a firm already excelled in topics that were central in other firms’ Improvement Plans, it would be used as an example and its experience would be discussed in detail.
In the individual intervention, consultants were at the firm for all visits, so could directly see implementation attempts and problems and adjust their recommendations accordingly. In contrast, during the group intervention, it was more difficult to directly verify changes being made in logistics and production. This was solved by requiring firms to provide evidence of what they had implemented in the form of bringing photos to the group meetings. In addition, firms in the group treatment still had a monthly one-on-one meeting with senior management which took place at the plant, and one hour at the end of each meeting was used to visit the plant and review improvements. This process enabled the group intervention to be significantly cheaper than the individual intervention, with an average cost of US$10,500 per firm receiving treatment. Firms received 408 hours of consultant time each, consisting of 40 hours of group training, and 92 four-hour group sessions.

4. Take-up, Data sources, and Attrition

4.1 Take-up

The take-up rate for the individual intervention was 86.8%: 46 of the 53 firms started this intervention, and all 46 completed it. The longer delay until beginning the group intervention reduced the take-up rate for this intervention, with 40 of the 53 firms in this group (75.4%) starting the intervention, and 36 firms (67.9%) completing it. Table A4.1 shows the baseline characteristics of those who completed the intervention are not statistically different from those who dropped out, with the one exception being that dropout from the individual treatment was more common in the Antioquia region than elsewhere. The main reasons given for drop-out from both groups were lack of owner time to participate, and lack of continuity in the program (especially for the group treatment).

4.2 Data Sources, Measurement of Key Outcomes, and Attrition

Baseline data were collected from the application form and diagnostic phase and cover firm characteristics in 2013.
We then use three types of follow-up data, discussed in detail in Appendix 3. The first is data on the management practices in the firm. Our main measure is the Anexo K management score, which is the average of the 141 different practices detailed in Appendix 3. This was collected by CNP during in-person visits to the firms. It was measured during the diagnostic for 156 of the 159 firms (3 of the firms had components missing), monthly from the treatment groups during the time of their interventions, as well as annually in 2014 and 2015 for the individual and control groups, and in 2015 and 2016 for the group treatment. The second type of data consists of key performance indicators (KPIs) from the firms, which were collected during in-person visits. We use this to measure impacts on firm sales and employment, as well as on defect rates, inventory levels, and energy usage. The final source of data comes from linking firms to administrative data sources on employment and exports.

Obtaining data from the firms was difficult and complicated by several factors. First, a consequence of poor management is that firms did not routinely and consistently keep records of some KPIs. Firms would change the units of measurement at times from pesos to physical units, and the type of physical unit they used (e.g. from number of items to kilograms). 8 Second, data collection in the firms was conducted during on-site visits by CNP. We hired Innovations for Poverty Action to provide an independent check on this data, and to help in extracting data from the firms. But CNP had breaks in its contracts, which meant data collection halted for months at a time, and they had a long list of KPIs they wanted from firms, which increased the burden on firms of reporting.

8 These changes in units also occurred because firms would produce different products at different times, depending on what orders they received.
The result was that some firms dropped out of providing follow-up information, even after repeated follow-up visits seeking just a few key variables. Third, ten of the firms closed during the course of the study (4 control, 3 individual treatment, and 3 group treatment, p-value of equality of death rates 0.911). These three factors mean that we only have both employment and sales data through to December 2017 for 105 firms (69% of the sample), comprising 33 control firms, 37 individual treatment firms, and 35 group treatment firms (p-value of equality of attrition rates is 0.744). Table A4.2 compares the baseline characteristics of these firms to those that attrit, and shows that we cannot reject equality of means. Moreover, balance on baseline observables for those firms which do report is similar to our balance on the overall sample. Nevertheless, we use firm fixed effects in our estimation of impacts on firm outcomes to control further for any time-invariant differences among firms. In addition, for the employment outcome, we use administrative data to boost the sample size of firms for which we have post-intervention data.

5. Impact on Management Practices

The interventions aimed to improve specific management practices covered under the 141 practices that comprise Anexo K. These practices were measured for all firms during the diagnostic phase in 2013, and then measured monthly during the implementation periods of the individual and group interventions, and again one year post-intervention. The control group had these measured towards the end of the individual treatment intervention, and again at the time of the one-year follow-up. Figure 1 shows the trajectory of impacts on management practices for the overall Anexo K management score, and for the scores under the five separate areas of finances, human resources, logistics, marketing and sales, and production practices.
We see that the individual treatment group sharply improves practices overall, and in all five areas, during the implementation phase, while the control group improves by much less. The group treatment likewise sharply improves practices during its implementation phase, and ends up with practices at or above where the individual treatment group ended. This improvement in management then persists for the following year for both groups. Figure 2 compares the distributions of management practices at baseline, and at the last follow-up, for the three groups. Kolmogorov-Smirnov tests show we cannot reject equality of distributions at baseline. At the endline, both the individual and group treatments are significantly different from the control group (p-values 0.004 and 0.003 respectively), but are not significantly different from each other (p-value 0.643). For our regression analysis, we therefore classify our data into three periods: baseline, during the intervention (measured at the end of implementation for the individual and group treatments, and the first follow-up for the control group), and post-intervention (measured at the one-year follow-up post-intervention for the individual and group treatments, and the second follow-up for the control group). This time-shifts the data for the group treatment to account for the delay in implementation, which meant that its follow-ups took place a year later than those of the other two groups.
We then estimate the following Ancova regression (McKenzie, 2012) for t=2 (during) and t=3 (post-intervention) that controls for the randomization triplets and the baseline level of management practices, and allows the impacts during the intervention to differ from those post-intervention:

$$M_{i,t} = \alpha + \beta_1 \, Individual_i \times During_{i,t} + \beta_2 \, Individual_i \times Post_{i,t} + \gamma_1 \, Group_i \times During_{i,t} + \gamma_2 \, Group_i \times Post_{i,t} + \sum_{g=1}^{53} \delta_g \, 1(i \in g) + \lambda \, 1(t = 3) + \theta \, M_{i,1} + \varepsilon_{i,t} \tag{1}$$

where $M_{i,t}$ is the management practices score of firm i in period t, $M_{i,1}$ is its baseline level, $1(i \in g)$ is a dummy for firm i being in randomization triplet g, $1(t = 3)$ is a time period fixed effect, and the standard errors are clustered at the firm level. Table 2 presents the estimated treatment effects on these management practices. Panel A uses the unbalanced panel, which includes firms whose practices were measured in only one of the two follow-up periods, and Panel B the balanced panel of firms measured in both follow-ups. Three key results are evident. First, we see the immediate treatment impacts seen in Figure 1 are statistically significant at the 1 percent level for both treatments. Second, these impacts persist for at least one year post-intervention. The estimated effect size is between 8 and 10 percentage points, relative to the control group implementing 56 percent of the practices by 2015. Third, the individual and group treatments yield impacts that are similar to one another in magnitude, and we cannot reject equality of treatment effects for the overall index, or for any of the five areas, in the post-intervention period. How large an effect is this improvement of 8 to 10 percentage points in management practices? It is only approximately one-third the size of the improvement of 26 percentage points found by Bloom et al. (2013) from their management intervention in India, but approximately twice the size of the typical improvement found in standard short business training courses given to smaller firms (McKenzie and Woodruff, 2016).

5.1 Which Practices Improved?
The improvement in management practices is broad, occurring in Figure 1 and Table 2 across all five areas with reasonably similar magnitudes. Table A4.1 looks at the sub-index and individual practice level. The individual treatment has a positive and statistically significant impact (at the 5% level) on 23 out of the 35 sub-indices (66%), and 67 out of the 141 individual practices (48%), while the group treatment has a positive and statistically significant impact (at the 5% level) on 20 out of the 35 sub-indices (57%), and 73 out of the 141 individual practices (52%). Table A4.2 examines which practices have had the largest impacts. These are mainly practices concerning defining strategic goals and objectives, setting up master budgets, and monitoring key performance indicators. The fewest improvements are seen in human resource practices and logistics practices. Figure 3 plots the estimated treatment effects practice by practice for the individual and group treatments. The correlation is 0.71, showing that the two different approaches to improving management not only resulted in a similar aggregate improvement in management, but also in a similar mix of practices being improved. The main area of difference occurs with several production practices related to preventative maintenance, which improved more with the group treatment than the individual treatment. Why didn’t firms change more of their management practices? Qualitative interviews suggest several explanations. A first one is delays in implementation, which caused some firms to lose interest. The consultants pointed to problems getting family-run businesses to focus on improvements, and noted that the lack of a data culture prevents firms from recognizing their flaws.
For this reason, much of their initial focus was on getting firms to collect KPIs and to have meetings to identify problems, which, in our opinion, may have come at the expense of “quick wins” in which changes in particular practices could be seen by firms to lead quickly to noticeable improvements in business outcomes. We also asked the consultants to go through a flowchart to explain why key practices identified in the diagnostic were not then implemented. This was done in early 2014 for approximately two practices per firm in 87 firms in the individual and control groups, for a total of 151 practices. Firms had heard of the practices, but were rated low in their knowledge about the practices, with 72% of firms being scored as a 1 or 2 out of 5 on knowledge of how to implement the practice. The consultants believed that external factors (<1% of cases) and firm human and financial resources (only 6%) were not constraints to implementation. In contrast, they thought that the firm owner mistakenly did not consider the practices to be profitable in 58% of cases. This is consistent with the findings of Bloom et al. (2013) that the main reasons for practices not being implemented were lack of knowledge about the practices, and firm owners not thinking the practices were worth implementing.

5.2 Robustness Checks of the Management Improvement

We consider the robustness of the improvement in management practices to different weighting schemes, to sample attrition, and to alternative measurement tools.

Robustness to weights: Our measures of management practices are averages of the different practices. The Anexo K overall index is an average of the 35 sub-indices, and ranges from 20 (indicating scores of 1 for every individual practice) to 100 (indicating scores of 5 for every individual practice). With any aggregate index, there is always a question as to the appropriate choice of weights, and of how sensitive the results are to alternative weighting schemes.
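To make the aggregation concrete, the following is a minimal sketch of one way an index with this 20-to-100 range could be built. It assumes that each practice is scored from 1 to 5, that each sub-index is the mean of its practice scores multiplied by 20, and that the overall index is the unweighted mean of the sub-indices. The rescaling factor, the function name, and the sub-index labels are illustrative assumptions, not details taken from the Anexo K documentation.

```python
from statistics import fmean


def anexo_k_index(scores_by_subindex):
    """Overall index as the unweighted mean of rescaled sub-indices.

    scores_by_subindex maps a (hypothetical) sub-index label to the list
    of 1-5 practice scores belonging to it. Multiplying by 20 is an
    assumption chosen so that all-1 scores give 20 and all-5 scores 100.
    """
    if not scores_by_subindex:
        raise ValueError("need at least one sub-index")
    sub_indices = []
    for label, scores in scores_by_subindex.items():
        if not scores or any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"scores for {label!r} must be non-empty and in 1..5")
        sub_indices.append(20 * fmean(scores))
    return fmean(sub_indices)


# Illustrative firm with two made-up sub-indices.
example_firm = {"budgeting": [1, 5], "maintenance": [3]}
print(anexo_k_index(example_firm))
```

Because sub-indices contain different numbers of practices, averaging sub-indices weights individual practices unequally, which is one reason to check sensitivity to the weighting scheme.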
Table 3 examines robustness to different choices of how to aggregate the 141 practices. Column 1 shows our aggregate index from Table 2. Columns 2 through 5 then consider four alternative weighting schemes. Column 2 uses the first principal component of the 141 practices; Columns 3 and 4 use lasso regression to identify the sub-set of practices which best predicts baseline log employment and labor productivity respectively, and then post-lasso regression to form the weights. This chooses 19 practices to weight according to their predictive power for employment, and 14 to weight for their predictive power for labor productivity. Finally, column 5 uses the subset of firms for which we also have baseline data from the World Management Survey, and uses lasso to choose weights that best predict the baseline WMS score, which selects only 6 practices (see footnote 9). The coefficients cannot be directly compared across columns in terms of magnitudes, but can be considered relative to the control group standard deviation. The estimated treatment effects are 0.8 to 0.9 standard deviations (s.d.) when using our aggregate index, 0.9 to 1.0 s.d. when using principal components, 0.6 s.d. when weighting to predict employment, 0.8 s.d. when weighting to predict labor productivity, and 0.7 to 1.1 s.d. when weighting to predict the WMS score. Thus regardless of the choice of weights, we find the treatment impacts are positive, similar in magnitude, and statistically significant.

Robustness to attrition: Appendix 6 examines robustness of our results to attrition of the management practice data. It shows that the firms for which we have endline management practice data have similar baseline management practices to those firms which attrit, and that this also holds separately by treatment status. It provides Lee bounds for the impact on management practices.
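As an illustration of the trimming idea behind Lee bounds, the sketch below handles only the case where the treatment group has the higher response rate (as when control attrition is higher). It trims the excess share of treated respondents from the top of the outcome distribution for the lower bound and from the bottom for the upper bound. The function name and the rounding of the trimmed share to a whole number of firms are simplifying assumptions; this is not the code used in Appendix 6.

```python
from statistics import fmean


def lee_bounds(treated_obs, control_obs, n_treated_assigned, n_control_assigned):
    """Return (lower, upper) trimming bounds on the treatment effect.

    treated_obs / control_obs are outcomes for firms observed at follow-up;
    the n_*_assigned arguments are the numbers originally randomized.
    """
    q_t = len(treated_obs) / n_treated_assigned   # treated response rate
    q_c = len(control_obs) / n_control_assigned   # control response rate
    if q_t < q_c:
        raise ValueError("sketch only covers higher response among treated")
    trim_share = (q_t - q_c) / q_t
    # Simplification: trim a whole number of observations.
    k = round(trim_share * len(treated_obs))
    ordered = sorted(treated_obs)
    control_mean = fmean(control_obs)
    lower = fmean(ordered[: len(ordered) - k]) - control_mean  # drop the k highest
    upper = fmean(ordered[k:]) - control_mean                  # drop the k lowest
    return lower, upper


# Made-up scores: all 4 treated firms observed, only 2 of 4 control firms.
print(lee_bounds([50, 60, 70, 80], [40, 50], 4, 4))
```

The lower bound corresponds to the worst case in which the extra treated respondents are the best performers.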
These bounds are relatively narrow and positive, and statistically significant, even at the lower bound when measuring the impact during the intervention. However, since control group attrition is higher by the endline, the bounds are wider for the post-intervention period, and the lower bound for the treatment effects is positive, but not statistically significant for either treatment. For this lower bound to hold, it would need to be the case that the best managed control firms were the ones that attrited. We show that this is not the case in terms of either baseline management practices or management practices as measured in the first follow-up. Coupled with our use of a balanced panel and randomization triplet fixed effects as controls (which identifies treatment by comparing firms with similar baseline characteristics), we believe survey attrition is extremely unlikely to be driving the positive impacts found on management.

Robustness to alternative measurement of management: Appendix 7 discusses our efforts to also measure changes in management using the World Management Survey (WMS) and Management and Organizational Practices Survey (MOPS). These measures are at a more general level than the Anexo K measures, and were designed for medium-sized firms of 50 or more employees, whereas our sample includes firms with as few as 10 workers. A combination of budget constraints and attrition means that we only have this data for 70 of the 159 firms (WMS), and 95 firms (MOPS). We show that our Anexo K measures are correlated with the WMS and MOPS in the cross-section, but not in the panel, and that our WMS and MOPS measures appear to be noisily measured, with less predictive power for business outcomes than Anexo K. [Footnote 9: The smaller number of practices chosen is likely because of the much smaller sample for which the WMS is available.] Our measured treatment impacts on these two measures are smaller in magnitude and not statistically significant.
The improvement in management we obtain is thus not able to be detected using these alternative management instruments.

5.3 Correlated Practice Changes Within the Group Treatment

The motivation for the group intervention suggested two possible ways in which working with firms in groups could foster improvements in management practices. A first possibility is one of coordinated experimentation and learning, whereby group members try to improve the same practice together, so are able to motivate and learn from one another. A second possibility is one of existing knowledge transfer, whereby group members are able to learn how to implement a practice from other group members who were already implementing it well to begin with. We explore the extent to which these two mechanisms are occurring in our sample by running the following regression for the change in management practice j in firm i assigned to group g:

$$\Delta P_{j,i,g} = \alpha + \beta \, \overline{\Delta P}_{j,-i,g} + \gamma \max_{-i,g} P_{j,-i,g} + \varepsilon_{j,i,g} \tag{2}$$

where $\overline{\Delta P}_{j,-i,g}$ denotes the mean change in practice j for other members in i’s group, and $\max_{-i,g} P_{j,-i,g}$ denotes the maximum level of practice j at baseline among other members in i’s group. We stack the 141 individual practices, and then cluster the standard errors at the firm level. Table 4 reports the results of estimating equation (2). Column 1 shows that there is a significant positive association between the change in a practice for a firm and the mean change made by other firms in their group. Column 2 shows that, in contrast, there is no significant relationship with the highest baseline level of practices observed amongst other firms in the group. Column 3 controls for both factors together, and confirms the significant and positive association with the average change made by others in the group. A one-unit change (on a 5-point scale) in the practice by others in the group is associated with a 0.1-unit change by the firm.
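The two regressors in equation (2) are leave-one-out statistics, which can be sketched as follows for a single practice within a single group. The function and argument names are hypothetical, and the example values are made up; the sketch only illustrates how each firm's own value is excluded from the group mean change and the group baseline maximum.

```python
def peer_regressors(changes, baselines):
    """Leave-one-out regressors for one practice within one group.

    changes[i]   : change in the practice score for firm i
    baselines[i] : baseline score of the practice for firm i
    Returns one (mean change of others, max baseline of others) pair per firm.
    """
    n = len(changes)
    if n < 2 or len(baselines) != n:
        raise ValueError("need at least two firms and equal-length inputs")
    total_change = sum(changes)
    pairs = []
    for i in range(n):
        # Mean change among the other group members, excluding firm i.
        mean_others = (total_change - changes[i]) / (n - 1)
        # Highest baseline score among the other group members.
        max_others = max(b for k, b in enumerate(baselines) if k != i)
        pairs.append((mean_others, max_others))
    return pairs


# Made-up group of three firms for one practice.
print(peer_regressors([1.0, 0.0, 2.0], [2, 4, 3]))
```

Excluding the firm's own value avoids a mechanical correlation between the outcome and the group mean.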
This suggests some coordinated experimentation and learning is taking place within groups, but that group members are not taking existing best practices from other group members across into their own firms.

6. Impacts on Firm Outcomes

6.1 Impact on Employment

Data on employment come from two sources. The first is data from the PILA (Unified Register of Contributions), which is the national information system used by firms to file the mandatory contributions to health, pensions, and disability insurance paid for workers. The second is direct from firm records, collected during our in-person visits to firms. Appendix 8 discusses the two datasets and shows the correlation between the two measures is 0.90. However, both datasets have limitations in terms of coverage: not all firms were able to be matched to the PILA data during requests made to two government agencies, and the data end in February 2017. Firms which refused to supply information in visits made in late 2017/early 2018 do not have data from firm records. Combining the two data series gives us our most comprehensive employment measure, which includes monthly data on 135 firms from January 2013 through February 2017, and for 108 firms through to December 2017. Using this combined data, the left panel of Figure 4 plots the trajectory of mean employment by treatment group, demeaned by the 2013 group means. We see that the individual treatment group reduces employment relative to the control and group treatment groups at the start of the individual intervention, and this persists over time. In contrast, the control and group treatments track one another closely until the group intervention is underway. Then the control group reduces employment and the group treatment does not, so that a persistent gap in employment opens up. The right panel of Figure 4 shows the distribution of changes in employment between the months of February 2013 and February 2017 by treatment status.
The control and individual interventions have changes centered on zero, although differences arise at the top of the distribution, where the control distribution has a small mass of firms that increased employment by 50 or more workers. In contrast, the group treatment has a peak at a positive change in employment, and more of the mass with positive changes. Given the heterogeneity amongst firms in initial employment size, and the differences in coverage of the different data sources, we use firm fixed effects in estimating the treatment impacts. We estimate the following equation for firm i at time t:

$$Y_{i,t} = \alpha_i + \beta_1 \, Individual_i \times During_{i,t} + \beta_2 \, Individual_i \times Post_{i,t} + \gamma_1 \, Group_i \times During_{i,t} + \gamma_2 \, Group_i \times Post_{i,t} + \sum_{s=1}^{T} \delta_s \, 1(s = t) + \varepsilon_{i,t} \tag{3}$$

where the $\alpha_i$ are firm fixed effects, During and Post indicate the periods during the individual or group interventions, and after these interventions respectively, $1(s = t)$ are time fixed effects, Individual and Group denote assignment to the individual and group treatment status respectively, and the standard errors are clustered at the firm level. The randomization triplets are subsumed by the firm fixed effects here. We consider both levels and logs of employment as outcomes. We estimate equation (3) for our full set of data, as well as for balanced panels. Table 5 presents the treatment impacts on employment. The first four columns show the impacts using just the PILA data, columns 5 through 8 using data from firm records, and the last four columns using our most comprehensive measure, which combines the two datasets. The estimated impact of the group treatment is positive in all specifications, and corresponds to an increase of 3 to 7 workers post-intervention, or of 8 to 15 percent using log employment. The point estimates are smallest and not statistically significant when using just the PILA sample, and are statistically significant in 6 out of 8 specifications for the firm data and combined data.
Using the firm data alone, we can reject that the group intervention has the same impact post-intervention as the individual intervention when using levels, but not when we use logs, and we cannot reject equality of treatment effects using the combined dataset. In contrast to the group treatment, we get a mix of statistically insignificant negative and positive coefficients for the individual treatment. Using the combined sample and unbalanced panel, we find a positive impact of the individual treatment on employment at the 10 percent level, but this is not robust to using the balanced panel or to using log employment.

6.2 Impact on Sales

Monthly sales data were collected directly from firm record books, and are converted into millions of real (December 2017) Colombian pesos using the Producer Price Index. We have some months of post-baseline sales data for 145 firms, and data on 99 firms for the balanced panel of all 60 months between January 2013 and December 2017. Figure 5 uses the balanced panel and plots the trajectory of mean real sales by treatment group, demeaned by the 2013 treatment group means (left panel). We see the means of the three treatment groups track each other closely until the group intervention starts. Firms in the group treatment then see mean sales increase relative to the other two groups, with this gap widest in the first six months after treatment, and then closing. The right panel shows the distribution of changes in annual sales between the year 2013 and year 2017. We see the control and individual treatment groups have similar distributions of change (Kolmogorov-Smirnov test of equality p-value 0.855), while we can reject equality of the individual and group distributions (p-value 0.032). The group intervention has more variation in the change of sales, with a few firms experiencing a drop in sales, and more firms also experiencing growth in sales than occurs in the other two groups.
Table 6 estimates equation (3) for the level and log of sales, using firm fixed effects to account for potential baseline differences across treatments that can arise from sample attrition, firm heterogeneity, and the sample size. Columns 1 and 4 use the unbalanced panel, and columns 2, 3, and 5 the balanced panel. The group treatment has positive treatment effects on sales, of 63-71 million COP per month (USD $26,500-$29,900) in levels, or 8 to 9 percent in log terms. This treatment effect is not statistically significant compared to the control group (lowest p-value is 0.12 in column 1), but is statistically different from the individual treatment level effect post-intervention. The individual treatment effects have negative point estimates in level terms, and a point estimate close to zero post-intervention for the balanced panel for log sales.

6.3 Channels of Production Impact

The results on employment and sales suggest that the group intervention may have increased the size of the firm, causing it to employ more people and sell more. In Table 7 we examine different channels through which this increase may have occurred. Column 1 considers the defect rate. Bloom et al. (2013) found quality improvements to be one of the first signs of improvement from better management in their Indian study. We only have defect data in 2017 for 78 of the firms in the study, due to many firms not keeping consistent records on defects. A first point to note is that the defect rates are low (which is one reason some firms do not record them): the control group has a mean defect rate of 0.025 and median rate of 0.007 in 2017, which compares to much higher defect rates in India (5 percent of output was scrapped, after mending of defects was done). The result is that many of the auto parts firms do not have much scope to reduce defects, and we see treatment effects that are all very close to zero and statistically insignificant. Columns 2 and 3 consider monthly inventories.
In India, Bloom et al. (2013) found firms had excess inventory levels, which they reduced when management improved. Large stockpiles of inventories are less common in the auto parts sector, with some firms doing job work and producing upon request. Data are only available for half the sample of firms, due to some firms not keeping records, or changing the units in which they record inventories over time. The control mean level of inventories is equal in value to 1.4 months of mean sales. We see no significant change in inventories, with the sign of the coefficients changing between level and log specifications. However, the confidence intervals are wide, and include the 21 percent reduction in inventories found in Bloom et al. (2013), as well as increases in inventories of more than this magnitude. Columns 4 and 5 consider energy costs, which are another input into producing more. The data here are consistent with the group treatment firms getting larger by using more inputs to produce and sell more. They use more energy both during and post-intervention, with this increase statistically significant when measured in levels during the intervention. The log results suggest firms are using 17 percent more energy, although this is not statistically significant. In contrast, the pattern is more mixed for the individual treatment group, which has a statistically insignificant increase in energy costs when measured in levels, but a statistically insignificant decrease when measured in logs. Column 6 examines whether the improvement in management has resulted in higher labor productivity (measured as real sales per worker). The percent increase in employment for group treated firms is slightly higher than the percent increase in sales, and the result is a small, and statistically insignificant, drop in labor productivity (3 percent). The individual treatment also has a small and negative point estimate on labor productivity.
These results contrast with the 17 percent improvement in productivity found in India by Bloom et al. (2013). However, since the improvement in management in our experiment is only one-third that found in India, a proportional improvement in productivity would be 5.7 percent, which is within the confidence intervals of approximately [-13%, +8%] for the productivity effect found here for each treatment. Finally, columns 6 and 7 examine the extent to which any increase in sales came through exports. We use administrative data on exports, which have the advantage of being available for all firms and all months. Sixty percent of firms exported in at least one month between January 2013 and December, but on average, only 21 percent of firms export in a given month. As a result, most sales are domestic: exports are 0 percent of monthly sales for the median firm, and 3.8 percent for the mean; and even conditional on exporting in a month, exports for the median exporter are only 14 percent of that month’s sales. Column 6 shows that there is a small, negative, and insignificant effect on the extensive margin of whether firms export at all in a given month. Column 7 shows negative and statistically insignificant impacts on the amount exported, conditional on exporting. Thus any gains in sales have come through increased domestic sales, not through more exporting.

6.4 Comparison to Policymaker Expectations

In June 2014, we elicited expectations about the program’s impact on employment and productivity from 15 policymakers drawn from the Ministry of Planning (DNP), Ministry of Commerce and Tourism, SENA, and Program of Productive Transformation (PTP). The expected mean (median) treatment effect for the individual treatment was 5.7% (3%) for employment and 16.3% (10%) for productivity; while for the group treatment the expected mean (median) treatment effect was 3.3% (5%) for employment, and 7.3% (5%) for productivity.
Our estimated treatment effects for the group treatment are similar in magnitude to these estimates, while the individual treatment has under-performed relative to expectations, especially on productivity. Moreover, the policymakers thought the individual treatment would have a larger impact, which is the opposite of what we find. We also asked what size impacts they would require to consider the program a success that could be scaled at the national level: the mean response was 6% for employment for both programs, and 24% for the individual program on productivity, and 13% for the group program. The estimated impact of the group intervention on employment is thus large enough to be considered a success, whereas neither program has enough of an impact on productivity to be considered a success.

6.5 Cost-Benefit

Both the individual and group treatments succeeded to a similar magnitude in improving the set of management practices measured by the Anexo K. The impacts on firm outcomes are less precisely measured, but show increases in firm size for the group treatment, which in some specifications are statistically different from those of the individual treatment. The group treatment cost USD 10,500 per firm for the intervention stage, compared to USD 28,950 per firm for the individual treatment. The group treatment therefore clearly dominates the individual treatment on a cost-benefit basis. It is more difficult to measure whether the group treatment pays for itself, given the uncertainty associated with the sales impact, and that we lack firm profitability data over time. Baseline data suggest that profit margins are 11 percent of sales for the median firm. If we take the estimated group treatment effect on sales of USD $26,500-$29,900 per month, and multiply this by the profit rate, this gives a suggested point estimate of USD $3,000 per month in profits, in which case the group treatment would pay for itself within 4 months.
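The payback arithmetic at the point estimate can be reproduced with a short sketch. It uses only the figures quoted above (a cost of USD 10,500 per firm, an 11 percent profit margin, and a monthly sales effect of USD 26,500 to 29,900) and assumes, as the text does, that the baseline median profit margin applies to the additional sales. The function and variable names are illustrative.

```python
GROUP_COST_USD = 10_500              # cost per firm of the group treatment
PROFIT_MARGIN = 0.11                 # median baseline profit margin on sales
SALES_EFFECT_USD = (26_500, 29_900)  # estimated monthly sales gain (low, high)


def payback_months(cost, monthly_sales_gain, margin):
    """Months of extra profit needed to recover the intervention cost.

    Assumes the profit margin applies unchanged to the additional sales.
    """
    monthly_profit_gain = monthly_sales_gain * margin
    if monthly_profit_gain <= 0:
        raise ValueError("the profit gain must be positive to pay back the cost")
    return cost / monthly_profit_gain


for gain in SALES_EFFECT_USD:
    months = payback_months(GROUP_COST_USD, gain, PROFIT_MARGIN)
    print(f"sales gain {gain}: profit gain {gain * PROFIT_MARGIN:.0f}/month, "
          f"payback in {months:.1f} months")
```

Both ends of the range imply a monthly profit gain of roughly USD 3,000 and a payback period of between three and four months.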
If the sales effect is one standard error below the point estimate, then the estimated profit effect would be approximately $750 per month, and it would pay for itself within 14 months. Since 84 percent of the distribution of treatment effects are at least this high, this suggests the group treatment would pay for itself in just over a year, and within the period over which we measure post-intervention outcomes. These cost-benefit calculations would look less promising from a government policy perspective if the gains to treated firms came from them capturing sales from control firms or from other firms outside of the experimental sample. At least within our experimental sample, firms specialize in different products (which is what allowed groups to be formed easily without having firms who are competitors), suggesting that internal validity of our estimates should not be invalidated by such spillovers. Moreover, as noted in our discussion of the setting, the sector is one where the main competitors to most firms are imports, which became more expensive with the depreciation of the peso. It therefore seems likely that any sales gains achieved by the group treatment would have mostly come from taking business away from imports.

6.6 Why did the group treatment do better than the individual?

The group and individual treatments led to similar improvements in management practices, yet we only find evidence of improvements in firm outcomes for the group treatment. What explains this difference? A first possibility is that the two treatments did have similar effects, and it is just small sample sizes coupled with firm heterogeneity that prevents us from detecting this effect in the individual treatment group.
Although the point estimates show larger impacts from the group treatment, we can only weakly reject equality of the treatment effects of the two interventions when looking at some specifications in levels of employment or sales, whereas when using log outcomes or looking at production channels, we cannot reject equality of impacts. A second possibility is that the group treatment may have a larger impact because it either provides a way for the improvements in management to persist longer, or because it delivers additional benefits to firms beyond the improvements they obtain in management practices. To investigate this possibility, group firms were asked approximately one year after the intervention whether they still met with other group members, and what the main benefit of meeting in a group had been. None of the firms continued formally meeting together as a group, but 54 percent said they still communicate occasionally with other group members. The main benefit they saw of meeting in a group was to interchange experiences, noting the value of seeing other firms facing similar problems, and how others had solved these problems. Only four firms said they saw a possibility of using the group to find a supplier or customer, with only one giving an example of this actually happening, saying it was short-lived. This suggests that if the group treatment is having an additional effect, it is more through providing advice and specific solutions to problems firms face (as in Brooks et al., 2017) rather than through direct business relationships.

7. Conclusions

The experiment of Bloom et al. (2013) provided a proof-of-concept that poor management could be improved. But moving from a pilot demonstration to a scalable program of management improvement requires lowering the cost of delivery and testing whether such a program can be locally implemented when subject to the constraints imposed by government bureaucracy.
As is common with other social programs (Rossi 1987, Vivalt 2017), impacts on management are smaller when delivered by a program run by a government at scale than under a small researcher pilot. Yet, both the individual and group treatments were able to improve management practices by 8 to 10 percentage points, with this resulting in an increase in firm size under the group treatment at least. As a result, the group treatment model pioneered here clearly dominates the individual consulting model on a cost-benefit basis, and offers a promising approach to scaling management.

As with firms, good management also matters for the public sector (Rasul and Rogger, 2018), and there were several challenges to implementation. These included delays in contracts which caused challenges for data collection, and delays in implementation which likely reduced the effectiveness of the programs implemented. It is also possible that contracting only a single organization to implement the intervention may have led to hold-up problems and removed the performance incentives that competition among consulting firms could have provided. A government contemplating scaling up management support programs in the least costly way therefore should consider the group extension approach, but pay careful attention to the quality of their own management in doing so.

References

Banerjee, Abhijit, Rukmini Banerji, James Berry, Esther Duflo, Harini Kannan, Shobhini Mukerji, Marc Shotland and Michael Walton (2017) “From Proof of Concept to Scaleable Policies: Challenges and Solutions, with an Application”, Journal of Economic Perspectives 31(4): 73-102.

BBVA Research (2018) “Situación Automotriz 2018 Colombia”, BBVA Research, March.

Bloom, Nicholas, and John Van Reenen (2007) “Measuring and Explaining Management Practices across Firms and Countries”, Quarterly Journal of Economics 122(4): 1341-1408.

Bloom, Nicholas, Benn Eifert, Aprajit Mahajan, David McKenzie, and John Roberts (2013) “Does Management Matter? Evidence from India”, Quarterly Journal of Economics 128(1): 1-51.

Bloom, Nicholas, Aprajit Mahajan, David McKenzie, and John Roberts (2018) “Do Management Interventions Last? Evidence from India”, World Bank Policy Research Working Paper no. 8339.

Bloom, Nicholas, Raffaella Sadun, and John Van Reenen (2016) “Management as a Technology”, Mimeo. Stanford.

Bloom, Nicholas, Erik Brynjolfsson, Lucia Foster, Ron Jarmin, Megha Patnaik, Itay Saporta-Eksten, and John Van Reenen (2018) “What Drives Differences in Management Practices?”, Mimeo. Stanford.

Bold, Tessa, Mwangi Kimenyi, Germano Mwabu, Alice Ng'ang'a and Justin Sandefur (2018) “Experimental Evidence on Scaling Up Education Reforms in Kenya”, Journal of Public Economics, forthcoming.

Brooks, Wyatt, Kevin Donovan and Terence Johnson (2017) “Mentors or Teachers? Microenterprise Training in Kenya”, American Economic Journal: Applied Economics, forthcoming.

Bruhn, Miriam, Dean Karlan, and Antoinette Schoar (2018) “The Impact of Consulting Services on Small and Medium Enterprises: Evidence from a Randomized Trial in Mexico”, Journal of Political Economy 126(2): 635-87.

Cai, Jing and Adam Szeidl (2018) “Interfirm Relationships and Firm Performance”, Quarterly Journal of Economics 133(3): 1229-1282.

Chatterji, Aaron, Solene Delecourt, Sharique Hasan and Rembrand Koning (2018) “Learning to Manage: A Field Experiment in the Indian Start-up Sector”, Harvard Business School Working Paper no. 17-100.

Dalton, Patricio, Julius Rüschenpöhler, Burak Uras and Bilal Zia (2018) “Learning Business Practices from Peers: Experimental Evidence from Small-Scale Retailers in an Emerging Market”, https://pure.uvt.nl/portal/files/23354244/2_WP_Dalton_et_al_Learning_from_Peers_DFID.pdf

Higuchi, Yuki, Vu Hoang Nam and Tetsushi Sonobe (2017) “Management skill, entrepreneurial motivation, and enterprise survival: Evidence from randomized experiments and repeated surveys in Vietnam”, Mimeo. https://www.canr.msu.edu/afre/uploads/files/Higuchi_Paper_1217.pdf

Lafortune, Jeanne, Julio Riutort and José Tessada (2018) “Role models or individual consulting: The impact of personalizing micro-entrepreneurship training”, American Economic Journal: Applied Economics, forthcoming.

Londoño, Andrés (2017) “Low Productivity: the Elephant in the Room in Colombia’s Minimum Wage Debate”, Panam Post, November 28. https://panampost.com/andres-londono/2017/11/28/low-productivity-minimum-wage-debate/

McKenzie, David (2012) “Beyond Baseline and Follow-up: The case for more T in experiments”, Journal of Development Economics 99(2): 210-21.

McKenzie, David and Christopher Woodruff (2014) “What are we learning from business training evaluations around the developing world?”, World Bank Research Observer 29(1): 48-82.

Proexport Colombia (2012) “Automotive Industry in Colombia”, http://www.investincolombia.com.co/attachments/Automotive%20Industry%20in%20Colombia%20-%20April%202012.pdf [accessed February 16, 2015]

Rasul, Imran and Daniel Rogger (2018) “Management of Bureaucrats and Public Service Delivery: Evidence from the Nigerian Civil Service”, Economic Journal 128(608): 413-446.

Reina, Mauricio, Sandra Oviedo and Jonathan Moreno (2014) “Importancia Económica del Sector Automotor en Colombia”, Fedesarrollo, Bogota.

Rossi, Peter (1987) “The Iron Law Of Evaluation And Other Metallic Rules”, pp. 3-20 in Joan Miller and Michael Lewis (ed.) Research in Social Problems and Public Policy volume 4. Jai Press Inc.

Vivalt, Eva (2017) “How much can we generalize from impact evaluations?”, Mimeo. ANU.

Figure 1: Trajectory of Impacts on Management Practices
Notes: Means shown by treatment status. Anexo K was measured at baseline (2013) for all firms.
It was then measured monthly during implementation of the individual and group treatments, along with a one-year follow-up, and was measured for the control group at the same time as the end of the individual intervention, and at the time of the individual one-year follow-up. Vertical lines indicate approximate periods of implementation of the individual intervention (first two lines) and group intervention (second two lines). Data are for the unbalanced panel, although the figure looks similar for the balanced panel.

Figure 2: Impact on Distribution of Management Practices
Notes: Kernel densities shown of Anexo K management practices at baseline, and at last follow-up, for the balanced panel of firms for which these practices were measured at all points in time. Kolmogorov-Smirnov tests of equality of distributions at baseline have p-values 0.210 (control vs individual), 0.998 (control vs group), and 0.422 (individual vs group); and at endline have p-values 0.004 (control vs individual), 0.003 (control vs group), and 0.643 (individual vs group).

Figure 3: The Individual and Group Treatments Improved Specific Practices to a Similar Extent
Notes: Empty circles denote that the difference between the two treatments is not statistically significant at the 5% level; solid circles indicate that the difference between the two treatments is statistically significant at the 5% level; solid diamonds indicate that the difference is statistically significant at the 1% level. Correlation between group treatment effect and individual treatment effect is 0.71. 45 degree line shown.

Figure 4: Trajectory of Employment and Distribution of Changes in Employment
Notes: Employment data are drawn from the combination of firm data and the PILA, and are shown for the 135 firms that have data for every month between Jan 2013 and Feb 2017. Left panel demeans employment by the treatment group mean in 2013.
Vertical lines in left panel show the period of the individual intervention (first two lines) and group intervention (second two lines). Right panel shows the kernel density of the change in employment Feb 2013 to Feb 2017 by treatment status.

Figure 5: Trajectory of Sales and Distribution of Changes in Sales
Notes: Sales are reported in millions of real (December 2017) Colombian pesos, and are shown for the 99 firms that have data for every month between Jan 2013 and Dec 2017. Left panel demeans sales by the treatment group mean in 2013. Vertical lines in left panel show the period of the individual intervention (first two lines) and group intervention (second two lines). Right panel shows the kernel density of the change in log sales for the year 2017 compared to the year 2013 by treatment status.

Table 1: Baseline Balance

                                        Overall Sample            Means by Treatment Group        p-value for testing equality
                                      Mean        S.D.     Control  Individual       Group   Control v   Control v       All 3
                                                             Group  Consulting  Consulting  Individual       Group       Equal
Variables used for matched triplets
Number of Employees                     59          53          64          61          53       0.841       0.285       0.464
Small Firm (<=50 employees)           0.59        0.49        0.60        0.58        0.58       0.845       0.845       0.975
Medium Firm (>50 employees)           0.41        0.49        0.40        0.42        0.42       0.845       0.845       0.975
Cundinamarca                          0.48        0.50        0.55        0.49        0.40       0.564       0.122       0.291
Valle                                 0.16        0.37        0.17        0.09        0.23       0.255       0.469       0.157
Labor Productivity                      31          18          26          32          34       0.059       0.027       0.030
Financing Practices                     51          14          51          48          53       0.225       0.508       0.164
Human Resources Practices               43          12          42          42          43       0.897       0.686       0.843
Logistics Practices                     46          13          49          43          47       0.017       0.457       0.050
Marketing Practices                     46          15          47          43          46       0.190       0.687       0.409
Production Practices                    47          13          47          47          46       0.963       0.881       0.989
Variables not explicitly balanced on
Level 2 Supplier                      0.94        0.24        0.94        0.94        0.92       1.000       0.699       0.909
Metal Products                        0.60        0.49        0.75        0.51        0.53       0.009       0.015       0.011
Plastic Products                      0.18        0.38        0.15        0.17        0.21       0.794       0.452       0.749
Firm Age (Years)                        24          14          27          23          22       0.177       0.058       0.147
Anexo K score                           46          10          47          45          47       0.200       0.955       0.353
USD Sales in 2013                  2715957     3387147     2134280     3345606     2703821       0.098       0.303       0.196
Export at all in 2013                 0.45        0.50        0.47        0.42        0.45       0.562       0.847       0.839
Sample Size                            159                      53          53          53

Table 2: Impact on Management Practices

                                               Overall     Finance          HR   Logistics   Marketing  Production
                                                 Score   Practices   Practices   Practices   Practices   Practices
Panel A: Unbalanced Panel
Individual Treatment*During Intervention      9.703***    9.644***   10.793***    8.708***   10.637***    5.696***
                                               (1.370)     (1.852)     (1.822)     (1.603)     (2.280)     (1.806)
Individual Treatment*Post Intervention        9.620***    9.712***    8.974***    8.585***    9.451***    8.488***
                                               (1.830)     (2.413)     (2.508)     (2.457)     (2.466)     (1.993)
Group Treatment*During Intervention          11.971***   13.841***   12.249***    9.327***   11.899***   11.798***
                                               (1.660)     (2.057)     (2.078)     (2.047)     (2.599)     (1.993)
Group Treatment*Post Intervention             8.544***    9.820***    7.156***     5.860**    9.046***   10.694***
                                               (1.894)     (2.306)     (2.655)     (2.539)     (2.637)     (2.048)
Sample Size                                        225         226         226         225         226         225
P-value: Individual=Group During                 0.145       0.027       0.451       0.753       0.568       0.002
P-value: Individual=Group Post                   0.533       0.958       0.365       0.235       0.864       0.315
Control Mean                                     55.98       59.18       52.39       57.75       54.80       55.79
Control SD                                       10.79       13.79       11.25       14.33       12.58       11.19
Panel B: Balanced Panel
Individual Treatment*During Intervention      9.861***   10.608***   11.111***    8.639***    9.072***    6.803***
                                               (1.756)     (2.277)     (2.328)     (1.962)     (2.985)     (2.010)
Individual Treatment*Post Intervention        9.757***   10.118***    9.463***    8.629***    8.568***    8.935***
                                               (2.014)     (2.650)     (2.780)     (2.646)     (2.723)     (2.078)
Group Treatment*During Intervention          12.118***   15.094***   12.227***    8.942***   11.309***   12.688***
                                               (2.029)     (2.373)     (2.583)     (2.413)     (3.349)     (2.279)
Group Treatment*Post Intervention             8.889***    9.912***     7.502**     6.022**    9.166***   11.513***
                                               (2.067)     (2.490)     (2.912)     (2.729)     (2.920)     (2.157)
Sample Size                                        202         202         202         202         202         202
P-value: Individual=Group During                 0.152       0.027       0.555       0.881       0.341       0.006
P-value: Individual=Group Post                   0.627       0.925       0.343       0.274       0.813       0.248
Control Mean                                     55.98       59.18       52.39       57.75       54.80       55.79
Control SD                                       10.79       13.79       11.25       14.33       12.58       11.19

Notes: Panel A is for the 124 firms for which Anexo K management practices are measured post-baseline, panel B for the 101 firms for which practices are measured both during and after intervention. Robust standard errors in parentheses, clustered at the firm level. *, **, *** denote significance at the 10, 5, and 1 percent levels respectively. Anexo K management practices are 141 management practices divided into five sub-areas. Ancova estimation controls for baseline (December 2013) mean, and time fixed effects included, along with randomization triplet dummies. Note: Group treatment moved back one period, since no control group data collected during 2016.

Table 3: Robustness of Impact on Management Practices to different weighting schemes

                                                 Overall     Principal         Lasso         Lasso         Lasso
                                                 Anexo K     component   Log Employ.  Productivity           WMS
Panel A: Unbalanced Panel
Individual Treatment*During Intervention        9.703***      6.014***      0.227***      7.065***       0.079**
                                                 (1.370)       (0.946)       (0.085)       (1.238)       (0.036)
Individual Treatment*Post Intervention          9.620***      6.012***       0.286**      8.297***      0.140***
                                                 (1.830)       (1.217)       (0.115)       (1.811)       (0.041)
Group Treatment*During Intervention            11.971***      7.266***      0.403***      9.269***      0.240***
                                                 (1.660)       (1.177)       (0.090)       (1.463)       (0.040)
Group Treatment*Post Intervention               8.544***      5.512***      0.301***      7.596***      0.225***
                                                 (1.894)       (1.220)       (0.106)       (1.706)       (0.040)
Sample Size                                          225           200           213           217           221
P-value: Individual=Group During                   0.145         0.208         0.020         0.111         0.000
P-value: Individual=Group Post                     0.533         0.658         0.862         0.670         0.043
Control Mean                                       55.98          5.59          2.46         43.01          0.93
Control SD                                         10.79          6.03          0.47          9.66          0.20
Panel B: Balanced Panel
Individual Treatment*During Intervention        9.861***      6.048***       0.273**      7.302***       0.100**
                                                 (1.756)       (1.327)       (0.119)       (1.602)       (0.049)
Individual Treatment*Post Intervention          9.757***      5.972***       0.309**      8.451***      0.148***
                                                 (2.014)       (1.402)       (0.122)       (2.003)       (0.044)
Group Treatment*During Intervention            12.118***      7.494***      0.445***      9.624***      0.263***
                                                 (2.029)       (1.525)       (0.118)       (1.781)       (0.051)
Group Treatment*Post Intervention               8.889***      5.736***      0.361***      8.009***      0.242***
                                                 (2.067)       (1.416)       (0.111)       (1.914)       (0.043)
Sample Size                                          202           178           190           194           198
P-value: Individual=Group During                   0.152         0.174         0.032         0.114         0.000
P-value: Individual=Group Post                     0.627         0.844         0.539         0.797         0.032
Control Mean                                       55.98          5.59          2.46         43.01          0.93
Control SD                                         10.79          6.03          0.47          9.66          0.20

Notes: Panel A is for the 124 firms for which Anexo K management practices are measured post-baseline, panel B for the 101 firms for which practices are measured both during and after intervention. Robust standard errors in parentheses, clustered at the firm level. *, **, *** denote significance at the 10, 5, and 1 percent levels respectively. Anexo K management practices are 141 management practices divided into five sub-areas. Ancova estimation controls for baseline (December 2013) mean, time and triplet fixed effects. Principal Component takes the first principal component of the 141 practices. The remaining columns use Lasso to choose the subset of practices that best predict log baseline employment, log labor productivity, and the WMS baseline management score respectively, with post-Lasso coefficients then providing the weightings on the different practices used.

Table 4: Correlation of Practice Changes Within Groups
Dependent Variable: Change in Practice between Baseline and Endline

                                                                   (1)       (2)       (3)
Mean Change in Practice for other Group Members                 0.100*               0.104**
                                                               (0.050)               (0.049)
Maximum Baseline Level of Practice for Other Group Members                 0.001     0.014
                                                                         (0.021)   (0.019)
Sample Size (Firms*Practices)                                     5069      5210      5069
Mean Change in Practices                                         0.168     0.171     0.168

Notes: Regression uses the stacked panel of 141 practices for firms in the group treatment. Robust standard errors in parentheses, clustered at the firm level. *, **, and *** denote significance at the 10, 5, and 1 percent levels respectively.
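
The two regressors in Table 4 are defined relative to a firm's fellow group members. The notes do not spell out how they were built; the Python sketch below shows one natural reading, a leave-one-out mean and maximum computed practice by practice. The function names and the three-firm example data are hypothetical and are not taken from the study.

```python
from statistics import mean


def others_mean_change(changes_by_firm):
    """For each firm and practice, average the change among the OTHER group members.

    changes_by_firm maps firm id -> list of per-practice changes,
    with practices in the same order for every firm.
    """
    result = {}
    for firm in changes_by_firm:
        others = [c for f, c in changes_by_firm.items() if f != firm]
        # zip(*others) walks practice by practice across the other firms
        result[firm] = [mean(vals) for vals in zip(*others)]
    return result


def others_max_baseline(baseline_by_firm):
    """For each firm and practice, take the highest baseline level among the OTHER members."""
    result = {}
    for firm in baseline_by_firm:
        others = [b for f, b in baseline_by_firm.items() if f != firm]
        result[firm] = [max(vals) for vals in zip(*others)]
    return result


# Hypothetical group of three firms and two practices (illustrative values only).
changes = {"A": [1, 0], "B": [0, 0], "C": [1, 1]}
baseline = {"A": [2, 3], "B": [4, 1], "C": [3, 3]}

loo_mean = others_mean_change(changes)
loo_max = others_max_baseline(baseline)
```

Excluding the firm's own value means the regressor does not mechanically contain the outcome it is used to explain.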
Table 5: Impact on Employment

                                                                         PILA Data                               Firm Data                           Combined Data
                                                    Employment      Log Employment          Employment      Log Employment          Employment      Log Employment
                                                 (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)      (10)      (11)      (12)
Individual Treatment*During Intervention      -0.576    -1.595    -0.037    -0.100    -3.012    -5.627    -0.017    -0.054    -3.067    -2.791     0.006    -0.015
                                             (3.624)   (3.901)   (0.075)   (0.066)   (2.912)   (3.499)   (0.040)   (0.043)   (3.137)   (3.170)   (0.042)   (0.040)
Individual Treatment*Post Intervention         0.223    -1.510    -0.015    -0.077    -2.150    -4.121     0.040     0.026     1.168    -0.001    0.120*     0.063
                                             (4.559)   (5.072)   (0.090)   (0.083)   (3.742)   (4.261)   (0.052)   (0.057)   (3.900)   (3.958)   (0.068)   (0.055)
Group Treatment*During Intervention            1.372     1.481     0.055     0.042    3.837*     2.264   0.102**    0.085*    5.155*     3.860    0.132*     0.057
                                             (3.868)   (4.642)   (0.103)   (0.104)   (2.268)   (2.582)   (0.039)   (0.046)   (3.042)   (3.155)   (0.075)   (0.065)
Group Treatment*Post Intervention              3.200     3.522     0.100     0.077   5.874**     4.233  0.128***   0.129**   7.296**    6.854*    0.139*     0.111
                                             (4.536)   (5.018)   (0.113)   (0.121)   (2.849)   (3.264)   (0.049)   (0.056)   (3.345)   (3.467)   (0.075)   (0.074)
Balanced Panel                                    No       Yes        No       Yes        No       Yes        No       Yes        No       Yes        No       Yes
Sample Size (N*T)                               8522      6944      8513      6944      7299      5760      7298      5759      8725      7877      8719      7876
Number of Firms                                  156       112       156       112       145        96       145        96       157       135       157       135
P-value: Individual=Group During               0.724     0.627     0.473     0.311     0.058     0.062     0.033     0.023     0.062     0.140     0.131     0.351
P-value: Individual=Group Post                 0.602     0.414     0.372     0.250     0.072     0.097     0.159     0.125     0.209     0.173     0.826     0.567
Control Mean in 2013                            49.0      52.0     3.432     3.637      56.1      49.5     3.666     3.591      56.1      58.6     3.666     3.726
Control S.D. in 2013                            42.8      40.0     1.099     0.844      51.3      41.4     0.864     0.789      51.3      52.7     0.864     0.839

Notes: Fixed effects regressions with time and firm fixed effects. Standard errors clustered at the firm level in parentheses. *, **, *** denote significance at the 10, 5, and 1 percent levels. Columns 1-4 are formal employment, taken from administrative records of the PILA. Data cover Jan 2012 to Feb 2017. Columns 5-8 are firm employment data, taken from firm records. Data cover Jan 2013-Dec 2017. Columns 9-12 combine the two datasets to maximize the number of firms with employment data, and cover Jan 2013-Dec 2017.

Table 6: Impact on Sales

                                                           Monthly Sales   Log Monthly Sales
                                                 (1)       (2)       (3)       (4)       (5)
Individual Treatment*During Intervention         -18       -38       -22     0.064    -0.033
                                                (29)      (35)      (30)   (0.051)   (0.043)
Individual Treatment*Post Intervention           -54       -75       -38     0.034     0.003
                                                (59)      (65)      (37)   (0.061)   (0.063)
Group Treatment*During Intervention               52        51        44     0.075     0.079
                                                (52)      (59)      (53)   (0.059)   (0.065)
Group Treatment*Post Intervention                 71        68        63     0.088     0.074
                                                (46)      (50)      (48)   (0.069)   (0.075)
Balanced Panel                                    No       Yes       Yes        No       Yes
Winsorized at the 99th percentile                 No        No       Yes        No        No
Sample Size (N*T)                               7343      5940      5940      7335      5932
Number of Firms                                  145        99        99       145        99
P-value: Individual=Group During               0.263     0.222     0.305     0.901     0.190
P-value: Individual=Group Post                 0.109     0.095     0.099     0.503     0.407
Control Mean in 2017                             388       407       407     5.298     5.339

Notes: Coefficients are from fixed effects regressions with time and firm fixed effects, with standard errors clustered at the firm level. *, **, *** denote significance at the 10, 5, and 1 percent levels.
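
The third column of Table 6 winsorizes monthly sales at the 99th percentile. As a rough illustration of that transformation, the Python sketch below caps values above the 99th percentile at the percentile value. Two choices here are assumptions rather than details given in the notes: the nearest-rank definition of the percentile, and capping only the upper tail over a single pooled list of observations. The sample data are made up.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank definition (an assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def winsorize_upper(values, pct=99):
    """Cap every value above the pct-th percentile at that percentile; order is preserved."""
    cap = percentile_nearest_rank(values, pct)
    return [min(v, cap) for v in values]


# Made-up monthly sales: 99 ordinary observations and one extreme outlier.
sales = list(range(1, 100)) + [10_000]
capped = winsorize_upper(sales)
```

Winsorizing keeps the outlying firm-month in the sample but limits its leverage, which is why the sample size in that column is unchanged relative to the unwinsorized balanced panel.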
Table 7: Channels of Production Impact

                                                 Defect  Inventories  Inventories Energy Costs Energy Costs  Labor Productivity    Export at          Log
                                                   Rate       Levels         Logs       Levels         Logs    Log Sales/Worker          all      exports
Individual Treatment*During Intervention         -0.008          -63       -0.185          544       -0.079               0.016        0.018       -0.116
                                                (0.008)         (75)      (0.224)        (926)      (0.063)             (0.046)      (0.019)      (0.211)
Individual Treatment*Post Intervention           -0.008          -78        0.118         1430       -0.038              -0.024       -0.009       -0.108
                                                (0.005)        (180)      (0.268)       (1079)      (0.159)             (0.054)      (0.026)      (0.195)
Group Treatment*During Intervention               0.000           79        0.049       1718**        0.155              -0.003       -0.017       -0.271
                                                (0.004)        (103)      (0.189)        (831)      (0.094)             (0.049)      (0.026)      (0.170)
Group Treatment*Post Intervention                -0.005           28       -0.169         1072        0.156              -0.033       -0.039       -0.114
                                                (0.005)        (121)      (0.259)        (821)      (0.145)             (0.059)      (0.028)      (0.137)
Sample Size (N*T)                                  3879         3875         3849         5121         5121                5591         8904         1983
Number of Firms                                      78           76           76           97           97                 100          159           96
P-value: Individual=Group During                  0.400        0.199        0.332        0.379        0.063               0.762        0.251        0.586
P-value: Individual=Group Post                    0.600        0.652        0.350        0.761        0.422               0.897        0.311        0.978
Control Mean in 2017                              0.025          554        5.150         8564        8.063               1.771        0.212        9.602

Notes: Regressions control for firm and time fixed effects, and are restricted to samples with data available in December 2017. Defect rate is the proportion of production that is faulty; inventories are in millions of real (December 2017) pesos; energy costs are in thousands of real (December 2017) pesos. Labor productivity is defined as log real sales (in millions of pesos) per worker. Export at all is a dummy variable that takes value one if the firm exported directly abroad in the past month, and zero otherwise; Log exports is the log of the USD value of the amount exported in the month, and is conditional on exporting taking place. Standard errors clustered at the firm level. *, **, and *** denote significance at the 10, 5, and 1 percent levels respectively.
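
The notes to Table 7 define three of the outcomes in words. A small Python sketch of those definitions follows. The function and argument names are hypothetical, and treating any positive export value as "exported" is an assumption about how the dummy was coded.

```python
import math
from typing import Optional


def labor_productivity(real_sales_millions: float, workers: int) -> float:
    """Log of real monthly sales (millions of pesos) per worker."""
    return math.log(real_sales_millions / workers)


def export_at_all(exports_usd: float) -> int:
    """1 if the firm exported in the month (assumed: any positive value), else 0."""
    return 1 if exports_usd > 0 else 0


def log_exports(exports_usd: float) -> Optional[float]:
    """Log USD exports, defined only for months in which exporting took place."""
    return math.log(exports_usd) if exports_usd > 0 else None


# Illustrative firm-months (values are invented).
exporter = {"sales": 500.0, "workers": 50, "exports": 20_000.0}
non_exporter = {"sales": 120.0, "workers": 120, "exports": 0.0}

exporter_lp = labor_productivity(exporter["sales"], exporter["workers"])
non_exporter_log_x = log_exports(non_exporter["exports"])
```

Because log exports are undefined in months without exports, that column of Table 7 rests on far fewer observations (1,983 firm-months) than the export dummy (8,904).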
ONLINE APPENDIX

Appendix 1: Examples of Products Manufactured
Appendix 2: Timeline
Appendix 3: Data Appendix
Appendix 4: Drop-out and Attrition
Appendix 5: Impacts on Individual Management Practices
Appendix 6: Robustness of Management Improvement to Sample Attrition
Appendix 7: Impacts on World Management Survey and MOPS management measures
Appendix 8: Comparison of PILA and Firm Employment Data

Appendix 1: Examples of Products Manufactured

Air Filters
Glass Panels
Rubber parts
Metal parts
Plastic parts
Tires
Injection molding/cushioning
GPS tracking services

Appendix 2: Timeline

April 12, 2012: Pilot program officially launched and firms invited to apply
June 25, 2012: Deadline for firms to apply to the program
June 11, 2013: Diagnostic phase starts
October 30, 2013: Diagnostic phase ends
November, 2013: Random assignment to treatment status
2013: World Management Survey administered to subsample of 72 firms with 40+ workers, as well as to random sample of 180 firms representative of Colombian manufacturing sector
March-November 2014: Individual Consulting Intervention
September 2015-April 2016: Group Consulting Intervention
November to December 2015: Round 1 of firm data collection (individual, group and control treatment)
January to February 2016: Round 2 of firm data collection (individual and control treatment)
March to April 2016: Round 3 of firm data collection (control treatment)
June 2016: Round 4 of firm data collection (group treatment)
November 2016: Second round of World Management Survey administered
November 2017-July 2018: Last round of firm data collection from firms

Note: firm data collection would collect all months of data available from firm records during in-person firm visits. Timing of when this was extracted from firms varied according to CNP’s contractual agreements, in which they were paid for batches of data collection at a time.

Appendix 3: Data Appendix

A3.A. Management practices indicators: The 141 management practices defined by CNP can be divided into five main areas: Finance, Production, Logistics, HR, Marketing. Each of these areas can itself be divided into five to eight sub-areas. The score of the five main areas is the average of the score of their sub-areas. Below we discuss each of these sub-areas and explain which practices were considered to calculate their score.

At the most basic level, each single practice is graded on the following scale: 1 = “Not existing”, 2 = “In construction”, 3 = “Formalized”, 4 = “Implemented”, 5 = “Operating under control”. For some indicators, the 1 to 5 scale does not exactly refer to the implementation stage of a practice; instead it indicates how developed or optimized a specific aspect is, for instance whether strategic goals and individual responsibilities are clear to each worker. Such information was collected in three stages: during the diagnostic phase, during the intervention, and once, a year after the intervention.

Human Resources

i. Strategic objectives leverage on people’s talent
The first aspect of Human Resources relates to the alignment of employees’ objectives with corporate strategy, and to the clarity of such objectives for each employee. Here we consider four components. The first one evaluates how strategic objectives leverage on people’s and teams’ talent. The second component assesses whether there are human talent development plans, and whether these leverage on corporate strategy. The third component assesses whether a strategic plan is defined that includes clear objectives and goals concerning human talent. The last component assesses whether the skill development plans are also defined for the operational level.

ii. Competency-based management model for human talent development
The focus of this measure is on whether the company manages employee competences, based on the business strategy, in order to develop human talent.
It is comprised of two measures. The first one assesses whether human resources are monitored based on their impact on the strategic objectives of the organization. The second component addresses the development of work profiles, which must be defined and aligned with business competencies.

iii. Organizational structure prepared to contribute to the achievement of strategic goals
The third sub-area evaluates whether the formal and informal structure of the organization allows the realization of corporate strategy. Is there a formally defined structure? Are all roles well defined at every level of the organization? Three measures are taken into consideration. The first one evaluates if the management’s focus is on processes which are aligned with the strategy of the firm. The second one assesses whether a communication system between the different processes of the organization has been developed. The last measure assesses whether a communication system between the different levels of the organization has been developed.

iv. Program of human talent development (according to business competences)
This measure evaluates how the organization works on building and retaining human talent to achieve a competitive advantage over the competition. Two components are considered: management of development plans (career plans) for employees at managerial level, and the level of application of the sector’s technical norms for the development of technical operational competences.

v. Organizational climate
The focus of this sub-area is the management of the work climate. Work climate must be appropriate for the development of Human Capital and directed towards the achievement of corporate strategy. We consider three components. Is there a culture of monitoring work climate, as a strategic lever? Are there programs to improve work climate? At which level are risks for health and safety controlled?

vi. Social responsibility within the enterprise
Here we evaluate how the company manages its internal social responsibilities. This measure is comprised of three components. The first one assesses whether there are programs of improvement of the family environment of employees, in order to incentivize their productivity. The second one verifies whether a formal contracting system is in place, which generates wellbeing and productivity in workers. The last one evaluates the implementation of a system of recognition and retribution of new ideas and improvement suggestions at the operational level.

vii. Promotion of an open-communication/high-performance organizational culture, and of a culture of high personal involvement
Three measures are considered for this indicator. Did the company develop a culture of control and periodic monitoring of result achievement? How developed is the performance-based reward system for the management? How developed is the performance-based reward system for employees at the operational level?

Production

i. Alignment of functions at the operational, managerial and directive level
The first sub-area of Production focuses on whether all people working in the plant know the corporate strategy and work to realize it. To achieve this, it is necessary that all workers and processes have improvement goals aligned with corporate strategy. This measure is comprised of five components. The first two evaluate the implementation and monthly monitoring of strategic goals between the Plant Manager and his/her supervisor. The third and fourth components assess whether strategic goals and individual responsibilities are clear to each worker, and whether each worker has improvement goals. The last component assesses whether the performance of teams at the operational level is evaluated based on the strategic goals.

ii. Definitions and management of the most important operational processes
Here we evaluate how operational processes are defined and managed, from the order to the delivery of the final product. Do they allow the strategy to be accomplished (Standards, Policies, Roles, 5s, Layout, Established Processes)? This sub-area includes six components. The first one evaluates whether processes are well identified and have a proper description (VSN, SIPOC). The second one assesses whether the plant layout allows optimal material flow. The third one concerns the implementation of a 5S program in the plant. The fourth one evaluates how bottlenecks are identified and managed. The last two components evaluate standards, specifications and work instructions used by workers, and how these are verified by supervisors.

iii. Formal method to measure and manage the plant’s efficiency (Waste, Hours paid/Service capacity, machinery’s efficiency)
The third sub-area evaluates how the company measures and manages the main KPIs of the plant, such as team efficiency, efficiency in the use of material, response time, etc. The first component of this sub-area concerns the monthly measurement of the plant’s KPIs (OEE, Waste, Defects, Lead time, Others). The second indicator concerns weekly or bi-weekly management of KPIs’ goals (OEE, Waste, Defects, Lead Time, Others). The third one assesses whether improvement programs for KPIs (times and quality) are developed applying instruments of plant management. The last one assesses whether a culture of daily collection of facts and data is in place, in order to demonstrate improvement in processes.

iv. Collection of information regarding results, continual improvement, and performance of processes
Here we assess how the company is managing data and information regarding processes, results and continuous improvement.
The four components of this sub-area are the following: Is there a culture of visual management with daily-updated graphs of machinery performance? Are duration and quality of each process recorded daily by the responsible worker? Does the Administrative Management make sure that monitoring instruments are in good condition and precise? Is there a monitoring and sampling plan to capture the information necessary for the improvement of processes?

v. Process to detect and solve anomalies in the execution of tasks
The focus of this sub-area is to evaluate how anomalies in processes are managed within the plant. It is comprised of five components. The first one assesses whether there is a mechanism so that workers report anomalies of time and quality to their supervisors. The second one assesses whether criteria are defined for carrying out the analysis of anomalies. The third one concerns the daily analysis of time and quality anomalies by supervisors and workers. The fourth one assesses whether supervisors and workers manage improvement plans to eliminate time and quality anomalies. The last component concerns job descriptions, and whether they include responsibilities for resolving anomalies.

vi. Technical planning of production based on the analysis of demand
The focus of the sixth sub-area is the planning of production. Is such planning based on a statistical analysis of clients’ orders? Does such planning guarantee the flexibility necessary to achieve a high level of service? Four components constitute this sub-area. The first one assesses whether meetings to review programming take place between production and sales areas. The second component evaluates the use of statistical methods to collect information and analyze production programming, according to demand variation. The third one evaluates production planning to ensure the availability of material for the monthly, weekly and daily program.
The last component evaluates monitoring and management of service to clients (deliveries in quality, time and quantity).

vii. Management of safety during the process, contingencies, emergencies / impact on the environment
Here we assess how the company monitors its impact on people and environment, which actions are undertaken to mitigate any negative impact, and how it complies with safety and environmental norms and regulations. This sub-area is comprised of five measures. The first one concerns the compliance with safety requirements, laws and norms. The second measure assesses whether the necessary norms and standards of safety within the plant are well defined. The third one evaluates the management of the indicators of industrial safety within the plant (number of accidents, level of noise, temperature). The fourth one concerns monitoring and management of the plant’s environmental impact. The last measure assesses compliance with the norms regarding evacuation routes and cleared zones for fire-fighting equipment.

viii. Maintenance guarantees the optimal condition of infrastructure
The last sub-area of Production evaluates the maintenance plan, how maintenance is monitored and managed and how maintenance is related to the creation of value by the enterprise. All this is paramount to guarantee optimal condition of machinery, furniture, equipment and tools. This measure reflects the following four points. Is there a preventive maintenance plan for the equipment? Are technicians able to rapidly repair damage to the machines? Are replacement parts available, so that damage to the machines can be repaired rapidly? Does Maintenance Management work with indicators such as MTTR, MTBF, Availability?

Logistics

i. Process of alignment of functions at the operational, managerial and directive level
The first sub-area of Logistics looks at the alignment of functions, and at the deployment of the organizational strategy. It is comprised of three components.
The first one concerns the implementation of strategic goals between the Logistics Head and his/her supervisor, and whether there are specific projects to achieve such goals. The second component assesses whether there is a monthly control of strategic goals by the Plant Manager and the supervisor. The last component concerns the alignment of employees’ objectives in the logistics area with the firm’s strategic goals.

ii. Structure and management of the supply chain (planning, purchases and provisions, storage raw material, plant supply, storage finished product, distribution, client service)

Here we evaluate if employees in the logistics area understand their roles and activities. In this sub-area there are four measures. The first one evaluates procedures and work instructions for logistics processes. The second measure is concerned with the layout of the areas of logistic operations in the supply chain. The third component assesses if a 5S plan for the supply chain is in place. The last component evaluates monitoring and management of KPIs in the logistic process (inventory, lead time, service level).

iii. Planning and management of demand / alignment of productive and logistic processes

This sub-area evaluates the procedure through which demand is planned and the reaction to changes in the established plan. Here we have four distinct components. The first one assesses whether a statistical system is in place to study and analyze demand. The second component concerns the definition of demand planning, and whether this definition is updated at annual, quarterly and monthly frequency. The third component evaluates whether communication between logistics and the areas of marketing and sales goes through a system that includes rules to change the production plan. The last component evaluates the way a firm monitors and manages compliance with the budgets of production planning.

iv.
Planning, management and control of inventories of raw material, supplies, product in process and finished product (Inventory Policies)

This sub-area evaluates the design of the inventory system, and the maintenance of inventory levels. The five components upon which this measure is based are the following. The first one assesses whether the levels of inventory (raw material, semi-finished product (WIP), finished product) are kept at an optimal level related to the variation in demand. The second component assesses whether inventory movements are recorded daily and controlled weekly. The third component states whether an ABC inventory classification methodology is in place, in order to establish policies of inventory, supply, storage and control accordingly. The fourth component verifies the use of MRP systems, where product structures are defined, in a way that allows planning the material needed to comply with production orders. The last component evaluates whether processes are in place to guarantee the rotation of inventory according to “First in, first out” schemes.

v. Supply system

This sub-area concerns the relation with suppliers, the way in which suppliers are evaluated, and the control the firm has over realized purchases. It is comprised of five measures. The first one concerns the management of policies and processes for the selection and evaluation of suppliers. The second measure concerns the management of suppliers’ development. The third measure focusses on the management of raw material prices and supplies. The fourth measure assesses whether the Lead Time of suppliers is managed and taken into account in the planning of material supply. The last measure assesses whether purchased items are verified in terms of quantity, quality and timeliness of delivery.

vi. Storage system

Five components are taken into account while evaluating the storage system. The first one is the management of the inventory of obsolete and non-compliant products.
The second one is the implementation of a system to manage storage locations (layout and 5S). The third one evaluates the implementation of industrial safety norms in the warehouse’s operations. The fourth one concerns the use of standards and procedures in the storage operations (picking and packing). The last component evaluates the monitoring and improvement of the storage operation time (picking and packing).

vii. Distribution system

This last sub-area of Logistics concerns the delivery of the created value to the client. It is comprised of five components. The first one evaluates efficiency in the processes of loading and unloading. The second one evaluates monitoring and management of the efficiency in the delivery process (perfect deliveries). The third component concerns the management of transport routes to reduce costs. The fourth component evaluates the management of reverse logistics for those products, materials or supplies that have to return to the company’s premises. The last component evaluates whether the management of distribution takes into account the current legislation regarding freight transit.

Marketing

i. Elaboration, management and control of the marketing plan

This measure evaluates the design of the guiding document of commercial activities and its alignment to the organization’s strategy. This indicator is comprised of seven components. The first two assess the implementation of an analysis of trends (economic, commercial, technological, political and social) and of risks (e.g. free trade, supply, variations in exchange rate, infrastructure, etc.). The third indicator evaluates the segmentation of products, technology, clients, consumers, etc. The fourth component assesses whether commercial strategies are based on contribution margins. The fifth component evaluates the alignment of the marketing and sales plan with Business Strategy.
The sixth indicator assesses whether price, promotion and growth policies are defined using the contribution margins. The last indicator addresses monitoring of sales behavior and trends, and of changes in the marketing plan.

ii. Processes of market research

This measure indicates how the company conducts market research, and is composed of three components. The first one addresses if and how the company conducts inquiries with clients and potential clients. The second one assesses whether the company conducts periodic monitoring of competitors’ offers. The last component evaluates if and how the company conducts research of marketers and/or distributors.

iii. Client and after sales service

This measure evaluates the company’s approach to client satisfaction and is comprised of four measures. The first one evaluates the management of clients’ complaints and requests. The second measure concerns the analysis of products’ performance in the market. The third measure assesses whether the company has a culture of continuous improvement of products and services. The last component verifies if the company holds periodic meetings to discuss clients’ feedback.

iv. Sales management

This sub-area focusses on the elaboration, management and control of the sales plan. We consider five indicators. The first three assess whether the company is holding three different types of meetings: with the distribution channels (to capitalize on opportunities in the market), planning meetings between sales and production, and meetings of the sales group to analyze sales behavior and trends. The fourth component assesses whether periodic training of the sales team takes place. The last indicator states whether sales agents are evaluated based on performance.

v.
Relationship management

This measure is built on three components evaluating whether the company conducts three types of evaluation studies: of its cooperation with suppliers, of its cooperation with clients, and of its cooperation with competitors.

Finance

i. Alignment of the financial process with corporate strategy

Four components indicate whether strategic objectives and goals are clear at all levels of the financial process, and whether everyone is committed to such goals. The first component refers to the alignment of the Financial Head and Deputy Head with corporate strategic goals. The second component indicates whether a system of monitoring and control of financial goals and objectives is in place. The third indicator refers to the frequency with which financial objectives and goals are achieved. The last component evaluates the financial support to the management processes of the organization.

ii. Structure of the administrative and operational information system

The administrative information system is evaluated on how well it supports the monitoring and control of processes, and on its effectiveness for analysis and decision making. This is reflected in five measures. The first measure evaluates the structure of the corporate information system. The second one assesses whether the setup of the business’s administrative and operational information is appropriate. The third one states if Product Structures are associated with cost and profitability margins (standard, estimated, real). A fourth indicator refers to the protection of the corporate information system, whereas the last one evaluates the organization of the corporate information system.

iii. Formulation and management of budgets

This sub-area evaluates how the firm formulates and manages budgets. The measure is comprised of four components. The first two focus on the existence of a Master Budget (operational, financial and of investment) and on its control and monitoring (agendas, finances, investment).
The third component assesses Tax Planning, and the last one evaluates how deviations from the Master Budget are analyzed (regarding costs, expenses, sales, working capital, investment).

iv. Financial management of results

The fourth sub-area of Finance reflects how well the company monitors and manages indicators of financial management, and how it analyses them to undertake corrective action. Three components build this measure: the first evaluates the structure of control and monitoring indicators (KPIs), the second one the agenda of financial management meetings, and the third one how working capital is managed.

v. Programs of financial improvement (costs and expenses, working capital, investment)

This sub-area evaluates how projections and saving goals are realized. It is comprised of three components answering the following three questions: Is there a program of efficient administration of costs and expenses? Is there an action plan for compliance with financial improvement programs? Is the available financial information appropriate?

vi. Analysis and management of investment projects

This sub-area evaluates the process which the firm uses to plan, realize and follow up the purchase of fixed assets. This measure is made of three components. The first component assesses if a program of calculation of investment projects exists and if it is aligned with strategy. The second one verifies whether there is a policy regarding capital investment (CAPEX) and other smaller investments. The last one concerns the implementation of cost-benefit analysis for the firm’s different projects and investments.

vii. Information systems

The second-to-last sub-area of finance evaluates if the information systems are interrelated and if strategies are in place to safely conserve information.
Three aspects are considered here: the collection and storage structure of the administrative information system, the collection and storage structure of the operational information system, and validation of information.

viii. Structure of the costing system

The last sub-area of finance evaluates whether the costing system supplies real and updated information, so as to identify cost anomalies in any process. The first of four components reflects the implementation of a costing system. The second component assesses if results (estimated and real values) are being validated. The last two components evaluate the absorption capacity of the installed structure and workforce efficiency.

A3.B Key Performance indicators

Every variable is recorded monthly.

Defect rate: this is defined as the ratio of faulty production to total production. Faulty production is defined as not in condition to be sold, and is determined by the firm. There are several key measurement issues with this measure. First, firms vary in whether they record production in physical units (e.g. number of items, kilograms) or in pesos. Secondly, some firms would calculate this measure only for a specific production line or product, and not for the whole plant. Thirdly, in a few cases, firms changed the way they measured these units over time. IPA and CNP worked together to identify these cases, and the series we use is for the set of firms with a consistent measure.

Energy cost: Cost of the energy in thousands of pesos. Firms are instructed to record the cost of the energy used in each month, not the bill they paid that month (which refers to the energy used the previous month). Some firms incorrectly recorded the energy bill of that month, which refers to the energy cost of the previous month. However, it was generally possible to correct this during the data collection meetings. In a couple of cases, firms did not record this variable in pesos, but in KW.
It has not been possible to correct this discrepancy during data collection, and data are not available for those firms.

Net sales: Total sales (gross sales) minus returns (discounts, etc.). This is taken directly from the Profit & Loss Statement (P&L) or records of the firms.

Average monthly inventory: Stock of final product that is in condition to be sold (in pesos). Most firms do not keep inventory, for instance because they work on a project schedule. CNP instructed firms to record a missing value if they don’t keep inventory. Other firms record physical inventory every three or six months rather than monthly, in which case they record a missing value for the other months. Some firms include semi-finished products in their inventory figures, not only finished product. In a limited number of cases, firms did not record inventory in pesos, and it was not possible to correct the values.

Total employees: All employees of the firm who are considered "stable or long term", independently of the contract type. There are no standard criteria to define what a "long term" employee is. This is defined by each firm. Firms calculate it for the firm as a whole.

A3.C Gathering of performance data

During the diagnostic phase CNP gave each firm a specifically designed spreadsheet to track the monthly evolution of KPIs in each of the five main areas (Finance, Production, Logistics, HR, Marketing). CNP also trained each firm to use these spreadsheets. Every firm received such training, which was done before randomly assigning firms to the two treatment groups and the comparison group. Periodically, CNP would visit firms to verify the monitoring of KPIs and resolve any doubts. This information was then collected in 4 rounds, the first of which took place in July 2015 as described in Appendix 1.
The data collection followed this procedure: staff from CNP and IPA would attend a firm’s board meeting, at the end of which the spreadsheets would be revised and KPIs discussed. CNP’s representative would guide the discussion, going through every single indicator, whereas IPA’s analyst would contribute to the data revision and record any relevant information. Special effort was put into ensuring that the data were recorded homogeneously across firms and time, also given that some of the information dated back to 2013. During every meeting, inconsistencies were corrected in the use of missing values, zeros, units, and definitions. Moreover, any anomaly in the evolution of KPIs was also discussed in depth.

One challenge stemmed from the fact that not all firms found the provided spreadsheets equally useful. Some firms were therefore filling in the spreadsheets only sporadically, and at the same time were using other ways of tracking KPIs as their main instrument, or were not tracking them properly. Other firms were not filling in the spreadsheets at all unless CNP visited them and helped them to do so, which meant that in some cases data were not recorded for months. This resulted in a loss of information, which was sometimes impossible to correct.

Another major challenge was that, especially as far as production variables are concerned, CNP did not give strict prescriptions to firms as to how to interpret and record variables. This caused differences in the interpretation of variables between firms. Two types of inconsistency are the most frequent: regarding units and regarding whether the variable refers to a production line or to the whole plant. For instance, some firms have recorded the same production variable as “value in pesos” while others recorded it as “number of pieces”. Others have filled in “total production” with data regarding their main production line, not the whole plant as was planned.
The freedom in interpreting variables also caused variability in the units used within a given firm, which might have recorded different variables in different ways. Finally, in a limited number of cases there were changes in the way a firm would interpret the same variable over time, and also changes in the way a variable was measured. Given that the freedom to use the spreadsheets in a flexible way was considered by CNP to be part of the intervention, during data collection the only available measure to mitigate these discrepancies was to carefully record any information and explanation.

Appendix 4: Drop-Out and Attrition

Table A4.1 shows that the firms that completed the interventions are similar on baseline characteristics to those which dropped out.

Table A4.1: Comparison of Baseline Characteristics of Firms that Completed Interventions to Drop-Outs

| | Individual Treatment: Completed | Individual Treatment: Dropped Out | Individual Treatment: p-value | Group Treatment: Completed | Group Treatment: Dropped Out | Group Treatment: p-value |
|---|---|---|---|---|---|---|
| Number of Employees | 62.2 | 54.4 | 0.746 | 52.9 | 53.1 | 0.981 |
| Small Firm (<=50 employees) | 0.59 | 0.57 | 0.940 | 0.58 | 0.59 | 0.974 |
| Medium Firm (>50 employees) | 0.41 | 0.43 | 0.940 | 0.42 | 0.41 | 0.974 |
| Cundinamarca | 0.54 | 0.14 | 0.049 | 0.42 | 0.35 | 0.665 |
| Valle | 0.09 | 0.14 | 0.645 | 0.25 | 0.18 | 0.559 |
| Labor Productivity | 32 | 30 | 0.780 | 32 | 39 | 0.278 |
| Financing Practices | 48 | 50 | 0.730 | 53 | 52 | 0.855 |
| Human Resources Practices | 42 | 40 | 0.738 | 44 | 43 | 0.784 |
| Logistics Practices | 43 | 43 | 0.989 | 49 | 43 | 0.175 |
| Marketing Practices | 43 | 44 | 0.934 | 46 | 46 | 0.948 |
| Production Practices | 46 | 54 | 0.229 | 47 | 44 | 0.371 |
| Level 2 Supplier | 0.93 | 1.00 | 0.496 | 0.92 | 0.94 | 0.758 |
| Metal Products | 0.50 | 0.57 | 0.731 | 0.47 | 0.65 | 0.242 |
| Plastic Products | 0.15 | 0.29 | 0.390 | 0.19 | 0.24 | 0.738 |
| Firm Age (Years) | 23.3 | 21.8 | 0.829 | 20.9 | 24.6 | 0.375 |
| Anexo K score | 44.4 | 46.5 | 0.679 | 47.8 | 45.7 | 0.487 |
| USD Sales in 2013 | 3158858 | 7547448 | 0.189 | 2767765 | 2469362 | 0.799 |
| Export at all in 2013 | 0.43 | 0.29 | 0.465 | 0.47 | 0.41 | 0.687 |
| Sample Size | 46 | 7 | | 36 | 17 | |

Table A4.2 compares the characteristics of those firms for which we have December 2017 sales and employment data to the
attritors, and then shows the sample of non-attritors is reasonably well balanced on baseline characteristics.

Table A4.2: Comparison of Baseline Characteristics of Non-Attritors to Attritors, and Balance on Non-Attriting Sample

| | Full Sample: Non-Attritors | Full Sample: Attritors | Full Sample: p-value | Non-Attritors: Control | Non-Attritors: Individual | Non-Attritors: Group | Non-Attritors: p-value |
|---|---|---|---|---|---|---|---|
| Number of Employees | 58.9 | 59.8 | 0.921 | 54.9 | 68.2 | 52.9 | 0.441 |
| Small Firm (<=50 employees) | 0.58 | 0.61 | 0.716 | 0.67 | 0.51 | 0.57 | 0.426 |
| Medium Firm (>50 employees) | 0.42 | 0.39 | 0.716 | 0.33 | 0.49 | 0.43 | 0.426 |
| Cundinamarca | 0.50 | 0.43 | 0.349 | 0.58 | 0.51 | 0.43 | 0.480 |
| Valle | 0.16 | 0.17 | 0.939 | 0.18 | 0.08 | 0.23 | 0.174 |
| Labor Productivity | 30 | 32 | 0.460 | 26 | 32 | 32 | 0.054 |
| Financing Practices | 51 | 51 | 0.964 | 51 | 48 | 53 | 0.154 |
| Human Resources Practices | 44 | 40 | 0.069 | 45 | 43 | 44 | 0.906 |
| Logistics Practices | 47 | 44 | 0.147 | 50 | 44 | 48 | 0.106 |
| Marketing Practices | 46 | 44 | 0.281 | 47 | 45 | 47 | 0.841 |
| Production Practices | 47 | 45 | 0.480 | 47 | 48 | 46 | 0.867 |
| Level 2 Supplier | 0.94 | 0.93 | 0.679 | 0.94 | 0.95 | 0.94 | 0.993 |
| Metal Products | 0.57 | 0.65 | 0.353 | 0.79 | 0.46 | 0.49 | 0.004 |
| Plastic Products | 0.15 | 0.22 | 0.276 | 0.09 | 0.16 | 0.20 | 0.404 |
| Firm Age (Years) | 24.1 | 24.1 | 0.997 | 27.6 | 24.6 | 20.2 | 0.085 |
| Anexo K score | 47.0 | 44.9 | 0.218 | 48.1 | 45.5 | 47.6 | 0.538 |
| USD Sales in 2013 | 2877978 | 2252395 | 0.342 | 2043854 | 3515012 | 3013064 | 0.133 |
| Export at all in 2013 | 0.47 | 0.41 | 0.480 | 0.48 | 0.46 | 0.46 | 0.969 |
| Sample Size | 105 | 54 | | 33 | 37 | 35 | |

Notes: Attrition is defined as not having firm sales and employment data reported from firm records in December 2017. This can arise from firms refusing to provide this information, as well as from firm death. P-value in column 3 is for a t-test of equality of means by attrition status. Columns 4 through 6 provide baseline means by treatment status for the sample of non-attritors. P-value in column 7 is for an F-test of equality of means.
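As an illustration of the kind of comparison behind column 3 of Table A4.2, the following is a minimal sketch of a two-sample test of equality of means. It assumes an unequal-variance (Welch) statistic and a normal approximation for the p-value; the paper does not state which variant of the t-test was used, and the scores below are hypothetical, not the study's data.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate two-sided p-value) for H0: equal means.

    Assumption: unequal-variance standard error and a normal approximation
    to the t distribution; both are illustrative choices.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical baseline HR practice scores (not taken from the paper).
non_attritors = [44, 47, 41, 50, 43, 46, 45, 42]
attritors = [40, 38, 43, 41, 39, 42]

t_stat, p_value = welch_t(non_attritors, attritors)
print(f"t = {t_stat:.2f}, approximate p = {p_value:.3f}")
```

With samples this small an exact t distribution would be preferable; the normal approximation is used here only to keep the sketch within the standard library.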
Appendix 5: Impacts on Individual Management Practices

Table A5.1 shows the breakdown of significant improvements in management practices within the Anexo K index:

Table A5.1: Summary of Impacts at the Sub-Index and Individual Practice Level

| | Sub-Indices: # | Sub-Indices: # sig. Ind. | Sub-Indices: # sig. Group | Individual Practices: # | Individual Practices: # sig. Ind. | Individual Practices: # sig. Group |
|---|---|---|---|---|---|---|
| Finances | 8 | 6 | 5 | 29 | 17 | 15 |
| HR | 7 | 3 | 2 | 20 | 11 | 6 |
| Logistics | 7 | 5 | 2 | 31 | 8 | 9 |
| Marketing | 5 | 3 | 3 | 22 | 9 | 13 |
| Production | 8 | 6 | 8 | 39 | 22 | 30 |
| TOTAL | 35 | 23 | 20 | 141 | 67 | 73 |

Note: lists number of practices that are statistically significant at the 5% level post-intervention.

Table A5.2 details the individual management practices that have treatment effects of 0.8 or more (on a 5-point scale).

Table A5.2: Practices that increase by 0.8 or more from at least one treatment

| Practice | Individual | Group |
|---|---|---|
| Finance Practices | | |
| System of monitoring and control of financial goals in place | 0.827*** (0.175) | 0.666*** (0.189) |
| Frequency at which financial objectives and goals achieved | 0.802*** (0.205) | 0.648*** (0.212) |
| Existence of a Master Budget | 0.718*** (0.263) | 1.163*** (0.259) |
| Control and Monitoring of Master Budget | 0.765*** (0.226) | 1.016*** (0.241) |
| How deviations from master budget analyzed | 0.909*** (0.244) | 1.070*** (0.265) |
| Structure of Control and Monitoring Indicators (KPIs) | 0.935*** (0.247) | 0.956*** (0.237) |
| Agenda of Financial Management Meetings | 1.055*** (0.230) | 1.055*** (0.222) |
| HR Practices | | |
| Strategic objectives leverage people's and team's talent | 0.833*** (0.206) | 0.631*** (0.214) |
| Human talent development plans linked to corporate strategy | 0.809*** (0.200) | 0.902*** (0.215) |
| Strategic plan defined, that includes clear goals for human talent | 0.951*** (0.207) | 0.910*** (0.194) |
| Marketing Practices | | |
| Implementation of analysis of marketing trends | 0.485** (0.227) | 0.867*** (0.196) |
| Implementation of analysis of marketing risks | 0.630*** (0.230) | 0.898*** (0.226) |
| Alignment of marketing and sales plan with business strategy | 0.663*** (0.216) | 0.825*** (0.227) |
| Monitoring of sale behavior and trends | 0.719*** (0.209) | 0.901*** (0.224) |
| Production Practices | | |
| Implementation of strategic goals between plant manager and supervisor | 0.616*** (0.176) | 0.966*** (0.173) |
| Monthly monitoring of strategic goals between plant manager and supervisor | 0.686*** (0.215) | 0.895*** (0.207) |
| Strategic goals and roles clear to each worker | 0.670*** (0.166) | 0.896*** (0.162) |
| Each worker has improvement goals | 0.562*** (0.188) | 0.892*** (0.170) |
| Bottlenecks are identified and managed | 0.514*** (0.179) | 0.842*** (0.194) |
| Monthly measurement of plant KPIs | 0.822*** (0.193) | 0.857*** (0.200) |
| Weekly or bi-weekly management of KPIs | 0.851*** (0.227) | 0.650*** (0.212) |
| Improvement programs for KPIs developed | 0.927*** (0.223) | 0.989*** (0.220) |
| Culture of visual management with graphs of machine performance | 0.810*** (0.210) | 0.515** (0.212) |
| Supervisors and workers manage improvement plans for quality anomalies | 0.802*** (0.187) | 0.944*** (0.215) |

Notes: robust standard errors in parentheses, clustered at the firm level. *** denotes significance at the 1 percent level. Coefficients are treatment effects post-intervention, and control for time effects, randomization strata, and

Appendix 6: Robustness of Management Improvements to Sample Attrition

Table A6.1 shows the availability of our management score data by time period and measure. The greatest data availability is for the Anexo K measure, but this still suffers from attrition, while the WMS and MOPS data are available for subsets of the sample only.
Table A6.1: Management Data availability by measure and time period

| Measure | Period | # Firms with Data: Control | # Firms with Data: Individual | # Firms with Data: Group | Data source |
|---|---|---|---|---|---|
| Anexo K management score | 2013 | 52 | 51 | 53 | Anexo K collected by CNP |
| | 2014 | 42 | 46 | 0 | Anexo K collected by CNP |
| | 2015 | 26 | 40 | 35 | Anexo K collected by CNP |
| | 2016 | 0 | 0 | 36 | Anexo K collected by CNP |
| WMS management score | 2013 | 26 | 24 | 27 | WMS collected by LSE |
| | 2016 | 20 | 19 | 31 | WMS collected by IPA |
| MOPS management data | 2012 | 28 | 33 | 34 | Collected retrospectively by IPA |
| | 2017 | 28 | 33 | 34 | Collected by IPA |

Figure A6.1 compares the distribution of baseline management practices between firms that attrit, meaning they have no endline Anexo K data (endline is 2015 for the control and individual treatment, 2016 for the group treatment), and firms that do not. We see that the distribution of those with and without follow-up management data is similar, both for the full sample, and when we split by treatment status. We cannot reject equality of distributions between attritors and non-attritors using a Kolmogorov-Smirnov test of equality of distributions. This shows that attrition is not selective on initial management practices.

Figure A6.1: Distribution of Baseline Anexo K Management Practices by Whether or Not Endline Management Data are Missing

Notes: Kolmogorov-Smirnov tests of equality of distributions of baseline management practices between firms with missing endline management data and firms with endline management data have p-values 0.979 (all firms), 0.995 (control firms), 0.754 (individual treatment), and 0.425 (group treatment).

Note that our main estimates of the treatment effect are for a balanced panel, and include randomization triplet fixed effects. Coupled with the above analysis which shows no selection on baseline management practices into having follow-up data, and Figure 2 which shows clearly the change in distribution of practices for this balanced panel, this suggests our main results are not being driven by selective attrition.
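To make the Kolmogorov-Smirnov comparison concrete, here is a minimal sketch of the two-sample KS statistic, the largest vertical gap between the two empirical distribution functions. The function name and the scores are hypothetical and only for illustration; converting the statistic into a p-value additionally requires the Kolmogorov distribution, which is omitted here.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        # Empirical CDF = share of observations less than or equal to x.
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical baseline Anexo K scores (not the study's data).
with_endline = [41, 44, 46, 47, 49, 52, 55]
missing_endline = [40, 45, 46, 48, 53]

print(f"KS statistic: {ks_statistic(with_endline, missing_endline):.3f}")
```

A statistic near 0 means the two empirical distributions almost coincide, which is the pattern the high p-values reported for Figure A6.1 point to.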
Nevertheless, as a further sensitivity check, Table A6.2 provides Lee bounds for the treatment impacts. Table A6.1 shows we have substantially more control firms reporting management practices in 2014 than 2015, so less trimming is required when estimating the impact during the year of intervention than for the post-intervention impact. We see that both treatments have significant impacts even at the lower bound for the during-intervention period. In contrast, the bounds become wider for the post-intervention period. If all the additional firms that attrited from the control group were the best managed firms, then we could not conclude the intervention had had a positive effect. We can examine this assumption using the control firms that attrited between 2014 and 2015. The 16 control firms that attrited had first follow-up (2014) Anexo K scores with a mean of 51.4, while the 26 control firms that did not attrit had 2014 Anexo K scores with a mean of 52.8 (p-value 0.72). Thus, not only is there no evidence of selective attrition on baseline management practices, neither is there evidence of endline selective attrition based on first follow-up management practices. This strongly suggests that the assumption that it was all the best-managed firms in the control group that differentially attrited is very unlikely to hold, so that the Lee lower bound is unlikely to be applicable.

Table A6.2: Lee Bounds of Impact on Anexo K Score

| | Individual Treatment Effect | Group Treatment Effect |
|---|---|---|
| Impact during intervention | | |
| Lee lower bound | 6.303** (2.723) | 9.368*** (3.290) |
| Lee upper bound | 9.746*** (3.065) | 16.610*** (2.851) |
| Impact post-intervention | | |
| Lee lower bound | 1.076 (3.628) | 4.784 (3.218) |
| Lee upper bound | 13.993*** (3.011) | 13.913*** (3.158) |
| Sample Size | 106 | 106 |
| Proportion trimmed for during intervention | 8.7% | 16.7% |
| Proportion trimmed for post-intervention | 35.0% | 27.8% |

Notes: robust standard errors in parentheses. *, **, and *** denote significance at the 10, 5, and 1 percent levels respectively.
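The trimming proportions in Table A6.2 can be related to the response counts in Table A6.1 with a short calculation. The sketch below is an illustration, not the authors' code. It assumes equal-sized arms of 53 firms each (consistent with the sample sizes in Table A4.1) and assumes that the group treatment's "during intervention" comparison pairs 2015 group data with 2014 control data; the function name is hypothetical. Lee bounds trim the arm with the higher response rate by the share (high - low) / high.

```python
ARM_SIZE = 53  # assumption: equal arms, e.g. 46 + 7 and 36 + 17 in Table A4.1


def trimming_share(n_resp_a, n_resp_b, n_arm_a=ARM_SIZE, n_arm_b=ARM_SIZE):
    """Share of the higher-response arm that must be trimmed for Lee bounds."""
    rate_a = n_resp_a / n_arm_a
    rate_b = n_resp_b / n_arm_b
    high, low = max(rate_a, rate_b), min(rate_a, rate_b)
    return (high - low) / high


# Firms with Anexo K data, taken from Table A6.1 (treatment arm, control arm).
# The pairing of periods for the group treatment is an assumption.
comparisons = {
    "individual, during (2014 vs 2014)": (46, 42),
    "group, during (2015 vs 2014)": (35, 42),
    "individual, post (2015 vs 2015)": (40, 26),
    "group, post (2016 vs 2015)": (36, 26),
}

for label, (n_treat, n_control) in comparisons.items():
    print(f"{label}: {trimming_share(n_treat, n_control):.1%}")
# individual, during (2014 vs 2014): 8.7%
# group, during (2015 vs 2014): 16.7%
# individual, post (2015 vs 2015): 35.0%
# group, post (2016 vs 2015): 27.8%
```

Under these assumptions the computed shares match the proportions reported in Table A6.2, and they show why the post-intervention bounds are wider: the response gap between arms is larger, so more observations are trimmed.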
Appendix 7: Impacts on World Management Survey and MOPS management measures

WMS 2013 Data Collection

We commissioned the London School of Economics (LSE) team responsible for the Bloom and Van Reenen (2007) World Management Surveys (WMS) to apply their methodology to a random sample of 180 firms representative of the Colombian manufacturing sector, as well as to a sub-sample of 77 firms in our sample, focusing on firms with 40 or more employees (Table A6.1). Interviews were done by phone with a manager with thorough knowledge of the production process, typically the plant manager or production manager. The WMS interview is structured as a guided discussion that lasts between one hour and one and a half hours, and covers the 18 questions related to operations, monitoring, targeting, and people management. The interviewer guides the interviewee by means of open questions, letting him/her speak freely but making sure to obtain the objective information necessary to score each of the 18 topics using the provided scoring grid. Each of the 18 topics receives a score between 1 (no modern practice is implemented) and 5 (best practice).

A first use of this survey was to compare the management practices of the auto parts sector in our sample to those of Colombian manufacturing as a whole. Figure A7.1 shows that the distribution of management practices in our firms is similar to that of all SME manufacturing firms in Colombia. A second purpose was to enable comparison of Colombia to the rest of the world. Figure A7.2 shows that, with an average management practices score of 2.54, Colombian firms are poorly managed by global standards, but typical for many developing countries, just below India and just above Kenya. The mean management practices score for the auto parts firms of 2.38 is similar.
Figure A7.1: Comparison of WMS Management Practices Distribution of our Auto Parts firms to a Representative Sample of the Colombian Manufacturing Sector

Source: WMS surveys of 180 Colombian manufacturing firms and 77 auto parts firms conducted by the LSE WMS team in 2013.

Figure A7.2: Comparison of Colombian World Management Survey Management Score to Other Countries

[Bar chart of average management scores in manufacturing by country, with bars color-coded by region (Africa, Asia, Oceania, Europe, Latin America, North America) and a horizontal axis running from 1.5 to 3.5. Values shown: United States 3.308; Japan 3.230; Germany 3.210; Sweden 3.188; Canada 3.142; Great Britain 3.033; France 3.015; Australia 2.997; Italy 2.978; Mexico 2.899; Poland 2.887; Singapore 2.861; New Zealand 2.851; Northern Ireland 2.839; Portugal 2.826; Republic of Ireland 2.762; Chile 2.752; Spain 2.748; Greece 2.720; China 2.712; Turkey 2.706; Argentina 2.699; Brazil 2.684; India 2.611; Vietnam 2.608; Colombia 2.578; Kenya 2.549; Nigeria 2.516; Nicaragua 2.397; Myanmar 2.372; Zambia 2.316; Tanzania 2.254; Ghana 2.225; Ethiopia 2.221; Mozambique 2.027.]

Source: World Management Surveys, Nick Bloom.

WMS 2016 Data Collection

In September 2016, we asked Innovations for Poverty Action (IPA) to conduct a second round of the World Management Survey (WMS). The LSE provided support in training the four analysts that conducted the interviews, the two supervisors and the research associate responsible for the survey. All material was provided by the LSE and the training took place in October 2016. Since the WMS is designed for larger firms, we chose as a sample frame the 109 firms in our sample that had at least 25 employees at baseline. This consisted of 37 control, 41 group treatment, and 31 individual treatment firms. Out of these 109 firms, we were able to collect data on 70 firms (20 control, 31 group, 19 individual), of which 50 firms had also been interviewed in 2013 (14 control, 22 group, 14 individual).
This response rate of 64% is double the standard WMS response rate, reflecting the pre-existing contacts with these firms through the project. Of those companies not interviewed, 3 had closed down, and the remainder either refused, or repeatedly rescheduled and could not be interviewed.

Management and Organizational Practices Survey (MOPS)

Our final measure of management practices comes from a 16-question survey given to firm owners in 2017, derived from the Management and Organizational Practices Survey (MOPS). This survey was created by the U.S. Census Bureau, and was designed to enable basic management practices to be measured in a self-administered survey format. The survey asks questions related to monitoring, targeting, and incentives, and is intended to measure similar concepts to the WMS (Bloom et al, 2018). It was carried out by Innovations for Poverty Action during in-person visits to the firms, and firms were also asked to recall what these practices were five years earlier (in 2012). Table A6.1 shows that these data were collected for 95 firms.

Associations between different measures of management and over time

The WMS and MOPS are collected in a much less in-depth way than the Anexo K, and measure different aspects of management. Table A7.1 looks at the baseline correlations between different measures. At baseline, the Anexo K management score has a correlation of 0.26 with the WMS management score, and 0.23 with the MOPS score. By way of comparison, the 38 management practices in Bloom et al. (2013) had a 0.40 correlation with the WMS score. The Anexo K is most highly correlated with the monitoring component of the WMS (correlation of 0.44). When we examine the five areas of the Anexo K, the finance, logistics and production scores are more highly correlated with the WMS than the HR and marketing scores. Recall that the WMS does not measure marketing practices, and that the two instruments differ in emphasis in how they cover human resource practices.
The WMS is more focused on how good and bad performers are hired and rewarded, whereas the Anexo K has more of an emphasis on organizational culture and links to overall business strategy. Notably, while the MOPS and WMS are intended to measure similar concepts, the correlation between the 2012 (recalled) MOPS management score and the WMS is only 0.08, suggesting substantial noise in this measurement.

Table A7.1: Correlations between baseline Management Measures

                          WMS         WMS         WMS         WMS         WMS         MOPS
                          Overall     Operations  Monitoring  Targets     People      Overall
Anexo K Overall Score     0.26        0.16        0.44        0.04        0.11        0.23
Finance Score             0.28        0.22        0.46        0.07        0.07        0.15
HR Score                  0.14        0.09        0.33        -0.08       0.03        0.17
Logistics Score           0.23        0.12        0.32        0.07        0.13        0.31
Marketing Score           0.09        0.03        0.12        0.02        0.06        0.10
Production Score          0.26        0.14        0.40        0.07        0.13        0.17
MOPS Overall              0.08        0.00        0.04        0.07        0.10        1.00

Figure A7.3 plots the cross-sectional and panel associations between measures. We see that the endline Anexo K has a cross-sectional correlation of 0.34 at endline with both the WMS and MOPS, and that the WMS and MOPS at endline still only have a correlation of 0.27. More starkly, there is no relationship between the WMS and Anexo K in the panel: improvements according to the Anexo K are unrelated to improvements according to the WMS. This is also true of the association between changes in the MOPS and changes in the WMS.

Recall that the WMS is done double-blind by phone, with enumerators scoring firms on a five-point scale. While there is signal in the responses, this also entails a lot of noise. Bloom et al. (2016) report that the test-retest correlation when two different people from within a plant answered the same questions within a few weeks of one another is only 0.51. In our case, there is an added factor of the baseline being done by the LSE team, while the endline was collected by Innovations for Poverty Action (after training from the LSE team).
As such, we should expect much of the change over time in the WMS to reflect measurement error, which can make it difficult to detect treatment effects.

Figure A7.3: Cross-sectional and panel correlations between management measures

Notes: first column shows cross-sectional correlations pre-treatment, second column shows cross-sectional correlations post-intervention for last measurement obtained by each method, and third column shows correlation of change in management (pre-post) according to each measure.

To investigate which of the three management measures is most strongly correlated with business outcomes of interest, we regress baseline log employment and labor productivity on each management measure separately, and then on all three together. The results are shown in Table A7.2. The Anexo K score is strongly associated with both log employment and labor productivity at baseline (both significant at the 1% level), while the WMS and MOPS have weaker associations. When all three measures are included together, the Anexo K measure remains statistically significant, while neither other measure is significant. This suggests the Anexo K measure has a stronger signal for business outcomes than these two alternatives.

Table A7.2: Baseline Association of Business Outcomes with Management Measures

                        Log Employment                              Labor Productivity
Anexo K Score           0.035***                         0.017***   0.672***                         0.877***
                        (0.006)                          (0.006)    (0.140)                          (0.186)
WMS Management Score               0.250*                0.086                 4.914                 -0.652
                                   (0.134)               (0.153)               (4.070)               (5.310)
MOPS Management Score                         0.869*     -0.554                           8.994      -2.894
                                              (0.465)    (0.459)                          (8.650)    (12.164)
Sample Size             156        77         95         46         156        77         95         46
R-squared               0.19       0.05       0.03       0.14       0.14       0.01       0.01       0.25

Notes: Anexo K management practices are 141 management practices divided into five sub-areas. WMS is World Management Survey, taken for subsample of firms in 2013.
MOPS is the Management and Organizational Practices Survey, and was conducted in 2017, with recall of practices 5 years earlier used to obtain the baseline measure. Robust standard errors in parentheses. *, **, *** denote significance at the 10, 5, and 1 percent levels respectively.

Treatment Effects on WMS and MOPS measures of management

Table A7.3 reports the estimated treatment impacts on the WMS and MOPS measures. Since these data are only available for a subset of our firms, we report several different specifications. In Panel A, we use all 70 firms for which follow-up WMS data are available (or the 95 firms with MOPS data for the last column). We do not control for randomization triplet fixed effects given that this would result in relatively few triplets being included. Instead, Panel A includes no other controls, while Panel B controls linearly for key baseline variables used in the randomization (region, size, employment, labor productivity, and baseline Anexo K). Panels C through E then use the set of 50 firms for which both baseline and endline WMS data are available. In Panels A and B, we find very small and statistically insignificant impacts of either treatment on any of the WMS or MOPS management measures. Restricting to the sample for which we also have baseline data in Panels C, D and E results in larger point estimates for the WMS, but the impacts are still far from statistically significant.

Our results show that both treatments resulted in significant increases in the Anexo K measure of management practices, and in each of its five subcomponents. This raises the question of why we do not see such a change in the WMS and MOPS. A first potential explanation is that the WMS and MOPS are only available for subsamples of the data, so that the difference in results could stem from sample composition and sample size. To investigate this hypothesis, Table A7.4 re-estimates the management treatment effect regressions for common sub-samples.
The first column repeats our estimated impact on the Anexo K measure for the balanced panel. Columns 2 and 3 then consider the 52 firms for which we have both the 2016 WMS and the Anexo K measured during and after the intervention. We continue to see a statistically significant impact of the individual treatment on the Anexo K measure in this sub-sample both during and post-intervention, and a significant impact of the group treatment during the intervention. The magnitude of the estimated effect falls substantively only for the group treatment post-intervention, where the confidence interval is wide. In contrast, there is no significant impact on the WMS using this same sample. The foot of the table converts the estimated treatment effects into confidence intervals expressed in terms of standard deviation changes in the respective management practice measure. We see that not only are the WMS treatment effects statistically insignificant while those for the Anexo K outcome are statistically significant, but the 95 percent confidence intervals for the effect of the individual treatment do not even overlap across the two outcomes. This suggests that the lack of impact on the WMS is not simply a matter of sample composition or statistical power. Likewise, when we restrict to the same sample as the MOPS in columns 4 and 5, we find significant treatment impacts on the Anexo K, and no significant impact on the MOPS, although in this case the confidence intervals do overlap.
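The conversion at the foot of Table A7.4 can be illustrated with a minimal sketch. It assumes the intervals are built as coefficient plus or minus 1.96 standard errors, divided by the control-group standard deviation; the critical value and the function names are assumptions made for illustration, not details stated in the text. The inputs are the Individual Treatment*Post Intervention estimates for the WMS sample (columns 2 and 3 of Table A7.4).

```python
Z_95 = 1.96  # assumed two-sided 95% normal critical value (not stated in the source)


def ci_in_sd_units(coef, se, control_sd, z=Z_95):
    """Return the confidence interval of a treatment effect in control-group SD units."""
    lower = (coef - z * se) / control_sd
    upper = (coef + z * se) / control_sd
    return lower, upper


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Individual Treatment * Post Intervention, WMS sample (Table A7.4, columns 2 and 3)
anexo_k_ci = ci_in_sd_units(coef=8.325, se=2.368, control_sd=6.98)
wms_ci = ci_in_sd_units(coef=-0.210, se=0.176, control_sd=0.41)

print("Anexo K: [%.2f, %.2f]" % anexo_k_ci)  # matches the reported [0.53, 1.86]
print("WMS:     [%.2f, %.2f]" % wms_ci)      # matches the reported [-1.35, 0.33]
print("Overlap:", intervals_overlap(anexo_k_ci, wms_ci))
```

Under these assumptions the sketch reproduces the reported intervals to two decimals, and the two intervals do not overlap because the upper bound for the WMS (0.33) lies below the lower bound for the Anexo K (0.53).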
Table A7.3: Impact on Other Measures of Management Practices

                                  WMS         WMS         WMS         WMS         WMS         MOPS
                                  Overall     Operations  Monitoring  Targets     People      Score

All firms interviewed in 2016
Panel A: No controls
Individual Treatment              0.040       0.100       0.152       -0.045      -0.003      -0.008
                                  (0.169)     (0.345)     (0.225)     (0.238)     (0.156)     (0.034)
Group Treatment                   0.075       0.035       0.152       0.041       0.053       0.013
                                  (0.170)     (0.298)     (0.209)     (0.230)     (0.153)     (0.031)
Panel B: Baseline Controls
Individual Treatment              -0.000      -0.030      0.095       -0.076      -0.007      -0.005
                                  (0.166)     (0.307)     (0.235)     (0.243)     (0.152)     (0.032)
Group Treatment                   0.061       0.009       0.094       0.094       0.025       0.018
                                  (0.166)     (0.276)     (0.210)     (0.231)     (0.162)     (0.030)
Sample Size                       70          70          70          70          70          95
Control Mean in 2016 of outcome   2.92        2.90        3.28        2.94        2.61        0.52
Control S.D. in 2016 of outcome   0.55        1.07        0.68        0.79        0.54        0.13

50 firms interviewed in WMS in 2013 & 2016
Panel C: No Controls
Individual Treatment              0.143       0.321       0.314       -0.086      0.131       0.010
                                  (0.218)     (0.423)     (0.256)     (0.311)     (0.199)     (0.051)
Group Treatment                   0.283       0.357       0.312       0.225       0.284       0.064
                                  (0.216)     (0.363)     (0.254)     (0.293)     (0.183)     (0.045)
Panel D: Baseline Controls
Individual Treatment              0.029       0.123       0.153       -0.188      0.074       -0.011
                                  (0.204)     (0.388)     (0.257)     (0.304)     (0.197)     (0.055)
Group Treatment                   0.242       0.238       0.210       0.276       0.241       0.066
                                  (0.203)     (0.350)     (0.266)     (0.286)     (0.175)     (0.049)
Panel E: Baseline Controls + Ancova
Individual Treatment              0.072       0.233       0.168       -0.160      0.133       -0.009
                                  (0.199)     (0.394)     (0.252)     (0.299)     (0.199)     (0.055)
Group Treatment                   0.267       0.335       0.232       0.296       0.214       0.068
                                  (0.214)     (0.372)     (0.276)     (0.302)     (0.163)     (0.048)
Sample Size                       50          50          50          50          50          46
Control Mean in 2016 of outcome   2.88        2.89        3.24        2.96        2.51        0.53
Control S.D. in 2016 of outcome   0.65        1.13        0.76        0.90        0.56        0.14

Notes: Each panel represents treatment impacts from a separate regression. 70 of the 159 firms were given the WMS survey in 2016, of which 50 had also received this survey in 2013. Panels A and C regress outcomes on treatment dummies only.
Panels B and D add controls for dummies for the Cundinamarca and Valle regions, a dummy for having 10 to 50 workers at baseline, the number of employees in 2013, labor productivity in 2013, and the 2013 Anexo K management practice score. Panel E also controls for the baseline value of the outcome measure. Robust standard errors in parentheses. *, **, and *** indicate significance at the 10, 5, and 1 percent levels.

Table A7.4: Impact on Anexo K on Same Samples as WMS and MOPS

                                          Balanced Panel  WMS Sample                  MOPS Sample
                                          Anexo K         Anexo K       WMS           Anexo K       MOPS
Individual Treatment*During Intervention  9.413***        8.350***                    9.669***
                                          (1.760)         (2.229)                     (1.879)
Individual Treatment*Post Intervention    9.309***        8.325***      -0.210        9.657***      0.017
                                          (1.821)         (2.368)       (0.176)       (1.856)       (0.036)
Group Treatment*During Intervention       11.384***       7.602**                     11.143***
                                          (2.202)         (3.164)                     (2.438)
Group Treatment*Post Intervention         8.155***        3.911         -0.132        7.549***      0.040
                                          (2.124)         (3.091)       (0.174)       (2.318)       (0.034)
Sample Size                               202             104           52            172           86
Control Mean                              55.98           60.1          2.93          57.44         0.49
Control SD                                10.79           6.98          0.41          10.23         0.12

Implied 95% confidence intervals in S.D.
Individual Treatment*Post Intervention    [0.53,1.19]     [0.53,1.86]   [-1.35,0.33]  [0.59,1.30]   [-0.45,0.73]
Group Treatment*Post Intervention         [0.37,1.14]     [-0.31,1.42]  [-1.15,0.51]  [0.29,1.18]   [-0.22,0.89]

Notes: Column 1 is for the 101 firms for which Anexo K management practices are measured both during and post intervention. Columns 2 and 3 restrict to the subset of 52 firms that also had the WMS measured in 2016. Columns 4 and 5 restrict to the subset of 86 firms that also had the MOPS measured in 2017. Regressions control for baseline (December 2013) Anexo K mean, time fixed effects, and controls for region, baseline labor productivity, baseline number of employees, and for being a small firm at baseline. Robust standard errors in parentheses, clustered at the firm level. *, **, *** denote significance at the 10, 5, 1 percent levels respectively.
A more compelling explanation for the lack of impact on the WMS is that this measure is less able to pick up the types of changes in management practices that come from this intervention. A first reason for this is the general noise in the measure, as discussed above. This noise means that much of the change in the WMS over time may reflect measurement error, making it difficult to detect treatment effects. A second reason is that the WMS measures practices at a more general level than the level of specificity at which interventions are focused.

Evidence in support of the idea that the WMS is not able to pick up the specific changes in practices that these consulting-type interventions bring about comes from the India experiment that initially motivated this work. Bloom et al. (2013) report that their treatment plants increased their use of the 38 specific management practices they measure by 37.8 percentage points, significantly larger than the change for the control firms. They asked Accenture to also apply the WMS survey instrument to these firms during this post-intervention measurement phase. However, Accenture did not receive the LSE training on applying this survey instrument, and appears to have graded firms more harshly, with a mean WMS score of 1.45, compared to a baseline mean of 2.69 when conducted by the LSE team. Despite the large change observed in the 38 management practices used in Bloom et al. (2013), there is no significant difference in the follow-up WMS scores in this case (mean of 1.43 for the treated firms, 1.49 for the control firms, p-value = 0.693). So, as with our Colombian case, if one were to rely on the WMS to measure whether changes in management had occurred, the conclusion would have been that the Indian interventions had no significant effect on management.
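The noise argument made in this appendix can be illustrated with a stylized calculation. The sketch below assumes classical measurement error (observed score = true score + noise, with noise independent across survey rounds) and treats the 0.51 test-retest correlation as the reliability of a single WMS measurement. Both are simplifying assumptions, and the ratio of true-change variance to true-level variance is a purely hypothetical input, since it is not reported here.

```python
def signal_share_of_change(reliability, change_to_level_var_ratio):
    """Share of the variance of an observed change score that reflects true change.

    Assumes classical measurement error with noise independent across rounds.
    Total observed variance in levels is normalised to 1, so that
    var(true level) = reliability and var(noise) = 1 - reliability.
    change_to_level_var_ratio is var(true change) / var(true level) (hypothetical).
    """
    true_change_var = change_to_level_var_ratio * reliability
    noise_change_var = 2.0 * (1.0 - reliability)  # noise from both rounds adds up
    return true_change_var / (true_change_var + noise_change_var)


# 0.51 is the test-retest correlation cited from Bloom et al. (2016);
# the variance ratios below are illustrative values only.
for ratio in (0.25, 0.5, 1.0):
    share = signal_share_of_change(reliability=0.51, change_to_level_var_ratio=ratio)
    print("var(true change)/var(true level) = %.2f -> signal share = %.2f" % (ratio, share))
```

Under these assumptions, even if true changes varied as much across firms as true levels do, only about a third of the variance in observed WMS changes would be signal, which is consistent with the difficulty of detecting treatment effects on this measure.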
Appendix 8: Comparison of PILA and Firm Employment Data

The PILA is the platform through which firms pay social security contributions for their employees. We had to request that government ministries with access to this data attempt to match our firms. This was done twice. First, the department of statistics (DANE) matched to the firm data between January 2014 and June 2016. Second, the Ministry of Health matched our firms to their database, covering the period January 2011 through February 2017. Matching firms was not trivial: firms’ names were not always given, the identification number of a company changes if its economic activity or some other features change, and at times the same firm was listed under the name of the owner rather than the firm. Through a lot of manual matching, DANE was able to match more of our firms than the Ministry of Health, so we have more observations for their data period than we do afterwards. Our PILA series uses the Ministry of Health extract as a base, and then adds in observations which are missing in the Ministry of Health data but are present in the DANE dataset, and replaces cases where the DANE dataset appears to have identified the firm when the Ministry of Health has only identified the owner. Table A8 shows the availability of our employment data. We see the PILA data covers most firms, but the balanced panel for which we have data from Jan 2012 to February 2017 is for 112 firms. We have data for some of 2016/17 for 110 firms from the firm records, but only 96 firms have data for every month from Jan 2013 to December 2017. Combining the two datasets gives 135 firms with data for every month from Jan 2013 to February 2017.
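The rule used to build the PILA series can be sketched as follows. This is a minimal illustration only: the data layout (dictionaries keyed by firm and month), the function name, and the set flagging firms where the Ministry of Health match appears to be the owner are all hypothetical, and the example values are made up.

```python
def combine_pila(moh, dane, owner_only_in_moh):
    """Combine two PILA extracts into one employment series.

    moh, dane: dicts mapping (firm_id, "YYYY-MM") -> employment (hypothetical layout).
    owner_only_in_moh: set of firm_ids for which the Ministry of Health match
        appears to be the owner rather than the firm (hypothetical flag).
    """
    combined = dict(moh)  # the Ministry of Health extract is the base
    for key, employment in dane.items():
        firm_id, _month = key
        # Add DANE observations missing from the base, and let DANE replace
        # the base where the base seems to have matched only the owner.
        if key not in combined or firm_id in owner_only_in_moh:
            combined[key] = employment
    return combined


# Made-up example values, for illustration only.
moh = {("A", "2014-01"): 10, ("B", "2014-01"): 1}
dane = {("A", "2014-01"): 12, ("A", "2014-02"): 11, ("B", "2014-01"): 25}
combined = combine_pila(moh, dane, owner_only_in_moh={"B"})
print(combined)
```

In this sketch, Ministry of Health observations for owner-matched firms are kept in months where DANE has no data; whether the actual series treats those months this way is not stated in the text.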
Table A8: Availability of Employment Data

                                          All Firms   Control     Individual  Group
PILA Employment: Any data                 156         52          51          53
PILA Employment: balanced panel           112         38          34          40
Survey data employment: any 2016/17 data  110         35          37          38
Survey data employment: balanced panel    96          30          36          30
Combined Employment: balanced panel       135         45          42          48

Figure A8 shows a scatterplot of the employment reported in the PILA and the employment taken from the firm’s records for the set of 5,870 year-month-firm observations for which we have data from both sources. The correlation is 0.90 over the full period, and the mass of points lies close to the 45-degree line. However, we do see a few points which have very low levels of employment reported in the PILA, and higher levels in firm records. These likely reflect informal employment.

Figure A8: Employment Reported in PILA vs Employment Reported by Firms