Improving Management with Individual and Group-Based Consulting:
                       Results from a Randomized Experiment in Colombia *

                                      Leonardo Iacovone, World Bank
                                       William Maloney, World Bank
                                       David McKenzie, World Bank


                                               October 22, 2018
                                                    Abstract
Differences in management quality are an important contributor to productivity differences across
countries. A key question is then how to best improve poor management in developing countries.
We test two different approaches to improving management in Colombian auto parts firms. The
first uses intensive and expensive one-on-one consulting, while the second draws on agricultural
extension approaches to provide consulting to small groups of firms at approximately one-third of
the cost of the individual approach. Both approaches lead to improvements in management
practices of a similar magnitude (8-10 percentage points), so that the new group-based approach
dominates on a cost-benefit basis. Moreover, we find some evidence that the group-based
intervention led to increases in firm size over the next 1.5 years, including a statistically significant
increase in employment, while the impacts on firm outcomes are smaller and statistically
insignificant for the individual consulting. The results point to the potential of group-based
approaches as a pathway to scaling up management improvements.

Keywords: Management, Employment, Scaling-Up Interventions, Colombia.
JEL codes: O14, O32, L2, M2




*
  The authors gratefully acknowledge the collaboration of Paula Toro Santana, the staff of CNP and DNP in Colombia,
and project management and research assistance provided by Cosma Gabaglio, Camilo Andrés Gutiérrez Silva, Pablo
Villar, María Aránzazu Rodríguez Uribe and Innovations for Poverty Action Colombia. Funding is gratefully
acknowledged from the DIME i2i Trust Fund, the Knowledge for Change Program (KCP), the World Bank and the
IPA SME Initiative, as well as intervention funding from SENA. This study is registered in the AEA RCT registry
AEARCTR-0000528. Since no identifying information was collected on human subjects, the study was exempted
from the Innovations for Poverty Action IRB. Comments from participants in seminars at the IDB, the Management
Practices in the Private and Social Sectors conference, MIT Sloan, and Williams College are greatly appreciated.

   1. Introduction
There are large differences in the management practices used by firms within and across countries
(Bloom and Van Reenen, 2007). These differences are strongly correlated with productivity, with
Bloom et al. (2014) estimating that differences in management can account for 30 percent of cross-
country productivity differences. An experiment with 17 textile firms in India provides a proof-of-
concept that intensive individualized consulting can deliver lasting improvements in the practices
of badly managed firms, resulting in productivity improvements of 17 percent (Bloom et al, 2013;
Bloom et al, 2018). However, the intervention was implemented by an international consulting
company under close supervision from researchers, and cost $75,000 per treated firm. This high
cost is likely to be prohibitive for many small and medium enterprises (SMEs) to finance
themselves, and for governments seeking to scale this up to assist large numbers of firms.

This paper seeks to test two approaches that governments can use to scale up management
improvements. The first is to use a very similar intervention of intensive individualized consulting,
but to use local teams of consultants to deliver the intervention at a lower cost of approximately
$30,000 per firm. The second, more novel, intervention is a group-based approach that aims to
deliver improvements at lower cost (around $10,000 per firm), and to leverage group-learning
dynamics, inspired by the approach used in the delivery of agricultural extension services. We
conduct an experiment to measure the impact of these two competing interventions on SMEs in
the Colombian auto parts manufacturing sector. Our sample of 159 firms with an average size of
58 employees, randomized into three groups of 53 firms, is an order of magnitude larger than that
used in Bloom et al. (2013) and enables us to measure the impact of such a program when
implemented at a multi-million dollar scale by a government.

We show that the Colombian auto parts sector has similar levels of management practices to start
with as the average Colombian manufacturing firm, which is low by global standards. Both the
individual and group-based interventions lead to improvements in management of similar
magnitudes of 8 to 10 percentage points. This improvement is broad-based, with improvements in
just over half of a detailed set of 141 practices measured. We then track firms for 1.5 to 2.5 years
post-implementation. We find evidence that the group-based intervention has grown the treated
firms, with a statistically significant 3 to 7 worker (8 to 15 percent) increase in employment relative
to the control group; 8 to 9 percent growth in sales which is not statistically significant when
compared to the control (p=0.12), but is statistically greater than the impact of the individual
treatment; and with higher energy input usage. In contrast, we find smaller and statistically
insignificant impacts of the individual-based treatment. Neither treatment has a significant increase
in productivity, but, given that the improvement in management practices is approximately one-
third that in India, we also cannot rule out that productivity improved by the 5 or 6 percent that
would be predicted by extrapolating from the Indian case. The group-based intervention clearly
dominates the individual intervention on a cost-benefit basis, and, although there is considerable
uncertainty associated with this estimate, we estimate that the group-based intervention is likely
to pay for itself in terms of higher firm profits within the first year.
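
The extrapolation from the Indian case can be written out as a back-of-the-envelope calculation. This is only a sketch: it assumes that productivity gains scale linearly with the size of the improvement in management practices, which is an illustrative assumption rather than an estimated relationship, and the symbols $\Delta M$ (change in management practices) and $\Delta \mathrm{TFP}$ (change in productivity) are introduced here purely for exposition.

```latex
\[
\Delta \mathrm{TFP}_{\mathrm{Colombia}}
  \approx \frac{\Delta M_{\mathrm{Colombia}}}{\Delta M_{\mathrm{India}}}
  \times \Delta \mathrm{TFP}_{\mathrm{India}}
  \approx \frac{1}{3} \times 17\% \approx 5.7\%
\]
```

Under this assumption the implied gain falls within the 5 to 6 percent range referred to above.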

This work contributes to at least three literatures. The first is a general literature on improving
business practices and management in firms. Most of this literature has focused on short training
courses and microenterprises (see McKenzie and Woodruff, 2014 for a review). However, several
studies show the potential of more intensive individualized consulting to improve management in
small and medium enterprises. In addition to Bloom et al. (2013)’s work in India, this includes
Bruhn et al. (2018) with firms averaging 14 workers in Mexico, and Higuchi et al. (2017) with
firms averaging 20 workers in Vietnam. Secondly, while we are not aware of other studies that
directly test group-based versus individual consulting, a recent literature has highlighted the ability
of firms to improve their business practices when formed into groups or paired with other firms
that can serve as role models (e.g. Cai and Szeidl 2018, Chatterji et al. 2018, Dalton et al. 2018,
Lafortune et al. 2018). Finally, this paper contributes to a broader literature on how to scale up
policies from promising researcher pilot studies (e.g. Banerjee et al, 2017, Bold et al, 2018). Our
results show the promise of group-based consulting as a pathway to greater scale.

    2. Context and Sample
    2.1 Choosing the Industry and Sample
Labor productivity in Colombia is low, with it taking around four Colombian workers to produce
what one worker does in the United States (Londoño, 2017). As a result, improving productivity
is a priority for government policy. The Government of Colombia was interested in testing whether
the productivity improvements from better management demonstrated in India by Bloom et al.
(2013) could be achieved at a larger scale in Colombia. In order to test different approaches, they
wanted to choose a sector that was thought to have sufficient numbers of firms, to have production
in a number of locations throughout the country, was thought to have some potential for growth,
and was thought to be similar enough to other industrial sectors in the country that the results from
this pilot could be applicable to other industries. These criteria led to the selection of the auto-
parts sector. This sector consists largely of second-tier suppliers to large car manufacturers,
producing parts like fenders, tires, suspension parts, plastic parts, paints, etc. that are sold to the
assemblers that directly supply national and international car manufacturers as well as to retailers
of spare parts. Appendix 1 provides some examples of the products. The auto parts sector in
Colombia employs approximately 25,000 workers, and sells both to car and bus manufacturers
within Colombia, as well as exporting approximately $US500 million each year, with Ecuador,
Venezuela and Brazil the main export markets (Proexport Colombia, 2012).

Public announcement of the program was made in April 2012 (Appendix 2 contains the full
timeline), and firms were also informed of the program through the car manufacturers such as
Sofasa (which assembles Renault cars in Colombia), General Motors, and Busscar (which
manufactures buses). To be eligible, firms had to be legally registered, in business for at least two
years, be a first or second-tier supplier to the automobile industry, and be located in one of the four
provinces Antioquia, Cundinamarca, Valle and Eje Cafetero. The firms were told the program
would offer assistance in improving production practices in order to improve profitability,
productivity and competitiveness, and that the program would not require any payment by the
firms, but that they would need to commit time and effort of their workforce to supply information
required and to implement suggestions made.

Public provision of the program to firms was justified both with reference to the overall policy
objective of improving productivity, as well as due to the presence of several market failures that
prevent firms from improving management on their own. A first issue is that of information: many
badly managed firms do not know they are badly managed, with data from the World Management
Survey showing that Colombian managers perceive their firms to be slightly better managed than
U.S. firms, when the reality is substantially worse management. 1 Secondly, even if firms know
they need to improve, they may be unable to identify which providers can offer good services, may




1
 Colombian firms had an average WMS score of 2.50 in 2014 (described below), but an average perceived score of
3.76. In contrast, U.S. firms had an average WMS score of 3.32, and perceived score of 3.57.

lack the financial resources to pay for consulting, and a lack of insurance may prevent them from
investing in an activity with uncertain payoffs.

218 firms applied for the program. 180 of these were accepted in the preliminary step, with the
remainder rejected for being too small, or for only being distributors rather than manufacturers of
parts. 11 firms then dropped out, so 169 firms formed the group to take part in the first, diagnostic,
phase of the project. Following the diagnostic, we dropped firms with fewer than 10 workers, to
leave a sample of 159 firms for the experiment.

    2.2 Random assignment and firm characteristics
Firms were randomly assigned to three groups of 53 firms each. Since the number of firms in each
group would be small, we aimed to improve balance on observables by forming matched triplets
of firms, choosing this grouping in a way to minimize the Mahalanobis distance between firms in
a triplet in terms of their geographic location, size, labor productivity, and management practices. 2
This took place in November 2013, after the diagnostic phase (described below). Then within each
triplet, firms were randomly allocated to a control group and two treatment groups: an individual-
consulting treatment group and a group-consulting treatment group.
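
The matched-triplet design can be illustrated with a short sketch. The exact matching algorithm is not specified here, so the code below uses a simple greedy heuristic (seed with the first unmatched firm, then add its two nearest unmatched neighbors by Mahalanobis distance) as an assumption; the function names, the synthetic covariate matrix, and the random seed are all hypothetical and chosen for illustration only.

```python
import numpy as np

ARMS = ("control", "individual", "group")


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances between the rows (firms) of X."""
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def greedy_triplets(X):
    """Greedy heuristic (an assumption, not the paper's stated algorithm):
    take the first unmatched firm and group it with its two nearest
    unmatched neighbors."""
    dist = mahalanobis_matrix(X)
    remaining = list(range(len(X)))
    triplets = []
    while len(remaining) >= 3:
        seed = remaining.pop(0)
        nearest = sorted(remaining, key=lambda j: dist[seed, j])[:2]
        for j in nearest:
            remaining.remove(j)
        triplets.append((seed, *nearest))
    return triplets


def assign_arms(triplets, rng):
    """Within each triplet, randomly allocate one firm to each arm."""
    assignment = {}
    for triplet in triplets:
        arms = rng.permutation(ARMS)
        for firm, arm in zip(triplet, arms):
            assignment[firm] = str(arm)
    return assignment


# Synthetic stand-in for baseline covariates (e.g. location, size,
# labor productivity, management practices) for 159 firms.
rng = np.random.default_rng(0)
X = rng.normal(size=(159, 4))

triplets = greedy_triplets(X)
assignment = assign_arms(triplets, rng)
```

With 159 firms this produces 53 triplets and, by construction, exactly 53 firms in each arm. A greedy pass does not guarantee the globally minimal total distance, so it should be read as an approximation to whatever optimization was actually used.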

Table 1 provides some summary characteristics of the firms, along with their means by treatment
group status. The mean (median) firm has been in business for 24 (23.5) years, with only 20 percent
having been in business for fewer than 10 years. A key feature of the data is that firms are
heterogeneous in terms of size and product produced. Firms had a mean of 59 and median of 40
employees at the time of application, with 59 percent of the firms classified as small (10-50
workers), and the remainder as medium (51 or more workers), with the maximum being 310 and
the 10-90 range being from 13 to 119 workers. Mean sales were approximately USD2.7 million in
2013, with a 10th percentile of USD280,000 and 90th percentile of USD6.3 million showing the
large variation in firm size. These are almost all single plant firms, with the main subsectors being
metal products (60%) and plastic products (18%). The sample also includes firms making rubber
products (5%), chemical products such as injection molding (4%), electronic components (4%), as
well as firms working with leather, wood, and glass. 94 percent are tier 2 firms in the value chain,

2
 Location consisted of Cundinamarca and Valle regional dummies; firm size consisted of dummies for small (10 to
50 workers) and medium size (51 to 310 workers), as well as for the number of employees; management practices
consisted of indices for practices in human resources, production, logistics, marketing and finance; as well as for
seven individual management practices identified as priority areas in many diagnostic plans.

with 6 percent tier 1. 3 Forty-five percent of firms had exported in at least one month of 2013. Half
the firms are located in the Cundinamarca region, which includes Bogota, with the region of Valle,
which includes Cali, the next biggest.

Management practices were measured in terms of 141 individual practices, developed by the
Colombian National Productivity Center, classified into five areas: financial practices (made up of
29 individual practices), human resource practices (20), logistics practices (31), marketing
practices (22), and production practices (39). Each practice was scored on a five point scale, where
1 indicates that the practice is not used, and 5 that it is implemented and under control. Scores
were then aggregated and calculated as a percentage of the maximum possible score for that index.
Appendix 3 provides more details of the specific measures. At baseline average scores for these
practices range from 43 (human resources) to 51 (financial practices), indicating that firms have
significant room to improve on these practices. We refer to this as the Anexo K management
practices measure, with this terminology referring to the form used to collect this data.
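
The aggregation of practice scores into an index can be sketched as follows. This is a minimal illustration that assumes the index is the sum of the 1-to-5 scores divided by the maximum attainable sum (5 times the number of practices); whether any further rescaling was applied is not stated, and the function and variable names are hypothetical.

```python
# Number of individual practices in each of the five areas (from the text).
AREA_SIZES = {
    "finance": 29,
    "human_resources": 20,
    "logistics": 31,
    "marketing": 22,
    "production": 39,
}


def index_score(scores, max_score=5):
    """Return the scores as a percentage of the maximum possible total.

    Assumes each practice is scored on an integer scale from 1 to max_score.
    """
    if not scores:
        raise ValueError("at least one practice score is required")
    if any(not 1 <= s <= max_score for s in scores):
        raise ValueError("scores must lie between 1 and max_score")
    return 100.0 * sum(scores) / (max_score * len(scores))


# Illustrative firm: scores 2 on every human resources practice.
hr_scores = [2] * AREA_SIZES["human_resources"]
hr_index = index_score(hr_scores)
```

Under this assumption a firm that uses none of the practices (a score of 1 throughout) would still register 20 percent, so the baseline averages of 43 to 51 should be read against a floor of 20 rather than zero.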

Table 1 shows that while the random assignment was able to achieve balance on most baseline
variables, there are a couple of imbalances. These reflect the difficulty of balancing many variables
in a relatively small sample of heterogeneous firms. For example, the control group is more likely
to be in metal products than either treatment group, and starts with lower labor productivity. In our
analysis we use firm fixed effects or controls for the baseline value of interest to make the firms
more comparable and reduce the effect of this heterogeneity.

    2.3 External validity and comparison to Bloom Van Reenen Management Practices
In 2013, prior to the interventions, we commissioned the LSE survey team responsible for the
Bloom and Van Reenen (2007) World Management Surveys (WMS) to apply their methodology
to a random sample of 180 firms representative of the Colombian manufacturing sector, as well as
to a sub-sample of 72 companies from our sample with 40 or more employees. Appendix 7
summarizes this survey process, and provides three key results. First, the mean and distribution of
WMS management practices scores for our auto parts firms is similar to that of the overall
manufacturing sector in Colombia (2.38 versus 2.54). Second, Colombia’s average management
practices score shows firms are, on average, poorly managed in global terms, but similar to many


3
 Tier 1 means that the firm directly supplies the original equipment manufacturer (e.g. Ford, Suzuki, etc.), while tier
2 means the firm supplies a tier 1 supplier without supplying the vehicle manufacturer directly.

other developing countries. The average score is just below that of firms in India and just above
that in Kenya in the WMS. The auto parts sector in Colombia is thus a fairly typical sector for both
the country, and for developing countries as a whole, in terms of management practices.

A final use of this baseline WMS data is to compare the Anexo K management measure, our main
measure of management used in this paper, to the WMS. Appendix 5 shows that the two are
significantly correlated in the cross-section at baseline, with a correlation of 0.26 between the two
overall indices. The Anexo K has a stronger correlation (0.44) with the monitoring subcomponent
of the WMS, reflecting a greater emphasis on measurement and monitoring than on other
management practices.

    2.4 Macroeconomic context
Sales in the Colombian auto parts sector grew at an average of 5.4 percent per year over
the 2002 to 2012 period leading up to our experiment (Reina et al, 2014). 4 At the start of our study,
imports averaged 68 percent of total sales in the sector, and were the main source of competition
for most firms in our study. However, the country was hit by a combination of external and internal
shocks starting in late 2013, which resulted in a large depreciation of the peso, from an average of
1930 COP to the USD in 2013 to approximately 3000 COP to the USD in each of 2015, 2016, and
2017. Domestic new vehicle sales fell from 326,000 units in 2014 to 238,000 units in 2017, a 27%
drop (BBVA Research). Export sales of auto parts fell 51 percent in dollar terms over the 2013-
2016 period, driven by weak economies in the main export destinations of Venezuela, Ecuador
and Brazil. The aggregate context is thus one of weakening overall demand for the sector, but
where the weakened currency increased competitiveness against imports. Real sales of domestic
production were then fairly flat over our study period, falling 0.12 percent between 2013 and
2016. 5

    3. The Intervention
The program was implemented by the National Productivity Center (Centro Nacional de
Productividad, CNP), a Colombian non-profit institution with a mandate to help increase the
productivity, innovation and competitiveness of Colombian businesses. CNP originally

4
  The report notes a nominal growth rate of 11.2 percent, which we deflate using the Colombian inflation rate taken
from the World Development Indicators.
5
  Export and sales data are from DANE and are for the CIIU sector 2930 “Manufacturing of parts, pieces, and
accessories for automobiles and their motors”.

was funded and supported by Japanese technical cooperation and has been the recipient of training
and in-house technical assistance to develop capabilities in implementing managerial consulting
services such as Lean, Six-Sigma, etc. During its 15 years of experience CNP has developed a
model of operation that has allowed it to support more than 4,000 Colombian companies in
different areas of management, innovation, productivity and competitiveness. CNP used two types
of consultants for the intervention. The first were lead consultants, who were long-term employees
of CNP with more than 10 years’ experience, and experience managing teams. They led area
consultants, who had to have at least 5 years’ experience, and specialized in a particular focus
area such as logistics or finance. The direct cost of implementation of this program was
approximately US$2.4 million.

    3.1 Diagnostic phase
All firms, including the control, received a diagnostic as the first phase. This was implemented on
a rolling basis between June and October 2013. The diagnostic was carried out by a team of 6
consultants, consisting of the leader and five specialists, one for each area (Logistics, Human
Resources, Finance, Marketing and Sales, and Production). The diagnostic began with an opening
meeting with top and middle management, and then each area specialist would have five days of
meetings with the responsible manager in the firm for their area to evaluate the 141 individual
management practices that form Anexo K. This forms the baseline management practices measure.
The consultants would also examine the firm’s key performance indicators for the last three years
(to the extent records existed), and work with the leader to finish with a report (improvement plan)
that analyzed managerial practices for each area, the key performance indicators for each area, and
recommended practices to prioritize. This diagnostic phase lasted 2 full-time weeks and cost
8,426,550 COP (US$3,553) per firm. 6

The diagnostic identified priority practices to be improved together with the firm. These practices
were intended to be ones which required minimal capital investment, and which could be
implemented reasonably quickly and were expected to lead to relatively rapid improvements in the
firm. While these priority practices were individualized by firm, some of the priority areas for

6
  We use the average exchange rate over 2014-15 of 2372 COP = 1 USD for all currency conversions in this paper.
Cost numbers are implementation costs, and exclude initial costs of intervention design, and additional costs of data
collection for the impact evaluation. To the extent this data collection process also helps firms improve management,
it could be considered another part of the intervention, and averaged a further US$20,000 per firm (including the
control group).

improvement in each of the five areas were common to many firms. These included implementing
master budgets across areas, improving systems for tracking costs, defining explicitly the strategic
objectives of each position in the plant, implementing plans to improve the skills of people in
management roles, lining up sales and marketing plans with business strategy, and analyzing
machine downtime and quality problems daily across different supervisors.

    3.2 Individual Consulting Treatment
Assignment to treatment took place after the diagnostic phase, in November 2013. Firms assigned
to the individual consulting treatment group then received individual support for a period of 6
months, in the time window between March and November 2014. They were assigned a team of
five consultants, one for each of the five processes (logistics, human resources, finance, marketing
and sales, and production), along with a leader.

The intervention began with an opening meeting that brought together the leaders within the firm
responsible for each of these five processes, along with the six consultants to define the different
roles and responsibilities and set out a work plan. Then each of the five area consultants would
visit the firms and provide 20 hours of training to the person in the firm in charge of their respective
area. This would involve a theoretical part with the goal of familiarizing the firm’s management
with modern management concepts and methods, complemented with practical exercises to apply
these concepts to their firm. This was then followed by individual consulting to help the firms
implement the improvement plan developed during the diagnostic phase. Every area would be
covered by different consultants and with different schedules, but would typically involve weekly
meetings for four hours per visit, spread over three to five months. Once per month, the team would
meet with the whole firm’s management to discuss improvements and re-define priorities and next
actions. The total consultant time was 500 hours, consisting of 100 hours of providing training,
and then approximately 100 4-hour sessions per firm of individual consulting. The cost of this
individual intervention was US$28,950 per firm receiving treatment.

Based on our discussions with firms and own observations of the process, the implementation
appears to have involved an emphasis on teaching firms how to measure and monitor key
performance indicators, and on providing firms with the set of tools needed to better understand
how their business is performing. It appears that there was less direct implementation from the
consultants. For example, the consultants might go through the financial and performance data
from the firm and suggest the need for the firm to consider new product lines or develop new
markets abroad, but seldom make more direct recommendations (e.g. you should try exporting
product X to Ecuador, or you should start using this production technology).

    3.3 Group Consulting Treatment
The idea behind the group consulting treatment was to test whether the same gains in management
improvements could be achieved more efficiently through working with small groups at a time,
motivated in part by the way agricultural extension services are often implemented. The
group treatment arm aimed to lower costs in two key ways. First, by working with multiple firms
at once, and potentially having them also learn from one another, each consultant’s time was spread
over more firms. Secondly, rather than consultants having to travel to the firms, most of the
meetings took place in central meeting places such as conference rooms, cutting down on
consultant travel time.

Groups were formed of 3 to 8 firms located in the same region, such that members are not direct
competitors to one another, but are instead producing complementary products with similar
management problems. 7 These groups were formed after the randomization, in November 2013.
Unfortunately, a different government budgetary entity was designated to pay for this
treatment arm than the one paying for the individual treatment. This entity significantly delayed
the payment, meaning that the group intervention was unable to start until over a year after the
individual intervention, running six months from September 2015 to May 2016 (with different
groups starting and stopping at different times, and a break over the Christmas period).

Leaders from the firms in a group signed an agreement to work together and help each other
improve. Like the individual treatment, the group treatment began with training classes that
covered theoretical aspects of management. The difference is that these classes were delivered to
the group in a classroom setting, instead of one-on-one in the firm. Each firm would send the staff
in charge of a particular area or production process along to that training session. For example,
when financial training was performed, firms would send the people responsible for the firm’s




7
 The composed groups are 1 group of 8 firms, 4 groups of 7 firms, 2 groups of 6 firms, 1 group of 4 firms, and 1
group of 3 firms.

financial components to the training. These sessions lasted for a total of 40 hours per group,
including a session on the topic of cooperation among firms.

This was then followed by group consulting sessions, designed to help firms implement the
management improvements in their diagnostics and action plans. In any given week, a group would
discuss two areas, having one or two meetings focusing on a single area (for a maximum of four
meetings a week per group). Only management with responsibilities over the area being discussed
would participate in the meetings. The same two areas would be covered at the same time over
about eight weeks. After a break over Christmas, the remaining three areas would be covered the
same way. The order in which areas were discussed was not the same for each group.

The group meetings would focus on the implementation of the actions agreed in the action plans
of each company. Within each group, each firm had to work on the improvement of the topic that
had been prioritized for a number of firms in the group, unless the firm excelled already in that
topic. Therefore, each firm would still be focused on the issues that had been prioritized in the
Improvement Plan but its Action Plan would be updated to include relevant issues taken from the
other firms’ Improvement Plans. If a firm already excelled in topics that were central in other
firms’ Improvement Plans, it would be used as an example and its experience would be discussed
in detail.

In the individual intervention, consultants were at the firm for all visits, so could directly see
implementation attempts and problems and adjust their recommendations accordingly. In contrast,
during the group intervention, it was more difficult to directly verify changes being made in
logistics and production. This was solved by requiring firms to provide evidence of what they had
implemented in the form of bringing photos to the group meetings. In addition, firms in the group
treatment still had a monthly one-on-one meeting with senior management which took place at the
plant, and one hour at the end of each meeting was used to visit the plant and review improvements.

This process enabled the group intervention to be significantly cheaper than the individual
intervention, with an average cost of US$10,500 per firm receiving treatment. Firms received 408
hours of consultant time each, consisting of 40 hours group training, and 92 4-hour group sessions.
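
The cost and hours figures reported for the two treatments can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (the exchange rate of 2372 COP per USD, the per-firm costs, and the training and session counts); the variable and function names are hypothetical, and the cost ratio excludes the diagnostic phase, which all firms received.

```python
COP_PER_USD = 2372  # average 2014-15 exchange rate used for all conversions


def cop_to_usd(amount_cop):
    """Convert Colombian pesos to US dollars at the paper's exchange rate."""
    return amount_cop / COP_PER_USD


def consultant_hours(arm, session_hours=4):
    """Total consultant hours: training hours plus 4-hour consulting sessions."""
    return arm["training_hours"] + arm["sessions"] * session_hours


INDIVIDUAL = {"cost_usd": 28_950, "training_hours": 100, "sessions": 100}
GROUP = {"cost_usd": 10_500, "training_hours": 40, "sessions": 92}

diagnostic_usd = cop_to_usd(8_426_550)  # diagnostic cost per firm
cost_ratio = GROUP["cost_usd"] / INDIVIDUAL["cost_usd"]

print(f"Diagnostic cost per firm: US${diagnostic_usd:,.0f}")
print(f"Individual hours: {consultant_hours(INDIVIDUAL)}")
print(f"Group hours: {consultant_hours(GROUP)}")
print(f"Group cost as share of individual cost: {cost_ratio:.2f}")
```

The ratio of roughly 0.36 is what underlies the description of the group approach as costing approximately one-third as much as the individual approach. Note that the group hours are shared across the firms in a group, so the two hour totals are not directly comparable as per-firm consultant attention.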




    4. Take-up, Data sources, and Attrition
4.1 Take-up
The take-up rate for the individual intervention was 86.8%, with 46 of the 53 firms starting
this intervention and all 46 completing it. The longer delay until beginning the group intervention
reduced the take-up rate for this intervention, with 40 of the 53 firms in this group (75.5%) starting
the intervention, and 36 firms (67.9%) completing it. Table A4.1 shows the baseline characteristics
of those who completed the intervention are not statistically different from those who dropped out,
with the one exception being that dropout from the individual treatment was more common in the
Antioquia region than elsewhere. The main reasons given for drop-out from both groups were lack
of owner time to participate, and lack of continuity in the program (especially for the group
treatment).
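
The take-up and completion rates follow directly from the firm counts given above. The short sketch below recomputes them, with hypothetical variable names, taking the 53 firms assigned to each arm as the denominator.

```python
ASSIGNED_PER_ARM = 53

# Firm counts reported in the text for each treatment arm.
COUNTS = {
    "individual": {"started": 46, "completed": 46},
    "group": {"started": 40, "completed": 36},
}


def rate(n_firms, assigned=ASSIGNED_PER_ARM):
    """Share of assigned firms, in percent, rounded to one decimal place."""
    return round(100 * n_firms / assigned, 1)


rates = {
    arm: {stage: rate(n) for stage, n in stages.items()}
    for arm, stages in COUNTS.items()
}

for arm, stages in rates.items():
    print(arm, stages)
```

Using the assigned firms as the denominator for both starting and completing keeps the rates comparable with the intention-to-treat comparisons made between arms.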

4.2 Data Sources, Measurement of Key Outcomes, and Attrition
Baseline data were collected from the application form and diagnostic phase and cover firm
characteristics in 2013. We then use three types of follow-up data, discussed in detail in Appendix
3. The first is data on the management practices in the firm. Our main measure is the Anexo K
management score, which is the average of the 141 different practices detailed in Appendix 3. This
was collected by CNP during in-person visits to the firms. It was measured during the diagnostic
for 156 of the 159 firms (3 of the firms had components missing), monthly from the treatment
groups during the time of their interventions, as well as annually in 2014 and 2015 for the
individual and control groups, and in 2015 and 2016 for the group treatment. The second type of
data consists of key performance indicators (KPIs) from the firms, which were collected during
in-person visits. We use these to measure impacts on firm sales and employment, as well as on
defect rates, inventory levels, and energy usage. The final source of data comes from linking firms
to administrative data sources on employment and exports.

Obtaining data from the firms was difficult and complicated by several factors. First, a
consequence of poor management is that firms did not routinely and consistently keep records of
some KPIs. Firms would change the units of measurement at times from pesos to physical units,
and the type of physical unit they used (e.g. from number of items to kilograms). 8 Second, data


8
 These changes in units also occurred because firms would produce different products at different times, depending
on what orders they received.

collection in the firms was conducted during on-site visits by CNP. We hired Innovations for
Poverty Action to provide an independent check on this data, and to help in extracting data from
the firms. But CNP had breaks in its contracts, which meant data collection halted for months at a
time, and they had a long list of KPIs they wanted from firms, which increased the burden on firms
of reporting. The result was that some firms dropped out of providing follow-up information, even
after repeated follow-up visits seeking just a few key variables. Third, ten of the firms closed
during the course of the study (4 control, 3 individual treatment, and 3 group treatment, p-value of
equality of death rates 0.911).

These three factors mean that we only have both employment and sales data through to December
2017 for 105 firms (69% of the sample), comprising 33 control firms, 37 individual treatment
firms, and 35 group treatment firms (p-value of equality of attrition rates is 0.744). Table A4.2
compares the baseline characteristics of these firms to those that attrit, and shows that we cannot
reject equality of means. Moreover, balance on baseline observables for those firms which do
report is similar to our balance on the overall sample. Nevertheless, we use firm fixed effects in
our estimation of impacts on firm outcomes to control further for any time-invariant differences
among firms. In addition, for the employment outcome, we use administrative data to boost the
sample size of firms for which we have post-intervention data.

   5. Impact on Management Practices
The interventions aimed to improve specific management practices covered under the 141
practices that comprise Anexo K. These practices were measured for all firms during the diagnostic
phase in 2013, and then measured monthly during the implementation periods of the individual
and group interventions, and again one-year post-intervention. The control group had these
measured towards the end of the individual treatment intervention, and again at the time of the
one-year follow-up.

Figure 1 shows the trajectory of impacts on management practices for the overall Anexo K
management score, and for the scores under the five separate areas of finances, human resources,
logistics, marketing and sales, and production practices. We see that the individual treatment group
sharply improves practices overall, and in all five areas, during the implementation phase, while
the control group improves by much less. The group treatment likewise sharply improves practices
during its implementation phase, and ends up with practices at or above

where the individual treatment group ended. This improvement in management then persists for
the following year for both groups. Figure 2 compares the distributions of management practices
at baseline, and at the last follow-up, for the three groups. Kolmogorov-Smirnov tests show we
cannot reject equality of distributions at baseline. At endline, both the individual and group
treatments are significantly different from the control group (p-values 0.004 and 0.003
respectively), but are not significantly different from each other (p-value 0.643).
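
As a rough illustration of what the Kolmogorov-Smirnov comparison measures, the sketch below computes the two-sample test statistic, the largest vertical gap between two empirical distribution functions. The function name and the scores are hypothetical, not the study's data, and the sketch stops at the statistic: it does not compute a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every observed point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical management scores (percent of practices implemented).
control = [48, 52, 55, 57, 60]
treated = [58, 61, 64, 66, 70]
print(ks_statistic(control, treated))
```

A statistic near zero means the two distributions overlap almost everywhere; a statistic near one means they barely overlap.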

For our regression analysis, we therefore classify our data into three periods: baseline, during the
intervention (measured at the end of implementation for the individual and group treatments, and
the first follow-up for the control group), and post-intervention (measured at the one-year follow-
up post-intervention for the individual and group treatments, and the second follow-up for the
control group). This time-shifts the data for the group treatment to account for the delay in
implementation, which meant that its follow-ups took place a year later than the other two groups.
We then estimate the following Ancova regression (McKenzie, 2012) for t=2 (during) and t=3
(post-intervention) that controls for the randomization triplets and the baseline level of
management practices, and allows the impacts to vary during the intervention from post-
intervention:

$$
\begin{aligned}
\text{Management}_{i,t} = {} & \alpha + \beta_1\, \text{Individual}_i \times \text{During}_{i,t} + \beta_2\, \text{Individual}_i \times \text{Post}_{i,t} \\
& + \gamma_1\, \text{Group}_i \times \text{During}_{i,t} + \gamma_2\, \text{Group}_i \times \text{Post}_{i,t} + \sum_{g=1}^{53} \delta_g\, 1(i \in g) + \lambda\, 1(t = 3) \\
& + \theta\, \text{Management}_{i,1} + \varepsilon_{i,t}
\end{aligned}
\tag{1}
$$

Where $1(i \in g)$ is a dummy for firm i being in randomization triplet g, $1(t = 3)$ is a time period
fixed effect, and the standard errors are clustered at the firm level.
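
To make the structure of equation (1) concrete, the sketch below builds the regressors for a single firm-period observation. It is only an illustration under stated assumptions: the function name, dictionary keys, and example values are hypothetical, and no estimation is performed. Because the 53 triplet dummies sum to one, an estimation routine would either omit the constant or drop one dummy.

```python
def ancova_row(arm, period, triplet, baseline_score, n_triplets=53):
    """Regressors for one firm-period observation in an equation-(1)-style ANCOVA.

    arm: "control", "individual" or "group" (treatment assignment).
    period: 2 (during the intervention) or 3 (post-intervention).
    triplet: index of the firm's randomization triplet, from 1 to n_triplets.
    baseline_score: the firm's management score at baseline (period 1).
    """
    if arm not in ("control", "individual", "group"):
        raise ValueError("unknown treatment arm")
    if period not in (2, 3):
        raise ValueError("period must be 2 (during) or 3 (post)")
    if not 1 <= triplet <= n_triplets:
        raise ValueError("triplet index out of range")

    during = int(period == 2)
    post = int(period == 3)
    individual = int(arm == "individual")
    group = int(arm == "group")

    row = {
        "individual_x_during": individual * during,
        "individual_x_post": individual * post,
        "group_x_during": group * during,
        "group_x_post": group * post,
        "period_3": post,                  # time period fixed effect
        "baseline_score": baseline_score,  # ANCOVA control for the baseline level
    }
    # One dummy per randomization triplet.
    for g in range(1, n_triplets + 1):
        row[f"triplet_{g}"] = int(g == triplet)
    return row


# Hypothetical observation: a group-treatment firm in triplet 7, post-intervention.
example = ancova_row("group", 3, 7, 55.0)
print(example["group_x_post"], example["group_x_during"], example["triplet_7"])
```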

Table 2 presents the estimated treatment effects on these management practices. Panel A uses the
unbalanced panel, which includes firms whose practices were measured in only one of the two
follow-up periods, and Panel B the balanced panel of firms measured in both follow-ups. Three key
results are evident. First, the immediate treatment impacts seen in Figure 1 are statistically
significant at the 1 percent level for both treatments. The estimated effect size is between 8 and
10 percentage points, relative to the control group implementing 56 percent of the practices by
2015. Second, these impacts persist for at least one year post-intervention. Third, the individual
and group treatments yield impacts that are similar to one another
in magnitude, and we cannot reject equality of treatment effects for the overall index, or for any
of the five areas, in the post-intervention period.

How large an effect is this improvement of 8 to 10 percentage points in management practices? It
is only approximately one-third the size of the improvement of 26 percentage points found by
Bloom et al. (2013) from their management intervention in India, but approximately twice the size
of the typical improvement found in standard short business training courses given to smaller firms
(McKenzie and Woodruff, 2016).

5.1 Which Practices Improved?
The improvement in management practices is broad, occurring in Figure 1 and Table 2 across all
five areas with reasonably similar magnitudes. Table A4.1 looks at the sub-index and individual
practice level. The individual treatment has a positive and statistically significant impact (at the
5% level) on 23 out of the 35 sub-indices (66%), and 67 out of the 141 individual practices (48%),
while the group treatment has a positive and statistically significant impact (at the 5% level) on 20
out of the 35 sub-indices (57%), and 73 out of the 141 individual practices (52%). Table A4.2
examines which practices have had the largest impacts. These are mainly practices concerning
defining strategic goals and objectives, setting up master budgets, and monitoring key performance
indicators. The fewest improvements are seen in human resource practices and
logistics practices.

Figure 3 plots the estimated treatment effects practice by practice for the individual and group
treatments. The correlation is 0.71, showing that the two different approaches to improving
management resulted not only in a similar aggregate improvement in management, but also in a
similar mix of improved practices. The main area of difference occurs with several production
practices related to preventative maintenance, which improved more with the group treatment than
the individual treatment.

Why didn’t firms change more of their management practices? Qualitative interviews suggest
several explanations. A first one is delays in implementation, which caused some firms to lose
interest. The consultants pointed to problems getting family-run businesses to focus on
improvements, and that a lack of a data culture prevents firms from recognizing their flaws. For
this reason, much of their initial focus was on getting firms to collect KPIs and to have meetings
to identify problems, which, in our opinion, may have come at the expense of “quick wins” in

which changes in particular practices could be seen by firms to lead quickly to noticeable
improvements in business outcomes.

We also asked the consultants to go through a flowchart to explain why key practices identified in
the diagnostic were not then implemented. This was done in early 2014 for approximately two
practices per firm in 87 firms in the individual and control groups, for a total of 151 practices.
Firms had heard of the practices, but were rated low in their knowledge about the practices, with
72% of firms being scored as a 1 or 2 out of 5 on knowledge of how to implement the practice.
The consultants believed that neither external factors (<1% of cases) nor firm human and financial
resources (only 6%) were constraints to implementation. In contrast, they thought that the firm owner
mistakenly did not consider the practices to be profitable in 58% of cases. This is consistent with
the findings of Bloom et al. (2013) that the main reasons for practices not being implemented were
lack of knowledge about the practices, and firm owners not thinking the practices were worth
implementing.

5.2 Robustness Checks of the Management Improvement
We consider the robustness of the improvement in management practices to different weighting
schemes, to sample attrition, and to alternative measurement tools.

Robustness to weights: Our measures of management practices are averages of the different
practices. The Anexo K overall index is an average of the 35 sub-indices, and ranges from 20
(indicating scores of 1 for every individual practice) to 100 (indicating scores of 5 for every
individual practice). With any aggregate index, there is always a question as to the appropriate
choice of weights, and of how sensitive the results are to alternative weighting schemes.
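
The following sketch shows one way an equally weighted index with these endpoints could be built. It assumes, as an illustration only, that each sub-index is the mean of its 1 to 5 practice scores and that the overall index is the mean of the sub-indices multiplied by 20, which reproduces the stated range of 20 to 100. The function name and the example scores are hypothetical.

```python
def anexo_k_style_index(sub_indices):
    """Equally weighted index on a 20-100 scale.

    sub_indices: list of lists, each inner list holding the 1-5 practice
    scores that belong to one sub-index.
    """
    if not sub_indices:
        raise ValueError("need at least one sub-index")
    sub_means = []
    for practices in sub_indices:
        if not practices:
            raise ValueError("each sub-index needs at least one practice")
        if any(not 1 <= score <= 5 for score in practices):
            raise ValueError("practice scores must lie between 1 and 5")
        sub_means.append(sum(practices) / len(practices))
    # Average the sub-indices, then rescale so that 1 maps to 20 and 5 to 100.
    return 20 * sum(sub_means) / len(sub_means)


# Hypothetical firm with two sub-indices.
print(anexo_k_style_index([[1, 3], [5]]))
```

Under this weighting, a practice in a small sub-index counts for more than a practice in a large one, which is one reason alternative weighting schemes are worth checking.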

Table 3 examines robustness to different choices of how to aggregate the 141 practices. Column 1
shows our aggregate index from Table 2. Columns 2 through 5 then consider four alternative
weighting schemes. Column 2 uses the first principal component of the 141 practices; Columns 3
and 4 use lasso regression to identify the sub-set of practices which best predicts baseline log
employment and labor productivity respectively, and then post-lasso regression to form the
weights. This chooses 19 practices to weight according to their predictive power for employment,
and 14 to weight for their predictive power for labor productivity. Finally, column 5 uses the subset
of firms for which we also have baseline data from the World Management Survey, and uses lasso


to choose weights that best predict the baseline WMS score, which selects only 6 practices. 9 The
coefficients cannot be directly compared across columns in terms of magnitudes, but can be
considered relative to the control group standard deviation. The estimated treatment effects are 0.8
to 0.9 standard deviations (s.d.) when using our aggregate index, 0.9 to 1.0 s.d. when using
principal components, 0.6 s.d. when weighting to predict employment, 0.8 s.d. when weighting to
predict labor productivity, and 0.7 to 1.1 s.d. when weighting to predict the WMS score. Thus
regardless of the choice of weights, we find the treatment impacts are positive, similar in
magnitude, and statistically significant.

Robustness to attrition: Appendix 6 examines robustness of our results to attrition of the
management practice data. It shows that the firms for which we have endline management practice
data have similar baseline management practices to those firms which attrit, and that this also holds
separately by treatment status. It provides Lee bounds for the impact on management practices.
These bounds are relatively narrow and positive, and statistically significant, even at the lower
bound when measuring the impact during the intervention. However, since control group attrition
is higher by the endline, the bounds are wider for the post-intervention period, and the lower bounds
for the treatment effects are positive, but not statistically significant for either treatment. Moreover,
for this lower bound to hold, it would need to be the case that the best managed control firms were
the ones that attrited. We show that this is not the case in terms of either baseline management
practices or management practices as measured in the first follow-up. Coupled with our use of a
balanced panel and randomization triplet fixed effects as controls (which identifies treatment by
comparing firms with similar baseline characteristics), we believe survey attrition is extremely
unlikely to be driving the positive impacts found on management.
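
The logic of the bounds can be sketched as follows. This is a minimal illustration of Lee-style trimming under simplifying assumptions (no covariates or fixed effects, and the treated group responds at least as often as the control group); the function name and the numbers are hypothetical and are not taken from Appendix 6.

```python
def lee_bounds(treated, control, n_treated_assigned, n_control_assigned):
    """Trimming bounds on a difference in means when the control group attrits more.

    treated, control: observed outcomes for firms that responded.
    n_*_assigned: number of firms originally assigned to each arm.
    """
    if not treated or not control:
        raise ValueError("need observed outcomes in both arms")
    response_t = len(treated) / n_treated_assigned
    response_c = len(control) / n_control_assigned
    if response_t < response_c:
        raise ValueError("sketch assumes the treated arm responds at least as often")

    # Share of treated respondents to trim so that response rates match.
    trim_share = (response_t - response_c) / response_t
    n_trim = round(trim_share * len(treated))
    keep = len(treated) - n_trim
    if keep == 0:
        raise ValueError("trimming would remove every treated observation")

    ordered = sorted(treated)
    control_mean = sum(control) / len(control)
    # Lower bound: drop the best treated outcomes. Upper bound: drop the worst.
    lower = sum(ordered[:keep]) / keep - control_mean
    upper = sum(ordered[n_trim:]) / keep - control_mean
    return lower, upper


# Hypothetical endline scores: all 5 treated firms respond, 4 of 5 controls do.
print(lee_bounds([60, 62, 64, 66, 68], [50, 52, 54, 56], 5, 5))
```

Dropping the best treated outcomes corresponds to the worst case in which the control firms that attrited would have been the best performers, which is why wider attrition gaps produce wider bounds.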

Robustness to alternative measurement of management: Appendix 7 discusses our efforts to also
measure changes in management using the World Management Survey (WMS) and Management
and Organizational Practices Survey (MOPS). These measures are at a more general level than the
Anexo K measures, and were designed for medium-sized firms of 50 or more employees, whereas
our sample includes firms with as few as 10 workers. A combination of budget constraints and
attrition mean that we only have this data for 70 of the 159 firms (WMS), and 95 firms (MOPS).
We show that our Anexo K measures are correlated with the WMS and MOPS in the cross-section,

9
    The smaller number of practices chosen is likely because of the much smaller sample for which the WMS is available.

but not in the panel, and that our WMS and MOPS measures appear to be noisily measured, with
less predictive power for business outcomes than Anexo K. Our measured treatment impacts on
these two measures are smaller in magnitude and not statistically significant. The improvement in
management we obtain is thus not able to be detected using these alternative management
instruments.

5.3 Correlated Practice Changes Within the Group Treatment
The motivation for the group intervention suggested two possible ways in which working with
firms in groups could foster improvements in management practices. A first possibility is one of
coordinated experimentation and learning, whereby group members try to improve the same
practice together, so are able to motivate and learn from one another. A second possibility is one
of existing knowledge transfer, whereby group members are able to learn how to implement a
practice from other group members who were already implementing it well to begin with. We
explore the extent to which these two mechanisms are occurring in our sample by running the
following regression for the change in management practice j in firm i assigned to group g:

$$
\Delta \text{Practice}_{j,i,g} = \alpha + \beta\, \overline{\Delta \text{Practice}}_{j,-i,g} + \gamma \max_{-i,g} \text{BaselinePractice}_{j,-i,g} + \varepsilon_{j,i,g}
\tag{2}
$$

Where $\overline{\Delta \text{Practice}}_{j,-i,g}$ denotes the mean change in practice j for other members in i’s group, and
$\max_{-i,g} \text{BaselinePractice}_{j,-i,g}$ denotes the maximum level of practice j at baseline among other
members in i’s group. We stack the 141 individual practices, and then cluster the standard errors
at the firm level.
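
The two regressors in equation (2) are leave-one-out quantities: each is computed over the other members of firm i's group, excluding firm i itself. A minimal sketch for one practice in one group follows; the function name, firm labels, group size, and scores are all hypothetical.

```python
def peer_regressors(change, baseline, firm, group_members):
    """Leave-one-out regressors for one practice j and one firm i.

    change, baseline: dicts mapping firm id -> score for a single practice.
    group_members: all firms in the group, including `firm` itself.
    Returns (mean change among the other members, max baseline among them).
    """
    peers = [f for f in group_members if f != firm]
    if not peers:
        raise ValueError("firm has no other group members")
    mean_peer_change = sum(change[f] for f in peers) / len(peers)
    max_peer_baseline = max(baseline[f] for f in peers)
    return mean_peer_change, max_peer_baseline


# Hypothetical 1-5 scores for one practice in a group of three firms.
baseline = {"A": 2, "B": 4, "C": 1}
endline = {"A": 3, "B": 4, "C": 3}
change = {firm: endline[firm] - baseline[firm] for firm in baseline}
print(peer_regressors(change, baseline, "A", ["A", "B", "C"]))
```

Repeating this for every firm and every one of the 141 practices gives the stacked dataset on which the regression is run.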

Table 4 reports the results of estimating equation (2). Column 1 shows that there is a significant
positive association between the change in a practice for a firm and the mean change made by
other firms in their group. Column 2 shows that, in contrast, there is no significant relationship
with the highest baseline level of practices observed amongst other firms in the group. Column 3
controls for both factors together, and confirms the significant and positive association with the
average change made by others in the group. A one-unit change (on a 5-point scale) in the practice
by others in the group is associated with a 0.1-unit change by the firm. This suggests some
coordinated experimentation and learning is taking place within groups, but that group members
are not taking existing best practices from other group members across into their own firms.



   6. Impacts on Firm Outcomes
6.1 Impact on Employment
Data on employment comes from two sources. The first is data from the PILA (Unified Register
of Contributions), which is the national information system used by firms to file the mandatory
contributions to health, pensions, and disability insurance paid for workers. The second is direct
from firm records, collected during our in-person visits to firms. Appendix 8 discusses the two
datasets and shows the correlation between the two measures is 0.90. However, both datasets have
limitations in terms of coverage: not all firms were able to be matched to the PILA data during
requests made to two government agencies, and the data ends February 2017. Firms which refused
to supply information in visits made in late 2017/early 2018 do not have data from firm records.
Combining the two data series gives us our most comprehensive employment measure, which
includes monthly data on 135 firms from January 2013 through February 2017, and for 108 firms
through to December 2017.

Using this combined data, the left panel of Figure 4 plots the trajectory of mean employment by
treatment group, demeaned by the 2013 group means. We see that the individual treatment group
reduces employment relative to the control and group treatment groups at the start of the individual
intervention, and this persists over time. In contrast, the control and group treatments track one
another closely until the group intervention is underway. Then the control group reduces
employment and the group treatment does not, so that a persistent gap in employment opens up.
The right panel of Figure 4 shows the distribution of changes in employment between the months
of February 2013 and February 2017 by treatment status. The control and individual interventions
have changes centered on zero, although differences arise at the top of the distribution, where
the control distribution has a small mass of firms that increased employment by 50 or more
workers. In contrast, the group treatment has a peak at a positive change in employment, and more
of the mass with positive changes.

Given the heterogeneity amongst firms in initial employment size, and the differences in coverage
of the different data sources, we use firm fixed effects in estimating the treatment impacts. We
estimate the following equation for firm i at time t:

$$
\begin{aligned}
\text{Employment}_{i,t} = {} & \alpha_i + \beta_1\, \text{Individual}_i \times \text{During}_{i,t} + \beta_2\, \text{Individual}_i \times \text{Post}_{i,t} \\
& + \gamma_1\, \text{Group}_i \times \text{During}_{i,t} + \gamma_2\, \text{Group}_i \times \text{Post}_{i,t} + \sum_{s=1}^{T} \delta_s\, 1(s = t) + \varepsilon_{i,t}
\end{aligned}
\tag{3}
$$

Where the $\alpha_i$ are firm fixed effects, During and Post indicate the periods during the individual or
group interventions, and after these interventions respectively, $1(s = t)$ are time fixed effects,
Individual and Group denote assignment to the individual and group treatment status respectively,
and the error terms $\varepsilon_{i,t}$ are clustered at the firm level. The randomization triplets are subsumed
by the firm fixed effects here. We consider both levels and logs of employment as outcomes. We
estimate equation (3) for our full set of data, as well as for balanced panels.
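
The role of the firm fixed effects can be illustrated with the within transformation, which subtracts each firm's own mean so that only changes over time within a firm remain. The sketch below is a simplified illustration with hypothetical names and numbers; it handles only the firm effects and leaves out the time fixed effects and the estimation step.

```python
from collections import defaultdict


def demean_by_firm(rows):
    """Within transformation: subtract each firm's mean from its observations.

    rows: list of (firm_id, value) pairs, one per firm-month.
    Returns a list of (firm_id, demeaned_value) pairs in the same order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for firm, value in rows:
        totals[firm] += value
        counts[firm] += 1
    return [(firm, value - totals[firm] / counts[firm]) for firm, value in rows]


# Hypothetical employment counts for a small firm and a large firm.
rows = [("small", 20), ("small", 24), ("large", 200), ("large", 196)]
print(demean_by_firm(rows))
```

After demeaning, the small and the large firm are on the same footing, so time-invariant differences in size no longer drive the comparison.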

Table 5 presents the treatment impacts on employment. The first four columns show the impacts
using just the PILA data, columns 5 through 8 using data from firm records, and the last four
columns using our most comprehensive measure, which combines the two datasets. The estimated
impact of the group treatment is positive in all specifications, and corresponds to an increase of 3
to 7 workers post-intervention, or of 8 to 15 percent using log employment. The point estimates
are smallest and not statistically significant when using just the PILA sample, and are statistically
significant using 6 out of 8 specifications for the firm data and combined data. Using the firm data
alone, we can reject that the group intervention has the same impact post-intervention as the
individual intervention when using levels, but not when we use logs, and we cannot reject equality
of treatment effects using the combined dataset. In contrast to the group treatment, we get a mix
of statistically insignificant negative and positive coefficients for the individual treatment. Using
the combined sample and unbalanced panel, we find a positive impact of the individual treatment
on employment that is significant at the 10 percent level, but this is not robust to using the balanced panel or to using
log employment.

6.2 Impact on Sales
Monthly sales data were collected directly from firm record books, and are converted into millions
of real (December 2017) Colombian pesos using the Producer Price Index. We have some months
of post-baseline sales data for 145 firms, and data on 99 firms for the balanced panel of all 60
months between January 2013 and December 2017.

Figure 5 uses the balanced panel and plots the trajectory of mean real sales by treatment group,
demeaned by the 2013 treatment group means (left panel). We see the means of the three treatment
groups track each other closely until the group intervention starts. Firms in the group treatment


then see mean sales increase relative to the other two groups, with this gap widest in the first six
months after treatment, and then closing. The right panel shows the distribution of changes in
annual sales between the year 2013 and year 2017. We see the control and individual treatment
groups have similar distributions of change (Kolmogorov-Smirnov test of equality p-value 0.855),
while we can reject equality of the individual and group distributions (p-value 0.032). The group
intervention has more variation in the change of sales, with a few firms experiencing a drop in
sales, and more firms also experiencing growth in sales than occurs in the other two groups.

Table 6 estimates equation (3) for the level and log of sales, using firm fixed effects to account for
potential baseline differences across treatments that can arise from sample attrition, firm
heterogeneity, and the sample size. Columns 1 and 4 use the unbalanced panel, and columns 2, 3,
and 5 the balanced panel. The group treatment has positive treatment effects on sales, of 63-71
million COP per month (USD $26,500-$29,900) in levels, or 8 to 9 percent in log terms. This
treatment effect is not statistically significant compared to the control group (lowest p-value is
0.12 in column 1), but is statistically different from the individual treatment level effect post-
intervention. The individual treatment effects have negative point estimates in level terms, and a
point estimate close to zero post-intervention for the balanced panel for log sales.

6.3 Channels of Production Impact
The results on employment and sales suggest that the group intervention may have increased the
size of the firm, causing it to employ more people and sell more. In Table 7 we examine different
channels through which this increase may have occurred. Column 1 considers the defect rate.
Bloom et al. (2013) found quality improvements to be one of the first signs of improvement from
better management in their Indian study. We only have defect data in 2017 for 78 of the firms in
the study, due to many firms not keeping consistent records on defects. A first point to note is that
the defect rates are low (which is one reason some firms do not record them): the control group
has a mean defect rate of 0.025 and median rate of 0.007 in 2017, which compares to much higher
defect rates in India (5 percent of output was scrapped, after mending of defects was done). The
result is that many of the auto parts firms do not have much scope to reduce defects, and we see
treatment effects that are all very close to zero and statistically insignificant.

Columns 2 and 3 consider monthly inventories. In India, Bloom et al. (2013) found firms had
excess inventory levels, which they reduced when management improved. Large stockpiles of

inventories are less common in the auto parts sector, with some firms doing job work and
producing upon request. Data are only available for half the sample of firms, due to some firms
not keeping records, or changing the units in which they record inventories over time. The control
mean level of inventories is equal in value to 1.4 months of mean sales. We see no significant
change in inventories, with the sign of the coefficients changing between level and log
specifications. However, the confidence intervals are wide, and include the 21 percent reduction
in inventories found in Bloom et al. (2013), as well as increases in inventories of more than this
magnitude.

Columns 4 and 5 consider energy costs, which are another input into producing more. The data
here are consistent with the group treatment firms getting larger by using more inputs to produce
and sell more. They use more energy both during and post-intervention, with this increase
statistically significant when measured in levels during the intervention. The log results suggest
firms are using 17 percent more energy, although this is not statistically significant. In contrast,
the pattern is more mixed for the individual treatment group, which has a statistically insignificant
increase in energy costs when measured in levels, but a statistically insignificant decrease when
measured in logs.

Column 6 examines whether the improvement in management has resulted in higher labor
productivity (measured as real sales per worker). The percent increase in employment for group
treated firms is slightly higher than the percent increase in sales, and the result is a small, and
statistically insignificant, drop in labor productivity (3 percent). The individual treatment also has
a small and negative point estimate on labor productivity. These results contrast with the 17 percent
improvement in productivity found in India by Bloom et al. (2013). However, since the
improvement in management in our experiment is only one-third that found in India, a proportional
improvement in productivity would be 5.7 percent, which is within the confidence intervals of
approximately [-13%, +8%] for the productivity effect found here for each treatment.
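
The back-of-the-envelope scaling in this paragraph can be reproduced directly. The sketch below uses only figures quoted in the text; the function and variable names are hypothetical, and proportional scaling is an assumption of the comparison rather than an estimated relationship.

```python
def proportional_gain(reference_gain, management_ratio):
    """Scale a reference productivity gain by the relative management improvement."""
    return reference_gain * management_ratio


india_management_gain = 26.0    # percentage points, Bloom et al. (2013) as cited
india_productivity_gain = 17.0  # percent, Bloom et al. (2013) as cited

# The 8-10 point improvement found here is roughly a third of the Indian one.
ratio_low = 8.0 / india_management_gain
ratio_high = 10.0 / india_management_gain

# The text rounds the ratio to one-third.
implied = proportional_gain(india_productivity_gain, 1 / 3)
ci_low, ci_high = -13.0, 8.0    # approximate confidence interval from the text
inside_interval = ci_low <= implied <= ci_high

print(round(ratio_low, 2), round(ratio_high, 2), round(implied, 1), inside_interval)
```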

Finally, columns 6 and 7 examine the extent to which any increase in sales came through exports.
We use administrative data on exports, which have the advantage of being available for all firms
and all months. Sixty percent of firms exported in at least one month between January 2013 and
December, but on average, only 21 percent of firms export in a given month. As a result, most
sales are domestic: exports are 0 percent of monthly sales for the median firm, and 3.8 percent for
the mean; and even conditional on exporting in a month, exports for the median exporter are only
14 percent of that month’s sales. Column 6 shows that there is a small, negative, and insignificant
effect on the extensive margin of whether firms export at all in a given month. Column 7 shows
negative and statistically insignificant impacts on the amount exported, conditional on exporting.
Thus any gains in sales have come through increased domestic sales, not through more exporting.

6.4 Comparison to Policymaker expectations
In June 2014, we elicited expectations about the program’s impact on employment and
productivity from 15 policymakers drawn from the Ministry of Planning (DNP), Ministry of
Commerce and Tourism, SENA, and Program of Productive Transformation (PTP). The expected
mean (median) treatment effect for the individual treatment was 5.7% (3%) for employment and
16.3% (10%) for productivity; while for the group treatment the expected mean (median) treatment
effect was 3.3% (5%) for employment, and 7.3% (5%) for productivity. Our estimated treatment
effects for the group treatment are similar in magnitude to these estimates, while the individual
treatment has under-performed relative to expectations, especially on productivity. Moreover, the
policymakers thought the individual treatment would have a larger impact, which is the opposite
of what we find. We also asked what size impacts they would require to consider the program a
success that could be scaled at the national level: the mean response for employment was 6% for
both programs, while for productivity it was 24% for the individual program and 13% for the group
program. The estimated impact of the group intervention on employment is thus large enough to
be considered a success, whereas neither program has enough of an impact on productivity to be
considered a success.

6.5 Cost-Benefit
Both the individual and group treatments succeeded to a similar magnitude in improving the set of
management practices measured by the Anexo K. The impacts on firm outcomes are less precisely
measured, but show an increase in firm size for the group treatment that in some specifications is
statistically different from that of the individual treatment. The group treatment cost USD10,500
per firm for the intervention stage, compared to USD28,950 per firm for the individual treatment.
The group treatment therefore clearly dominates the individual treatment on a cost-benefit basis.
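
As a rough illustration of this comparison, the sketch below divides the per-firm cost of each treatment by the post-intervention improvement in the overall Anexo K score reported in Table 2 (Panel A). Pairing the costs with these particular point estimates is an illustrative assumption, and the calculation ignores sampling uncertainty in the estimated effects.

```python
# Costs per firm (USD) are quoted in the text. Effects are the post-intervention
# overall Anexo K estimates from Table 2, Panel A (percentage points). Using
# these point estimates is an illustrative choice that ignores standard errors.
cost_usd = {"individual": 28_950, "group": 10_500}
effect_pp = {"individual": 9.620, "group": 8.544}

# Cost of one percentage point of management improvement under each treatment.
cost_per_pp = {arm: cost_usd[arm] / effect_pp[arm] for arm in cost_usd}

# Group cost as a share of individual cost (roughly one-third).
cost_ratio = cost_usd["group"] / cost_usd["individual"]

for arm, value in cost_per_pp.items():
    print(f"{arm}: USD {value:,.0f} per percentage point of improvement")
print(f"Group cost as a share of individual cost: {cost_ratio:.2f}")
```

With similar effects on management practices, the cost per percentage point of improvement is driven almost entirely by the difference in delivery cost.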

It is more difficult to measure whether the group treatment pays for itself, given the uncertainty
associated with the sales impact, and that we lack firm profitability data over time. Baseline data
suggest that profit margins are 11 percent of sales for the median firm. If we take the estimated
group treatment effect on sales of USD26,500-29,900 per month, and multiply this by the profit
rate, this gives a suggested point estimate of USD3,000 per month in profits, in which case the
group treatment would pay for itself within 4 months. If the sales effect is one standard error below
the point estimate, then the estimated profit effect would be approximately USD750 per month, and it
would pay for itself within 14 months. Since 84 percent of the distribution of treatment effects is
at least this high, this suggests the group treatment would pay for itself in just over a year, and
within the period over which we measure post-intervention outcomes.
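
The payback arithmetic in this paragraph can be reproduced as follows. This is a minimal sketch using the figures quoted in the text. The helper `payback_months` and the variable names are hypothetical, and the 84 percent figure is recovered under the assumption that the estimator is approximately normally distributed.

```python
import math
from statistics import NormalDist

# Figures quoted in the text (USD).
cost_per_firm = 10_500                 # group treatment, intervention stage
profit_margin = 0.11                   # median baseline profit margin on sales
sales_effect_range = (26_500, 29_900)  # estimated monthly sales effect
profit_at_point_estimate = 3_000       # rounded monthly profit effect used in the text
profit_one_se_below = 750              # monthly profit effect one standard error below


def payback_months(cost, monthly_profit):
    """Whole months needed for cumulative monthly profits to cover the cost."""
    return math.ceil(cost / monthly_profit)


# Implied monthly profit range; the text rounds this to about USD 3,000.
implied_profit_range = tuple(profit_margin * s for s in sales_effect_range)

months_at_point_estimate = payback_months(cost_per_firm, profit_at_point_estimate)
months_one_se_below = payback_months(cost_per_firm, profit_one_se_below)

# Assumption: the estimator is approximately normal, so the share of its
# distribution lying above (point estimate - 1 standard error) is 1 - Phi(-1).
share_at_least_this_high = 1 - NormalDist().cdf(-1.0)

print(f"Implied monthly profit: USD {implied_profit_range[0]:,.0f} to {implied_profit_range[1]:,.0f}")
print(f"Payback at point estimate: {months_at_point_estimate} months")
print(f"Payback one SE below: {months_one_se_below} months")
print(f"Share of distribution at least this high: {share_at_least_this_high:.2f}")
```

Under these inputs the sketch gives payback periods of 4 and 14 months and a share of about 0.84, matching the figures in the text.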

These cost-benefit calculations would look less promising from a government policy perspective
if the gains to treated firms came from them capturing sales from control firms or from other firms
outside of the experimental sample. At least within our experimental sample, firms specialize in
different products (which is what allowed groups to be formed easily without having firms who
are competitors), suggesting that internal validity of our estimates should not be invalidated by
such spillovers. Moreover, as noted in our discussion of the setting, the sector is one where the
main competitors to most firms are imports, which became more expensive with the depreciation
of the peso. It therefore seems likely that any sales gains achieved by the group treatment would
have mostly come from taking business away from imports.

6.6 Why did the group treatment do better than the individual?

The group and individual treatments led to similar improvements in management practices, yet we
only find evidence of improvements in firm outcomes for the group treatment. What explains this
difference? A first possibility is that the two treatments did have similar effects, and it is just small
sample sizes coupled with firm heterogeneity that prevents us from detecting this effect in the
individual treatment group. Although the point estimates show larger impacts from the group
treatment, we can only weakly reject equality of the treatment effects of the two interventions
when looking at some specifications in levels of employment or sales, whereas when using log
outcomes or looking at production channels, we cannot reject equality of impacts.

A second possibility is that the group treatment may have a larger impact because it either provides
a way for the improvements in management to persist longer, or because it delivers additional
benefits to firms beyond the improvements they obtain in management practices. To investigate
this possibility, group firms were asked approximately one year after the intervention whether they
still met with other group members, and what the main benefit of meeting in a group had been.
None of the firms continued formally meeting together as a group, but 54 percent said they still
communicate occasionally with other group members. The main benefit they saw of meeting in a
group was to interchange experiences, noting the value of seeing other firms facing similar
problems, and how others had solved these problems. Only four firms said they saw a possibility
of using the group to find a supplier or customer, with only one giving an example of this actually
happening, saying it was short-lived. This suggests that if the group treatment is having an
additional effect, it is more through providing advice and specific solutions to problems firms face
(as in Brooks et al., 2017) rather than through direct business relationships.

   7. Conclusions
The experiment of Bloom et al. (2013) provided a proof-of-concept that poor management could
be improved. But moving from a pilot demonstration to a scalable program of management
improvement requires lowering the cost of delivery and testing whether such a program can be
locally implemented when subject to the constraints imposed by government bureaucracy. As is
common with other social programs (Rossi 1987, Vivalt 2017), impacts on management are
smaller when delivered by a program run by a government at scale than under a small researcher
pilot. Yet, both the individual and group treatments were able to improve management practices
by 8 to 10 percentage points, with this resulting in an increase in firm size under the group
treatment at least. As a result, the group treatment model pioneered here clearly dominates the
individual consulting model on a cost-benefit basis, and offers a promising approach to scaling
management.

As with firms, good management also matters for the public sector (Rasul and Rogger, 2018), and
there were several challenges to implementation. These included delays in contracts which caused
challenges for data collection, and delays in implementation which likely reduced the effectiveness
of the programs implemented. It is also possible that contracting only a single organization to
implement the intervention may have led to hold-up problems and removed the performance
incentives that competition among consulting firms could have provided. A government
contemplating scaling up management support programs in the least costly way therefore should
consider the group extension approach, but pay careful attention to the quality of their own
management in doing so.



References

Banerjee, Abhijit, Rukmini Banerji, James Berry, Esther Duflo, Harini Kannan, Shobhini Mukerji,
        Marc Shotland and Michael Walton (2017) “From Proof of Concept to Scalable Policies:
        Challenges and Solutions, with an Application”, Journal of Economic Perspectives 31(4):
        73-102.
BBVA Research (2018) “Situación Automotriz 2018 Colombia”, BBVA Research, March.
Bloom, Nicholas, and John Van Reenen (2007) “Measuring and Explaining Management
        Practices across Firms and Countries”, Quarterly Journal of Economics, 122(4): 1341-1408.
Bloom, Nicholas, Benn Eifert, Aprajit Mahajan, David McKenzie, and John Roberts (2013) “Does
        Management Matter? Evidence from India”, Quarterly Journal of Economics, 128(1): 1-51.
Bloom, Nicholas, Aprajit Mahajan, David McKenzie, and John Roberts (2018) “Do Management
        Interventions Last? Evidence from India”, World Bank Policy Research Working Paper
        no.8339.
Bloom, Nicholas, Raffaella Sadun, and John Van Reenen (2016) “Management as a Technology”.
        Stanford: Mimeo.
Bloom, Nicholas, Erik Brynjolfsson, Lucia Foster, Ron Jarmin, Megha Patnaik, Itay Saporta-
        Eksten, and John Van Reenen (2018) “What Drives Differences in Management
        Practices?”, Mimeo. Stanford.
Bold, Tessa, Mwangi Kimenyi, Germano Mwabu, Alice Ng'ang'a and Justin Sandefur (2018)
        “Experimental Evidence on Scaling Up Education Reforms in Kenya”, Journal of Public
        Economics, forthcoming.
Brooks, Wyatt, Kevin Donovan and Terence Johnson (2017) “Mentors or Teachers?
        Microenterprise Training in Kenya”, American Economic Journal: Applied Economics,
        forthcoming.
Bruhn, Miriam, Dean Karlan, and Antoinette Schoar (2018) “The Impact of Consulting Services
        on Small and Medium Enterprises: Evidence from a Randomized Trial in Mexico”, Journal
        of Political Economy, 126(2): 635-87.
Cai, Jing and Adam Szeidl (2018) “Interfirm Relationships and Firm Performance”, Quarterly
        Journal of Economics 133(3): 1229-1282.
Chatterji, Aaron, Solene Delecourt, Sharique Hasan and Rembrand Koning (2018) “Learning to
        Manage: A Field Experiment in the Indian Start-up Sector”, Harvard Business School
        Working Paper no. 17-100.
Dalton, Patricio, Julius Rüschenpöhler, Burak Uras and Bilal Zia (2018) “Learning Business
        Practices from Peers: Experimental Evidence from Small-Scale Retailers in an Emerging
        Market”,
        https://pure.uvt.nl/portal/files/23354244/2_WP_Dalton_et_al_Learning_from_Peers_DFID.pdf
Higuchi, Yuki, Vu Hoang Nam and Tetsushi Sonobe (2017) “Management skill, entrepreneurial
        motivation, and enterprise survival: Evidence from randomized experiments and repeated
        surveys in Vietnam”, Mimeo.
        https://www.canr.msu.edu/afre/uploads/files/Higuchi_Paper_1217.pdf
Lafortune, Jeanne, Julio Riutort and José Tessada (2018) “Role models or individual consulting:
        The impact of personalizing micro-entrepreneurship training”, American Economic
        Journal: Applied Economics, forthcoming.
Londoño, Andrés (2017) “Low Productivity: the Elephant in the Room in Colombia’s Minimum
        Wage Debate”, Panam Post, November 28,
        https://panampost.com/andres-londono/2017/11/28/low-productivity-minimum-wage-debate/
McKenzie, David (2012) “Beyond Baseline and Follow-up: The case for more T in experiments”,
        Journal of Development Economics, 99(2): 210-21.
McKenzie, David and Christopher Woodruff (2014) “What are we learning from business training
        evaluations around the developing world?”, World Bank Research Observer, 29(1): 48-82.
Proexport Colombia (2012) “Automotive Industry in Colombia”,
        http://www.investincolombia.com.co/attachments/Automotive%20Industry%20in%20Colombia%20-%20April%202012.pdf
        [accessed February 16, 2015]
Rasul, Imran and Daniel Rogger (2018) “Management of Bureaucrats and Public Service Delivery:
        Evidence from the Nigerian Civil Service”, Economic Journal 128(608): 413-446.
Reina, Mauricio, Sandra Oviedo and Jonathan Moreno (2014) “Importancia Económica del Sector
        Automotor en Colombia”, Fedesarrollo, Bogota.
Rossi, Peter (1987) “The Iron Law Of Evaluation And Other Metallic Rules”, pp. 3-20 in Joan
        Miller and Michael Lewis (ed.) Research in Social Problems and Public Policy volume 4.
        Jai Press Inc.
Vivalt, Eva (2017) “How much can we generalize from impact evaluations?”, Mimeo. ANU.




Figure 1: Trajectory of Impacts on Management Practices




Notes: Means shown by treatment status. Anexo K was measured at baseline (2013) for all firms. It was
then measured monthly during implementation of the individual and group treatments, along with a one-
year follow-up, and was measured for the control group at the same time as the end of the individual
intervention, and at the time of the individual one-year follow-up. Vertical lines indicate approximate
periods of implementation of the individual intervention (first two lines) and group intervention (second
two lines). Data are for the unbalanced panel, although the figure looks similar for the balanced panel.




Figure 2: Impact on Distribution of Management Practices




Notes: Kernel densities shown of Anexo K management practices at baseline, and at last follow-up, for the
balanced panel of firms for which these practices were measured at all points in time. Kolmogorov-Smirnov
tests of equality of distributions at baseline have p-values 0.210 (control vs individual), 0.998 (control vs
group), and 0.422 (individual vs group); and at endline have p-values 0.004 (control vs individual), 0.003
(control vs group), and 0.643 (individual vs group).




Figure 3: The Individual and Group Treatments Improved Specific Practices to a Similar Extent




Notes: Empty circles denote that the difference between the two treatments is not statistically significant at
the 5% level; Solid circles indicate that difference between the two treatments is statistically significant at
the 5% level; Solid diamonds indicate that difference is statistically significant at the 1% level. Correlation
between group treatment effect and individual treatment effect is 0.71. 45 degree line shown.




Figure 4: Trajectory of Employment and Distribution of Changes in Employment




Notes: Employment data are drawn from the combination of firm data and the PILA, and are shown for
the 135 firms that have data for every month between Jan 2013 and Feb 2017. Left panel demeans
employment by the treatment group mean in 2013. Vertical lines in left panel show the period of the
individual intervention (first two lines) and group intervention (second two lines). Right panel shows the
kernel density of the change in employment Feb 2013 to Feb 2017 by treatment status.




Figure 5: Trajectory of Sales and Distribution of Changes in Sales




Notes: Sales are reported in millions of real (December 2017) Colombian pesos, and are shown for the 99
firms that have data for every month between Jan 2013 and Dec 2017. Left panel demeans sales by the
treatment group mean in 2013. Vertical lines in left panel show the period of the individual intervention
(first two lines) and group intervention (second two lines). Right panel shows the kernel density of the
change in log sales for the year 2017 compared to the year 2013 by treatment status.




Table 1: Baseline Balance
                                                             Means by Treatment Group           p-value for testing equality
                                       Overall Sample      Control Individual    Group      Control v Control v         All 3
                                       Mean       S.D.     Group   Consulting Consulting    Individual    Group        Equal
Variables used for matched triplets
Number of Employees                     59         53        64          61        53         0.841       0.285        0.464
Small Firm (<=50 employees)            0.59       0.49      0.60        0.58      0.58        0.845       0.845        0.975
Medium Firm (>50 employees)            0.41       0.49      0.40        0.42      0.42        0.845       0.845        0.975
Cundinamarca                           0.48       0.50      0.55        0.49      0.40        0.564       0.122        0.291
Valle                                  0.16       0.37      0.17        0.09      0.23        0.255       0.469        0.157
Labor Productivity                      31         18        26          32        34         0.059       0.027        0.030
Financing Practices                     51         14        51          48        53         0.225       0.508        0.164
Human Resources Practices               43         12        42          42        43         0.897       0.686        0.843
Logistics Practices                     46         13        49          43        47         0.017       0.457        0.050
Marketing Practices                     46         15        47          43        46         0.190       0.687        0.409
Production Practices                    47         13        47          47        46         0.963       0.881        0.989
Variables not explicitly balanced on
Level 2 Supplier                        0.94      0.24      0.94        0.94      0.92        1.000       0.699        0.909
Metal Products                          0.60      0.49      0.75        0.51      0.53        0.009       0.015        0.011
Plastic Products                        0.18      0.38      0.15        0.17      0.21        0.794       0.452        0.749
Firm Age (Years)                         24        14        27          23        22         0.177       0.058        0.147
Anexo K score                            46        10        47          45        47         0.200       0.955        0.353
USD Sales in 2013                      2715957   3387147   2134280      3345606   2703821     0.098       0.303        0.196
Export at all in 2013                   0.45      0.50      0.47        0.42      0.45        0.562       0.847        0.839

Sample Size                             159                  53         53         53




Table 2: Impact on Management Practices
                                                    Overall   Finance     HR      Logistics Marketing Production
                                                     Score    Practices Practices Practices Practices  Practices
Panel A: Unbalanced Panel
Individual Treatment*During Intervention          9.703***     9.644***   10.793***   8.708***   10.637***     5.696***
                                                   (1.370)      (1.852)    (1.822)     (1.603)    (2.280)       (1.806)
Individual Treatment*Post Intervention            9.620***     9.712***   8.974***    8.585***   9.451***      8.488***
                                                   (1.830)      (2.413)    (2.508)     (2.457)    (2.466)       (1.993)
Group Treatment*During Intervention               11.971***   13.841***   12.249***   9.327***   11.899***    11.798***
                                                   (1.660)      (2.057)    (2.078)     (2.047)    (2.599)       (1.993)
Group Treatment*Post Intervention                 8.544***     9.820***   7.156***     5.860**   9.046***     10.694***
                                                   (1.894)      (2.306)    (2.655)     (2.539)    (2.637)       (2.048)
Sample Size                                          225          226        226         225        226           225
P-value: Individual=Group During                    0.145        0.027      0.451       0.753      0.568         0.002
P-value: Individual=Group Post                      0.533        0.958      0.365       0.235      0.864         0.315
Control Mean                                        55.98        59.18      52.39       57.75      54.80         55.79
Control SD                                          10.79        13.79      11.25       14.33      12.58         11.19
Panel B: Balanced Panel
Individual Treatment*During Intervention            9.861***   10.608***   11.111***    8.639***    9.072***    6.803***
                                                   (1.756)     (2.277)     (2.328)     (1.962)     (2.985)     (2.010)
Individual Treatment*Post Intervention              9.757***   10.118***    9.463***    8.629***    8.568***    8.935***
                                                   (2.014)     (2.650)     (2.780)     (2.646)     (2.723)     (2.078)
Group Treatment*During Intervention                12.118***   15.094***   12.227***    8.942***   11.309***   12.688***
                                                   (2.029)     (2.373)     (2.583)     (2.413)     (3.349)     (2.279)
Group Treatment*Post Intervention                   8.889***    9.912***    7.502**     6.022**     9.166***   11.513***
                                                   (2.067)     (2.490)     (2.912)     (2.729)     (2.920)     (2.157)
Sample Size                                       202         202         202         202         202         202
P-value: Individual=Group During                    0.152       0.027       0.555       0.881       0.341       0.006
P-value: Individual=Group Post                      0.627       0.925       0.343       0.274       0.813       0.248
Control Mean                                       55.98       59.18       52.39       57.75       54.80       55.79
Control SD                                         10.79       13.79       11.25       14.33       12.58       11.19
Notes:
Panel A is for the 124 firms for which Anexo K management practices are measured post-baseline, panel B for
the 101 firms for which practices are measured both during and after intervention.
Robust standard errors in parentheses, clustered at the firm level. *, **, *** denote significance at the 10, 5,
and 1 percent levels respectively.
Anexo K management practices are 141 management practices divided into five sub-areas.
Ancova estimation controls for baseline (December 2013) mean, and time fixed effects included, along
with randomization triplet dummies.
Note: Group treatment moved back one period, since no control group data were collected during 2016.




Table 3: Robustness of Impact on Management Practices to different weighting schemes
                                                    Overall     Principal         Lasso           Lasso            Lasso
                                                    Anexo K     component      Log Employ.    Productivity          WMS
Panel A: Unbalanced Panel
Individual Treatment*During Intervention           9.703***     6.014***        0.227***        7.065***          0.079**
                                                    (1.370)      (0.946)         (0.085)         (1.238)          (0.036)
Individual Treatment*Post Intervention             9.620***     6.012***         0.286**        8.297***         0.140***
                                                    (1.830)      (1.217)         (0.115)         (1.811)          (0.041)
Group Treatment*During Intervention               11.971***     7.266***        0.403***        9.269***         0.240***
                                                    (1.660)      (1.177)         (0.090)         (1.463)          (0.040)
Group Treatment*Post Intervention                  8.544***     5.512***        0.301***        7.596***         0.225***
                                                    (1.894)      (1.220)         (0.106)         (1.706)          (0.040)
Sample Size                                           225           200            213             217              221
P-value: Individual=Group During                     0.145        0.208           0.020           0.111            0.000
P-value: Individual=Group Post                       0.533        0.658           0.862           0.670            0.043
Control Mean                                         55.98         5.59            2.46           43.01             0.93
Control SD                                           10.79         6.03            0.47            9.66             0.20
Panel B: Balanced Panel
Individual Treatment*During Intervention           9.861***     6.048***         0.273**        7.302***          0.100**
                                                    (1.756)      (1.327)         (0.119)         (1.602)          (0.049)
Individual Treatment*Post Intervention             9.757***     5.972***         0.309**        8.451***         0.148***
                                                    (2.014)      (1.402)         (0.122)         (2.003)          (0.044)
Group Treatment*During Intervention               12.118***     7.494***        0.445***        9.624***         0.263***
                                                    (2.029)      (1.525)         (0.118)         (1.781)          (0.051)
Group Treatment*Post Intervention                  8.889***     5.736***        0.361***        8.009***         0.242***
                                                    (2.067)      (1.416)         (0.111)         (1.914)          (0.043)
Sample Size                                           202           178            190             194              198
P-value: Individual=Group During                     0.152        0.174           0.032           0.114            0.000
P-value: Individual=Group Post                       0.627        0.844           0.539           0.797            0.032
Control Mean                                         55.98         5.59            2.46           43.01             0.93
Control SD                                           10.79         6.03            0.47            9.66             0.20
Notes:
Panel A is for the 124 firms for which Anexo K management practices are measured post-baseline, panel B for
the 101 firms for which practices are measured both during and after intervention.
Robust standard errors in parentheses, clustered at the firm level. *, **, *** denote significance at the 10, 5,
and 1 percent levels respectively.
Anexo K management practices are 141 management practices divided into five sub-areas.
Ancova estimation controls for baseline (December 2013) mean, time and triplet fixed effects.
Principal Component takes the first principal component of the 141 practices.
Remaining columns use Lasso to choose the subset of practices that best predict log baseline employment, log
labor productivity, and the WMS baseline management score respectively, with post-Lasso coefficients then
providing the weightings on the different practices used.


Table 4: Correlation of Practice Changes Within Groups
Dependent Variable: Change in Practice between Baseline and Endline
                                                                     (1)        (2)        (3)
Mean Change in Practice for other Group Members                    0.100*               0.104**
                                                                   (0.050)              (0.049)
Maximum Baseline Level of Practice for Other Group Members                     0.001     0.014
                                                                              (0.021)   (0.019)
Sample Size (Firms*Practices)                                        5069      5210      5069
Mean Change in Practices                                             0.168     0.171     0.168
Notes:
Regression uses the stacked panel of 141 practices for firms in the group treatment.
Robust standard errors in parentheses, clustered at the firm level. *, **, and *** denote
significance at the 10, 5, and 1 percent levels respectively.




Table 5: Impact on Employment
                                                          PILA Data                                   Firm Data                                 Combined Data
                                               Employment          Log Employment          Employment          Log Employment          Employment          Log Employment
                                            (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)       (11)       (12)
Individual Treatment*During Intervention   -0.576     -1.595     -0.037     -0.100     -3.012     -5.627     -0.017     -0.054     -3.067     -2.791      0.006     -0.015
                                           (3.624)    (3.901)    (0.075)    (0.066)    (2.912)    (3.499)    (0.040)    (0.043)    (3.137)    (3.170)    (0.042)    (0.040)
Individual Treatment*Post Intervention      0.223     -1.510     -0.015     -0.077     -2.150     -4.121      0.040      0.026      1.168     -0.001      0.120*     0.063
                                           (4.559)    (5.072)    (0.090)    (0.083)    (3.742)    (4.261)    (0.052)    (0.057)    (3.900)    (3.958)    (0.068)    (0.055)
Group Treatment*During Intervention         1.372      1.481      0.055      0.042      3.837*     2.264      0.102**    0.085*     5.155*     3.860      0.132*     0.057
                                           (3.868)    (4.642)    (0.103)    (0.104)    (2.268)    (2.582)    (0.039)    (0.046)    (3.042)    (3.155)    (0.075)    (0.065)
Group Treatment*Post Intervention           3.200      3.522      0.100      0.077      5.874**    4.233      0.128***   0.129**    7.296**    6.854*     0.139*     0.111
                                           (4.536)    (5.018)    (0.113)    (0.121)    (2.849)    (3.264)    (0.049)    (0.056)    (3.345)    (3.467)    (0.075)    (0.074)
Balanced Panel                              No         Yes        No         Yes        No         Yes        No         Yes        No         Yes        No         Yes
Sample Size (N*T)                          8522       6944       8513       6944       7299       5760       7298       5759       8725       7877       8719       7876
Number of Firms                             156        112        156        112        145         96        145         96        157        135        157        135
P-value: Individual=Group During            0.724      0.627      0.473      0.311      0.058      0.062      0.033      0.023      0.062      0.140      0.131      0.351
P-value: Individual=Group Post              0.602      0.414      0.372      0.250      0.072      0.097      0.159      0.125      0.209      0.173      0.826      0.567
Control Mean in 2013                       49.0       52.0        3.432      3.637     56.1       49.5        3.666      3.591     56.1       58.6        3.666      3.726
Control S.D. in 2013                       42.8       40.0        1.099      0.844     51.3       41.4        0.864      0.789     51.3       52.7        0.864      0.839
Notes:
Fixed effects regressions with time and firm fixed effects. Standard errors clustered at the firm level in parentheses.
*, **, *** denote significance at the 10, 5, and 1 percent levels.
Columns 1-4 are formal employment, taken from administrative records of the PILA. Data cover Jan 2012 to Feb 2017.
Columns 5-8 are firm employment data, taken from firm records. Data cover Jan 2013-Dec 2017.
Columns 9-12 combine the two datasets to maximize the number of firms with employment data, and cover Jan 2013-Dec 2017.
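
As a reading aid, the specification implied by the row labels and the first note can be sketched as below. The notation is illustrative and is an assumption rather than the paper's own equation: y_it is employment (or its log) for firm i in month t, alpha_i and delta_t are the firm and time fixed effects, and the During and Post indicators are assumed to be defined relative to each treatment arm's own intervention dates.

```latex
% Illustrative sketch only; symbols are assumptions, not the authors' notation.
y_{it} = \beta_1\,(\mathrm{Indiv}_i \times \mathrm{During}^{I}_t)
       + \beta_2\,(\mathrm{Indiv}_i \times \mathrm{Post}^{I}_t)
       + \beta_3\,(\mathrm{Group}_i \times \mathrm{During}^{G}_t)
       + \beta_4\,(\mathrm{Group}_i \times \mathrm{Post}^{G}_t)
       + \alpha_i + \delta_t + \varepsilon_{it}
```

Under this sketch, the rows "P-value: Individual=Group During" and "P-value: Individual=Group Post" would correspond to tests of beta_1 = beta_3 and beta_2 = beta_4, respectively.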




Table 6: Impact on Sales
                                                  Monthly Sales            Log Monthly Sales
Individual Treatment*During Intervention         -18       -38       -22     0.064    -0.033
                                                (29)      (35)      (30)   (0.051)   (0.043)
Individual Treatment*Post Intervention           -54       -75       -38     0.034     0.003
                                                (59)      (65)      (37)   (0.061)   (0.063)
Group Treatment*During Intervention               52        51        44     0.075     0.079
                                                (52)      (59)      (53)   (0.059)   (0.065)
Group Treatment*Post Intervention                 71        68        63     0.088     0.074
                                                (46)      (50)      (48)   (0.069)   (0.075)
Balanced Panel                                    No       Yes       Yes        No       Yes
Winsorized at the 99th percentile                 No        No       Yes        No        No
Sample Size (N*T)                               7343      5940      5940      7335      5932
Number of Firms                                  145        99        99       145        99
P-value: Individual=Group During               0.263     0.222     0.305     0.901     0.190
P-value: Individual=Group Post                 0.109     0.095     0.099     0.503     0.407
Control Mean in 2017                             388       407       407     5.298     5.339
Notes:
Coefficients are from fixed effects regressions with time and firm fixed effects, with standard errors
clustered at the firm level.
*, **, *** denote significance at the 10, 5, and 1 percent levels.
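
The winsorization flagged in the "Winsorized at the 99th percentile" row can be illustrated with a minimal sketch. It assumes that only the upper tail is capped and uses a nearest-rank percentile; the table does not state either choice, and the function name `winsorize_upper` is hypothetical.

```python
def winsorize_upper(values, pct=99):
    """Cap values above the pct-th percentile at that percentile.

    Assumptions (not stated in the table): only the upper tail is capped,
    and the percentile uses the nearest-rank definition.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank percentile: ceil(pct * n / 100), in integer arithmetic.
    rank = max(1, (pct * n + 99) // 100)
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# Hypothetical monthly sales for 200 firm-months: 1, 2, ..., 200.
capped = winsorize_upper(list(range(1, 201)))
```

Capping rather than dropping extreme months keeps the panel balanced, which may be why the winsorized column retains the same sample size as the unwinsorized balanced-panel column.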




Table 7: Channels of Production Impact
                                              Defect     Inventories         Energy Costs      Labor Productivity    Export       Log
                                                Rate    Levels      Logs    Levels      Logs    Log Sales/Worker    at all   exports
Individual Treatment*During Intervention      -0.008       -63    -0.185       544    -0.079               0.016     0.018    -0.116
                                             (0.008)      (75)   (0.224)     (926)   (0.063)             (0.046)   (0.019)   (0.211)
Individual Treatment*Post Intervention        -0.008       -78     0.118      1430    -0.038              -0.024    -0.009    -0.108
                                             (0.005)     (180)   (0.268)    (1079)   (0.159)             (0.054)   (0.026)   (0.195)
Group Treatment*During Intervention            0.000        79     0.049    1718**     0.155              -0.003    -0.017    -0.271
                                             (0.004)     (103)   (0.189)     (831)   (0.094)             (0.049)   (0.026)   (0.170)
Group Treatment*Post Intervention             -0.005        28    -0.169      1072     0.156              -0.033    -0.039    -0.114
                                             (0.005)     (121)   (0.259)     (821)   (0.145)             (0.059)   (0.028)   (0.137)
Sample Size (N*T)                               3879      3875      3849      5121      5121                5591      8904      1983
Number of Firms                                   78        76        76        97        97                 100       159        96
P-value: Individual=Group During               0.400     0.199     0.332     0.379     0.063               0.762     0.251     0.586
P-value: Individual=Group Post                 0.600     0.652     0.350     0.761     0.422               0.897     0.311     0.978
Control Mean in 2017                           0.025       554     5.150      8564     8.063               1.771     0.212     9.602
Notes:
Regressions control for firm and time fixed effects, and are restricted to samples with data available in December 2017. Defect rate is
the proportion of production that is faulty; inventories are in millions of real (December 2017) pesos; energy costs are in thousands of
real (December 2017) pesos. Labor productivity is defined as log real sales (in millions of pesos) per worker. Export at all is a dummy
variable that takes value one if the firm exported directly abroad in the past month, and zero otherwise; Log exports is the log of the
USD value of the amount exported in the month, and is conditional on exporting taking place.
Standard errors clustered at the firm level. *, **, and *** denote significance at the 10, 5, and 1 percent levels respectively.
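
The variable definitions in these notes can be sketched in a few lines. This is a minimal illustration: it assumes "log" means the natural logarithm and that non-positive values are treated as missing, and the function names are hypothetical.

```python
import math


def log_sales_per_worker(real_sales_millions, workers):
    """Labor productivity: log of real sales (millions of pesos) per worker.

    Assumption: natural log; non-positive sales or employment yields None.
    """
    if real_sales_millions <= 0 or workers <= 0:
        return None
    return math.log(real_sales_millions / workers)


def export_outcomes(export_value_usd):
    """Return (export_at_all, log_exports) for one firm-month.

    export_at_all is 1 if the firm exported directly in the month, else 0.
    log_exports is defined only when exporting took place.
    """
    export_at_all = 1 if export_value_usd > 0 else 0
    log_exports = math.log(export_value_usd) if export_at_all else None
    return export_at_all, log_exports
```

Because log exports is conditional on exporting, months without exports drop out of that column, which is consistent with its much smaller sample size.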




                               ONLINE APPENDIX
Appendix 1: Examples of Products Manufactured
Appendix 2: Timeline
Appendix 3: Data Appendix
Appendix 4: Drop-out and Attrition
Appendix 5: Impacts on Individual Management Practices
Appendix 6: Robustness of Management Improvement to Sample Attrition
Appendix 7: Impacts on World Management Survey and MOPS management measures
Appendix 8: Comparison of PILA and Firm Employment Data




Appendix 1: Examples of Products Manufactured




[Product images omitted. Captions: Air Filters; Glass Panels; Rubber parts; Metal parts; Plastic parts;
Tires; Injection molding/cushioning; GPS tracking services]
Appendix 2: Timeline
April 12, 2012: Pilot program officially launched and firms invited to apply
June 25, 2012: Deadline for firms to apply to the program
June 11, 2013: Diagnostic phase starts
October 30, 2013: Diagnostic phase ends
November, 2013: Random assignment to treatment status
2013: World Management Survey administered to subsample of 72 firms with 40+ workers, as well as to
random sample of 180 firms representative of Colombian manufacturing sector
March-November 2014: Individual Consulting Intervention
September 2015-April 2016: Group Consulting Intervention
November to December 2015: Round 1 of firm data collection (individual, group and control treatment)
January to February 2016: Round 2 of firm data collection (individual and control treatment)
March to April 2016: Round 3 of firm data collection (control treatment)
June 2016: Round 4 of firm data collection (group treatment)
November 2016: Second round of World Management Survey administered
November 2017-July 2018: Last round of firm data collection
Note: each round of firm data collection gathered all months of data available from firm records during
in-person firm visits. The timing of extraction varied according to CNP’s contractual agreements, under
which it was paid for batches of data collection at a time.




Appendix 3: Data Appendix
        A3.A. Management practices indicators:
The 141 management practices defined by CNP can be divided into five main areas: Finance, Production,
Logistics, HR, and Marketing. Each of these areas is itself divided into five to eight sub-areas. The score
of each main area is the average of the scores of its sub-areas. Below we discuss each of these sub-areas
and explain which practices were considered in calculating their scores. At the most basic level, each
individual practice is graded on the following scale: 1 = “Not existing”, 2 = “In construction”, 3 = “Formalized”,
4 = “Implemented”, 5 = “Operating under control”. For some indicators, the 1 to 5 scale does not refer
exactly to the implementation stage of a practice; instead, it indicates how developed or optimized a specific
aspect is, for instance whether strategic goals and individual responsibilities are clear to each worker.
This information was collected in three stages: during the diagnostic phase, during the intervention, and
once a year after the intervention.
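
The aggregation described above can be sketched as follows. The text states that an area score is the average of its sub-area scores; that a sub-area score is the simple mean of its practice grades is an assumption made here for illustration, and the sub-area names and grades in the example are hypothetical.

```python
from statistics import mean

# Grading scale quoted in the text.
SCALE = {
    1: "Not existing",
    2: "In construction",
    3: "Formalized",
    4: "Implemented",
    5: "Operating under control",
}


def sub_area_score(practice_grades):
    """Mean of the 1-5 grades in one sub-area (assumed aggregation rule)."""
    for grade in practice_grades:
        if grade not in SCALE:
            raise ValueError(f"grade {grade!r} is outside the 1-5 scale")
    return mean(practice_grades)


def area_score(sub_areas):
    """Area score = average of its sub-area scores, as stated in the text."""
    return mean([sub_area_score(grades) for grades in sub_areas.values()])


# Hypothetical grades for one area with three sub-areas.
example_area = {
    "market_research": [2, 3, 4],
    "client_service": [3, 3, 4, 4],
    "relationship_management": [1, 2, 3],
}
example_score = area_score(example_area)
```

Averaging sub-area scores rather than pooling all practices gives each sub-area equal weight, regardless of how many practices it contains.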

Human Resources

                     i. Strategic objectives leverage on people’s talent
The first aspect of Human Resources relates to the alignment of employees’ objectives with corporate
strategy, and to the clarity of such objectives for each employee. Here we consider four components. The
first one evaluates how strategic objectives leverage people’s and teams’ talent. The second component
assesses whether there are human talent development plans, and whether these build on corporate
strategy. The third component assesses whether a strategic plan is defined that includes clear objectives
and goals concerning human talent. The last component assesses whether skill development plans are
also defined for the operational level.
                     ii. Competency-based management model for human talent development
The focus of this measure is on whether the company manages employee competences – based on the
business strategy – in order to develop human talent. It is comprised of two measures. The first one assesses
whether human resources are monitored based on their impact on the strategic objectives of the
organization. The second component addresses the development of work profiles, which must be defined
and aligned with business competencies.
                    iii. Organizational structure prepared to contribute to the achievement of strategic
                         goals
The third sub-area evaluates whether the formal and informal structure of the organization allows the
realization of corporate strategy. Is there a formally defined structure? Are all roles well defined at every
level of the organization? Three measures are taken into consideration. The first one evaluates if the
management’s focus is on processes which are aligned with the strategy of the firm. The second one assesses
whether a communication system between the different processes of the organization has been developed.
The last measure assesses whether a communication system between the different levels of the organization
has been developed.
                    iv. Program of human talent development (according to business competences)
This measure evaluates how the organization works on building and retaining human talent to achieve a
competitive advantage over the competition. Two components are considered: management of
development plans (career plans) for employees at managerial level, and the level of application of the
sector’s technical norms for the development of technical operational competences.
                    v. Organizational climate
The focus of this sub-area is the management of the work climate, which must be appropriate for the
development of human capital and directed towards the achievement of corporate strategy. We consider
three components. Is there a culture of monitoring work climate as a strategic lever? Are there programs to
improve work climate? At which level are health and safety risks controlled?
                    vi. Social responsibility within the enterprise
Here we evaluate how the company manages its internal social responsibilities. This measure is comprised
of three components. The first one assesses whether there are programs to improve the family
environment of employees, in order to encourage their productivity. The second one verifies whether a
formal contracting system is in place that generates wellbeing and productivity among workers. The last one
evaluates the implementation of a system for recognizing and rewarding new ideas and improvement
suggestions at the operational level.
                   vii. Promotion of an open-communication/high-performance organizational culture,
                        and of a culture of high personal involvement
Three measures are considered for this indicator. Did the company develop a culture of control and periodic
monitoring of result achievement? How developed is the performance-based reward system for the
management? How developed is the performance-based reward system for employees at the operational
level?

Production

                     i.   Alignment of functions at the operational, managerial and directive level
The first sub-area of Production focuses on whether all people working in the plant know the corporate
strategy and work to realize it. To achieve this, it is necessary that all workers and processes have
improvement goals aligned with corporate strategy. This measure is comprised of five components. The
first two evaluate the implementation and monthly monitoring of strategic goals between the Plant
Manager and his/her supervisor. The third and fourth components assess whether strategic goals and
individual responsibilities are clear to each worker, and whether each worker has improvement goals. The
last component assesses whether the performance of teams at the operational level is evaluated based on
the strategic goals.
                    ii.   Definitions and management of the most important operational processes
Here we evaluate how operational processes are defined and managed, from the order to the delivery of the
final product. Do they allow the strategy to be accomplished (Standards, Policies, Roles, 5S, Layout,
Established Processes)? This sub-area includes six components. The first one evaluates whether processes
are well identified and have a proper description (VSN, SIPOC). The second one assesses whether the plant
layout allows optimal material flow. The third one concerns the implementation of a 5S program in the plant.
The fourth one evaluates how bottlenecks are identified and managed. The last two components evaluate
standards, specifications and work instructions used by workers, and how these are verified by supervisors.




                    iii.   Formal method to measure and manage the plant’s efficiency (Waste, Hours
                           paid/Service capacity, machinery’s efficiency)
The third sub-area evaluates how the company measures and manages the main KPIs of the plant, such as
team efficiency, efficiency in the use of material, response time, etc. The first component of this sub-area
concerns the monthly measurement of the plant’s KPIs (OEE, Waste, Defects, Lead Time, Others). The
second component concerns weekly or bi-weekly management of KPI goals (OEE, Waste, Defects, Lead
Time, Others). The third one assesses whether improvement programs for KPIs (times and quality) are
developed applying instruments of plant management. The last one assesses whether a culture of daily
collection of facts and data is in place, in order to demonstrate improvement in processes.
                    iv.    Collection of information regarding results, continual improvement, and
                           performance of processes
Here we assess how the company manages data and information regarding processes, results and
continuous improvement. The four components of this sub-area are the following: Is there a culture of visual
management with daily-updated graphs of machinery performance? Are the duration and quality of each
process recorded daily by the responsible worker? Does the Administrative Management make sure that
monitoring instruments are in good condition and precise? Is there a monitoring and sampling plan to
capture the information necessary for the improvement of processes?
                     v.    Process to detect and solve anomalies in the execution of tasks
The focus of this sub-area is to evaluate how anomalies in processes are managed within the plant. It is
comprised of five components. The first one assesses whether there is a mechanism for workers to report
anomalies of time and quality to their supervisors. The second one assesses whether criteria are defined for
analyzing anomalies. The third one concerns the daily analysis of time and quality anomalies by
supervisors and workers. The fourth one assesses whether supervisors and workers manage improvement
plans to eliminate time and quality anomalies. The last component concerns job descriptions, and whether
they include responsibilities for resolving anomalies.
                    vi.    Technical planning of production based on the analysis of demand
The focus of the sixth sub-area is the planning of production. Is such planning based on a statistical
analysis of clients’ orders? Does such planning guarantee the flexibility necessary to achieve a high level of
service? Four components constitute this sub-area. The first one assesses whether meetings to review
production programming take place between the production and sales areas. The second component
evaluates the use of statistical methods to collect information and analyze production programming,
according to demand variation. The third one evaluates production planning to ensure the availability of
material for the monthly, weekly and daily program. The last component evaluates monitoring and
management of service to clients (deliveries in quality, time and quantity).
                   vii.    Management of safety during the process, contingencies, emergencies / impact
                           on the environment
Here we assess how the company monitors its impact on people and the environment, which actions are
undertaken to mitigate any negative impact, and how it complies with safety and environmental norms and
regulations. This sub-area is comprised of five measures. The first one concerns compliance with safety
requirements, laws and norms. The second measure assesses whether the necessary norms and standards of
safety within the plant are well defined. The third one evaluates the management of the indicators of
industrial safety within the plant (number of accidents, level of noise, temperature). The fourth one concerns
monitoring and management of the plant’s environmental impact. The last measure assesses compliance
with the norms regarding evacuation routes and cleared zones for fire-fighting equipment.
                  viii.    Maintenance guarantees the optimal condition of infrastructure
The last sub-area of Production evaluates the maintenance plan, how maintenance is monitored and
managed and how maintenance is related to the creation of value by the enterprise. All this is paramount to
guarantee optimal condition of machinery, furniture, equipment and tools. This measure reflects the
following four points. Is there a preventive maintenance plan for the equipment? Are technicians able to
rapidly repair damage to the machines? Are replacement parts available, so that damage to the machines
can be repaired rapidly? Does Maintenance Management work with indicators such as MTTR, MTBF,
Availability?

Logistics

                      i.   Process of alignment of functions at the operational, managerial and directive
                           level
The first sub-area of Logistics looks at the alignment of functions, and at the deployment of the
organizational strategy. It is comprised of three components. The first one concerns the implementation of
strategic goals between the Logistics Head and his/her supervisor, and whether there are specific projects
to achieve such goals. The second component assesses whether there is a monthly control of strategic goals
by the Plant Manager and the supervisor. The last component concerns the alignment of employees’
objectives in the logistics area with the firm’s strategic goals.
                    ii.    Structure and management of the supply chain (planning, purchases and
                           provisions, storage raw material, plant supply, storage finished product,
                           distribution, client service)
Here we evaluate if employees in the logistics area understand their roles and activities. In this sub-area
there are four measures. The first one evaluates procedures and work instructions for logistics processes.
The second measure is concerned with the layout of the areas of logistic operations in the supply chain. The
third component assesses if a 5S plan for the supply chain is in place. The last component evaluates
monitoring and management of KPIs in the logistic process (inventory, lead time, service level).
                    iii.   Planning and management of demand / alignment of productive and logistic
                           processes
This sub-area evaluates the procedure through which demand is planned and the reaction to changes in the
established plan. Here we have four distinct components. The first one assesses whether a statistical system
is in place to study and analyze demand. The second component concerns the definition of the
demand plan, and whether this definition is updated at annual, quarterly and monthly frequency.
The third component evaluates whether communication between logistics and the areas of marketing and
sales goes through a system that includes rules to change the production plan. The last component evaluates
the way a firm monitors and manages compliance with the budgets of production planning.
                    iv.    Planning, management and control of inventories of raw material, supplies,
                           product in process and finished product (Inventory Policies)
This sub-area evaluates the design of the inventory system, and the maintenance of inventory levels. This
measure is based on five components. The first one assesses whether the levels of inventory (raw material,
semi-finished product/WIP, finished product) are kept at an optimal level relative to the variation in
demand. The second component assesses whether inventory movements are recorded daily and controlled
weekly. The third component assesses whether an ABC inventory classification methodology is in place, in
order to establish policies of inventory, supply, storage and control accordingly. The fourth component
verifies the use of MRP systems, in which product structures are defined in ways that allow the material
needed to comply with production orders to be planned. The last component evaluates whether processes
are in place to guarantee the rotation of inventory according to “First in, first out” schemes.
                     v.    Supply system
This sub-area concerns the relationship with suppliers, the way in which suppliers are evaluated, and the
control the firm has over completed purchases. It is comprised of five measures. The first one concerns the
management of policies and processes for the selection and evaluation of suppliers. The second measure
concerns the management of suppliers’ development. The third measure focusses on the management of
raw material prices and supplies. The fourth measure assesses whether suppliers’ lead time is managed
and taken into account in the planning of material supply. The last measure assesses whether purchased
items are verified in terms of quantity, quality and timeliness of delivery.
                    vi.    Storage system
Five components are taken into account in evaluating the storage system. The first one is the
management of the inventory of obsolete and non-compliant products. The second one is the
implementation of a system to administer storage locations (layout and 5S). The third one evaluates the
implementation of industrial safety norms in the warehouse’s operations. The fourth one concerns the use
of standards and procedures in the storage operations (picking and packing). The last component evaluates
the monitoring and improvement of the storage operation time (picking and packing).
                   vii.    Distribution system
This last sub-area of Logistics concerns the delivery of the created value to the client. It is comprised of
five components. The first one evaluates efficiency in the processes of loading and unloading. The second
one evaluates monitoring and management of the efficiency in the delivery process (perfect deliveries). The
third component concerns the management of transport routes to reduce costs. The fourth component
evaluates the management of reverse logistics for those products, materials or supplies that have to return
to the company’s premises. The last component evaluates whether the management of distribution takes
into account the current legislation regarding freight transit.

Marketing

                     i.    Elaboration, management and control of the marketing plan
This measure evaluates the design of the guiding document of commercial activities and its alignment with
the organization’s strategy. This indicator is comprised of seven components. The first two assess the
implementation of an analysis of trends (economic, commercial, technological, political and social) and of
risks (e.g. free trade, supply, variations in exchange rate, infrastructure, etc.). The third component
evaluates the segmentation of products, technology, clients, consumers, etc. The fourth component assesses
whether commercial strategies are based on contribution margins. The fifth component evaluates the
alignment of the marketing and sales plan with Business Strategy. The sixth component assesses whether
price, promotion and growth policies are defined using the contribution margins. The last component
addresses monitoring of sales behavior and trends, and of changes in the marketing plan.
                     ii.   Processes of market research
This measure indicates how the company conducts market research, and is composed of three components.
The first one addresses if and how the company conducts inquiries with clients and potential clients. The
second one assesses whether the company conducts periodic monitoring of competitors’ offers. The last
component evaluates if and how the company conducts research on marketers and/or distributors.
                    iii.   Client and after sales service
This measure evaluates the company’s approach to client satisfaction and is comprised of four measures.
The first one evaluates the management of clients’ complaints and requests. The second measure concerns
the analysis of products’ performance in the market. The third measure assesses whether in the company
there is a culture of continuous improvement of products and services. The last component verifies if the
company holds periodic meetings to discuss clients’ feedback.
                    iv.    Sales management
This sub-area focusses on the elaboration, management and control of the sales plan. We consider five
indicators. The first three assess whether the company holds three different types of meetings: with
the distribution channels (to capitalize on opportunities in the market), planning meetings between sales and
production, and meetings of the sales group to analyze sales behavior and trends. The fourth component
assesses whether periodic training of the sales team takes place. The last indicator states whether sales
agents are evaluated based on performance.
                     v.    Relationship management
This measure is built on three components evaluating whether the company conducts three types of
evaluation studies: of its cooperation with suppliers, of its cooperation with clients, and of its cooperation
with competitors.



Finance

                      i.   Alignment of the financial process with corporate strategy
Four components indicate whether strategic objectives and goals are clear at all levels of the financial
process, and whether everyone is committed to such goals. The first component refers to the alignment of
the Financial Head and Deputy Head with corporate strategic goals. The second component indicates
whether a system of monitoring and control of financial goals and objectives is in place. The third indicator
refers to the frequency with which financial objectives and goals are achieved. The last component evaluates
the financial support to the management processes of the organization.
                     ii.   Structure of the administrative and operational information system
The administrative information system is evaluated on how well it supports the monitoring and control of
processes, and on its effectiveness for analysis and decision making. This is reflected in five measures. The
first measure evaluates the structure of the corporate information system. The second one assesses whether
the setup of administrative and operational business information is appropriate. The third one states whether
Product Structures are associated with cost and profitability margins (standard, estimated, real). A fourth
indicator refers to the protection of the corporate information system, whereas the last one evaluates the
organization of the corporate information system.
                    iii.   Formulation and management of budgets
This sub-area evaluates how the firm formulates and manages budgets. The measure is comprised of four
components. The first two focus on the existence of a Master Budget (operational, financial and of
investment) and on its control and monitoring (agendas, finances, investment). The third component
assesses Tax Planning, and the last one evaluates how deviations from Master Budget are analyzed
(regarding costs, expenses, sales, working capital, investment).
                    iv.    Financial management of results
The fourth sub-area of Finance reflects how well the company monitors and manages indicators of
financial management, and how it analyses them to undertake corrective action. Three components make up
this measure: the first evaluates the structure of control and monitoring indicators (KPIs), the second one
the agenda of financial management meetings, and the third one how working capital is managed.
                     v.    Programs of financial improvement (costs and expenses, working capital,
                           investment)
This sub-area evaluates how projections and saving goals are realized. It is comprised of three components
answering the following three questions: is there a program of efficient administration of costs and
expenses? Is there an action plan for the compliance with financial improvement programs? Is the available
financial information appropriate?
                    vi.    Analysis and management of investment projects
This sub-area evaluates the process which the firm uses to plan, carry out and follow up the purchase of
fixed assets. This measure is made up of three components. The first component assesses whether a program
for the calculation of investment projects exists and whether it is aligned with strategy. The second one
verifies whether there is a policy regarding capital investment (CAPEX) and other smaller investments. The
last one concerns the implementation of cost-benefit analysis for the firm’s different projects and investments.
                   vii.    Information systems
The second-last sub-area of finance evaluates if the information systems are interrelated and if strategies
are in place to safely conserve information. Three aspects are considered here: the collection and storage
structure of the administrative information system, the collection and storage structure of the operational
information system, and validation of information.
                  viii.    Structure of the costing system
The last sub-area of finance evaluates whether the costing system supplies real and updated information, so
as to identify cost anomalies in any process. The first of four components reflects the implementation of a
costing system. The second component assesses if results (estimated and actual values) are being validated.
The last two components evaluate the absorption capacity of the installed structure and workforce efficiency.




        A3.B Key Performance indicators
Every variable is recorded monthly.
Defect rate: this is defined as the ratio of faulty production to total production. Faulty production is defined
as not in condition to be sold, and is determined by the firm. There are several key measurement issues with
this measure. First, firms vary in whether they record production in physical units (e.g. number of items,
kilograms) or in pesos. Secondly, some firms would calculate this ratio only for a specific production
line or product, and not for the whole plant. Thirdly, in a few cases, firms changed the way they measured
these units over time. IPA and CNP worked together to identify these cases, and the series we use is for the
set of firms with a consistent measure.
Energy cost: Cost of energy in thousands of pesos. Firms were instructed to record the cost of the energy
used in each month, not the bill paid that month (which refers to the energy used in the previous month). Some
firms incorrectly recorded the bill paid in that month, which corresponds to the energy cost of the previous
month. However, it was generally possible to correct this during the data collection meetings. In a couple of
cases, firms did not record this variable in pesos, but in KW. It was not possible to correct this
discrepancy during data collection, and data are not available for those firms.
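The timing correction described for energy cost can be sketched as follows. This is a minimal illustration only: it assumes, hypothetically, that a firm's records hold bills keyed by the month in which they were paid, and the function name and data layout are ours rather than CNP's.

```python
def realign_energy_bills(bills_by_payment_month):
    """Re-date each bill to the month in which the energy was used.

    bills_by_payment_month maps (year, month) -> bill in thousands of pesos.
    Assumption: every bill refers to energy used in the previous month.
    """
    cost_by_usage_month = {}
    for (year, month), bill in bills_by_payment_month.items():
        # A bill paid in January covers December of the previous year.
        usage_month = (year - 1, 12) if month == 1 else (year, month - 1)
        cost_by_usage_month[usage_month] = bill
    return cost_by_usage_month


# Hypothetical bills in thousands of pesos, keyed by payment month.
paid = {(2015, 1): 410.0, (2015, 2): 395.0, (2015, 3): 402.5}
used = realign_energy_bills(paid)
```

A correction of this kind only works when the firm recorded the variable in pesos; records kept in other units cannot be realigned this way.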
Net sales: Total sales (gross sales) minus returns and similar deductions (discounts, etc.). This is taken directly from the Profit
& Loss Statement (P&L) or records of the firms.
Average monthly inventory: Stock of final product that is in condition to be sold (in pesos). Most firms do
not keep inventory, for instance because they work on a project schedule. CNP instructed firms to record
a missing value if they do not keep inventory. Other firms record physical inventory every three or six
months rather than monthly, in which case they record a missing value for the other months. Some firms
include semi-finished products in their inventory figures, not only finished products. In a limited
number of cases, firms did not record inventory in pesos, and it was not possible to correct the values.
Total employees: All employees of the firm who are considered "stable or long term", independently of
the contract type. There are no standard criteria to define what a "long term" employee is; this is defined
by each firm. The count covers the firm as a whole.

        A3.C Gathering of performance data
During the diagnostic phase CNP gave to each firm a specifically designed spreadsheet to track
the monthly evolution of KPIs in each of the five main areas (Finance, Production, Logistics, HR,
Marketing). CNP also trained each firm to use these spreadsheets. Every firm received such
training, which was done before randomly assigning firms to the two treatment groups and the
comparison group. Periodically, CNP would visit firms to verify the monitoring of KPIs and
resolve any doubts. This information was then collected in 4 rounds, the first of which took
place in July 2015 as described in Appendix 1. The collection followed this procedure: staff from
CNP and IPA would attend a firm’s board meeting, at the end of which the spreadsheets would be
revised and KPIs discussed. CNP’s representative would guide the discussion, going through every
single indicator, whereas IPA’s analyst would contribute to the data revision and record any
relevant information. Special effort was put into ensuring that the data were recorded
homogeneously across firms and over time, particularly since some of the information dated back to 2013.
During every meeting, inconsistencies in the use of missing values, zeros, units,
and definitions were corrected. Moreover, any anomaly in the evolution of KPIs was also discussed in depth.


One challenge stemmed from the fact that not all firms found the provided spreadsheets
equally useful. Some firms therefore filled in the spreadsheets only sporadically, while
using other tools as their main instrument for tracking KPIs, or not tracking
them properly. Other firms did not fill in the spreadsheets at all unless CNP visited them
and helped them to do so, which meant that in some cases data were not recorded for months. This
resulted in a loss of information, which was sometimes impossible to correct.
Another major challenge was that, especially as far as production variables are concerned, CNP
did not give firms strict prescriptions on how to interpret and record variables. This caused
differences in the interpretation of variables between firms. Two types of inconsistency are the
most frequent: regarding units and regarding whether the variable refers to a production line or to
the whole plant. For instance, some firms recorded the same production variable as “value in
pesos” while others recorded it as “number of pieces”. Others filled in “total production” with
data for their main production line rather than for the whole plant as intended. The
freedom in interpreting variables also caused variability in the units used within a given firm,
which might have recorded different variables in different ways. Finally, in a limited number of
cases there were changes in the way a firm would interpret the same variable over time, and also
changes in the way a variable was measured. Given that the freedom to use the spreadsheets in a
flexible way was considered by CNP to be part of the intervention, during data collection the only
available measure to mitigate these discrepancies was to carefully record any information and
explanation.




Appendix 4: Drop-Out and Attrition
Table A4.1 shows that the firms that completed the interventions are similar on baseline
characteristics to those which dropped out.
 Table A4.1: Comparison of Baseline Characteristics of Firms that Completed Interventions to Drop-Outs
                                          Individual Treatment                   Group Treatment
                                   Completed     Dropped Out   p-value  Completed    Dropped Out  p-value
 Number of Employees                  62.2           54.4       0.746      52.9          53.1      0.981
 Small Firm (<=50 employees)          0.59           0.57       0.940      0.58          0.59      0.974
 Medium Firm (>50 employees)          0.41           0.43       0.940      0.42          0.41      0.974
 Cundinamarca                         0.54           0.14       0.049      0.42          0.35      0.665
 Valle                                0.09           0.14       0.645      0.25          0.18      0.559
 Labor Productivity                    32             30        0.780       32            39       0.278
 Financing Practices                   48             50        0.730       53            52       0.855
 Human Resources Practices             42             40        0.738       44            43       0.784
 Logistics Practices                   43             43        0.989       49            43       0.175
 Marketing Practices                   43             44        0.934       46            46       0.948
 Production Practices                  46             54        0.229       47            44       0.371
 Level 2 Supplier                     0.93           1.00       0.496      0.92          0.94      0.758
 Metal Products                       0.50           0.57       0.731      0.47          0.65      0.242
 Plastic Products                     0.15           0.29       0.390      0.19          0.24      0.738
 Firm Age (Years)                     23.3           21.8       0.829      20.9          24.6      0.375
 Anexo K score                        44.4           46.5       0.679      47.8          45.7      0.487
 USD Sales in 2013                  3158858       7547448       0.189    2767765       2469362     0.799
 Export at all in 2013                0.43           0.29       0.465      0.47          0.41      0.687
 Sample Size                           46              7                    36            17




Table A4.2 compares the characteristics of those firms for which we have December 2017 sales
and employment data to the attritors, and then shows the sample of non-attritors is reasonably
well balanced on baseline characteristics.




Table A4.2: Comparison of Baseline Characteristics of Non-Attritors to Attritors, and Balance on Non-Attriting Sample
                                                        Full Sample                             Sample of Non-Attritors
                                            Non-Attritors     Attritors p-value      Control      Individual     Group       p-value
Number of Employees                              58.9           59.8       0.921       54.9          68.2         52.9        0.441
Small Firm (<=50 employees)                      0.58           0.61       0.716       0.67          0.51         0.57        0.426
Medium Firm (>50 employees)                      0.42           0.39       0.716       0.33          0.49         0.43        0.426
Cundinamarca                                     0.50           0.43       0.349       0.58          0.51         0.43        0.480
Valle                                            0.16           0.17       0.939       0.18          0.08         0.23        0.174
Labor Productivity                                30             32        0.460        26            32           32         0.054
Financing Practices                               51             51        0.964        51            48           53         0.154
Human Resources Practices                         44             40        0.069        45            43           44         0.906
Logistics Practices                               47             44        0.147        50            44           48         0.106
Marketing Practices                               46             44        0.281        47            45           47         0.841
Production Practices                              47             45        0.480        47            48           46         0.867
Level 2 Supplier                                 0.94           0.93       0.679       0.94          0.95         0.94        0.993
Metal Products                                   0.57           0.65       0.353       0.79          0.46         0.49        0.004
Plastic Products                                 0.15           0.22       0.276       0.09          0.16         0.20        0.404
Firm Age (Years)                                 24.1           24.1       0.997       27.6          24.6         20.2        0.085
Anexo K score                                    47.0           44.9       0.218       48.1          45.5         47.6        0.538
USD Sales in 2013                              2877978        2252395      0.342    2043854        3515012     3013064        0.133
Export at all in 2013                            0.47           0.41       0.480       0.48          0.46         0.46        0.969
Sample Size                                      105             54                     33            37           35
Notes: Attrition defined as not having firm sales and employment data reported from firm records in December 2017. This
can arise from firms refusing to provide this information, as well as from firm death. P-value in column 3 is for a t-test of
equality of means by attrition status.
Columns 4 through 6 provide baseline means by treatment status for the sample of non-attritors. P-value in column 7 is for
F-test of equality of means.




Appendix 5: Impacts on Individual Management Practices
Table A5.1 shows the breakdown of significant improvements in management practices within
the Anexo K index:

 Table A5.1: Summary of Impacts at the Sub-Index and Individual Practice Level
                                     Sub-Indices                                Individual Practices
                       #       # sig. Ind.         # sig. Group         #       # sig. Ind.      # sig. Group
 Finances              8            6                    5              29           17               15
 HR                    7            3                    2              20           11                6
 Logistics             7            5                    2              31            8                9
 Marketing             5            3                    3              22            9               13
 Production            8            6                    8              39           22               30
 TOTAL                 35          23                      20          141           67               73
 Note: lists the number of sub-indices and practices with treatment effects that are statistically significant at the 5% level post-intervention.


Table A5.2 details the individual management practices that have treatment effects of 0.8 or
more (on a 5-point scale).

 Table A5.2: Practices that increase by 0.8 or more from at least one treatment
                                                                                           Individual       Group
 Finance Practices
  System of monitoring and control of financial goals in place                                 0.827***    0.666***
                                                                                                (0.175)     (0.189)
 Frequency at which financial objectives and goals achieved                                    0.802***    0.648***
                                                                                                (0.205)     (0.212)
 Existence of a Master Budget                                                                  0.718***    1.163***
                                                                                                (0.263)     (0.259)
 Control and Monitoring of Master Budget                                                       0.765***    1.016***
                                                                                                (0.226)     (0.241)
 How deviations from master budget analyzed                                                    0.909***    1.070***
                                                                                                (0.244)     (0.265)
 Structure of Control and Monitoring Indicators (KPIs)                                         0.935***    0.956***
                                                                                                (0.247)     (0.237)
 Agenda of Financial Management Meetings                                                       1.055***    1.055***
                                                                                                (0.230)     (0.222)
 HR Practices
 Strategic objectives leverage people's and team's talent                                      0.833***    0.631***
                                                                                                (0.206)     (0.214)
 Human talent development plans linked to corporate strategy                                   0.809***    0.902***
                                                                                                (0.200)     (0.215)
 Strategic plan defined, that includes clear goals for human talent                            0.951***    0.910***
                                                                                                (0.207)     (0.194)
 Marketing Practices
 Implementation of analysis of marketing trends                                        0.485**     0.867***
                                                                                        (0.227)     (0.196)
 Implementation of analysis of marketing risks                                         0.630***    0.898***
                                                                                        (0.230)     (0.226)
 Alignment of marketing and sales plan with business strategy                          0.663***    0.825***
                                                                                        (0.216)     (0.227)
 Monitoring of sale behavior and trends                                                0.719***    0.901***
                                                                                        (0.209)     (0.224)
 Production Practices
 Implementation of strategic goals between plant manager and supervisor                   0.616*** 0.966***
                                                                                           (0.176)    (0.173)
 Monthly monitoring of strategic goals between plant manager and supervisor               0.686*** 0.895***
                                                                                           (0.215)    (0.207)
 Strategic goals and roles clear to each worker                                           0.670*** 0.896***
                                                                                           (0.166)    (0.162)
 Each worker has improvement goals                                                        0.562*** 0.892***
                                                                                           (0.188)    (0.170)
 Bottlenecks are identified and managed                                                   0.514*** 0.842***
                                                                                           (0.179)    (0.194)
 Monthly measurement of plant KPIs                                                        0.822*** 0.857***
                                                                                           (0.193)    (0.200)
 Weekly or bi-weekly management of KPIs                                                   0.851*** 0.650***
                                                                                           (0.227)    (0.212)
 Improvement programs for KPIs developed                                                  0.927*** 0.989***
                                                                                           (0.223)    (0.220)
 Culture of visual management with graphs of machine performance                          0.810*** 0.515**
                                                                                           (0.210)    (0.212)
 Supervisors and workers manage improvement plans for quality anomalies                   0.802*** 0.944***
                                                                                           (0.187)    (0.215)
 Notes: robust standard errors in parentheses, clustered at the firm level. *** denotes significance at the
 1 percent level. Coefficients are treatment effects post-intervention, and control for time effects,
 randomization strata, and



Appendix 6: Robustness of Management Improvements to Sample
Attrition
Table A6.1 shows the availability of our management score data by time period and measure. The
greatest data availability is for the Anexo K measure, but this still suffers from attrition, while the
WMS and MOPS data are available for subsets of the sample only.


 Table A6.1: Management Data availability by measure and time period
                                                  # Firms with Data by
                                                       Treatment
 Measure                           Period Control Individual Group          Data source
 Anexo K management score           2013       52          51          53   Anexo K collected by CNP
                                    2014       42          46           0   Anexo K collected by CNP
                                    2015       26          40          35   Anexo K collected by CNP
                                    2016        0           0          36   Anexo K collected by CNP
 WMS management score               2013       26          24          27   WMS collected by LSE
                                    2016       20          19          31   WMS collected by IPA
 MOPS management data               2012       28          33          34   Collected retrospectively by IPA
                                    2017       28          33          34   Collected by IPA


Figure A6.1 compares the distribution of baseline management practice data between firms that attrit,
in the sense of lacking endline Anexo K data (2015 for the control and individual treatment, 2016 for
the group treatment), and firms that do not. We see that the distribution of those with and without follow-up
management data is similar, both for the full sample, and when we split by treatment status. We
cannot reject equality of distributions between attritors and non-attritors using a Kolmogorov-
Smirnov test of equality of distributions. This shows that attrition is not selective on initial
management practices.
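The comparison of distributions described above can be illustrated with a small two-sample Kolmogorov-Smirnov sketch. It is not the authors' code (the paper does not state which software was used), the p-value relies on a standard asymptotic approximation that is rough in small samples, and the scores below are hypothetical rather than the study's data.

```python
import math


def ks_two_sample(sample_a, sample_b):
    """Return (D, approximate p-value) for a two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled values in order and track the largest gap
    # between the two empirical distribution functions.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution with a small-sample adjustment.
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is essentially 1.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


# Hypothetical baseline scores for firms with and without endline data.
non_attritors = [38.0, 41.5, 44.0, 47.2, 50.1, 53.3, 55.0, 58.4]
attritors = [39.0, 43.0, 46.5, 49.0, 52.5, 57.0]
d_stat, p_value = ks_two_sample(non_attritors, attritors)
```

A large p-value, as reported in the notes to Figure A6.1, means the test cannot distinguish the two baseline distributions.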




Figure A6.1: Distribution of Baseline Anexo K Management Practices by Whether or Not Endline
Management Data are Missing




Notes: Kolmogorov-Smirnov tests of equality of distributions of baseline management practices between
firms with missing endline management data and firms with endline management data have p-values
0.979 (all firms), 0.995 (control firms), 0.754 (individual treatment), and 0.425 (group treatment).

Note that our main estimates of the treatment effect are for a balanced panel, and include
randomization triplet fixed effects. Coupled with the above analysis which shows no selection on
baseline management practices into having follow-up data, and Figure 2 which shows clearly the
change in distribution of practices for this balanced panel, this suggests our main results are not
being driven by selective attrition. Nevertheless, as a further sensitivity check, Table A6.2 provides
Lee bounds for the treatment impacts. Table A6.1 shows we have substantially more control firms
reporting management practices in 2014 than 2015, so less trimming is required when estimating
the impact during the year of intervention than for the post-intervention impact. We see that both
treatments have significant impacts even at the lower bound for the during-intervention period.
In contrast, the bounds become wider for the post-intervention period. If all the additional firms
that attrited from the control group were the best managed firms, then we could not conclude the
intervention had had a positive effect. We can examine this assumption using the control firms that
attrited between 2014 and 2015. The 16 control firms that attrited had first follow-up (2014) Anexo
K scores with a mean of 51.4, while the 26 control firms that did not attrit had 2014 Anexo
K scores with a mean of 52.8 (p-value 0.72). Thus, not only is there no evidence of selective
attrition on baseline management practices, neither is there evidence of endline selective attrition
based on first follow-up management practices. This strongly suggests that the assumption that it
was all the best-managed firms in the control group that differentially attrited is very unlikely to
hold, so that the Lee lower bound is unlikely to be applicable.
 Table A6.2: Lee Bounds of Impact on Anexo K Score
                                Individual Treatment Effect Group Treatment Effect
 Impact during intervention
 Lee lower bound                          6.303**                    9.368***
                                           (2.723)                    (3.290)
 Lee upper bound                         9.746***                   16.610***
                                           (3.065)                    (2.851)
 Impact post-intervention
 Lee lower bound                            1.076                      4.784
                                           (3.628)                    (3.218)
 Lee upper bound                         13.993***                  13.913***
                                           (3.011)                    (3.158)
 Sample Size                                 106                        106
 Proportion trimmed
   for during intervention                  8.7%                       16.7%
    for post-intervention                   35.0%                      27.8%
 Notes: robust standard errors in parentheses. *, **, and *** denote significance
 at the 10, 5, and 1 percent levels respectively.
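The trimming logic behind Lee bounds can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' estimation code: it compares raw means without the controls or fixed effects used in the main estimates, it rounds the number of trimmed observations to a whole firm, and the function name and example data are hypothetical.

```python
def lee_bounds(treat_outcomes, n_treat_assigned, control_outcomes, n_control_assigned):
    """Return (lower bound, upper bound, share trimmed) for a difference in means.

    The group with the higher response rate is trimmed until its response
    rate matches the other group's, once from the top and once from the bottom.
    """
    def mean(values):
        return sum(values) / len(values)

    t, c = sorted(treat_outcomes), sorted(control_outcomes)
    q_t = len(t) / n_treat_assigned      # response rate, treatment
    q_c = len(c) / n_control_assigned    # response rate, control
    if q_t >= q_c:
        share = (q_t - q_c) / q_t
        k = round(share * len(t))        # treated observations to trim
        lower = mean(t[:len(t) - k]) - mean(c)   # drop the k highest treated outcomes
        upper = mean(t[k:]) - mean(c)            # drop the k lowest treated outcomes
    else:
        share = (q_c - q_t) / q_c
        k = round(share * len(c))        # control observations to trim
        lower = mean(t) - mean(c[k:])            # drop the k lowest control outcomes
        upper = mean(t) - mean(c[:len(c) - k])   # drop the k highest control outcomes
    return lower, upper, share


# Hypothetical scores: 10 firms assigned per arm, 8 treated and 6 control respond.
treated = [50, 52, 54, 56, 58, 60, 62, 64]
control = [40, 42, 44, 46, 48, 50]
lower, upper, share = lee_bounds(treated, 10, control, 10)
```

As an observation about the arithmetic rather than a description of the authors' procedure, the trimming proportions for the individual treatment in Table A6.2 coincide with the relative difference in respondent counts in Table A6.1: (46 - 42)/46 is about 8.7% during the intervention and (40 - 26)/40 = 35.0% post-intervention.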



Appendix 7: Impacts on World Management Survey and MOPS
management measures
WMS 2013 Data Collection
We commissioned the London School of Economics (LSE) team responsible for the Bloom and
Van Reenen (2007) World Management Surveys (WMS) to apply their methodology to a random
sample of 180 firms representative of the Colombian manufacturing sector, as well as to a
sub-sample of 77 firms in our sample, focusing on firms with 40 or more employees (Table A6.1).
Interviews were done by phone. The WMS interview is structured as a guided discussion, and is
designed to be answered by a manager with thorough knowledge of the production process,
typically the production or plant manager. The discussion lasts between one hour and one and a
half hours, and covers 18 questions related to operations, monitoring, targeting, and people
management. The interviewer guides the interviewee by means of open questions, letting him/her
speak freely while making sure to obtain the objective information needed to score each of the 18
topics using the provided scoring grid. Each of the 18 topics receives a score between 1 (no
modern practice is implemented) and 5 (best practice).
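A firm-level WMS score is conventionally summarized as the unweighted average of the 18 topic scores. The sketch below illustrates that aggregation under this assumption; the function name and the example scores are hypothetical and are not taken from the survey data.

```python
def wms_overall_score(topic_scores):
    """Average 18 topic scores, each on the 1 (worst) to 5 (best practice) scale."""
    if len(topic_scores) != 18:
        raise ValueError("expected one score for each of the 18 topics")
    for score in topic_scores:
        if not 1 <= score <= 5:
            raise ValueError("each topic score must lie between 1 and 5")
    return sum(topic_scores) / len(topic_scores)


# Hypothetical firm: nine topics scored 2 and nine topics scored 3.
example_score = wms_overall_score([2] * 9 + [3] * 9)
```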


A first use of this survey was to compare the management practices of the auto parts
sector in our sample to those of Colombian manufacturing as a whole. Figure A7.1 shows that the
distribution of management practices in our firms is similar to that of all SME manufacturing firms
in Colombia. A second purpose was to enable comparison of Colombia to the rest of the world.
Figure A7.2 shows that Colombia’s average management practices score of 2.54 is low
by global standards, but typical for many developing countries, just below that of India and just
above that of Kenya. The mean management practices score for the auto parts firms of 2.38 is similar.
Figure A7.1: Comparison of WMS Management Practices Distribution of our Auto Parts firms to a
Representative Sample of the Colombian Manufacturing Sector




Source: WMS surveys of 180 Colombian manufacturing firms and 77 auto parts firms, conducted
by the LSE WMS team in 2013.




Figure A7.2: Comparison of Colombian World Management Survey Management Score to Other
Countries

[Horizontal bar chart. Horizontal axis: Average Management Scores, Manufacturing (axis from 1.5 to 3.5).
Legend entries: Africa, Asia, Oceania, Europe, Latin America, North America.]

 Country                  Score
 United States            3.308
 Japan                    3.230
 Germany                  3.210
 Sweden                   3.188
 Canada                   3.142
 Great Britain            3.033
 France                   3.015
 Australia                2.997
 Italy                    2.978
 Mexico                   2.899
 Poland                   2.887
 Singapore                2.861
 New Zealand              2.851
 Northern Ireland         2.839
 Portugal                 2.826
 Republic of Ireland      2.762
 Chile                    2.752
 Spain                    2.748
 Greece                   2.720
 China                    2.712
 Turkey                   2.706
 Argentina                2.699
 Brazil                   2.684
 India                    2.611
 Vietnam                  2.608
 Colombia                 2.578
 Kenya                    2.549
 Nigeria                  2.516
 Nicaragua                2.397
 Myanmar                  2.372
 Zambia                   2.316
 Tanzania                 2.254
 Ghana                    2.225
 Ethiopia                 2.221
 Mozambique               2.027

Source: World Management Surveys, Nick Bloom.

WMS 2016 Data Collection
In September 2016, we asked Innovations for Poverty Action (IPA) to conduct a second round of
the World Management Survey (WMS). The LSE provided support in training the four analysts
that conducted the interviews, the two supervisors and the research associate responsible for the
survey. All material was provided by the LSE and the training took place in October 2016.
Since the WMS is designed for larger firms, we chose as a sample frame the 109 firms in our
sample that had at least 25 employees at baseline. This consisted of 37 control, 41 group
treatment, and 31 individual treatment firms. Out of these 109 firms, we were able to collect data
on 70 firms (20 control, 31 group, 19 individual), of which 50 firms had also been interviewed in
2013 (14 control, 22 group, 14 individual). This response rate of 64% is double the standard WMS
response rate, reflecting the pre-existing contacts with these firms through the project. Of those
companies not interviewed, 3 had closed down, and the remainder either refused, or repeatedly
rescheduled and could not be interviewed.




Management and Organizational Practices Survey (MOPS)
Our final measure of management practices comes from a 16-question survey given to firm owners
in 2017, derived from the Management and Organizational Practices Survey (MOPS). This survey
was created by the U.S. Census Bureau, and was designed to enable basic management practices
to be measured in a self-administered survey format. The survey asks questions related to
monitoring, targeting, and incentives, and is intended to measure similar concepts to the WMS
(Bloom et al, 2018). It was carried out by Innovations for Poverty Action during in-person visits
to the firms, and firms were also asked to recall what these practices were five years earlier (in
2012). Table A6.1 shows that these data were collected for 95 firms.
Associations between different measures of management and over time
The WMS and MOPS are collected in a much less in-depth way than the Anexo K, and measure
different aspects of management. Table A7.1 looks at the baseline correlations between different
measures. At baseline, the Anexo K management score has a correlation of 0.26 with the WMS
management score, and 0.23 with the MOPS score. By way of comparison, the 38 management
practices in Bloom et al. (2013) had a 0.40 correlation with the WMS score. The Anexo K is most
highly correlated with the monitoring component of the WMS (correlation of 0.44). When we
examine the five areas of the Anexo K, the finance, logistics and production scores are more highly
correlated with the WMS than the HR and marketing scores. Recall the WMS does not measure
marketing practices, and there is a difference in emphasis in how the two focus on human resource
practices. The WMS is more focused on how good and bad performers are hired and rewarded,
whereas the Anexo K has more of an emphasis on organizational culture and links to overall
business strategy. Notably, while the MOPS and WMS are intended to measure similar concepts,
the correlation between the 2012 (recalled) MOPS management score and the WMS is only 0.08,
suggesting substantial noise in this measurement.
 Table A7.1: Correlations between baseline Management Measures
                              WMS       WMS        WMS        WMS   WMS           MOPS
                             Overall Operations Monitoring Targets People         Overall
 Anexo K Overall Score        0.26       0.16       0.44       0.04 0.11           0.23
 Finance Score                0.28       0.22       0.46       0.07 0.07           0.15
 HR Score                     0.14       0.09       0.33      -0.08 0.03           0.17
 Logistics Score              0.23       0.12       0.32       0.07 0.13           0.31
 Marketing Score              0.09       0.03       0.12       0.02 0.06           0.10
 Production Score             0.26       0.14       0.40       0.07 0.13           0.17
 MOPS Overall                 0.08       0.00       0.04       0.07 0.10           1.00


Figure A7.3 plots the cross-sectional and panel associations between measures. We see that the
endline Anexo K has a cross-sectional correlation of 0.34 with both the endline WMS and the
endline MOPS, and that the WMS and MOPS at endline still only have a correlation of 0.27. More
starkly, there is no relationship between the WMS and Anexo K in the panel: the improvement a
firm shows on the Anexo K is unrelated to the improvement it shows on the WMS. The same is true
of the association between changes in the MOPS and changes in the WMS.


Recall that the WMS is done double-blind by phone, with enumerators scoring firms on a five-
point scale. While there is signal in the responses, this also entails a lot of noise. Bloom et al.
(2016) report that the test-retest correlation when two different people from within a plant
answered the same questions within a few weeks of one another is only 0.51. In our case, there is
an added factor of the baseline being done by the LSE team, while the endline was collected by
Innovations for Poverty Action (after training from the LSE team). As such, we should expect
much of the change over time in the WMS to reflect measurement error, which can make it difficult
to detect treatment effects.
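The reasoning above can be made concrete with a short back-of-the-envelope derivation. It is only a sketch under assumptions that are not in the source: classical measurement error (noise independent of the true score and across measurements), equal true-score variance at baseline and endline, and the 0.51 test-retest correlation read as a reliability ratio. The symbols m*, e, and r (the autocorrelation of the true score between the two rounds) are introduced here purely for illustration.

```latex
% Illustrative only: classical measurement error is an assumption, not the authors' model.
\begin{align*}
m_{it} &= m^{*}_{it} + e_{it}, \qquad
  \operatorname{Var}(m^{*}_{it}) = \sigma^{2}_{*}, \quad
  \operatorname{Var}(e_{it}) = \sigma^{2}_{e} \\
\rho_{\text{retest}} &= \frac{\sigma^{2}_{*}}{\sigma^{2}_{*} + \sigma^{2}_{e}} = 0.51
  \;\Longrightarrow\;
  \frac{\sigma^{2}_{e}}{\sigma^{2}_{*}} = \frac{1 - 0.51}{0.51} \approx 0.96 \\
\Delta m_{i} &= \Delta m^{*}_{i} + \Delta e_{i}, \qquad
  \operatorname{Var}(\Delta e_{i}) = 2\sigma^{2}_{e}, \qquad
  \operatorname{Var}(\Delta m^{*}_{i}) = 2\sigma^{2}_{*}(1 - r) \\
\frac{\operatorname{Var}(\Delta e_{i})}{\operatorname{Var}(\Delta m_{i})}
  &= \frac{\sigma^{2}_{e}}{\sigma^{2}_{*}(1 - r) + \sigma^{2}_{e}}
  \;\ge\; 1 - \rho_{\text{retest}} = 0.49
  \quad \text{whenever } r \ge 0
\end{align*}
```

Under these assumptions, noise would account for roughly half of the variance in WMS levels and for a larger share of the variance in changes whenever true management quality is positively persistent. Any additional noise from the change in survey team between rounds is not modeled here.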
Figure A7.3: Cross-sectional and panel correlations between management measures




Notes: first column shows cross-sectional correlations pre-treatment, second column shows cross-sectional
correlations post-intervention for last measurement obtained by each method, and third column shows correlation of
change in management (pre-post) according to each measure.
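The three columns described in the figure notes can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the four-firm data are made up, and dropping firms with a missing value pairwise is an assumption about how incomplete observations are handled.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


def change(pre, post):
    """Firm-level change (post minus pre); None if either round is missing."""
    return [
        None if a is None or b is None else b - a
        for a, b in zip(pre, post)
    ]


def three_correlations(a_pre, a_post, b_pre, b_post):
    """Pre-treatment, post-intervention, and change-on-change correlations."""
    return {
        "pre": pearson(a_pre, b_pre),
        "post": pearson(a_post, b_post),
        "change": pearson(change(a_pre, a_post), change(b_pre, b_post)),
    }


# Made-up scores for four firms on two hypothetical measures A and B.
a_pre, a_post = [1, 2, 3, 4], [2, 4, 3, 6]
b_pre, b_post = [2, 4, 6, 8], [4, 5, 9, 9]
result = three_correlations(a_pre, a_post, b_pre, b_post)
```

In this toy example the two measures are positively correlated in levels in both rounds, yet their changes move in opposite directions, which illustrates why cross-sectional agreement between measures does not guarantee agreement in the panel.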

To investigate which of the three management measures is most strongly correlated with business
outcomes of interest, we regress baseline log employment and labor productivity on each
management measure separately, and then on all three together. The results are shown in Table
A7.2. The Anexo K score is strongly associated with both log employment and labor productivity
at baseline (both significant at the 1% level), while the WMS and MOPS have weaker associations.
When all three measures are included together, the Anexo K measure remains statistically
significant, while neither other measure is significant. This suggests the Anexo K measure has a
stronger signal for business outcomes than these two alternatives.
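The comparison described above (each measure entered separately, then all three together) can be sketched on simulated data. Everything in this example is an assumption for illustration: the data are generated, not the paper's; the noise levels are arbitrary; and HC1 is used as one common definition of robust standard errors, since the source only says "robust". It requires NumPy.

```python
import numpy as np


def ols_robust(y, X):
    """OLS with an intercept; returns coefficients and HC1 robust standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))


# Simulated firms: one latent management quality, three noisy measures of it.
rng = np.random.default_rng(0)
n = 500
true_mgmt = rng.normal(size=n)
anexo_k = true_mgmt + 0.3 * rng.normal(size=n)  # assumed to be the least noisy
wms = true_mgmt + 2.0 * rng.normal(size=n)      # assumed to be noisy
mops = true_mgmt + 2.0 * rng.normal(size=n)     # assumed to be noisy
log_emp = true_mgmt + 0.5 * rng.normal(size=n)  # hypothetical outcome

# Each measure on its own (index 0 is the intercept, index 1 the slope).
separate = {
    name: ols_robust(log_emp, x)
    for name, x in [("anexo_k", anexo_k), ("wms", wms), ("mops", mops)]
}

# All three measures together.
joint_beta, joint_se = ols_robust(log_emp, np.column_stack([anexo_k, wms, mops]))
```

In this simulation the noisier measures show a positive association with the outcome on their own, but their coefficients shrink toward zero once the less noisy measure is included, which mirrors the pattern the table is used to argue for.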




Table A7.2: Baseline Association of Business Outcomes with Management Measures
                                      Log Employment                            Labor Productivity
Anexo K Score           0.035***                         0.017***     0.672***                         0.877***
                        (0.006)                          (0.006)      (0.140)                          (0.186)
WMS Management Score               0.250*                0.086                   4.914                 -0.652
                                   (0.134)               (0.153)                 (4.070)               (5.310)
MOPS Management Score                         0.869*     -0.554                             8.994      -2.894
                                              (0.465)    (0.459)                            (8.650)    (12.164)
Sample Size             156        77         95         46           156        77         95         46
R-squared               0.19       0.05       0.03       0.14         0.14       0.01       0.01       0.25
Notes:
Anexo K management practices are 141 management practices divided into five sub-areas.
WMS is World Management Survey, taken for subsample of firms in 2013. MOPS is Management and Organizational
Practices Survey, and was conducted in 2017, with recall of practices 5 years earlier used to obtain baseline measure.
Robust standard errors in parentheses, *, **, *** denote significance at the 10, 5, and 1 percent levels respectively.




Treatment Effects on WMS and MOPS measures of management
Table A7.3 reports the estimated treatment impacts on the WMS and MOPS measures. Since these
data are only available for a subset of our firms, we report several different specifications. In
Panels A and B, we use all 70 firms for which follow-up WMS data are available (or the 95 firms
with MOPS data for the last column). We do not control for randomization triplet fixed effects
given that this would result in relatively few triplets being included. Instead, Panel A includes
no other controls, while Panel B controls linearly for key baseline variables used in the
randomization (region, size, employment, labor productivity, and baseline Anexo K). Panels C
through E then use the set of 50 firms for which both baseline and endline WMS data are available.
In Panels A and B, we find very small and statistically insignificant impacts of either treatment on
any of the WMS or MOPS management measures. Restricting to the sample for which we also
have baseline data in Panels C, D and E results in larger point estimates for the WMS, but the
impacts are still far from statistically significant.
Our results show that both treatments resulted in significant increases in the Anexo K measure of
management practices, and in each of its five subcomponents. This raises the question of why we
do not see such a change in the WMS and MOPS. A first potential explanation is that the WMS
and MOPS are only available for subsamples of the data, so that the difference in results could
stem from sample composition and sample size. To investigate this hypothesis, Table A7.4
re-estimates the management treatment effect regressions for common sub-samples. The first column
repeats our estimated impact on the Anexo K measure for the balanced panel. Columns 2 and 3
then consider the 52 firms for which we have both the 2016 WMS and Anexo K measured during
and after the intervention. Using this sub-sample, we continue to see a statistically significant
impact of the individual treatment on the Anexo K measure both during and post-intervention,
and a significant impact of the group treatment during the intervention. The magnitude of the
estimated effect falls in a substantive way only for the group treatment post-intervention, and
that estimate has a wide confidence interval. In contrast, there is no significant impact on the WMS
using this same sample. The foot of the table converts the estimated treatment effects into
confidence intervals expressed in terms of standard deviation changes in the respective
management measure. We see that not only are the WMS treatment effects statistically
insignificant while those for the Anexo K outcome are statistically significant, but the 95 percent
confidence intervals for the individual treatment effect do not even overlap across the
two outcomes. This suggests that the lack of impact on the WMS is not simply a matter of the
sample composition or statistical power. Likewise, when we restrict to the same sample as the
MOPS in columns 4 and 5, we find significant treatment impacts on the Anexo K, and no
significant impact on the MOPS, although in this case the confidence intervals do overlap.
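The conversion at the foot of Table A7.4 can be reproduced with a few lines of arithmetic. The sketch below assumes the intervals are the point estimate plus or minus 1.96 standard errors, divided by the control-group standard deviation. That formula is an inference from the reported numbers rather than something the source states, and the function names are hypothetical. The inputs are the post-intervention individual treatment estimates, standard errors, and control SDs from Table A7.4.

```python
def ci_in_sd_units(coef, se, control_sd, z=1.96):
    """95% confidence interval for a treatment effect, in control-group SD units."""
    half_width = z * se
    return ((coef - half_width) / control_sd, (coef + half_width) / control_sd)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Individual Treatment*Post Intervention, WMS sample (columns 2 and 3).
anexo_k_wms_sample = ci_in_sd_units(coef=8.325, se=2.368, control_sd=6.98)
wms = ci_in_sd_units(coef=-0.210, se=0.176, control_sd=0.41)

# Individual Treatment*Post Intervention, MOPS sample (columns 4 and 5).
anexo_k_mops_sample = ci_in_sd_units(coef=9.657, se=1.856, control_sd=10.23)
mops = ci_in_sd_units(coef=0.017, se=0.036, control_sd=0.12)

wms_overlap = intervals_overlap(anexo_k_wms_sample, wms)
mops_overlap = intervals_overlap(anexo_k_mops_sample, mops)
```

Rounded to two decimals, these calculations give [0.53, 1.86] and [-1.35, 0.33] for the WMS sample, which do not overlap, and [0.59, 1.30] and [-0.45, 0.73] for the MOPS sample, which do.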




Table A7.3: Impact on Other Measures of Management Practices
                                                   WMS        WMS          WMS           WMS         WMS        MOPS
                                                 Overall Operations Monitoring Targets             People       Score
All firms interviewed in 2016
Panel A: No controls
Individual Treatment                              0.040      0.100         0.152        -0.045      -0.003      -0.008
                                                 (0.169)    (0.345)       (0.225)      (0.238)     (0.156)    (0.034)
Group Treatment                                   0.075      0.035         0.152        0.041       0.053       0.013
                                                 (0.170)    (0.298)       (0.209)      (0.230)     (0.153)    (0.031)
Panel B: Baseline Controls
Individual Treatment                              -0.000     -0.030        0.095        -0.076      -0.007      -0.005
                                                 (0.166)    (0.307)       (0.235)      (0.243)     (0.152)    (0.032)
Group Treatment                                   0.061      0.009         0.094        0.094       0.025       0.018
                                                 (0.166)    (0.276)       (0.210)      (0.231)     (0.162)    (0.030)
Sample Size                                         70         70           70            70          70          95
Control Mean in 2016 of outcome                    2.92       2.90         3.28          2.94        2.61        0.52
Control S.D. in 2016 of outcome                    0.55       1.07         0.68          0.79        0.54        0.13
50 firms interviewed in WMS in 2013 & 2016
Panel C: No Controls
Individual Treatment                              0.143      0.321         0.314        -0.086      0.131       0.010
                                                 (0.218)    (0.423)       (0.256)      (0.311)     (0.199)    (0.051)
Group Treatment                                   0.283      0.357         0.312        0.225       0.284       0.064
                                                 (0.216)    (0.363)       (0.254)      (0.293)     (0.183)    (0.045)
Panel D: Baseline Controls
Individual Treatment                              0.029      0.123         0.153        -0.188      0.074       -0.011
                                                 (0.204)    (0.388)       (0.257)      (0.304)     (0.197)    (0.055)
Group Treatment                                   0.242      0.238         0.210        0.276       0.241       0.066
                                                 (0.203)    (0.350)       (0.266)      (0.286)     (0.175)    (0.049)
Panel E: Baseline Controls + Ancova
Individual Treatment                              0.072      0.233         0.168        -0.160      0.133       -0.009
                                                 (0.199)    (0.394)       (0.252)      (0.299)     (0.199)    (0.055)
Group Treatment                                   0.267      0.335         0.232        0.296       0.214       0.068
                                                 (0.214)    (0.372)       (0.276)      (0.302)     (0.163)    (0.048)
Sample Size                                         50         50           50            50          50          46
Control Mean in 2016 of outcome                    2.88       2.89         3.24          2.96        2.51        0.53
Control S.D. in 2016 of outcome                    0.65       1.13         0.76          0.90        0.56        0.14
Notes:
Each panel represents treatment impacts from a separate regression.
70 of the 159 firms were given the WMS survey in 2016, of which 50 had also received this survey in 2013.
Panels A and C regress outcomes on treatment dummies only. Panels B and D add controls for
dummies for the Cundinamarca and Valle regions, a dummy for having 10 to 50 workers at baseline, the number of
employees in 2013, labor productivity in 2013, and the 2013 Anexo K management practice score.
Panel E also controls for the baseline value of the outcome measure.
Robust standard errors in parentheses. *, **, and *** indicate significance at the 10, 5, and 1 percent levels.




Table A7.4: Impact on Anexo K on Same Samples as WMS and MOPS
                                                Balanced Panel              WMS Sample               MOPS Sample
                                                    Anexo K            Anexo K         WMS       Anexo K       MOPS
Individual Treatment*During Intervention            9.413***           8.350***                 9.669***
                                                     (1.760)            (2.229)                  (1.879)
Individual Treatment*Post Intervention              9.309***           8.325***       -0.210    9.657***       0.017
                                                     (1.821)            (2.368)      (0.176)     (1.856)      (0.036)
Group Treatment*During Intervention                11.384***           7.602**                  11.143***
                                                     (2.202)            (3.164)                  (2.438)
Group Treatment*Post Intervention                   8.155***             3.911        -0.132    7.549***       0.040
                                                     (2.124)            (3.091)      (0.174)     (2.318)      (0.034)
Sample Size                                            202                104           52          172          86
Control Mean                                          55.98               60.1         2.93        57.44        0.49
Control SD                                            10.79               6.98         0.41        10.23        0.12
Implied 95% confidence intervals in S.D.
Individual Treatment*Post Intervention             [0.53,1.19]        [0.53,1.86] [-1.35,0.33] [0.59,1.30] [-0.45,0.73]
Group Treatment*Post Intervention                  [0.37,1.14]       [-0.31,1.42] [-1.15,0.51] [0.29,1.18] [-0.22,0.89]
Notes:
Column 1 is for the 101 firms for which Anexo K management practices are measured both during and post intervention.
Columns 2 and 3 restrict to the subset of 52 firms that also had the WMS measured in 2016.
Columns 4 and 5 restrict to the subset of 86 firms that also had the MOPS measured in 2017.
Regressions control for baseline (December 2013) Anexo K mean, time fixed effects, and controls for region,
baseline labor productivity, baseline number of employees, and for being a small firm at baseline.
Robust standard errors in parentheses, clustered at the firm level.
*, **, *** denote significance at the 10, 5, 1 percent levels respectively.

A more compelling explanation for the lack of impact on the WMS is that this measure is less able
to pick up the types of changes in management practices that come from this intervention.
A first reason for this is just the general noise in the measure, as discussed above. This noise means
that much of the change in the WMS over time may reflect measurement error, making it difficult
to detect treatment effects. But a second reason is that the WMS measures practices at a more
general level than the level of specificity at which interventions are focused. Evidence in support
of the idea that the WMS is not able to pick up the specific changes in practices that these
consulting type interventions bring about comes from the India experiment that initially motivated
this work. Bloom et al. (2013) report that their treatment plants increased their use of the 38
specific management practices they measure by 37.8 percentage points, significantly larger than
the change for the control firms. They asked Accenture to also apply the WMS survey instrument
to these firms during this post-intervention measurement phase. However, Accenture did not
receive the LSE training on applying this survey instrument, and appears to have graded firms more
harshly, with a mean WMS score of 1.45, compared to a baseline mean of 2.69 when conducted
by the LSE team. Despite the large change in management practices observed in the 38
management practices used in Bloom et al. (2013), there is no significant difference in the follow-
up WMS scores in this case (mean of 1.43 for the treated firms, 1.49 for the control firms, p-value
= 0.693). So, as with our Colombian case, if one were to rely on the WMS to measure whether
changes in management had occurred, the conclusion would have been that the Indian
interventions had no significant effect on management.



Appendix 8: Comparison of PILA and Firm Employment Data
The PILA is the platform through which firms report and pay social security contributions for their
employees. We had to request that government ministries with access to this data attempt to match
our firms. This was done twice. First, the department of statistics (DANE) matched our firms to
PILA data covering January 2014 through June 2016. Second, the Ministry of Health matched our
firms to their database,
covering the period January 2011 through February 2017. Matching firms was not trivial, with
firms’ names not always given, the identification number of the company changing if the economic
activity changes or some other features change, and at times the same firm being listed under the
name of the owner versus the firm. Through a lot of manual matching, DANE was able to match
more of our firms than the Ministry of Health, so we have more observations for their data period
than we do afterwards. Our PILA series uses the Ministry of Health extract as a base, and then
adds in observations which are missing in the Ministry of Health data but are present in the DANE
dataset – as well as replacing cases where the DANE dataset appears to have identified the firm
when the Ministry of Health has only identified the owner.
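The rule for building the combined PILA series can be sketched as below. This is an illustration under assumptions, not the authors' procedure: the function name, the `(firm_id, month)` keys, and the `owner_only_firms` flag are hypothetical, and the source does not say how owner-only matches were identified or what was done for months that DANE does not cover.

```python
def build_pila_series(moh, dane, owner_only_firms):
    """Combine two PILA extracts into one employment series.

    moh, dane: dicts mapping (firm_id, month) -> employment count.
    owner_only_firms: firm_ids for which the Ministry of Health match is
        judged to cover only the owner (a hypothetical, manually built flag).
    """
    combined = dict(moh)  # the Ministry of Health extract is the base
    for key, employment in dane.items():
        firm_id, _month = key
        missing_in_moh = key not in combined
        owner_only = firm_id in owner_only_firms
        # Fill gaps from DANE, and prefer DANE where the Ministry of Health
        # appears to have matched the owner rather than the firm.
        if missing_in_moh or owner_only:
            combined[key] = employment
    return combined


# Made-up example: firm A has a gap in the base extract, firm B looks owner-only.
moh = {("A", "2014-01"): 10, ("B", "2014-01"): 1}
dane = {("A", "2014-01"): 12, ("A", "2014-02"): 11, ("B", "2014-01"): 25}
series = build_pila_series(moh, dane, owner_only_firms={"B"})
```

In this sketch, firm A keeps its Ministry of Health value where one exists and gains the month that only DANE covers, while firm B takes the DANE value because its Ministry of Health record is flagged as owner-only.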
Table A8 shows the availability of our employment data. We see the PILA data covers most firms,
but the balanced panel for which we have data from Jan 2012 to February 2017 is for 112 firms.
We have data for some of 2016/17 for 110 firms from the firm records, but only 96 firms have data
for every month from Jan 2013 to December 2017. Combining the two datasets gives 135 firms
with data for every month from Jan 2013 to February 2017.
 Table A8: Availability of Employment Data
                                              All Firms   Control   Individual   Group
 PILA Employment: any data                       156         52         51         53
 PILA Employment: balanced panel                 112         38         34         40
 Survey data employment: any 2016/17 data        110         35         37         38
 Survey data employment: balanced panel           96         30         36         30
 Combined Employment: balanced panel             135         45         42         48


Figure A8 shows a scatterplot of the employment reported in the PILA and the employment taken
from the firm’s records for the set of 5,870 year-month-firm observations for which we have data
from both sources. The correlation is 0.90 over the full period, and the mass of points lies close to
the 45-degree line. However, we do see a few points which have very low levels of employment
reported in the PILA, and higher levels in firm records. These likely reflect informal employment.




Figure A8: Employment Reported in PILA vs Employment Reported by Firms



