Policy Research Working Paper 10108

Is Dirt Cheap? The Economic Costs of Failing to Meet Soil Health Requirements on Smallholder Farms

Sydney Gourlay‡ and Talip Kilic†1

Development Economics
Development Data Group
June 2022

Abstract

Agricultural productivity is hindered in smallholder farming systems due to several factors, including farmers’ inability to meet crop-specific soil requirements. This paper focuses on soil suitability for maize production and creates multidimensional soil suitability profiles of smallholder maize plots in Uganda, while quantifying forgone production due to cultivation on less-than-suitable land and identifying groups of farmers that are disproportionately impacted. The analysis leverages the unique socioeconomic data from a subnational survey conducted in Eastern Uganda, inclusive of plot-level, objective measures of maize yields and soil attributes. Stochastic frontier models of maize yields are estimated within each soil suitability class to understand differences in returns to inputs, technical efficiency, and potential yield. Only 13 percent of farmers are cultivating soil that is highly suitable for maize production, while the vast majority are cultivating only moderately suitable plots. Farmers cultivating highly suitable soil have the potential to increase their observed yields by as much as 86 percent, while those at the opposite end of the suitability distribution (with marginally suitable land) operate closer to the production frontier and can only increase yields by up to 59 percent, given the current technology set. There is heterogeneity in potential gains across the wealth distribution, with poorer households facing more heavily constrained potential. Assuming no change in technologies and management practices used by Ugandan farmers, there are limited economic gains tied to closing suitability class-specific productivity gaps, or even at the extreme reaching the average potential productivity levels observed in the high suitability class.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at sgourlay@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

JEL: O13, Q12, Q18.
Keywords: Soil, agricultural productivity, crop yields, technical efficiency, household surveys, smallholders, Uganda, Sub-Saharan Africa.

1 ‡ Economist, Living Standards Measurement Study (LSMS), Development Data Group, World Bank, Rome, Italy. sgourlay@worldbank.org. † Senior Economist, Living Standards Measurement Study (LSMS), Development Data Group, World Bank, Washington, DC, USA. tkilic@worldbank.org.
This paper was produced with financial support from the 50x2030 Initiative to Close the Agricultural Data Gap, a multi-partner program that seeks to bridge the global agricultural data gap by transforming data systems in 50 countries in Africa, Asia, the Middle East and Latin America by 2030. For more information on the Initiative, visit 50x2030.org.

1. Introduction

Agriculture is central to rural livelihoods in Sub-Saharan Africa, where smallholder family farming contributes up to 69 percent of rural household incomes (Davis et al., 2017), with direct effects on household consumption and nutrition outcomes (Azzarri et al., 2015; Dillon et al., 2017; Kirk et al., 2018; Slavchevska et al., 2015). In view of the importance of agriculture for farming households and the evidence regarding the disproportionate reduction in poverty associated with growth in the agricultural sector vis-à-vis other sectors, increasing agricultural productivity has been a long-standing goal of African governments. Nevertheless, the observed yields for staple crops, such as cereals, remain significantly lower than potential yields, especially in rain-fed areas (Lobell et al., 2009). Research has also asserted that (i) the large dispersion in agricultural productivity among African smallholders is driven in part by unobserved heterogeneity in land quality (Gollin and Udry, 2021), and (ii) poor crop yields in the region are driven in part by depletion of soil nutrients, which, besides its direct impact on crop yields, also adversely affects the effectiveness of non-land inputs (Berazneva et al., 2018).

The micro-level relationships between crop yields and a range of both climatic and non-climatic factors have been studied extensively in African smallholder farming systems, utilizing socioeconomic and agricultural survey data which, if georeferenced, have also been integrated with third-party geospatial data sources.
2 Despite the rich evidence base on the drivers of and constraints to crop yields, the research on the impact of soil fertility on farm- and plot-level agricultural productivity outcomes, including crop yields, has been relatively limited. The knowledge gaps have been mainly due to (i) systematic measurement errors in farmers’ subjective soil quality assessments (Berazneva et al., 2018; Carletto et al., 2017); (ii) lack of integration of objective sampling and testing of soils as part of household and farm surveys that are critical for understanding the drivers of agricultural production and productivity at the micro-level (Gourlay et al., 2017); and (iii) the mismatch between the scale of African smallholder farming and the spatial resolution of publicly-available, continent-wide geospatial data on soil properties, most notably generated with 250-meter spatial resolution as part of the Africa Soil Information Service (AFSIS) initiative (Hengl et al., 2015). A related strand of research includes the geospatial assessments of the suitability of growing conditions for specific crops, and these assessments have also been made available at the level of relatively aggregated geographic areas, leveraging primarily geospatial data that may or may not be complemented with ground data on soil properties (Abd-Elmabod et al., 2019; Ahamed et al., 2000; Hall et al., 1992).

2 See for instance the research on low levels of agricultural input use (Sheahan and Barrett, 2017), intra- and inter-household gender differences in levels of and returns to agricultural inputs (Udry, 1996; Kilic et al., 2015; Aguilar et al., 2015; Oseni et al., 2015; Slavchevska, 2015), imperfections in land and labor markets (Palacios-Lopez and Lopez, 2015; Deininger et al., 2017; Dillon and Barrett, 2017), and impacts of extreme weather events on agriculture (McCarthy et al., 2021; Michler et al., 2019; Wineman et al., 2017).
A prominent example of these efforts is the Global Agro-Ecological Zones (GAEZ) initiative, which makes available global geospatial datasets on crop suitability and attainable yield for 53 crops, but at approximately 9-kilometer spatial resolution. 3 While the GAEZ data may be useful for assessing crop suitability across expansive geographies, the coarse resolution of this geospatial product limits its use for farm- or plot-level analyses of agricultural productivity in Sub-Saharan African contexts.

Against this background, our paper leverages unique household survey data collected in Uganda, inclusive of plot-level, objective measures of maize yields and soil attributes, to fill knowledge gaps regarding the linkages between soil fertility and smallholder agricultural productivity – both on the whole and within different farmer subpopulations that are defined by socioeconomic characteristics. In doing so, we additionally provide operational insights regarding the integration of objective soil testing into large-scale household surveys, and present empirical evidence regarding the shortcomings of existing geospatial data on soil attributes vis-à-vis plot-level soil sampling and analysis. The data originate from a methodological survey experiment conducted in Eastern Uganda, the top maize-producing region in Uganda, where maize is a primary staple crop.
Specifically, our analysis (i) estimates the maize-specific soil suitability profile of each maize plot based on plot-level measures of soil attributes and assigns each plot to one of four suitability classes that are anchored in the GAEZ definitions; (ii) demonstrates the heterogeneity, across the soil suitability classes, in observed versus potential levels of productivity – through the estimation of stochastic frontier functions of crop cutting-based maize yields; (iii) quantifies production and income gains from closing suitability class-specific productivity gaps – for the sample as a whole and for various farmer sub-populations; and (iv) documents whether the use of 250-meter resolution AFSIS geospatial data on soil attributes changes our conclusions regarding the relationships between soil fertility and maize yields. Measuring agricultural productivity on plots of varying levels of maize-specific soil suitability, and the potential gains in productivity on those plots, allows for a more thorough understanding of the ability of agriculture alone to generate significant income gains. And while some existing work integrates ground-based measures of realized crop production, the current literature fails to adequately address the linkages between soil suitability, observed versus potential crop yields, and other household- and community-level factors influencing productivity outcomes. 4

3 For more information, please visit: https://gaez.fao.org/pages/theme-details-theme-4.
4 One notable exception is the work of Jain et al. (2020) who employ a holistic model of soil suitability in India, including socio-economic factors.

The results indicate that despite the standing of the Eastern Region as Uganda’s leading maize producing region, only 13 percent of farmers are cultivating soil that is highly suitable for maize production, while the vast majority are cultivating only moderately suitable lands. The key soil health deficiencies differ across suitability classes, suggesting that soil-based interventions need to be carefully considered for the specific suitability profiles in which they take place. The relationship between soil suitability, observed yields, and yield potential is positive and significant. The findings suggest that farmers cultivating highly suitable land have the potential to increase their observed yields by as much as 86 percent, up to 3,009 kg/ha, while those at the opposite tail of the suitability distribution, those with marginally suitable land, operate closer to the efficient frontier and thus can only seek to increase observed yields by up to 59 percent, or 1,315 kg/ha. Furthermore, the analysis reveals heterogeneity in potential gains across the wealth distribution, with poorer households facing more heavily constrained potential. The stochastic frontier estimations are sensitive to the use of geospatial AFSIS soil data vis-à-vis the plot-level soil measurements. This is in part driven by the AFSIS data failing to distinguish between soil suitability classes to the same degree as, and in a consistent manner with, the plot-level soil data: 19 percent of plots are assigned a different suitability class when using the AFSIS data vis-à-vis the plot-level data. On the whole, assuming no change in technologies and management practices used by Ugandan farmers, there are limited economic gains tied to closing suitability class-specific productivity gaps, or even at the extreme reaching the average potential productivity levels observed in the high suitability class.

The paper is organized as follows: Section 2 describes the context of Ugandan smallholder farming, Section 3 describes the data, Section 4 presents the empirical methodology, Section 5 discusses the key results, and Section 6 concludes.

2. Country Context

Uganda has a population of 45.7 million, 73 percent of whom reside in rural areas.
5 The share of rural population falling below the national poverty line stands at 23.4 percent – twice the level observed in urban areas (UBOS, 2021). In rural areas, 78 percent of the working population is employed in agriculture (UBOS, 2021), and agricultural income makes up 67 percent of total rural household income (Davis et al., 2017). As such, the Government of Uganda has long recognized the role of increased agricultural productivity as an important driver in generating wealth and alleviating poverty (GoU, 2013; 2015; MAAIF, 2013).

In Eastern Uganda, the primary maize-growing region in the country and the region of analysis here, maize accounts for the highest share of crop income (World Bank, 2016), and no more than 40 percent of maize-growing households sell any maize. 6 Eastern Uganda, following Northern Uganda, is also the region with the highest concentration of the country’s poor, with a poverty rate of 26 percent (UBOS, 2021). In the analysis sample, as discussed in the subsequent section, nearly 52 percent of all parcels of land owned or cultivated by the household were inherited or allocated by family or local leaders, suggesting that there is limited mobility of land across households.

5 The statistics pertain to 2020 and are sourced from the World Bank World Development Indicators database: https://databank.worldbank.org/source/world-development-indicators.
6 Authors’ calculation based on Uganda National Panel Survey (UNPS) 2015/16 data.

3. Data

The majority of the analysis that follows relies on household survey data collected through the Methodological Experiment on Measuring Maize Productivity, Soil Fertility, and Variety (MAPS), and the related plot-level soil sample testing results. These plot-level soil analyses are complemented by, and compared to, geospatial soil data extrapolated from the Africa Soil Information Service (AFSIS).
3.1. MAPS

The Methodological Experiment on Measuring Maize Productivity, Soil Fertility, and Variety (MAPS) is a two-round household panel survey aimed at testing alternative methods of measuring maize production and key agricultural inputs, including soil fertility, maize variety, and plot area. 7 The resulting MAPS dataset includes a unique collection of objectively measured variables paired with data on household socioeconomics, demographics, and agricultural practices. MAPS Round I was fielded in 2015, and Round II was implemented in 2016. As the second round of the study did not include soil analysis, in this paper we utilize only MAPS Round I, which collected detailed data on the first (and the main) agricultural season of the calendar year.

In order to ensure high quality data collection and supervision, the MAPS sampling design was limited in its geographic scope. The sampling for MAPS Round I was completed in a multi-stage process. First, three strata were identified in the primary maize-growing regions of Eastern Uganda, namely Serere district, Sironko district, and a 400 km² area spanning Iganga and Mayuge districts. From each stratum, enumeration areas (EAs) were randomly selected with probability proportional to size (15 from Serere and Sironko each, and 45 from the Iganga/Mayuge stratum). In each selected enumeration area, a full household listing was conducted as part of the study, identifying households who cultivated at least one maize plot and whether they had pure stand and/or intercropped plots. Finally, 12 households were selected from each enumeration area, with an effort to have an even split of pure stand versus intercropping maize households.
Due to the low incidence of pure stand maize plots, and cases in which plots identified as pure stand in the household listing phase were intercropped at the time of the first interview, the final sample was made up of 385 pure stand maize plots and 515 intercropped maize plots (43 percent and 57 percent, respectively). Therefore, the sample comprises 900 maize plots, each one from a different household.

7 MAPS was implemented through a collaboration between the World Bank’s Living Standards Measurement Study (LSMS), the Uganda Bureau of Statistics, the World Agroforestry Centre, the CGIAR Standing Panel on Impact Assessment (SPIA), and Stanford University, with generous support from the UK government. It is part of a larger methodological research agenda undertaken by the World Bank’s LSMS, aimed at identifying improved methods of agricultural and household data collection using more objective, yet scalable, methods.

The MAPS fieldwork was implemented by the Uganda Bureau of Statistics, with technical and training support from the World Bank Living Standards Measurement Study (LSMS). Each household was visited three times: for a post-planting interview, a crop cutting visit, and a post-harvest interview. The post-planting visit involved the administration of a questionnaire and the GPS-based plot area measurements, the demarcation of crop-cutting subplots, and the collection of soil samples (discussed below) on the randomly selected maize plot. The post-planting questionnaire included a standard individual-level module on household composition and basic characteristics (age, gender, education, etc.), a durable assets module, a farming assets module, questions on the use and availability of agricultural extension services, and finally parcel and plot-level details.
8 The plot-level modules made up the bulk of the post-planting questionnaire, with questions on tenure status, cultivation status, which household members manage the plot, what farm implements were used, what farm management practices were employed (for example, tillage, crop rotation, etc.), post-planting labor inputs, and most importantly, farmer assessment of plot area, soil quality, and seed usage. It is critical to note that farmer assessment was made prior to any objective measurement so as to not influence the farmer response. 9 In the second visit, the crop-cutting visit, enumerators harvested the subplots demarcated during the post-planting visit in order to obtain objectively measured production quantities for the crop-cutting subplots, which are subsequently extrapolated to the full plot area. The final household visit took place following completion of all maize harvests. At this time, farmers were administered an additional questionnaire, which asked for the estimated total maize production per plot as well as fertilizer inputs and harvest labor inputs.

8 Smallholder agricultural questionnaires in Uganda are structured such that there is a parcel of land, and within that parcel there may be multiple plots. The level of interest in this paper is the plot. In MAPS, a parcel was defined as “a contiguous piece of land with identical (uniform) tenure and physical characteristics. It is entirely surrounded by land with other tenure and/or physical characteristics or infrastructure e.g. water, a road, forest, etc.” A plot was defined as “a contiguous piece of land within a parcel on which a specific crop or a crop mixture is grown. A parcel may be made up of one or more plots.”
9 Because MAPS was a small-scale methodological validation study, great care was taken to ensure that there were no missing values for the key variables; therefore, there are no concerns of missing data.
There were, however, circumstances that required the sample to be restricted to 840 from 900. Plots which did not have any soil fertility measurement (due to mismatching of soil sample labels) or no crop-cutting (due to non-compliance of households) are excluded. The missingness of soil measurement is likely independent of production on the plot as the missingness stems from errors by the enumerator or laboratory. It could be argued, however, that non-compliance by the household (in which they harvest the crop-cutting subplot before the enumerator’s arrival) could be a systematic problem in which households with fewer resources cannot afford to forgo the maize on the crop-cutting subplot.

In what follows, we provide more details on the methods used for data collection in domains that are central to our research.

Soil fertility: Soil fertility testing was conducted by the World Agroforestry Centre (ICRAF). During MAPS fieldwork, enumerators collected plot-level soil samples from each of the selected plots following a protocol carefully designed to maximize the representativeness of the samples while maintaining feasibility of implementation. From each plot, four samples were collected from the top-soil (0-20cm depth) and combined in the field to create one composite top-soil sample. Additionally, a single sub-soil sample (20-50cm depth) was collected from the center of the plot. After being processed locally, the samples were shipped to ICRAF Nairobi where all samples were subject to spectral soil analysis and approximately 10 percent were subject to conventional wet chemistry testing. A portion of this 10 percent sample was used to calibrate prediction models, while the remainder was used to verify the predictions made from the spectral data. For details, see Shepherd & Walsh (2002).
The final results from the soil analysis include key indicators of soil fertility such as pH, texture analysis (percent sand, percent clay), cation exchange capacity, and the concentration of multiple elements and micronutrients, such as carbon, nitrogen, and potassium.

Maize yields: A 4x4 meter subplot (divided into four 2x2 meter quadrants) and a separate 2x2 meter subplot were laid on the randomly selected maize plot during the post-planting visit following a strict protocol to ensure the location of the subplots was random. The subplots were roped off until harvest, when the enumerators were alerted and completed the harvest with the assistance of the farmer and a local assistant. The shelled maize from each 2x2 meter subplot was weighed and barcoded separately. The maize was then dried by a dedicated team at a central, monitored location until moisture content was in the range of 12 to 14 percent. Once the desired dryness was met, the maize was re-weighed, and the dry weight and final moisture content recorded. For analysis, all maize weights have been normalized to 12 percent moisture content.

Plot area: Following conclusive evidence of systematic bias in farmer estimates of plot area among smallholder farmers in the region (see Carletto et al., 2013; 2015; 2017), MAPS implemented area measurement using a Garmin eTrex 30 handheld GPS device. Both the area and the raw GPS track outline were stored.

Table 1 presents descriptive statistics of key variables. Every household in the sample cultivated at least one maize plot (1.89 maize plots on average). Plot sizes are small (0.15 ha on average) and input rates are relatively low, with 15 percent of plots treated with inorganic fertilizer and 4 percent treated with pesticide. Average maize yields, as measured via crop-cutting, are 1,068 kg/ha.
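To make the yield construction concrete, the following is a minimal sketch of how a crop-cut weight might be normalized to 12 percent moisture and scaled to kg/ha. The dry-matter-conserving moisture adjustment and the function names are assumptions for illustration; the paper does not state the exact formula or which subplots enter the final yield figure.

```python
TARGET_MOISTURE_PCT = 12.0   # normalization target stated in the text
SQ_METERS_PER_HECTARE = 10_000.0


def normalize_weight(weight_kg, moisture_pct, target_pct=TARGET_MOISTURE_PCT):
    """Rescale a grain weight to the target moisture content.

    Assumption: dry matter is conserved, so
    weight_target = weight * (100 - moisture) / (100 - target).
    """
    return weight_kg * (100.0 - moisture_pct) / (100.0 - target_pct)


def yield_kg_per_ha(weight_kg, moisture_pct, subplot_area_m2):
    """Extrapolate a moisture-normalized subplot harvest to kg/ha."""
    normalized = normalize_weight(weight_kg, moisture_pct)
    return normalized / subplot_area_m2 * SQ_METERS_PER_HECTARE


# Hypothetical example: 2.0 kg of shelled maize at 14 percent moisture
# harvested from a 4x4 meter (16 square meter) subplot.
example_yield = yield_kg_per_ha(2.0, 14.0, 16.0)
```

The example values are invented; only the 12 percent target and the 4x4 meter subplot size come from the text.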
Socio-economic indicators, plot manager characteristics, and agricultural variables are also included in Table 1, as they will be relevant for the analysis that follows.

Table 1. Summary Statistics

                                                      Mean
Household
  Household size                                      6.12
  Dependency ratio                                    1.35
  Female household head°                              0.21
  Head age (years)                                    43.6
  Head years of education                             6.71
  HH received extension services°                     0.36
  Agricultural asset count                            6.02
  Distance to nearest market (km)+                    33.5
  Number of cultivated plots                          2.90
  Number of maize plots                               1.89
Plot Manager
  Female manager°                                     0.39
  Manager age (years)                                 41.9
  Manager years of education                          6.18
  Manager completed primary education°                0.39
  Manager received extension services°                0.31
  Primary occupation is agriculture°                  0.80
Plot
  Maize yield (kg/ha)                                 1068
  Plot area (ha)                                      0.15
  Distance from plot to HH (km)                       0.13
  HH labor days                                       45.1
  Hired labor days                                    5.89
  HH purchased parcel°                                0.23
  HH leased in parcel°                                0.16
  Plot treated with organic fertilizer°               0.04
  Plot treated with inorganic fertilizer°             0.15
  Plot treated with pesticide°                        0.04
Rainfall
  Flowering season rainfall (2015, mm)                246
  Long term mean flowering season rainfall (          211

Notes: N = 840; ° Binary variable; + Road network distance to nearest FEWSNET market

3.2. AFSIS

Geospatially-derived soil data is more widely available to researchers and policy makers than plot-level soil sampling linked to household surveys. Yet, geospatial data in this realm is of coarser granularity than what would be observed at the plot-level. In order to understand the implications of relying on geospatial soil data in cases where plot-level sampling is unavailable, we complement the analysis using MAPS-based soil properties with that using soil data drawn from one of the premier, publicly available geospatial soil databases – the Africa Soil Information Service’s AfSoilGrids250m data product.
10 The AfSoilGrids250m product, henceforth referred to simply as AFSIS, utilizes multiple inputs to construct a map of more than 15 key soil properties at 250-meter resolution across the entire African continent. Inputs to the product, including the Africa Soil Profiles database (Leenars et al., 2014), the Africa Sentinel Sites soil database (Vagen et al., 2010), the GlobeLand30 land cover database, and the SoilGrids 1km predicted values, are joined through the use of 3D regression kriging founded on random forests modeling (Hengl et al., 2015). A layer of geospatial data is produced for each soil property at anywhere from one to six different soil depths.

10 AfSoilGrids250m is a product developed by the World Soil Information (ISRIC) in collaboration with the World Agroforestry Centre (ICRAF), The Earth Institute (Columbia University), and the International Centre for Tropical Agriculture (CIAT).

Because MAPS georeferenced each agricultural plot, it is possible to join the AFSIS data with the center point of the agricultural plot and extract the point estimates of each soil property of interest, at the soil depths of interest. The following soil properties were extracted from AFSIS and utilized in the soil suitability analysis: cation exchange capacity, electrical conductivity, organic carbon, and pH. Each of these properties was available at depths of 0-5cm, 5-15cm, and 15-30cm. Because MAPS soil samples were taken at depths of 0-20cm and 20-50cm, we use a weighted average of the AFSIS 0-5cm and 5-15cm values for comparison with the MAPS 0-20cm samples. Subsoils (MAPS 20-50cm or AFSIS 15-30cm depths) are not utilized. Table 2 provides summary statistics for key soil properties derived from both the MAPS and AFSIS sources.

Table 2.
Comparison of Key Soil Properties Across Data Source

                                              MAPS      AFSIS     Test of Difference
pH                                            6.42      5.75      ***
Cation exchange capacity (CEC, cmol/kg)       13.52     15.15     ***
Organic carbon (%)                            1.47      1.78      ***
Electrical conductivity (salinity, dS/m)      0.06      0.10      ***
N                                             840       840

Notes: Two-sided p-values reported: *** p<0.01, ** p<0.05, * p<0.1

4. Empirical Approach

Various approaches to agricultural productivity are used in the agricultural literature, depending on research objectives and data availability. Average measures of productivity, including partial and total factor analysis, can be used to create a single statistic, but the methods require high quality crop price data for the monetization of reported production that are often hard to come by in rural agricultural contexts with thin markets. Alternatively, marginal productivity analysis can be conducted with more direct policy-related takeaways. Cobb-Douglas functions, and variations of Cobb-Douglas, are commonly used (Deininger et al., 2007; Sherlund et al., 2002). The limitation of a simple linear production function in this context is that it assumes all farmers to be performing at optimal levels, without explaining the deviations between the observed and attainable (predicted) output levels. In order to allow for the analysis of the heterogeneity in production potential conditional on crop-specific suitability, one of the main objectives of the paper, we use stochastic frontier analysis, which allows for a better understanding of the aforementioned deviations.

[Figure 1: suitability classes S1: Highly Suitable, S2: Moderately Suitable, S3: Marginally Suitable, and N: Not Suitable, shown for maize against Soil Property Y.]

Figure 1. Underlying crop-soil property suitability class structure, following FAO (1976).
The following two-step empirical approach is employed: (1) estimate crop suitability measures at the plot-level; and (2) estimate stochastic frontier models to estimate production frontiers for each class of maize suitability. The contribution of this paper comes from the ability to execute each of these steps on the same sample and from being able to do so with objectively-measured soil properties and crop production. In addition to executing the aforementioned analytical steps using the MAPS plot-level soil data, the steps are replicated using the geospatially-derived soil property data from AFSIS.

4.1. Assigning Maize-Specific Soil Suitability Measures

Estimating aggregate crop suitability measures requires comparing a vector of optimal soil properties against the levels of said properties observed on each plot. Crop suitability cannot be reduced to a single soil property, as several properties affect plant growth simultaneously, and soil property requirements vary by crop. The crop suitability framework set forth by FAO (1976), and illustrated in Figure 1, will serve as the foundation for the suitability classifications at the crop-soil property level. The maize suitability analysis completed here includes pH, cation exchange capacity (CEC), organic carbon, salinity (soil electrical conductivity), and plot slope (percent). 11 After identifying the suitability class of each soil property individually, based on the property-specific critical values borrowed from Naidu (2006) and further reviewed and modified with input from the World Agroforestry Centre, we utilize a fuzzy membership method to construct a membership grade for each suitability class, allowing for identification of the suitability class that best approximates the soil sample overall.
The fuzzy membership method is commonly employed in land suitability analysis with GIS data (Ahamed et al., 2000; Ceballos-Silva & López-Blanco, 2003; Hall et al., 1992; Kahsay et al., 2018; Kalogirou, 2002). This method is also applicable to the plot-level MAPS data, however, as the data includes precise measures of soil parameters that are often extrapolated from lower resolution geospatial data. In this paper, the unit of analysis is the plot rather than the pixel as in geospatial analysis.

11 Multiple variations of the soil suitability framework were created, each containing a different combination of key soil properties. The selected framework was chosen based on its superior predictive power in bivariate regression on yields measured via crop-cutting.

The fuzzy membership method, drawn heavily from Ahamed et al. (2000) and Hall et al. (1992), begins with an identification of the similarity, or Euclidean distance, between the vector of soil properties on each plot, x, and the representative vector for a given suitability class. After normalizing values over the interval [0,10] for each property to eliminate unit-sensitivity (following Hall et al., 1992), the “distance measure” is constructed as follows:

(1)  $d(\bar{x}, \bar{c}) = \sqrt{\sum_{j=1}^{n} (x_j - c_j)^2}$

where: $\bar{x} = (x_1, x_2, \ldots, x_n)$ is the vector of soil parameters on a given plot; and $\bar{c} = (c_1, c_2, \ldots, c_n)$ is the representative vector of soil properties that corresponds to suitability class, c.

Equation 1 results in a distance measure for each suitability class, where a higher score reflects greater divergence (less similarity) between the properties on a given plot and the respective suitability class. Subsequently, a membership grade is computed for each suitability class, which indicates the relative fit of a given plot to the specific class, ranging from zero to one, allowing for comparison of fit across suitability classes.
$$ \mu_c(\bar{x}) = \frac{1 / d(\bar{x}, \bar{c})}{\sum_{j=1}^{m} 1 / d(\bar{x}, \bar{c}_j)} \qquad (2) $$

where $m = 4$ (the total number of suitability classes).

Equation 2 results in a plot-level membership grade for each suitability class based on a given crop’s representative vectors for each class. Each plot is then assigned the overall suitability class with the highest membership grade. It is important to note that the method above assumes equal weights for each of the soil properties, which may be a strong assumption considering agronomic needs. However, in the absence of literature upon which to anchor unequal weighting of soil properties for the Ugandan context, we utilize the equal weighting approach and leave exploration of alternative weighting schemes to future work.

In summary, the distance measure is an absolute measure of the difference between the soil properties on a given plot and a specific suitability class, while the membership grade is a relative score, ranging from zero to one, indicating the relative fit of a plot into each suitability class. The membership grades for S1, S2, S3, and N, therefore, sum to one for each plot. Two separate suitability class assignments are constructed for each agricultural plot: one derived from MAPS plot-level soil sample results and one from AFSIS geospatially-derived soil properties.

4.2. Econometric Modeling of Production Frontiers

Aigner, Lovell, and Schmidt (1977) lay out the potential problems in minimizing the sum of squares of a simple production function, such as Cobb-Douglas, in estimating the maximum output for a given level of inputs. The authors argue that this method of estimation inadequately explains observed deviations from the maximum output for given levels of inputs. In their proposed stochastic frontier model, they explain the variation in deviations from the modeled maximum output, or the production frontier, and predict an observation-level measure of technical inefficiency.
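Returning to the classification step of Section 4.1, the distance measure (Equation 1) and membership grade (Equation 2) can be illustrated with a minimal sketch. The representative class vectors and the example plot below are hypothetical values on the normalized [0,10] scale, not the critical values used in the paper, and the function names are illustrative only. The handling of a zero distance (full membership in the matching class) is an assumption added to avoid division by zero.

```python
import math

def distance(x, c):
    """Euclidean distance between a plot vector x and a class vector c (Equation 1)."""
    return math.sqrt(sum((xk - ck) ** 2 for xk, ck in zip(x, c)))

def membership_grades(x, class_vectors):
    """Inverse-distance membership grade per class, normalized to sum to one (Equation 2)."""
    inverse = {}
    for name, c in class_vectors.items():
        d = distance(x, c)
        if d == 0.0:
            # Assumption: an exact match receives full membership in that class.
            return {k: (1.0 if k == name else 0.0) for k in class_vectors}
        inverse[name] = 1.0 / d
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def assign_class(x, class_vectors):
    """Assign the class with the highest membership grade."""
    grades = membership_grades(x, class_vectors)
    return max(grades, key=grades.get)

# Hypothetical representative vectors (pH, CEC, organic carbon, salinity, slope),
# already normalized to [0, 10]; equal weights across properties, as in the text.
CLASS_VECTORS = {
    "S1": [9.0, 9.0, 9.0, 9.0, 9.0],
    "S2": [6.5, 6.5, 6.5, 6.5, 6.5],
    "S3": [4.0, 4.0, 4.0, 4.0, 4.0],
    "N": [1.0, 1.0, 1.0, 1.0, 1.0],
}

plot = [7.0, 6.0, 5.5, 9.0, 6.0]  # hypothetical plot
grades = membership_grades(plot, CLASS_VECTORS)
best = assign_class(plot, CLASS_VECTORS)
print(grades, best)
```

Because the grades are normalized inverse distances, they sum to one for each plot, and the assigned class is simply the nearest representative vector.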
Much of the literature on stochastic frontier models assumes a translog production function, in which inputs into the production function are also interacted (see Greene (2008), Sherlund et al. (2002), and Ekbom and Sterner (2008)). This can, however, result in an explosion of parameters to be estimated in the case of many inputs, such as in agricultural models. Rather than the translog function, we assume a log-linear Cobb-Douglas model, following the seminal work of Aigner, Lovell, and Schmidt (1977) and the agricultural examples set forth by Deininger et al. (2007), Kilic et al. (2009), and others. The estimated stochastic frontier model is as follows:

$$ \ln(y_i) = \beta_0 + \sum_{k=1}^{K} \beta_k \ln(X_{ki}) + \varepsilon_i \qquad (3) $$

$$ \varepsilon_i = v_i - u_i \qquad (4) $$

where $y_i$ is total maize grain output (in kilograms) on plot i, and $\beta_0$ and $\beta_k$ are parameters to be estimated. X is a vector of traditional economic inputs, including land area, household and hired labor inputs, and inorganic fertilizer usage. 12 As this is a rain-fed agricultural system, rainfall is also controlled for. Plot-specific flowering season rainfall was computed as total rainfall during the 5th to 8th dekads following the onset of seasonal rainfall, using CHIRPS timeseries precipitation data (Funk et al., 2015). 13 The distance measure (from highly suitable soil), $d(\bar{x}, \bar{c}_{S1})$, is included in X for analysis conducted on the full sample, while the membership grades, $\mu_c(\bar{x})$, are used to disaggregate the full sample into suitability-class sub-samples upon which the estimations are conducted separately (discussed in more detail below). Because both pure stand and intercropped plots are included in the sample, a dummy for the cropping pattern and a continuous variable for the seeding rate are also included in the X vector. 14 Inclusion of indicator variables for administrative districts was attempted but was found to be problematic due to correlation with rainfall and other covariates.

12 Following Sherlund, Barrett, and Adesina (2002), zero values for hired labor and use of inorganic fertilizer were transformed logarithmically as follows: ln(0) = ln(strictly positive sample minimum/10).

13 The onset of the season is defined following the Water Requirement Satisfaction Index (WRSI), such that the season begins when at least 25mm of rain falls in one dekad, and a total of at least 20mm of rain falls in the subsequent two dekads (documentation available here: https://goo.gl/sgmTK8).

14 Pure stand plots are those on which only maize is grown. Intercropped plots are those on which maize and at least one other crop are grown. The “seeding rate” included here is a ratio of the quantity of maize seed used on the plot to the quantity of seed that would have been used had the farmer planted only maize. The seeding rate is, therefore, bounded (0,1] and equals 1 for all pure stand plots. The seeding rate is included in addition to the dummy variable for cultivation pattern because it is believed that some combinations of crops could improve potential maize yields.

The error term, $\varepsilon_i$, is disaggregated into a symmetric disturbance term, $v_i$, and a non-negative disturbance, $u_i$. The symmetric disturbance is assumed to be independently and identically distributed as $N(0, \sigma_v^2)$. It is assumed to be independent of $u_i$ and results from measurement error, climate-related shocks that affect production, and other exogenous shocks. The non-negative term, $u_i$, represents the technical inefficiency of the household cultivating the plot, or the distance from the potential production frontier. It is assumed to follow a truncated normal distribution, $N(0, \sigma_u^2)$, with a zero lower bound (Aigner, Lovell, and Schmidt, 1977). Furthermore, $u_i$ is modeled as a linear function of variables that are believed to explain a household’s technical efficiency or ability (Deininger et al., 2007; Kilic et al., 2009):

$$ u_i = \delta_0 + \sum_{j=1}^{J} \delta_j Z_{ji} + \omega_i \qquad (5) $$

$Z_i$ is a J-vector of covariates used to explain technical efficiency, which includes plot manager age, an indicator for the manager’s attainment of primary education, an indicator for whether the plot manager received agricultural extension services, the dependency ratio of the household, and the number of agricultural assets owned by the household. Uncertainty around climatic factors, which may influence farmer behavior with respect to farming practices, is proxied by the coefficient of variation of flowering season rainfall (over the period 1999 – 2014), which is also included in $Z_i$. Additional controls were initially included, such as gender of the plot manager and seasonal rainfall shocks, but due to correlation with other covariates and lack of explanatory power in linear regressions, these were ultimately excluded. Indicators of access to credit and public infrastructure could theoretically be included in Z, although data on these covariates are not available for this sample. The error term, $\omega_i$, is assumed to be of a truncated normal distribution, with mean zero and truncated at $-(\delta_0 + \sum_{j=1}^{J} \delta_j Z_{ji})$, such that $u_i$ remains non-negative. Technical efficiency and the parameters from Equation 3 are estimated jointly using maximum likelihood.

The model, which substitutes Equations 4 and 5 into Equation 3, is estimated four times for each of the MAPS-based and AFSIS-based soil suitability measures: (i) including all plots and controlling for soil suitability with the inclusion of the distance measure from suitability class S1; (ii) including only plots classified as highly suitable (S1); (iii) including only plots classified as moderately suitable (S2); and (iv) including only plots classified as marginally suitable (S3). Analysis is implemented on a sub-sample basis, rather than in a single model with suitability indicators, as it allows for the estimation of suitability-class-specific production frontiers.
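The rainfall input described in Section 4.2 (season onset per the WRSI rule, then total rainfall over the 5th to 8th dekads after onset) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names and the example series are hypothetical, and the onset dekad is counted as dekad 0, so that the flowering window covers the dekads at offsets 5 through 8. The source does not specify this indexing convention.

```python
def onset_dekad(rain):
    """Index of the first dekad with >= 25 mm of rain that is followed by
    >= 20 mm in total over the next two dekads; None if no dekad qualifies."""
    for t in range(len(rain) - 2):
        if rain[t] >= 25.0 and rain[t + 1] + rain[t + 2] >= 20.0:
            return t
    return None

def flowering_rainfall(rain, first=5, last=8):
    """Total rainfall over dekads first..last after onset (inclusive).

    Assumption: the onset dekad is counted as dekad 0.
    Returns None if onset is not found or the series is too short.
    """
    t0 = onset_dekad(rain)
    if t0 is None:
        return None
    window = rain[t0 + first : t0 + last + 1]
    if len(window) < last - first + 1:
        return None
    return sum(window)

# Hypothetical dekadal rainfall totals (mm) for one plot and season.
rain = [5, 10, 30, 12, 9, 40, 35, 20, 50, 45, 30, 25, 10]
print(onset_dekad(rain), flowering_rainfall(rain))
```

In practice the dekadal series would be extracted from the CHIRPS data at each plot location; the list above only stands in for that extraction.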
Technical efficiency (TE) scores are computed based on the conditional distribution of $u_i$ given $\varepsilon_i$, following Battese and Coelli (1988), whereby the technical efficiency on plot i is defined as:

$$ TE_i = E[\exp(-u_i) \mid \varepsilon_i] \qquad (6) $$

The technical efficiency scores are then used to compute potential production and productivity for the given level of inputs. 15

15 Potential output is computed as: observed output / technical efficiency score.

5. Results

5.1. Maize-Specific Suitability Measures

The fuzzy set membership method described above was implemented using both the MAPS plot-level soil samples and the AFSIS geospatially-derived soil data. The results of the maize-specific soil suitability classification exercise are summarized in Table 3. Using the MAPS plot-level soil samples as the basis for classification, 13 percent of plots are considered highly suitable, the majority (75 percent) considered moderately suitable, and the remaining 12 percent of plots considered only marginally suitable. Note that classification into a specific group does not suggest that the plot-level soil properties fit that category in full. Rather, they are most closely aligned with that class relative to the other classes. No plots were classified as not at all suitable, in line with expectations as these are all maize-growing plots.

Table 3. Soil Suitability Classification Summary
Row-wise summary; binary variables, where 1 indicates soil property falls within given suitability class.

| | MAPS S1 | MAPS S2 | MAPS S3 | MAPS N | AFSIS S1 | AFSIS S2 | AFSIS S3 | AFSIS N |
|---|---|---|---|---|---|---|---|---|
| pH | 0.35 | 0.64 | 0.01 | - | 0.00 | 0.99 | 0.01 | - |
| CEC | 0.15 | 0.08 | 0.35 | 0.43 | 0.14 | 0.08 | 0.75 | 0.03 |
| Organic Carbon (%) | 0.27 | 0.37 | 0.34 | 0.02 | 0.38 | 0.51 | 0.11 | 0.00 |
| Salinity (ECD) | 1.00 | - | - | - | 1.00 | - | - | - |
| Slope (%) | 0.23 | 0.29 | 0.27 | 0.21 | 0.23 | 0.29 | 0.27 | 0.21 |
| Overall Class | 0.13 | 0.75 | 0.12 | - | 0.07 | 0.88 | 0.06 | - |
| N | 106 | 634 | 100 | 0 | 57 | 736 | 47 | 0 |

Benchmarked against the preferred method of plot-level soil testing, there is evidence that the geospatial data fails to adequately distinguish soil suitability levels.
AFSIS-based classification results in more intense clustering of observations in the central, moderately suitable class (88 percent), with only 7 percent classified as highly suitable and 6 percent as marginally suitable. While 81 percent of observations are mapped to the same suitability class regardless of whether MAPS or AFSIS soil data informs the classification, the suitability class of 19 percent of plots varies with the source of soil data utilized (see Figure 2), which will have economic implications for the perceived production frontiers for these households.

| | MAPS S1 | MAPS S2 | MAPS S3 |
|---|---|---|---|
| AFSIS S1 | 34 | 23 | 3 |
| AFSIS S2 | 72 | 607 | 57 |
| AFSIS S3 | 0 | 4 | 43 |

Figure 2. Suitability Classification Matrix; MAPS- and AFSIS-based classifications.

A unique feature of using this method is the ability to identify specific constraints to suitability for each class. According to MAPS-based suitability classifications, cation exchange capacity is an overarching limiting factor, with 43 percent of plots being classified as not suitable for that particular property. The perceived constraints differ when AFSIS-based classifications are made. For example, AFSIS-based classifications suggest that only 3 percent of plots have cation exchange capacity levels classified as not suitable (compared to 43 percent when using MAPS-based data). Annex Table A1 identifies the limiting factors for each MAPS-based suitability class separately, enabling an assessment of what interventions would be most effective in increasing the suitability of plots from one level to the next.

The suitability classifications are consistent with expectations with respect to agricultural productivity. Figure 3 illustrates the distribution of maize yields (kg/ha) by suitability class. Using MAPS-based classifications, highly suitable plots realized an average of 1,614 kg/ha, while moderately and marginally suitable plots realized an average of 1,015 and 828 kg/ha, respectively (1,497 kg/ha, 1,053 kg/ha, and 789 kg/ha using AFSIS-based classifications).
[Figure 3: density plots of maize yield (kg/ha, 0 to 6,000) for S1, S2, and S3 plots. Panel A: MAPS. Panel B: AFSIS.]

Figure 3. Productivity by Suitability Class; MAPS- and AFSIS-based classifications. S1, S2, and S3 indicate highly suitable, moderately suitable, and marginally suitable soil for maize production, respectively.

Note that existence of highly suitable soil does not, in itself, result in high maize yields, but rather increases the upper bound and incidence of high maize yields.

5.2. Estimation of Production Frontiers

The results of the stochastic frontier analyses are reported in Table 4. The MAPS- and AFSIS-based overall estimations offer a fairly consistent understanding of output elasticities for most variables, suggesting that on average the geospatial soil data may be comparable to, and potentially a substitute for, plot-level soil data. However, several findings temper the enthusiasm for the use of geospatial data as an acceptable substitute for plot-level soil data. The coefficients on soil suitability in these overall specifications, measured as the distance from the S1 vector, are both negative, thereby suggesting that the closer a plot’s soil is to the optimal, the greater its production, as expected. The MAPS-based soil data exhibits a stronger relationship with production than does the AFSIS data, and the OLS regression replicating the production function portion of the stochastic frontier model (see Annex Table A2) reveals that the coefficients on soil suitability (as measured by distance from S1) are statistically different across MAPS and AFSIS specifications. 16 Seasonal rainfall is only a weakly statistically significant input into production when controlling for plot-level soil suitability, but it is a strong and significant predictor in the AFSIS model.
The latter finding runs contrary to expectations, as the plot-level soil data does not directly incorporate climatological variables while the AFSIS data does. A final notable difference in output elasticities across the overall specifications is that of inorganic fertilizer. Under the AFSIS model the application of inorganic fertilizer does not yield any statistically significant gains in production, contrary to expectations and findings in the MAPS-based model. This raises the first caution against the use of geospatial data as a substitute for plot-level soil data.

16 A test of the difference in coefficients across the overall MAPS and AFSIS specifications, columns 1 and 2 in Annex Table A2, indicates that the coefficients (-0.082 and -0.050, respectively) are significantly different from each other at the 1 percent level. The test of difference in coefficients is implemented by the execution of a seemingly unrelated estimation of the two OLS models, followed by a Wald test of equality of the specific coefficients.

Table 4. Stochastic Frontier Analysis
Dependent Variable = Log of Total Maize Grain Production (KG). Columns S1, S2, and S3 are sub-samples defined by highest membership grade.

| | Overall MAPS | Overall AFSIS | S1 MAPS | S2 MAPS | S2 AFSIS | S3 MAPS |
|---|---|---|---|---|---|---|
| Distance from S1 | -0.075*** | -0.060*** | | | | |
| Plot Area (Hectares, Logged) | 0.993*** | 0.988*** | 1.018*** | 0.977*** | 0.995*** | 1.094*** |
| Household Labor Days (Logged) | -0.048 | -0.050 | -0.121* | -0.038 | -0.045 | -0.080 |
| Hired Labor Days (Logged) | 0.003 | 0.007 | 0.023 | 0.005 | 0.001 | 0.001 |
| Inorganic Fertilizer (KG, Logged) | 0.023* | 0.015 | 0.008 | -0.009 | -0.014 | 0.024 |
| KG of Seed Planted (Logged) | 0.080** | 0.081** | 0.041 | 0.125*** | 0.114*** | -0.199** |
| Intercropping Rate (Logged) | 0.200*** | 0.194*** | 0.087 | 0.174** | 0.167** | 0.110 |
| Pure Stand Maize° | 0.206*** | 0.233*** | 0.374** | 0.253*** | 0.283*** | 0.169 |
| Flowering Season Rainfall (mm, Logged) | 0.376* | 0.737*** | † | 0.27 | 0.504* | -0.734 |
| Constant | 5.072*** | 2.939** | 7.851*** | 5.017*** | 3.825** | 11.506*** |
| Technical Inefficiency | | | | | | |
| Manager Completed Primary Education° | -0.011 | -0.001 | -0.167 | 0.010 | 0.01 | -0.100 |
| Manager Age | -0.008 | -0.007 | -0.037 | 0.008 | 0.003 | -0.076 |
| Manager Age (squared) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001* |
| Manager Received Agricultural Extension Services° | † | † | 0.284 | 0.076 | 0.066 | 0.570* |
| Dependency Ratio | -0.029 | -0.026 | 0.004 | -0.016 | -0.029 | -0.297** |
| Count of Agricultural Assets | -0.019 | -0.019 | -0.017 | -0.021 | -0.02 | -0.042 |
| CV of Flowering Season Rainfall (1999-2014) | 1.303 | 1.047 | -1.836 | 1.048 | 2.006** | † |
| Constant | 6.262*** | 6.337*** | 3.859** | 6.387*** | 5.850*** | 3.586*** |
| Random Error Term (v) | | | | | | |
| Constant | -1.949*** | -1.857*** | -3.140*** | -1.807*** | -1.718*** | -2.033*** |
| N | 840 | 840 | 106 | 634 | 736 | 100 |

Notes: Standard errors clustered at EA level; † = insufficient variation for inclusion of variable or high correlation with other covariates; ° = Binary variable; *** p<0.01, ** p<0.05, * p<0.1

Due to the small sample sizes and evidence of inconsistent matching of suitability classification relative to the MAPS plot-level soil sample classification, results are not reported in Table 4 for highly suitable (S1) or marginally suitable (S3) categories for AFSIS.
Analysis on these categories, available in annex tables for all tables from this point forward, revealed highly sensitive results that run contrary to expectations, including those expectations rooted in agronomic science, and, depending on the specification, implied potential yields far greater than agronomically feasible (for S1 plots). These unattainable potential yields, which are a function of low technical efficiency scores, support the descriptive finding that the geospatial data fails to sufficiently and consistently distinguish between maize-specific soil suitability levels relative to ground-based measures. 17

17 The technical efficiency scores derived from the AFSIS-based suitability classifications are sensitive to the variables in the technical efficiency model. When both the MAPS- and AFSIS-based stochastic frontier models are implemented without any controls in the technical efficiency specification, MAPS technical efficiency scores remain fairly stable while AFSIS technical efficiency scores vary widely for S1 and S3 classes.

Coefficients in the overall specification are generally in line with those reported for moderately suitable (S2) plots, both for MAPS- and AFSIS-based specifications. While AFSIS results for S1 and S3 are not reported here due to the sensitivity and inconsistency of findings (see Annex Table A3 for full reporting), there are takeaways from the comparison of MAPS S1 and S3 categories. Most apparent is the insignificant effect of cultivation pattern on S3 plots. In this category, neither the binary indicator on pure stand cultivation nor the intercropping seed rate is significant, suggesting that production on S3 plots is unchanged with cultivation pattern. On S1 plots, however, returns to cultivating pure stand maize are positive and significant, and higher in magnitude than the returns to pure stand cultivation on S2 plots. Related to seeding, the S3 specification exhibits a negative and significant coefficient on quantity of seed planted. A negative coefficient on seed application is contrary to expectations. One potential explanation could be over-seeding on these lower-suitability plots in an attempt to encourage greater production, while serving only to crowd out successful plants.

Across all specifications reported, technical inefficiency is largely unexplained by observable factors. While not statistically significant, coefficients on the technical inefficiency predictors are generally in the expected direction (Table 4). For example, the manager’s completion of primary school and count of agricultural assets both exhibit negative (but insignificant) coefficients in the overall specification, thereby hinting at a reduction in technical inefficiency. In the MAPS S2 specification, manager education has a positive (and insignificant) coefficient, but its value is very near zero. The coefficient on manager’s receipt of agricultural extension services is positive (but insignificant in all but the MAPS S3 specification), suggesting that use of extension services is correlated with increased technical inefficiency. It is conceivable, however, that this relationship is driven by those with lower technical efficiency self-selecting into the use of extension services. Only in MAPS S3 do we see statistically significant coefficients on household or manager characteristics: that on the household dependency ratio, which suggests that technical inefficiency on S3 plots is reduced in households with a greater dependency ratio, likely through labor channels; and that on the use of extension services. Uncertainty around rainfall patterns, proxied by the coefficient of variation in flowering season rainfall, has no significant relationship with technical efficiency in the majority of specifications.
It does, however, exhibit a positive and statistically significant (at the 10 percent level) coefficient for AFSIS S2 plots, suggesting that increased uncertainty results in decreased technical efficiency. The lack of predictive power of observable household and manager characteristics on technical inefficiency is consistent with the results of separate productivity analysis conducted via ordinary least squares regression (available in Annex Table A4), in which none of the household or manager characteristics has a statistically significant relationship with productivity, with the exception of agricultural asset counts. It is conceivable that the use of crop-cutting-based production measurement, rather than farmer-estimated production as is most commonly utilized in smallholder agricultural analysis, results in a reduced effect of these manager characteristics. By using objectively measured production data, we eradicate the noise and/or bias associated with the plot manager’s estimate of production, which may be correlated with manager’s education, experience, exposure to extension services, etc. However, there is little empirical evidence to support this theory, at least within this particular dataset, as Gourlay et al. (2019) find that observable manager characteristics have very little explanatory power in yield overestimation (measured as self-reported production-based yield minus crop-cutting-based yield). Stochastic frontier analysis was also executed on subpopulations of interest, including plots with female managers, plots with male managers, and pure stand plots. Results of these analyses are available in Table 5. As the sample sizes of these subpopulations are small, it is difficult to draw conclusions on the S1 and S3 classifications so only the overall and S2 classifications are reported. 
Output elasticities across plot manager gender are similar, but male-managed plots experience greater returns to soil suitability and only male-managed plots have positive and significant returns to inorganic fertilizer application on average. Additionally, the impact of weather-related uncertainty varies by plot manager gender, with only male-managed plots exhibiting a positive and significant relationship between uncertainty and technical inefficiency (while the converse is true with respect to current season rainfall inputs in the production function). Output elasticities on pure stand plots are in line with those observed across the full sample, with the exception that inorganic fertilizer application has greater impact on the production of pure stand plots.

Table 5. Stochastic Frontier Analysis on Select Subpopulations
Dependent Variable = Log of Total Maize Grain Production (KG). S2 columns are sub-samples defined by highest membership grade.

PURE STAND PLOTS

| | Overall MAPS | Overall AFSIS | S2 MAPS | S2 AFSIS |
|---|---|---|---|---|
| Distance from S1 | -0.088*** | -0.071*** | | |
| Plot Area (Hectares, Logged) | 1.040*** | 1.019*** | 1.046*** | 1.036*** |
| Household Labor Days (Logged) | -0.077* | -0.082* | -0.031 | -0.069 |
| Hired Labor Days (Logged) | -0.005 | -0.004 | -0.011 | -0.019 |
| Inorganic Fertilizer (KG, Logged) | 0.046*** | 0.039** | 0.017 | 0.018 |
| KG of Seed Planted (Logged) | 0.032 | 0.03 | 0.042 | 0.052 |
| Intercropping Rate (Logged) | - | - | - | - |
| Pure Stand Maize° | - | - | - | - |
| Flowering Season Rainfall (mm, Logged) | -0.069 | 0.476 | -0.07 | 0.189 |
| Constant | 9.015*** | 5.830*** | 8.193*** | 6.905*** |
| Technical Inefficiency | | | | |
| Manager Completed Primary Education° | 0.012 | 0.025 | 0.059 | 0.068 |
| Manager Age | † | † | † | † |
| Manager Age (squared) | † | † | † | † |
| Manager Received Agricultural Extension Services° | 0.045 | 0.074 | -0.059 | 0.021 |
| Dependency Ratio | † | † | † | † |
| Count of Agricultural Assets | 0.001 | 0.001 | 0.006 | 0.011 |
| CV of Flowering Season Rainfall (1999-2014) | -2.442 | -3.691* | -3.671 | -2.858 |
| Constant | 7.068*** | 7.284*** | 7.181*** | 6.990*** |
| Random Error Term (v) | | | | |
| Constant | -2.047*** | -1.919*** | -1.910*** | -1.748*** |
| N | 367 | 367 | 274 | 323 |

FEMALE-MANAGED PLOTS

| | Overall MAPS | Overall AFSIS | S2 MAPS | S2 AFSIS |
|---|---|---|---|---|
| Distance from S1 | -0.056*** | -0.039** | | |
| Plot Area (Hectares, Logged) | 0.935*** | 0.939*** | 0.802*** | 0.960*** |
| Household Labor Days (Logged) | -0.072 | -0.071 | 0.178* | -0.057 |
| Hired Labor Days (Logged) | 0.016 | 0.024 | 0.024 | 0.005 |
| Inorganic Fertilizer (KG, Logged) | -0.008 | -0.016 | -0.069 | -0.038* |
| KG of Seed Planted (Logged) | 0.128** | 0.131** | 0.224*** | 0.195*** |
| Intercropping Rate (Logged) | 0.058 | 0.011 | 0.039 | -0.051 |
| Pure Stand Maize° | 0.108 | 0.153 | 0.277* | 0.222* |
| Flowering Season Rainfall (mm, Logged) | 0.768** | 1.038*** | 0.534 | 0.878** |
| Constant | 3.282 | 1.826 | 1.784 | 2.541 |
| Technical Inefficiency | | | | |
| Manager Completed Primary Education° | -0.064 | -0.052 | 18.151*** | -0.018 |
| Manager Age | -0.008 | -0.006 | -0.308* | 0.018 |
| Manager Age (squared) | 0.000 | 0.000 | 0.005** | 0.000 |
| Manager Received Agricultural Extension Services° | 0.205 | 0.209 | 3.786* | 0.130 |
| Dependency Ratio | -0.079 | -0.078 | -13.581*** | -0.057 |
| Count of Agricultural Assets | -0.031* | -0.034* | -0.490*** | -0.044* |
| CV of Flowering Season Rainfall (1999-2014) | -0.114 | -0.149 | † | † |
| Constant | 6.447*** | 6.335*** | -13.098*** | 5.349*** |
| Random Error Term (v) | | | | |
| Constant | -2.099*** | -2.053*** | 0.029 | -1.925*** |
| N | 324 | 324 | 259 | 286 |

MALE-MANAGED PLOTS

| | Overall MAPS | Overall AFSIS | S2 MAPS | S2 AFSIS |
|---|---|---|---|---|
| Distance from S1 | -0.084*** | -0.073*** | | |
| Plot Area (Hectares, Logged) | 1.018*** | 1.009*** | 1.020*** | 1.009*** |
| Household Labor Days (Logged) | -0.037 | -0.043 | -0.044 | -0.047 |
| Hired Labor Days (Logged) | 0.001 | 0.000 | -0.003 | -0.003 |
| Inorganic Fertilizer (KG, Logged) | 0.034*** | 0.026* | 0.003 | -0.006 |
| KG of Seed Planted (Logged) | 0.045 | 0.044 | 0.070 | 0.067 |
| Intercropping Rate (Logged) | 0.308*** | 0.351*** | 0.306*** | 0.350*** |
| Pure Stand Maize° | 0.264*** | 0.265*** | 0.288*** | 0.314*** |
| Flowering Season Rainfall (mm, Logged) | 0.031 | 0.448 | -0.154 | 0.220 |
| Constant | 6.581*** | 3.969** | 6.968*** | 4.697** |
| Technical Inefficiency | | | | |
| Manager Completed Primary Education° | -0.013 | -0.009 | 0.063 | 0.024 |
| Manager Age | -0.007 | -0.007 | † | † |
| Manager Age (squared) | 0.000 | 0.000 | † | † |
| Manager Received Agricultural Extension Services° | 0.038 | 0.061 | -0.011 | -0.001 |
| Dependency Ratio | † | † | † | † |
| Count of Agricultural Assets | -0.012 | -0.011 | -0.010 | -0.006 |
| CV of Flowering Season Rainfall (1999-2014) | 2.344* | 1.966 | 2.970** | 2.926** |
| Constant | 6.231*** | 6.399*** | 5.984*** | 6.039*** |
| Random Error Term (v) | | | | |
| Constant | -1.984*** | -1.869*** | -1.853*** | -1.751*** |
| N | 516 | 516 | 375 | 450 |

Notes: Standard errors clustered at EA level; † = insufficient variation for inclusion of variable or high correlation with other covariates; ° = Binary variable; *** p<0.01, ** p<0.05, * p<0.1

5.3. Technical Efficiency, Potential Gains, and Economic Implications

Technical efficiency scores, representing the distance from the potential production frontier, are computed in accordance with Equation 6. Figure 4 presents the distribution of technical efficiency scores under each suitability class for both MAPS- and AFSIS-based suitability classifications, while Table 6 summarizes the scores, the potential production (in kilograms), and the potential yields (kilograms/hectare). As with the stochastic frontier analysis, AFSIS results are only reported for the overall sample and S2 classification (see Annex Table A5 for technical efficiency scores of S1 and S3 classes).

Farmers cultivating MAPS-based S3 plots exhibit the highest technical efficiency scores, indicating they are operating most closely to the production frontier given their soil suitability. Farmers cultivating S1 and S2 plots have higher realized yields and lower technical efficiency scores, suggesting they have the potential to achieve the most significant gains in their maize yields vis-a-vis their already superior observed levels. T-tests reveal that while the technical efficiency scores of S1 and S2 plots are not different to a statistically significant degree, the scores between S1 and S3, and between S2 and S3, are statistically different at the 10 percent level. The resulting potential output per plot and potential output per hectare are significantly different at the 1 percent level between all classifications. The gap between mean realized yield and potential yield is 1,394 kg/ha, or 86 percent of the realized mean yield, on S1 plots. S2 plots only have the potential to increase yields by 69 percent on average, while S3 plots are constrained to 59 percent yield growth from the 828 kg/ha realized average.
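The arithmetic behind these yield gaps can be checked with a short sketch. The mean and potential yields below are the MAPS figures reported in Table 6; the function and variable names are illustrative. The plot-level rule (observed output divided by the technical efficiency score) is the one stated in footnote 15. Note that a class mean of plot-level potential yields need not equal the class mean yield divided by the class mean score, since the mean of ratios differs from the ratio of means (for S1, 1,614 / 0.51 is roughly 3,165, not the reported 3,009).

```python
def potential_output(observed, te_score):
    """Plot-level potential output: observed output / technical efficiency score."""
    if not 0.0 < te_score <= 1.0:
        raise ValueError("technical efficiency score must lie in (0, 1]")
    return observed / te_score

def gap_share(mean_yield, potential_yield):
    """Potential yield gain expressed as a share of the realized mean yield."""
    return (potential_yield - mean_yield) / mean_yield

# MAPS class means from Table 6, in kg/ha: (mean yield, potential yield).
TABLE6_MAPS = {
    "S1": (1614, 3009),
    "S2": (1015, 1714),
    "S3": (828, 1315),
}

# Gap as a whole-number percentage of the realized mean yield, per class.
shares = {
    cls: round(100 * gap_share(mean, potential))
    for cls, (mean, potential) in TABLE6_MAPS.items()
}
print(shares)
```

The resulting percentages match the 86, 69, and 59 percent figures cited in the text for S1, S2, and S3 plots, respectively.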
Figure 5 presents the potential yield gains for each of the MAPS-based classifications and AFSIS S2 plots, illustrating the greater level and percentage increase in potential yields for more highly suitable plots.

Table 6. Technical Efficiency and Productivity Potential

| | MAPS Overall | MAPS S1 | MAPS S2 | MAPS S3 | AFSIS Overall | AFSIS S2 |
|---|---|---|---|---|---|---|
| Technical Efficiency | 0.53 | 0.51 | 0.53 | 0.58 | 0.53 | 0.53 |
| Potential Output (KG) | 288.00 | 599.90 | 264.30 | 161.00 | 289.04 | 282.66 |
| Potential Yield (KG/Ha) | 1,804 | 3,009 | 1,714 | 1,315 | 1,811 | 1,754 |
| Mean Output (KG) | 176 | 333 | 162 | 97 | 176 | 175 |
| Mean Yield (KG/Ha) | 1,068 | 1,614 | 1,015 | 828 | 1,068 | 1,053 |
| Potential-Mean Yield Difference (KG/Ha) | 736 | 1,394 | 699 | 487 | 743 | 702 |
| Difference as a % of Mean Yield | 69% | 86% | 69% | 59% | 70% | 67% |
| N | 840 | 106 | 634 | 100 | 840 | 736 |

Figure 4. Technical Efficiency by Suitability Class

[Figure 5: bar chart of observed yield and potential yield (kg/ha, 0 to 3,000) for MAPS S1, S2, and S3 plots and AFSIS S2 plots.]

Figure 5. Potential Yield Gains by Suitability Class

Table 7. Season-Specific Potential Production Gains at Frontier Attainment

| | Total | MAPS S1 | MAPS S2 | MAPS S3 | Wealth Tercile 1 | Wealth Tercile 2 | Wealth Tercile 3 | Male Head | Female Head |
|---|---|---|---|---|---|---|---|---|---|
| Plot Level Means | | | | | | | | | |
| Plot Area (Ha) | 0.146 | 0.182 | 0.144 | 0.121 | 0.125 | 0.161 | 0.152 | 0.151 | 0.126 |
| Observed Yield (KG/Ha) | 1068 | 1614 | 1015 | 828 | 1005 | 1046 | 1153 | 1091 | 981 |
| MAPS-Based Potential Yield (KG/Ha) | 1804 | 3009 | 1714 | 1315 | 1749 | 1822 | 1919 | 1860 | 1714 |
| Production Gain if Attaining Productivity Frontiers: | | | | | | | | | |
| MAPS Class-Specific Frontier (kg) | 108 | 254 | 101 | 59 | 93 | 125 | 116 | 116 | 92 |
| (USD)* | 46 | 110 | 44 | 25 | 40 | 54 | 50 | 50 | 40 |
| MAPS S1 Frontier (kg) | 283 | 254 | 287 | 263 | 250 | 317 | 282 | 290 | 256 |
| (USD)* | 123 | 110 | 124 | 114 | 108 | 137 | 122 | 125 | 111 |
| GAEZ High-Input Frontier for S1 Class (kg) | 1050 | 1210 | 1043 | 897 | 906 | 1164 | 1080 | 1084 | 918 |
| (USD)* | 454 | 523 | 451 | 388 | 392 | 503 | 467 | 469 | 397 |
| Household Level Means † | | | | | | | | | |
| Total Household Maize Area (Ha) | 0.283 | 0.283 | 0.286 | 0.264 | 0.238 | 0.297 | 0.314 | 0.296 | 0.232 |
| Production Gain if Attaining Productivity Frontiers: | | | | | | | | | |
| MAPS Class-Specific Frontier (kg) | 208 | 394 | 200 | 129 | 177 | 230 | 241 | 228 | 170 |
| (USD)* | 90 | 171 | 87 | 56 | 77 | 99 | 104 | 99 | 73 |
| MAPS S1 Frontier (kg) | 549 | 394 | 571 | 576 | 477 | 582 | 583 | 568 | 470 |
| (USD)* | 237 | 171 | 247 | 249 | 206 | 252 | 252 | 246 | 203 |
| GAEZ High-Input Frontier for S1 Class (kg) | 2035 | 1880 | 2075 | 1962 | 1729 | 2140 | 2233 | 2124 | 1686 |
| (USD)* | 880 | 813 | 897 | 848 | 747 | 925 | 965 | 918 | 729 |
| N | 840 | 106 | 634 | 100 | 280 | 280 | 280 | 667 | 173 |

Notes: † = Total household maize area estimated via multiple imputation founded on self-reported plot area estimates and other observable characteristics; * Monetary values estimated based on a FEWSNET price of UGX 1450 per kilogram of maize grain and converted at an exchange rate of 1 USD : 3354 UGX.

To assess the potential production and income gains under various frontier attainment scenarios, Table 7 presents potential gains in terms of both kilograms and monetary values, for the specific plot as well as household level estimates. Household estimates are derived based on total area cultivated with maize by the household, which itself is imputed via multiple imputation methods to adjust for bias in self-reported area estimates. 18 A key assumption underlying the household level estimates is that the household experiences the same level of productivity on all maize plots. The reported USD values of potential maize production gains are based on the November 2015 Famine Early Warning Systems Network (FEWSNET) Uganda Price Bulletin kilogram unit price, and do not account for any additional expenditures required to attain the increased level of productivity. 19

If suitability-class-specific productivity gaps were closed such that all farmers were operating on their respective MAPS-based productivity frontier, the best-case scenario given soil constraints, households could produce an additional 208 kg, or USD 90, per bi-annual agricultural season, on average. 20 Households classified as S3 under MAPS, assuming the suitability level is the same across all maize plots in the household, only have the potential to earn an additional USD 56 per agricultural season, while those in the highly suitable (S1) category can earn USD 171 more.
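The monetary values in Table 7 follow from the unit price and exchange rate given in the table notes. A minimal sketch of that conversion is below; the constant and function names are illustrative. Some table cells may differ from this calculation by a dollar, presumably because the published kilogram figures are themselves rounded.

```python
UGX_PER_KG = 1450    # FEWSNET maize grain price, per the Table 7 notes
UGX_PER_USD = 3354   # exchange rate, per the Table 7 notes

USD_PER_KG = UGX_PER_KG / UGX_PER_USD  # approximately 0.432 USD per kilogram

def kg_to_usd(kilograms):
    """Value of a maize production gain in USD at the stated price and exchange rate."""
    return kilograms * UGX_PER_KG / UGX_PER_USD

# Household-level mean gains (kg) from the "Total" column of Table 7.
household_gains_kg = {
    "MAPS class-specific frontier": 208,
    "MAPS S1 frontier": 549,
    "GAEZ high-input frontier": 2035,
}

for scenario, kg in household_gains_kg.items():
    print(f"{scenario}: {kg} kg is about USD {kg_to_usd(kg):.0f}")
```

These values gross up output only; as the text notes, they do not net out any additional expenditure needed to reach the frontier.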
Reaching this MAPS-based class-specific frontier has asymmetric benefits for female- and male-headed households, with male-headed households gaining USD 99 on average and female-headed households only USD 73. The benefits of operating at this frontier also differ across the wealth distribution, with those in the poorest tercile having the potential to earn USD 77 per season, USD 28 less than those in the richest tercile. If soil constraints were addressed such that all households were able to operate at the S1 frontier, households on average could increase production by 549 kg, or USD 237, per bi-annual agricultural season. In this optimistic scenario, asymmetric benefits are still observed, particularly by gender of the household head: female-headed households, despite reaching the S1 frontier, still stand to gain less production than male-headed households.

Finally, the MAPS stochastic frontier analysis and the resulting production frontier are anchored in the current management practices and technology set used by farmers. We make an additional simulation to capture the potential economic gains that can be achieved by operating within a hypothetical, high-input use scenario, as depicted in the geospatial data on potential agricultural yields that are disseminated by the Global Agro-Ecological Zone (GAEZ) initiative. The geospatial, modeled GAEZ data on potential crop yields factor in crop growth cycles, climate factors such as rainfall and temperature, soil moisture levels, and geospatially-derived soil properties, among other factors. For each MAPS plot, the GAEZ potential maize yield under the high-input scenario was extracted based on the GPS coordinates of the plot centroid.21 If the GAEZ high-input potential yields (as estimated for S1 plots) were to be attained on the sampled plots, the economic gains would increase dramatically, with households gaining up to USD 880 per season on average. In this scenario, those in the richest tercile would enjoy greater gains over their current production levels than those in the poorest tercile. Female-headed households, on average, if operating at GAEZ high-input potential and irrespective of factors other than land quality and climatic covariates, still suffer from lower gains than male-headed households, with potential gains capped at USD 729 per agricultural season compared to USD 918.

18 While the plots selected for soil sampling were measured with handheld GPS devices, all other plots were only measured via farmer self-reported estimate. Following the evidence and methodology set forth by Kilic et al. (2017b), we use multiple imputation to impute GPS area measurements based on a number of observable characteristics including the farmer reported area. Multiple imputation was conducted by linear regression model including the covariates found in Annex Table A6, with 50 imputations conducted. Note that the farmer reported area did not significantly differ across plots that did and did not have a GPS measurement. The mean GPS-based area of the plots which have an original GPS measure is 0.146 hectares (n=840). The mean of the imputed-GPS measures is 0.148 hectares (n=2437, which includes all cultivated plots regardless of crop).

19 The FEWSNET unit price for maize grain is approximately 1450 UGX (https://goo.gl/JFs8d7), or 0.432 USD (using an exchange rate of 1 USD: 3354 UGX, a historical rate from November 30, 2015 from https://goo.gl/HHkfVu).

20 AFSIS-based production comparisons are excluded from this particular analysis due to the low incidence and inconsistent suitability classifications found in the S1 and S3 categories.

6. Conclusions

In this paper we estimate a multi-dimensional measure of maize-specific soil suitability based on existing standards, across a sample of approximately 900 households, spanning 4 districts in Eastern Uganda, the leading maize-producing region in the country.
This is made possible by collecting and laboratory-testing plot-level soil samples following international best practices in the context of a methodological household survey experiment. In addition to the plot-level soil data, analysis of maize-specific soil suitability is replicated using publicly available geospatial soil data. This research provides a greater understanding of both the heterogeneous productivity constraints and the potential maize-based production and income gains, across crop-specific soil suitability profiles.

Classifying the sampled plots into three suitability classes, namely highly-suitable, moderately-suitable, and marginally-suitable, and leveraging plot-level crop-cutting-based maize yields allows for comparison of the distributions of observed maize yields by suitability class. We then extend this analysis by estimating stochastic frontier models of maize yield separately for each suitability class, using both the MAPS-based and AFSIS-based suitability classifications, to understand differences in (i) returns to factors of production, (ii) technical efficiency, and (iii) potential yield measures. Compared to observed yields, the potential yield estimation provides a unique overview of maximum yield gains that can be achieved in each suitability class by increasing the efficiency with which the current set of inputs into agricultural production are utilized. Pairing the household survey data with potential yield estimates from the FAO’s GAEZ database allows for an estimate of production gains if the technology set, or intensity of input use, is dramatically improved.

21 Note that the GAEZ data is spatially derived, and therefore does not possess the same level of granularity as the MAPS-based data. For example, the GAEZ potential yields under the high input regime are not statistically different across any of the MAPS-based soil suitability classifications.
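The maximum yield gains described above are summarized in Table 6 as the gap between potential and mean observed yield, in levels and as a share of the mean yield. A minimal sketch of that arithmetic, using the published class means (the dictionary and function names are illustrative, not from the paper):

```python
# (potential yield, mean observed yield) in kg/ha, copied from Table 6.
TABLE_6_YIELDS = {
    "MAPS overall": (1804, 1068),
    "MAPS S1": (3009, 1614),
    "MAPS S2": (1714, 1015),
    "MAPS S3": (1315, 828),
    "AFSIS overall": (1811, 1068),
    "AFSIS S2": (1754, 1053),
}


def yield_gap(potential, observed):
    """Return the yield gap in kg/ha and as a percentage of observed yield."""
    gap = potential - observed
    return gap, 100.0 * gap / observed


gaps = {label: yield_gap(*pair) for label, pair in TABLE_6_YIELDS.items()}

for label, (gap_kg_ha, gap_pct) in gaps.items():
    print(f"{label}: {gap_kg_ha} kg/ha ({gap_pct:.0f}% of mean yield)")
```

The percentages match the last data row of Table 6. The gap in levels can differ from the published figure by 1 kg/ha for some classes, since the table reports rounded means.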
The results clearly illustrate the production penalties for cultivating maize on land that is not highly suitable for maize production, particularly when using MAPS plot-specific soil samples. The AFSIS geospatially-derived soil data closely approximated the MAPS-based results on the overall sample but failed to distinguish between soil suitability classes to the same degree as, and in a consistent manner with, the MAPS plot-level soil data. The MAPS-based analysis reveals that farmers cultivating only marginally suitable land are operating with higher technical efficiency and, thus, have less room for improvement than farmers cultivating more agronomically suitable land, given the condition of their soil.

This result has implications for agriculture-based poverty reduction and food security policies. Effectively, by cultivating maize on land that is only marginally suitable rather than highly suitable, farmers limit their production potential by as much as 1,694 kg/ha, or 129 percent. Extrapolating the potential yields to the household level, based on multiply-imputed total maize area per household, suggests that, given the current set of inputs and soil constraints, households only have the potential to increase the value of production by USD 90 per bi-annual season. Assuming equal production in both agricultural seasons, and given the average household size of 6.12 persons, this translates into a gain of USD 0.08 per capita per day on average, not considering additional expenditures that may be required to reach that production frontier. Those cultivating marginally-suitable soils can hope to earn only an additional USD 0.05 per capita per day. If soil constraints were addressed such that all households operated on highly-suitable soils, potential gains would increase to USD 0.21 per capita per day on average.
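The per capita figures follow from the seasonal household gains in Table 7 under the stated assumptions of two equal seasons and 6.12 persons per household. The sketch below reproduces them, along with the 1,694 kg/ha penalty; the 365-day year and all variable names are assumptions made for illustration.

```python
HOUSEHOLD_SIZE = 6.12   # average household size reported in the text
SEASONS_PER_YEAR = 2    # equal production assumed in both seasons
DAYS_PER_YEAR = 365     # assumption: gains are spread over a 365-day year


def per_capita_per_day(usd_per_season):
    """Convert a seasonal household gain (USD) to USD per person per day."""
    return usd_per_season * SEASONS_PER_YEAR / HOUSEHOLD_SIZE / DAYS_PER_YEAR


# Seasonal household gains (USD) taken from Table 7.
scenarios = {
    "class-specific frontier, all households": 90,
    "class-specific frontier, S3 households": 56,
    "S1 frontier, all households": 237,
    "GAEZ high-input frontier, all households": 880,
}
daily = {name: per_capita_per_day(usd) for name, usd in scenarios.items()}

for name, value in daily.items():
    print(f"{name}: USD {value:.2f} per capita per day")

# Penalty for marginally (S3) versus highly (S1) suitable land, from the
# potential yields in Table 6 (kg/ha).
penalty_kg_ha = 3009 - 1315
penalty_pct = 100.0 * penalty_kg_ha / 1315
print(f"S1 vs S3 potential yield: {penalty_kg_ha} kg/ha ({penalty_pct:.0f}%)")
```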
Enhancing the technology set and achieving the GAEZ high-input use potential yield on highly-suitable soil would increase gains to USD 0.79 per capita per day. Although these estimates of potential economic gains from agricultural production are only for maize, maize makes up 66 percent of cultivated land across all households.

The findings hint that realizing agricultural production potential alone, given the current set of inputs and soil constraints, may not be sufficient for significant welfare gains. In order for agriculture to act as a key mechanism for poverty reduction, policies can include (i) significantly boosting the quantity and quality of inputs used by smallholder farmers, and (ii) implementing crop-specific agricultural interventions based on high-resolution soil data with the aim of increasing crop-specific soil suitability. Addressing specific soil deficiencies that render the land sub-optimally-suitable for a given crop, which can be identified with this dataset for example, can result in gains in agricultural productivity and associated income. Future research may include the analysis of interventions aimed at triggering a shift from one soil-suitability class to the next, and whether, considering the costs required for the shift, that would result in net gains for smallholder farmers. Additional work, in partnership with agronomic specialists, should be conducted on the application of an unequal weighting scheme in the suitability classification framework to confirm whether the results are sensitive to the application of equal weights to individual soil properties in this context.

From a methodological perspective, the experience of the MAPS study highlights the analytical value of integrating objective soil measurement into household surveys, while at the same time shedding light on the scalability of the current approach.
The adoption of these methods in large-scale household surveys, including those conducted by national statistical offices, will likely require, or at least benefit from, more scalable tools, such as in situ soil sensors (including handheld devices) that provide real-time measures of soil attributes during the fieldwork and increase the timeliness of data collection while reducing reliance on laboratories and overall costs. However, these tools require validation in the field, especially for use in smallholder farming systems. A related approach to facilitate the cost-effective scale-up of objective soil measurement in household surveys is through reliance on sub-sampling and imputation. In this case, soils can be objectively analyzed for an intelligently-selected sub-sample of agricultural plots and imputation methods can be leveraged to predict soil attributes for the remainder of the sample, with a model that is trained on the sample with objective soil measures, complementary survey data (including subjective assessment of soil characteristics) and publicly available geospatial soil data. The validation of this approach can also be a focus of future methodological research. To the extent that more scalable approaches are developed and integrated into recurrent household surveys in low-income contexts, including longitudinal surveys, the resulting data would not only enhance the scope and accuracy of the research based on these data, but also inform downstream remote sensing applications, including on soil mapping, that would benefit from georeferenced, ground-truth measures of soil attributes.

References

Abd-Elmabod, S.K., Bakr, N., Muñoz-Rojas, M., Pereira, P., Zhang, Z., Cerdà, A., Jordán, A., Mansour, H., De la Rosa, D. and Jones, L. (2019). Assessment of soil suitability for improvement of soil factors and agricultural management. Sustainability, 11(6), p.1588.

Aguilar, A., Carranza, E., Goldstein, M., Kilic, T., & Oseni, G. (2015).
Decomposition of gender differentials in agricultural productivity in Ethiopia. Agricultural Economics, 46(3), 311–334.

Ahamed, T. N., Rao, K. G., & Murthy, J. S. R. (2000). GIS-based fuzzy membership model for crop-land suitability analysis. Agricultural Systems, 63(2), 75–95.

Aigner, D.J.; Lovell, C.A.K.; Schmidt, P. (1977) Formulation and estimation of stochastic frontier production functions. Journal of Econometrics, 6:21–37.

Azzarri, C., Zezza, A., Haile, B., & Cross, E. (2015). Does livestock ownership affect animal source foods consumption and child nutritional status? Evidence from rural Uganda. Journal of Development Studies, 51(8), 1034-1059.

Battese, George, E., and Timothy Coelli. (1988). “Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data.” Journal of Econometrics, vol. 38 (3), pages 387-399.

Berazneva, J., McBride, L., Sheahan, M., and Guerena, D. (2018). Empirical assessment of subjective and objective soil fertility metrics in East Africa: implications for researchers and policymakers. World Development, 105, pp. 367-382.

Carletto, C., Aynekulu, E., Gourlay, S., and Shepherd, K. (2017). Collecting the dirt on soils: advancements in plot-level soil testing and implications for agricultural statistics. World Bank Policy Research Working Paper No. 8057.

Ceballos-Silva, A., & López-Blanco, J. (2003). Evaluating biophysical variables to identify suitable areas for oat in Central Mexico: a multi-criteria and GIS approach. Agriculture, Ecosystems & Environment, 95(1), 371–377.

Davis, B., Di Giuseppe, S., and Zezza, A. (2017). “Are African households (not) leaving agriculture? Patterns of households’ income sources in rural Sub-Saharan Africa.” Food Policy, 67, pp.153-174.

Deininger, Klaus, W., Calogero Carletto, and Sara Savastano. (2007). "Land Market Development and Agricultural Production Efficiency in Albania." European Association of Agricultural Economists 104th Seminar.

Dillon, B., & Barrett, C. B. (2017). Agricultural factor markets in Sub-Saharan Africa: An updated view with formal tests for market failure. Food Policy, 67, 64-77.

Ekbom, A., & Sterner, T. (2008). Production Function Analysis of Soil Properties and Soil Conservation Investments in Tropical Agriculture. Environment for Development Discussion Paper Series.

FAO. (1976). A Framework For Land Evaluation. In Soils Bulletin (p. 72). Rome: FAO.

Gollin, D., & Udry, C. (2021). Heterogeneity, measurement error, and misallocation: Evidence from African agriculture. Journal of Political Economy, 129(1), 1-80.

Gourlay, S., Aynekulu, E., Carletto, C., and Shepherd, K. (2017). “Spectral soil analysis & household surveys: a guidebook for integration.” Living Standards Measurement Study (LSMS) Guidebook, World Bank: Washington, DC.

Gourlay, S., Kilic, T., & Lobell, D. B. (2019). A new spin on an old debate: Errors in farmer-reported production and their implications for inverse scale-Productivity relationship in Uganda. Journal of Development Economics, 141, 102376.

Greene, W. H. (2008). The Econometric Approach to Efficiency Analysis. In H. O. Fried, C. A. K. Lovell, & S. S. Schmidt (Eds.), The Measurement of Productive Efficiency and Productivity Change (pp. 92–250). Oxford University Press.

Hall, G. B., Wang, F., & others. (1992). Comparison of Boolean and fuzzy classification methods in land suitability analysis by using geographical information systems. Environment and Planning A, 24(4), 497–516.

Jain, R., Chand, P., Rao, S. C., & Agarwal, P. (2020). Crop and soil suitability analysis using multi-criteria decision making in drought-prone semi-arid tropics in India. Journal of Soil and Water Conservation, 19(3), 271-283.

Kahsay, A., Haile, M., Gebresamuel, G., & Mohammed, M. (2018). Land suitability analysis for sorghum crop production in northern semi-arid Ethiopia: Application of GIS-based fuzzy AHP approach. Cogent Food & Agriculture, 4(1), 1507184.

Kalogirou, S. (2002). Expert systems and GIS: an application of land suitability evaluation. Computers, Environment and Urban Systems, 26(2–3), 89–112.

Kilic, T., Carletto, C., Miluka, J., & Savastano, S. (2009). Rural nonfarm income and its impact on agriculture: evidence from Albania. Agricultural Economics, 40(2), 139-160.

Kilic, T., Palacios-López, A., & Goldstein, M. (2015). Caught in a Productivity Trap: A Distributional Perspective on Gender Differences in Malawian Agriculture. World Development, 70, 416–463.

Kirk, A., Kilic, T., & Carletto, C. (2018). Composition of household income and child nutrition outcomes evidence from Uganda. World Development, 109, 452-469.

Lobell, D. B., Cassman, K. G., & Field, C. B. (2009). Crop yield gaps: their importance, magnitudes, and causes. Annual Review of Environment and Resources, 34, 179-204.

McCarthy, N., Kilic, T., Brubaker, J., Murray, S., & de la Fuente, A. (2021). Droughts and floods in Malawi: impacts on crop production and the performance of sustainable land management practices under weather extremes. Environment and Development Economics, 26(5-6), 432-449.

Michler, J. D., Baylis, K., Arends-Kuenning, M., & Mazvimavi, K. (2019). Conservation agriculture and climate resilience. Journal of Environmental Economics and Management, 93, 148-169.

Naidu, L. G. K. (2006). Manual, soil-site suitability criteria for major crops. National Bureau of Soil Survey and Land Use Planning, ICAR.

Oseni, G., Corral, P., Goldstein, M., & Winters, P. (2015). Explaining gender differentials in agricultural production in Nigeria. Agricultural Economics, 46(3), 285–310.

Palacios-López, A., & López, R. (2015). The Gender Gap in Agricultural Productivity: The Role of Market Imperfections. Journal of Development Studies, 51(9), 1175–1192.

Sheahan, M., & Barrett, C. B. (2017). Ten striking facts about agricultural input use in Sub-Saharan Africa. Food Policy, 67, 12-25.

Shepherd, K. D., & Walsh, M. G. (2002).
Development of reflectance spectral libraries for characterization of soil properties. Soil Science Society of America Journal, 66(3), 988–998.

Sherlund, S. M., Barrett, C. B., & Adesina, A. A. (2002). Smallholder technical efficiency controlling for environmental production conditions. Journal of Development Economics, 69(1), 85–101.

Slavchevska, V. (2015). Agricultural production and the nutritional status of family members in Tanzania. Journal of Development Studies, 51(8), 1016-1033.

Udry, C. (1996). Gender, Agricultural Production, and the Theory of the Household. Journal of Political Economy, 104(5), 1010–1046.

Uganda Bureau of Statistics (UBOS), 2021. Uganda National Household Survey 2019/2020. Kampala, Uganda; UBOS.

Wineman, A., Mason, N. M., Ochieng, J., & Kirimi, L. (2017). Weather extremes and household welfare in rural Kenya. Food Security, 9(2), 281-300.

7. Annex Tables

Table A1. Maize Suitability Constraints, by MAPS-Based Suitability Classification
Row-wise summary; binary variables where 1 indicates soil property falls within given class.

| S1 Plots (N = 106) | S1 | S2 | S3 | N |
|---|---|---|---|---|
| pH | 0.44 | 0.55 | 0.01 | 0.00 |
| CEC | 0.62 | 0.33 | 0.05 | 0.00 |
| Organic Carbon (%) | 0.92 | 0.08 | 0.00 | 0.00 |
| Slope (%) | 0.39 | 0.39 | 0.16 | 0.07 |

| S2 Plots (N = 634) | S1 | S2 | S3 | N |
|---|---|---|---|---|
| pH | 0.35 | 0.64 | 0.01 | 0.00 |
| CEC | 0.09 | 0.04 | 0.41 | 0.46 |
| Organic Carbon (%) | 0.20 | 0.43 | 0.35 | 0.02 |
| Slope (%) | 0.24 | 0.32 | 0.31 | 0.12 |

| S3 Plots (N = 100) | S1 | S2 | S3 | N |
|---|---|---|---|---|
| pH | 0.27 | 0.73 | 0.00 | 0.00 |
| CEC | 0.00 | 0.03 | 0.30 | 0.67 |
| Organic Carbon (%) | 0.02 | 0.30 | 0.62 | 0.06 |
| Slope (%) | 0.00 | 0.00 | 0.10 | 0.90 |

Table A2.
OLS Regression of Production Function Portion of Stochastic Frontier Model
Dependent Variable = Log of Total Maize Grain Production (KG)

| | Overall MAPS | Overall AFSIS | S1 MAPS | S2 MAPS | S2 AFSIS | S3 MAPS |
|---|---|---|---|---|---|---|
| Highest Membership Grade Distance from S1 | -0.082*** | -0.050*** | | | | |
| Plot Area (Hectares, Logged) | 0.912*** | 0.912*** | 1.130*** | 0.859*** | 0.906*** | 1.117*** |
| Household Labor Days (Logged) | 0.066 | 0.053 | 0.007 | 0.087 | 0.055 | -0.041 |
| Hired Labor Days (Logged) | -0.008 | -0.005 | -0.021 | -0.002 | -0.013 | 0.002 |
| Inorganic Fertilizer (KG, Logged) | 0.029 | 0.019 | 0.027 | -0.010 | 0.000 | 0.036 |
| KG of Seed Planted (Logged) | 0.162*** | 0.164*** | -0.153 | 0.255*** | 0.206*** | -0.260* |
| Intercropping Rate (Logged) | 0.302** | 0.303** | 0.803 | 0.216 | 0.312** | 0.115 |
| Pure Stand Maize° | 0.298*** | 0.317*** | 0.294 | 0.381*** | 0.336*** | 0.104 |
| Flowering Season Rainfall (mm, logged) | 0.529 | 0.887** | † | 0.611 | 0.664* | -2.008** |
| Constant | 2.219 | 0.003 | 3.950* | 1.126 | 0.773 | 17.730*** |
| N | 840 | 840 | 106 | 634 | 736 | 100 |
| R² | 0.529 | 0.513 | 0.674 | 0.494 | 0.507 | 0.523 |

Notes: Standard Errors clustered at EA level; † = insufficient variation for inclusion of variable or high correlation with other covariates; ° = Binary variable; *** p<0.01, ** p<0.05, * p<0.1

Table A3.
Stochastic Frontier Analysis on AFSIS-Based Suitability Classes
Dependent Variable = Log of Total Maize Grain Production (KG)

| | Overall | S1 | S2 | S3 |
|---|---|---|---|---|
| AFSIS-Based Highest Membership Grade Distance from S1 | -0.044** | | | |
| Plot Area (Hectares, Logged) | 0.961*** | 0.936*** | 0.995*** | 0.982*** |
| Household Labor Days (Logged) | 0.079* | 0.035 | -0.045 | -0.233 |
| Hired Labor Days (Logged) | 0.015 | 0.090*** | 0.001 | -0.028 |
| Inorganic Fertilizer (KG, Logged) | -0.001 | 0.024*** | -0.014 | 0.007 |
| KG of Seed Planted (Logged) | 0.106** | -0.09 | 0.114*** | 0.066 |
| Intercropping Rate (Logged) | 0.195* | 0.151*** | 0.167** | -0.119 |
| Pure Stand Maize° | 0.314*** | 0.212* | 0.283*** | 0.102 |
| Flowering Season Rainfall (mm, logged) | 0.741* | † | 0.504* | -0.527 |
| Constant | 2.614 | 7.080*** | 3.825** | 11.735 |
| Technical Inefficiency | | | | |
| Manager Completed Primary Education° | -0.727 | -1.182** | 0.01 | -1.554** |
| Manager Age | 0.057 | -0.458** | 0.003 | 0.108 |
| Manager Age (squared) | 0.000 | 0.004* | 0.000 | -0.001 |
| Manager Received Agricultural Extension Services° | -0.518 | -0.779 | 0.066 | † |
| Dependency Ratio | -0.378 | -0.980*** | -0.029 | -0.442 |
| Count of Agricultural Assets | 0.080 | 0.088 | -0.02 | -0.478 |
| CV of Flowering Season Rainfall (1999-2014) | 26.466** | -14.882*** | 2.006** | † |
| Constant | -10.537* | 16.759*** | 5.850*** | 0.038 |
| Random Error Term (v) | | | | |
| Constant | -0.272 | -33.589*** | -1.718*** | -3.352* |
| N | 840 | 57 | 736 | 47 |

Notes: Standard Errors clustered at EA level; † = insufficient variation for inclusion of variable or high correlation with other covariates; ° = Binary variable; *** p<0.01, ** p<0.05, * p<0.1

Table A4. Productivity Analysis with Maize Suitability Indicators
OLS, dependent variable = log of maize yields (kg/ha)

| Dep. Variable: log of maize yields (kg/ha) | MAPS | MAPS | MAPS | MAPS | AFSIS | AFSIS | AFSIS | AFSIS |
|---|---|---|---|---|---|---|---|---|
| Maize Suitability | | | | | | | | |
| Highest category (S1 comparator): S2 | -0.542*** | -0.472*** | | | -0.332 | -0.283 | | |
| Highest category (S1 comparator): S3 | -0.687*** | -0.693*** | | | -0.581* | -0.623** | | |
| Membership grade of S1 | | | 1.393*** | | | | 1.270*** | |
| Distance measure (to S1) | | | | -0.091*** | | | | -0.055*** |
| Plot Characteristics | | | | | | | | |
| Log plot area (GPS, ha) | | -0.047 | -0.041 | -0.052 | | -0.048 | -0.041 | -0.057 |
| Log distance plot - dwelling (GPS, km) | | -0.015 | -0.022 | 0.001 | | -0.021 | -0.028 | -0.019 |
| Pure stand° | | 0.311*** | 0.318*** | 0.321*** | | 0.335*** | 0.338*** | 0.333*** |
| Log intercropping seed rate | | 0.362** | 0.349** | 0.323* | | 0.338* | 0.334* | 0.328* |
| Cover crops present before planting° | | 0.112 | 0.121 | 0.147 | | 0.126 | 0.114 | 0.139 |
| Log maize seed planted (kg) | | 0.140* | 0.13 | 0.137* | | 0.154* | 0.150* | 0.152* |
| Used inorganic fertilizer° | | -0.032 | -0.024 | 0.024 | | -0.075 | -0.074 | -0.03 |
| Log household labor days | | 0.090 | 0.097 | 0.098 | | 0.086 | 0.088 | 0.088 |
| Log hired labor days | | -0.005 | -0.004 | -0.005 | | -0.003 | -0.003 | -0.002 |
| Flowering season rainfall (2015, mm) | | 0.43 | 0.377 | 0.570 | | 0.764 | 0.875 | 0.971 |
| CV of flowering season rainfall (1999-2014) | | 1.399 | 1.405 | 1.213 | | 1.842 | 1.087 | 1.591 |
| Edge Effect: % of subplot within 4m of plot edge | | 0.137 | 0.134 | 0.141 | | 0.142 | 0.147 | 0.141 |
| Household Characteristics | | | | | | | | |
| Agricultural asset count | | 0.028* | 0.029** | 0.028** | | 0.029** | 0.030** | 0.029** |
| Dependency ratio | | 0.046 | 0.048 | 0.047 | | 0.053 | 0.051 | 0.051 |
| Log HH Size | | -0.143 | -0.148 | -0.133 | | -0.154 | -0.158 | -0.144 |
| Manager Characteristics | | | | | | | | |
| Manager is respondent° | | -0.002 | 0.006 | 0.01 | | -0.003 | -0.004 | -0.006 |
| Manager received extension services° | | -0.069 | -0.073 | -0.071 | | -0.107 | -0.1 | -0.098 |
| Manager is female° | | 0.057 | 0.063 | 0.045 | | 0.056 | 0.058 | 0.051 |
| Log manager age (years) | | -0.114 | -0.118 | -0.147 | | -0.122 | -0.13 | -0.136 |
| Log manager education (years) | | 0.017 | 0.0240 | 0.018 | | 0.024 | 0.027 | 0.023 |
| District | | | | | | | | |
| Mayuge | | 0.06 | 0.048 | 0.08 | | 0.075 | 0.054 | 0.116 |
| Serere | | -0.209 | -0.217 | -0.25 | | -0.236 | -0.177 | -0.185 |
| Sironko | | 0.01 | -0.032 | 0.16 | | 0.032 | -0.039 | 0.173 |
| Constant | 7.037*** | 2.298 | 1.81 | 1.96 | 6.869*** | 0.301 | -0.703 | -0.734 |
| N | 840 | 840 | 840 | 840 | 840 | 840 | 840 | 840 |
| R² | 0.023 | 0.106 | 0.113 | 0.124 | 0.007 | 0.095 | 0.096 | 0.097 |

Notes: Standard errors clustered on EA; ° = Binary variable; *** p<0.01, ** p<0.05, * p<0.1

Table A5. Technical Efficiency and Productivity Potential for AFSIS-Based Suitability Classes

| | AFSIS Overall | AFSIS S1 | AFSIS S2 | AFSIS S3 |
|---|---|---|---|---|
| Technical Efficiency | 0.53 | 0.47 | 0.53 | 0.35 |
| Potential Output (KG) | 289.04 | 512.54 | 282.66 | 212.85 |
| Potential Yield (KG/Ha) | 1,811 | 3,016 | 1,754 | 2,164 |
| Mean Output (KG) | 176 | 275 | 175 | 68 |
| Mean Yield (KG/Ha) | 1,068 | 1,497 | 1,053 | 789 |
| Potential-Mean Yield Difference | 743 | 1519 | 702 | 1376 |
| Difference as a % of Mean Yield | 70% | 101% | 67% | 174% |
| N | 840 | 57 | 736 | 47 |

Table A6. OLS Regression Guiding Multiple Imputation of Plot Areas
Dependent Variable = GPS-Based Plot Area Measurement (Hectares)

| Variable | Coefficient |
|---|---|
| Self-reported plot area (ha) | 0.331*** |
| Walking time to parcel > 30 mins° | 0.000 |
| Parcel is leased in° | 0.011 |
| Parcel was purchased° | -0.013 |
| Parcel has coffee trees° | -0.013 |
| Self-reported parcel area (ha) | 0.004 |
| Number of plots on farm | -0.003 |
| Household head is female° | -0.016 |
| Household head is married (monogamous or polygamous)° | -0.006 |
| Household head age (years) | 0.000 |
| Household head education (years) | 0.001 |
| Dependency Ratio | 0.004 |
| Household Size | 0.004** |
| Wealth Index | 0.012* |
| District: Mayuge | -0.004 |
| District: Serere | 0.051*** |
| District: Sironko | 0.009 |
| Constant | 0.070*** |
| N | 840 |
| R² | 0.283 |

Notes: Standard Errors clustered at EA level; ° = Binary variable; *** p<0.01, ** p<0.05, * p<0.1
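To illustrate the general idea behind regression-based multiple imputation of plot areas (footnote 18 and Table A6), the sketch below fits a linear model on plots with a GPS measurement and fills in the missing GPS areas 50 times with random residual noise. It is a deliberately simplified illustration on synthetic data: it uses a single covariate, ignores uncertainty in the regression coefficients, and should not be read as the authors' procedure. All names and the simulated values are hypothetical.

```python
import numpy as np


def impute_gps_area(self_reported, gps_area, n_imputations=50, seed=0):
    """Impute missing GPS areas (NaN) from self-reported area, M times.

    Returns the pooled mean area across imputations and an array of shape
    (n_imputations, n_plots) holding the completed data sets.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(gps_area)

    # OLS of GPS area on a constant and self-reported area, observed plots only.
    X = np.column_stack([np.ones_like(self_reported), self_reported])
    beta, *_ = np.linalg.lstsq(X[observed], gps_area[observed], rcond=None)
    resid = gps_area[observed] - X[observed] @ beta
    sigma = resid.std(ddof=X.shape[1])

    completed = np.tile(gps_area, (n_imputations, 1))
    n_missing = int((~observed).sum())
    for m in range(n_imputations):
        # Prediction plus a fresh residual draw for each imputed data set.
        noise = rng.normal(0.0, sigma, size=n_missing)
        completed[m, ~observed] = X[~observed] @ beta + noise

    # Point estimate: average of the per-imputation means.
    pooled_mean = completed.mean(axis=1).mean()
    return pooled_mean, completed


# Synthetic example (values are made up, loosely in the range of Table A6).
rng = np.random.default_rng(42)
n_plots = 500
self_reported = rng.uniform(0.05, 0.6, size=n_plots)
true_gps = 0.07 + 0.33 * self_reported + rng.normal(0.0, 0.03, size=n_plots)
gps_area = true_gps.copy()
gps_area[200:] = np.nan  # pretend only the first 200 plots were GPS-measured

pooled_mean, completed = impute_gps_area(self_reported, gps_area)
print(f"Pooled mean area: {pooled_mean:.3f} ha (true mean {true_gps.mean():.3f} ha)")
```

A fuller implementation would also draw the regression coefficients for each imputation and combine variances across imputations with Rubin's rules.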