Sub-Saharan Africa Reference Manual for Harmonizing Household Surveys
The World Bank
1818 H Street, N.W.
Washington, D.C. 20433, USA
April 2019

Table of Contents
Chapter 1: Overall harmonization guidelines
  1.1 Datalibweb
  1.2 Folder and file structure
  1.3 Guidelines across modules
  1.4 Qcheck
Chapter 2: Module P – Poverty-related Variables
  Table 2.1 Sample, Geography and Basic Household Identifier
  Table 2.2 Consumption Expenditure Variables
Chapter 3: Module H – Household-Level Variables
  Table 3.1 Sample, Geography and Basic Household Identifier
  Table 3.2 Housing and Utilities
  Table 3.3 Access to Social Amenities
  Table 3.4 Ownership of Durable Assets
  Table 3.5 Household Remittances
Chapter 4: Module I – Individual-Level Variables
  Table 3.0 Sample, Geography and Basic Household Identifier
  Table 3.1 Basic Demographic Characteristics
  Table 3.2 Literacy and Education
  Table 3.5 Migration
Chapter 5: Module L – Labor Force Variables
  Table 4.0 Sample, Geography and Basic Household Identifier
  Table 4.1 Household Chores
  Table 4.2 Labor Screening Questions last 7-days
  Table 4.3 Primary Employment last 7-days
  Table 4.4 Secondary Employment last 7-days
  Table 4.5 Employment last 12-months
Annex I: The Methodology of constructing employment variables
Annex II: International Standard Industrial Classification of All Economic Activities (ISIC)
  ISIC Rev. 4.0 categories
  ISIC Rev. 3.1 categories
Annex II: International Standard Classification of Occupations (ISCO)
Annex III: Data checks and evaluation
  (a) Validation and certification
  (b) Sources of error reviews
    (i) Sampling errors
    (ii) Non-sampling errors
Annex IV: ISO 3166-1 Alpha-3 Country Codes (Sub-Saharan Africa)

CHAPTER 1: OVERALL HARMONIZATION GUIDELINES
As Sub-Saharan African economies become more open and globalized, large opportunities are created for individuals and families. Yet a large fraction of households has not benefited sufficiently, and economic and social inequality is real and, in some cases, growing. Household surveys provide rich information on living standards and on the impact of economic changes on individuals and households.
Unfortunately, this source of information is largely underutilized because household surveys are complex and preparing the survey data for analytical work takes significant time. The Sub-Saharan Team for Statistical Development (SSATSD) seeks to remove this bottleneck by extracting about 200 variables from existing household surveys and ensuring that they have the same definitions and variable names. These variables cover household consumption, access to infrastructure (water, electricity, etc.), employment status, education, and health. Inevitably, each survey asks its questions in a different manner, which makes it challenging to define harmonized variables consistently. The harmonized household survey data therefore present the best available variables with harmonized definitions.

This manual presents detailed guidelines for harmonizing household survey data into a set of commonly defined variables that are available in most types of household surveys. To ensure the quality and transparency of the final harmonized data, it is critical to document the harmonization process and to check the final data for quality concerns. This ensures that the results can easily be replicated from the original household survey data and that the final data provide reliable temporal and cross-country comparisons.

Four harmonized modules are prepared for each survey. Each module covers one theme, and its variables have the same names and definitions across surveys. The four harmonized modules are:
1. Module P: Poverty-related variables. This module contains consumption variables, regional identifiers, spatial/temporal price indices, national poverty lines, and indicators of whether households are classified as poor.
2. Module H: Household-level variables (except poverty-related variables). This module contains information on housing amenities, ownership of assets, access to infrastructure and services, and household remittances.
3. Module I: Individual-level variables (except labor force variables). This module contains basic characteristics of individuals such as age, sex, literacy, education, and migration status.
4. Module L: Labor force variables. This module contains information on labor force variables, such as labor force status, industry, sector of employment, wages, etc.

1.1 DATALIBWEB
To ensure the transparency and replicability of the harmonized data, a strict method of organizing folders and files is used. This method keeps track of the different versions of the harmonizations and ensures that users and future members of the harmonization team can run the harmonization .do-files without changing file paths. The directory organization and file naming conventions follow a practice adopted across regions and implemented through datalibweb.

Datalibweb is a data system designed to give users access to the most up-to-date versions of non-harmonized (original/raw) and harmonized datasets of different collections across Global Practices. It can also perform computations relevant for poverty and shared prosperity analysis based on the microdata from different harmonized collections: EAPPOV, ECAPOV, MNAPOV, SARMD, SEDLAC, SSAPOV, and the global collection GPWG.

Datalibweb can be installed in two ways:
1. Directly from Stata: type the following line in Stata and click on the datalibweb hyperlink to install the command on your computer.
• Close all Stata sessions
• Enter this line in Stata: net from http://eca/povdata/datalibweb/_ado
2. Manual installation: alternatively, users can install the package manually.
• Get the file from this link: http://eca/povdata/datalibweb/_ado/datalibweb.zip
• Copy all the files into c:/ado, replacing existing files and without changing the folder structure.

Once datalibweb is installed and access to the data has been granted, all raw data for a survey can be accessed with the following command:
datalibweb, country(CCC) year(YYYY) type(SSARAW) surveyid(SURVEYNAME) clear
where CCC is the ISO 3-letter country code (see Annex IV), YYYY is the survey year according to IHSN standards (the year in which fieldwork started), and SURVEYNAME is the survey acronym. When harmonizing surveys, harmonizers should always load data through datalibweb in this way. This ensures that no local file paths are used to load the data, so that other individuals who have access to the raw data can run the .do-files.

All documents related to a survey, such as questionnaires and technical reports, can be accessed through the following command:
datalibweb, country(CCC) year(YYYY) type(SSARAW) surveyid(SURVEYNAME) request(doc)

Once a harmonization is done, the final harmonized files will be stored in datalibweb and can be accessed through the following command:
datalibweb, country(CCC) year(YYYY) type(SSARAW) surveyid(SURVEYNAME) mod(MODULENAME)
where MODULENAME takes the value P, H, I or L.

1.2 FOLDER AND FILE STRUCTURE
The back end of datalibweb uses a very specific folder structure and file naming convention. Although we do not work in these folders directly, it is useful for each harmonizer to replicate the folder structure locally. Therefore, before harmonizing a survey using this manual, the harmonizer should first create the sub-directories described below. Additionally, all harmonization files must be named according to the conventions in this manual. This rigorous procedure ensures seamless integration with datalibweb and keeps track of the different versions of the harmonizations.
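The folder and file conventions in this section can be prepared with a few lines of Stata. The sketch below is illustrative only and is not part of the official workflow. It uses the 2015 HICES survey of Ethiopia as the example, assumes version v01 for both the raw and the harmonized data, and creates the folders under a hypothetical relative root with a placeholder harmonizer name (harmonizer_name); on the team server the root would instead be the harmonizer's own folder under \\WBGMSAFR1001\AFR_Database\SSAPOV-Harmonization.

```stata
* Illustrative sketch: set up the survey folder and build module file names.
* Assumptions: the relative root and "harmonizer_name" are hypothetical
* placeholders; on the server the root would be the harmonizer's folder
* under \\WBGMSAFR1001\AFR_Database\SSAPOV-Harmonization
local root   "SSAPOV-Harmonization/harmonizer_name"
local ccc    "ETH"     // ISO3 country code
local yyyy   "2015"    // IHSN survey year
local survey "HICES"   // survey acronym
local vraw   "v01"     // version of the raw data (assumed)
local vharm  "v01"     // version of the harmonized data (assumed)

* Survey folder CCC/CCC_YYYY_SURVEYNAME, saved as a global for the .do-files
global surveydir "`root'/`ccc'/`ccc'_`yyyy'_`survey'"

* mkdir creates one level at a time, so parent folders come first;
* capture ignores the error raised when a folder already exists
foreach d in "SSAPOV-Harmonization" "`root'" "`root'/`ccc'" "$surveydir" ///
             "$surveydir/01.Programs" "$surveydir/02.Output" {
    capture mkdir "`d'"
}

* One .do-file and one .dta-file per module, named by the convention
* CCC_YYYY_SURVEYNAME_v0x_M_v0y_A_SSAPOV_MODULENAME
foreach m in P H I L {
    local stem "`ccc'_`yyyy'_`survey'_`vraw'_M_`vharm'_A_SSAPOV_`m'"
    display "$surveydir/01.Programs/`stem'.do"
    display "$surveydir/02.Output/`stem'.dta"
}
```

Because the path is held in a single global, only that one line needs to change if the location of the folder changes; the survey-specific locals at the top are the only other lines that differ between surveys.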
Each harmonizer is assigned a folder with his or her name on the server \\WBGMSAFR1001\AFR_Database\SSAPOV-Harmonization. This should be the parent directory in which all harmonizations are saved and from which all work is conducted. It should contain subfolders named with the ISO3 codes of the countries on which the harmonizer is working. Within each country folder, there should be a folder named CCC_YYYY_SURVEYNAME for each survey the harmonizer has worked on. For example, if a person is harmonizing the 2015 HICES survey of Ethiopia, all material related to this survey should be saved in this path: \\WBGMSAFR1001\AFR_Database\SSAPOV-Harmonization\[Name of harmonizer]\ETH\ETH_2015_HICES\. This folder should also be saved as a global macro at the beginning of each .do-file.

Each survey-specific folder should have two subfolders with the following content:
• “01.Programs”: This folder should contain the four .do-files used to construct the modules. Each .do-file should correspond to one module, and there should be no .do-files other than the four used to generate the modules. If some preliminary data cleaning is needed, it should be included in these four .do-files. The .do-files should not call each other or any other .do-files.
• “02.Output”: This folder should contain the four .dta-files with the harmonized modules.

All .do-files that perform the harmonization, and each .dta-file that contains a harmonized module, should be named according to the following convention:
CCC_YYYY_SURVEYNAME_v0x_M_v0y_A_SSAPOV_MODULENAME.do
CCC_YYYY_SURVEYNAME_v0x_M_v0y_A_SSAPOV_MODULENAME.dta
Here “v0x” is the version of the raw data. This will almost always be v01, but if errors were found in the original data and a new version of the data is received from the National Statistical Office, it will be called v02, etc. “v0y” is the version of the harmonized data.
This will often be v01, but if an error is found in the harmonized data and the .do-file needs to be updated, the new .do-file and .dta-file will carry v02, etc. All of this ensures that any team member can run the .do-file without changing any code and that, if the path becomes outdated, only one line of code needs to be changed.

1.3 GUIDELINES ACROSS MODULES
A number of harmonization guidelines apply across the four modules:
• In all .do-files, to the extent possible, the variables should be created and coded in the order in which they appear in this manual.
• Frequently, surveys do not have information on all the variables that we seek to harmonize. In this case, the variables should still be created as missing, so that all variables appear in all modules.
• In the P and H modules, the household identifier hid must uniquely identify observations; that is, isid hid should not return an error. Likewise, in the I and L modules, hid and pid must jointly identify observations uniquely; that is, isid hid pid should not return an error. An implication is that hid (and pid in the I and L modules) should have no missing values.
• The exact same households should appear in all four modules. For users, it is confusing and can be frustrating when households do not merge across modules. If some households do not exist in the P module but do exist in the H, I or L module, they should be removed from the H, I and L modules. If some households exist in the P module but not in the H, I or L module, they should be created (with missing values) in the H, I and L modules. In general, the households that appear should be those used for national poverty estimation. Although a few country-specific cases may not be able to follow this rule, it should apply in general.
• Any critical assumption made in the course of the harmonization should be stated clearly in the .do-file.
• For each module, a labelling .do-file exists. This labelling .do-file should be inserted at the end of each module .do-file. This ensures that all variables have exactly the same labels and formats across surveys. The labelling .do-file also creates a few variables that are functions of other harmonized variables.

1.4 QCHECK
Once a module is harmonized, a quality check is performed on the harmonized data using a program called qcheck. Qcheck tests whether all variables are in the dataset, whether all variables have the correct format, whether the variables take plausible values, and whether some of the variables are mutually inconsistent. For example, qcheck tests whether the age variable in the I module takes any negative values or values above 120, both of which indicate an error. It also flags individuals who are coded as having no education in one education variable but as having completed secondary education in another.

CHAPTER 2: MODULE P – POVERTY-RELATED VARIABLES
The most common measures of living standards are consumption and income. Income refers to actual earnings from productive activities and transfers, while consumption refers to resources consumed. While income may be used as an indicator of welfare, it is not ideal in countries where the majority of the population works in the informal sector, for example in small businesses or on the land, because net income is very difficult to measure in these cases. In addition, the income of the self-employed may be zero or negative for a given period, even though these individuals could have wealth to draw upon. In these cases, income is a poor proxy for welfare. Consumption is therefore thought to provide a better picture of a household’s standard of living than current income.
For these reasons, the vast majority of countries in Sub-Saharan Africa use consumption to measure poverty. The P module contains a list of variables related to consumption, such as its breakdown into food and non-food consumption, consumption per capita and per adult equivalent, and indicators of whether a household’s consumption falls short of the poverty line.

Household surveys have several limitations in measuring household consumption:
• A household survey is an instrument that relies mostly on self-reported data and on household members’ memory. This makes the estimates heavily dependent on the length of the recall period.
• Although consumption is the best indicator of household welfare, it is often impossible to distinguish between consumption and expenditure: a bulk purchase could cause household welfare to be overestimated, and what was bought may not be consumed by the household in its entirety.
• The duration of the recall period may lead to under- or over-estimation of the reported data, and consumption expenditure surveys should be designed to anticipate this problem.
• A perennial issue relating to national income in any country has been the difference between System of National Accounts (SNA) statistics and national sample survey estimates of consumption expenditure. SNA private household consumption expenditure is available as a single macro estimate for the nation as a whole, while national sample survey consumption estimates are available separately for sub-groups such as provinces and rural and urban areas, and can be aggregated to derive a national estimate. The estimates of private consumption from these two sources differ, primarily because they are derived from different concepts and estimation approaches.

Consumption aggregates are not comparable across households if prices differ across time and space.
For this reason, a lot of effort goes into adjusting the consumption aggregates temporally and spatially. The P module contains several variables documenting whether spatial and/or temporal deflation was used for a particular survey, both for national poverty estimation and for international poverty estimation.

TABLE 2.1 SAMPLE, GEOGRAPHY AND BASIC HOUSEHOLD IDENTIFIER
No | Variable name | Label and codes | Format, description and comments
1 | harmonization | Type of harmonization | String variable. Should equal SSAPOV. This variable is automatically generated in the labelling file: gen harmonization = "SSAPOV"
2 | country | Country code | String variable, 3 characters long (Annex IV).
3 | survey | Type of survey | String variable. Specifies the type of survey. Possible names are HBS, LSMS, IS, CWIQ, etc. Upper-case letters should be used.
4 | survey_coverage | Survey coverage: 1 = National; 2 = Urban; 3 = Rural; 4 = Other | Numeric variable.
5 | usemicrodata | Use of microdata: 0 = Grouped; 1 = Micro | Numeric variable.
6 | year_IHSN | 4-digit year of survey based on IHSN standards | Numeric variable. This is the start year of the survey based on the IHSN standards. It should be identical to the year used for file-naming purposes.
7 | region1 | Subnational ID – highest level | String variable. This variable should contain the first-level administrative divisions of a country, as numeric entries in string format following the naming convention "1 - Hatay". The code below shows how to turn a numeric variable with value labels (here inputvar) into the required format:
    gen region1 = ""
    qui levelsof inputvar, local(lev)
    foreach cc of local lev {
        cap local la_`cc' : label (inputvar) `cc'
        if !_rc {
            qui replace region1 = "`cc' - `la_`cc''" if inputvar == `cc'
        }
    }
This link may be helpful in identifying the right variables.
8 | region2 | Subnational ID – second highest level | String variable. This variable should contain the second-level administrative divisions of a country, as numeric entries in string format following the naming convention "1 - Hatay". The code below shows how to turn a numeric variable with value labels (here inputvar) into the required format:
    gen region2 = ""
    qui levelsof inputvar, local(lev)
    foreach cc of local lev {
        cap local la_`cc' : label (inputvar) `cc'
        if !_rc {
            qui replace region2 = "`cc' - `la_`cc''" if inputvar == `cc'
        }
    }
This link may be helpful in identifying the right variables.
9 | region3 | Subnational ID – third highest level | String variable. This variable should contain the third-level administrative divisions of a country, as numeric entries in string format following the naming convention "1 - Hatay". The code below shows how to turn a numeric variable with value labels (here inputvar) into the required format:
    gen region3 = ""
    qui levelsof inputvar, local(lev)
    foreach cc of local lev {
        cap local la_`cc' : label (inputvar) `cc'
        if !_rc {
            qui replace region3 = "`cc' - `la_`cc''" if inputvar == `cc'
        }
    }
This link may be helpful in identifying the right variables.
10 | lev_agg | Level at which data are disaggregated for analysis | Numeric variable. Lowest level at which the data are considered representative within the country. It usually corresponds to the lowest level of poverty reporting areas. It will often (but not always) be identical to one of the region variables, but in numeric format with value labels rather than in string format.
11 | strata | Strata | Numeric variable.
12 | rururb | Area of residence: 0 = Rural; 1 = Urban | Numeric variable. Each country defines urban and rural areas according to its own criteria. In transition economies where ‘semi-urban’ is a recognized category that includes ‘villages of the town type’, this category should be collapsed into the ‘urban’ category unless the country defines these as rural towns.
13 | capital | Capital/city, other urban and rural classification | Numeric variable. This is a country-specific variable that may indicate the capital city or an urban/rural classification different from the one in rururb. Each numeric code should have a label.
14 | cluster | Primary sampling unit (enumeration area) | Numeric variable. Primary sampling unit based on country requirements.
15 | hhno | Household number | Numeric variable. Household number.
16 | hid | Household unique identification | String variable. This variable should uniquely identify observations and cannot be missing, i.e. isid hid should return no error.
17 | int_month | Month of interview visit | Numeric variable. The month when the survey questionnaire was administered to the household.
18 | int_year | Year of interview visit | Numeric variable. The year when the survey questionnaire was administered to the household.
19 | hhsize | Household size | Numeric variable. Total number of residents (regular members). The definition of a regular member is country-specific.
20 | ctry_adq | Adult equivalent scale | Numeric variable. Total number of adult equivalents in the household. The definition varies from country to country, as different adult equivalence scales exist worldwide. It must be greater than 0 and less than or equal to hhsize (household size). It can be provided by the NSO.
21 | wta_hh | Household weights | Numeric variable. The weight to be used in all computations referring to household-level estimates. This variable cannot be used for poverty estimation. Interpretation: XX% of households have a certain characteristic.
22 | wta_pop | Population weights | Numeric variable. This variable should be used for poverty estimation. Interpretation: XX% of individuals have a certain characteristic. This variable is automatically generated in the labelling file: gen wta_pop = wta_hh*hhsize
23 | wta_cadq | Adult equivalent weights | Numeric variable. In a number of countries, this weight is used to derive the proportion of the population that is poor. Interpretation: XX% of the adult equivalent population has a certain characteristic. This variable is automatically generated in the labelling file: gen wta_cadq = wta_hh*ctry_adq

TABLE 2.2 CONSUMPTION EXPENDITURE VARIABLES
No | Variable name | Label and codes | Format, description and comments
1 | welfaretype | Type of welfare measure (income, consumption, expenditure): "CONS" = consumption; "INC" = income; "EXP" = expenditure | String variable. Specifies the type of welfare aggregate used for poverty estimation in a country.
2 | fdtexp | Purchased and auto-consumption food expenditure, nominal (annual) | Numeric variable. Country-derived by the NSO.
3 | nfdtexp | Purchased and auto-consumption non-food expenditure, nominal (annual) | Numeric variable. Country-derived by the NSO.
4 | hhtexp | Household food and non-food consumption expenditure, nominal (annual) | Numeric variable. Country-derived by the NSO. This variable is automatically generated in the labelling file: gen hhtexp = fdtexp+nfdtexp. If the raw data do not separate food and non-food consumption, create this variable directly instead of letting it be created in the labelling file.
5 | pc_fd | Per capita food consumption expenditure, nominal (annual) | Numeric variable. Country-derived by the NSO. This variable is automatically generated in the labelling file: gen pc_fd = fdtexp/hhsize
6 | pc_hh | Per capita food and non-food consumption, nominal (annual) | Numeric variable. Country-derived by the NSO. This variable is automatically generated in the labelling file: gen pc_hh = hhtexp/hhsize
7 | padq_fd | Per adult equivalent food consumption expenditure, nominal (annual) | Numeric variable. Country-derived by the NSO. This variable is automatically generated in the labelling file: gen padq_fd = fdtexp/ctry_adq
8 | padq_hh | Per adult equivalent food and non-food consumption, nominal (annual) | Numeric variable. Country-derived by the NSO. This variable is automatically generated in the labelling file: gen padq_hh = hhtexp/ctry_adq
9 | fdspindex | Food spatial price index | Numeric variable. Country-derived by the NSO.
10 | nfdspindex | Non-food spatial price index | Numeric variable. Country-derived by the NSO.
11 | spindex | Spatial price index | Numeric variable. Country-derived by the NSO.
12 | fdtpindex | Food temporal price index | Numeric variable. Country-derived by the NSO.
13 | nfdtpindex | Non-food temporal price index | Numeric variable. Country-derived by the NSO.
14 | tpindex | Temporal price index | Numeric variable. Country-derived by the NSO.
15 | fdpindex | Spatial/temporal food price index | Numeric variable. Country-derived by the NSO. This variable should never be missing. If no separate food spatial/temporal price index is used, set it equal to pindex.
16 | nfdpindex | Spatial/temporal non-food price index | Numeric variable. Country-derived by the NSO. This variable should never be missing. If no separate non-food spatial/temporal price index is used, set it equal to pindex.
17 | pindex | Final spatial/temporal price index | Numeric variable. Country-derived by the NSO. This is the index used to derive wel_PPP and wel_abs. It should never be missing. If no spatial/temporal deflation is used, set it equal to 1 for all observations.
18 | fdtexpdr | Purchased and auto-consumption food expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen fdtexpdr = fdtexp/fdpindex
19 | nfdtexpdr | Purchased and auto-consumption non-food expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen nfdtexpdr = nfdtexp/nfdpindex
20 | hhtexpdr | Household food and non-food consumption expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen hhtexpdr = hhtexp/pindex
21 | pc_fddr | Per capita food consumption expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen pc_fddr = fdtexpdr/hhsize
22 | pc_hhdr | Per capita food and non-food consumption expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen pc_hhdr = hhtexpdr/hhsize
23 | padq_fddr | Per adult equivalent food consumption expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen padq_fddr = fdtexpdr/ctry_adq
24 | padq_hhdr | Per adult equivalent food and non-food consumption expenditure, deflated (annual) | Numeric variable. This variable is automatically generated in the labelling file: gen padq_hhdr = hhtexpdr/ctry_adq
25 | wel_abs_deflation | Spatial/temporal deflation used for national poverty estimation: 0 = Neither spatially nor temporally deflated; 1 = Spatially deflated; 2 = Temporally deflated; 3 = Both spatially and temporally deflated | Numeric variable.
26 | wel_abs_pcpadq | Per adult equivalent or per capita adjustment used for national poverty estimation: 0 = Per capita; 1 = Per adult equivalent | Numeric variable.
27 | wel_abs | Welfare aggregate used for national poverty estimation (annual) | Numeric variable. This is the welfare aggregate used by the country to estimate its national poverty. It can be nominal or spatially/temporally deflated, and it should equal one of these four variables: pc_hh, padq_hh, pc_hhdr, padq_hhdr. This variable is automatically generated in the labelling file. Because wel_abs_deflation and wel_abs_pcpadq take a single value per survey, the if blocks below (which Stata evaluates using the first observation) select one aggregate for all households:
    gen wel_abs = .
    if wel_abs_deflation==0 & wel_abs_pcpadq==0 {
        replace wel_abs = pc_hh
    }
    if wel_abs_deflation==0 & wel_abs_pcpadq==1 {
        replace wel_abs = padq_hh
    }
    if inlist(wel_abs_deflation,1,2,3) & wel_abs_pcpadq==0 {
        replace wel_abs = pc_hhdr
    }
    if inlist(wel_abs_deflation,1,2,3) & wel_abs_pcpadq==1 {
        replace wel_abs = padq_hhdr
    }
28 | wel_fd | Food part of the welfare aggregate used for national poverty estimation (annual) | Numeric variable. This is the food part of the welfare aggregate used by the country to estimate its national poverty. It can be nominal or spatially/temporally deflated, and it should equal one of these four variables: pc_fd, padq_fd, pc_fddr, padq_fddr. This variable is automatically generated in the labelling file:
    gen wel_fd = .
    if wel_abs_deflation==0 & wel_abs_pcpadq==0 {
        replace wel_fd = pc_fd
    }
    if wel_abs_deflation==0 & wel_abs_pcpadq==1 {
        replace wel_fd = padq_fd
    }
    if inlist(wel_abs_deflation,1,2,3) & wel_abs_pcpadq==0 {
        replace wel_fd = pc_fddr
    }
    if inlist(wel_abs_deflation,1,2,3) & wel_abs_pcpadq==1 {
        replace wel_fd = padq_fddr
    }
29 | pl_abs | National absolute poverty line (annual) | Numeric variable. Country-derived by the NSO.
30 | pl_fd | National food poverty line (annual) | Numeric variable. Country-derived by the NSO.
31 | pl_ext | National hardcore poverty line (annual) | Numeric variable. Country-derived by the NSO. This line may be identical to the food poverty line or may be different.
32 | poor_abs | Absolute poor based on pl_abs: 1 = Poor | Numeric variable. This variable is automatically generated in the labelling file: gen poor_abs = wel_abs […]
[…] = 1. Includes all rooms used for living, sleeping and eating. Excludes storerooms, bathrooms and kitchens.
3 roofcs Main material used for String variable roof (country specific) This refers to the variable on roof material (if any), as it comes in the survey. If more than one material is used for structure, the dominant material is the information required. The format should be code and value label. For example, “1 - Stone”; “2 - Mud”; etc. 4 roof Main material used for Numeric variable roof This variable must be coded from roofcs. 1 = Thatch - Earth includes adobe, mud. Includes all building (bamboo/grass) technique that relies on earth or mud put over a frame 2 = Earth (adobe, mud, or mixed with other materials for strength. clay) - Thatch includes grass or any form of natural 3 = Wood vegetation for roofing. 4 = Iron/Metal sheets - Iron sheets are processed or galvanized iron or steel 5= sheets. Does not include tins. Concrete/cement/stone - Cement includes concrete and stone blocks. 6 = Tiles/bricks - Tiles/bricks are a thin, flat or slab of hard material and 9 = Other include baked/unbaked bricks made of clay or other human-made building blocks. - Other includes tin from cans, cardboard among others. 5 wallcs Main material used for String variable external walls (country This refers to the variable on external wall material (if specific) any), as it comes in the survey. If more than one material is used for structure, the dominant material is the information required. The format should be code and value label. For example, “1 - Stone”; “2 - Mud”; etc 6 wall Main material used for Numeric variable external walls This variable must be coded from wallcs. 1 = Earth (adobe, mud, clay) 2 = Thatch (bamboo/grass) 3 = Bricks 4 = Wood panels 5 = Iron/metal sheets 6= Concrete/cement/stone 9 = Other 7 floorcs Main material used for String variable floor (country specific) This refers to the variable on floor material (if any), as it comes in the survey. If more than one material is used for structure, the dominant material is the information required. 
The format should be code and value label. For example, “1 - Stone”; “2 - Mud”; etc.

8 floor: Main material used for floor
Codes: 1 = Earth (adobe, mud, clay); 2 = Bricks; 3 = Wood planks; 4 = Polished wood/tiles; 5 = Cement; 9 = Other
Numeric variable. This variable must be coded from floorcs.
- Earth includes adobe and mud.
- Bricks are slabs of hard material and include baked/unbaked bricks made of clay or other human-made building blocks.
- Cement includes concrete and stone.

9 watercs_type: Type of water questions used in the survey
Codes: 1 = Drinking water; 2 = General water; 3 = Both; 4 = Other
Numeric variable. This variable records the type of question(s) asked about access to water in the survey. For example, the survey may have had a specific question on the source of drinking water, on the source of water for general use, or both. Subsequent questions on water depend on this response.

10 watercs: Main source of water (country specific)
String variable. This refers to the variable on the main water source (if any), as it comes in the survey. If there is more than one water source, only the main source is required. In some surveys, drinking water is asked about separately from other water uses. In these cases, use the drinking water source to code this variable. If two sources of water are reported (water source during the wet and the dry season), use the water source during the dry season. The reason for using water during the dry season is that the world is experiencing global warming and the climate is changing rapidly. The format should be code and value label. For example, “1 - Pipe”; “2 - Spring”; etc.

11 watercs_d: Main source of water during the dry season (country specific)
String variable. The survey must explicitly ask about the water source during the dry season. Labels must be translated to English. If there is more than one water source, only the main source is required. In some surveys, drinking water is asked about separately from other water uses. Use the drinking water source to code this variable.
For each value label, there should be a space before and after the hyphen. The format should be code and value label. For example, “1 - Pipe”; “2 - Spring”; etc.

12 water14: Main source of drinking water (14 categories)
Codes: 1 = Piped water into dwelling; 2 = Piped water to yard/plot; 3 = Public tap or standpipe; 4 = Tube well or borehole; 5 = Protected dug well; 6 = Protected spring; 7 = Bottled water; 8 = Rainwater; 9 = Unprotected spring; 10 = Unprotected dug well; 11 = Cart with small tank/drum; 12 = Tanker-truck; 13 = Surface water; 14 = Other
Must be coded from WATERCS.

Piped water into dwelling, also called a household connection, is defined as a water service pipe connected with in-house plumbing to one or more taps (e.g. in the kitchen, bathroom, etc.). Privacy is the criterion here.

Piped water to yard/plot, also referred to as a yard connection, is defined as a piped water connection to a tap placed in the yard or plot but outside the house.

Public standpipe refers to water delivered via a pipe to a point that may or may not be within the compound (a water point shared among households). This refers to public standpipes or community water points.

Tube well or borehole is a deep hole that has been drilled with the purpose of reaching groundwater supplies. Boreholes/tube wells are constructed with casing or pipes, which prevent the small-diameter hole from caving in and protect the water source from infiltration by run-off water. Water is delivered from a tube well or borehole through a pump, which may be powered by human, animal, wind, electric, diesel or solar means. Boreholes/tube wells are usually protected by a platform around the well, which leads spilled water away from the borehole and prevents infiltration of run-off water at the well head.

Protected dug well is a dug well that is protected from run-off water by a well lining or casing that is raised above ground level and a platform that diverts spilled water away from the well. A protected dug well is also covered to prevent any infiltration.
Protected spring is typically protected from any run-off infiltration by a “spring box”. The spring box is constructed of brick or concrete and built around the spring, so that water flows directly out of the box into a pipe without being exposed to outside pollution.

Surface water is water located above the ground and includes lakes, rivers, ponds, streams, canals and irrigation canals.

Cart with small tank/drum refers to water sold by a provider into a community. The types of transportation used include donkey carts, motorized vehicles and other means.

Tanker-truck is water trucked into a community and sold from a water truck. The water source is unknown.

Other includes other water sources not mentioned above.

13 water8: Main source of drinking water (8 categories)
Codes: 1 = Piped water (own tap); 2 = Public tap or standpipe; 3 = Protected well; 4 = Unprotected well; 5 = Surface water; 6 = Rainwater; 7 = Tanker-truck, vendor; 8 = Other
Wells include springs and boreholes. To be coded as protected, they must be protected from any possible source of contamination, such as surface water or seepage.

    recode water14 (1=1) (2 3=2) (4 5 6=3) (9 10=4) (13=5) (8=6) (11 12=7) (14=8), gen(water8)
    ta water14 water8

Bottled water (water14=7) is not listed in the recode, so it keeps the value 7 (Tanker-truck, vendor).

14 waterpipe: Household has piped water
Codes: 0 = No; 1 = Yes, in premise; 2 = Yes, but not in premise; 3 = Yes, unstated whether in or outside premise
The main water source is piped water, which can be within the household, the plot or a public standpipe. “Piped” is the condition.

    recode water14 (1=1) (2 3=2) (nonmissing=0), gen(waterpipe)

Use (nonmissing=0) rather than (else=0). The else rule also captures missing values, so it would code a missing water14 as 0. If water14 is missing but you have the information to code waterpipe, do not use the recode above.

15 waterimp: Household has improved water sources
Codes: 1 = Yes; 0 = No
An improved drinking water source, by nature of its construction and design, is likely to protect the source from outside contamination, in particular from fecal matter.
Improved drinking water sources include:
• Piped water into dwelling, plot or yard
• Public tap/standpipe
• Tube well/borehole
• Protected dug well
• Protected spring and
• Rainwater collection
On the other hand, unimproved drinking water sources are:
• Unprotected dug well,
• Unprotected spring,
• Cart with small tank/drum,
• Tanker truck,
• Surface water (river, dam, lake, pond, stream, canal, irrigation channel and any other surface water), and
• Bottled water (if it is not accompanied by another improved source)
Source: (WHO & UNICEF, 2010) http://apps.who.int/gho/indicatorregistry/App_Main/view_indicator.aspx?iid=8

    recode water14 (1/6 8=1) (nonmissing=0), gen(waterimp)

16 adiswat_d: Actual distance to main water point (kms) during the dry season
This refers to the actual one-way distance, in kms, to the water point used by the household during the dry season. If no season is specified, use this variable. By convention: 1 km = 1000 m; 1 km = 5/8 mile. If the water point is within the dwelling, code zero.

17 adiswat_w: Actual distance to main water point (kms) during the wet season
This refers to the actual one-way distance, in kms, to the water point used by the household. By convention: 1 km = 1000 m; 1 km = 5/8 mile. If the water point is within the dwelling, code zero. If no season is specified, code this as missing.

18 atimwat_d: Actual time taken to main water point (mins) during the dry season
This refers to the actual time taken to reach the water point used by the household. If a round trip is provided, divide by 2.

19 atimwat_w: Actual time taken to main water point (mins) during the wet season
This refers to the actual time taken to reach the water point used by the household. If a round trip is provided, divide by 2.

20 toiletcs: Main toilet facility (country specific)
String variable. Labels must be translated to English. Make sure the translation is checked by a language expert. For each value label, there should be a space before and after the hyphen. The format should be code and value label. For example, “1 - Flush”; “2 - VIP”; etc.
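The water-source recodes (water8, waterpipe, waterimp) can be checked on a toy dataset before they are run on survey data. This is a minimal sketch, assuming water14 has already been coded from watercs. It builds one observation per water14 code plus one missing value, runs the three recodes, and cross-tabulates the result.

It uses `(nonmissing=0)` rather than `(else=0)`, because recode's `else` rule also catches missing values. The waterpipe mapping groups yard/plot and public tap (codes 2 and 3) under 2 = Yes, but not in premise, mirroring the water8 grouping. Treat that grouping as an assumption to adjust per country.

```stata
* Toy data: one observation per water14 code (1-14) plus one missing value
clear
set obs 15
gen byte water14 = _n
replace water14 = . in 15

* water8: water14==7 (bottled water) matches no rule, so recode copies it as 7
recode water14 (1=1) (2 3=2) (4 5 6=3) (9 10=4) (13=5) (8=6) (11 12=7) (14=8), gen(water8)

* nonmissing (unlike else) leaves a missing water14 missing
recode water14 (1=1) (2 3=2) (nonmissing=0), gen(waterpipe)
recode water14 (1/6 8=1) (nonmissing=0), gen(waterimp)

tab water14 water8, missing
```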
21 toilet14: Main toilet facility (14 categories)
Codes: 1 = A flush toilet; 2 = A piped sewer system; 3 = A septic tank; 4 = Pit latrine; 5 = Ventilated improved pit latrine (VIP); 6 = Pit latrine with slab; 7 = Composting toilet; 8 = Special case; 9 = A flush/pour flush to elsewhere; 10 = A pit latrine without slab; 11 = Bucket; 12 = Hanging toilet or hanging latrine; 13 = No facilities or bush or field; 14 = Other
Must be coded from TOILETCS. If several types of toilet are used, only the main one is required. This variable helps to identify the efforts needed to prevent common and basic diseases, in particular water-borne diseases.

A flush toilet, also referred to as a Water Closet (WC), is a toilet that disposes of waste matter by using water to flush it through a drainpipe to a main sewer, septic tank or pit latrine. This excludes:
• pour flush, which uses a water seal but, unlike a flush toilet, uses water poured by hand for flushing (no cistern is used);
• flush toilet to “somewhere else”, such as flushed to a river or some other place.
Ventilated Improved Pit latrine (VIP): the primary features of VIP latrines are an enclosed structure (roof and walls) and a large-diameter (110 mm) PVC vertical ventilation pipe. The pipe runs outside the structure from the pit of the latrine to vent above the roof. VIP latrines often have concrete slabs containing the latrine hole.
A composting toilet is a type of dry toilet that uses a predominantly aerobic processing system to treat human excreta, by composting or managed aerobic decomposition. These toilets generally use little to no water and may be used as an alternative to flush toilets.
Pit latrine is a simple pit latrine that is covered or has a slab.
No facility includes open fields and bush.
Other includes bucket, pan, and open/uncovered pit latrines, among others.

22 toilet6: Main toilet facility (6 categories)
Codes: 1 = Flush toilet; 2 = Ventilated Improved Pit (VIP) latrine; 3 = Composting toilet; 4 = Pit latrine with slab; 5 = No facility; 9 = Other
Must be coded from TOILETCS.

    recode toilet14 (1/4=1) (5=2) (7=3) (6=4) (13=5) (nonmissing=9), gen(toilet6)

(nonmissing=9) keeps a missing toilet14 missing; the else rule would also recode missing values to 9.

23 toiletflush: Access to flushed toilet
Codes: 0 = No; 1 = Yes, in premise; 2 = Yes, but not in premise, including public toilet; 3 = Yes, unstated whether in or outside premise
Must be asked explicitly in the survey. Do not guesstimate.

24 toiletshared: Is toilet facility shared with other households?
Codes: 1 = Yes; 0 = No
This question must have been asked in the survey. If the question was not asked, leave as missing.

25 toiletimp: Does household have access to improved sanitation?
Codes: 1 = Yes; 0 = No
This includes TOILET6<=4 and not shared. An improved sanitation facility is one that is likely to hygienically separate human excreta from human contact. Improved sanitation facilities include:
• Flush or pour-flush to piped sewer system, septic tank or pit latrine,
• Ventilated improved pit latrine,
• Pit latrine with slab and
• Composting toilet
Sanitation facilities are not considered improved when shared with other households or open to public use. Unimproved sanitation includes:
• Flush or pour-flush to elsewhere,
• Pit latrine without slab or open pit,
• Bucket, hanging toilet or hanging latrine and
• No facilities or bush or field (open defecation)
If the question on shared toilet facilities is asked, use that variable to recode appropriately.
Source: (WHO & UNICEF, 2010) http://apps.who.int/gho/indicatorregistry/App_Main/view_indicator.aspx?iid=9

26 fuelcookcs: Main cooking fuel (country specific)
String variable. If several fuels are asked about in the survey, only the main source is required. Labels must be translated to English. Make sure the translation is checked by a language expert. For each value label, there should be a space before and after the hyphen. The format should be code and value label. For example, “1 - Electricity”; “2 - Firewood”; etc.
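toiletimp combines the facility type with the sharing question. This is a minimal sketch, assuming toilet6 and toiletshared already exist; the toy rows are hypothetical.

A facility counts as improved when toilet6<=4. Any facility reported as shared is then set to 0. When toiletshared is missing (question not asked), only the facility type is used.

```stata
* Hypothetical toy data: toilet6 (1-5, 9) and toiletshared (1 = shared, 0 = not shared)
clear
input toilet6 toiletshared
1 0
2 1
4 .
5 0
9 0
. 0
end

* Improved type: flush, VIP, composting, pit latrine with slab (toilet6<=4).
* The if qualifier keeps a missing toilet6 missing, since . exceeds any number in Stata.
gen byte toiletimp = (toilet6 <= 4) if !missing(toilet6)

* Shared facilities are never counted as improved
replace toiletimp = 0 if toiletshared == 1
list
```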
27 fuelcook: Main cooking fuel
Codes: 1 = Electricity; 2 = Gas; 3 = Kerosene; 4 = Charcoal; 5 = Firewood; 9 = Other
If several fuels are asked about in the survey, only the main source is required. Firewood includes both purchased and collected firewood. Electricity refers to mains, generator and solar energy provided by the government or a private entity. Other includes fuel derived from coffee waste, sawdust, crop residue and cow dung, among others.

28 fuellighcs: Main lighting fuel (country specific)
String variable. If several fuels are asked about in the survey, only the main source is required. Labels must be translated to English. Make sure the translation is checked by a language expert. For each value label, there should be a space before and after the hyphen. The format should be code and value label. For example, “1 - Electricity”; “2 - Firewood”; etc.

29 fuelligh: Main lighting fuel
Codes: 1 = Electricity; 2 = Gas; 3 = Kerosene; 4 = Candles; 9 = Other
If several fuels are asked about in the survey, only the main source is required. Electricity refers to mains, generator and solar energy provided by the government or a private entity. Other includes fuel derived from coffee waste, sawdust, crop residue and cow dung, among others.

30 elecsource: Main source of electricity
Codes: 1 = Mains; 2 = Solar; 3 = Generator; 4 = Other; 5 = No electricity
Use both FUELCOOK and FUELLIGH; FUELLIGH should be the main one to use. If the electricity source is not specified, code “Other”, but this should be decided on a country-by-country basis.

31 electricity: Connection of electricity in dwelling from mains only
Codes: 1 = Yes; 0 = No
This specifies access to an electricity connection to the mains grid only (ELECSOURCE=1). Note: having an electrical connection says nothing about the actual electrical service received by the household in a given country or area.
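Because electricity is restricted to the mains grid, it can be derived from elecsource. This is a minimal sketch on hypothetical toy values. The if qualifier keeps households with a missing elecsource missing instead of coding them 0.

```stata
* Hypothetical toy data covering mains, solar, generator, no electricity and missing
clear
input elecsource
1
2
3
5
.
end

* 1 only for a mains connection (elecsource==1); other sources and no electricity are 0
gen byte electricity = (elecsource == 1) if !missing(elecsource)
tab electricity elecsource, missing
```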
Check consistency by type:

    ta electricity elecsource

32 kitchen: Separate kitchen in dwelling
Codes: 1 = Yes; 0 = No

33 bath: Bathing facility such as shower or bathtub in the dwelling
Codes: 1 = Yes; 0 = No

34 garbdispcs: Garbage and trash disposal (country specific)
String variable. Labels must be translated to English. Make sure the translation is checked by a language expert. For each value label, there should be a space before and after the hyphen. The format should be code and value label. For example, “1 - Collected”; “2 - Buried”; “3 - Street”; etc.

35 garbdisp: Garbage and trash disposal
Codes: 1 = Collected; 2 = Buried/burned; 3 = Discarded in empty lots, street, rivers; 9 = Other
Refers only to garbage or trash generated by the household.

TABLE 3.3 ACCESS TO SOCIAL AMENITIES
In some surveys this information may not be available for each household but will be present in the community survey. The distances and times are to the nearest services from the household, irrespective of whether the household uses these services. All distances and times refer to single (one-way) journeys. Please note that all continuous (not categorized) distance and time data should be given to 2 decimal places.

1 dispsch: Distance to nearest elementary/primary school (kms)
One way. This refers to the distance to the nearest primary school in kms. By convention, 1 km = 1000 meters and 1 km = 5/8 mile. If a round trip is provided, divide by 2. This is a continuous variable. If the survey question is pre-coded, do not guesstimate it into a continuous variable; leave as missing.

2 timpsch: Time taken to nearest elementary/primary school (minutes)
One way. This refers to the time taken to reach the nearest primary school in mins. By convention, 1 hr = 60 min. If a round trip is provided, divide by 2. This is a continuous variable. If the survey question is pre-coded, do not guesstimate it into a continuous variable; leave as missing.

3 disheal: Distance to nearest health facility (kms)
One way. This refers to the distance to the nearest health facility in kms. By convention, 1 km = 1000 meters and 1 km = 5/8 mile. If a round trip is provided, divide by 2. This is a continuous variable. If the survey question is pre-coded, do not guesstimate it into a continuous variable; leave as missing.

4 timheal: Time taken to nearest health facility (minutes)
One way. This refers to the time taken to reach the nearest health facility in mins. By convention, 1 hr = 60 min. If a round trip is provided, divide by 2. This is a continuous variable. If the survey question is pre-coded, do not guesstimate it into a continuous variable; leave as missing.

TABLE 3.4 OWNERSHIP OF DURABLE ASSETS
Variables 1 to 22 are coded 1 = Yes; 0 = No. Variables 1 to 21 refer to the presence of a functioning asset in the household, regardless of what condition the asset is in.

1 radio: Ownership of radio. Includes a radio, radio cassette and 3-in-1 radio cassette.
2 television: Ownership of television.
3 television_cable: Ownership of television cable.
4 video: Ownership of video.
5 landphone: Ownership of landline (fixed) phone.
6 cellphone: Ownership of at least one cellular phone.
7 phone: Ownership of at least one phone. Used where the survey does not specify landline or cellphone; covers a functioning landline or cellular phone.
8 fridge: Ownership of refrigerator. Does not include a food freezer.
9 sewmach: Ownership of sewing machine.
10 washmach: Ownership of washing machine.
11 fan: Ownership of fan.
12 airconditioner: Ownership of air conditioner.
13 computer: Ownership of computer. Can be desktop or laptop.
14 etablet: Ownership of an electronic tablet.
15 stove: Ownership of stove. Includes a stove or cooker.
16 oxcart: Ownership of animal cart. An animal cart used as a means of transport or as a farm tool.
17 bcycle: Ownership of bicycle.
18 boat: Ownership of boat.
19 canoe: Ownership of canoe.
20 mcycle: Ownership of motorcycle.
21 car: Ownership of private car. This refers to a car for household use and NOT a commercial vehicle.
22 internet: Access to internet inside the house.
This variable indicates whether the household can access the internet within its home, from a computer, a phone, a tablet, etc. The following rules apply:
• If the survey asks each household member separately whether they have access to the internet, the household is considered to have access if at least one member has access.
• If the survey asks whether the internet connection is working, consider the household to have access only if the connection is working.
• If the survey asks whether individuals use the internet, this cannot be used as an indicator of internet access. The exception is when the survey also asks where the internet is used and one of the options is the home.
• If the survey asks where the internet is “most often” used, this is not sufficient to identify internet access. Individuals may use the internet at home but use it more frequently at work.

TABLE 3.5 HOUSEHOLD REMITTANCES

1 hh_remit: Did household receive any remittances?
Codes: 1 = Yes; 0 = No
Did the household receive any remittances? The source of remittances is not important here. If HH_REMIT=0, the subsequent questions do not apply.

2 sex_rmt_1: Sex of the 1st remittance sender
3 sex_rmt_2: Sex of the 2nd remittance sender
4 sex_rmt_3: Sex of the 3rd remittance sender
Codes: 1 = Male; 0 = Female
Senders are ordered by decreasing amount of remittance (remittance includes cash, gifts and food). In some countries, remittances are recorded by number of transactions. In that case, enter each transaction with a unique identifier, because one cannot tell whether transactions come from the same sender or not. This applies to all questions in this section.

5 relat_rmt_1: Relationship to the household head of the 1st remittance sender
6 relat_rmt_2: Relationship to the household head of the 2nd remittance sender
7 relat_rmt_3: Relationship to the household head of the 3rd remittance sender
Codes: 2 = Spouse; 3 = Son/daughter; 4 = Parents/parents-in-law; 5 = Grandchild; 6 = Son-in-law/daughter-in-law; 7 = Other relative; 9 = Non-relative
Senders are ordered by decreasing amount of remittance (remittance includes cash, gifts and food).

8 des_mig_1: Destination of migration of the 1st remittance sending member
9 des_mig_2: Destination of migration of the 2nd remittance sending member
10 des_mig_3: Destination of migration of the 3rd remittance sending member
Codes: 1 = Capital; 2 = Within the country (but not capital); 3 = Abroad
Senders are ordered by decreasing amount of remittance (remittance includes cash, gifts and food).

11 origin_rmt: Origin of the remittance senders
Codes: 1 = Domestic; 2 = Abroad; 3 = Both
Numeric variable. This variable is automatically generated in the labelling file:

    gen origin_rmt = 1 if inlist(des_mig_1,1,2) & inlist(des_mig_2,1,2) & inlist(des_mig_3,1,2)
    replace origin_rmt = 2 if des_mig_1==3 & des_mig_2==3 & des_mig_3==3
    replace origin_rmt = 3 if origin_rmt==.
    replace origin_rmt = . if des_mig_1==. & des_mig_2==. & des_mig_3==.

12 amt_rmt_1: Amount of annual remittance by the 1st remittance sender
Numeric variables. Senders are ordered by decreasing amount of remittance (remittance includes cash, gifts and food).
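The labelling-file code for origin_rmt sets 1 or 2 only when all three destination variables satisfy the condition. A household with a single domestic sender (des_mig_2 and des_mig_3 missing) therefore ends up coded 3 = Both.

If the number of senders varies across households, a minimal alternative sketch counts domestic and foreign senders first. The helper variables n_dom and n_abr and the toy rows are hypothetical.

```stata
* Hypothetical toy data: destinations of up to three senders (1 = capital, 2 = rest of country, 3 = abroad)
clear
input des_mig_1 des_mig_2 des_mig_3
1 . .
3 . .
1 3 .
2 2 1
3 3 3
. . .
end

* Count senders in the country and abroad; inlist() and == return 0 for missing values
gen byte n_dom = inlist(des_mig_1,1,2) + inlist(des_mig_2,1,2) + inlist(des_mig_3,1,2)
gen byte n_abr = (des_mig_1==3) + (des_mig_2==3) + (des_mig_3==3)

* 1 = Domestic, 2 = Abroad, 3 = Both; missing when no destination is recorded
gen byte origin_rmt = cond(n_dom>0 & n_abr>0, 3, cond(n_dom>0, 1, cond(n_abr>0, 2, .)))
drop n_dom n_abr
list
```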
13 amt_rmt_2: Amount of annual remittance by the 2nd remittance sender

14 amt_rmt_3: Amount of annual remittance by the 3rd remittance sender

15 amt_rmt_fd: Total amount of annual remittances received in food (annual)
The total includes the remittances received in the form of food from all remittance senders.

16 amt_rmt_oth: Total amount of annual remittances received in other forms (annual)
The total includes the remittances received in other forms (cash, etc.) from all remittance senders.

CHAPTER 4: MODULE I - INDIVIDUAL-LEVEL VARIABLES
This module extracts variables on individuals in the household and covers approximately forty quantitative variables. The information is organized in 5 tables that provide variables on basic household identification, demographic characteristics, education, health, and child vaccination and anthropometry.

TABLE 3.0 SAMPLE AND BASIC HOUSEHOLD IDENTIFIER
All Table 3.0 variables will be extracted from the Poverty file Table 2.0.

1 country: Country code
To be merged from the p-file. If you don’t have the p-file, create the variable.

2 year_IHSN: 4-digit year of survey based on IHSN standards
To be merged from the p-file. If you don’t have the p-file, create the variable.

3 hhno: Household number
To be merged from the p-file. If you don’t have the p-file, create the variable.

4 hid: Household unique identification
To be merged from the p-file. If you don’t have the p-file, create the variable.

5 wta_hh: Household weights
Numeric variable. This is the weight to be used in all computations referring to household-level estimates. This variable cannot be used for poverty estimation. The interpretation is: the proportion of households with a certain characteristic is XX%.

TABLE 3.1 BASIC DEMOGRAPHIC CHARACTERISTICS
The file may have a different household size when compared to the poverty-level file.
Make sure that regular household members are selected using the same criterion as in the Poverty-level file. Secondly, households that do not match the Poverty-level file must be dropped, as they do not have the consumption component. All variables are numeric unless specified.

1 pid: Individual identification
Uniquely identifies the regular household members in each household. Sequentially numbered from 1 to N (household size). If the PID is a concatenation of HID and person ID, remove the HID part and keep only the person ID in PID. Check that each household member ID is unique:

    duplicates tag hid pid, gen(dup)
    tab dup

2 ageyrs: Age in completed years (continuous)
Age is an important variable for most socio-economic analyses and must be established as accurately as possible. Missing ages must be left as missing. If 99 = missing, recode to missing. If date of birth is provided, derive age and compare it with the recorded age. If the age of the household head is missing, use the variable hhagey in the poverty file to replace the missing age of the household head only. For children aged less than 5 years, age is used to interpret child malnutrition and survival data. Check consistency with age in months (AGEM) to get the correct age in completed years. For older surveys, check consistency and maintain AGEYRS. The derivation below can only be done if date of birth and date of interview are provided:

    gen bday = mdy(month, day, year)
    gen intdate = mdy(imonth, iday, iyear)
    format bday intdate %td
    gen age = (intdate - bday)/365.25
    gen ages = trunc(age)
    gen diff = ages - recorded_age
    tab diff

3 agecat: Age intervals (string)
String variable. Country-specific categorical variable. It is created only when the country does not report the age of interviewed people but intervals of their age in years. Otherwise leave as missing.
Here inputvar is the labelled survey variable and outputvar the harmonized string variable:

    gen outputvar = ""
    qui levelsof inputvar, local(lev)
    foreach cc of local lev {
        cap loc la_`cc' : label (inputvar) `cc'
        if !_rc {
            replace outputvar = "`cc' - `la_`cc''" if inputvar == `cc'
        }
    }

4 sex: Sex
Codes: 1 = Male; 0 = Female
Sex of the individual.

5 relathhcs: Relationship to household head (country-specific)
String variable. Country-specific. For each value label, there should be a space before and after the hyphen. Code and name, for example: “1 - Head”; “2 - Spouse”; “3 - Child”; etc.

    gen relathhcs = ""
    qui levelsof inputvar, local(lev)
    foreach cc of local lev {
        cap loc la_`cc' : label (inputvar) `cc'
        if !_rc {
            qui replace relathhcs = "`cc' - `la_`cc''" if inputvar == `cc'
        }
    }

6 relathh9: Relationship to household head (9 categories)
Codes: 1 = Head; 2 = Spouse; 3 = Child; 4 = Parents/parents-in-law; 5 = Grandchild; 6 = Son-in-law/daughter-in-law; 7 = Other relative; 8 = Domestic help/paying boarder; 9 = Non-relative
This refers to the relationship of each household member to the household HEAD. There must be one and only one head in each household. Child refers to a biological child or a child adopted through marriage or for another reason. Domestic help (servant, guard, cook, baby-sitter, among others) refers to a person who is paid for services rendered (in cash or in kind, e.g. training skills, board and lodging) even if they are related to the head of household. A paying boarder is someone who pays the household for room and/or board. Non-relative includes friends living in the household regularly. Use relathhcs to derive this variable after the edits. If not all categories are present in the questionnaire, leave this variable as missing.

7 relathh6: Relationship to household head (6 categories)
Codes: 1 = Head; 2 = Spouse; 3 = Child; 4 = Parents; 5 = Other relative; 6 = Non-relative
This refers to the relationship of each household member to the household HEAD. There must be one and only one head in each household. Other relative includes grandchild, in-laws, etc. Non-relative includes domestic help, paying boarder, etc.
    recode relathh9 (1=1) (2=2) (3=3) (4=4) (5/7=5) (8/9=6), gen(relathh6)

8 marital6: Marital status (6 categories)
Codes: 1 = Married monogamous; 2 = Married polygamous; 3 = Never married; 4 = Living together; 5 = Divorced/separated; 6 = Widowed
Polygamous unions exclude relationships that are not officially recognized, such as mistresses and concubines. Check for consistency in married unions: marital status must be identical for both members of a couple. Do not derive polygamous unions if the survey does not ask about them; leave the variable as missing. If marital status is asked only for persons above 12 years, one can confidently guesstimate that the children are “Never married”. If not all categories are present in the questionnaire, leave this variable as missing.

9 marital5: Marital status (5 categories)
Codes: 1 = Married; 2 = Never married; 3 = Living together; 4 = Divorced/separated; 5 = Widowed
The term ‘married’ may have different meanings in different countries. Married refers to both formal and informal unions, such as common-law marriages, union coutumière, free unions and living together. Check for consistency in married unions: marital status must be identical for both members of a couple. Not all cases can be imputed. For children younger than, say, 10 years, one can assume with some accuracy and certainty that they are never married.

    recode marital6 (1 2=1) (3=2) (4=3) (5=4) (6=5), gen(marital5)
    tab marital6 marital5

10 sp_pres: Spouse of household head living in household
Codes: 1 = Yes; 0 = No
Code based on a question that asks whether the household head’s spouse lives in the household; otherwise leave as missing. Only if MARITAL6<=4 or MARITAL5<=3. DO NOT TRY TO DEDUCE FROM HOUSEHOLD MEMBERSHIP. The variable records whether or not the household member has a spouse (formal marriage or union/common-law spouse) who lives in the household. Under some special circumstances, a couple may be divorced/separated but living in the same household (dwelling unit) in separate rooms. In this instance, sp_pres=1. Check the ages and see if they are consistent.
If a child is a spouse, go back to varname=relathhcs, relathh9, relathh6 and edit accordingly. tab ageyrs if sp_pres==1 Note: ✓ For any variable not collected in a country, variable should be created and left as missing (.) in the final harmonized file. ✓ Variables in the data files must follow the sequence in which they appear in the manual. TABLE 3.2 LITERACY AND EDUCATION Al variables are numeric unless specified. No VARIABLE NAME LABEL AND CODES FORMAT, DESCRIPTION AND COMMENTS 1 literacy Literacy status For individuals aged 5 and above only. 1 = Yes, can read and write 0 = No, cannot read or write Value must be missing for all others. Literacy: Is the ability to both read and write with understanding, a short simple statement on his/her everyday life in any language. It will be useful to align measurements of literacy with this given standard international 40 definition. Be careful while coding 1; one must be able to both read and write. If a person can either read or write, he/she will be considered illiterate (LITERACY=0). It can be assumed with some degree of accuracy that if respondent has secondary level and above of education, then must be literate. Also, persons with over 5 years of primary can be assumed literate. Can be programmed with EDUCYRS if literacy is missing for some members. 2 ed_mod_age Education module Minimum age for which education section is application age (country- applied in country. The questionnaire and/or specific) manual specifies this. For this reason, the lower age cutoff at which information is collected will vary from country to country. 3 everattd Ever attended school Value must be missing for individuals less than 1 = Yes the required age (ed_mod_age). 0 = No Depends on how school attendance is defined in a country. Example, in some countries, a criterion is placed to decide if ever attended school is valid or not and is determined by number of weeks or months or school term in attendance. 
Having attended school does not require having completed any level of education. If the survey does not collect this variable, it can be derived indirectly from EDUCAT10 and ATSCHOOL: if ATSCHOOL=1, then EVERATTD=1; if EDUCAT10>=3 and EDUCAT10<=9, then EVERATTD=1.

4 educat10: Highest level of education completed (10 categories)
Codes: 1 = No education; 2 = Preschool; 3 = Primary incomplete; 4 = Primary complete but less than completed lower secondary; 5 = Completed lower secondary (or post-primary vocational education) but less than completed upper secondary; 6 = Completed upper secondary (or extended vocational/technical education); 7 = Post-secondary but not university; 8 = University and higher; 9 = Formal adult education or literacy program; 10 = Other
Value must be missing for individuals younger than the required age (ed_mod_age). If a person is currently enrolled, the highest level completed is the year currently attended minus one. For example, a person currently enrolled in P6 has completed P5, so his/her highest level completed should be coded as 3 (Primary incomplete). Individuals enrolled at university level are coded as 8 (University and higher) regardless of whether they have completed it. “Other” refers to a level of education not defined by the above codes, e.g. a person attending a village polytechnic whose level reached is not stated. This classification should be documented whenever possible. If a Koranic school teaches the formal curriculum, classify it under formal education and code accordingly. Koranic schools that teach Islamic knowledge with only (a) basic recitation, (b) recitation and Arabic writing, or (c) hafeez (memorization and Arabic fluency) are not mainstream formal schools; code them as “Other”. If education level is missing for any member, do not try to impute it; leave it as MISSING. If not all categories are present in the questionnaire, leave this variable as missing.
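The indirect derivations described above for literacy (from EDUCYRS) and everattd (from ATSCHOOL and EDUCAT10) can be sketched as two small functions. This is a minimal Python illustration, not the manual’s Stata code: it assumes one person’s values are passed as plain arguments, with None standing in for Stata’s missing value (.), and the function names derive_everattd and fill_literacy are hypothetical. Because the manual gives no rule for deducing EVERATTD=0, the sketch leaves such cases missing.

```python
from typing import Optional


def derive_everattd(age: Optional[int], ed_mod_age: int,
                    everattd: Optional[int], atschool: Optional[int],
                    educat10: Optional[int]) -> Optional[int]:
    """Fill EVERATTD from ATSCHOOL and EDUCAT10 when the survey left it missing."""
    # Must be missing below the education module age (ed_mod_age).
    if age is None or age < ed_mod_age:
        return None
    # Keep the collected answer whenever there is one.
    if everattd is not None:
        return everattd
    # Currently attending implies ever attended.
    if atschool == 1:
        return 1
    # EDUCAT10 codes 3-9 (primary incomplete up to formal adult education) imply attendance.
    if educat10 is not None and 3 <= educat10 <= 9:
        return 1
    # No rule is given for deducing 0, so leave the value missing.
    return None


def fill_literacy(age: Optional[int], literacy: Optional[int],
                  educyrs: Optional[int]) -> Optional[int]:
    """Fill LITERACY from EDUCYRS: more than 5 years of schooling is assumed literate."""
    # Literacy is defined only for individuals aged 5 and above.
    if age is None or age < 5:
        return None
    # Keep the collected answer whenever there is one.
    if literacy is not None:
        return literacy
    if educyrs is not None and educyrs > 5:
        return 1
    # Fewer years of schooling do not prove illiteracy, so leave missing.
    return None


# A 12-year-old with EVERATTD missing who is currently in school:
print(derive_everattd(12, 6, None, 1, None))  # 1
# An adult with LITERACY missing and 8 completed years of schooling:
print(fill_literacy(30, None, 8))             # 1
```

Keeping collected values untouched and filling only missing cells mirrors the manual’s wording (“if literacy is missing for some members”, “if not collected by survey”).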
5 educat7: Highest level of education completed (7 categories)
Codes: 1 = No education; 2 = Primary incomplete; 3 = Primary complete; 4 = Secondary incomplete; 5 = Secondary complete; 6 = Post-secondary but not university; 7 = University (complete or incomplete)
Value must be missing for individuals younger than the required age (ed_mod_age). Primary complete implies that the person completed the stipulated primary education by taking an exam or test. Secondary complete implies that the person completed the stipulated secondary education by taking an exam or test. The post-secondary technical education level refers to any higher education after successfully completing secondary education, such as higher professional schooling or college. The university and higher education level refers to undergraduate studies and above. If education level is missing, do not try to impute it; leave it as MISSING. If not all categories are present in the questionnaire, leave this variable as missing.

6 educat5: Highest level of education completed (5 categories)
Codes: 1 = No education; 2 = Primary incomplete; 3 = Primary complete but secondary incomplete; 4 = Secondary complete; 5 = Tertiary/post-secondary (complete or incomplete)
Value must be missing for individuals younger than the required age (ed_mod_age). If education level is missing, do not try to impute it; leave it as MISSING. If not all categories are present in the questionnaire, leave this variable as missing. Can be programmed from educat7:
recode educat7 (3 4=3) (5=4) (6 7=5), gen(educat5)
tab ageyrs educat5

7 educat4: Highest level of education completed (4 categories)
Codes: 1 = No education; 2 = Primary (complete or incomplete); 3 = Secondary (complete or incomplete); 4 = Tertiary (complete or incomplete)
Value must be missing for individuals younger than the required age (ed_mod_age). No education includes people in pre-school and people who never attended. The definition of pre-school is country-specific and may include baby class, kindergarten and nursery school, among others; it is the level before joining the regular stipulated primary education cycle. At a minimum, educat4 must be available for all countries. If education level is missing, do not try to impute it; leave it as MISSING. Can be programmed from educat7:
recode educat7 (2 3=2) (4 5=3) (6 7=4), gen(educat4)
tab ageyrs educat4

8 educat_ISCED: ISCED education categories (highest level enrolled in or completed)
Codes: 1 = Early childhood education; 2 = Primary education; 3 = Lower secondary education; 4 = Upper secondary education; 5 = Post-secondary non-tertiary education; 6 = Short-cycle tertiary education; 7 = Bachelor's or equivalent level; 8 = Master's or equivalent level; 9 = Doctoral or equivalent level
These are the UNESCO ISCED 2011 education categories. Note that this variable uses the highest level enrolled in or completed. For example, a person enrolled in primary education should get category 2 even if he/she has not yet completed primary, or never will. Check the UNESCO ISCED mappings for each country (footnote 1). Post-secondary non-tertiary education may be named in many ways depending on the country; it typically covers vocational programmes that prepare one for the labor market, such as a technician diploma or an electrician diploma.

9 primarycomp: Primary school completion
Codes: 1 = Yes; 0 = No
Value must be missing for individuals younger than the required age (ed_mod_age). One can assume with a degree of certainty that the following conditions qualify as primary-school completion:
• EDUCAT10>=4 & EDUCAT10<=8
• EDUCAT7>=3 & EDUCAT7<=7
• EDUCAT5>=3 & EDUCAT5<=5

10 educyrs: Years of completed education
Codes: 0 = Pre-school; 1 = Grade 1; 2 = Grade 2; ...
Value must be missing for individuals younger than the required age (ed_mod_age). If a grade level is not listed, leave EDUCYRS missing (.). For individuals who are currently enrolled in school, the years of education completed correspond to the class currently attended minus one. For individuals who are not currently enrolled in school, the years of completed education correspond to the highest level of education completed. This is a continuous variable giving the number of years of formal schooling completed. It is constructed only if the survey asked for the number of years of education or the highest grade completed; otherwise, it is left missing. The number of years each grade corresponds to varies by country: for example, some countries have 5 or 6 years of primary school and 3 years of lower secondary school, while others have 4 years of primary school and 4 years of lower secondary school. Refer to the UNESCO ISCED mappings (footnote 1). For higher education, the grades/years may not have been asked explicitly. In such cases, construct the variable using the following assumptions:
• If the individual has completed the specified tertiary education, add to the years of completed education 4 years for a BA/BSc, 6 years for an MA/MSc and 8 years for a PhD, counted after the completion of secondary education.
• If the individual has not completed tertiary education, or completion cannot be ascertained, add to the years of completed education 2 years for a BA/BSc, 5 years for an MA/MSc and 7 years for a PhD.
The variable does not take into account the actual number of years taken to reach the grade level. In other words, a first grade repeated three times counts as only 1 year of completed education.

11 atschool: Currently enrolled in or attending school
Codes: 1 = Yes; 0 = No
Value must be missing for individuals younger than the required age (ed_mod_age). Use the question that asks about current attendance. If there is no such question, use the question that explicitly asks about enrollment over the past 12 months, and record this in the comments. Code as 0 if EVERATTD=0.

12 atschltyp: Type of school currently enrolled in/attending
Codes: 1 = Public; 2 = Private; 9 = Other
Value must be missing for individuals younger than the required age (ed_mod_age). Code only for individuals currently attending school (ATSCHOOL=1). Public includes fully government-owned as well as semi-public schools. Private schools are facilities run by non-governmental organizations (e.g. NGOs, religious institutions) or by private entities. Other refers to schools that cannot be categorized by the above, such as community schools that cannot easily be classified as run by either the government or a private entity.

13 atslevattd: Level of schooling currently enrolled in/attending
Codes: 1 = Preschool; 2 = Primary; 3 = Secondary; 4 = Post-secondary but not university; 5 = University and higher; 6 = Formal adult education or literacy program; 9 = Other
Value must be missing for individuals younger than the required age (ed_mod_age). See EDUCAT10 for definitions. Check for consistency with EDUCAT10: for example, EDUCAT10 cannot be university while the current level is primary.

Note:
✓ For any variable not collected in a country, the variable should be created and left as missing (.) in the final harmonized file.
✓ Variables in the data files must follow the sequence in which they appear in the manual.

Footnote 1: http://www.uis.unesco.org/Education/ISCEDMappings/Pages/default.aspx

TABLE 3.5 MIGRATION
Even if the survey does not ask for any information on migration, create the variables and leave them as missing.
No VARIABLE NAME LABEL AND CODES FORMAT, DESCRIPTION AND COMMENTS

1 rb_mod_age: Migration module application age (country-specific)
Minimum age at which the migration module is applied. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country.

2 rbirth: Was member born in this country?
Codes: 1 = Yes; 0 = No
Value must be missing for individuals younger than the required age (rb_mod_age).

3 rbirth_ctry: In what country was member born?
String variable. Value must be missing for individuals younger than the required age (rb_mod_age). Only if RBIRTH=0. If born outside the country, enter the 3-digit ISO country code (see Annex X).
Several codes are added for use when the country is not specified: “Other Africa”, “Other Europe”, “Other America” and “Other (unspecified)”.

4 rbirthreg: Was person born in this region?
Codes: 1 = Yes; 0 = No
Value must be missing for individuals younger than the required age (rb_mod_age).

5 rbirth_reg: Region of birth
String variable. Value must be missing for individuals younger than the required age (rb_mod_age). Only if RBIRTHREG==0. Use the survey’s region codes, entered as “1 - region 1 name”, “2 - region 2 name”, etc.

6 rbirth_prevref: Reference time for previous residence
String variable. Indicates the time reference of the question about migration (or place of residence). For example, RBIRTH_PREVREF=5 means that the question asks about the place of residence 5 years ago.

7 rbirthprev: Ever lived in a residence other than the current one?
Codes: 1 = Yes, within country; 2 = Yes, outside country; 3 = No
Value must be missing for individuals younger than the required age (rb_mod_age). If the person lived in several places, only the most recent should be recorded here.

8 rbirth_prev: Region of previous residence
String variable. Value must be missing for individuals younger than the required age (rb_mod_age). Only if RBIRTHPREV==1. Code using the survey’s region codes, entered as “1 - region name”, etc. If the survey asks by area of residence, leave this variable as missing.

9 ymove: Year individual moved to current location
Value must be missing for individuals younger than the required age (rb_mod_age). Indicates the year of the most recent move from the previous residence (RBIRTH_PREV) to the current location.

CHAPTER 5: MODULE L - LABOR FORCE VARIABLES
The construction of employment and labor participation variables is specific to the Sub-Saharan African context, since over 80 percent of employment activities are in the informal sector. A study of labor participation in Tanzania found that, because of poor questionnaire design, many unpaid family workers under-reported their economic activities, especially women who reported domestic duties as their main activity.
These individuals inevitably undertake some unpaid economic activities, such as cultivating and raising livestock (preparing meals for the family and caring for one’s own children are not economic activities under the ILO definition). The SSA harmonization developed complementary steps to capture these under-reported economic activities: this manual reclassifies the employment status of these individuals, who report household duties as their main activity, as employed rather than inactive. Because labor force questionnaires differ significantly from one another, it is not possible to provide a set of very specific steps for classifying employment status in Africa. The diagram below illustrates the logic used to classify unpaid economic activities. For details, refer to Bardasi, Beegle, Dillon and Serneels, “Do Labor Statistics Depend on How and to Whom the Questions Are Asked?”, World Bank Policy Research Working Paper 5192. Users are also strongly advised to study Annex I carefully before constructing labor force variables; it gives detailed, in-depth explanations of the logic and of how to construct the key SHIP labor force variables.

Figure: Definition of Unemployment and Labor Force in Africa Region
1. Employed? Yes: in the labor force (employed). No: go to 2.
2. Were you absent from your job / do you have a job to return to? Yes: in the labor force (employed). No: go to 3.
3. Have you searched for work? Yes: in the labor force (unemployed). No: out of the labor force.

TABLE 4.0 SAMPLE AND BASIC HOUSEHOLD IDENTIFIER
No VARIABLE NAME LABEL AND CODES FORMAT, DESCRIPTION AND COMMENTS

1 country: Country code
To be merged from the p-file. If you don’t have the p-file, create the variable.

2 year_IHSN: 4-digit year of survey based on IHSN standards
To be merged from the p-file. If you don’t have the p-file, create the variable.

3 hhno: Household number
To be merged from the p-file. If you don’t have the p-file, create the variable.

4 hid: Household unique identification
To be merged from the p-file. If you don’t have the p-file, create the variable.

5 wta_hh: Household weights
Numeric variable. This is the weight to use in all computations of household-level estimates. It cannot be used for poverty estimation. Interpretation: XX% of households have a certain characteristic.

TABLE 4.1 HOUSEHOLD CHORES
Create a tempfile with HID, PID and AGEYRS from the individual-level file; it will be used to create the labor file variables. The individual-level file provides the correct household composition and size. All variables are numeric unless specified.
No VARIABLE NAME LABEL AND CODES FORMAT, DESCRIPTION AND COMMENTS

1 pid: Individual identification
To be merged from the individual-level file.

2 hh_mod_age: Household chores module application age
Minimum age at which the household chores section is applied. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country.

3 fetchwood: Fetched wood for the household
Codes: 0 = No; 1 = Yes
Value must be missing for individuals younger than the required age (hh_mod_age). Code as 1 (YES) if the individual fetched wood for his/her own household or for others; otherwise, code 0 (NO). Based on the UN definition of SSN. Fetching wood (for pay or in kind) is an economic activity.

4 fetchwater: Fetched water for the household
Codes: 0 = No; 1 = Yes
Value must be missing for individuals younger than the required age (hh_mod_age). Code as 1 (YES) if the individual fetched water for his/her own household or for others; otherwise, code 0 (NO). Based on the UN definition of SSN. Fetching water (for pay or in kind) is an economic activity.

5 cooking: Helped cook or prepare meals/drinks for the household
Codes: 0 = No; 1 = Yes
Value must be missing for individuals younger than the required age (hh_mod_age). Code as 1 (YES) if the individual helped cook or prepare meals/drinks for his/her own household or for others; otherwise, code 0 (NO).

6 cleaning: Helped clean household or wash/iron clothes for the household
Codes: 0 = No; 1 = Yes
Value must be missing for individuals younger than the required age (hh_mod_age). Code as 1 (YES) if the individual helped clean the household or wash/iron clothes for his/her own household or for others; otherwise, code 0 (NO).

7 childcare: Helped take care of children for the household
Codes: 0 = No; 1 = Yes
Value must be missing for individuals younger than the required age (hh_mod_age). Code as 1 (YES) if the individual helped take care of children for his/her own household or for others; otherwise, code 0 (NO).

8 oldcare: Helped take care of the elderly for the household
Codes: 0 = No; 1 = Yes
Value must be missing for individuals younger than the required age (hh_mod_age). Code as 1 (YES) if the individual helped take care of the elderly for his/her own household or for others; otherwise, code 0 (NO).

9 hour_necon: Hours spent per week on non-economic activities
Value must be missing for individuals younger than the required age (lb_mod_age). These include activities such as preparing food and caring for children. Fetching wood and water are economic activities and should not be included here.

Note:
✓ For any variable not collected in a country, the variable should be created and left as missing (.) in the final harmonized file.
✓ Variables in the data files must follow the sequence in which they appear in the manual.

TABLE 4.2 LABOR SCREENING QUESTIONS LAST 7-DAYS
Create a tempfile with HID, PID and AGEYRS from the individual-level file; it will be used to create the labor file variables. The individual-level file provides the correct household composition and size. All variables are numeric unless specified.
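The screening logic of the Chapter 5 diagram (worked? absent from a job? searched for work?), which the lstatus variable below encodes, can be sketched as a single classification function. This is a minimal Python sketch, not the manual’s Stata procedure, under stated assumptions: screening answers arrive as True/False/None; the flags waiting_to_start_job and did_unpaid_economic_activity are hypothetical inputs standing in for the “waiting for a new job” rule and the SSA reclassification of unpaid economic activity; and leaving lstatus missing when no screening question was answered is an assumption. Annex I remains the reference for the full construction.

```python
from typing import Optional

EMPLOYED, UNEMPLOYED, NOT_IN_LABOR_FORCE = 1, 2, 3


def classify_lstatus(age: Optional[int], lb_mod_age: int,
                     worked_last_7d: Optional[bool],
                     has_job_to_return_to: Optional[bool],
                     searched_for_work: Optional[bool],
                     waiting_to_start_job: bool = False,
                     did_unpaid_economic_activity: bool = False) -> Optional[int]:
    """Return lstatus (1 employed, 2 unemployed, 3 not in labor force) or None."""
    # Below working age (or age unknown): lstatus must be missing.
    if age is None or age < lb_mod_age:
        return None
    # Step 1: worked at least 1 hour in the last 7 days. The hypothetical flag
    # models the SSA rule that unpaid economic activity (e.g. cultivating,
    # raising livestock) also counts as employment.
    if worked_last_7d or did_unpaid_economic_activity:
        return EMPLOYED
    # Step 2: absent from a job / has a job to return to: still employed.
    if has_job_to_return_to:
        return EMPLOYED
    # Step 3: searched for work, or waiting for a new job to start: unemployed.
    if searched_for_work or waiting_to_start_job:
        return UNEMPLOYED
    # Assumption: if no screening question was answered, leave lstatus missing.
    if worked_last_7d is None and has_job_to_return_to is None and searched_for_work is None:
        return None
    return NOT_IN_LABOR_FORCE


# A 30-year-old who did not work, has no job to return to, but searched for work:
print(classify_lstatus(30, 15, False, False, True))  # 2
```

The order of the checks follows the diagram: any “yes” at an earlier step settles the status, so a person who both worked and searched for work is counted as employed.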
No VARIABLE NAME LABEL AND CODES FORMAT, DESCRIPTION AND COMMENTS

1 lb_mod_age: Labor force module application age (country-specific)
Age at which the labor module starts being applied (working age: the age at which people can legally start working). For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country.

2 lstatus: Labor status last 7 days
Codes: 1 = Employed; 2 = Unemployed; 3 = Not in labor force
Value must be missing for individuals younger than the required age (lb_mod_age). All persons are considered active in the labor force if they presently have a job (formal or informal, i.e. are employed) or do not have a job but are actively seeking work (i.e. are unemployed). Employed is defined as anyone who worked for at least 1 hour during the last 7 days or the reference week, whether the employment was formal or informal, paid or unpaid. Individuals who had a job but, for any reason, did not work in the last 7 days are considered employed. A person is defined as unemployed if he or she is presently not working but is actively seeking a job. The formal definition of unemployment usually also includes being ‘able to accept a job’; this question was asked in only a minority of surveys and is therefore not incorporated in the present definition. A person presently not working but waiting for a new job to start is considered unemployed.

3 nlfreason: Reason not in labor force last 7 days
Codes: 1 = Student; 2 = Housewife; 3 = Retired; 4 = Disabled; 5 = Other
This variable is constructed for all those who are not presently employed and are not looking for work (lstatus=3), and is missing otherwise. Student: the person is studying. Housewife (housekeeping): the person takes care of the house, older people or children. Disabled: the person cannot work due to physical conditions. Other: the person does not work for any other reason. Missing value for people below working age, the employed and the unemployed.
Other missing values allowed.

4 unempldur_l: Unemployment duration (months), lower bracket
Continuous variable. Constructed for all persons who are unemployed (lstatus=2), and missing otherwise. If the source variable is continuous, it records the number of months in unemployment; if it is categorical, it records the lower boundary of the bracket. Missing values are allowed for everyone who is not unemployed. Other missing values are also allowed.

5 unempldur_u: Unemployment duration (months), upper bracket
Continuous variable. Constructed for all persons who are unemployed (lstatus=2), and missing otherwise. If the source variable is continuous, it records the number of months in unemployment; if it is categorical, it records the upper boundary of the bracket. If the bracket is open-ended on the right, enter a missing value. Missing values are allowed for everyone who is not unemployed. Other missing values are also allowed.

Note:
✓ For any variable not collected in a country, the variable should be created and left as missing (.) in the final harmonized file.
✓ Variables in the data files must follow the sequence in which they appear in the manual.

TABLE 4.3 PRIMARY EMPLOYMENT LAST 7-DAYS
Create a tempfile with HID, PID and AGEYRS from the individual-level file; it will be used to create the labor file variables. The individual-level file provides the correct household composition and size. All variables are numeric unless specified.
No VARIABLE NAME LABEL AND CODES FORMAT, DESCRIPTION AND COMMENTS

1 njobs: Number of total jobs
Missing value for people below working age, the unemployed and people out of the labor force. Other missing values allowed.

2 empstat: Employment status
Codes: 1 = Paid employee; 2 = Non-paid employee; 3 = Employer; 4 = Self-employed; 5 = Other, workers not classifiable by status
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Definitions are taken from the International Labor Organization’s Classification of Status in Employment, with some revisions to take into account the data available. Classifies the main-job employment status of any individual with a job (lstatus=1) and is missing otherwise.
Paid employees include anyone whose basic remuneration is not directly dependent on the revenue of the unit they work for; they are typically remunerated by wages and salaries but may be paid for piece work or in kind. The ‘continuous’ criterion used in the ILO definition is not used here, because the data are often absent and because of country specificity.
Non-paid employees include contributing family workers: workers who hold a self-employment job in a market-oriented establishment operated by a related person living in the same household, and who cannot be regarded as partners because their degree of commitment to the operation of the establishment, in terms of working time or other factors, is not at a level comparable to that of the head of the establishment.
An employer is a business owner (alone or in partnership) with employees. If the only people working in the business are the owner and contributing family workers, the person is not considered an employer (having no employees) and is instead classified as own-account.
Own-account (self-employment) jobs are those where remuneration depends directly on the goods and services produced (home consumption being considered part of the profits) and where the worker has not engaged any permanent employees to work on a continuous basis during the reference period.
Members of producers’ cooperatives are workers who hold a self-employment job in a cooperative producing goods and services, in which each member takes part on an equal footing with the other members in determining the organization of production, sales and/or other work of the establishment, the investments, and the distribution of the proceeds among the members.
Other, workers not classifiable by status, includes those for whom insufficient relevant information is available and/or who cannot be included in any of the preceding categories.
All apprentices should be mapped as unpaid workers.

3 ocusec: Sector of activity
Codes: 1 = Public sector, central government, army; 2 = Private, NGO; 3 = State-owned; 4 = Public or state-owned, but cannot distinguish
Variable is constructed for all persons administered this module in each questionnaire. Classifies the main job’s sector of activity of any individual with a job (lstatus=1) and is missing otherwise. The public sector includes the armed forces. The private sector is the part of the economy that is run for private profit and is not controlled by the state; it also includes non-governmental organizations. State-owned includes para-state firms and all other firms in which the government has control (participation over 50%). Note: if there is no such question, leave as missing. Do not code based on occupation (ISCO) or industry (ISIC) codes.

4 industry_orig: Original industry codes
This variable corresponds to whatever is in the original file, with no recoding. Missing value for people below working age. Other missing values allowed; .a indicates non-response.

5 industry: 1-digit industry classification
Codes: 1 = Agriculture, hunting, fishing, etc.; 2 = Mining; 3 = Manufacturing; 4 = Public utility services; 5 = Construction; 6 = Commerce; 7 = Transport and communications; 8 = Financial and business services; 9 = Public administration; 10 = Other services, unspecified
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the main job of any individual with a job (lstatus=1) and is missing otherwise. The codes for the main job are based on the UN International Standard Industrial Classification (ISIC), revision 3.1. The main categories subsume the following ISIC divisions:
1 Agriculture, Hunting, Fishing (01-05)
2 Mining (10-14)
3 Manufacturing (15-37)
4 Electricity and Utilities (40-41)
5 Construction (45)
6 Commerce (50-55)
7 Transportation, Storage and Communication (60-64)
8 Financial, Insurance and Real Estate (65-74)
9 Services: Public Administration (75)
10 Other Services (80-99)
In the case of different classifications (former Soviet Union republics, for example), recoding has been done to best match the ISIC Rev. 3.1 codes. Code 10 is also assigned to unspecified categories or items.

6 industry1: 1-digit industry classification (broad economic activities)
Codes: 1 = Agriculture; 2 = Industry; 3 = Services; 4 = Other
This variable is either created directly from the data (if a 10-category industry classification does not exist) or created from industry. The following convention is used to go from 10 to 4 categories (based on ISIC):
// Start from missing, then map the 10 ISIC-based categories to 4 broad groups
gen industry1=.
replace industry1=1 if inlist(industry, 1)
replace industry1=2 if inlist(industry, 2, 3, 4, 5)
replace industry1=3 if inlist(industry, 6, 7, 8, 9)
replace industry1=4 if inlist(industry, 10)
// Every non-missing industry code must have been mapped
assert industry1!=. if industry!=.

7 occup_orig: Original occupation code
This variable corresponds to whatever is in the original file, with no recoding.

8 occup: 1-digit occupation classification
Codes: 1 = Managers; 2 = Professionals; 3 = Technicians and associate professionals; 4 = Clerical support workers; 5 = Service and sales workers; 6 = Skilled agricultural, forestry and fishery workers; 7 = Craft and related trades workers; 8 = Plant and machine operators, and assemblers; 9 = Elementary occupations; 10 = Armed forces occupations; 99 = Other/unspecified
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the main job of any individual with a job (lstatus=1) and is missing otherwise. As most surveys collected detailed information and then coded it, and the original data are not in the databases, no attempt has been made to correct or check the original coding. The classification is based on the International Standard Classification of Occupations (ISCO) 88. In the case of different classifications, recoding has been done to best match ISCO-88.

9 wage_no_compen: Last wage payment
Continuous variable: wage in local currency. Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) will vary from country to country. Records the main-job wage of any wage earner (lstatus=1 & empstat=1) and is missing otherwise. This is the wage from the main job (the job to which the person dedicated the most time in the week preceding the survey). For the self-employed or owners of their own businesses, this should be net revenues (net of all costs EXCEPT tax payments) or the amount of salary taken from the business. Owing to the almost complete lack of information on taxes, the wage from the main job is NOT net of taxes. By definition, non-paid employees (empstat=2) should have wage=0. Excludes tips, compensation such as bonuses, dwellings or clothes, and other payments.
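The ISIC Rev. 3.1 division ranges listed for industry, and the Stata convention given for industry1, can be sketched as two lookup functions. This Python version is illustrative only: it assumes industry_orig has already been reduced to a 2-digit ISIC Rev. 3.1 division, and treating divisions outside the listed ranges as code 10 (unspecified) is an assumption, not a rule stated in the manual.

```python
from typing import Optional

# (first division, last division, industry code), as listed for the industry variable.
ISIC31_RANGES = [
    (1, 5, 1),     # Agriculture, hunting, fishing
    (10, 14, 2),   # Mining
    (15, 37, 3),   # Manufacturing
    (40, 41, 4),   # Electricity and utilities
    (45, 45, 5),   # Construction
    (50, 55, 6),   # Commerce
    (60, 64, 7),   # Transport, storage and communication
    (65, 74, 8),   # Financial, insurance and real estate
    (75, 75, 9),   # Public administration
    (80, 99, 10),  # Other services
]


def industry_from_isic(division: Optional[int]) -> Optional[int]:
    """Map a 2-digit ISIC Rev. 3.1 division to the 10-category industry code."""
    if division is None:
        return None
    for first, last, code in ISIC31_RANGES:
        if first <= division <= last:
            return code
    # Assumption: divisions outside the listed ranges are treated as unspecified (10).
    return 10


def industry1_from_industry(industry: Optional[int]) -> Optional[int]:
    """Collapse industry (1-10) into industry1: 1 Agriculture, 2 Industry, 3 Services, 4 Other."""
    if industry is None:
        return None
    if industry == 1:
        return 1
    if industry in (2, 3, 4, 5):
        return 2
    if industry in (6, 7, 8, 9):
        return 3
    if industry == 10:
        return 4
    # Mirrors the Stata assert: every non-missing industry code must map to a category.
    raise ValueError(f"unexpected industry code: {industry}")


# Division 24 (a manufacturing division) -> industry 3 -> industry1 2 (Industry).
print(industry1_from_industry(industry_from_isic(24)))  # 2
```

Raising an error on unexpected codes plays the same role as the `assert industry1!=. if industry!=.` check in the Stata snippet: a silent missing value would hide a mapping gap.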
10 bonuses: Tips, compensation such as bonuses, dwellings or clothes, and other payments
Includes tips, compensation such as bonuses, dwellings or clothes, and other payments. Annualize this value, considering the number of months worked in the firm and the periodicity of the bonuses.

11 unitwage: Last wages time unit
Codes: 1 = Daily; 2 = Weekly; 3 = Every two weeks; 4 = Every two months; 5 = Monthly; 6 = Quarterly; 7 = Every six months; 8 = Annually; 9 = Hourly; 10 = Other
Type of reference (time unit) for the wage_no_compen variable. Records the time unit of the main-job wage for any wage earner (lstatus=1 & empstat=1) and is missing otherwise.

12 whours: Hours of work in last week
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the main job of any individual with a job (lstatus=1) and is missing otherwise. This is the number of hours worked in the last 7 days or the reference week in the person’s main job, the main job being the occupation to which the person dedicated the most time. For persons absent from their job in the week preceding the survey due to holidays, vacation or sick leave, record the time worked in the last week the person worked. (Note that the question is sometimes phrased as “On average, how many hours a week do you work?”) For individuals who report only how many hours they work per day, with no information on the number of days worked per week, multiply the hours by 5 days. If the question asks for hours worked per month, divide by 4.2 to get weekly hours.

13 contract: Employment contract
Codes: 0 = No; 1 = Yes
Variable is constructed for all persons administered this module in each questionnaire. Indicates whether a person has a signed (formal) contract, regardless of duration.
For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the contract status of any individual with a job (lstatus=1) and is missing otherwise. This variable is only constructed if there is an explicit question about contracts.

14 healthins: Health insurance
Codes: 0 = No; 1 = Yes
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the health insurance status of any individual with a job (lstatus=1) and is missing otherwise. This variable is only constructed if there is an explicit question about health security.

15 socialsec: Social security
Codes: 0 = No; 1 = Yes
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the social security status of any individual with a job (lstatus=1) and is missing otherwise. This variable is only constructed if there is an explicit question about pension plans or social security.

16 union: Union membership
Codes: 0 = No; 1 = Yes
Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the union membership status of any individual with a job (lstatus=1) and is missing otherwise. This variable is only constructed if there is an explicit question about trade unions.

17 firmsize_l: Firm size (lower bracket)
Variable is constructed for all persons who are employed. If the source variable is continuous, it records the number of people working for the same employer; if it is categorical, it records the lower boundary of the bracket.
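The lower/upper bracket convention shared by firmsize_l and firmsize_u (and by unempldur_l and unempldur_u, which apply it to the unemployed) can be sketched as one function returning the pair of bounds. In this Python sketch the bracket table EXAMPLE_FIRMSIZE_BRACKETS and its cut points are hypothetical, not taken from any survey; a continuous answer fills both bounds with the same number, and an open right bracket yields a missing (None) upper bound.

```python
from typing import Dict, Optional, Tuple

Bounds = Tuple[Optional[int], Optional[int]]

# Hypothetical categorical question: category code -> (lower, upper); None = open bracket.
EXAMPLE_FIRMSIZE_BRACKETS: Dict[int, Bounds] = {
    1: (1, 4),
    2: (5, 9),
    3: (10, 49),
    4: (50, None),  # "50 or more": open right bracket, so the upper bound is missing
}


def bracket_bounds(lstatus: Optional[int], answer: Optional[int],
                   brackets: Optional[Dict[int, Bounds]] = None,
                   required_lstatus: int = 1) -> Bounds:
    """Return (lower, upper), e.g. (firmsize_l, firmsize_u), for one person."""
    # Constructed only for the relevant group (employed: lstatus=1 for firm size,
    # unemployed: lstatus=2 for unemployment duration); missing otherwise.
    if lstatus != required_lstatus or answer is None:
        return (None, None)
    # Continuous answer: both variables record the reported number.
    if brackets is None:
        return (answer, answer)
    # Categorical answer: look up the bracket boundaries.
    return brackets[answer]


print(bracket_bounds(1, 4, EXAMPLE_FIRMSIZE_BRACKETS))  # (50, None)
```

Keeping both bounds in one lookup table makes it hard for firmsize_l and firmsize_u to drift apart when a country’s brackets are recoded.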
18 firmsize_u
Label: Firm size (upper bracket)
Description: Variable is constructed for all persons who are employed. If the source variable is continuous, it records the number of people working for the same employer. If the source variable is categorical, it records the upper boundary of the bracket. If the right bracket is open, the value should be left missing.

Note:
✓ For any variable not collected in a country, the variable should be created and left as missing (.) in the final harmonized file.
✓ Variables in the data files must follow the sequence in which they appear in the manual.

TABLE 4.4 SECONDARY EMPLOYMENT LAST 7-DAYS
Create a tempfile with HID, PID and AGEYRS from the individual-level file; it will be used to create the labor file variables. The individual-level file provides the correct household composition and size. All variables are numeric unless specified otherwise. Check consistency with the number of jobs (njobs) from Table 4.3: if njobs=1, then all variables in this section must be missing.

1 empstat_2
Label: Employment status - second job
Codes: 1 = Paid employee; 2 = Non-paid employee; 3 = Employer; 4 = Self-employed; 5 = Other, workers not classifiable by status
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Definitions are taken from the International Labour Organization's Classification of Status in Employment, with some revisions to take into account the data available. Classifies the second-job employment status of any individual with a job (lstatus=1) and is missing otherwise.
Paid employee includes anyone whose basic remuneration is not directly dependent on the revenue of the unit they work for; such workers are typically remunerated by wages and salaries but may be paid for piece work or in kind.
The 'continuous' criterion used in the ILO definition is not applied here, because the data are often absent and because of country specificity.
Non-paid employees include contributing family workers, i.e. workers who hold a self-employment job in a market-oriented establishment operated by a related person living in the same household, and who cannot be regarded as partners because their degree of commitment to the operation of the establishment, in terms of working time or other factors, is not at a level comparable to that of the head of the establishment.
Employer is a business owner (whether alone or in partnership) with employees. If the only people working in the business are the owner and 'contributing family workers', the person is not considered an employer (as they have no employees) and is instead classified as own account.
Own-account or self-employment jobs are those where remuneration is directly dependent on the goods and services produced (where home consumption is considered to be part of the profits) and where the worker has not engaged any permanent employees to work for them on a continuous basis during the reference period.
Members of producers' cooperatives are workers who hold a self-employment job in a cooperative producing goods and services, in which each member takes part on an equal footing with other members in determining the organization of production, sales and/or other work of the establishment, the investments, and the distribution of the proceeds of the establishment amongst the members.
Other, workers not classifiable by status includes those for whom insufficient relevant information is available and/or who cannot be included in any of the preceding categories.

2 industry_orig_2
Label: Original industry codes
Description: This variable corresponds to whatever is in the original file for the second job, with no recoding.

3 industry_2
Label: 1 digit industry classification - second job
Codes: 1 = Agriculture, Hunting, Fishing, etc.; 2 = Mining; 3 = Manufacturing; 4 = Public Utility Services; 5 = Construction; 6 = Commerce; 7 = Transport and Communications; 8 = Financial and Business Services; 9 = Public Administration; 10 = Other Services, Unspecified
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the second job of any individual with a job (lstatus=1) and is missing otherwise. The codes for the second job are based on the UN International Standard Industrial Classification (revision 3.1). The main categories subsume the following codes:
1 Agriculture, Hunting, Fishing (01-05)
2 Mining (10-14)
3 Manufacturing (15-37)
4 Electricity and Utilities (40-41)
5 Construction (45)
6 Commerce (50-55)
7 Transportation, Storage and Communication (60-64)
8 Financial, Insurance and Real Estate (65-74)
9 Services: Public Administration (75)
10 Other Services (80-99)
In the case of different classifications (former Soviet Union republics, for example), recoding has been done to best match the ISIC 3.1 codes. Code 10 is also assigned to unspecified categories or items.

4 industry1_2
Label: 1 digit industry classification (Broad Economic Activities) - second job
Codes: 1 = Agriculture; 2 = Industry; 3 = Services; 4 = Other
Description: This variable is either created directly from the data (if an industry classification with 10 categories does not exist) or created from industrycat10 (here, industry_2). The following convention is used to go from 10 to 4 categories (based on ISIC):
gen industry1_2=.
replace industry1_2=1 if inlist(industry_2, 1)
replace industry1_2=2 if inlist(industry_2, 2, 3, 4, 5)
replace industry1_2=3 if inlist(industry_2, 6, 7, 8, 9)
replace industry1_2=4 if inlist(industry_2, 10)
assert industry1_2!=. if industry_2!=.

5 occup_2
Label: 1 digit occupational classification - second job
Codes: 1 = Managers; 2 = Professionals; 3 = Technicians and associate professionals; 4 = Clerical support workers; 5 = Service and sales workers; 6 = Skilled agricultural, forestry and fishery workers; 7 = Craft and related trades workers; 8 = Plant and machine operators, and assemblers; 9 = Elementary occupations; 10 = Armed forces occupations; 99 = Other/unspecified
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Classifies the second job of any individual with a job (lstatus=1) and is missing otherwise. As most surveys collected detailed information and then coded it, and the original data are not in the databases, no attempt has been made to correct or check the original coding. The classification is based on the International Standard Classification of Occupations (ISCO) 88. Where a different classification is used, recoding has been done to best match ISCO-88.

6 wage_no_compen_2
Label: Last wage payment - second job
Format: Continuous variable: wage in local currency.
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) will vary from country to country. States the second job's wage for any wage earner (lstatus=1 & empstat=1) and is missing otherwise. This excludes tips, compensations such as bonuses, dwellings or clothes, and other payments. For all those who are self-employed or own their own businesses, this should be net revenues (net of all costs EXCEPT tax payments) or the amount of salary taken from the business. Due to the almost complete lack of information on taxes, the wage is NOT net of taxes. By definition, non-paid employees (empstat=2) should have wage=0.
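The two industry mappings described in the industry_2 and industry1_2 entries (ISIC 3.1 two-digit divisions to 10 categories, then 10 categories to 4) can be sketched in Python as follows. This is a minimal re-expression of the manual's ranges and of its Stata replace/assert statements, not the team's production code; the function names are hypothetical, and treating unmatched division codes as category 10 is an assumption based on the note that 10 is also assigned to unspecified items.

```python
from typing import Optional

# ISIC Rev. 3.1 two-digit divisions subsumed by each industry_2 category,
# transcribed from the industry_2 entry.
ISIC31_DIVISIONS = {
    1: range(1, 6),      # Agriculture, hunting, fishing (01-05)
    2: range(10, 15),    # Mining (10-14)
    3: range(15, 38),    # Manufacturing (15-37)
    4: range(40, 42),    # Electricity and utilities (40-41)
    5: range(45, 46),    # Construction (45)
    6: range(50, 56),    # Commerce (50-55)
    7: range(60, 65),    # Transport, storage and communication (60-64)
    8: range(65, 75),    # Financial, insurance and real estate (65-74)
    9: range(75, 76),    # Public administration (75)
    10: range(80, 100),  # Other services (80-99)
}


def industry_10(division: Optional[int]) -> Optional[int]:
    """Map an ISIC 3.1 division to the 10-category industry_2 code."""
    if division is None:
        return None  # no industry reported: stays missing
    for category, divisions in ISIC31_DIVISIONS.items():
        if division in divisions:
            return category
    # Assumption: any other code is treated as unspecified (category 10).
    return 10


def industry_4(cat10: Optional[int]) -> Optional[int]:
    """Collapse industry_2 (10 categories) to industry1_2 (4 categories)."""
    if cat10 is None:
        return None
    if cat10 == 1:
        return 1  # Agriculture
    if cat10 in (2, 3, 4, 5):
        return 2  # Industry
    if cat10 in (6, 7, 8, 9):
        return 3  # Services
    if cat10 == 10:
        return 4  # Other
    # Mirrors the Stata check: assert industry1_2!=. if industry_2!=.
    raise ValueError(f"industry_2 out of range: {cat10}")
```

For example, ISIC 3.1 division 52 (retail trade) maps to industry_2 = 6 (Commerce) and then to industry1_2 = 3 (Services).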
7 bonuses_2
Description: Includes tips, compensations such as bonuses, dwellings or clothes, and other payments. Please annualize this value, considering the number of months worked in the firm and the periodicity of the bonuses.

8 unitwage_2
Label: Last wages time unit - second job
Codes: 1 = Daily; 2 = Weekly; 3 = Every two weeks; 4 = Every two months; 5 = Monthly; 6 = Quarterly; 7 = Every six months; 8 = Annually; 9 = Hourly; 10 = Other
Description: Type of reference for the wage variable. States the time unit of the second job's wage for any wage earner (lstatus=1 & empstat=1) and is missing otherwise.

9 whours_2
Label: Hours worked last week in secondary job
Format: Continuous variable: hours worked in the last week.
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Refers to the second job of any individual with a job (lstatus=1) and is missing otherwise. This is the number of hours worked in the last 7 days or the reference week in the person's second job (the main job being the occupation to which the person dedicated the most time). For persons absent from their job in the week preceding the survey due to holidays, vacation or sick leave, the time worked in the last week the person worked is recorded. (Note that the question is sometimes phrased as "On average, how many hours a week do you work?") For individuals who only report how many hours they work per day, with no information on the number of days worked per week, multiply the daily hours by 5. If the question asks for hours worked per month, divide by 4.2 to obtain weekly hours.

10 firmsize_l_2
Label: Firm size (lower bracket)
Description: Variable is constructed for all persons who are employed. If the source variable is continuous, it records the number of people working for the same employer. If the source variable is categorical, it records the lower boundary of the bracket.
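The hours-conversion rules stated for whours and whours_2 (hours per day multiplied by 5, hours per month divided by 4.2) can be sketched in Python. This is a minimal sketch; the function name and the `reported_per` labels describing how the question was asked are hypothetical, and None stands in for Stata's missing value.

```python
from typing import Optional

DAYS_PER_WEEK = 5      # whours rule: hours per day x 5
WEEKS_PER_MONTH = 4.2  # whours rule: hours per month / 4.2


def weekly_hours(hours: Optional[float], reported_per: str) -> Optional[float]:
    """Convert reported hours into weekly hours for whours / whours_2.

    reported_per is a hypothetical label for how the question was asked:
    "week" (last week, reference week or usual week), "day" (hours per day
    with no days-per-week information) or "month" (hours per month).
    """
    if hours is None:
        return None  # not reported: whours stays missing
    if reported_per == "week":
        return float(hours)
    if reported_per == "day":
        return float(hours) * DAYS_PER_WEEK
    if reported_per == "month":
        return float(hours) / WEEKS_PER_MONTH
    raise ValueError(f"unknown reporting period: {reported_per!r}")
```

For example, 8 hours a day gives 40 weekly hours, and 168 hours a month gives about 40 weekly hours (168 / 4.2).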
11 firmsize_u_2
Label: Firm size (upper bracket)
Description: Variable is constructed for all persons who are employed. If the source variable is continuous, it records the number of people working for the same employer. If the source variable is categorical, it records the upper boundary of the bracket. If the right bracket is open, the value should be left missing.

Note:
✓ For any variable not collected in a country, the variable should be created and left as missing (.) in the final harmonized file.
✓ Variables in the data files must follow the sequence in which they appear in the manual.

TABLE 4.5 EMPLOYMENT LAST 12-MONTHS
Create a tempfile with HID, PID and AGEYRS from the individual-level file; it will be used to create the labor file variables. The individual-level file provides the correct household composition and size. All variables are numeric unless specified otherwise.

1 lstatus_year
Label: Labor force status last 12-months
Codes: 1 = Employed; 0 = Not employed
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. All persons are considered active in the labor force if they presently have a job (formal or informal, i.e. are employed) or do not have a job but are actively seeking work (i.e. are unemployed).

2 empstat_year
Label: Employment status
Codes: 1 = Paid employee; 2 = Non-paid employee; 3 = Employer; 4 = Self-employed; 5 = Other, workers not classifiable by status
Description: Variable is constructed for all persons administered this module in each questionnaire. For this reason, the lower age cutoff (and perhaps upper age cutoff) at which information is collected will vary from country to country. Definitions are taken from the International Labour Organization's Classification of Status in Employment, with some revisions to take into account the data available. Classifies the main-job employment status of any individual with a job (lstatus=1) and is missing otherwise.
Paid employee includes anyone whose basic remuneration is not directly dependent on the revenue of the unit they work for; such workers are typically remunerated by wages and salaries but may be paid for piece work or in kind. The 'continuous' criterion used in the ILO definition is not applied here, because the data are often absent and because of country specificity.
Non-paid employees include contributing family workers, i.e. workers who hold a self-employment job in a market-oriented establishment operated by a related person living in the same household, and who cannot be regarded as partners because their degree of commitment to the operation of the establishment, in terms of working time or other factors, is not at a level comparable to that of the head of the establishment.
Employer is a business owner (whether alone or in partnership) with employees. If the only people working in the business are the owner and 'contributing family workers', the person is not considered an employer (as they have no employees) and is instead classified as own account.
Own-account or self-employment jobs are those where remuneration is directly dependent on the goods and services produced (where home consumption is considered to be part of the profits) and where the worker has not engaged any permanent employees to work for them on a continuous basis during the reference period.
Members of producers' cooperatives are workers who hold a self-employment job in a cooperative producing goods and services, in which each member takes part on an equal footing with other members in determining the organization of production, sales and/or other work of the establishment, the investments, and the distribution of the proceeds of the establishment amongst the members.
Other, workers not classifiable by status includes those for whom insufficient relevant information is available and/or who cannot be included in any of the preceding categories.

3 njobs_year
Label: Number of total jobs in last year
Format: Continuous variable

4 firmsize_l_year
Label: Firm size (lower bracket)
Description: Variable is constructed for all persons who are employed. If the source variable is continuous, it records the number of people working for the same employer. If the source variable is categorical, it records the lower boundary of the bracket.

5 firmsize_u_year
Label: Firm size (upper bracket)
Description: Variable is constructed for all persons who are employed. If the source variable is continuous, it records the number of people working for the same employer. If the source variable is categorical, it records the upper boundary of the bracket. If the right bracket is open, the value should be left missing.

6 whours_year
Label: Hours of work in a typical week in the 12-month job

ANNEX I: THE METHODOLOGY OF CONSTRUCTING EMPLOYMENT VARIABLES
The construction of employment and labor participation variables is specific to the Sub-Saharan African context, since over 80 percent of employment activities are in the informal sector. Studying labor participation in Tanzania, Bardasi, Beegle, Dillon and Serneels found that, due to poor questionnaire design, many unpaid family workers underreported their economic activities, especially women who reported domestic duties as their main activity.
These individuals inevitably undertake some unpaid economic activities, such as cultivating, raising livestock, fetching water and collecting wood (preparing meals for the family and caring for one's own children are not classified as economic activities under the ILO definition). The Sub-Saharan Team for Statistical Development developed complementary steps to capture these underreported economic activities. Additionally, because of informal economic activities and underreported employment, there are often many missing values for the industry of employment based on the ISIC code. To remedy this, we create a variable classifying industry into farm and non-farm sectors, which can be gleaned from other modules of the survey, such as the farm, household enterprise and time use modules. The steps outlined below are designed to capture, to the greatest extent possible, actual employment status, including that of women who work from home while taking care of household responsibilities, and of students who help with fetching wood and water.

Construction of labor variables
In the SHIP manual, we construct labor force participation first (Table 1). Please note that we code SHIP employment variables based on both 7-day and 12-month information. We follow the ILO definition broadly, with supplementary steps to capture underreported employment. It is important to keep in mind that supplementary steps only replace missing values generated by previous steps. The final employment variable of interest is EMP_CAT_1 (Table 5), which provides statistics to monitor structural changes in employment, classified into five categories as follows:
11 Wage public
12 Wage private non-agriculture
13 Wage private agriculture
21 Self-employed non-agriculture (household enterprises)
22 Self-employed agriculture (farmers)
It should be noted that there is a small overlap between employment by the SHIP definition and unemployment by the ILO definition.
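The five EMP_CAT_1 categories can be derived from EMPTYPE_WB_1, EMPSEC_1 and EMPFRM_1 using the conditions of Table 5. The Python sketch below is illustrative only, with a hypothetical function name; it assumes EMPSEC_1 is coded as Table 5 uses it (1 or 2 = public sector or state-owned company, 3 = private sector) and that the individual's values are already available as plain integers and booleans.

```python
from typing import Optional

# EMPTYPE_WB_1 groups used in Table 5.
WAGE_TYPES = {1, 4, 5, 7, 9}  # wage, employer, domestic employee, apprentice, volunteer
SELF_TYPES = {2, 3, 6}        # self-employed with/without employees, family worker


def emp_cat_1(emptype: Optional[int], empsec: Optional[int],
              empfrm: Optional[bool]) -> Optional[int]:
    """Return EMP_CAT_1 (11, 12, 13, 21 or 22), or None if no Table 5 rule applies.

    empsec: assumed coding 1 or 2 = public/state-owned, 3 = private.
    empfrm: True only when EMPFRM_1=YES; False or None both count as
    "EMPFRM_1 is not YES", as in the Table 5 conditions.
    """
    agriculture = empfrm is True
    if emptype in WAGE_TYPES:
        if empsec in (1, 2):
            return 11                          # wage public
        if empsec == 3:
            return 13 if agriculture else 12   # wage private agriculture / non-agriculture
        return None                            # sector unknown: no rule applies
    if emptype in SELF_TYPES:
        return 22 if agriculture else 21       # family farmer / family enterprise
    return None                                # e.g. EMPTYPE_WB_1 = 99
```

Public-sector wage workers are assigned 11 regardless of EMPFRM_1, because the Table 5 row for code 11 does not condition on agricultural activity.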
By the ILO definition, anyone who is without a job and looking for a job in a reference period (normally from 7 days to 2 weeks) should be classified as unemployed. Because of the large proportion of informal employment and the near absence of unemployment benefits, the line between employed and unemployed is often blurred. However, this small overlap has little significance in analyzing labor force participation.

Below are summary tables on the construction of the most important SHIP employment variables. The sequence corresponds to the precedence of information, i.e. later steps only replace missing values generated by previous steps.

Table 1: Construction of LABFORCE_WB
LABFORCE_WB = YES (in the labor force) if ANY of the following conditions is satisfied:
WORKED_7=YES: Did the individual have any kind of employment for any duration of time within the last 7-day recall period?
WORKED_12=YES: Did the individual have any kind of employment for any duration of time within the last 12-month recall period?
ABSENT=YES: If the individual did not work in the past 7 days, did he/she have a job to return to?
LOOKJOB=YES: Did the individual look for a job in the reference period (7 days, 2 weeks or 4 weeks)?
EMPTYPE_WB_1≠99: Type of employment based on information in the employment section and other sections of the survey (Table 3 and Flowchart 1).
LABFORCE_WB = NO if ALL of the following conditions are satisfied: WORKED_7=NO; WORKED_12=NO; ABSENT=NO; LOOKJOB=NO; EMPTYPE_WB_1=99.
LABFORCE_WB = MISSING if ALL of the following conditions are satisfied: WORKED_7=.; WORKED_12=.; ABSENT=.; LOOKJOB=.; EMPTYPE_WB_1=99.

Table 2: Construction of EMPLOYED_WB
EMPLOYED_WB = YES if ANY of the following conditions is satisfied:
WORKED_7=YES: Did the individual have any kind of employment for any duration of time within the last 7-day recall period? It is based on an employment screening question, otherwise missing.
WORKED_12=YES: Did the individual have any kind of employment for any duration of time within the last 12-month recall period? It is based on an employment screening question, otherwise missing.
ABSENT=YES: If the individual did not work in the past 7 days, did he/she have a job to return to?
EMPTYPE_WB_1≠99: Type of employment based on information in the employment section and other sections of the survey (Table 3 and Flowchart 1).
EMPLOYED_WB = NO if LABFORCE_WB=YES and EMPTYPE_WB_1=99: the individual is part of the labor force but does not have a job as identified by the above four variables.
EMPLOYED_WB = MISSING if LABFORCE_WB=.: there is no information on labor force participation.

Description of EMPTYPE_WB_1
EMPTYPE_WB_1 is the most important variable in the SHIP labor-force module. Information is sought from all sections of the survey, including the household enterprise and farming sections. By design, it has no missing values: individuals for whom no information can be found (including "not known", "other" and no response/not applicable) are coded as 99. EMPTYPE_WB_1 classifies the type of employment into 8 different types.

Table 3: Description of codes in EMPTYPE_WB_1 (Code, Label: Definition)
1 Wage and salaried worker: An individual employed by non-household members who is paid in cash or in kind on a regular basis or based on a task. Agricultural and non-agricultural laborers are included in this category.
2 Self-employed with employees: An individual who runs a farm or a non-agricultural enterprise and employs at least one non-household member. Some surveys have information on the employment of non-household members, which is used to define this category.
3 Self-employed without employees: An individual who runs a farm or a non-agricultural enterprise and DOES NOT employ any non-household member. If the survey does not have information on the employment of non-household members, individuals are by default classified in this category. Some surveys have a category called 'own account worker'; such individuals are classified in this category.
4 Employer: Refers to the owner of a business with employees, irrespective of agricultural or non-agricultural sector. Individuals are classified here only if the survey explicitly has 'employer' as a category.
5 Domestic employee: An individual who works for a domestic household. Surveys that have a question on job description may have this information.
6 Family worker: A paid or unpaid worker who assists in the work on a farm or a non-farm enterprise.
7 Apprentice: Individuals who are apprentices, whether paid or unpaid.
9 Volunteer: An individual who is a volunteer, stated as an explicit category in the questionnaire.
99 Other/Not known/missing: All individuals who respond "other" or "not known", or who have a non-response value. Additional information from other sections of the survey is used to reclassify these individuals into one of the above categories wherever possible.

Flowchart 1: Construction of EMPTYPE_WB_1 (primary employment)
Step 1. Starting point: all individuals in the survey.
Step 2. Type of employment (EMPTYPE_WB_1) based on employment reported over a 7-day recall period. Volunteers are coded as 9; no information (missing/other/not known) is coded as 99. Code 1 to 7: classified. No information/volunteer (99/9): go to Step 3.
Step 3. Type of employment based on employment reported over a 12-month recall period. (From this step onwards, the flowchart is not applicable to the secondary employment variable, EMPTYPE_WB_2.) Code 1 to 7: classified. No information/volunteer (99/9): go to Step 4.
Step 4. If individuals can be identified as working from other modules, such as the household enterprise or farming sections, they are classified as self-employed with/without employees or as family workers. Code 2, 3 or 6: classified. No information/volunteer (99/9): go to Step 5.
Step 5. If a household has a household enterprise or a farm, its head is classified as self-employed without employees. This is implied information regarding the individual. Code 3: classified. No information/volunteer (99/9): go to Step 6.
Step 6. Is the reported reason for not working household duties? Yes: go to Step 7. No: not employed. If no questions were asked regarding the performance of household duties, stop at this step; the answer "No" means "Not employed".
Step 7. Does the family engage in agricultural activities? Yes: the individual is coded as employed at home (EMPLHOME) and classified as a family worker (code 6). No: not employed.

Table 4: Description of EMPFRM and EMPSEC
EMPFRM: Coded as 'yes' if the individual is employed in agricultural activities. Activities in the agriculture sector include regular farming, sharecropping, raising livestock, bee-keeping, fishing, logging and hunting. In addition, if the household engages in agricultural activities and the head is reported as not working, he/she is classified as being in the agriculture sector.
EMPSEC: Captures the type of employment establishment: 1) public, 2) private and 3) state-owned enterprise. It is based only on an explicit question in the survey to this effect.

EMP_CAT_1 is derived from the following underlying variables:
1. EMPTYPE_WB_1 (Table 3 and Flowchart 1)
2. EMPFRM_1 (Table 4)
3. EMPSEC_1 (Table 4)

Table 5: Construction of EMP_CAT_1
EMP_CAT_1=11 (Wage public) requires:
EMPTYPE_WB_1 = 1, 4, 5, 7, 9: Type of employment is wage, employer, domestic employee, apprentice or volunteer.
EMPSEC_1 = 1, 2: Individual worked either in the public sector or in a state-owned company.
EMP_CAT_1=12 (Wage private non-agriculture) requires:
EMPTYPE_WB_1 = 1, 4, 5, 7, 9: Type of employment is wage, employer, domestic employee, apprentice or volunteer.
EMPSEC_1 = 3: Individual worked in the private sector.
EMPFRM_1≠YES: Individual was NOT employed in agricultural activities.
EMP_CAT_1=13 (Wage private agriculture) requires:
EMPTYPE_WB_1 = 1, 4, 5, 7, 9: Type of employment is wage, employer, domestic employee, apprentice or volunteer.
EMPSEC_1 = 3: Individual worked in the private sector.
EMPFRM_1=YES: Individual was employed in agricultural activities.
EMP_CAT_1=21 (Family enterprise non-agriculture) requires:
EMPTYPE_WB_1 = 2, 3, 6: Type of employment is self-employed with or without employees, or family worker.
EMPFRM_1≠YES: Individual was NOT employed in agricultural activities.
EMP_CAT_1=22 (Family farmer) requires:
EMPTYPE_WB_1 = 2, 3, 6: Type of employment is self-employed with or without employees, or family worker.
EMPFRM_1=YES: Individual was employed in agricultural activities.

ANNEX II: INTERNATIONAL STANDARD INDUSTRIAL CLASSIFICATION OF ALL ECONOMIC ACTIVITIES (ISIC)
ISIC REV. 4.0 CATEGORIES
A - Agriculture, forestry and fishing
01 - Crop and animal production, hunting and related service activities
02 - Forestry and logging
03 - Fishing and aquaculture
B - Mining and quarrying
05 - Mining of coal and lignite
06 - Extraction of crude petroleum and natural gas
07 - Mining of metal ores
08 - Other mining and quarrying
09 - Mining support service activities
C - Manufacturing
10 - Manufacture of food products
11 - Manufacture of beverages
12 - Manufacture of tobacco products
13 - Manufacture of textiles
14 - Manufacture of wearing apparel
15 - Manufacture of leather and related products
16 - Manufacture of wood and of products of wood and cork, except furniture; manufacture of articles of straw and plaiting materials
17 - Manufacture of paper and paper products
18 - Printing and reproduction of recorded media
19 - Manufacture of coke and refined petroleum products
20 - Manufacture of chemicals and chemical products
21 - Manufacture of basic pharmaceutical products and pharmaceutical preparations
22 - Manufacture of rubber and plastics products
23 - Manufacture of other non-metallic mineral products
24 - Manufacture of basic metals
25 - Manufacture of fabricated metal products, except machinery and equipment
26 - Manufacture of computer, electronic and optical products
27 - Manufacture of electrical equipment
28 - Manufacture of machinery and equipment n.e.c.
29 - Manufacture of motor vehicles, trailers and semi-trailers
30 - Manufacture of other transport equipment
31 - Manufacture of furniture
32 - Other manufacturing
33 - Repair and installation of machinery and equipment
D - Electricity, gas, steam and air conditioning supply
35 - Electricity, gas, steam and air conditioning supply
E - Water supply; sewerage, waste management and remediation activities
36 - Water collection, treatment and supply
37 - Sewerage
38 - Waste collection, treatment and disposal activities; materials recovery
39 - Remediation activities and other waste management services
F - Construction
41 - Construction of buildings
42 - Civil engineering
43 - Specialized construction activities
G - Wholesale and retail trade; repair of motor vehicles and motorcycles
45 - Wholesale and retail trade and repair of motor vehicles and motorcycles
46 - Wholesale trade, except of motor vehicles and motorcycles
47 - Retail trade, except of motor vehicles and motorcycles
H - Transportation and storage
49 - Land transport and transport via pipelines
50 - Water transport
51 - Air transport
52 - Warehousing and support activities for transportation
53 - Postal and courier activities
I - Accommodation and food service activities
55 - Accommodation
56 - Food and beverage service activities
J - Information and communication
58 - Publishing activities
59 - Motion picture, video and television programme production, sound recording and music publishing activities
60 - Programming and broadcasting activities
61 - Telecommunications
62 - Computer programming, consultancy and related activities
63 - Information service activities
K - Financial and insurance activities
64 - Financial service activities, except insurance and pension funding
65 - Insurance, reinsurance and pension funding, except compulsory social security
66 - Activities auxiliary to financial service and insurance activities
L - Real estate activities
68 - Real estate activities
M - Professional, scientific and technical activities
69 - Legal and accounting activities
70 - Activities of head offices; management consultancy activities
71 - Architectural and engineering activities; technical testing and analysis
72 - Scientific research and development
73 - Advertising and market research
74 - Other professional, scientific and technical activities
75 - Veterinary activities
N - Administrative and support service activities
77 - Rental and leasing activities
78 - Employment activities
79 - Travel agency, tour operator, reservation service and related activities
80 - Security and investigation activities
81 - Services to buildings and landscape activities
82 - Office administrative, office support and other business support activities
O - Public administration and defence; compulsory social security
84 - Public administration and defence; compulsory social security
P - Education
85 - Education
Q - Human health and social work activities
86 - Human health activities
87 - Residential care activities
88 - Social work activities without accommodation
R - Arts, entertainment and recreation
90 - Creative, arts and entertainment activities
91 - Libraries, archives, museums and other cultural activities
92 - Gambling and betting activities
93 - Sports activities and amusement and recreation activities
S - Other service activities
94 - Activities of membership organizations
95 - Repair of computers and personal and household goods
96 - Other personal service activities
T - Activities of households as employers; undifferentiated goods- and services-producing activities of households for own use
97 - Activities of households as employers of domestic personnel
98 - Undifferentiated goods- and services-producing activities of private households for own use
U - Activities of extraterritorial organizations and bodies
99 - Activities of extraterritorial organizations and bodies

When the ISIC 4.0 categories are used in a survey, we use the following mapping:
A = Agriculture and fishing
B = Mining and quarrying
C = Manufacturing
D + E = Electricity, gas and water supply
F = Construction
G + I = Commerce
H + J + N (code 79) = Transport, storage and communication
K + L = Financial, insurance and real estate
O = Public administration
M + N (excluding code 79) + P + Q + R + S + T + U = Other services

ISIC REV. 3.1 CATEGORIES
A - Agriculture, hunting and forestry
01 - Agriculture, hunting and related service activities
02 - Forestry, logging and related service activities
B - Fishing
05 - Fishing, operation of fish hatcheries and fish farms; service activities incidental to fishing
C - Mining and quarrying
10 - Mining of coal and lignite; extraction of peat
11 - Extraction of crude petroleum and natural gas; service activities incidental to oil and gas extraction, excluding surveying
12 - Mining of uranium and thorium ores
13 - Mining of metal ores
14 - Other mining and quarrying
D - Manufacturing
15 - Manufacture of food products and beverages
16 - Manufacture of tobacco products
17 - Manufacture of textiles
18 - Manufacture of wearing apparel; dressing and dyeing of fur
19 - Tanning and dressing of leather; manufacture of luggage, handbags, saddlery, harness and footwear
20 - Manufacture of wood and of products of wood and cork, except furniture; manufacture of articles of straw and plaiting materials
21 - Manufacture of paper and paper products
22 - Publishing, printing and reproduction of recorded media
23 - Manufacture of coke, refined petroleum products and nuclear fuel
24 - Manufacture of chemicals and chemical products
25 - Manufacture of rubber and plastics products
26 - Manufacture of other non-metallic mineral products
27 - Manufacture of basic metals
28 - Manufacture of fabricated metal products, except machinery and equipment
29 - Manufacture of machinery and equipment n.e.c.
30 - Manufacture of office, accounting and computing machinery
31 - Manufacture of electrical machinery and apparatus n.e.c.
32 - Manufacture of radio, television and communication equipment and apparatus
33 - Manufacture of medical, precision and optical instruments, watches and clocks
34 - Manufacture of motor vehicles, trailers and semi-trailers
35 - Manufacture of other transport equipment
36 - Manufacture of furniture; manufacturing n.e.c.
37 - Recycling
E - Electricity, gas and water supply
40 - Electricity, gas, steam and hot water supply
41 - Collection, purification and distribution of water
F - Construction
45 - Construction
G - Wholesale and retail trade; repair of motor vehicles, motorcycles and personal and household goods
50 - Sale, maintenance and repair of motor vehicles and motorcycles; retail sale of automotive fuel
51 - Wholesale trade and commission trade, except of motor vehicles and motorcycles
52 - Retail trade, except of motor vehicles and motorcycles; repair of personal and household goods
H - Hotels and restaurants
55 - Hotels and restaurants
I - Transport, storage and communications
60 - Land transport; transport via pipelines
61 - Water transport
62 - Air transport
63 - Supporting and auxiliary transport activities; activities of travel agencies
64 - Post and telecommunications
J - Financial intermediation
65 - Financial intermediation, except insurance and pension funding
66 - Insurance and pension funding, except compulsory social security
67 - Activities auxiliary to financial intermediation
K - Real estate, renting and business activities
70 - Real estate activities
71 - Renting of machinery and equipment without operator and of personal and household goods
72 - Computer and related activities
73 - Research and development
74 - Other business activities
L - Public administration and defence; compulsory social security
75 - Public administration and defence; compulsory social security
M - Education
80 - Education
N - Health and social work
85 - Health and social work
O - Other community, social and personal service activities
90 - Sewage and refuse disposal, sanitation and similar activities
91 - Activities of membership organizations n.e.c.
92 - Recreational, cultural and sporting activities
93 - Other service activities
P - Activities of private households as employers and undifferentiated production activities of private households
95 - Activities of private households as employers of domestic staff
96 - Undifferentiated goods-producing activities of private households for own use
97 - Undifferentiated service-producing activities of private households for own use
Q - Extra-territorial organizations and bodies
99 - Extra-territorial organizations and bodies

When the ISIC 3.1 categories are used in a survey, we use the following mapping:
A + B = Agriculture and fishing
C = Mining and quarrying
D = Manufacturing
E = Public utility services
F = Construction
G + H = Commerce
I = Transport, storage and communication
J + K = Financial and business services
L + M + N = Public administration
O + P + Q = Other services

ANNEX II: INTERNATIONAL STANDARD CLASSIFICATION OF OCCUPATIONS (ISCO)

The International Standard Classification of Occupations (ISCO) is one of the main international classifications for which the ILO is responsible. It belongs to the international family of economic and social classifications. ISCO is a tool for organizing jobs into a clearly defined set of groups according to the tasks and duties undertaken in the job. Its main objectives are to provide:
• a basis for the international reporting, comparison and exchange of statistical and administrative data about occupations;
• a model for the development of national and regional classifications of occupations; and
• a system that can be used directly in countries that have not developed their own national classifications.
It is intended for use in statistical applications and in a variety of client-oriented applications.
Client-oriented applications include matching job seekers with job vacancies, managing the short- or long-term migration of workers between countries, and developing vocational training programmes and guidance. The first version of ISCO was adopted in 1957 by the Ninth International Conference of Labour Statisticians (ICLS) and is known as ISCO-58. It was superseded by ISCO-68, adopted by the Eleventh ICLS in 1966. The third version, ISCO-88, was adopted by the Fourteenth ICLS in 1987. Many current national occupational classifications are based on one of these three ISCO versions. ISCO was subsequently updated to take into account developments in the world of work since 1988 and to make improvements in light of experience gained in using ISCO-88. The update did not change the basic principles and top structure of ISCO-88, but significant structural changes were made in some areas. The updated classification was adopted in December 2007 and is known as ISCO-08. Many countries are now updating their national classifications, either basing them on ISCO-08 or improving their alignment with the new international statistical standard. The resolution adopting ISCO-08, the classification structure and the correspondence tables with ISCO-88 are available on the ISCO website in English, French and Spanish. Final definitions of the ISCO-08 groups are currently available on this website in English only. The structure, definitions, correspondence tables, and an introduction summarising the updating process, outlining the methodology and conceptual model used and describing the main differences between ISCO-88 and ISCO-08 will be released in book form as ISCO-08 Volume 1. An index of occupation titles in both alphabetical and numerical order will also be made available on the website and subsequently in book form as ISCO-08 Volume 2.
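The ISIC Rev. 4 and Rev. 3.1 sector mappings given before this annex are, in effect, two lookup tables from section letters to ten broad sectors, with one exception: ISIC Rev. 4 division 79 (travel agencies), which sits in section N, is assigned to transport. The following is a minimal Python sketch, assuming the survey records the ISIC section letter and, for section N, the two-digit division. The function and table names are hypothetical, and the sector labels are copied from the mappings.

```python
# Sector labels and section groupings copied from the two mappings in the text.
ISIC4_GROUPS = {
    "Agriculture and fishing": "A",
    "Mining and quarrying": "B",
    "Manufacturing": "C",
    "Electricity, gas and water supply": "DE",
    "Construction": "F",
    "Commerce": "GI",
    "Transport, storage and communication": "HJ",  # plus division 79 of section N
    "Financial, insurance and real estate": "KL",
    "Public administration": "O",
    "Other services": "MNPQRSTU",  # section N here excludes division 79
}

ISIC31_GROUPS = {
    "Agriculture and fishing": "AB",
    "Mining and quarrying": "C",
    "Manufacturing": "D",
    "Public utility services": "E",
    "Construction": "F",
    "Commerce": "GH",
    "Transport, storage and communication": "I",
    "Financial and business services": "JK",
    "Public administration": "LMN",
    "Other services": "OPQ",
}


def invert(groups):
    """Turn {sector: section letters} into {section letter: sector}."""
    return {letter: sector for sector, letters in groups.items() for letter in letters}


ISIC4_SECTOR = invert(ISIC4_GROUPS)
ISIC31_SECTOR = invert(ISIC31_GROUPS)


def sector_from_isic4(section, division=None):
    """Broad sector for an ISIC Rev. 4 section letter (hypothetical helper)."""
    section = section.strip().upper()
    if section == "N" and division == 79:  # travel agencies go to transport
        return "Transport, storage and communication"
    return ISIC4_SECTOR[section]  # KeyError flags an invalid section letter


def sector_from_isic31(section):
    """Broad sector for an ISIC Rev. 3.1 section letter (hypothetical helper)."""
    return ISIC31_SECTOR[section.strip().upper()]


print(sector_from_isic4("N", 79))  # Transport, storage and communication
print(sector_from_isic4("N", 80))  # Other services
print(sector_from_isic31("g"))     # Commerce
```

An unknown section letter raises a `KeyError` instead of silently falling into "Other services". This means that coding errors in the industry variable surface during harmonization rather than inflating the residual sector.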
ANNEX III: DATA CHECKS AND EVALUATION

Statistics play a key role in the work of many organizations, and producing good statistics means that data evaluation should be one of the key phases of any statistical operation. Data evaluation is the process of assessing the final statistical product in terms of accuracy and reliability. It should be done before the standardized files are created, because the data should first be edited to a given standard of accuracy.

Two common types of evaluation are:

(a) Validation and certification

This ensures that erroneous data are not released. It is conducted in conjunction with an interpretative analysis of the data. Given time constraints, basic methods include:
- Checks of consistency with external sources of data or with other surveys;
- Internal data consistency checks;
- Unit-by-unit reviews for aggregate estimates, especially consumption, expenditure and income data;
- “Reasonableness” or “rationality” checks by subject matter specialists;
- Calculation of data quality indicators such as non-response rates.

Eliminating duplicates: Before aggregation, make sure that there are no duplicated household or individual IDs. (This problem should have been eliminated at the data entry stage; this is just a quick check.) There are two types of duplicate household ID. In the first, one household was entered twice, in which case all other variables, such as the physical features of the dwelling and the rent paid, should also be identical. One of the duplicated observations should then be eliminated (Stata can perform this task easily). In the second, two or more different households share the same household ID. This type of duplicate is hard to detect because the records are easily mistaken for a single household. However, an unusually large household size, such as over 30 members, should warrant a check to make sure that the records do not belong to two households.
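The two duplicate-ID cases can be told apart mechanically: a household entered twice has identical values on every variable, while two different households sharing an ID do not. The following is a minimal Python sketch using only the standard library. It assumes that each household record is a flat dict with a hypothetical `hhid` key and hashable values, and that member counts per household are available as a separate dict. In Stata, the same step is usually handled with the `duplicates` command.

```python
from collections import defaultdict


def classify_household_duplicates(households, id_key="hhid"):
    """Split duplicated household IDs into two lists.

    exact:       the same household entered twice (all variables identical)
    conflicting: different households that share one ID
    """
    by_id = defaultdict(list)
    for rec in households:
        by_id[rec[id_key]].append(rec)

    exact, conflicting = [], []
    for hhid, recs in by_id.items():
        if len(recs) < 2:
            continue  # ID is unique, nothing to do
        if all(r == recs[0] for r in recs[1:]):
            exact.append(hhid)
        else:
            conflicting.append(hhid)
    return exact, conflicting


def drop_exact_duplicates(households):
    """Keep the first copy of each fully identical record, preserving order."""
    seen, kept = set(), []
    for rec in households:
        key = tuple(sorted(rec.items()))  # requires hashable values
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def flag_large_households(sizes, threshold=30):
    """IDs whose member count exceeds the threshold (30 in the text)."""
    return [hhid for hhid, size in sizes.items() if size > threshold]


records = [
    {"hhid": 1, "rooms": 3, "rent": 100},
    {"hhid": 1, "rooms": 3, "rent": 100},  # entered twice
    {"hhid": 2, "rooms": 2, "rent": 50},
    {"hhid": 2, "rooms": 5, "rent": 0},    # a different household, same ID
]
print(classify_household_duplicates(records))  # ([1], [2])
print(len(drop_exact_duplicates(records)))     # 3
```

Only exact duplicates are dropped automatically. Conflicting IDs and very large households are returned for manual review, because deciding which members belong to which household requires looking at the questionnaire.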
When the same ID is assigned to two or more household members and they have exactly the same information (age, sex, enrollment status, etc.), this is usually a double entry of the same member, and one of the observations should be eliminated. However, if they are two different individuals, one of them should be reassigned a new ID.

For any data set, the following minimum standards should be reviewed. The list is not exhaustive but provides a starting point for data editing and analysis.

i) All coded variables should contain only valid, in-range values that carry labels. For example, if sex has two value labels (male = 1 and female = 2, or vice versa), no other values and no missing values are permitted.

ii) The serial or identification number of each member should be unique: no two members may share the same identifier. The range is 1 to n, where n is the number of household members, and member numbers must be consecutive starting with 1. A quick check is to compare the number of household members with n, the highest member number in the household. If these do not match the actual household size, an error may have occurred during data entry, a household member may have been omitted, or persons belonging to other households may have been included.

iii) There must be one and only one head of household.

iv) The spouse of the head of household should be of the opposite sex to the head:
Sex of head ≠ sex of spouse

v) Check the age of the household head. Persons younger than 15 years should not be heads. Age and relationship to the head should be cross-checked:
Head: age > minimum (say 15 years)
Child of head: age < age of head - minimum
Parent of head: age > age of head + minimum
However, there may be heads younger than 15 years; such cases must be documented.

vi) The marital status of the head and spouse should be identical. If the number of spouses is greater than 1, the head of household and the spouses are in a polygamous union. More generally, the presence of a spouse implies that the head of household is married.

vii) Ages greater than 99 should be recoded appropriately, as missing values are not permitted. Only household heads will have an imputed age if age is missing.

viii) Check age against the marital status of household members. First, all children aged 5 years or younger must be single. Children aged 5-12 should be single; however, depending on culture and in exceptional cases, one may find 10-12 year olds who are married.

ix) Literacy should be coded appropriately. If a criterion is in place (say, above a minimum age), then anyone below the minimum age should have missing values. For example, in some countries literacy is asked only of persons aged 15 years and above (referred to here as adult literacy). This should be checked: all other persons must have missing or system-missing values. If literacy is asked of all persons regardless of age, no missing values are permitted. Check literacy against age: a child younger than 5 years can be taken, with some degree of accuracy, to be illiterate.

x) School attendance should be coded appropriately. Children younger than 3 years will usually not be in school. Children aged 4 and 5 may be in pre-school, which is counted as school attendance. The definition of pre-school is country-specific. School attendance can be cross-checked with the education level attained.

xi) Education level reached or attained. In most cases, children younger than 3 years will not be in school and children younger than 5 years will not be in primary school, although children aged 4-5 may be in pre-school. Because the definition of pre-school is country-specific, one may find children younger than 3 recorded as in school who are not in formal schooling, so the education definition should be checked with the country in question. A few exceptions may exist of 4-5 year olds in primary school.

xii) Years of education completed: age of member > number of years in school. One can guesstimate a difference of at least about 3 years between age and number of years in school; otherwise, code the value to missing.

xiii) Morbidity (sickness) should be recorded for all persons, and no missing values are permitted. Only two responses are allowed: Yes and No. If the type of sickness is recorded but morbidity is missing (and some individual-level health expenditure may also be present), morbidity should be set to Yes.

xiv) Household size must be greater than zero.

xv) Number of rooms must be greater than zero.

xvi) Age in months can be calculated if the date of interview and the date of birth are present.

xvii) For welfare aggregate data, country-specific requirements will be considered and assumed to be correct, within range and reasonable. Expenditure is one of the most difficult files to edit. There are no hard and fast rules; each country should be examined independently of the others. Consumption expenditure patterns are unique to each country, so one size does not fit all.

(b) Sources of error reviews

These provide quantitative information on specific sources of error in the data. Errors can arise during field data collection and/or at the editing stage. The results of these reviews are only available after the official release of the data. This does not render the earlier results void; rather, it helps improve data collection methodologies and techniques in the next round. Sources of error include:

(i) Sampling errors
These occur when results are based on a sample instead of the entire population. The sampling error measures the variability between all possible samples and can be evaluated statistically. Sampling approaches are grouped into:
✓ Probability sampling – This is used where registers are available, and the accuracy of estimates can be assessed for each variable. When estimating from a probability random sample, a measure of the accuracy of an estimate is the square root of the mean square error, where the mean square error equals the sampling variance plus the squared bias.
✓ Non-probability sampling – National Statistical Institutes may use expert samples based on a high coverage of relevant characteristics. In such circumstances, it is impossible to obtain an objective assessment of the accuracy of the estimates. However, a rough indication of accuracy can be obtained using sensitivity analyses.

(ii) Non-sampling errors
Non-sampling errors are impossible to avoid and difficult to evaluate statistically. They include:
✓ Coverage errors – omissions, erroneous inclusions and frame duplication while conducting the survey. These affect every estimate produced and may bias the results, and their effect can vary among sub-groups.
✓ Non-response errors – these occur when the intended sample size is not attained. The smaller sample increases the variance, and estimates may also be biased if non-respondents differ systematically from respondents. Non-response can be corrected either through imputation or by adjusting the weights of the responding units.
✓ Measurement errors – these occur when measured values differ from true values at the time of data collection. They may be caused by the:
- Interviewer: the interviewer may influence the respondent in such a way that measurement errors arise.
- Respondent (e.g. lack of understanding of a survey question, respondent fatigue, a long recall period).
- Information system (e.g. the reference period requested may differ from the period covered by the respondent's records, such as calendar year versus accounting year).
- Survey instrument, although this is difficult to assess objectively. A description of the pilot survey and the conclusions of its analysis are needed to assess the questionnaire. Errors also arise from long questionnaires and vague questions.
- Mode of data collection (interviewing technique: face-to-face, telephone, self-administered, etc.).
✓ Processing errors – these occur during post-collection processes such as data validation, editing, coding, imputation, capture and tabulation.
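Several of the minimum standards for validation can be run household by household: valid sex codes, consecutive member numbers, a single head, an opposite-sex spouse, and the head/child/parent age cross-checks. The following is a minimal Python sketch. It assumes that member records are dicts with hypothetical keys `pid`, `sex`, `rel` and `age`, and it uses hypothetical relationship codes. The 15-year minimum is the text's "say 15 years" and should be set per country. The function reports problems rather than fixing them, because some flagged cases (for example, heads younger than 15) are legitimate and only need documenting.

```python
# Hypothetical codes: replace with the survey's own value labels.
MALE, FEMALE = 1, 2
HEAD, SPOUSE, CHILD, PARENT = 1, 2, 3, 4
MIN_AGE = 15  # "say 15 years" in the text; country-specific


def check_household(members, min_age=MIN_AGE):
    """Return a list of problems found in one household's member records."""
    if not members:
        return ["household size must be greater than zero"]
    problems = []

    # Coded variables: sex must be a labelled value; missing is not allowed.
    for m in members:
        if m.get("sex") not in (MALE, FEMALE):
            problems.append(f"member {m['pid']}: invalid sex code {m.get('sex')!r}")

    # Member numbers must be unique and consecutive, running 1..n.
    pids = sorted(m["pid"] for m in members)
    if pids != list(range(1, len(members) + 1)):
        problems.append(f"member numbers {pids} are not 1..{len(members)}")

    # There must be one and only one head.
    heads = [m for m in members if m["rel"] == HEAD]
    if len(heads) != 1:
        problems.append(f"{len(heads)} heads found; expected exactly one")
        return problems  # the remaining checks need a single head

    head = heads[0]
    if head["age"] < min_age:  # allowed, but must be documented
        problems.append(f"head aged {head['age']} is younger than {min_age}")
    for m in members:
        if m["rel"] == SPOUSE and m["sex"] == head["sex"]:
            problems.append(f"member {m['pid']}: spouse has the same sex as the head")
        if m["rel"] == CHILD and not m["age"] < head["age"] - min_age:
            problems.append(f"member {m['pid']}: child is too old relative to the head")
        if m["rel"] == PARENT and not m["age"] > head["age"] + min_age:
            problems.append(f"member {m['pid']}: parent is too young relative to the head")
    return problems


household = [
    {"pid": 1, "rel": HEAD, "sex": MALE, "age": 40},
    {"pid": 2, "rel": SPOUSE, "sex": FEMALE, "age": 35},
    {"pid": 4, "rel": CHILD, "sex": MALE, "age": 30},
]
for problem in check_household(household):
    print(problem)
# member numbers [1, 2, 4] are not 1..3
# member 4: child is too old relative to the head
```

The check stops after the head count when there is not exactly one head, because the spouse and age cross-checks are only meaningful relative to a single head.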
ANNEX IV: ISO 3166-1 ALPHA-3 COUNTRY CODES (SUB-SAHARAN AFRICA)

Country code   Country name
AGO   Angola
BEN   Benin
BWA   Botswana
BFA   Burkina Faso
BDI   Burundi
CMR   Cameroon
CPV   Cape Verde
CAF   Central African Republic
TCD   Chad
COM   Comoros
COD   Congo, Dem. Rep.
COG   Congo, Rep.
CIV   Cote d'Ivoire
GNQ   Equatorial Guinea
ERI   Eritrea
SWZ   Eswatini
ETH   Ethiopia
GAB   Gabon
GMB   Gambia, The
GHA   Ghana
GIN   Guinea
GNB   Guinea-Bissau
KEN   Kenya
LSO   Lesotho
LBR   Liberia
MDG   Madagascar
MWI   Malawi
MLI   Mali
MRT   Mauritania
MUS   Mauritius
MOZ   Mozambique
NAM   Namibia
NER   Niger
NGA   Nigeria
RWA   Rwanda
STP   Sao Tome and Principe
SEN   Senegal
SYC   Seychelles
SLE   Sierra Leone
SOM   Somalia
ZAF   South Africa
SSD   South Sudan
SDN   Sudan
TZA   Tanzania
TGO   Togo
UGA   Uganda
ZMB   Zambia
ZWE   Zimbabwe

Definition of Employment in Africa Region

Population:
1. Worked last 7 days? (Any work, including unpaid work)
   Yes: Employed. No: go to 2.
2. Worked last 12 months? (Any work, including unpaid work)
   Yes: Employed. No: go to 3.
   If screening questions such as "have you worked on farm land, livestock, etc." were asked previously and no questions were asked on whether the person performed household duties, then stop at this step.
3. Household duties?
   Yes: go to 4. No: Not employed.
4. Family owns farm or livestock?
   Yes: Employed. No: Not employed.

Note: This diagram tries to capture unpaid economic activities in Africa, based on research on the under-reporting of labor force participation in Tanzania. The study found that many unpaid family workers, especially women, under-report their economic activities because of poor questionnaire design. For details see Bardasi, Beegle, Dillon and Serneels, "Do Labor Statistics Depend on How and to Whom the Questions Are Asked?", World Bank Policy Research Working Paper 5192.
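Read as a sequence of questions, the employment diagram is a short decision procedure. The following is a minimal Python sketch of that reading. It assumes that yes/no answers are already harmonized to booleans, and that `household_duties` is `None` when the survey did not ask about household duties, which is the case where the diagram says to stop after the 12-month question. The function name and argument names are hypothetical.

```python
from typing import Optional


def classify_employment(worked_last_7_days: bool,
                        worked_last_12_months: bool,
                        household_duties: Optional[bool],
                        family_owns_farm_or_livestock: bool) -> str:
    """Follow the diagram step by step; return "Employed" or "Not employed".

    household_duties is None when the question was not asked, in which
    case the procedure stops after the 12-month question.
    """
    # Step 1: any work, including unpaid work, in the last 7 days.
    if worked_last_7_days:
        return "Employed"
    # Step 2: any work, including unpaid work, in the last 12 months.
    if worked_last_12_months:
        return "Employed"
    # Stop here if the household-duties question was not asked.
    if household_duties is None:
        return "Not employed"
    # Step 3: no household duties means not employed.
    if not household_duties:
        return "Not employed"
    # Step 4: household duties and a family farm or livestock count as employment.
    return "Employed" if family_owns_farm_or_livestock else "Not employed"


print(classify_employment(False, False, True, True))  # Employed
print(classify_employment(False, False, None, True))  # Not employed
```

Keeping "not asked" (`None`) distinct from "No" (`False`) matters in this procedure. Both currently lead to "Not employed", but they are different survey situations, and the distinction stays visible if a country-specific rule later treats them differently.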