A Comparison of CAPI and PAPI through a
                Randomized Field Experiment1


                                        November 2010

                               Bet Caeyers (University of Oxford)
                                     Neil Chalmers (EDI)
                                   Joachim De Weerdt (EDI)



                                         ABSTRACT
This paper reports on a randomized survey experiment among 1840 households, designed
to compare pen-and-paper interviewing (PAPI) to computer-assisted personal
interviewing (CAPI). We find that PAPI data contain a large number of errors, which can
be avoided in CAPI. We show that error counts are not randomly distributed across the
sample, but are correlated with household characteristics, potentially introducing sample
bias in analysis if dubious observations need to be dropped. We demonstrate a tendency
for the mean and spread of total measured consumption to be higher on paper compared
to CAPI, translating into significantly lower measured poverty, higher measured
inequality and higher income elasticity estimates. Investigating further the nature of
PAPI’s measurement error for consumption, we fail to reject the hypothesis that it is
classical: it attenuates the coefficient on consumption when used as explanatory variable
and we find no evidence of bias when consumption is used as dependent variable.
Finally, CAPI and PAPI are compared in terms of interview length, costs and
respondents’ perceptions.


1. Introduction
Whilst the analysis of survey data has benefitted from the information technology
revolution, most data collection in developing countries still uses traditional pen-and-
paper interviewing (PAPI). In computer-assisted personal interviewing (CAPI) the
interviewer reads questions from the screen of a handheld device, preloaded with the
questionnaire, to the respondent. The respondent’s answers are immediately entered into
the device, which eliminates the need for manual re-keying of the data. The computer

1
  We gratefully acknowledge financial support from the World Bank’s multi-year research agenda in survey
methodology (LSMS Phase IV). We appreciate permission from the Millennium Challenge Corporation
(MCC) to build on their existing survey in Pemba. We thank Kathleen Beegle, David McKenzie, Kinnon
Scott and participants at the Conference on Survey Design and Measurement in Washington DC for
feedback on the experiment’s design and an earlier draft of this paper. The paper was substantially
improved after incorporating suggestions made by the editors and two anonymous referees. Leonard
Kyaruzi, Deogratias Mitti and Mujobu Moyo led the field teams, while Alessandro Romeo and Thaddaeus
Rweyemamu took care of data entry of the paper questionnaires.


also automates the routing through the questionnaire and enables the interviewer to run a
set of consistency checks during the interview, so that anomalies can be resolved with the
respondent. These and numerous other features are believed to improve data quality, but
it is unclear to what extent they actually do so and what effect this has on analysis.
Furthermore, there is currently no empirical evidence from the developing world on how
a switch from PAPI to CAPI would influence the length of the interview, respondents’
perceptions, the cost of the survey, requirements on level of education of interviewers
and so forth. This paper reports on a formal experiment, designed specifically to compare
CAPI and PAPI along these and other lines.

The study was built on an existing LSMS-style CAPI survey of 1,200 households on the
Island of Pemba in Tanzania. The experiment consisted of randomly sampling, within the
same enumeration areas, 320 additional households to be interviewed using restricted
CAPI (with disabled consistency checks) and 320 using PAPI. This design allows for a
detailed comparison of errors, outliers, interview times, respondents’ perceptions,
interviewer effects and costs across the three methodologies. Special focus was given to
improving the collection of consumption data, which utilises many of the powerful
features of the computer, including complex validity checks and the ability to show
pictures on the screen. The experiment lends itself to comparing simple poverty and
inequality measures across the experiments.

While the first computer-assisted telephone interviews (CATI) were conducted by a US
marketing firm in 1971, the first nation-wide CAPI survey occurred only in 1987 in the
Netherlands (Nichols and de Leeuw, 1996). As CAPI became more popular for large-scale
face-to-face surveys in western countries, researchers became more aware of its
impact on the survey process and outcomes. It was found that interviewers and
respondents reacted favourably to the technology (Couper and Burt, 1994; de Leeuw and
Nichols, 1996). Taylor (1998) shows that this remains true for respondents with,
presumably, less exposure to modern technology, such as the elderly over 70 years of
age. Banks and Laurie (2000) report that the attrition rate in the British Household Panel
Survey was not affected when it switched from PAPI to CAPI in 1998. The literature also
indicates the potential of CAPI to reduce routing and other errors (de Leeuw, 2008).
There have been a number of CAPI surveys in the developing world, an enumeration of
which is beyond the scope of this paper. Apart from the paper by Fafchamps et al. (2010),
however, we are not aware of any systematic attempt to study the effect on data quality
and analysis.

The lack of evidence on how to reduce errors in surveys in developing countries stands in
stark contrast to how much is known about the effects of measurement error in analysis
(Bound et al., 2001; Chesher and Schluter, 2002). Classical measurement error is defined
by Bound et al. (2001) as an error in the measurement of a particular variable which is
uncorrelated with the true value of that variable, the true values of other variables in the
model, and any errors in measuring those variables. As we do not have independent
validation data in this experiment, we cannot directly measure the error to analyse its
nature. We are, however, able to set up two testable hypotheses that should hold if
measurement error is classical: in regression analysis, classical measurement error causes



no bias when just the dependent variable has error, but attenuates the estimated
coefficient on a single error-ridden explanatory variable. We fail to reject the hypothesis
that the introduced measurement error is classical, at least for consumption measurements
and based on these two tests. There is some consolation in this finding, as non-random,
mean-reverting errors negatively correlated with true values bias regression coefficients
even when just the dependent variable has error. When an explanatory variable has such
error, its coefficient may be biased either toward or away from zero (Gibson and Kim,
2007). Moreover, the main correction for measurement error bias – instrumental variables
(IV) – is inconsistent when errors are correlated with true values (Black, Berger and
Scott, 2000).
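
The two predictions used here can be illustrated with a small simulation. The sketch below is not the estimation code used in this paper: it assumes a hypothetical bivariate linear model with a true slope of 2, standard normal true values and standard normal classical errors, and the helper names (`ols_slope`, `simulate`) are purely illustrative.

```python
import random


def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=50_000, true_slope=2.0, seed=0):
    """Return slopes with no error, error in y only, and error in x only."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y_true = [true_slope * x + rng.gauss(0, 1) for x in x_true]
    # Classical error: independent of the true values and of other errors.
    y_noisy = [y + rng.gauss(0, 1) for y in y_true]
    x_noisy = [x + rng.gauss(0, 1) for x in x_true]
    return {
        "no_error": ols_slope(x_true, y_true),
        "error_in_y": ols_slope(x_true, y_noisy),
        "error_in_x": ols_slope(x_noisy, y_true),
    }


slopes = simulate()
for name, value in slopes.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the error variance equals the variance of the true regressor, so the slope on the error-ridden regressor should be attenuated by a factor of roughly one half (close to 1 rather than 2), whereas error in the dependent variable alone should leave the slope close to 2.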

The next section describes the design of the experiment and the differences we
hypothesise to exist between CAPI and PAPI. Section 3 discusses results pertaining to
errors and sample size reduction. It shows that CAPI significantly reduces the number of
inconsistencies per survey. Some of these errors may require observations to be omitted
from analysis, which could bias the sample because missing variables are not randomly
distributed. Section 4 analyses the nature of measurement error in consumption
aggregates. It first compares nutrition, consumption, poverty and inequality data across
the three experiments. It then hypothesises that error is introduced through PAPI and sets
up two testable predictions to verify whether this measurement error is classical. The first
is that regression coefficients on consumption as an independent variable should be
attenuated. The second is that there is no bias in a model where the error-ridden variable
is used as a dependent variable. We find that, despite the fact that error counts are higher
in certain types of households, we cannot reject that (after cleaning) the introduced
measurement error is classical. Section 5 looks at other dimensions of comparison, such
as cost, length of the interview and respondents’ perceptions. Section 6 discusses some
concluding observations.

2. Experimental set-up and hypothesised effects
2.1. Set-up
The experiment was run alongside an existing household survey on Pemba Island (which
is part of Zanzibar, Tanzania). The main survey was conducted in July and August 2009
on behalf of MCA-T (Millennium Challenge Account Tanzania) as a baseline to evaluate
their rural roads upgrade programme. In total, 1,200 households were interviewed, 15 in
each of the 80 Enumeration Areas (EAs). All households were administered a full CAPI
questionnaire using an Ultra Mobile Personal Computer (UMPC), which is a handheld
device with a 7-inch touch screen (a screen smaller than that of a laptop, but larger than that
of a PDA). In a first experiment, we randomly selected 4 additional households per EA
(320 in total) who were interviewed with the same CAPI questionnaire, but with one
important CAPI feature disabled: the system of consistency checks. The purpose of this
experiment was to isolate the effect of consistency checks, which are believed to have
an important impact on data quality, especially in the consumption data. In the remainder of
this paper we will refer to this application as ‘restricted CAPI’, in order to distinguish it
from the unrestricted ‘full CAPI’ application which included the system of consistency
checks. To investigate all other CAPI effects, as a bundle, a second experiment randomly
selected another 4 households per EA to be interviewed using PAPI. The PAPI data were



transferred to computer using two-pass verification to minimize keystroke errors. Each of
the four interviewers in a team conducted one restricted CAPI, one PAPI and three or
four full CAPI interviews per cluster. For the restricted CAPI and PAPI, interviewers
were allocated a specific household to interview at a specific time within the team’s two-
day visit to the EA. This was done to ensure that questionnaires were not clustered per
interviewer or in time.

All experimental questionnaires were conducted by the same 20 interviewers working on
the main MCA survey. This increased the likelihood of contamination within the
experiment, though it is hard to know the direction of the bias a priori. On the one hand,
interviewers could learn about the kind of checks CAPI implements (something they may
not have done had the questionnaire been purely on paper), but on the other hand
interviewers could unlearn the practice of carefully verifying a questionnaire at the end of
the day as they get used to the computer doing it for them. We tackle this contamination
bias in two different ways. First, during training and fieldwork, interviewers were
repeatedly instructed to check questionnaires at the end of the interview and again before
submitting them to the supervisor. The supervisor, in turn, would check the
questionnaires for errors. Questionnaires with errors that could not be resolved at base
camp were returned to the interviewer, who was then required to revisit the household.
Second, we have data to control for the number of months of experience that interviewers
had using paper questionnaires and using electronic questionnaires.

The experimental questionnaire took, on average, 84 minutes to administer and included
the following sections: Control data, GPS-coordinates, household head details, household
member roster, demographics, education, health, amenities, assets, livestock, agriculture
and consumption.2

A few days after the electronic questionnaire was conducted, a separate team of locally
recruited interviewers returned to 4 households per experiment to ask 13 simple questions
on the experience of the respondent in participating in the survey.

2.2. Experiment 1: the effect of validation checks
The full electronic questionnaire included a comprehensive system of internal validation
checks.3 The first experiment was set up to isolate the effects of these checks by
comparing full CAPI to restricted CAPI. The checks are believed to lead to more accurate
data capture, because they were run during the interview, at a time when they could still
be resolved with the respondent. The checks do not run automatically, but are
activated by the interviewer by manually clicking check buttons. They are run at various
stages during the interview, typically after completing all the questions on one screen. A
final, global check can be run at the end of the interview. The checking procedure was

2
  The main questionnaire, implemented on behalf of MCA-Tanzania, included some additional sections on
prices, transfers, shocks, credit, self-help groups and the like. To avoid these sections interfering with the
experiment they were placed at the end of the main questionnaire. The full questionnaire is available from
the authors upon request.
3
  Examples of screen shots of the electronic questionnaire, including check buttons, are available in on-line
Appendix 1. The complete list of consistency checks is given in on-line Appendix 2.


repeated by the supervisor at the end of each survey day, and once more by the data
processing team at headquarters after data transfer (usually the day after data collection).
The full CAPI application contained 366 consistency checks. These fall into three broad
categories, depending on whether they were designed to detect routing errors (248
checks), unlikely entries (61 checks) or impossible entries (57 checks). We will discuss
each of these categories in turn.

Over two thirds of the checks aimed at detecting violations of the questionnaire’s routing
scheme. Routing errors occur by answering a question that is supposed to be skipped, or
by skipping a question that is supposed to be answered. The questionnaire had a total of
152 variables, out of which 100 were dependent on previous answers and 52 were
unconditional. Each unconditional question had a single check detecting missing entry,
while each conditional question had two checks: one detecting missing entry and one
detecting an entry made in a disabled field. Four routing checks turned out to have
malfunctioned, leading to a total of 248 routing checks. Answers such as ‘don’t know’ or
‘refused’ were not recorded as missing, but had their own codes.
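
The check counts quoted above can be reproduced with simple arithmetic. The sketch below only restates the figures given in the text; the variable names are illustrative.

```python
# Figures quoted in the text.
unconditional_vars = 52
conditional_vars = 100
malfunctioning_checks = 4

# One "missing entry" check per unconditional variable; two checks per
# conditional variable (missing entry, and entry in a disabled field).
designed_routing_checks = unconditional_vars * 1 + conditional_vars * 2
routing_checks = designed_routing_checks - malfunctioning_checks

unlikely_checks = 61
impossible_checks = 57
total_checks = routing_checks + unlikely_checks + impossible_checks

routing_share = routing_checks / total_checks

print(designed_routing_checks, routing_checks, total_checks)
print(f"routing share: {routing_share:.1%}")
```

This gives 252 designed routing checks, 248 working ones and 366 checks in total, so that routing checks make up just over two thirds of all checks.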

Another 16% of the checks detected impossible entries or impossible combinations of
entries. Some were simple range checks on a single variable, for example verifying that
the number of days a person reported being ill in the past 4 weeks did not exceed 28, or
ensuring that the value for a consumed quantity was not negative. Others checked
consistency across variables, highlighting, for example, situations where the age at
which someone started school exceeded their current age, or a member’s relation to the
head of the household was ‘spouse’, but the head’s marital status was ‘never married’, or
a male person had pregnancy-related problems. Some of these checks could have been
avoided by restricting the range of permissible responses in the first place (more on this
below).
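
To make the logic of such checks concrete, the following is a minimal sketch of how range and cross-variable checks of this kind might be coded. It is not the code of the survey application: the function name, the field names and the message texts are all hypothetical.

```python
def impossible_entries(person, head_marital_status):
    """Return a list of messages for impossible entries in one member record.

    `person` is a dict with hypothetical keys; fields that are absent or
    None are skipped rather than flagged.
    """
    problems = []

    days_ill = person.get("days_ill_past_4_weeks")
    if days_ill is not None and not 0 <= days_ill <= 28:
        problems.append("days ill in past 4 weeks outside 0-28")

    age = person.get("age")
    start_age = person.get("school_start_age")
    if age is not None and start_age is not None and start_age > age:
        problems.append("school start age exceeds current age")

    if (person.get("relation_to_head") == "spouse"
            and head_marital_status == "never married"):
        problems.append("spouse listed but head never married")

    if person.get("sex") == "male" and person.get("pregnancy_problems"):
        problems.append("male with pregnancy-related problems")

    return problems


member = {
    "age": 12,
    "sex": "male",
    "school_start_age": 14,
    "days_ill_past_4_weeks": 30,
    "relation_to_head": "spouse",
    "pregnancy_problems": False,
}
for message in impossible_entries(member, head_marital_status="never married"):
    print(message)
```

In this toy record three of the four checks fire. Returning messages rather than rejecting the entry mirrors the idea that the interviewer resolves the problem with the respondent during the interview.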

The remaining 17% of the checks detected possible, but unlikely
entries, such as an uncommon number of cows, or an uncommon expenditure value.
Verifications for unlikely combinations of entries could trigger warning messages such as
“nobody in the household is older than 15 years”, “the main activity of person is full-time
student but person is not currently in school”, or “a house with a thatched roof is unlikely
to have electricity, please verify”. If an unlikely entry was detected, the interviewer was
obliged to verify with the respondent, and, if the unlikely entry turned out to be correct,
to comment on the situation to reassure the analyst that the data point was indeed correct.

Besides the system of 366 consistency checks, the full electronic questionnaire also
included a report summarizing the total calorific intake and its sources, as implied by the
entries in the consumption section, allowing the interviewer to verify the plausibility of
the consumption data.4 This consumption report was also part of full CAPI and omitted in
restricted CAPI, but it will be more completely discussed in Section 4 below.




4
    On-line Appendix 3 gives an example of a consumption summary report.


Finally, as respondents partake in resolving errors and inconsistencies, one could
hypothesise that attitudinal factors, such as belief in the accuracy and usefulness of the
survey, are affected by consistency checks.

2.3. Experiment 2: bundle of other CAPI features
Experiment 2 consisted of adding a further 320 PAPI questionnaires to the sample.
Because of the random nature of the questionnaire allocation, any difference between
restricted CAPI and PAPI can only be due to the bundled effect of all CAPI features,
excluding checks.

In line with most CAPI applications, we incorporated automated routing. The literature
stresses automated routing as one of the most important error-reducing features of CAPI.
For example, Banks and Laurie (2000) note that reducing errors related to complex
routing in a 45-minute questionnaire was the main justification for migrating the British
Household Panel Survey to CAPI in 1999. Automated routing avoids asking a question
that should have been skipped, which may decrease the length of the interview, avoids
asking irrelevant questions (which confuses respondents and may lower the regard they
hold for the survey and its results) and decreases time spent correcting data after the
fieldwork. Automated routing also avoids the converse: skipping questions that should
have been asked and may therefore prevent dropping observations during analysis. In this
CAPI application, automated routing did not eliminate the need for routing checks.
Unlike other existing CAPI surveys, our experiment displayed multiple questions and
sections per screen and allowed the interviewer to continue the survey even if a required
field/section was left blank. We made a conscious decision to set the programme up like
that in order to allow the interviewer to return to a question later if, for example, the most
knowledgeable person was not around. If an interviewer backtracked to change a
response that determines subsequent routing, then an entry in a disabled field occurred.
Again, we could have set it up so that the computer deletes entries in disabled fields
automatically, but we were worried that that could lead to unintended data loss,
especially if gateway questions are accidentally changed after completing a section. The
experiment allows us to disentangle the effects of checks from those of automated
routing.

The data were stored in a relational database, using a record structure which eliminates
redundancy. Key identifiers were used to link the various data tables in a manner that
ensures the referential integrity of the complete dataset (this means, for example that a
household asset cannot exist without a related household, the identifier key being
common to both data tables). Answers to most questions were selected from pre-coded
drop-down menus or made use of radio-buttons. In some cases, drop-down menus were
altered dynamically, depending on previous responses, so that the interviewer was never
presented with an impossible response code. For example, when linking a woman to the
ID of the husband the drop-down menu was restricted to married men within the
household based on the previously filled in marital status and sex variables.5 GPS

5
  As pointed out by one referee, some of the checks could have been alternatively implemented by
restricting answer options. The spouse drop-down and the item-specific unit list in the consumption section
(described in Section 4.1) are two examples of where we opted for this approach. In many places, however,


coordinates and start and end times of the interview were captured automatically by the
computer, eliminating any scope for interviewer error. In PAPI, the interviewer needed to
copy the GPS coordinates from a GPS receiver and record start and end time of the
interview in the appropriate fields.6 Finally, PAPI had a data entry stage where paper
forms were re-keyed into the computer. There were numerous other smaller features that
could all add up to a cleaner dataset. The experiment was not set up to isolate the effect
of each of these features separately, so we can only identify them as a bundle of effects
driving the difference between restricted CAPI and PAPI.
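
The dynamically restricted drop-down menu described above can be sketched as a simple filter over the household roster. This is an illustration only, assuming a hypothetical roster of dictionaries with `id`, `sex` and `marital_status` fields; it does not reproduce the actual application.

```python
def husband_options(members):
    """Return IDs of household members eligible for the husband drop-down."""
    return [
        m["id"]
        for m in members
        if m.get("sex") == "male" and m.get("marital_status") == "married"
    ]


household = [
    {"id": 1, "sex": "male", "marital_status": "married"},
    {"id": 2, "sex": "female", "marital_status": "married"},
    {"id": 3, "sex": "male", "marital_status": "never married"},
    {"id": 4, "sex": "male", "marital_status": "married"},
]
print(husband_options(household))
```

The filter silently removes members whose earlier answers make them ineligible. If one of those earlier answers was itself entered wrongly, the interviewer sees a shorter list without any indication of why, which is the trade-off between restricted answer options and explicit checks.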

Like the system of consistency checks, the bundle of other CAPI features may also
contribute to the respondent’s attitude towards the survey. For instance, noticing that the
interviewer is using a computer device instead of pen and paper may increase the
respondent’s perception of survey reliability.

2.4. Implications for sample bias and analysis
One likely consequence of the survey errors as described is that they generate missing
variables and so reduce the effective sample size available for analysis. A questionnaire
with missing or obviously erroneous data may lead the analyst to drop the observation
entirely. If observations are randomly dropped, then one could simply increase the
sample size of a PAPI survey to compensate. If, however, such mistakes are correlated
with household characteristics otherwise of interest to the data user, then the analysis
could be affected. We set up a formal test for this in Section 3. Alternatively, an analyst
may decide to make assumptions about the problematic observations in order to avoid
dropping them from the sample. These assumptions may then introduce measurement
error. Section 4 analyses the nature of that measurement error and its effects on analysis.
The remainder of this section gives more detail on the types of checks full CAPI
included.

The share of questionnaires that have at least one impossible or missing entry potentially
leading to missing values in our dataset amounts to 2%, 40% and 83% in full CAPI,
restricted CAPI and PAPI, respectively. Whether or not the analyst will drop an observation,
however, will probably depend on the willingness to make assumptions and the type of
analysis conducted. Table A1.1 in Appendix 1 lists the 15 most commonly occurring
missing values in any section in PAPI, excluding the consumption section (discussed
separately below and in Section 4).7 The most frequent errors are nonsensical survey
durations, which occur in 24% of PAPI questionnaires, but in virtually no CAPI


we preferred checks as it could confuse an interviewer if he or she fails to locate an expected response
option from the drop-down without any indication of which previous answer triggered its elimination from
the list.
6
  Time data are notoriously difficult to collect in Tanzania, because Swahili time is counted differently. 7
am is considered the first hour of the day and called “1 o’clock”. Time during the day is counted upwards
from there till 6 pm, which is called “12 o’clock” (the 12th hour of the day). After that the first hour of the
night is 7 pm and so forth. English and Swahili times are often mixed up in the same questionnaires.
7
  Note that we look at the 15 most common missing values in PAPI, as opposed to the 15 most common
missing values over all three applications. The main purpose of this table is to inform us on the type of
errors made in PAPI, and not necessarily to compare the frequency of missing values across the three
applications.


questionnaires. One could think that interviewers were more negligent recording time
stamps, because they did not consider them an important focus of the study. The
questionnaire was implemented in the context of a rural roads upgrade project and thus any
questions on transport were especially important in the study. Despite this, many PAPI
questionnaires have problematic transport data. Appendix 1 shows that 9% of
questionnaires miss the amount paid to transport at least one sold agricultural item, 7%
miss data on the amount spent on transport to school for at least one member, 6% on the
one-way fare to school and 7% on the location at which crops sold fetched the highest
price. In practice an analyst may assume that by leaving the value blank the interviewer
wanted to indicate that it was supposed to be zero. Another analyst may
decide the interviewer made a mistake and place the value at the cluster or sample
median. Neither will have much basis for that decision. Robustness analysis for these and
hundreds of similar data cleaning decisions that need to be made in a typical dataset is
unlikely to be feasible. A purist who wants to drop any household that has any of the
four transport-related questions missing would have to drop 20%
of observations. The other potentially missing variables listed in Table A2.1 occur in core
variables, which are key to calculating statistics like fertility rates, literacy rates, the
number of people living with a disability and the number of landless households.

Table A2.2 in Appendix 1 lists the ten most common consumption-related (potentially)
missing values. In terms of the share of questionnaires in which the error occurs at least
once, the most common consumption-related error concerns food items for which the
three sources (‘purchases’, ‘home production’ and ‘gifts’) do not sum to the indicated
total. This error occurred at least once in more than 17% of all PAPI surveys. In
comparison, this error occurred in only 3% of restricted CAPI households and in close to
none of the full CAPI households.8 In terms of the average frequency per questionnaire,
the top error concerns the question “In the past 7 days did household consume any
[Food Item]?”, which was missing for 4 food items on average (out of a total of 53 items
per survey) in about 6% of all PAPI surveys, 1 time on average in about 9% of all
restricted CAPI surveys, and zero times in full CAPI surveys. In Section 4, we will
determine whether
these inconsistencies lead to different analytical conclusions.

2.5. Interviewer effects
The quality of survey data depends to a large extent on both the technical capacity and
the integrity of the interviewers. We expect education level and previous survey
experience to improve the quality of survey data. On the one hand, the use of new survey
technology in CAPI might pose additional challenges to the interviewers. On the
other hand, we expect some CAPI features, such as automatic routing and the elaborate
system of validation checks, to assist the interviewers, possibly compensating for lower
education and experience. In PAPI, it is likely that interviewers make fewer routing and
consistency errors as the field work progresses, because they receive feedback from their
supervisors at the end of each survey day.

8
  The most likely reason why this error did not occur as frequently in restricted CAPI as in PAPI is that
CAPI displayed the total amount consumed coming from the three different sources on the screen, allowing
the interviewer to check. If, despite this, the sum was still not correct, then a consistency check in full CAPI
warned the interviewer of the mistake.


3. Errors and Sample Size Reduction

3.1. Methods of Analysis
To investigate the effects of CAPI on errors and potential sample size reduction more
formally, we start by estimating Yijc (simply written as Yi in what follows), which is a
count of the number of problematic variables (some of which may potentially have to be
dropped in analysis) in the questionnaire of household i, interviewed by interviewer j in
community c:

(1)     $Y_{ijc} = \alpha + \beta C_i + \gamma V_i + \varepsilon_i$

where Ci indicates a CAPI questionnaire (a dummy equal to one for both full and
restricted CAPI and zero for PAPI) and Vi is a dummy set to one if the interviewer had
access to validation checks during the interview, which was only the case in full CAPI,
but not in restricted CAPI and PAPI. In equation (1) γ measures the effect of the
validation checks on the dependent variable, while β is an estimate of the bundled effect
of all other CAPI features that could influence the number of errors in a questionnaire.
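
Because the three applications form mutually exclusive groups and equation (1) has three parameters, the model is saturated: its OLS coefficients are simply differences in group means. The sketch below illustrates this with toy error counts, which are invented for illustration and are not data from the survey.

```python
from statistics import mean

# Toy error counts per questionnaire (illustrative only, not survey data).
errors = {
    "papi": [9, 11, 10, 10],            # C = 0, V = 0
    "restricted_capi": [1, 0, 2, 1],    # C = 1, V = 0
    "full_capi": [0, 0, 0, 0],          # C = 1, V = 1
}

# alpha: mean error count under PAPI.
alpha = mean(errors["papi"])
# beta: bundled effect of all CAPI features other than the checks.
beta = mean(errors["restricted_capi"]) - alpha
# gamma: additional effect of the validation checks.
gamma = mean(errors["full_capi"]) - mean(errors["restricted_capi"])

print(f"alpha = {alpha}, beta = {beta}, gamma = {gamma}")
```

In this toy example the constant is the PAPI mean, the coefficient on the CAPI dummy is the gap between restricted CAPI and PAPI, and the coefficient on the checks dummy is the gap between full and restricted CAPI.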

If error counts depend on household characteristics otherwise of interest to the data user,
then dropping observations with erroneous variables could introduce sample bias.9 To
investigate this, we check whether the number of problematic values in a questionnaire
depends on household characteristics Xi and whether CAPI can correct for this.
Therefore, we are particularly interested in the level effect of Xi as well as its interactions
with Ci and Vi:

(2)     $Y_{ijc} = \alpha + \beta C_i + \gamma V_i + \delta X_i + \varphi X_i \cdot C_i + \rho X_i \cdot V_i + \varepsilon_i$

where Yijc is a count of the number of variables that potentially have to be dropped or
cleaned in household i.
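
To see how the coefficients in equation (2) are read, it may help to write out the relationship implied for each application. The lines below are a sketch under the additional assumption that the error term has zero conditional mean given the regressors.

```latex
\begin{align*}
\text{PAPI } (C_i = 0,\ V_i = 0):\quad
  & E[Y_{ijc} \mid X_i] = \alpha + \delta X_i \\
\text{Restricted CAPI } (C_i = 1,\ V_i = 0):\quad
  & E[Y_{ijc} \mid X_i] = (\alpha + \beta) + (\delta + \varphi) X_i \\
\text{Full CAPI } (C_i = 1,\ V_i = 1):\quad
  & E[Y_{ijc} \mid X_i] = (\alpha + \beta + \gamma) + (\delta + \varphi + \rho) X_i
\end{align*}
```

A non-zero δ indicates that error counts in PAPI vary with the household characteristic, while interaction coefficients of the opposite sign indicate that CAPI features (φ) and validation checks (ρ) weaken that dependence.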

In a final specification we will estimate interaction effects of Ci and Vi with interviewer’s
characteristics such as months of experience with CAPI, months of experience with
PAPI, and years of education. This will allow us to verify whether the measured effects
differ with experience in either type of questionnaires, as well as with education level.

Although the set-up ensured that questionnaires were equally and randomly spread over
interviewers, clusters and time, we also verified that all results were robust to controls
for additional factors that may influence the number of errors in an interview:
characteristics of the respondent (age, sex, literacy, whether a head of household),
characteristics of the interview (conducted on day one or two of the team’s visit), the
interviewer and the location. The latter two effects are included as cluster (μc) and


9
 Observations may not need to be dropped if cleaning assumptions are made. This may introduce
measurement error, the nature of which is the subject of Section 4.


interviewer (λj) fixed effects. We find that all estimations are robust to these further
controls, so we do not report them further.

3.2. Routing Errors
Our measure for the number of routing errors is a simple count of the number of times an
unconditional variable was missing or a conditional variable was wrongly entered or
wrongly missing (given previous responses). It should be noted that a single error early on
can sometimes have a cascading effect, creating a large number of routing errors
throughout the questionnaire. Table 1 shows that PAPI contained an average of 10
routing errors per survey, restricted CAPI 0.6 and full CAPI 0.0.

Column 1 and Column 2 in the first panel of Table 2 show that restricted CAPI
significantly reduces the total number of routing errors by almost 10 per questionnaire
compared to PAPI. Column 1 shows that there are on average 4 missing entries in
required fields (the constant in the regression without controls), out of which 3.5 are
eliminated through CAPI. The remaining 0.5 errors are wiped out by adding checks to
CAPI. All of the 6.3 entries made in fields that ought to have been skipped, on average in
PAPI, are eliminated by CAPI, with no additional effect of the checks. The latter type of
error is perhaps less problematic than the former one, but such ambiguity in the data is
nevertheless best avoided and will, in any case, add time to the interview (see below).
Taken together, this shows that 94% of routing errors are avoided through the automated
routing system and that the checks eliminate almost all those that remain. Appendix 2
shows that this does not lead to respondents reporting a smoother survey experience. It is
unlikely that the missing entries in PAPI stem from interviewers leaving ‘don’t know’
responses blank. First, there were specific codes for such a response and the interviewers
were trained extensively on this matter. Second, a comparison of the occurrence of ‘don’t
know’ answers across the three questionnaire types does not show any significant differences.
CAPI lends itself to the use of unfolding brackets to reduce ‘don’t know’ answers, but
this specific experiment did not make use of them.

3.3. Unlikely and Impossible Entries
Column 3 in the first panel of Table 2 shows that restricted CAPI reduces the number of
impossible entries by 0.34 per questionnaire compared to PAPI and adding checks further
reduces this number by 0.15, to almost zero. This means that in a dataset of 1200
households, moving from PAPI to full CAPI would reduce the number of impossible
entries by 588 in total. The bundled effect of ‘all other CAPI features’ on the occurrence
of impossible entries, as discussed in Section 2, seems to be larger than that of the
checks.
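
The total of 588 follows from scaling the two per-questionnaire reductions reported above to the size of the dataset. A small check of that arithmetic, with variable names of our own choosing:

```python
# Per-questionnaire reductions in impossible entries reported in the text.
reduction_restricted_capi = 0.34   # PAPI -> restricted CAPI
reduction_checks = 0.15            # restricted CAPI -> full CAPI
households = 1200                  # illustrative dataset size used in the text

total_reduction = (reduction_restricted_capi + reduction_checks) * households
print(round(total_reduction))
```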

The last column in panel 1 of Table 2 shows that CAPI significantly reduces the number
of unlikely entries by 0.26 per household survey. This effect is even greater when checks
are available, with the number of unlikely entries falling from 1.35 in PAPI to 0.63 in full
CAPI. This result suggests that, although some unlikely entries remain (once confirmed
to be correct by the interviewer), full CAPI successfully assists the interviewer in
detecting unusual entries that turn out to be incorrect after confirmation. Furthermore,
because the programme flags these entries and reminds the interviewer to comment, the
analyst is reassured that the data point is indeed correct.

Appendix 2 further shows that the techniques introduced by CAPI to avoid these errors
do not increase the credibility or usefulness of the results in the eyes of the respondent.

An unintended natural experiment occurred within the experiment. It was realised, during
analysis for this paper, that 13 validation rules had been erroneously omitted from the
programme. Tabulating the number of times each of these malfunctioning checks was
violated in the resulting dataset, we find no significant differences across the three types
of questionnaires. This suggests that CAPI is only as good as the features that get built
into it. Without checks or other error-reducing features, CAPI has no impact on
impossible entries.

Panel 2 in Table 2 shows that 24% of households had problematic interview
duration calculations in PAPI, but CAPI reduces this to virtually zero. The same panel
further shows that PAPI has 6.6% problematic GPS locations, which are largely
eliminated through CAPI’s automatic GPS capture. Enumeration Areas were very small
in Pemba and we can be confident that any location farther than 1 km away from the
cluster centre is problematic. One may argue that any analysis requiring the use of time
stamps or GPS locations should simply increase its sample size to account for this. But,
as will be shown next, these missing observations occur more frequently in certain types
of households.

3.4. Implications for Sample Size Reduction and Sample Bias
A missing or an impossible entry may cause an observation to be dropped, which may
lead to biased estimates if the missing values are non-randomly distributed across the
sample. To investigate this, Table 3 shows estimates of Equation (2) for four different left
hand side variables (the uninteracted results for these four regressions are shown in the
first two columns of panel 2 in Table 2). The dependent variables in the first two columns
are simple sums of the number of missing entries in required fields and the number of
impossible entries. This sum is first made for entries in any part of the questionnaire,
excluding the consumption section (column 1) and then separately for entries in the
consumption section (column 2). Both of these can lead to either dropping the
observation in question or making an ad-hoc data cleaning decision about what is going
on. The third dependent variable indicates whether or not there was a problem with the
time stamps and the fourth whether or not there was a problem with the GPS co-
ordinates. We do not use the information on entries in disabled fields or unlikely
observations as these two types of errors would likely not lead to dropping the
observation. Unlikely observations may introduce error and affect analysis, but that will
be the subject of Section 4 below.

Table 3 shows that the sum of the number of missing entries in required fields and the
number of impossible entries (as picked up by the validation rules) are dependent on
household characteristics. The first column shows that large, female-headed and non-farm
households are more likely to have non-consumption related entries that could cause
the observation to be dropped or the entry to be altered by the analyst. The second
column shows a different pattern when focussing on the consumption section. We see
here that rich households make more errors, possibly due to their more complicated
consumption patterns. As expected, farming households have more problematic
consumption data, as a larger share needs to be estimated from home production, often
using subjective units of measurement. The coefficient of household size is now
significantly negative. The effect is not large – increasing household size by 5 members
would reduce the number of consumption errors in a questionnaire by an average of 0.4 –
but still significant and could be explained by the fact that larger households (more than 9
members) have only an average of 1.8 more consumed items compared to small
households (1-3 members), while smaller households are 40% more likely to use
decimals in their quantity estimation, which generally are more prone to erroneous
entries. Furthermore, while there is no difference in the types of consumption items
consumed by smaller households, the sources from which they obtain them are different:
small households have more consumption from gifts and may therefore be less familiar
with objective units of measurement found in the market place. The coefficient on the
female-headed household dummy is no longer significant. Importantly, once any of the
characteristics discussed above is interacted with CAPI, the effects disappear: the sum of
the level and interaction effects is never statistically significant (verified by the authors).
Interactions with age and education of head were not found to be significant.

Surprisingly, we find that even problems with time stamps and GPS locations are not
independent of household characteristics. In particular, they occur more frequently in
large households. This could be because large households have a much longer interview
time, as the questionnaire contains many roster questions that are repeated for each
member. Median interviewing times on paper are 53, 82 and 113 minutes for households of 1, 3
and 8 members, respectively. It took 141 minutes to interview the one 13-member
household in the survey. This increase in interviewer workload may reduce concentration
when copying time or GPS co-ordinates.10 We confirmed by a formal statistical test that
CAPI undoes the negative effect of household size on problematic GPS and interview
duration measurements.


3.5. Interaction Effects with Interviewer Characteristics and Survey Period
Table 4 shows the interaction effects of our main variables of interest (CAPI and checks)
with total number of years of formal schooling and number of months of CAPI and PAPI
survey experience of the interviewer. We find that both PAPI survey experience and
education significantly reduce the total number of alerts (routing + impossible + unlikely
entries) in PAPI surveys. Interestingly, the number of months of CAPI survey experience
seems to significantly increase the number of errors on paper, for a given number of
months of PAPI experience. This suggests some unlearning of best-practice PAPI skills
once interviewers switch from paper to the computer. Both experience and education
effects disappear once CAPI is used (confirmed by formal statistical tests). Banks and
Laurie (2000) noted how PAPI interviewers can be easily re-trained to conduct CAPI.
This result suggests that CAPI can, to some extent, compensate for lower education and
experience level of interviewers, mainly because of automated routing. The interaction
effects of checks with education and experience are not significant.

10
   Lengthy questionnaires have more non-consumption errors in general. This is confirmed by the results of
a regression of the number of non-consumption related missing/impossible entries on the duration of
interview (not reported), which shows a significantly positive correlation between the two. Interestingly,
the interview length does not influence the number of missing/impossible entries in the consumption
section, which squares with the finding of the negative coefficient on household size here.

Table 5 provides data on whether error rates drop as the survey progresses over time and,
if so, whether the pattern is different for CAPI and PAPI. To do this, we split the 37
survey days up into quartiles and include dummies for each period, both as levels and as
interactions with the CAPI and checks dummies. The results suggest that error rates do
indeed drop for PAPI, but not for CAPI. Compared to the first quartile, the total number
of alerts is significantly lower in subsequent survey quartiles for PAPI, with almost 10
fewer alerts per PAPI household survey in the last 9 days of the survey. Once the
interaction effects with CAPI are added to the level coefficients, the effect of the quartile
disappears (confirmed by formal statistical tests). In other words, there is no similar
learning effect for CAPI. One reason for this could be that the average number of alerts in
the first quartile of CAPI survey work is very low (0.8) relative to PAPI (18) and
therefore there is much less scope for improvements under CAPI than under PAPI (Table
1). Interactions with checks are insignificant. Taken together, the results from Tables 4
and 5 suggest that CAPI is less dependent on interviewer experience, education and
interviewers learning over time as the survey progresses.




4. Measuring Consumption

4.1. Food Consumption
Estimates of poverty and welfare in developing countries are frequently calculated using
a consumption recall module in a household questionnaire. While the largest share of
consumption is related to food, it is exactly food consumption that is most problematic to
measure accurately. The typical food recall module will have the interviewer go over a
list of food items in two iterations. In the first iteration, each item consumed by the
household over the recall period is flagged. A second iteration then goes through the list
of flagged items and, for each, asks total household consumption and its decomposition
into sources (purchases, home production and other sources).

Three important problems arise. First, quantities are expressed in imprecise units;
households report consumption as “pieces” of cassava, “bundles” of spinach or
“bunches” of bananas (Capéau and Dercon, 2006). This leads to ambiguous item-unit
combinations. While the size of such units is subject to interpretation (large versus
small), the analyst needs a clear mapping onto metric units. Second, the list of units is
uniform for each consumption item, even though some units in the list do not apply to all
items (e.g. “litre” for “potatoes”). This causes conflicting item-unit combinations, usually
detected only much later during data analysis. Third, a completed consumption module
represents a rather unwieldy matrix, making it hard for an interviewer to maintain an
overview of the consumption pattern of the household. Therefore, obvious errors and
irregularities in the reported consumption are only highlighted several months later when
researchers start analysing the data. At that point, the only solution is to either make an
ad-hoc assumption about what is meant, or omit the observation from the sample.

In CAPI the screen of the handheld device can be used to display pictures of vague units,
such as “bundle” or “bunch”, so that they can be more precisely mapped onto metric
units.11 The application can also tailor the list of units to be specific to the item, making it
impossible to, for instance, express potato consumption in litres or cooking oil in bags.
Finally, by mapping each item-unit combination to its calorific value, the computer can
summarize, in a report, the calorific intake pattern of the household12. This allows the
interviewer to carry out a report-based check during the interview, to verify whether the
total Kcal per AEU lies within reasonable boundaries and that the sources of calories are
sensible given the context in which the interviewer is conducting the work. We refer to
this report as the ‘consumption report’ in what follows. Additionally, the automated
routing and the consistency checks, discussed in detail in Section 2, are expected to
improve data quality. Some of these features could, in principle, be implemented in paper
questionnaires, although the logistics are more complicated there. This is especially true
for the automated routing, the consistency checks and the consumption report, which rely
on complex matrix manipulations and look-up tables. In this experiment full CAPI had all
three features, restricted CAPI omitted the checks (by which we mean both the
validation checks and the consumption report on total food energy intake and its sources),
while PAPI also omitted pictures and item-specific units (e.g. we would just have
‘bunch’, or ‘litres’, as a possible unit code for reporting banana consumption).

11
     Examples of pictures used are displayed in On-line Appendix 4.
12
     On-line Appendix 3 gives an example of such a report.

Table 6 shows that in 1% of PAPI cases the item-unit combination made no sense; in
these cases the calorie value was replaced by the median EA-level value in the
subsequent analysis. A further 42% of the item-unit combinations in PAPI were
ambiguous (pieces, bunches, bundles, heaps, etc.) and, in order to obtain a precise
conversion to Kcal values, an assumption about the exact size of the ambiguous unit
needed to be made. We used lower and upper bound estimates of the unit conversion
rates, as well as a mid-range value (a typical user would have likely used this mid-range
estimate). While upper- and lower-bound conversion rates were quite reasonably set13,
Table 6 shows that changing the assumptions on unit conversions from lower- to upper-bound
estimates raises calorific intake per AEU per day from 2,478 to 4,362 Kcal. There is
also a substantial increase in the standard deviation as one goes from full CAPI (655) to
restricted CAPI (1,177) and PAPI (1,644 – 3,379). The share of outlier observations,
with values over 4,000 Kcal per adult per day, is 1% for full CAPI, 8% for restricted
CAPI and 7%, 20% and 35% for PAPI, depending on the conversion assumption.

These results suggest that the effect of the ‘other CAPI features’, probably pictures and
item-specific units in this case, depends on how far off the ad-hoc assumptions on unit
size in PAPI are from reality, while the effect of the checks is independent of this.
In fact, we do see that in CAPI the pictures of the smaller units were 14 times more likely
to be chosen than those of large units and nearly 2.5 times more likely than mid-range
units. Equipped with this knowledge we can adapt the unit conversions, but it is fair to
say that most similar surveys would base their unit conversions on much thinner data.
Because we know the small unit assumptions are closest to the truth, we will use these in
the remainder of the text. In this way we expect any differences between PAPI and CAPI
to be lower bound estimates. Finally, there are a number of other small data cleaning
decisions that needed to be made with regard to all the violated consistency checks.14

Would our assessment of the food situation have changed, depending on whether we did
a survey on paper or electronically? The answer depends on the calorific intake threshold
we consider when defining malnutrition. Had we done the survey on paper, we would have
concluded that 21% of households live on less than 1,500 Kcal per AEU per day. Had
the same survey been conducted in full CAPI then the conclusion would have been that
8% of households live below this threshold. This difference is statistically significant at
well under 0.01%. Table 6 further shows that restricted CAPI puts the same figure at
14%, implying that, on average, 6 percentage points of the difference between full CAPI
and PAPI is due to checks, while 7 percentage points are due to other CAPI features.
Raising the threshold to 1,800 Kcal/AEU/day still shows a significant difference between
full CAPI and PAPI (p<0.01%), but the effect is completely due to the checks and not to
the bundle of other CAPI features. When we consider 2,200 Kcal/AEU/day as the
malnutrition threshold, we do not find any significant differences between CAPI and PAPI.

13
   In many cases the units could be matched to the CAPI pictures and the lower-bound was taken as the size
of the smallest depicted unit and the upper-bound the size of the largest unit.
14
   For example, we assumed that a missing source entry indicated zero consumption for that particular
source, as discussed in Section 3.4.
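
The split of the gap between PAPI and full CAPI at the 1,500 Kcal threshold rests on the design: restricted CAPI differs from full CAPI only in the checks, and from PAPI only in the other CAPI features. A short sketch of that decomposition, using the percentages quoted above (variable names are ours):

```python
# Share of households below 1,500 Kcal per AEU per day, in percent (from the text).
below_threshold = {"PAPI": 21, "restricted CAPI": 14, "full CAPI": 8}

total_gap = below_threshold["PAPI"] - below_threshold["full CAPI"]
due_to_checks = below_threshold["restricted CAPI"] - below_threshold["full CAPI"]
due_to_other_features = below_threshold["PAPI"] - below_threshold["restricted CAPI"]

print(total_gap, due_to_checks, due_to_other_features)
```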

4.2. Total Consumption, Poverty and Inequality
To arrive at a consumption measure, we place a monetary (Tanzanian Shilling (TZS))
value on food consumption. For purchased items this comes directly from the
respondent’s assessment of the value, while for gifts and home produced items unit prices
are used to convert the quantity estimations into monetary values. To this food
consumption is added a non-food consumption component, which was asked directly in
monetary value both on paper and in CAPI. As Pemba is a very small island and the
survey was concentrated in the Northern half only, we do not correct for prices. The
regression analysis, however, will always verify robustness to inclusion of cluster fixed
effects.

Table 7 shows that average consumption increases by 9% as one moves from full CAPI
to restricted CAPI and by another 15% when moving to PAPI, creating a jump of 25% from
full CAPI to PAPI. These mean differences also translate into very different conclusions
regarding the number of people that live below the basic need poverty line, with the
poverty headcount going down from 83% in full CAPI to 68% in PAPI.15 Note that the
2005 Zanzibar Household Budget Survey (conducted on paper) reported a poverty
headcount of 72.54% on average for the region we consider (Wete and Micheweni
districts). Because the differences both between full and restricted CAPI and between
restricted CAPI and PAPI are significant, we conclude that both the checks and the
bundle of other CAPI features are important. Interestingly, the CDFs drawn in
Figure 1 show that the effect of the checks depends on where one draws the poverty line.
For poor households there is no effect of the checks, while the effects start appearing
from around TZS 400,000 onwards. This evidence is consistent with the fact that rich
households have more complicated consumption patterns, where the power of the
computer is important to summarise the information in an intelligible way. With poorer
households, it is possible that consumption patterns are relatively uncomplicated so that
even without the consumption reports an interviewer has an intuitive sense of whether the
entries, as a whole, are reasonable.

Next, we investigate what this means for inequality. Consistent with expectations, we see
that full CAPI lowers the Gini coefficient to 0.24 from 0.30 in PAPI (p<0.01), a
difference almost entirely attributable to the checks and not the bundle of other CAPI
features. At least some of the variation picked up in restricted CAPI and PAPI is actually
measurement error rather than real inequality. Finally, we estimate Engel curves (log
food consumption regressed on a constant and the log of total consumption) within each
of the three different data sets and show, in Table 7, what the implied differences are with
respect to the calculation of the income elasticity of food consumption. We see, as with
the Gini coefficient, pronounced differences when moving from full CAPI to restricted
CAPI, but not from restricted CAPI to PAPI. We conclude that it is mainly the checks
that explain the difference between paper and electronic questionnaires with respect to
Gini coefficients and income elasticities.

15
   Our poverty line is set at a value of TZS 580,832 of annual consumption per AEU. To construct this
poverty line, we started from the basic need poverty lines of Wete and Micheweni (where this experiment
was conducted) as defined by the Zanzibar Household Budget Survey 2005. We then adjusted this poverty
line for inflation and differences in survey methods by multiplying it by a factor reflecting the difference in
median consumption between the 2005 HBS dataset and our own 2009 dataset.
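
To illustrate how measurement error alone can raise a Gini coefficient, the sketch below computes the Gini coefficient of a simulated consumption distribution before and after adding multiplicative recording noise. All numbers are simulated under assumptions of our own (a log-normal distribution with arbitrary parameters); they are not the survey data, and the function name is ours.

```python
import random


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Hypothetical "true" consumption: log-normal with arbitrary parameters.
true_cons = [rng.lognormvariate(13.0, 0.4) for _ in range(5000)]
# The same households with multiplicative recording error added.
noisy_cons = [c * rng.lognormvariate(0.0, 0.3) for c in true_cons]

print(round(gini(true_cons), 2), round(gini(noisy_cons), 2))
```

In this simulation the noisy series shows the higher Gini coefficient even though the underlying distribution is unchanged, which is the mechanism suggested in the text for restricted CAPI and PAPI.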

4.3. Classical Measurement Error in the Independent Variable: Attenuation Bias
The previous results demonstrate a tendency for the mean and spread of total
consumption to go up on paper, compared to full CAPI, translating into lower measured
poverty and higher measured inequality in PAPI. We have also seen that this can lead to
significantly different coefficients on the total consumption variable when estimating
Engel curves. We also know from Section 3 that simple error counts in consumption data
depend on wealth, household size and the occupation of the head. In this section, we
further analyse the nature of the measurement error (after cleaning, see footnote 17) by
exploiting the insight that an explanatory variable, measured with classical error, will
lead to attenuation bias. In order to test the hypothesis of zero attenuation bias, we
estimate the following equation for three different outcomes Oi that can be
explained by consumption:

(3)    Oi = α + βCi + γVi + δLnConsi + φLnConsi·Ci + ρLnConsi·Vi + εi

Where Ci and Vi are defined as before and LnConsi is the log of total consumption per
AEU of household i. Finding δ < δ + φ and/or δ < δ + φ + ρ would be consistent with
attenuation bias with PAPI. Table 8 shows three regressions. In the first, i is a child
between 7 and 14 years old and Oi is the number of years of formal schooling completed
by the child, controlling for age fixed effects to compare only children of the same age. In
a second specification i is a school-going child in the sample and Oi is the amount spent
by the household on his or her education (in which case we drop the education
expenditure components from the consumption aggregate to avoid spurious correlation).
The third is a regression where Oi is a dummy for whether the child (0 to 14 years old)
slept under a treated mosquito net the night prior to the survey, again controlling for age
fixed effects. In all cases we would expect family wealth and its correlates, measured
through consumption, to influence the outcome in question, while we would not expect
children’s outcomes to influence consumption.

The results, displayed in Table 8, show that, on paper, the number of years of schooling a
child has attained is independent of household consumption. Once checks are included,
however, a positive association emerges, consistent with attenuation bias in the PAPI and
restricted CAPI results. Total consumption does explain schooling expenditures for
children at school and whether or not a child slept under a bednet, but the size of the
effect is estimated at around half of what full CAPI estimates, again consistent with
attenuation bias. The coefficient on CAPI is not significant, indicating that it is the
consumption reports and validation checks that are responsible for these differences and
not the bundle of other CAPI features.

One critique of these regressions could be that the left-hand-side variables are themselves
measured more accurately in CAPI compared to PAPI. The computer verified during the
interview, for example, that the grade attained was sensible given the age of the child,
flagged zero education expenditures for school-going children and expenditures over 3
million Tanzanian Shillings as unlikely and cross-checked the transport costs for going to
school with other entries in the transport section. Finally, there was a cross-validation
check between bednet use and ownership of a bednet in the assets section. However,
estimating Oi = α + βCi + γVi + εi gave insignificant coefficients on Ci and Vi, giving
some confidence that the results are not affected by this possibility.

We also estimated all coefficients restricting ourselves to within-sample estimates and
artificially (and randomly) reducing the full CAPI sample to have the same observations
as restricted CAPI and PAPI. All results remained intact after this exercise. Results are
also robust to inclusion of cluster fixed effects, controlling for respondent characteristics
(age, head of household, sex, literacy), household characteristics (household size),
interviewer fixed effects and interview characteristics (conducted on first day of team’s
visit). In 13 PAPI child records at least one of these variables was missing, in which case
we included a dummy indicating this.


4.4. Classical Measurement Error in the Dependent Variable
In regression analysis, classical measurement error can attenuate the coefficient on an
error-ridden explanatory variable, but causes no bias when just the dependent variable has
error. To investigate whether the evidence of this experiment is consistent with the latter
property of classical measurement error, we run a regression explaining log total
consumption per AEU with various factors typically included in analysis of determinants
of household level poverty. In specifying our set of explanatory variables, we follow the
guidelines provided by Haughton and Khandker (2009). The explanatory variables
(household characteristics) included in the first regression in Table 9 are: household size,
dependency ratio (counting adults as people aged 15 to 65), a dummy for whether the
household head is female, average number of years of education of adult HH members
(aged 15 and above), number of days head was ill in the last 4 weeks, dummy for whether
the household owns its house, dummy for whether the house has a modern roof (made of
iron, concrete/cement, tiles or asbestos), proportion of employed adults in the household
and the acres of land owned. The regressions also control for cluster fixed effects.
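
The two properties of classical measurement error invoked at the start of this section can be illustrated with a small simulation. This is a sketch on simulated data with assumed parameters (a true slope of 1 and unit-variance regressor and errors), not a reproduction of the regressions in Tables 8 and 9. The helper name is ours.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


rng = random.Random(1)  # fixed seed for reproducibility
n = 20000
true_slope = 1.0

x = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + rng.gauss(0, 1) for xi in x]

# Classical error: mean zero and independent of the true values.
x_noisy = [xi + rng.gauss(0, 1) for xi in x]
y_noisy = [yi + rng.gauss(0, 1) for yi in y]

slope_clean = ols_slope(x, y)             # close to the true slope
slope_error_in_x = ols_slope(x_noisy, y)  # attenuated towards zero
slope_error_in_y = ols_slope(x, y_noisy)  # still centred on the true slope

print(round(slope_clean, 2), round(slope_error_in_x, 2), round(slope_error_in_y, 2))
```

With equal variances for the true regressor and its error, the slope on the error-ridden regressor shrinks by the factor var(x) / (var(x) + var(error)) = 0.5, while error in the dependent variable leaves the slope centred on its true value and only adds noise.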

We verified that all these variables are themselves not suffering from attenuation bias by
regressing each one on Ci and Vi and ensuring both coefficients are insignificant. We
considered, but omitted other variables, such as a dummy equal to 1 if head is not
employed in the agricultural sector, the number of productive assets and the number of
livestock owned by the household, because they were dependent on Ci and Vi and so
could suffer from attenuation bias. In this case, a significant interaction effect may simply
result from attenuation bias, and lead us to falsely reject the classical measurement
hypothesis.

Column 1 in Table 9 displays the results of this regression and finds no significant
interaction effects of CAPI and checks with determinants of poverty. This finding is
consistent with the dependent variable having classical measurement error. Dropping the
insignificant regressors from the analysis does not alter the results. For the sake of
completeness, column 2 adds the problematic regressors and finds their interactions are
indeed significant. This does not lead us to reject the classical measurement hypothesis,
as these results could simply be driven by attenuation bias on the PAPI estimates
(insignificant level effects), which gets alleviated through CAPI (significant interaction
effects).


5. Further Dimensions of Comparison: Cost, Length of Interview and Respondents’ Perceptions

CAPI has a larger fixed cost component: up-front outlays for the development of the
software and the purchase of the hardware. But many of the variable costs (per interview)
that are incurred for PAPI, such as printing and data entry, are eliminated in CAPI. This
makes CAPI, budget-wise, a more viable option for larger surveys. As a smooth-
functioning rental market for hand-held devices does not exist, surveys with fewer
interviewers, spread over a longer time will be cheaper, as fewer machines need to be
bought. Organisations that regularly conduct surveys could share machines across
projects to overcome this problem. The software used in this experiment (and in the
baseline survey on which it was based) is built on Microsoft Access and had been under
development for 2 years. However, it still required about 50 consultancy days each from a
senior and a junior programme developer to adapt it for this specific survey. The survey
had roughly 80% similarity with other surveys already conducted with the same
programme. This comes to a fixed cost of USD 40,000 in consultancy fees for making the
programme. A data entry application could be developed for around USD 4,000 using
consultants at similar rates. The UMPCs used in this experiment, including peripherals
such as extra batteries, a replacement battery after 2 years of use, GPS units, bags,
charging equipment and transport, cost about USD 1,800 apiece and have an estimated
lifetime of 600 interviewing days, or roughly 3 dollars per day. Interviewers conducted 3
interviews per day in this survey, so the variable UMPC cost was roughly 1 dollar per
questionnaire. The variable cost per paper questionnaire was about USD 4 for data entry
clerks and desktop computers, USD 4 for data entry management and supervision (including
adjudication of errors) and USD 2 for printing a single questionnaire, or USD 10 in total.
Thus, in the context of this survey, solving 40,000 + X = 4,000 + 10X for X gives a
break-even point of 4,000 questionnaires. Below this, paper is cheaper and above this CAPI
is cheaper. For example, a survey of 2,500 households would be USD 13,500 more expensive
on CAPI, while a survey of 10,000 households would be USD 54,000 cheaper on CAPI.
Accounting for wasted observations changes this number. For example, if a paper survey
needs to collect 10% more observations, then the break-even point drops to 3,600
questionnaires. The break-even point can drop even further if one considers reduced
interview length (see below) and reduced data cleaning efforts.
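
The break-even arithmetic in this paragraph can be reproduced with a few lines of code.
The sketch below uses only the cost figures quoted in the text; the function and variable
names (`break_even`, `cost_gap`, `papi_extra_share` and so on) are hypothetical and chosen
for illustration, and the linear cost structure is the simplification used in the text,
not a full budget model.

```python
def per_questionnaire_device_cost(price, lifetime_days, interviews_per_day):
    """Spread the purchase price of one device over the interviews it can support."""
    return price / lifetime_days / interviews_per_day


def total_cost(fixed, variable, n_questionnaires):
    """Linear cost: an up-front outlay plus a constant cost per questionnaire."""
    return fixed + variable * n_questionnaires


def break_even(capi_fixed, capi_variable, papi_fixed, papi_variable,
               papi_extra_share=0.0):
    """Sample size at which CAPI and PAPI cost the same.

    papi_extra_share is the share of extra paper questionnaires collected to
    compensate for wasted observations (0.10 means 10% more).
    """
    papi_effective = papi_variable * (1.0 + papi_extra_share)
    return (capi_fixed - papi_fixed) / (papi_effective - capi_variable)


# Figures quoted in the text (USD).
CAPI_FIXED = 40_000   # consultancy fees for the CAPI programme
PAPI_FIXED = 4_000    # data entry application
CAPI_VARIABLE = per_questionnaire_device_cost(1_800, 600, 3)
PAPI_VARIABLE = 4 + 4 + 2  # data entry, management and supervision, printing


def cost_gap(n_questionnaires):
    """CAPI cost minus PAPI cost; positive means CAPI is more expensive."""
    return (total_cost(CAPI_FIXED, CAPI_VARIABLE, n_questionnaires)
            - total_cost(PAPI_FIXED, PAPI_VARIABLE, n_questionnaires))


if __name__ == "__main__":
    base = break_even(CAPI_FIXED, CAPI_VARIABLE, PAPI_FIXED, PAPI_VARIABLE)
    extra = break_even(CAPI_FIXED, CAPI_VARIABLE, PAPI_FIXED, PAPI_VARIABLE,
                       papi_extra_share=0.10)
    print("break-even:", round(base))
    print("break-even with 10% extra paper questionnaires:", round(extra))
    print("cost gap at 2,500 households:", round(cost_gap(2_500)))
    print("cost gap at 10,000 households:", round(cost_gap(10_000)))
```

Changing the inputs, for instance a lower device price or a different number of
interviews per day, shows how sensitive the break-even point is to these assumptions.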

It is likely that by the time this paper is published these figures will already have
changed substantially. Driven by the popularity of Apple’s iPad, many hardware manufacturers
are now developing their own UMPCs and it seems likely that the price of a machine will
go down to around USD 300. Also in terms of software, this project used a hard-coded
questionnaire that can only be adapted by experienced software engineers. This system
was used because existing products (e.g. the programme used by Fafchamps et al. (2010))
did not allow us to build in the complexity we needed. Once the market develops
off-the-shelf products that do not require software engineers to be involved, there should
be no reason to believe that making an electronic questionnaire takes longer than making a
data entry programme. Fafchamps et al. (2010) cite 75 hours of researcher time to build an
enterprise survey and 20 hours to programme the follow-up questionnaires, without
hiring the services of a software specialist.

Table 1 shows that interview time is reduced in CAPI by 10% (and by 14% in the
restricted CAPI without validation checks). Examples of CAPI features that may be
responsible for this reduction are the automatic routing system and the use of drop-down
menus for selecting responses (instead of codes listed in a box somewhere else on the
page). Finally, we see virtually no differences in respondents’ perceptions (e.g. degree
of intimidation, perception of confidentiality, etc.) between CAPI and PAPI (see
Appendix 2).
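
To make the idea of automatic routing concrete, the sketch below shows one way skip rules
could be represented and how the two kinds of routing error counted in Table 1 (missing
entries in required fields and entries in fields that should have been skipped) could be
tallied for a completed questionnaire. It is not the Microsoft Access application used in
this experiment: the field names, the rules and the functions `fields_to_ask` and
`count_routing_errors` are all hypothetical.

```python
# Hypothetical skip rules: each routed field maps to a function that says
# whether the field applies, given the answers recorded so far.
SKIP_RULES = {
    "school_type": lambda rec: rec.get("attends_school") == "yes",
    "crop_sold_kg": lambda rec: rec.get("grows_crops") == "yes",
}


def is_filled(record, field):
    """A field counts as filled if it holds anything other than None or ''."""
    return record.get(field) not in (None, "")


def fields_to_ask(record, rules=SKIP_RULES):
    """Routed fields that apply to this record (what a CAPI screen would show)."""
    return [field for field, applies in rules.items() if applies(record)]


def count_routing_errors(record, rules=SKIP_RULES):
    """Tally the two routing error types for one completed questionnaire."""
    missing_required = 0
    entries_in_skipped = 0
    for field, applies in rules.items():
        if applies(record) and not is_filled(record, field):
            missing_required += 1
        elif not applies(record) and is_filled(record, field):
            entries_in_skipped += 1
    return {
        "missing_required": missing_required,
        "entries_in_skipped": entries_in_skipped,
        "total": missing_required + entries_in_skipped,
    }


if __name__ == "__main__":
    # A made-up paper record with one error of each type.
    paper_record = {
        "attends_school": "no", "school_type": "primary",
        "grows_crops": "yes", "crop_sold_kg": "",
    }
    print(fields_to_ask(paper_record))
    print(count_routing_errors(paper_record))
```

In an electronic questionnaire built along these lines, only the fields returned by
`fields_to_ask` would be presented to the interviewer, so neither type of routing error
could be entered in the first place.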

6. Concluding Discussion
Many researchers and survey implementers are keen to switch from paper to electronic
surveys, but there is currently little quantitative, empirical information available to
inform that choice. This paper uses data from a survey experiment to identify differences
between PAPI and CAPI and finds that errors leading to missing variables in PAPI are
virtually eliminated in CAPI. A simple, compensatory increase in sample size on paper
cannot adequately deal with this problem, because observations are not randomly
dropped. The effect of CAPI is particularly evident when measuring consumption. We
find that paper questionnaires can lead to estimates of higher mean consumption, lower
poverty and higher inequality. We performed a number of regression analyses using the
consumption aggregates as dependent and explanatory variables. Paper and restricted
CAPI suffer from attenuation bias when consumption aggregates are used as regressors.
We do not, however, find evidence of bias when consumption aggregates are used as the
left hand side variable of a regression model. Hence, our evidence is consistent with
classical measurement error. While there is scope for mimicking some of the CAPI
features on paper, this seems unfeasible for the checks and reports made available to the
interviewer during the interview. Results show that these two features play a key role,
especially for reducing errors in consumption data. We further explain why this specific
CAPI product leads to higher fixed up-front costs, but lower variable per-questionnaire
costs. Finally, we show that interview times are significantly lower in CAPI, while there
is no change in respondents’ experience.

Some analytical caveats remain, however. First, the experiment does not provide an
independent validation of the data as in, for example, Bound and Krueger (1991) and
Bound et al. (1994). One could argue that in the analysis in Section 3, where we use error
counts on the left-hand side of the regression models, any dependency between the checks
and the error count variables could bias the results.




A second critique is that an (apparently) error-free questionnaire is not the same as one
that reflects reality. One may worry that interviewers can now simply enter any data that
the computer is willing to accept. An unscrupulous interviewer could simply change a
value to anything that suppresses the error message rather than make the effort to obtain
the correct value. This survey was subject to intense quality control: supervisors did daily
direct observations and brief re-interviews of respondents to check the validity of the
data. An independent quality controller went to a random subset of 18 EAs to conduct re-
interviews. The interviewers were aware that these random quality control visits would
take place and no false data were detected in either PAPI or CAPI. As members of staff
of a survey company (EDI), most interviewers intend to be on board for several surveys
and several years.

Third, CAPI is not a panacea. We have found that the success of CAPI depends
substantially on the effort spent programming, piloting and testing the application, as well
as on careful consideration of the underlying data management and transfer systems. This
specific experiment required considerable resources to refine the application before taking
it to the field. Without such preparation PAPI may well outperform CAPI. More
generally, one needs to keep in mind that this experiment had no variation with respect to
CAPI applications.

The literature from the developed world of 10-15 years ago talks about the inevitability
of the switch from PAPI to CAPI: despite the challenges, the benefits are so attractive
that a switch seems irresistible. It is our contention that a similar desire is growing
amongst survey practitioners working in lower income countries, and with good reason, as
this paper illustrates.

REFERENCES

Banks, Randy and Laurie, Heather (2000). “From PAPI to CAPI: The Case of the British
       Household Panel Survey.” Social Science Computer Review, 18(4): 397-406.

Black, Dan A., Berger, Mark C., and Scott, Frank A. (2000). “Bounding Parameter
       Estimates with Nonclassical Measurement Error.” Journal of the American
       Statistical Association, 95(451): 739-748.

Bound, John, Brown, Charles, Duncan, Greg J., and Rodgers, Willard L. (1994).
      “Evidence on the Validity of Cross-Sectional and Longitudinal Labor Market
      Data.” Journal of Labor Economics, 12(3): 345-368.

Bound, John, Brown, Charles, and Mathiowetz, Nancy (2001). “Measurement Error in
      Survey Data.” In Handbook of Econometrics Vol. 5. J. Heckman and E. Leamer
      (eds), Elsevier, pp. 3705-3843.

Bound, John and Krueger, Alan B. (1991). “The Extent of Measurement Error in
      Longitudinal Earnings Data: Do Two Wrongs Make a Right?” Journal of Labor
      Economics, 9(1): 1-24.



Capéau, Bart, and Dercon, Stefan (2006). “Prices, Unit Values and Local Measurement
      Units in Rural Surveys: an Econometric Approach with an Application to Poverty
      Measurement in Ethiopia.” Journal of African Economies, 15(2): 181-211.

Chesher, Andrew, and Schluter, Christian (2002). “Welfare Measurement and
      Measurement Error.” Review of Economic Studies, 69(2): 357-378.

Couper, Mick P. and Burt, Geraldine (1994). “Interviewer Attitudes Toward Computer-
      Assisted Personal Interviewing (CAPI).” Social Science Computer Review, 12(1):
      38-54.

de Leeuw, Edith D. (2008). “The Effect of Computer-Assisted Interviewing on Data
       Quality: A Review of the Evidence.” Mimeo, Department of Methodology and
       Statistics, Utrecht University. Accessed at
       http://www.xs4all.nl/~edithl/surveyhandbook/deleeuw-cai-overview-updated.pdf.

de Leeuw, E. and Nicholls II, William L. (1996). “Technological Innovations in Data
       Collection: Acceptance, Data Quality and Costs.” Sociological Research
       Online, 1(4). Accessed at
       http://www.socresonline.org.uk/socresonline/1/4/leeuw.html.

Fafchamps, Marcel, McKenzie, David, Quinn, Simon, Woodruff, Christopher (2010).
      “Using PDA Consistency Checks to Increase the Precision of Profits and Sales
      Measurement in Panels.” CSAE Working Paper 2010-19.

Gibson, John, and Kim, Bonggeun (2007). “Measurement Error in Recall Surveys and the
      Relationship between Household Size and Food Demand.” American Journal of
      Agricultural Economics, 89(2): 473-489.

Haughton, Jonathan and Khandker, Shahidur R. (2009). “Understanding the
      Determinants of Poverty.” Chapter 8 in Handbook on Poverty and Inequality,
      Vol. 1, World Bank, pp. 145-160.

Office of Chief Government Statistician Zanzibar, Zanzibar Household Budget Survey
       2004/2005, Zanzibar, Tanzania.

Nicholls II, W. and de Leeuw, E. (1996). “Factors in Acceptance of Computer-Assisted
       Interviewing Methods: A Conceptual and Historical Review.” Proceedings of the
       Survey Research Methods Section, American Statistical Association. Available at
       http://www.amstat.org/sections/SRMS/Proceedings/papers/1996_130.pdf.

Taylor, Sue (1998). “Setting up Computer-Assisted Personal Interviewing in the
       Australian Longitudinal Study of Ageing.” Statistical Science, 13(1): 14-18.




Table 1: Summary statistics on errors, interviewer and survey characteristics and sample
          size
                                                                 Full       Restricted         PAPI
                                                                CAPI          CAPI
Routing errors        Average number of routing errors           0.0             0.6            10.4
                      per HH (total)                            (0.1)           (1.1)          (11.9)
                      Average nr of entries in to be             0.0             0.1             6.3
                      skipped fields per HH                     (0.0)           (0.6)           (9.3)
                      Average nr of missing entries in           0.0             0.5             4.0
                      required fields per HH                    (0.1)           (0.9)           (5.6)
Impossible/           Average nr of impossible entries           0.0             0.2             0.5
Unlikely entries      per HH                                    (0.0)           (0.4)           (1.1)
                      Average nr of unlikely entries per         0.6             1.1             1.4
                      HH                                        (1.0)           (1.2)           (1.4)
Errors/unlikely       Average nr of routing          1st         0.8             2.3            17.9
entries per survey    errors + impossible entries               (1.1)           (2.2)          (16.6)
period quartile       + unlikely entries per         2nd         0.8             2.1            12.2
                      survey period quartile (37                (1.0)           (1.8)          (11.6)
                      survey days in total)          3rd         0.5             1.7             9.4
                                                                (0.9)           (1.6)          (10.2)
                                                      4th        0.4             1.2             8.4
                                                                (0.7)           (1.2)           (7.4)
GPS data              % HHs > 1 km from cluster centre
                      (likely outliers given the small size      0.6             1.3             6.6
                      of the EAs)
Time stamp data       % surveys with problematic time
                      stamps                                     0.9             0.3            23.8

                      % surveys conducted on day 1 of
                      the cluster visit                          45.6           50.0            49.1

                      Average survey duration1                    81             78              89
                                                                 (24)           (23)            (25)
Interviewer           Average PAPI survey experience             5.7             5.7             5.7
characteristics       (months)                                  (8.9)           (8.9)           (8.9)
                      Average CAPI survey experience             7.4             7.3             7.4
                      (months)                                  (4.6)           (4.6)           (4.6)
                      Average education (years)                 13.3            13.6            13.4
                                                                (1.3)           (1.3)           (1.3)
                      Total nr of interviewers                                    20
Sample size           Total nr of clusters                                        80
                      Total nr of households per cluster
                                                                1200            320             320
    Notes: Values in parentheses are standard deviations
    1. For full CAPI only the modules overlapping with the experiment were counted towards the survey duration
        (see Section 2 for details)




Table 2: Effect of CAPI and Checks on data quality
LHS           Routing errors:         Routing errors:         Impossible          Unlikely entries
              Missing entries         Entries in fields       entries
              in required             that should have
              fields                  been skipped


Panel 1       OLS                     OLS                     OLS                 OLS
CAPI          -3.544***               -6.225***               -0.337***           -0.259***
              (0.187)                 (0.306)                 (0.040)             (0.087)
Checks        -0.490***               -0.117                  -0.148***           -0.461***
              (0.149)                 (0.244)                 (0.032)             (0.069)
Const.        4.038***                6.344***                0.487***            1.347***
              (0.132)                 (0.216)                 (0.028)             (0.061)

Panel 2
LHS           Potentially             Potentially             Time Stamp          GPS Problems
              missing values in       missing values in       Problems
              non-consumption         consumption
              sections                section
              (OLS)                   (OLS)                   (LPM)               (LPM)
CAPI          -2.800***               -1.081***               -0.234***           -0.053***
              (0.142)                 (0.121)                 (0.015)             (0.010)
Checks        -0.287**                -0.351***               0.006               -0.007
              (0.113)                 (0.096)                 (0.012)             (0.008)
Const.        3.091***                1.434***                0.238***            0.066***
              (0.101)                 (0.085)                 (0.011)             (0.007)
Notes: N=1840. *** p<0.01, ** p<0.05, * p<0.1
1. Standard errors are shown in parentheses
2. All estimates are robust to cluster and interviewer fixed effects, controlling for respondent characteristics (age, head
of household, sex, literacy), household characteristics (No. of household members) and interview characteristics
(conducted on first day of team’s visit). In 16 PAPI households at least one of these variables was missing, in which
case we included a dummy indicating this. Potentially missing values are measured as the sum of missing and
impossible entries.




Table 3: Interaction effects with household characteristics
                                                                     Missing/      Missing/      Time Stamp       GPS
                                                                     Impossible    Impossible    Problems         Problems
                                                                     entries in    entries in    (LPM)            (LPM)
                                                                     non-          consumption
                                                                     consumption   section
                                                                     sections      (OLS)
                                                                     (OLS)
                                                                     coef/se       coef/se       coef/se          coef/se
CAPI                                                                 0.197         -1.369***     -0.163***        0.013
                                                                     (0.434)       (0.379)       (0.049)          (0.032)
Checks                                                               -0.203        -0.446        0.005            -0.022
                                                                     (0.333)       (0.291)       (0.037)          (0.025)
Dummy = 1 if household head is female                                1.367***      -0.044        -0.010           0.003
                                                                     (0.271)       (0.237)       (0.030)          (0.020)
Household size                                                       0.421***      -0.084**      0.014***         0.008***
                                                                     (0.040)       (0.035)       (0.005)          (0.003)
Dummy = 1 if head not employed in agriculture                        1.149***      -0.408**      0.013            0.015
                                                                     (0.209)       (0.183)       (0.023)          (0.015)
Dummy = 1 if HH belongs to richest 25th percentile                   -0.049        0.701***      -0.034           0.014
                                                                     (0.212)       (0.186)       (0.024)          (0.016)
CAPI interacted with:
Dummy = 1 if household head is female                                -1.322***     -0.084        0.006            -0.020
                                                                     (0.370)       (0.323)       (0.041)          (0.027)
Household size                                                       -0.410***     0.079         -0.014**         -0.010**
                                                                     (0.055)       (0.048)       (0.006)          (0.004)
Dummy = 1 if head not employed in agriculture                        -1.069***     0.456*        -0.019           -0.007
                                                                     (0.297)       (0.259)       (0.033)          (0.022)
Dummy = 1 if HH belongs to richest 25th percentile                   0.027         -0.882***     0.046            -0.005
                                                                     (0.310)       (0.271)       (0.035)          (0.023)
Checks interacted with:
Dummy = 1 if household head is female                                -0.047        0.128         -0.002           0.027
                                                                     (0.284)       (0.249)       (0.032)          (0.021)
Household size                                                       -0.009        0.005         0.001            0.002
                                                                     (0.043)       (0.037)       (0.005)          (0.003)
Dummy = 1 if head not employed in agriculture                        -0.080        -0.044        0.004            -0.006
                                                                     (0.235)       (0.206)       (0.026)          (0.017)
Dummy = 1 if HH belongs to richest 25th percentile                   0.030         0.184         -0.008           -0.006
                                                                     (0.260)       (0.227)       (0.029)          (0.019)
_cons                                                                0.002         1.813***      0.166***         0.010
                                                                     (0.320)       (0.280)       (0.036)          (0.024)
N                                                                    1,840         1,840         1,840            1,840
Notes: *** p<0.01, ** p<0.05, * p<0.1
1. Standard errors are shown in parentheses
2. All estimates are robust to cluster fixed effects, controlling for interview
characteristics (interviewer ID, day of team’s visit).




Table 4: Interaction effects with interviewer characteristics

                                        TOTAL nr of alerts              TOTAL nr of alerts


 CAPI                                  -10.366***                       -30.967***
                                       (0.424)                          (4.865)
 Checks                                -1.217***                        -5.048
                                       (0.337)                          (3.872)
 PAPI experience (months)                                               -0.315***
                                                                        (0.037)
 CAPI experience (months)                                               0.199***
                                                                        (0.065)
 Formal education (years)                                               -1.833***
                                                                        (0.252)
 CAPI interacted with:
 PAPI experience (months)                                               0.284***
                                                                        (0.052)
 CAPI experience (months)                                               -0.188**
                                                                        (0.092)
 Education (years)                                                      1.526***
                                                                        (0.356)
 Checks interacted with:
 PAPI experience (months)                                               0.027
                                                                        (0.042)
 CAPI experience (months)                                               0.003
                                                                        (0.073)
 Education (years)                                                      0.274
                                                                        (0.283)
 _cons                                  12.216***                       37.015***
                                        (0.300)                         (3.444)
 N                                      1,840                           1,840
Notes: *** p<0.01, ** p<0.05, * p<0.1
         1. Standard errors are shown in parentheses
         2. All estimates are robust to cluster fixed effects, controlling for respondent characteristics (age, head of
         household, sex, literacy), household characteristics (No. of household members) and interview characteristics
         (conducted on first day of team’s visit).
         3. Total Nr. of alerts is the sum of routing errors and impossible and unlikely entries.




Table 5: Interaction effects with survey period

                                         TOTAL nr of alerts                TOTAL nr of alerts


 CAPI                                    -10.366***                        -15.657***
                                         (0.424)                           (0.782)
 Checks                                  -1.217***                         -1.528**
                                         (0.337)                           (0.616)
 Survey days 10-18                                                         -5.723***
                                                                           (0.784)
 Survey days 19-27                                                         -8.509***
                                                                           (0.817)
 Survey days 28-37                                                         -9.522***
                                                                           (0.817)
 CAPI interacted with:
 Survey days 10-18                                                         5.568***
                                                                           (1.107)
 Survey days 19-27                                                         7.878***
                                                                           (1.149)
 Survey days 28-37                                                         8.433***
                                                                           (1.156)
 Checks interacted with:
 Survey days 10-18                                                         0.156
                                                                           (0.879)
 Survey days 19-27                                                         0.418
                                                                           (0.908)
 Survey days 28-37                                                         0.756
                                                                           (0.922)
 _cons                                   12.216***                         17.941***
                                         (0.300)                           (0.558)
 N                                       1,840                             1,840
Notes: *** p<0.01, ** p<0.05, * p<0.1
         1. Standard errors are shown in parentheses
         2. All estimates are robust to cluster and interviewer fixed effects, controlling for respondent characteristics
         (age, head of household, sex, literacy) and household characteristics (No. of household members) and
         interview characteristics (conducted on first day of team’s visit).
         3. Total Nr. of alerts is the sum of routing errors and impossible and unlikely entries.




Table 6: Summary statistics on calorific intake per AEU per day (per HH)
                                                  Full          Restricted                           PAPI
                                                  CAPI             CAPI
 Item – unit              No problem              98.4             98.0                               56.8
 combinations in food     Ambiguous                1.6              2.0                               42.0
 consumption section (%)  Non valid                0.0              0.0                                1.2
 Unit size assumption                              N/A              N/A            Small          Medium             Large
 Calorific intake per     Mean                   2,297            2,471            2,478           3,145             4,362
 AEU per day (Kcal)4                             (655)           (1,177)          (1,644)         (2,038)           (3,379)
                          Min                      413              235              232             232               232
                          Max                    5,117           10,528           18,969          20,181            23,057
 Outliers in the          < 1000                   1.8              3.4              3.8             1.9               1.3
 distribution of          > 4000                   0.8              7.8              7.2            19.7              35.4
 calorific intake (Kcal)
 per AEU per day (%)
 Malnutrition: % of       <1,500                8.3***           14.4**             20.6
 households under         <1,800                23.1**            29.4              29.4
 threshold of calorific   <2,200                  47.5             50.3             51.9
 intake (Kcal) per AEU
 per day.5
 N                                                1200              320              318             318               318

Notes:   1. Standard errors are shown in parentheses
         2. Calorific intake values include meals taken outside of the household by household members. AEU
            positively adjusted for nr of meals taken by guests in the household.
         3. N/A = Not applicable
         4. Two PAPI observations with calorific intake values based on small unit assumption over 23,570 Kcal per
            AEU per day were excluded from the analysis to avoid these outliers driving the results.
         5. Asterisks indicate p-values for a one-sided t-test whether value is significantly different compared to PAPI.
            *** p<0.01, ** p<0.05, * p<0.1




Table 7: Consumption, Poverty and Inequality
                                                                                 significance of t-test
                                (1)             (2)               (3)
                            Full CAPI        Restricted          PAPI          (1)=(2)    (2)=(3)   (1)=(3)
                                               CAPI
 Total Consumption per        435,251          475,889          546,223          ***        ***       ***
 AEU mean and                (222,881)        (279,349)        (337,538)
 (standard deviation)
 Poverty Headcount              83.0             74.6             68.3           ***         *        ***
 Gini                            .24              .29              .30           **          =        ***
 (95% CI)                     (.22-.25)        (.26-.32)        (.27-.32)
 Income elasticity of            .89              .95              .94            *          =        ***
 food consumption               (.01)            (.02)            (.02)
 N                              1200             319               319


Notes:   1. Standard errors are shown in parentheses; standard errors of the Gini coefficients were calculated using the
         jackknife method;
         2. Calorific intake values include meals taken outside of the household by household members. AEU is
            adjusted upwards for the number of meals taken by guests in the household;
         3. N/A = Not applicable;
         4. Two PAPI outlier observations with total consumption over 3,000,000 are dropped;
         5. Asterisks indicate p-values for a one-sided t-test whether value is significantly different compared to PAPI.
            *** p<0.01, ** p<0.05, * p<0.1 and = p≥0.1;
         6. PAPI food consumption is calculated using the small unit assumption;
         7. Income elasticities are those of the interaction term between log total consumption and a dummy indicating
            which sample the observation is from, in a regression that includes only the 2 samples being tested;
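
As an illustration of how the poverty headcount and the jackknifed Gini standard error in Table 7 could be computed, the following is a minimal sketch. It assumes an unweighted Gini and a delete-one jackknife that ignores the clustered sample design; the paper does not state which variant was used, and the function names and consumption values are hypothetical.

```python
import math
from typing import List, Sequence


def gini(values: Sequence[float]) -> float:
    """Unweighted Gini coefficient of non-negative values."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    ranked_sum = sum(rank * v for rank, v in enumerate(x, start=1))
    return 2.0 * ranked_sum / (n * total) - (n + 1.0) / n


def jackknife_se(values: Sequence[float]) -> float:
    """Delete-one jackknife standard error of the Gini (assumed variant)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the Gini n times, each time leaving one observation out.
    leave_one_out: List[float] = [
        gini(list(values[:i]) + list(values[i + 1:])) for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((g - mean_loo) ** 2 for g in leave_one_out)
    return math.sqrt(variance)


def poverty_headcount(values: Sequence[float], poverty_line: float) -> float:
    """Percentage of observations strictly below the poverty line."""
    poor = sum(1 for v in values if v < poverty_line)
    return 100.0 * poor / len(values)


# Hypothetical consumption per AEU values, not data from the experiment.
sample = [120_000.0, 250_000.0, 310_000.0, 480_000.0, 900_000.0]
print("Gini:", round(gini(sample), 3))
print("Jackknife SE:", round(jackknife_se(sample), 3))
print("Headcount (%):", poverty_headcount(sample, poverty_line=300_000.0))
```

A survey-weighted or cluster-level jackknife would follow the same pattern, leaving out one cluster rather than one household at a time.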




Table 8: Estimates of Equation (3): Attenuation Bias
                                        No. of years of        Schooling                  Child slept under a
                                        schooling              expenditures on            treated bednet night
                                        (children aged         school-going children      before survey
                                        7-14)                  (see note 4)               (children aged 0-14)
                                        Age FE                 Age FE                     LPM Age FE

CAPI (β)                                -0.451                 39,915                     -0.035
                                        (2.316)                (28,616)                   (0.629)

Checks (γ)                              -3.808**               -82,896***                 -1.963***
                                        (1.842)                (22,376)                   (0.509)

Log total consumption per aeu (δ)       0.204                  8,000***                   0.134***
                                        (0.135)                (1,676)                    (0.036)

Interaction of log total
consumption per aeu with:

  CAPI (φ)                              0.054                  -2,958                     0.004
                                        (0.180)                (2,219)                    (0.049)

  Checks (ρ)                            0.292**                6,477***                   0.154***
                                        (0.144)                (1,745)                    (0.040)

N                                       2,683                  2,137                      5,148

p-value of F-test (δ + φ + ρ = 0)       0.00                   0.00                       0.00
Notes: *** p<0.01, ** p<0.05, * p<0.1
1. Standard errors are shown in parentheses;
2. All regressions control for age FE;
3. All estimates are robust to cluster and interviewer fixed effects, controlling for respondent characteristics (age, head
of household, sex, literacy) and household characteristics (No. of household members) and interview characteristics
(conducted on first day of team’s visit).
4. To avoid spurious correlation in the results of column 2, we exclude the education component from the consumption
aggregate used in that regression.
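
To make the F-test in Table 8 concrete: in a regression with dummy interactions, the slope on log consumption for an observation is δ plus φ if the CAPI dummy equals 1, plus ρ if the Checks dummy equals 1. The sketch below computes these implied slopes from the point estimates in the table. It assumes that CAPI and Checks are 0/1 dummies entering exactly as the coefficient labels suggest; the class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlopeCoefficients:
    delta: float  # coefficient on log total consumption per aeu
    phi: float    # interaction of log consumption with the CAPI dummy
    rho: float    # interaction of log consumption with the Checks dummy

    def implied_slope(self, capi: int, checks: int) -> float:
        """Slope on log consumption for given dummy values (0 or 1)."""
        return self.delta + self.phi * capi + self.rho * checks


# Point estimates copied from Table 8, one entry per column.
TABLE8 = {
    "years_of_schooling": SlopeCoefficients(0.204, 0.054, 0.292),
    "schooling_expenditures": SlopeCoefficients(8000.0, -2958.0, 6477.0),
    "treated_bednet": SlopeCoefficients(0.134, 0.004, 0.154),
}

for outcome, coef in TABLE8.items():
    baseline = coef.implied_slope(capi=0, checks=0)   # delta only
    with_all = coef.implied_slope(capi=1, checks=1)   # delta + phi + rho
    print(f"{outcome}: baseline {baseline:.3f}, all dummies on {with_all:.3f}")
```

The sum with both dummies switched on is the quantity whose equality to zero the reported F-test examines; its standard error needs the full covariance matrix of the estimates, which the table does not report.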




Table 9: Implications of measurement error in the dependent variable
                                                                     Log total consumption/aeu             Log total consumption/aeu
                                                                     Final set of regressors               Initial set of regressors
                                                                                    (1)                                    (2)
CAPI                                                                 -0.203        (0.196)                 -0.279         (0.209)
Checks                                                               0.055         (0.154)                 0.124          (0.163)
Household size                                                       -0.077*** (0.011)                     -0.075*** (0.011)
Dependency ratio (adults 15-65 years old)                            0.020         (0.023)                 0.021          (0.023)
Dummy = 1 if household head is female                                -0.233*** (0.066)                     -0.212*** (0.067)
Average education (years) adults (15+)                               0.018**       (0.008)                 0.015*         (0.008)
Number of days head ill in last 4 weeks                              -0.007**      (0.003)                 -0.007**       (0.003)
Dummy = 1 if HH owns its house                                       0.105         (0.079)                 0.092          (0.079)
Dummy = 1 if house has robust roof                                   0.085*        (0.049)                 0.063          (0.050)
Proportion of employed adults in the HH                              0.211*        (0.128)                 0.227*         (0.130)
Acres of land owned                                                  0.028*        (0.015)                 0.029*         (0.015)
Dummy = 1 if head not employed in agriculture                                                              0.062          (0.052)
Number of productive assets owned by the HH                                                                0.056          (0.037)
Number of livestock owned by the HH                                                                        -0.000         (0.002)
CAPI interacted with:
Household size                                                       0.010           (0.015)               0.002           (0.015)
Dependency ratio (adults 15-65 years old)                            0.015           (0.034)               0.013           (0.033)
Dummy = 1 if household head is female                                0.068           (0.088)               0.098           (0.089)
Average education (years) adults (15+)                               0.012           (0.011)               0.012           (0.012)
Number of days head ill in last 4 weeks                              0.007           (0.004)               0.007           (0.004)
Dummy = 1 if HH owns its house                                       -0.055          (0.112)               -0.048          (0.111)
Dummy = 1 if house has robust roof                                   -0.028          (0.069)               -0.046          (0.069)
Proportion of employed adults in the HH                              -0.109          (0.184)               -0.095          (0.187)
Acres of land owned                                                  0.020           (0.023)               -0.003          (0.023)
Dummy = 1 if head not employed in agriculture                                                              0.020           (0.072)
Number of productive assets owned by the HH                                                                0.047           (0.050)
Number of livestock owned by the HH                                                                        0.017***        (0.005)
Checks interacted with:
Household size                                                       0.008           (0.011)               0.016           (0.011)
Dependency ratio (adults 15-65 years old)                            -0.023          (0.027)               -0.024          (0.027)
Dummy = 1 if household head is female                                -0.048          (0.066)               -0.061          (0.067)
Average education (years) adults (15+)                               -0.007          (0.009)               -0.009          (0.010)
Number of days head ill in last 4 weeks                              -0.002          (0.003)               -0.002          (0.003)
Dummy = 1 if HH owns its house                                       -0.053          (0.089)               -0.052          (0.087)
Dummy = 1 if house has robust roof                                   0.036           (0.054)               0.064           (0.054)
Proportion of employed adults in the HH                              -0.073          (0.147)               -0.067          (0.150)
Acres of land owned                                                  -0.023          (0.019)               -0.003          (0.019)
Dummy = 1 if head not employed in agriculture                                                              0.000           (0.057)
Number of productive assets owned by the HH                                                                -0.078**        (0.039)
Number of livestock owned by the HH                                                                        -0.012***       (0.004)
_cons                                                                13.242***       (0.143)               13.164***       (0.151)
N                                                                    1,837                                 1,837
Notes: *** p<0.01, ** p<0.05, * p<0.1; Standard errors are shown in parentheses; Both equations control for cluster fixed effects; Robust
roof materials are considered iron, concrete/cement, tiles or asbestos; Robust to different measures of household health (e.g. average health
household members) and household educational level (e.g. average level of education of all HH members); The choice of measures used in
table 9 was based on minimizing the difference in means of the measures between CAPI and PAPI to avoid attenuation bias affecting the
results; Three observations with outliers in land area owned are dropped from the sample.



Figure 1: Cumulative Distribution Function

[Figure: cumulative distribution functions (vertical axis, 0 to 1) of annual consumption per AEU (horizontal axis,
0 to 1,500,000), plotted separately for Full CAPI, Restricted CAPI and PAPI.]


Notes: 15 values over 1,600,000 are not displayed on the graph, to improve readability. The vertical line represents the
       poverty line.




Appendix 1: Most common missing values (missing + erroneous entries).
Table A1.1: 15 most common missing values outside the consumption section in PAPI
Section/Level    Validation Check Message                                                                                % of surveys where error    Average freq. of error in
                                                                                                                          occurred at least once       surveys where error
                                                                                                                                                            occurred
                                                                                                                        Full    Restricted   PAPI   Full   Restricted     PAPI
                                                                                                                        CAPI      CAPI              CAPI      CAPI
Start &          Error in interview duration calculation                                                                0.9%      0.3%       23.8%
finish/HH
Agriculture/     Missing: Over the past year, how much did you spend on transport in order to sell the products         0.0%      0.0%       9.4%    0.0       0.0        1.4
Crop Item        from [Crop Item]?
Health/          Missing: In the past 5 years, has [Household Member] given birth to children (including children       0.1%      3.1%       8.1%    1.0       1.0        1.1
HH Member        dead)?
Education/       Missing: How much was spent by the members of your household on [Household Member]'s                   0.0%      0.0%       7.2%    0.0       0.0        1.3
HH Member        transport to/from school?
Agriculture/     Missing: In which of those 2 places (i.e. selling places) do you fetch the highest price per unit of   0.0%      0.3%       6.9%    0.0       1.0        1.4
Crop Item        [Crop Item].
GPS/HH           Error with GPS location                                                                                0.6%      1.3%       6.6%
Education/       Missing: Has [Household Member] ever attended school?                                                  0.0%      0.0%       6.3%    0.0       0.0        1.3
HH Member
Health/          Missing: Is [Household Member] permanently physically or mentally disabled in any way which            0.0%      0.0%       5.6%    0.0       0.0        2.4
HH Member        limits or prevents normal daily activities or work?
Education/       Missing: What is the one way fare to go to school using [Transportation Mode]?                         0.0%      0.3%       5.6%    0.0       1.0        1.3
Transport Item
Agriculture/HH   Missing: Do you own any agricultural land/farm (including grazing and fallow land)?                    0.0%      0.0%       5.3%
Agriculture/     Missing: Rank the most important crops for generating cash income                                      0.0%      0.0%       4.7%    0.0       0.0        1.6
Crop Item
Demographics/    Missing: What is [Household Member]’s marital status?                                                  0.0%      0.6%       4.4%    0.0       1.0        1.6
HH Member
Education/       Missing: Can [Household Member] read and write?                                                        0.0%      0.0%       4.4%    0.0       0.0        1.0
HH Member
Demographics/    Missing: Is [Household Member] an actual member of this household (satisfying some                     0.0%      0.0%       4.1%    0.0       0.0        1.2
HH Member        membership criteria, such as being present at the household for at least 9 out of 12 past months)?
Health/          If the number of days person was too ill to perform his/her normal daily activities in the past 4      0.0%      0.0%       3.8%    0.0       0.0        1.5
HH Member        weeks is greater than 0, the answer to "In the past 12 months was [Household Member] too ill to
                 perform his/her normal daily activities?" cannot be "no".
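
The two statistics reported in Tables A1.1 and A1.2 can be illustrated with a short sketch. It assumes that, for each validation message, one has a count of occurrences per survey; the function name and the counts are hypothetical and are not data from the experiment.

```python
from typing import Sequence, Tuple


def error_summary(counts_per_survey: Sequence[int]) -> Tuple[float, float]:
    """Return (% of surveys with at least one occurrence,
    average number of occurrences among those surveys)."""
    if not counts_per_survey:
        raise ValueError("need at least one survey")
    affected = [c for c in counts_per_survey if c > 0]
    share = 100.0 * len(affected) / len(counts_per_survey)
    # When no survey is affected the average is reported as 0.0, matching
    # the tables' convention of showing 0.0 alongside 0.0%.
    average = sum(affected) / len(affected) if affected else 0.0
    return share, average


# Hypothetical occurrences of one validation message across ten surveys.
counts = [0, 0, 2, 0, 1, 0, 0, 0, 3, 0]
share, average = error_summary(counts)
print(f"{share:.1f}% of surveys affected, "
      f"{average:.1f} occurrences per affected survey")
```

Because the average is taken only over affected surveys, it is never below 1.0 when the first statistic is positive.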




Table A1.2: 10 most common missing values in consumption section in PAPI

Section/Level   Validation Check Message                                                                          % of surveys where error     Average freq. of error in
                                                                                                                   occurred at least once         surveys where error
                                                                                                                                                       occurred
                                                                                                                 Full    Restricted   PAPI   Full     Restricted    PAPI
                                                                                                                 CAPI      CAPI              CAPI        CAPI
Consumption/    The total amount of consumption of this food item differs from the sum (‘How much came from      0.1%      2.8%       17.2%   1.0         1.0         1.5
Food Item       purchases’ + ‘How much came from own production’ + ‘How much came from gifts and other
                sources’)
Consumption/    Missing: How much [Food Item] came from ‘gifts and other sources’ in the past [Recall Period]?   0.0%      5.0%       14.4%   0.0        1.0        1.3
Food Item
Consumption/    Missing: How much [Food Item] came from ‘own production’ out of what was spent in the last       0.0%      3.4%       10.3%   0.0        1.0        1.2
Food Item       [Recall Period]?
Consumption/    Missing: UNIT of ‘How much [Food Item] did your HH consume in the past [Recall Period]’?         0.0%      0.0%       10.0%   0.0        0.0        1.2
Food Item
Consumption/    Missing: How much [Food Item] came from ‘purchases’ out of what was spent in the last [Recall    0.0%      0.6%       7.2%    0.0        1.0        1.9
Food Item       Period]?
Consumption/    Missing: How much expenditure information for [Household Member] is not captured in what you     0.1%      0.9%       6.3%    1.0        1.3        2.0
HH Member       have mentioned to me?
Consumption/    Missing: In the past 7 days did household consume any [Food Item]?                               0.0%      9.1%       5.6%    0.0        1.3        3.8
Food Item
Consumption/    Missing: How much [Food Item] did your household consume in the past [Recall Period]?            0.0%      0.3%       5.0%    0.0        1.0        1.0
Food Item
Consumption/    Missing: In the past [Recall Period] did household consume/purchase any [Non Food Item]?         0.0%      5.6%       4.1%    0.0        1.2        1.1
Non Food Item
Consumption/    The household has consumed [Food Item], but no source for obtaining this item                    0.0%      0.0%       1.9%    0.0        0.0        1.0
Food Item       (Purchased/Production/gifts) has been selected
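
The most frequent consumption check in Table A1.2 compares a food item's total with the sum of its three sources. A minimal sketch of such a check is given below; it is not the CAPI software's actual implementation, and the function name, message wording and tolerance are assumptions made for illustration.

```python
from typing import List, Optional


def check_food_item(
    total: Optional[float],
    purchases: Optional[float],
    own_production: Optional[float],
    gifts: Optional[float],
    tolerance: float = 1e-9,
) -> List[str]:
    """Return validation messages for one food item (empty list if clean)."""
    messages: List[str] = []
    sources = {
        "purchases": purchases,
        "own production": own_production,
        "gifts and other sources": gifts,
    }
    # Flag every source amount that was left blank.
    for name, value in sources.items():
        if value is None:
            messages.append(f"Missing: amount from {name}")
    if total is None:
        messages.append("Missing: total amount consumed")
    elif not messages:
        # All four amounts are present, so the accounting identity can be tested.
        if abs(total - sum(sources.values())) > tolerance:
            messages.append("Total differs from sum of sources")
    return messages


print(check_food_item(10, 4, 3, 3))     # consistent entry
print(check_food_item(10, 4, 3, 2))     # total does not match the sources
print(check_food_item(10, None, 3, 2))  # one source left blank
```

In an electronic questionnaire a check of this kind can run at the moment of entry, whereas on paper the same inconsistency surfaces only after the interview.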




APPENDIX 2: Respondent’s perception, per survey method1

Question                                               Response options           Full         Restricted     PAPI
                                                                                 CAPI1          CAPI
                                                                                  (%)            (%)           (%)
What did you think of the duration of the                  Short (1)              26.3           35.1          37.5
interview?1,2                                              Long (2)               73.7           64.9          62.5
                                                      Standard deviation           0.4            0.5           0.5
Did you enjoy participating in the interview?               Yes (1)                6.8            9.1          10.2
                                                            No (2)                93.2           90.9          89.8
                                                      Standard deviation           0.3            0.3           0.3
How smooth did you find the interview in terms              Bad (1)                1.2            2.0           1.0
of flow of questions?                                     Normal (2)              29.3           30.2          27.3
                                                       Rather good (3)            61.7           60.1          63.5
                                                        Very good (4)              7.7            7.8           8.3
                                                      Standard deviation           0.6            0.6           0.6
Do you believe your answers will get used                   Yes (1)               88.8           87.8          87.4
for policy making?                                          No (2)                11.2           12.2          12.6
                                                      Standard deviation           0.3            0.3           0.3
We are not talking about this specific               Not reliable at all (1)       2.5            2.9           2.2
interview, but do you think that results of           Rather reliable (2)         90.7           91.2          90.5
these types of surveys are generally reliable?         Very reliable (3)           6.8            5.8           7.0
                                                      Standard deviation           0.3            0.3           0.3
Do you believe that the information you                     Yes (1)               95.1           98.1          96.1
provided in the interview is 100 %                          No (2)                 4.9            2.0           3.8
confidential?                                         Standard deviation           0.2            0.1           0.2
If we went through the survey again, do you                 Yes (1)               25.0           20.2          20.1
think any answers would change?                             No (2)                75.0           79.8          79.9
                                                      Standard deviation           0.4            0.4           0.4
Did you feel comfortable talking to the                  Not at all (1)            4.3            2.0           3.5
interviewer?                                              A little (2)            34.0           27.6          32.5
                                                        Very much (3)             61.7           70.5          64.0
                                                      Standard deviation           0.6            0.5           0.6
How nervous did you feel during the                      Not at all (1)           86.7           90.9          86.7
interview?                                                A little (2)            11.1            8.1           9.8
                                                           A lot (3)               2.2            1.0           3.5
                                                      Standard deviation           0.4            0.3           0.5
How difficult did you find the questions?3                 Easy (1)               38.3           29.6          35.7
                                                         Difficult (2)            61.7           70.5          64.3
                                                      Standard deviation           0.5            0.5           0.5
If we were not recording the answers (just                  Yes (1)               29.6           31.8          27.7
talking to you), do you think you would have                No (2)                70.4           68.2          72.3
answered anything differently?                        Standard deviation           0.5            0.5           0.5
Would you participate in this survey again                  Yes (1)               95.4           97.0          98.1
next year?                                                  No (2)                 3.1            1.7           1.3
                                                          Maybe (3)                1.5            1.3           0.6
                                                      Standard deviation           0.3            0.3           0.2
1. Full CAPI covered more modules than restricted CAPI and PAPI and hence had a longer interview time, which may
   affect responses (See Section 2 for details).
2. Answers grouped as Short = ‘short’ + ‘just fine’; Long = ‘Long’ + ‘Too long’ (with no change to results)
3. Answers grouped as Easy = ‘Very easy’ + ‘Rather easy’; Difficult = ‘Rather difficult’ + ‘Very difficult’ (with no
   change to results)




ON-LINE APPENDICES

The following appendices are not intended to be printed on paper, but could be
available on-line.

Online Appendix 1: Screen shots of CAPI

Screen shot of section 2: HH head info




Screen shot of section 3: Roster




On-line Appendix 2: List of all validation checks used in full CAPI experiment
                    (grouped by section and by warning type)

      Section       Type                                    Warning message
Agriculture      Error      Entry in disabled question 2a: How much land do you farm? AREA
Agriculture      Error      Duplicate crops have been selected
Agriculture      Error      Entry in disabled question 5: Do you have a title deed/offer letter for any land that
                            you own?
Agriculture      Error      Entry in disabled question 4a: What is the total area you own? AREA
Agriculture      Error      Entry in disabled question 12: Did you or any other HH member use any manure
                            on your farm in the past 12 months?
Agriculture      Error      Entry in disabled question 10: Did you or any other HH member irrigate any of
                            your fields in the past 12 months?
Agriculture      Error      Entry in disabled question 13: Did you or any other HH member use any hybrid
                            seeds on your farm in the past 12 months?
Agriculture      Error      Entry in disabled question 14: Have you or any other HH member spoken to a
                            government agricultural extension officer in the past 12 months?
Agriculture      Error      Entry in disabled question 4b: What is the total area you own? AREA UNIT
Agriculture      Error      Entry in disabled question 9: Rank the most important crops for generating CASH
                            income - sold for cash (1= most important; 3 = least important)
Agriculture      Error      Entry in disabled question 8: For what purpose is this crop? READ ALL
                            RESPONSES
Agriculture      Error      Entry in disabled question 2b: How much land do you farm? AREA UNIT
Agriculture      Error      Entry in disabled question 6b: which are the 3 most important crops or other
                            agricultural activities this household depends on most? CROP DESCRIPTION
Agriculture      Error      Entry in disabled question 11: Did you or any other HH member use any chemical
                            fertilizer on any of your fields in the past 12 months?
Agriculture      Missing    Missing question 8: For what purpose is this crop? READ ALL RESPONSES
Agriculture      Missing    Missing question 9: Rank the most important crops for generating CASH income -
                            sold for cash (1= most important; 3 = least important)
Agriculture      Missing    Missing question 11: Did you or any other HH member use any chemical fertilizer
                            on any of your fields in the past 12 months?
Agriculture      Missing    Missing question 14: Have you or any other HH member spoken to a government
                            agricultural extension officer in the past 12 months?
Agriculture      Missing    Missing question 10: Did you or any other HH member irrigate any of your fields in
                            the past 12 months?
Agriculture      Missing    Missing: Does anyone in this household conduct farming activities?
Agriculture      Missing    Missing question 5: Do you have a title deed/offer letter for any land that you own?
Agriculture      Missing    Missing question 12: Did you or any other HH member use any manure on your
                            farm in the past 12 months?
Agriculture      Missing    Missing question 6b: which are the 3 most important crops or other agricultural
                            activities this household depends on most? CROP DESCRIPTION
Agriculture      Missing    Missing question 13: Did you or any other HH member use any hybrid seeds on
                            your farm in the past 12 months?
Agriculture      Missing    Missing question 4a: What is the total area you own? AREA
Agriculture      Missing    Missing question 2a: How much land do you farm? AREA
Agriculture      Missing    Missing: Do you own any agricultural land/farm (including grazing and fallow
                            land)?
Agriculture     Missing   Missing question 2b: How much land do you farm? AREA UNIT
Agriculture     Missing   Missing question 4b: What is the total area you own? AREA UNIT
Amenities       Error     The entry for the number of rooms is not allowed to be negative
Amenities       Error     Entry in disabled question 6c: How many habitable rooms in THE OTHER
                          DWELLINGS does this household occupy? DO NOT COUNT BATHROOMS,
                          TOILETS, STOREROOMS, OR GARAGE
Amenities       Error     Entry in disabled question 5: Do you have any documentation of ownership of the
                          dwelling?
Amenities       Error     Entry in disabled question 11: How long does it take to get water from drinking
                          water source to this dwelling in the dry season? GO AND RETURN TRIP
                          INCLUDE WAITING TIME - MINUTES
Amenities       Error     Entry in disabled question 8: How long does it take to get water from drinking
                          water source to this dwelling in the rainy season? GO AND RETURN TRIP
                          INCLUDE WAITING TIME - MINUTES
Amenities       Error     Entry in disabled question 12: Out of these [READ] minutes, how long do you
                          spend waiting?
Amenities       Error     Entry in disabled question 9: Out of these [READ] minutes, how long do you
                          spend waiting? MINUTES
Amenities       Missing   Missing question 8: How long does it take to get water from drinking water source
                          to this dwelling in the rainy season? GO AND RETURN TRIP INCLUDE WAITING
                          TIME - MINUTES
Amenities       Missing   Missing: How many habitable rooms in THE MAIN DWELLING does this
                          household occupy? DO NOT COUNT BATHROOMS, TOILETS, STOREROOMS,
                          OR GARAGE
Amenities       Missing   Missing: What material is the floor of this house made of?
Amenities       Missing   Missing question 6c: How many habitable rooms in THE OTHER DWELLINGS
                          does this household occupy? DO NOT COUNT BATHROOMS, TOILETS,
                          STOREROOMS, OR GARAGE
Amenities       Missing   Missing: What material is the roof of this house made of?
Amenities       Missing   Missing question 11: How long does it take to get water from drinking water
                          source to this dwelling in the dry season? GO AND RETURN TRIP INCLUDE
                          WAITING TIME - MINUTES
Amenities       Missing   Missing: What is the major fuel used for cooking?
Amenities       Missing   Missing question 12: Out of these [READ] minutes, how long do you spend
                          waiting?
Amenities       Missing   Missing: What is the tenure status of the main residence? READ ALL
                          RESPONSES
Amenities       Missing   Missing: Is there any other dwelling that the HH uses?
Amenities       Missing   Missing: What is the main type of toilet used by this HH?
Amenities       Missing   Missing: What material are the walls of this house made of?
Amenities       Missing   Missing: What is the household's main source of drinking water in the dry season?
Amenities       Missing   Missing: What is the main source of energy used for lighting?
Amenities       Missing   Missing: What is the household's main source of drinking water in the rainy
                          season?
Amenities       Missing   Missing question 5: Do you have any documentation of ownership of the dwelling?
Amenities       Missing   Missing question 9: Out of these [READ] minutes, how long do you spend
                          waiting? MINUTES
Amenities          Warning   It is uncommon to have electricity in a thatch roofed house. Please verify whether
                             this information is correct.
Amenities          Warning   It is uncommon to have electricity in a mud roofed house. Please verify whether
                             this information is correct.
Amenities          Warning   It is uncommon to have electricity in a wood roofed house. Please verify whether
                             this information is correct.
Amenities          Warning   It is uncommon to have such a high number of rooms in a dwelling, are you sure
                             the given information is correct? Please verify whether this information is correct.
Amenities          Warning   It is uncommon to have such a high number of rooms in a dwelling, are you sure
                             the given information is correct? Please verify whether this information is correct.
Amenities          Warning   It is uncommon to have tiles in a mud walled house
Amenities          Warning   It is uncommon to have tiles in a wooden walled house. Please verify whether this
                             information is correct.
Assets             Error     Entry in disabled question 2: How many (total quantity) FUNCTIONING [ITEM]
                             does your household own?
Assets             Missing   Missing question 2: How many (total quantity) FUNCTIONING [ITEM] does your
                             household own?
Assets             Missing   Missing: Asset code
Assets             Missing   Missing: Do you, or anyone else in your household, own a functioning [ASSET]?
Assets             Warning   Are you sure the household owns a TV antenna or Video/DVD but does not own a
                             TV?
Assets             Warning   HH has at least one HH member whose main activity is fishing. It is unlikely that
                             someone with main activity self-employed fishing does not own any fishing
                             equipment at all (neither lantern, nor boat/canoe, nor other fishing equipment).
                             Please verify.
Assets             Warning   None of the HH members slept under a mosquito net last night (T3CQ14)
                             although the HH does claim to have at least one mosquito net (T4BQ1). This is
                             possible, but please verify.
Assets             Warning   At least one HH member slept under a mosquito net last night (T3CQ14) although
                             the HH claims not to have any mosquito nets (T4BQ1). This is possible, but
                             please verify.
Assets             Warning   The HH's main source of lighting is said to be candles (T4AQ14). Hence it is
                             uncommon that the HH owns this asset. It is possible, however. Please double
                             check.
Assets             Warning   HH has at least one HH member whose main activity is farming. It is unlikely that
                             someone with main activity self-employed farming does not own any farming
                             equipment at all (wheelbarrow/cart, nor harrow/plough, nor other farming
                             equipment). Please verify.
Consumption Data   Error     Respondent to the consumption questions should not be less than 12 years of age
Consumption Data   Error     The answer to "What is the FIRST main source of cash income?" is not allowed to
                             be
Consumption Data   Error     The FIRST and SECOND main source of cash income is not allowed to be the
                             same
Consumption Data   Missing   Missing: Respondent on consumption questions
Consumption Data   Missing   Missing: What are your HH's 2 main sources of cash income, starting with the
                             most important one? FIRST
Consumption Data   Missing   Missing: What are your HH's 2 main sources of cash income, starting with the
                             most important one? SECOND
Details of Missing   Missing   Missing: How much expenditure information for [NAME?] is not captured in what
Information Based              you have mentioned to me?
on HH Roster
Details of Missing   Warning   It is uncommon that NONE of the consumption expenditures is captured in
Information Based              information given to you. Note that this question is in NEGATION form. Are you
on HH Roster                   sure the given information is correct?
Finish               Error     Entry in disabled question 6: Why is the interview only partially completed?
Finish               Error     Entry in disabled question 2: How proficient was the respondent in Swahili?
Finish               Missing   Missing: Date and time of interview finish
Finish               Missing   Missing: Number of visits required to complete the interview
Finish               Missing   Missing question 2: How proficient was the respondent in Swahili?
Finish               Missing   Missing question 6: Why is the interview only partially completed?
Finish               Missing   Missing: Interview result
Food Consumption     Error     Entry in disabled question 1a: How much [ITEM] did your HH CONSUME in the
Details                        past [PERIOD]?
Food Consumption     Error     Entry in disabled question 1b: How much [ITEM] did your HH CONSUME in the
Details                        past [PERIOD]?
Food Consumption     Error     Entry in disabled question 3: How much did you spend?
Details
Food Consumption     Error     The value HOME PRODUCED quantity is not allowed to be negative
Details
Food Consumption     Error     Entry in disabled question 2a: How much [ITEM] came from PURCHASES out of
Details                        what was spent in the last [PERIOD]?
Food Consumption     Error     Entry in disabled question 4a: How much [ITEM] came from OWN PRODUCTION
Details                        out of what was spent in the last [PERIOD]?
Food Consumption     Error     Entry in disabled question 5: How much would it have cost you if you had
Details                        purchased this HOME PRODUCED quantity in the main market or store in this
                               village?
Food Consumption     Error     The entry for the consumed quantity that came from purchases is not allowed to
Details                        be negative (except for -99)
Food Consumption     Error     Entry in disabled question 6a: How much came from GIFTS AND OTHER
Details                        SOURCES in the past [PERIOD]?
Food Consumption     Error     The household has consumed this item, but no source for obtaining this
Details                        consumption item (Purchased/Produced/Other sources) has been selected
Food Consumption     Error     The entry for the consumed quantity that came from own production is not allowed
Details                        to be negative (except for -99)
Food Consumption     Error     The entry for the consumed quantity is not allowed to be zero
Details
Food Consumption     Error     The entry for the consumed quantity that came from gifts and other sources is not
Details                        allowed to be negative (except for -99)
Food Consumption     Error     The total amount consumed has to be greater than zero for this consumption item
Details                        given the information entered by you in question field T6eaQ2.
Food Consumption     Error     The value GIFT quantity is not allowed to be negative
Details
Food Consumption     Error     The entry for the consumed quantity is not allowed to be negative
Details
Food Consumption    Error     The total amount of consumption must be equal to (How much came from
Details                       purchases + How much came from own production + How much came from gifts
                              and other sources)
Food Consumption    Error     Entry in disabled question 7: How much would it have cost you if you had
Details                       purchased this GIFT quantity in the main market or store in this village?
Food Consumption    Missing   Missing question 2a: How much [ITEM] came from PURCHASES out of what was
Details                       spent in the last [PERIOD]?
Food Consumption    Missing   Missing question 1b: How much [ITEM] did your HH CONSUME in the past
Details                       [PERIOD]?
Food Consumption    Missing   Missing question 1a: How much [ITEM] did your HH CONSUME in the past
Details                       [PERIOD]?
Food Consumption    Missing   Missing question 7: How much would it have cost you if you had purchased this
Details                       GIFT quantity in the main market or store in this village?
Food Consumption    Missing   Missing question 6a: How much came from GIFTS AND OTHER SOURCES in
Details                       the past [PERIOD]?
Food Consumption    Missing   Missing question 4a: How much [ITEM] came from OWN PRODUCTION out of
Details                       what was spent in the last [PERIOD]?
Food Consumption    Missing   Missing question 3: How much did you spend?
Details
Food Consumption    Missing   Missing question 5: How much would it have cost you if you had purchased this
Details                       HOME PRODUCED quantity in the main market or store in this village?
Food Consumption    Warning   It is very unlikely that a household in Tanzania cooks with such a small amount of
Details                       cooking oil. Are you sure the given information is correct?
Frequent Non-food   Missing   MISSING: Expenditure
Expenditures (1)
Frequent Non-food   Warning   It is uncommon to have an expenditure value of less than 100 TSH. Are you sure
Expenditures (1)              the given information is correct?
Frequent Non-food   Missing   MISSING: Expenditure
Expenditures (2)
Frequent Non-food   Warning   It is uncommon to have an expenditure value of less than 100 TSH. Are you sure
Expenditures (2)              the given information is correct?
HH Crop details     Error     Second most common location should be different from first most common
                              location
HH Crop details     Error     The location where the HH fetches the highest price for this crop should be either
                              the location mentioned in Q1 or the one in Q3.
HH Crop details     Error     Entry in disabled question 6: What is the total cash value of the sale of [CROP] by
                              this household during the last year? IN TSHS
HH Crop details     Error     Entry in disabled question 5: In which of those two places do you fetch the highest
                              price per unit of [CROP]?
HH Crop details     Error     Entry in disabled question 1: Over the past 12 months, where did the household
                              sell most of its produce of [CROP]? MOST COMMON LOCATION
HH Crop details     Error     Entry in disabled question 3: Over the past 12 months, where did the household
                              sell most of its produce of [CROP]? SECOND MOST COMMON LOCATION
HH Crop details     Error     Entry in disabled question 2: How do you transport [CROP] to the location where it
                              is sold? MOST COMMON LOCATION
HH Crop details     Error     Entry in disabled question 4: How do you transport [CROP] to the location where it
                              is sold? SECOND MOST COMMON LOCATION
HH Crop details     Error     Entry in disabled question 7: Over the past year, how much did you spend on
                              transport in order to sell the products from [CROP] (all destinations)?
HH Crop details    Missing   Missing question 1: Over the past 12 months, where did the household sell most
                             of its produce of [CROP]? MOST COMMON LOCATION
HH Crop details    Missing   Missing question 2: How do you transport [CROP] to the location where it is sold?
                             MOST COMMON LOCATION
HH Crop details    Missing   Missing question 7: Over the past year, how much did you spend on transport in
                             order to sell the products from [CROP] (all destinations)?
HH Crop details    Missing   Missing question 4: How do you transport [CROP] to the location where it is sold?
                             SECOND MOST COMMON LOCATION
HH Crop details    Missing   Missing question 6: What is the total cash value of the sale of [CROP] by this
                             household during the last year? IN TSHS
HH Crop details    Missing   Missing question 5: In which of those two places do you fetch the highest price per
                             unit of [CROP]?
HH Crop details    Missing   Missing question 3: Over the past 12 months, where did the household sell most
                             of its produce of [CROP]? SECOND MOST COMMON LOCATION
HH Head Info       Error     Entry in disabled question 5: In which district was the head raised?
HH Head Info       Error     Entry in disabled question 4: In which region was the head raised?
HH Head Info       Missing   Missing: What is the name of the head of this HH?
HH Head Info       Missing   Missing question 5: In which district was the head raised?
HH Head Info       Missing   Missing question 4: In which region was the head raised?
HH Head Info       Missing   Missing: Where was the head raised? READ ALL RESPONSES
Household Member   Error     Entry in disabled question 2: What is [NAME] marital status?
- Demographics
Household Member   Error     Entry in disabled question 3: Is the spouse of [NAME] living in household?
- Demographics
Household Member   Error     Entry in disabled question 6: Do you expect that [NAME] will be residing here in 6
- Demographics               months from now?
Household Member   Error     Household member has ID 1 but it is not the household head
- Demographics
Household Member   Error     Only household member with ID 1 can be household head
- Demographics
Household Member   Error     The relationship of [NAME] is Wife/Husband, although the marital status is not
- Demographics               married or no non-formal union
Household Member   Error     Entry in disabled question 4: Who is [NAME] husband?
- Demographics
Household Member   Error     Entry in disabled question 7: What is [NAME] main daily activity?
- Demographics
Household Member   Error     The person selected as being the husband of [NAME] is the head of the
- Demographics               household, whilst the relationship to head selected in Q1 differs from 'spouse'.
Household Member   Missing   Missing question 2: What is [NAME] marital status?
- Demographics
Household Member   Missing   Missing: For how long was [NAME] absent during the last 12 months?
- Demographics
Household Member   Missing   Missing question 3: Is the spouse of [NAME] living in household?
- Demographics
Household Member   Missing   Missing question 6: Do you expect that [NAME] will be residing here in 6 months
- Demographics               from now?
Household Member   Missing   Missing question 7: What is [NAME] main daily activity?
- Demographics
Household Member   Missing   Missing question 4: Who is [NAME] husband?
- Demographics
Household Member   Missing   Missing: What is the relationship of [NAME] to the head of the household?
- Demographics
Household Member   Warning   This person is very old to be in boarding school. Are you sure this information is
- Demographics               correct? Please verify.
Household Member   Warning   It is very rare that a person this young is retired. Please verify.
- Demographics
Household Member   Warning   It is unusual that a person is over 7 years old and neither a full-time student nor
- Demographics               performing any type of work. Please verify.
Household Member   Warning   [NAME] is very young for this activity. Are you sure this is accurate?
- Demographics
Household Member   Warning   Person is said to be in boarding school (T3AQ5) but his/her main occupation is said to
- Demographics               be different than 'student'. Please verify.
Household Member   Warning   A child younger than 12 years old is unlikely to have the relationship to head that
- Demographics               is selected. Are you sure that the given information is correct?
Household Member   Warning   It is very uncommon that a child of less than 6 years old is not present all year
- Demographics               long. Are you sure the given information is correct?
Household Member   Warning   The man is a polygamist, hence it is unlikely that he is present all year long,
- Demographics               unless all his wives live in the same household. Please verify.
Household Member   Warning   It is uncommon that the head of the household or his/her spouse/husband
- Demographics               (T3AQ1) is a boarding school child (T3AQ5). Please double check this
                             information.
Household Member   Error     Entry in disabled question 21: What is the ONE-WAY fare to go to school using
- Education                  this mode of transportation (in Tshs)?
Household Member   Error     Entry in disabled question 19: How does [NAME] usually go to school?
- Education
Household Member   Error     The answer to 'How many times has [NAME] repeated grades?' is not allowed to
- Education                  be negative
Household Member   Error     Entry in disabled question 20b: MINUTES
- Education
Household Member   Error     Entry in disabled question 22c: SCHOOL BOOKS AND MATERIALS
- Education
Household Member   Error     Entry in disabled question 12: Has [NAME] successfully passed this exam?
- Education
Household Member   Error     Entry in disabled question 10: Did [NAME] ever sit for a national examination from
- Education                  which results are out?
Household Member   Error     Entry in disabled question 1: Can [NAME] read and write?
- Education
Household Member   Error     Entry in disabled question 17: Has [NAME] missed school in the last schooling
- Education                  week?
Household Member   Error     The entry for 'In total how much was spent on [NAME]'s education in the last 12
- Education                  months ...?' is not allowed to be negative.
Household Member   Error     Educational expenditures on OTHER CONTRIBUTIONS are not allowed to be
- Education                  negative
Household Member   Error     Educational expenditures on MEALS are not allowed to be negative
- Education
Household Member   Error    Educational expenditures on FEES are not allowed to be negative
- Education
Household Member   Error    Educational expenditures on EXTRA TUITION are not allowed to be negative
- Education
Household Member   Error    Educational expenditures on BOOKS are not allowed to be negative
- Education
Household Member   Error    The entry to 'what is the one way fare to go to school' is not allowed to be
- Education                 negative
Household Member   Error    The last grade of examination cannot be higher than 1 grade higher than the
- Education                 highest level of completed education
Household Member   Error    Entry in disabled question 2: Has [NAME] ever attended school?
- Education
Household Member   Error    Entry in disabled question 18: Why was [NAME] absent from school?
- Education
Household Member   Error    Entry in disabled question 4: How old was [NAME] when he/she started primary
- Education                 school?
Household Member   Error    Entry in disabled question 20a: How long does it take [NAME] to go to school
- Education                 using this mode of transportation? HOURS
Household Member   Error    Entry in disabled question 15: Was [NAME] in school in the last 12 months?
- Education
Household Member   Error    The answer to 'how long does it take to go to school - MINS' is not allowed to be
- Education                 negative
Household Member   Error    The entry to 'how long does it take to go to school - HOURS' is not allowed to be
- Education                 negative
Household Member   Error    Entry in disabled question 22h: TOTAL
- Education
Household Member   Error    Educational expenditures on TRANSPORT are not allowed to be negative
- Education
Household Member   Error    Entry in disabled question 22d: UNIFORMS
- Education
Household Member   Error    The age when he/she started school cannot exceed his/her current age
- Education
Household Member   Error    Entry in disabled question 9: What is the year of [NAME] last grade repetition?
- Education
Household Member   Error    Entry in disabled question 13: What division did [NAME] score on the
- Education                 examination?
Household Member   Error    Educational expenditures on UNIFORMS are not allowed to be negative
- Education
Household Member   Error    Entry in disabled question 11: For which level was the last examination that
- Education                 [NAME] took?
Household Member   Error    Entry in disabled question 14: Is [NAME] currently in school?
- Education
Household Member   Error    The last grade of repetition is not allowed to be 'none', because the answer to
- Education                 'Has [NAME] ever repeated a grade?' is said to be 'yes'.
Household Member   Error    Entry in disabled question 5: What is the highest level of COMPLETED education
- Education                 of [NAME]?
Household Member   Error     Entry in disabled question 8: What is the last grade [NAME] has repeated?
- Education
Household Member   Error     Entry in disabled question 6: Has [NAME] ever repeated a grade?
- Education
Household Member   Error     The number of times that [NAME] has repeated grade is not allowed to be 0,
- Education                  because the answer to 'Has [NAME] ever repeated a grade?' is said to be 'yes'.
Household Member   Error     Entry in disabled question 7: How many times has [NAME] repeated grades?
- Education
Household Member   Error     Entry in disabled question 22e: EXTRA TUITION
- Education
Household Member   Error     Entry in disabled question 16: Who runs/manages school [NAME] is attending?
- Education
Household Member   Error     Entry in disabled question 3: What type of school has [NAME] attended? READ
- Education                  ALL RESPONSES
Household Member   Error     Entry in disabled question 22g: SCHOOL MEALS
- Education
Household Member   Error     Entry in disabled question 22f: OTHER CONTRIBUTIONS FOR EDUCATION
- Education
Household Member   Error     Entry in disabled question 22a: How much was spent in the last 12 months by the
- Education                  members of your HH on [NAME]: TRANSPORT TO/FROM SCHOOL
Household Member   Error     Entry in disabled question 22b: SCHOOL FEES
- Education
Household Member   Missing   Missing question 22d: UNIFORMS
- Education
Household Member   Missing   Missing question 22a: How much was spent in the last 12 months by the members
- Education                  of your HH on [NAME]: TRANSPORT TO/FROM SCHOOL
Household Member   Missing   Missing question 22f: OTHER CONTRIBUTIONS FOR EDUCATION
- Education
Household Member   Missing   Missing question 1: Can [NAME] read and write?
- Education
Household Member   Missing   Missing question 22g: SCHOOL MEALS
- Education
Household Member   Missing   Missing question 22c: SCHOOL BOOKS AND MATERIALS
- Education
Household Member   Missing   Missing question 22e: EXTRA TUITION
- Education
Household Member   Missing   Missing question 7: How many times has [NAME] repeated grades?
- Education
Household Member   Missing   Missing question 6: Has [NAME] ever repeated a grade?
- Education
Household Member   Missing   Missing question 3: What type of school has [NAME] attended? READ ALL
- Education                  RESPONSES
Household Member   Missing   Missing question 12: Has [NAME] successfully passed this exam?
- Education
Household Member   Missing   Missing question 10: Did [NAME] ever sit for a national examination from which
- Education                  results are out?
Household Member   Missing   Missing question 2: Has [NAME] ever attended school?
- Education
Household Member   Missing   Missing question 22h: TOTAL
- Education
Household Member   Missing   Missing question 22b: SCHOOL FEES
- Education
Household Member   Missing   Missing question 19: How does [NAME] usually go to school?
- Education
Household Member   Missing   Missing question 20a: How long does it take [NAME] to go to school using this
- Education                  mode of transportation? HOURS
Household Member   Missing   Missing question 20b: MINUTES
- Education
Household Member   Missing   Missing question 21: What is the ONE-WAY fare to go to school using this mode
- Education                  of transportation (in Tshs)?
Household Member   Missing   Missing question 17: Has [NAME] missed school in the last schooling week?
- Education
Household Member   Missing   Missing question 5: What is the highest level of COMPLETED education of
- Education                  [NAME]?
Household Member   Missing   Missing question 8: What is the last grade [NAME] has repeated?
- Education
Household Member   Missing   Missing question 4: How old was [NAME] when he/she started primary school?
- Education
Household Member   Missing   Missing question 15: Was [NAME] in school in the last 12 months?
- Education
Household Member   Missing   Missing question 11: For which level was the last examination that [NAME] took?
- Education
Household Member   Missing   Missing question 13: What division did [NAME] score on the examination?
- Education
Household Member   Missing   Missing question 9: What is the year of [NAME] last grade repetition?
- Education
Household Member   Missing   Missing question 14: Is [NAME] currently in school?
- Education
Household Member   Missing   Missing question 16: Who runs/manages school [NAME] is attending?
- Education
Household Member   Missing   Missing question 18: Why was [NAME] absent from school?
- Education
Household Member   Warning   It is very uncommon that the answer to question '22h - Amount spent in the last 12
- Education                  months by the HH on [NAME]'s education: TOTAL - AUTOMATICALLY
                             CALCULATED BASED ON RESPONSES TO 22a-22g. NOT EDITABLE BY
                             INTERVIEWER. ' is greater than 3000000. Please double check whether the
                             information you entered is the actual information given by the respondent
Household Member   Warning   Main activity was recorded as full-time student but person is not currently in
- Education                  school? Please verify.
Household Member   Warning   Given the entered age at which this person started education, it is uncommon that
- Education                  he/she has already reached the stated 'highest level of education'. Please double
                             check.
Household Member   Warning   It is uncommon to have a repetition rate higher than 5. Are you sure the given
- Education                  information is correct?
Household Member   Warning   A person of less than 12 years old is unlikely to have reached the identified level
- Education                  of education. Are you sure the given information is correct?
Household Member   Warning   A person of less than 5 years old is unlikely to have reached the identified level of
- Education                  education. Are you sure the given information is correct?
Household Member   Warning   A person of less than 16 years old is unlikely to have reached the identified level
- Education                  of education. Are you sure the given information is correct?
Household Member   Warning   A person of less than 18 years old is unlikely to have reached the identified level
- Education                  of education. Are you sure the given information is correct?
Household Member   Warning   It is very uncommon that the answer to question '20a - How long does it take
- Education                  [NAME] to go to school using this mode of transportation? HOURS ' is greater
                             than 3. Please double check whether the information you entered is the actual
                             information given by the respondent
Household Member Warning     It is very uncommon that the answer to question '20b - How long does it take
- Education                  [NAME] to go to school using this mode of transportation? MINUTES ' is greater
                             than 59. Please double check whether the information you entered is the actual
                             information given by the respondent
Household Member Warning     It is very uncommon that the answer to question '21 - What is the ONE-WAY fare
- Education                  to go to school using this mode of transportation (in Tshs)? ENTER ZERO IF
                             NONE ' is greater than 150000. Please double check whether the information you
                             entered is the actual information given by the respondent
Household Member Warning     It is very uncommon that the answer to question '4 - How old was [NAME] when
- Education                  he/she started school? ' is greater than 11. Please double check whether the
                             information you entered is the actual information given by the respondent
Household Member Warning     It is very uncommon that the answer to question '4 - How old was [NAME] when
- Education                  he/she started school? ' is smaller than 6. Please double check whether the
                             information you entered is the actual information given by the respondent
Household Member Warning     The age when he/she started school is not allowed to be negative
- Education
Household Member Warning     The answer to 'How many times has [NAME] repeated grades?' is DK. Make sure
- Education                  a comment about this DK is made in the comment box of this question.
Household Member Warning     Last year's total transport costs to go to school for [NAME] are somewhat low
- Education                  given the one way fare cost of the most commonly used transport method used by
                             [NAME], see Q21. Please double check.
Household Member Warning     Only in special cases the last grade of repetition will be higher than 1 grade higher
- Education                  than the highest level of completed education. Are you sure the given information
                             is correct?
Household Member   Warning   It is uncommon to spend such a high amount on education of a child. Please
- Education                  double check whether the given information is correct.
Household Member   Warning   It is uncommon that there is nothing spent at all on the child's education if it was in
- Education                  school in the last 12 months. Are you sure the given information is correct?
Household Member   Warning   The answer to 'In total how much was spent on [NAME]'s education in the last 12
- Education                  months ...?' is DK. A comment about this DK MUST be made.
Household Member   Error     Entry in disabled question 20: Did [NAME] regularly go to a health clinic when
- Health                     he/she was pregnant with his/her last child?
Household Member   Error     Entry in disabled question 18: Of the children who died, how many died before
- Health                     their first birthday?
Household Member   Error     Contradictory information: As incapacitated was selected as [NAME]'s main daily
- Health                     activity, [NAME] must have a physical or mental disability that limits or prevents
                             normal daily activities or work. Please correct either main daily activity in T3A or
                             disability status in T3C.
Household Member Error       Entry in disabled question 16: In the past 5 years, how many children did [NAME]
- Health                     give birth to (including children who were born dead)?
Household Member   Error    The answer to 'How much did it cost?' is not allowed to be negative
- Health
Household Member   Error    Entry in disabled question 23: Was this birth registered?
- Health
Household Member   Error    Entry in disabled question 21: Where did [NAME] deliver his/her last child?
- Health
Household Member   Error    Entry in disabled question 22: Who delivered this child?
- Health
Household Member   Error    Entry in disabled question 5: For the last 4 weeks was [NAME] hospitalized or had
- Health                    overnight stay(s) in medical facility?
Household Member   Error    Entry in disabled question 4: What was the most important kind of health provider
- Health                    that [NAME] visited?
Household Member   Error    It is not possible that a male person has pregnancy related problems.
- Health
Household Member   Error    Entry in disabled question 6: How was treatment mainly financed?
- Health
Household Member   Error    Entry in disabled question 19: Of the children who died, how many died between
- Health                    their first and their fifth birthday?
Household Member   Error    Entry in disabled question 11: Is [NAME] PERMANENTLY physically or mentally
- Health                    disabled in any way which limits or prevents normal daily activities or work?
Household Member   Error    Entry in disabled question 12: What type of disability does [NAME] have?
- Health
Household Member   Error    Entry in disabled question 13: How is the impact of [NAME] disability on his/her
- Health                    daily activities compared to 12 months ago?
Household Member   Error    Entry in disabled question 15: In the past 5 years, has [NAME] given birth to
- Health                    children (including children born dead)?
Household Member   Error    Entry in disabled question 3: For how many days in the last 4 weeks has [NAME]
- Health                    suffered from this main health problem?
Household Member   Error    Entry in disabled question 10: Estimate the total number of days [NAME] was not
- Health                    able to perform his/her daily activities due to illness for the past 12 months?
Household Member   Error    Entry in disabled question 9: In the past 12 months have there been any episodes
- Health                    in which [NAME] was too ill to perform his/her normal daily activities?
Household Member   Error    Entry in disabled question 8: In the past 4 weeks, for how many days was [NAME]
- Health                    unable to perform his/her normal daily activities due to the illness/injury? ENTER 0
                            IF NONE
Household Member   Error    Entry in disabled question 2: What was the main health problem [NAME] was
- Health                    suffering from (in the last 4 weeks)?
Household Member   Error    Entry in disabled question 7: How much did it cost?
- Health
Household Member   Error    Entry in disabled question 17: How many of those children are still alive now?
- Health
Household Member   Error    The entry for "Estimate the total number of days in the past 12 months?" is not
- Health                    allowed to be negative.
Household Member   Error    The entry for "Estimate the total number of days in the past 12 months?" cannot
- Health                    exceed 365 (i.e. the number of days in 1 year).
Household Member   Error    The days ill in 4 weeks cannot exceed 28
- Health


Household Member Error       The nr of days ill in the past 12 months cannot be smaller than the nr of days ill in
- Health                     the past 4 weeks
Household Member Error       Entry in disabled question 1: Was [NAME] sick or injured in the last 4 weeks?
- Health
Household Member Error       If the number of days the person was too ill to perform his/her normal daily
- Health                     activities in the past 4 weeks is greater than 0, the answer to "In the past 12
                             months ... too ill to perform his/her normal daily activities?" cannot be "no".
Household Member Error       Entry in disabled question 14: Did [NAME] sleep under a bednet yesterday?
- Health
Household Member Error       The number of children born in the past 5 years who are still alive, plus the
- Health                     number of children who died before their first birthday, plus the number of children
                             who died between their first and fifth birthday have to add up to the total number
                             of children given birth to.
Household Member Error       The entry for "For how many days was [NAME] suffering from this main health
- Health                     problem?" is not allowed to be negative.
Household Member Error       The entry for "In the past 4 weeks, for how many days was [NAME] unable to
- Health                     perform his/her normal daily activities due to the illness/injury?" is not allowed to
                             be negative.
Household Member Error       The entry for "For how many days was [NAME] suffering from this main health
- Health                     problem?" is not allowed to be zero, since the respondent claimed that [NAME] did
                             suffer from the disease.
Household Member Error       The entry for "In the past 4 weeks, for how many days was [NAME] unable to
- Health                     perform his/her normal daily activities due to the illness/injury?" is not allowed to
                             be higher than 28 days (since it has to be within 4 weeks interval).
Household Member   Missing   Missing question 10: Estimate the total number of days [NAME] was not able to
- Health                     perform his/her daily activities due to illness for the past 12 months?
Household Member   Missing   Missing question 13: How is the impact of [NAME] disability on his/her daily
- Health                     activities compared to 12 months ago?
Household Member   Missing   Missing question 15: In the past 5 years, has [NAME] given birth to children
- Health                     (including children born dead)?
Household Member   Missing   Missing question 4: What was the most important kind of health provider that
- Health                     [NAME] visited?
Household Member   Missing   Missing question 17: How many of those children are still alive now?
- Health
Household Member   Missing   Missing question 19: Of the children who died, how many died between their first
- Health                     and their fifth birthday?
Household Member   Missing   Missing question 16: In the past 5 years, how many children did [NAME] give birth
- Health                     to (including children who were born dead)?
Household Member   Missing   Missing question 18: Of the children who died, how many died before their first
- Health                     birthday?
Household Member   Missing   Missing question 9: In the past 12 months have there been any episodes in which
- Health                     [NAME] was too ill to perform his/her normal daily activities?
Household Member   Missing   Missing question 3: For how many days in the last 4 weeks has [NAME] suffered
- Health                     from this main health problem?
Household Member   Missing   Missing question 8: In the past 4 weeks, for how many days was [NAME] unable
- Health                     to perform his/her normal daily activities due to the illness/injury? ENTER 0 IF
                             NONE
Household Member Missing     Missing question 5: For the last 4 weeks was [NAME] hospitalized or had
- Health                     overnight stay(s) in medical facility?
Household Member   Missing   Missing question 2: What was the main health problem [NAME] was suffering from
- Health                     (in the last 4 weeks)?
Household Member   Missing   Missing question 6: How was treatment mainly financed?
- Health
Household Member   Missing   Missing question 22: Who delivered this child?
- Health
Household Member   Missing   Missing question 14: Did [NAME] sleep under a bednet yesterday?
- Health
Household Member   Missing   Missing question 21: Where did [NAME] deliver his/her last child?
- Health
Household Member   Missing   Missing question 20: Did [NAME] regularly go to a health clinic when he/she was
- Health                     pregnant with his/her last child?
Household Member   Missing   Missing question 11: Is [NAME] PERMANENTLY physically or mentally disabled
- Health                     in any way which limits or prevents normal daily activities or work?
Household Member   Missing   Missing question 23: Was this birth registered?
- Health
Household Member   Missing   Missing question 12: What type of disability does [NAME] have?
- Health
Household Member   Missing   Missing question 7: How much did it cost?
- Health
Household Member   Missing   Missing question 1: Was [NAME] sick or injured in the last 4 weeks?
- Health
Household Member   Warning   It is very uncommon that a person of this age has pregnancy related problems.
- Health                     Please double check whether the information you entered is the actual information
                             given by the respondent.
Household Member Warning     Since [NAME] was hospitalised due to illness it is unlikely that the number of days
- Health                     he/she was unable to perform his/her normal daily activities due to illness is zero.
                             Are you sure the given information is correct?
Household Member Warning     The number of children of the head that are born in last 5 years and that are still
- Health                     alive is smaller than the number of biological children counted in the household
                             roster. It is very possible that the child is no longer a household member, but
                             please double check this.
Household Member Warning     In T3BQ18 it is stated that the child was absent from school last week because of
- Health                     illness. However, T3CQ1 states that child has not been sick in last 4 weeks.
                             Please double check.
Household Member Warning     It is very uncommon that the answer to question '7 - How much did it cost? IN
- Health                     TSH ' is greater than 300000. Please double check whether the information you
                             entered is the actual information given by the respondent
Household Member Warning     It is very uncommon that the answer to question '7 - How much did it cost? IN
- Health                     TSH ' is smaller than 500. Please double check whether the information you
                             entered is the actual information given by the respondent
Household Member Warning     The person is said to be 'sick' or 'incapacitated' in T3AQ7, hence it is strange that
- Health                     the answer to 'sick in last 4 weeks?' is said to be 'no'. Please double check.
Less Frequent    Missing     MISSING: Expenditure
Expenditure
Less Frequent    Warning     It is uncommon to have an expenditure value of less than 100 TSH. Are you sure
Expenditure                  the given information is correct?



Livestock          Error     Entry in disabled question 2: How many [LIVESTOCK TYPE] does the HH own
                             TODAY?
Livestock          Error     The number of livestock owned is not allowed to be negative
Livestock          Missing   Missing: Do you, or anyone else in your household, own [LIVESTOCK TYPE]?
Livestock          Missing   Missing question 2: How many [LIVESTOCK TYPE] does the HH own TODAY?
Livestock          Missing   Missing: Code for type of animal
Livestock          Warning   It is uncommon to own such a high number of rabbits. Are you sure the given
                             information is correct?
Livestock          Warning   It is uncommon to own such a high number of ducks/turkeys. Are you sure the
                             given information is correct?
Livestock          Warning   It is uncommon to own such a high number of chickens. Are you sure the given
                             information is correct?
Livestock          Warning   It is uncommon to own such a high number of cows. Are you sure the given
                             information is correct?
Livestock          Warning   It is uncommon to own such a high number of sheep. Are you sure the given
                             information is correct?
Livestock          Warning   It is uncommon to own such a high number of goats. Are you sure the given
                             information is correct?
Meals taken by     Error     Entry in disabled question 3: How many meals did [NAME] miss in the last 7
guests + Meals               days? (assume 2 meals per day; hence a number between 1 and 14)
taken outside HH
Meals taken by     Error     The answer to question: '3 - Number of meals that [NAME] missed (between 1
guests + Meals               and 14) ' should not be greater than 14.
taken outside HH
Meals taken by     Error     The answer to question:'3 - Number of meals that [NAME] missed (between 1 and
guests + Meals               14) ' should be at least 1.
taken outside HH
Meals taken by     Missing   Missing: How many person-meals were consumed by guests over the last 7 days?
guests + Meals               (e.g. if 2 extra people shared 3 HH meals, enter 6 person meals): people aged 5-
taken outside HH             12
Meals taken by     Missing   Missing: How many person-meals were consumed by guests over the last 7 days?
guests + Meals               (e.g. if 2 extra people shared 3 HH meals, enter 6 SHARES): people aged 0-4
taken outside HH
Meals taken by     Missing   Missing: How many person-meals were consumed by guests over the last 7 days?
guests + Meals               (e.g. if 2 extra people shared 3 HH meals, enter 6 person meals): people aged 13-
taken outside HH             18
Meals taken by     Missing   Missing: How many person-meals were consumed by guests over the last 7 days?
guests + Meals               (e.g. if 2 extra people shared 3 HH meals, enter 6 person meals): people aged 19-
taken outside HH             59
Meals taken by     Missing   Missing: How many person-meals were consumed by guests over the last 7 days?
guests + Meals               (e.g. if 2 extra people shared 3 HH meals, enter 6 person meals): people aged
taken outside HH             60+
Meals taken by     Missing   Missing question 3: How many meals did [NAME] miss in the last 7 days?
guests + Meals               (assume 2 meals per day; hence a number between 1 and 14)
taken outside HH
Meals taken by     Missing   Missing: Did [NAME] miss any HH meals in the last 7 days?
guests + Meals
taken outside HH



Meals taken by       Warning   It is very uncommon that the answer to question '3 - Number of meals that [NAME]
guests + Meals                 missed (between 1 and 14) ' is greater than 14. Please double check whether the
taken outside HH               information you entered is the actual information given by the respondent
Meals taken by       Warning   It is very uncommon that the answer to question '3 - Number of meals that [NAME]
guests + Meals                 missed (between 1 and 14) ' is smaller than 1. Please double check whether the
taken outside HH               information you entered is the actual information given by the respondent
Outside Food and     Missing   Missing: Expenditure
Drink
Outside Food and     Warning   It is uncommon to have an expenditure value of less than 100 TSH. Are you sure
Drink                          the given information is correct?
Roster               Error     Main respondent should not be less than 12 years of age
Roster               Error     HH head claims to be unmarried but there is at least one wife.
Roster               Error     Inconsistency between marital status of husband and wife
Roster               Error     HH head claims to have monogamous marriage but there is more than one wife.
Roster               Error     The household roster is empty.
Roster               Missing   Missing: Is this person a HH member?
Roster               Missing   Missing: Roster number of the main respondent
Roster               Missing   Missing: What was the age of [NAME] at last birthday (in completed years)? IF
                               LESS THAN 1 YEAR, ENTER 0
Roster               Missing   Missing: Is [NAME] male or female?
Roster               Missing   Missing: HH member name
Roster               Missing   Missing: Member ID (automatically generated by ticking the "add new HH
                               member" command button on form)
Roster               Warning   It is uncommon not to have any member older than 15 years old in a household.
                               Are you sure the given information is correct?
Roster               Warning   Main respondent is not the head or spouse. A reason for this MUST be entered in
                               the comments.
Roster               Warning   It is uncommon to have a person of this age. Are you sure the given information is
                               correct?
Roster               Warning   The age difference between parent and child is less than 14 years. Please verify.
Roster               Warning   The same name appears more than once in the HH member roster. Are you sure
                               the given information is correct?
Roster Sampling      Error     Entry in disabled question 2: Why the member is not available for interview
Roster Sampling      Missing   Missing: Member is available for interview
Roster Sampling      Missing   Missing: Has [NAME] been sampled for the interview of TRANSPORT section?
Roster Sampling      Missing   Missing question 2: Why the member is not available for interview
Roster Sampling      Missing   Missing: Has [NAME] been sampled for the interview of LABOUR section?
Select Consumption   Missing   Missing: In the past [RECALL] did household consume/purchase any [ITEM]?
Items
Select Consumption   Missing   Missing: In the past [RECALL] did household consume/purchase any [ITEM]?
Items
Start                Missing   Missing: Language of interview?
Start                Missing   Missing: Was an interpreter used?
Start                Missing   Missing: Date and time of interview start?
Start                Warning   An interpreter is used, hence a comment MUST be made about how smooth the
                               interview was conducted, whether using an interpreter caused any difficulties, etc.
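
The three message types listed in this appendix (Error, Missing, Warning) can be
illustrated with a short sketch. The Python code below is not the software used
in the survey; it is a minimal illustration, under stated assumptions, of how a
few of the Health checks in the table could be expressed. The field names
(sick_4w, days_ill_4w, days_ill_12m, cost_tsh), the Flag record and the function
check_health are hypothetical, and the skip rule (question 3 is enabled only if
question 1 is answered "yes") is an assumption about the questionnaire logic.
The numeric thresholds are taken from the messages in the table.

```python
from dataclasses import dataclass
from typing import List

SECTION = "Household Member - Health"


@dataclass
class Flag:
    section: str
    kind: str      # "Error", "Missing" or "Warning"
    message: str


def check_health(rec: dict) -> List[Flag]:
    """Apply a handful of the Health checks to one member record.

    All field names are hypothetical; absent keys mean "not answered".
    """
    flags: List[Flag] = []

    def add(kind: str, message: str) -> None:
        flags.append(Flag(SECTION, kind, message))

    sick = rec.get("sick_4w")            # question 1 (True / False / None)
    days_4w = rec.get("days_ill_4w")     # question 3
    days_12m = rec.get("days_ill_12m")   # question 10
    cost = rec.get("cost_tsh")           # question 7

    # Missing: a question that should have been answered is empty.
    if sick is None:
        add("Missing", "Missing question 1: Was [NAME] sick or injured "
                       "in the last 4 weeks?")

    # Skip pattern (assumed): question 3 is enabled only if question 1 is yes.
    if sick is False and days_4w is not None:
        add("Error", "Entry in disabled question 3")
    if sick is True and days_4w is None:
        add("Missing", "Missing question 3")

    # Hard range check: impossible values are errors.
    if days_4w is not None and days_4w > 28:
        add("Error", "The days ill in 4 weeks cannot exceed 28")

    # Cross-question consistency check.
    if days_4w is not None and days_12m is not None and days_12m < days_4w:
        add("Error", "The nr of days ill in the past 12 months cannot be "
                     "smaller than the nr of days ill in the past 4 weeks")

    # Cost: negative values are errors, unusual values are warnings only.
    if cost is not None:
        if cost < 0:
            add("Error", "The answer to 'How much did it cost?' is not "
                         "allowed to be negative")
        elif cost < 500 or cost > 300000:
            add("Warning", "It is very uncommon that the answer to question "
                           "'7 - How much did it cost?' is outside 500-300000")
    return flags


if __name__ == "__main__":
    example = {"sick_4w": True, "days_ill_4w": 30,
               "days_ill_12m": 10, "cost_tsh": 400}
    for flag in check_health(example):
        print(flag.kind, "-", flag.message)
```

In this sketch, errors and missing entries mark values that cannot be correct,
whereas warnings only ask the interviewer to confirm an unusual but possible
answer, mirroring the distinction between the Type values in the table.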



On-line Appendix 3: Example of the consumption data summary report used by the
                    enumerators whilst verifying data collected




On-line Appendix 4: Examples of pictures used in the consumption section:



