DISCUSSION PAPER
Report No.   WUDD
0,
ANALYZING AN URBAN HOUSING SURVEY:
ECONOMIC MODELS AND STATISTICAL TECHNIQUES
by
Stephen Malpezzi
May, 1984
Water Supply and Urban Development Department
Operations Policy Staff
The World Bank
The views presented here are those of the author and they should not be
interpreted as reflecting those of the World Bank.


﻿The author is indebted to David Hoaglin and Paul Velleman, and their
publisher, Duxbury Press, for permission to use the computer code in
Appendix C. This paper is a draft and will be revised. Comments are welcomed
by the author. Michael Bamberger and David Hoaglin have provided helpful
comments on this version of the paper; many of their comments will be
incorporated in the next version.
The author is on the staff of the Water Supply and Urban Development
Department of the World Bank.


﻿ABSTRACT
The purpose of this paper is to explain some common statistical
procedures and their application to housing market analysis. Most of the
emphasis is on the use of medians and other "order statistics," and on
regression analysis. Examples are used to illustrate some of the techniques,
using actual data from an Egyptian housing survey as well as "manufactured"
data.
This document was prepared to assist the Central Bureau of
Statistics and the Ministry of Works and Housing of Kenya in the analysis of
an urban housing survey. A companion paper is Planning an Urban Housing
Survey: Key Issues for Researchers and Program Managers in Developing
Countries (Water Supply and Urban Development Department Discussion Paper
No. WUDD-44, November 1982).
This paper is written for housing market analysts with some previous
exposure to basic statistics.


﻿TABLE OF CONTENTS
PREFACE
List of Tables
List of Figures
PART I:    STATISTICAL TECHNIQUES
Section    I.1     Introduction: Two Purposes of Statistics
           I.2     Medians and Related Order Statistics
           I.3     Housing Market Analysis Using Regression Techniques
                   I.3.1     The Logic of Regression Analysis
                             Simple Bivariate Regression Versus
                             Multiple Regression
                   I.3.2     An Example Using Developing Country Data
                   I.3.3     Residuals
                   I.3.4     Hypothesis Testing
                             R-Squared
                             Standard Error of the Equation
                             Standard Errors of Regression Coefficients
                             Confidence Intervals: Coefficients
                             Confidence Intervals: Predicted Values
                   I.3.5     The Role of Functional Form (Transformations)
                             in Regression Analysis
                             Power Terms
                             Logarithms
                             Dummy Variables
PART II:   ECONOMIC MODELS FOR HOUSING MARKET ANALYSIS
Section    II.1    Introduction
           II.2    Composite Demand Models
                   II.2.1    Measurement
                             Measuring Housing Consumption
                             Measuring Housing Prices
                             Measuring Incomes
                             Demographic Variables
                   II.2.2    Integrating the Effects of Tenure Choice and
                             Mobility on Housing Demand
                   II.2.3    Tying It All Together: Examples of Demand
                             Equations Using Egyptian Data
           II.3    Introduction to Hedonic Price Indexes
                   II.3.1    Theoretical Basis
                   II.3.2    An Example
PART III:  COMPUTATIONAL TECHNIQUES
Section    III.1   Preparing the Data for Analysis
           III.2   Computational Notes
Appendix A - Kenyan Housing Survey
Appendix B - Introduction to Logarithms and Elasticity
Appendix C - Fortran Subroutines for Order Statistics
Appendix D - Suggestions for Further Reading
Appendix E - Data Appendix for Simple Examples
Appendix F - Outline of Suggested Tables for an Urban Housing Survey Report
References


PREFACE
This paper has been prepared for Kenya's Central Bureau of
Statistics (CBS) and Ministry of Works and Housing (MWH) to assist them in
preparing a report on the housing situation in Kenya's urban areas, using the
1983 Urban Housing Survey (the questionnaire is appended). This paper is a
companion to an earlier paper, Planning an Urban Housing Survey: Key Issues
for Researchers and Program Managers.1/ That paper discussed general goals of
housing market analysis, some common problems encountered, and suggested
questions for the prospective survey. Now the survey questionnaire has been
completed, and this paper addresses those topics again, but this time with
reference to the actual survey, and suggests concrete solutions based on the
information contained therein.
This paper is divided into three parts. Part I focuses on statistical
techniques, with particular reference to using medians and regression
analysis. Part II concentrates on developing several simple but useful
economic models of the housing market which can be estimated with data from the
Kenyan survey. Part III gives some handy computational hints which can be
used in actually estimating the models described in the first two parts.
The three parts of the paper are related, by design. In fact, dividing
the paper up into statistical and economic parts is convenient but somewhat
artificial. There will necessarily be some overlap, especially between the
first two parts, so read both parts together. For example, the topic of
functional form is treated in several places: the general discussion in Part
I and the specific examples of Part II.
Although much of the material covered in this report is well known to
statisticians (especially the contents of Part I), basic material is included
throughout alongside more advanced material in order to make the paper as
self-contained as possible. Additional references on many of these topics can
be found in the papers listed in Appendix D.
1/ Malpezzi, Bamberger and Mayo (1981).
A few comments are in order regarding the data used in the examples.
Some examples use real survey data from Cairo, Egypt, so that one can see how
these techniques work with actual data. Other examples use hypothetical data
which have been constructed to exaggerate certain relationships which are
featured in the text. Results from Kenyan data will differ considerably from
both the Egyptian data and from the manufactured data; we want to emphasize
that the examples illustrate techniques, not expected results. The data used
for many of the examples are presented in the appendix, and can be used to
replicate the examples.


List of Tables
1.    Summary Statistics from Cairo Sample
2.    Summary Statistics from Cairo Sample, by Income Quintile
3.    Simple Regression of Log Rent on Log Income, Cairo
4.    Regression Examples Using Manufactured Data with Large
      and Small Variance
5.    F Table Showing .01 and .05 Probability Levels
6.    The t Distribution and the Normal Distribution
7.    Regression Example Using Manufactured Data:
      Rent and Household Size
8.    Dummy Variable Coding Scheme
9.    Regression Example Illustrating Use of Dummy Variables
10.   Measures of Housing Consumption
11.   Cairo Renter Demand Equations Using Log of Gross Rent
12.   Cairo Owner Demand Equations Using Log of House Value
13.   Simple Demand Equations for Renters and Owners
14.   Cairo Renter Hedonic Equation


List of Figures
1.    Linear Plot of Rent by Income (Cairo)
2.    Logarithmic Plot of Rent by Income (Cairo)
3.    Histogram of Residuals from Simple Renter
      Demand Equation (Cairo)
4.    Plots of Manufactured Data With Large and Small Variance
5.    Plot of Manufactured Data Illustrating Hypothetical
      Non-linear Relationships Between Rent and
      Household Size


﻿PART I: STATISTICAL TECHNIQUES
Section I.1      Introduction: Two Purposes of Statistics
Information about housing conditions is costly to collect, and
difficult to use intelligently unless we have some way to reduce the
information into manageable form. For example, someone charged with designing
a housing program for a particular town will want to know the relationship
between people's incomes and how much they are willing to pay for housing.
Since collecting information for every household in the town is expensive, we
obviously rely upon a sample of households to collect this kind of
information. But even after a well-chosen sample is surveyed we have more raw
information than we can comfortably digest -- what can you make of the rents
and incomes of a thousand or even a hundred people? But you can easily
compute the average income and average rent of your sample, and these two
numbers give you more usable information than the hundreds of numbers they
were derived from.
A statistic -- an average, median, regression coefficient, or
whatever -- summarizes the information in a sample, and can be used for two
purposes. First, statistics are descriptive -- a way to reduce a lot of
information in a sample to one or perhaps several pieces of information, which
can be more easily absorbed by the analyst. Second, statistics can be used to
test hypotheses, that is, establish the probable truth or falsity of certain
propositions, given the information in the sample. The two purposes are, of
course, related. Suppose we divide up our sample into low and high income
groups, and compute the average rent of each sub-sample. These two new pieces
of information (1) summarize the rents paid by each group (description) and
(2) permit a test of the admittedly simple hypothesis that higher income
people spend more on housing (inference).
This paper will explain some common statistical procedures and their
application to housing market analysis in some detail. Most of the emphasis
will be on the use of medians and other "order statistics", and on regression
analysis. Much of the material will be familiar to many readers, especially
to statisticians, but the note will go over the basics in order to make the
discussion somewhat self-contained. References are given where appropriate
for those who want to pursue these topics in more detail. Examples will be
used to illustrate some of the techniques, using actual data from an Egyptian
housing survey.1/
Section I.2      Medians and Related Order Statistics
Order statistics are statistics which are based on ranks or order by
some criterion variable.2/ The most common order statistic is the median.
Like the arithmetic mean, or average, it is a measure of central tendency or
location, but it has several desirable properties which will be briefly
discussed. Suppose we have a small sample of five households, with rents of
100, 120, 120, 150 and 250 shillings, respectively. The average rent of this
sample is, of course, 148 shillings. The median rent is the rent paid by the
"middle household", or 120 shillings. In general, the median of any variable
is the value of that variable for which half the sample values are above the
median and half are below.3/ It is computed by (1) sorting the sample on the
variable of interest, (2) computing one-half of the sample size (call it
N/2), and (3) reporting the value of that variable for the "N/2th"
observation. Other order statistics can be computed in a similar fashion,
e.g. quartiles are computed using N/4, quintiles using N/5, deciles using
N/10, percentiles using N/100 and so on. The median is also the second
quartile, the fiftieth percentile, and the fifth decile.
1/   See Mayo et al., 1982, for a description of the data.
2/   They are also called non-parametric statistics. See Blalock (1960),
Chapter 5.
When should we use the arithmetic mean, and when should we use the median?
The short answer is that the mean is superior when the data are normally
distributed;4/ the median is better when the data are best approximated by
some other distribution. Rents, house values, and incomes are examples of
variables that are not, in general, normally distributed, but are truncated
at zero and have more very large values than do normally distributed
variables. Because of this, the means of these variables can be unduly
affected by a few extreme observations; the median is much less sensitive to
the presence of large values. We say the median is more robust than the
mean. It is a better representation of the typical value in the sample.
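As a minimal sketch of this comparison, the five-household example can be checked with Python's standard statistics module. The second list is a hypothetical illustration (not survey data) in which the largest rent has been miscoded as 2500 instead of 250, to show how the two measures react to one extreme value.

```python
from statistics import mean, median

# Rents (shillings) of the five households in the example above.
rents = [100, 120, 120, 150, 250]

avg_rent = mean(rents)      # arithmetic mean
mid_rent = median(rents)    # rent of the middle household after sorting

# Hypothetical: the largest rent is miscoded as 2500 instead of 250.
miscoded = [100, 120, 120, 150, 2500]
avg_miscoded = mean(miscoded)
mid_miscoded = median(miscoded)

print(avg_rent, mid_rent)            # mean 148, median 120
print(avg_miscoded, mid_miscoded)    # mean 598, median still 120
```

One miscoded value roughly quadruples the mean and leaves the median unchanged, which is the sense in which the median is the more robust of the two.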
Other order statistics can be computed in addition to the median.
Two common statistics are the first quartile and the third quartile.
To compute them, rank the data and compute the sample size, N.
Divide the data into fourths, then proceed as follows: the value of the
"N/4th" observation is the value of the first quartile; one-fourth of the data
have lower values, three-fourths have higher values. The "2N/4th" observation
is of course the median, discussed above. The "3N/4th" observation is the
third quartile, for which three-fourths of the data have lower values and one-
fourth have higher values.
3/   If the number of sample observations is odd the median is more precisely
the (N/2 + .5)th observation; if N is even a common procedure is to
average the two observations (N/2 + .5) and (N/2 - .5).
4/   For a review of the normal distribution see any statistics text.
The first and third quartiles give a good idea of the spread of the
distribution; half of the data lie between these two values. Their difference
is often computed and referred to as the "interquartile range," and can be
thought of as the order statistic analogous to the more familiar standard
deviation.
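The rank rule described above can be sketched in a few lines of Python. The helper name order_statistic, the choice to round a fractional rank upward, and the twelve rents are all assumptions made for illustration. The sketch also does not average the two middle observations when N is even, as footnote 3/ suggests for the median, and statistical packages often interpolate between observations, so their quartiles may differ slightly.

```python
import math

def order_statistic(values, fraction):
    """Return the value at rank ceil(fraction * N) of the sorted data (ranks start at 1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]

# Hypothetical monthly rents for twelve households (not survey data).
rents = [4, 5, 6, 6, 7, 8, 8, 9, 11, 14, 20, 57]

q1 = order_statistic(rents, 0.25)   # 3rd of the 12 sorted values
q2 = order_statistic(rents, 0.50)   # 6th of the 12 sorted values
q3 = order_statistic(rents, 0.75)   # 9th of the 12 sorted values
iqr = q3 - q1                       # interquartile range

print(q1, q2, q3, iqr)
```

The single very large rent (57) has no effect on the quartiles or on the interquartile range, whereas it would inflate a standard deviation computed from the same data.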
Medians and other order statistics can be very useful in cross-
classifications. An example from the Egyptian housing survey will illustrate
the idea (Table 1).
The median rent of our sample of Cairo renters is 8 Egyptian pounds.
The arithmetic average is 13 pounds, but this overstates the rent of the
"typical" unit because the average is heavily influenced by a few extreme
observations (up to 224 pounds). Notice that the mean is approximately equal
to the third quartile, not the median. In other words, the mean is not the
best estimate of the rent paid by the typical (i.e. middle) consumer. Also,
we can illustrate that medians are more robust than averages. Suppose we drop
the top five observations (57, 73, 109, 156 and 224 pounds) and recompute.
The mean is 11 pounds (a difference of 15 percent) but the median remains
stable at 8 pounds. Small changes in the sample do not affect the median as
much as the mean.
We can see similar patterns with other distributions, such as rent-
to-income ratios, also included in Table 1.

Table 1: Summary Statistics from Cairo Sample

                    Income     Gross Rent     Rent-to-Income
Mean                   115             13                .18
Median                  87              8                .10
First Quartile          59              6                .06
Third Quartile         129             14                .16

Note: Definitions: Mean is the arithmetic average. Median is the mid-point
of the distribution of rents. First Quartile is the rent paid by the
household which is at the twenty-fifth percentile (i.e., one-fourth of
all households pay less, three-fourths pay more). Third Quartile is
the seventy-fifth percentile (three-fourths pay less, one-fourth pay
more). Also, notice that the average (or median or quartile) of each
household's rent-to-income ratio is not the same as the ratio of the
sample average (median, etc.) rent to the sample average (median, etc.)
income.

Then, suppose we want to know how rents are related to total
income. One effective method is to (1) divide the sample into groups
based on income ranks (e.g. quartiles, quintiles, deciles or whatever), then
(2) compute the median within each group. We'd like to have at least thirty
observations in each cell to ensure reliable results, so we divide the data
into quintiles (using deciles resulted in sample sizes of 20 in several
cells). Then we compute the median, and the first and third quartiles, within
each quintile. Table 2 presents these results.
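The grouping procedure can be sketched as follows. The helper quantile_groups and the fifteen (income, rent) pairs are hypothetical and serve only to show the mechanics. A real analysis would use the survey records and, as noted above, would want about thirty observations per group.

```python
from statistics import median

def quantile_groups(records, key, n_groups):
    """Sort records by key and split them into n_groups nearly equal-sized groups."""
    ordered = sorted(records, key=key)
    n = len(ordered)
    bounds = [round(i * n / n_groups) for i in range(n_groups + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(n_groups)]

# Hypothetical (income, rent) pairs, not survey data.
households = [(30, 4), (45, 6), (50, 5), (60, 8), (70, 7), (80, 9),
              (90, 8), (110, 9), (130, 12), (160, 15), (200, 20), (400, 30),
              (55, 6), (75, 10), (95, 9)]

groups = quantile_groups(households, key=lambda h: h[0], n_groups=5)

summary = []
for number, group in enumerate(groups, start=1):
    rents = [rent for _, rent in group]
    ratios = [rent / income for income, rent in group]
    summary.append((number, len(group), median(rents), median(ratios)))

for quintile, count, median_rent, median_ratio in summary:
    print(quintile, count, median_rent, round(median_ratio, 3))
```

The first and third quartiles within each group can be added in the same loop using whatever order-statistic rule the analyst has adopted.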
Now add two refinements. Obviously, rents increase with income, but
we'd like better information on how fast they go up relative to total
consumption. One way to do this is to look at rent-to-income ratios rather
than rents. If the ratio goes up with income, then rent goes up faster than
income; if the ratio is constant, rent goes up at the same rate as income; if
the ratio decreases as income goes up, then rents go up more slowly than
income (although they still go up). In economic jargon, these three cases
correspond to elastic demand, demand of unit elasticity, and inelastic demand,
respectively.
The second refinement is this: compute the first and third
quartiles of the rent-to-income ratio as well as the median (second quartile)
within each income quintile. This gives us a good idea of the distribution of
the ratio, i.e. how much it varies in each group. Table 2 presents these
results for our example data. Several interesting patterns emerge.

Table 2: Summary Statistics from Cairo Sample, by Income Quintile

                                        Gross Rent               Rent-to-Income Ratio (R/I)
Income Quintile      Number of      First             Third      First             Third
(income range)       Observations   Quartile  Median  Quartile   Quartile  Median  Quartile
Fifth (150-797)           49            8       20       27        .04      .06      .11
Fourth (100-149)          53            7        8       13        .06      .08      .11
Third (75-99)             46            6        9       13        .07      .10      .15
Second (54-74)            49            6        8       11        .09      .13      .17
First (0-53)              49            4        6        9        .10      .14      .28
Total Sample             246            6        8       14        .06      .10      .16

First, as everyone expects, rents increase with income. In
particular, the typical rent in the highest quintile is about twice that in
the other quintiles. Second, notice that the median rent in the fourth
quintile is actually a little lower than that in the third quintile, but that
the difference (1 pound) is small relative to the spread of the distributions,
as measured by the differences between first and third quartiles of rent
within the third and fourth income quintiles (these numbers are 13-6=7 pounds,
and 13-7=6 pounds respectively). This illustrates an important point: with
real world data, careful analysis requires looking at the spread of
distributions in addition to point estimates. The key finding is that the
rent distribution is relatively flat in the second, third and fourth
quintiles. Most of the difference in median rents is at the very top and
very bottom of the income distribution. Differences between income class
medians are small relative to the spread within classes. In other words, the
bivariate relationship between rent and income is positive, but income alone
does not explain much of the observed variation in rents paid.
The last three columns of Table 2 illustrate that even though rents
go up with income, the ratio of rent to income declines. In particular,
the poor in Cairo often spend large proportions of their income on rent; a
fourth of the poorest income class pay 28 percent or more. These kinds of
results show that some common rules of thumb about affordability are
contradicted in the Cairo market.4/
4/    Policy implications of particular results are not discussed in this
paper. Forthcoming papers from a research project on "Housing Demand
and Finance in Developing Countries" conducted by the World Bank's Water
Supply and Urban Development Department will address these issues.

Section I.3     Housing Market Analysis Using Regression Techniques
One limitation of the methods described above is that order
statistics are difficult to apply to multivariate problems. Suppose, for
example, that we hypothesize that willingness to pay depends on other
variables as well as income or total consumption. We might expect that larger
families consume more housing; that higher income families consume more
housing; but also that larger families have higher incomes because they have
more wage earners. Regression analysis permits us to estimate the separate
effects of household size and income from a sample in which all three
variables are correlated.5/ It also permits us to test hypotheses about the
relative importance of these separate effects.6/
The next few pages will develop some of the ideas behind regression
analysis by starting with the simplest problem, one dependent variable and one
independent variable, and then extending the technique to several independent
variables. After the basics of the statistical technique are covered, we will
discuss the actual specification of regressions for the coming report, in
Part II.
I.3.1     The Logic of Regression Analysis
Consider once again the relationship between rent and income.
Figure 1 shows a plot of our example data from Cairo, Egypt. If there were no
relationship between rents and income, the plotted points would, of course, be
scattered across the page in random fashion. If there were a very strong
positive relationship, the plotted points would mostly fall near a line with
positive slope drawn on the page (that is, near a line which represents
increasing values of the rent variable as income increases). It is not
surprising that with real world data, we often get a pattern somewhat in
between these two extremes: the plot will show some tendency for large rents
and incomes to be associated, but the pattern will usually not be very
pronounced. Looking at Figure 1 we see that there are some points plotted in
the upper right-hand corner (high rent, high income) but no points in the
upper left-hand corner (high rent, low income). Most of the points are
bunched in the lower left, and any pattern is hard to discern.
[Figure 1: Linear Plot of Rent by Income (Cairo)]
5/    If the independent variables income and household size are highly
correlated, it is hard to separate effects even with this technique, but
the standard errors of the regression coefficients will warn us of the
problem. This will be discussed below.
6/    Hypothesis testing with regression analysis assumes normality, which we
stated above is not always realistic; but it turns out that these tests
are still approximately correct with the kind of truncated distributions
we encounter in economic analysis. For a more detailed discussion, see
Theil (1971) pp. 615 ff.
This is related to the problem we discussed above: that rents and
incomes typically have a skewed distribution, so that a few outlying
observations (especially high income observations) can obscure what's going on
in the rest of the data. We don't want to just drop the outlying observations
(unless we think they are so unrealistic that they are mistaken or miscoded
responses) because these observations contain valuable information. A common
solution to this problem is "reexpression" or "transformation" of the original
rent and income variables in order to mitigate the problem.7/ What we'd like
to do is find a way to compute new variables which (1) contain essentially the
same information as the original variables, that is, how fast rents increase
with income, but (2) more closely approximate a normal
distribution and are therefore better candidates for statistical analysis.
7/    See Tukey (1977), Chapter 4 for a more detailed discussion of
transformations.
A common transformation used in economic analysis is the natural
logarithm.8/ Logarithms have the desirable property that they contain
information about the original variable -- without exception, the larger the
original variable, the larger its logarithm -- while in most cases the log of
rent or income more closely approximates the normal distribution than the
original untransformed variable.9/
Figure 2 presents a plot of logarithms of rents and incomes for the
Cairo sample. Notice that the pattern of positive association is more
pronounced in Figure 2 than in Figure 1.
How can we summarize the information contained in these plots? To
state that rent increases with income does not significantly extend the
frontiers of human knowledge. What we want to know is, by how much does it
increase?
If we drew a line through the points in Figure 2, the slope of that
line would be a number that would tell us how much the log of rent went up as
the log of income increased by one, or in terms of the original untransformed
rent and income, the percentage increase in rent given a one percent increase
in income (see Appendix B). Regression analysis is nothing more than a
technique for fitting the best line through a collection of points like those
in Figure 2.10/
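The link between the slope in logarithms and percentage changes can be illustrated numerically. In the sketch below the constant-elasticity curve, its scale of 1.5, its elasticity of 0.4, and the two income levels are all hypothetical values chosen for illustration; they are not estimates from the Cairo data.

```python
import math

# Hypothetical constant-elasticity relationship: rent = scale * income ** elasticity.
scale = 1.5
elasticity = 0.4

def rent(income):
    return scale * income ** elasticity

income_low, income_high = 80.0, 88.0   # a 10 percent increase in income

# Slope of the straight line joining the two points when both axes are in logs.
slope = (math.log(rent(income_high)) - math.log(rent(income_low))) / (
    math.log(income_high) - math.log(income_low))

# Percentage change in rent produced by the 10 percent change in income.
pct_change_rent = 100 * (rent(income_high) / rent(income_low) - 1)

print(round(slope, 4))             # the slope in logs recovers the elasticity, 0.4
print(round(pct_change_rent, 2))   # a little under 4 percent
```

The slope in logarithms equals the elasticity exactly, while the statement "a 10 percent rise in income raises rent by 4 percent" is an approximation that is close for small changes.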
8/    "Natural" logarithms are logarithms using the base e (approximately 2.718). "Common"
logarithms use the base 10. See Appendix B for details; we always work
with natural logarithms.
9/    There are other advantages which are discussed in Appendix B.
10/   Technically, "best" means minimum variance among all unbiased linear
estimators. Estimates always have some error associated with them; they
are estimates of an unknown "true" parameter. Unbiased means that
although our results for any given sample have some error, if we look at
many samples (e.g. many towns) these errors tend to cancel out. Minimum
variance means that there is no technique that we could use to fit an
unbiased line which would usually come closer to the true parameter.
For more details and proofs, see any statistics textbook.
[Figure 2: Logarithmic Plot of Rent by Income (Cairo). Vertical axis: log
monthly rent; horizontal axis: log monthly income.]

Simple Bivariate Regression Versus Multiple Regression
The example we have used has been deliberately limited to one
dependent variable and one independent (right hand side) variable, because we
can illustrate the principles involved with graphs and simple algebra.
However, it is straightforward to extend these techniques algebraically to more
than one independent or explanatory variable. In fact, one of the chief
advantages of regression analysis is that it is a multivariate technique.
With this technique it is possible to sort out the separate effects of several
explanatory variables, even when the explanatory variables themselves are
interrelated.
For example, suppose we started out with a sample of 150 renters in
a particular town, and wanted to estimate the effects of (1) income, (2)
household size, and (3) age of household head upon housing consumption. Using
cross tabulations, we could divide the sample into, say, 5 income groups,
4 household size groups, and 3 age of household head groups. Then we could compute
the mean or the median rent in each cell, and examine the results to get
estimates of the effects of these variables on consumption. But there are
5 * 4 * 3 = 60 cells! Many cells will be empty, and most will have only a few
observations. Our means or medians will be extremely unreliable.
Regression techniques get around this problem. With income entered
as a continuous variable, 3 household size dummy variables, and 2 age of head
dummy variables, and a constant term, we can run a regression which estimates
the separate effects of each variable but which has 150 - (1 + 3 + 2 + 1) =
143 degrees of freedom. Our estimates will be more reliable, using the same
data.
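The arithmetic behind this comparison can be written out as a short sketch. The counts come from the example above. The design_row helper, and its choice of the first household size group and first age group as the omitted base categories, are assumptions made for illustration.

```python
# Counts taken from the example in the text.
sample_size = 150
income_groups, size_groups, age_groups = 5, 4, 3

# Cross tabulation: one cell for every combination of categories.
cells = income_groups * size_groups * age_groups
average_per_cell = sample_size / cells

# Regression: income enters as one continuous variable, and each categorical
# variable needs one dummy fewer than it has categories.
parameters = 1 + (size_groups - 1) + (age_groups - 1) + 1   # income, dummies, constant
degrees_of_freedom = sample_size - parameters

def design_row(income, size_group, age_group):
    """One regression row; group 1 of each variable is the omitted base (an assumption)."""
    size_dummies = [int(size_group == g) for g in range(2, size_groups + 1)]
    age_dummies = [int(age_group == g) for g in range(2, age_groups + 1)]
    return [1, income] + size_dummies + age_dummies

print(cells, average_per_cell)          # 60 cells, 2.5 households per cell
print(parameters, degrees_of_freedom)   # 7 parameters, 143 degrees of freedom
print(design_row(85, 3, 1))             # [1, 85, 0, 1, 0, 0, 0]
```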
For the rest of the paper we will skip back and forth between simple
bivariate examples and multiple regression examples. Although the multiple
regression examples can't be graphed as easily, there are no essential
differences between simple and multiple regression. Tests or procedures which
we illustrate with simple examples can be applied straightforwardly to
multiple regression models.
I.3.2     An Example Using Developing Country Data
We will not discuss the details of how to compute a regression
coefficient here; they can be found in any statistics text and in manuals for
computer packages like SPSS. Table 3 is a photocopy of the output from a
regression computed using the same Cairo data, by the computer package SAS.
Most likely, CBS will compute these regression results using SPSS, a similar
type of package. Even though this is a simple one-variable regression,
computer packages print out a lot of numbers -- we will focus on the most
important ones: parameter estimates, standard errors, t-statistics for the
hypothesis that the coefficient is zero, and R-squared.
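Several of the numbers in this kind of output are tied together by standard identities, which gives a quick way to check a printout. The sketch below recomputes the summary figures in Table 3 from its sums of squares, degrees of freedom, estimates and standard errors. The variable names are illustrative, and small differences in the last digit reflect rounding in the printed output.

```python
import math

# Figures copied from the SAS output in Table 3.
ss_model, ss_error = 24.289792, 120.824
df_model, df_error = 1, 244
dep_mean = 2.223925
estimates = {"INTERCEP": (0.371449, 0.268277), "LMINCOME": (0.413728, 0.059072)}

ss_total = ss_model + ss_error
ms_model = ss_model / df_model
ms_error = ss_error / df_error

r_squared = ss_model / ss_total
adj_r_squared = 1 - (1 - r_squared) * (df_model + df_error) / df_error
f_value = ms_model / ms_error
root_mse = math.sqrt(ms_error)                # standard error of the equation
coeff_var = 100 * root_mse / dep_mean         # C.V. in the printout
t_values = {name: est / se for name, (est, se) in estimates.items()}

print(round(r_squared, 4), round(adj_r_squared, 4))
print(round(f_value, 2), round(root_mse, 4), round(coeff_var, 2))
print({name: round(t, 3) for name, t in t_values.items()})
```

With a single explanatory variable, the F value is also the square of the t-statistic on that variable (7.004 squared is about 49.06).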
To fit that line through the plot we need only the parameter
estimates for the log of income variable, and the intercept. Put differently,
when we estimate this regression we are assuming the following model of
housing demand:
(1) log(R) = a + b * log(I) + u
where log(R) is the log of rent,
      log(I) is the log of income,
      a and b are regression coefficients to be estimated, and
      u is the "residual", or the difference between the value
        of log(R) predicted by our estimated a and b for a
        given observation (i.e. for a given value of log(I))
        and the actual value of log(R) for that observation.
For now, assume that this is the correct or true model. Later we
will develop a better model of housing demand.
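Although the computing formulas are left to textbooks and packages, a minimal sketch of how a and b in equation (1) can be estimated by least squares may help fix ideas. The function name fit_log_log and the eight income and rent values are hypothetical. The residual is taken here as actual minus fitted log rent, which is the usual sign convention.

```python
import math

def fit_log_log(incomes, rents):
    """Least-squares estimates of a and b in log(R) = a + b * log(I) + u."""
    x = [math.log(i) for i in incomes]
    y = [math.log(r) for r in rents]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                 # slope: the income elasticity of rent
    a = y_bar - b * x_bar         # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # actual minus fitted
    return a, b, residuals

# Hypothetical monthly incomes and rents (not the Cairo sample).
incomes = [40, 55, 70, 85, 100, 150, 220, 400]
rents = [5, 6, 9, 8, 10, 14, 13, 30]

a, b, residuals = fit_log_log(incomes, rents)
print(round(a, 3), round(b, 3))
print(abs(sum(residuals)) < 1e-9)   # least-squares residuals sum to zero
```

In practice a package such as SAS or SPSS does this computation and also reports the standard errors.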

Table 3: Simple Regression of Log Rent on Log Income, Cairo

DEP VARIABLE: LMGRENT   LOG MONTHLY RENT

                       SUM OF         MEAN
SOURCE      DF        SQUARES       SQUARE     F VALUE     PROB>F
MODEL        1      24.289792    24.289792      49.052     0.0001
ERROR      244        120.824     0.495182
C TOTAL    245        145.114

ROOT MSE     0.703692     R-SQUARE     0.1674
DEP MEAN     2.223925     ADJ R-SQ     0.1640
C.V.         31.64187

                     PARAMETER     STANDARD    T FOR H0:
VARIABLE    DF        ESTIMATE        ERROR  PARAMETER=0   PROB > |T|   VARIABLE LABEL
INTERCEP     1        0.371449     0.268277        1.385       0.1674   INTERCEPT
LMINCOME     1        0.413728     0.059072        7.004       0.0001   LOG MONTHLY INCOME

Every observation in the sample has its own values of log(R), log(I),
and u. The two numbers a and b are fixed. Note that we have
estimated the residuals as well as the coefficients; the residuals will be
used to measure "how good" our regression coefficients are since they are used
to compute standard errors and R-squared. But first we will discuss the
coefficients.
One of the advantages of regression analysis is that coefficient
estimates permit a prediction of the dependent variable [log(R)] given the
level of the independent variable or variables [here log(I)]. Continuing our
Cairo example, if a household has an income of 85 Egyptian pounds per month,
we can predict the log of rent as:
predicted log(R) = .371449 + .413728 * log(85)
                 = 2.20950
and to get predicted rent we take the antilog of this result ("exponentiate"):
predicted rent   = exp (2.20950)
                 = 9.11 Egyptian pounds.
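The prediction above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the function names predict_log_rent and predict_rent are hypothetical, the coefficients are copied from Table 3, and natural logarithms are assumed because they reproduce the figures in the text.

```python
import math

# Coefficient estimates copied from Table 3 (Cairo renter sample).
INTERCEPT = 0.371449   # a in equation (1)
SLOPE = 0.413728       # b in equation (1), coefficient of log monthly income


def predict_log_rent(income):
    """Predicted log of monthly rent for a monthly income in Egyptian pounds."""
    return INTERCEPT + SLOPE * math.log(income)


def predict_rent(income):
    """Predicted monthly rent: the antilog of the predicted log rent."""
    return math.exp(predict_log_rent(income))


print(round(predict_log_rent(85), 4))   # 2.2095
print(round(predict_rent(85), 2))       # 9.11

# Change in predicted rent when income rises from 85 to 95 pounds.
change = predict_rent(95) - predict_rent(85)
print(round(change, 2))
```

With the unrounded coefficients the change in predicted rent comes out near .43 pounds; rounding the coefficients to three decimals first, as in footnote 11, gives .44.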
Recall our example of median rents computed by income quintiles,
from above. A family with an income of 85 pounds is in the third quintile,
whose median rent is 10.5 pounds. 10.5 pounds is thus the predicted rent for
an 85 pound income from the median procedure, and 9.1 pounds is the predicted
rent for the same household from the regression procedure. Which is better?
The median procedure is more resistant to mistakes from miscoded data, and is
easier for non-statisticians to understand. The regression procedure permits
more "fine tuning" of the estimates; the impact of, say, a 10 pound increase in
income on housing consumption is difficult to compute using the median
procedure; with the regression procedure it's easy.11/
11/   For example, the impact of a 10 pound increase from 85 to 95 pounds is:
exp [.371 + .414 * log(95)] - exp [.371 + .414 * log(85)]
= 9.55 - 9.11
= .44 pounds.
Section 1.3.3     Residuals
Turning now to the estimated residuals (the "u" in equation 1), note
that these are estimates of the error in the regression equation. They are
just, for each sample household, the difference between the actual house rent and
the rent predicted by our estimated equation. Suppose we have two sample
households, each with 85 Egyptian pounds income; household 1 lives in a house
that rents for 10 pounds per month, and household 2's house rents for 8
pounds. Our estimated rent is the same for both households, namely 9.11
pounds, so we've made estimated errors of .89 and -1.11 pounds
respectively.12/ There are as many residuals as there are sample households,
246. These 246 residuals are 246 pieces of information that tell us how well
our equation fits the data. From these residuals are calculated many familiar
statistics that summarize that information.13/ We will discuss briefly
(1) three measures of how well the equation does overall: R-squared, the
standard error of the equation, and the F-test for the equation; and
(2) two measures of how well we estimate individual coefficients: the
standard error of a coefficient and the t-statistic computed from it. First,
as background, we discuss some
properties of errors and their estimates (residuals).
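The two-household example can be reproduced with a short Python sketch. The names predict_rent and residual are hypothetical, and the coefficients are those of Table 3. Following the text, the residual is computed here in pounds; since the regression is estimated in logs, the residuals a package works with would be differences in log rent.

```python
import math

# Coefficient estimates copied from Table 3.
INTERCEPT = 0.371449
SLOPE = 0.413728


def predict_rent(income):
    """Predicted monthly rent (pounds) from the estimated log-log equation."""
    return math.exp(INTERCEPT + SLOPE * math.log(income))


def residual(actual_rent, income):
    """Actual rent minus predicted rent, in pounds as in the text's example."""
    return actual_rent - predict_rent(income)


# (household number, monthly income, actual monthly rent) from the example.
households = [(1, 85, 10.0), (2, 85, 8.0)]
residuals = {hh: round(residual(rent, income), 2) for hh, income, rent in households}
print(residuals)   # {1: 0.89, 2: -1.11}
```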
The most common assumption in statistics is that errors are normally
distributed with a zero mean. If this assumption is correct, many tests exist
which can be used for testing various hypotheses about the equation, e.g.
whether a particular coefficient is statistically different from zero.14/ In
many examples we know this assumption is not strictly correct. Consider our
housing expenditure equation. Since rent can never be less than zero, there
is a bound on the size of negative errors (actual rent less than estimated rent),
but there is no such bound on positive errors (actual greater than estimated),
at least in principle. Normal errors would be unbounded on both sides.
Fortunately, many studies have demonstrated that if the errors are only
approximately normally distributed, the usual tests and statistics are
approximately correct,15/ and we can justify using them. By approximately
normal, we mean that the errors (and hence their estimates) are bell-shaped
when plotted as in Figure 3, i.e. most residuals are small in absolute
magnitude, and cluster around the central value of zero. This can be checked
by doing plots similar to Figure 3.
12/   A subtle point is that the residuals are estimated errors because they
are calculated from estimated coefficients. Even given the correct model, we
never know the true values of the regression coefficients but calculate
estimates. If we knew the true a and the true b, we'd know the true
errors.
13/   Details of calculation can be found in any statistics textbook. Many
books also contain more advanced ideas on how to use these residuals,
e.g. how to check the assumption of constant variance, exact tests for
normality, etc.
Note that even if the residuals are erratically distributed,
regression coefficients still have desirable properties, if we have the
correct model. That is, we still obtain the "best" estimates, we just can't
apply the usual tests of significance.
14/   We discuss what this means below.
15/   See Theil, pp. 615 ff.
Section 1.3.4     Hypothesis Testing
Plots of residuals like those in Figure 3 yield useful information
about the regression. But remember that there were originally two purposes of
statistical analysis: to somehow reduce a large and unwieldy number of pieces
of information from our sample into a few manageable statistics that conveyed
the essential information contained in the sample, and to make formal tests of
specific hypotheses.

FIGURE 3
HISTOGRAM OF RESIDUALS FROM SIMPLE RENTER DEMAND EQUATION (CAIRO)
[Frequency bar chart. Vertical axis: FREQUENCY. Horizontal axis: RESIDUAL MIDPOINT.]

The first step in hypothesis testing is, of course, to carefully
formulate a testable hypothesis: for example, "the coefficient of income is
not discernibly different from one" or "the regression equation explains more
of the variation in the dependent variable than would be expected by mere
chance." We will have more to say about the exact specification of hypotheses
as we go through examples.
The residuals contain the information that can be used (with the
coefficients) to test hypotheses. But whereas the coefficients are only a few
numbers, there are as many residuals as there are sample observations.
Fortunately, we can form summary statistics of these estimated residuals
themselves, and use these for our tests. The next few pages discuss some
commonly used statistics computed from the residuals: R-squared, the standard
error of the overall equation, and the standard errors of the estimated
coefficients. Again, formulas and computational details can be found in any
statistics text; the purpose of this description is to provide an intuitive
understanding.
R-Squared.
Consider the two plots of two different samples in Figure 4, with
their estimated regression lines drawn in. (These plots are constructed from
manufactured data in order to exaggerate the effects we want to talk about.)
Each line has nearly the same slope and the same intercept, but in 4-A most of the
data points are clustered around the line, while in 4-B the data do not "fit"
the line as well. This is true despite the fact that when you estimate the
regressions for each sample you get almost the same regression
coefficients.16/ Table 4 displays the regression equations used to fit the two
lines in the plots. Notice that the estimated slopes and the estimated
intercepts are roughly in agreement, but the R-squared statistics are very
different (.81 versus .28). R-squared, or the square of the multiple
correlation coefficient, is a statistic that gives a measure of the
goodness-of-fit of the regression equation.
R-squared varies between zero and one, and can be interpreted as the
proportion of variation in the dependent variable (here log of rent) that can
be "explained" by the regression equation. R-squared is calculated as the
difference between 1 and the ratio of the sum of squared errors (often
abbreviated SSE) to the sum of squared deviations of the dependent variable
from its mean (often abbreviated SST, for sum of squares total). If all the
data points lie on the line -- if there is no error in our regression -- then
the difference between our predicted values and the actual values is zero,
SSE is zero, and R-squared is one: a "perfect fit". If the errors are large,
the fit is poor, the ratio SSE/SST approaches 1, and R-squared approaches zero.
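The calculation of R-squared from the two sums of squares can be sketched as follows. The function name r_squared is hypothetical; the SSE and SST figures are read from the ERROR and C TOTAL rows of the SAS output in Tables 4-A and 4-B.

```python
def r_squared(sse, sst):
    """R-squared as 1 minus the ratio of SSE to SST."""
    return 1.0 - sse / sst


# (SSE, SST) copied from the ERROR and C TOTAL rows of Tables 4-A and 4-B.
sums_of_squares = {
    "4-A": (116.439, 608.381),
    "4-B": (1192.331, 1660.764),
}

for table, (sse, sst) in sums_of_squares.items():
    print(table, round(r_squared(sse, sst), 4))
# 4-A 0.8086
# 4-B 0.2821
```

Both results agree with the R-SQUARE entries printed in the tables.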
The Standard Error of the Overall Equation.
With real world data, R-squared will never be exactly zero or
exactly one. Even if we pick nonsense data with no relationship -- even if we
regress completely random numbers on each other -- we will always get some
numerical coefficient estimates and some positive (though small) R-squared.
If R-squared is small, is that because our data are unrelated, or because the
relationship we are measuring is a true one but is rather weak? In other
words, what is a low R-squared? Fortunately we can construct a formal test of
the hypothesis "this regression equation result
is due merely to chance: there is no real statistically discernible
relationship present in these data."
16/   They are the same because we made the data up that way for the example.

TABLE 4-A
REGRESSION EXAMPLE USING MANUFACTURED DATA
DEPENDENT VARIABLE WITH SMALL VARIANCE

DEP VARIABLE: Y1

                     SUM OF         MEAN
SOURCE     DF       SQUARES       SQUARE     F VALUE     PROB>F
MODEL       1       491.942      491.942     414.040     0.0001
ERROR      98       116.439     1.188151
C TOTAL    99       608.381

ROOT MSE     1.090024     R-SQUARE     0.8086
DEP MEAN     6.043012     ADJ R-SQ     0.8067
C.V.         18.03775

                  PARAMETER     STANDARD    T FOR H0:
VARIABLE   DF      ESTIMATE        ERROR  PARAMETER=0   PROB > |T|
INTERCEP    1      1.715451     0.238984        7.178       0.0001
X           1      0.785114     0.038584       20.348       0.0001

TABLE 4-B
REGRESSION EXAMPLE USING MANUFACTURED DATA
DEPENDENT VARIABLE WITH LARGE VARIANCE

DEP VARIABLE: Y2

                     SUM OF         MEAN
SOURCE     DF       SQUARES       SQUARE     F VALUE     PROB>F
MODEL       1       468.433      468.433      38.501     0.0001
ERROR      98      1192.331    12.166642
C TOTAL    99      1660.764

ROOT MSE     3.488071     R-SQUARE     0.2821
DEP MEAN     6.123751     ADJ R-SQ     0.2747
C.V.         56.95972

                  PARAMETER     STANDARD    T FOR H0:
VARIABLE   DF      ESTIMATE        ERROR  PARAMETER=0   PROB > |T|
INTERCEP    1      1.900860     0.764748        2.486       0.0146
X           1      0.766124     0.123470        6.205       0.0001

The way the test is computed makes use again of the relationship
between the sum of squared errors and the total variance in the dependent
variable. By definition, the total sum of squares of the dependent variable
equals the sum of squared errors plus that part of the SST which is not error,
that is, which is explained by the regression.
(2) SST = SSE + SSR
where SSR is an abbreviation for "sum of squares due to regression." In other
words, the total sum of squares can be partitioned into explained (SSR) and
unexplained (SSE) variation.
If the number of observations is small, or if the number of
independent variables is large, we can get a large R-squared just because
there aren't enough data, or pieces of information, to tell us whether our
estimates are due to chance or not. For example in an extreme case, where we
fit a line to two or three points, the R-squared will always be close to one
because once we mechanically draw the line through a few points there aren't
any other points left over to tell us whether there is any error or not. So
to test whether R-squared is due to chance or not we rely on corrections for
"degrees of freedom," i.e., how many observations we have in excess of the
number of regression coefficients to be estimated.
It turns out that a good way to make this correction is to compute
the number:17/
                 (R-squared)/k
(3)   F = -----------------------
           (1-R-squared)/(N-k-1)
where N is the number of sample observations and k is the number of regression
coefficients estimated, not counting the intercept. We see that as R-squared
increases the number F
increases, but that as the number of observations and the number of
coefficients change F also changes. The number F can be compared to a table
to see if the value of F is statistically significant, that is, whether the
independent variables really have an effect on the dependent variable or
whether the measured effect might be due merely to chance.
17/   See any statistics text or the SPSS manual, page 335, for details.
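Equation (3) can be checked against the SAS output with a small Python sketch. The function name f_statistic is hypothetical, and k is taken to be the number of independent variables (one in each of these examples), which is the reading of equation (3) that reproduces the printed F values.

```python
def f_statistic(r_squared, n_obs, k):
    """Equation (3): F = (R2 / k) / ((1 - R2) / (N - k - 1)).

    k is the number of independent variables, not counting the intercept.
    """
    dof_denominator = n_obs - k - 1
    if dof_denominator <= 0:
        raise ValueError("too few observations for the number of coefficients")
    return (r_squared / k) / ((1.0 - r_squared) / dof_denominator)


# (R-squared, number of observations, k) read from Tables 3, 4-A and 4-B.
examples = {
    "Table 3": (0.1674, 246, 1),
    "Table 4-A": (0.8086, 100, 1),
    "Table 4-B": (0.2821, 100, 1),
}

for name, (r2, n_obs, k) in examples.items():
    print(name, round(f_statistic(r2, n_obs, k), 1))
```

The results are close to the F VALUE entries printed in the tables (49.052, 414.040 and 38.501); the small differences arise because R-squared is used here rounded to four decimals.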
One page of an F table is reprinted as Table 5. Tables such as
these, with so-called critical values of the F distribution for a few
probabilities (usually .1, .05, and .01) can be found in any statistics
book. Some computer packages (like SAS, which we have used) provide these
probabilities directly. SPSS does not, so we have to use a table. The
level of significance is the probability of making a
mistake that we are willing to accept; the critical value is the tabulated
value of F that corresponds to that probability. By convention, .05 is the most
commonly used level, i.e., there is one chance in 20 that we will reject
the hypothesis that our result is random when it is in fact random.
Table 5: F Table showing 0.01 and 0.05 probability levels

Table for 0.05 probability level
v1 = degrees of freedom for numerator (columns)
v2 = degrees of freedom for denominator (rows)

  v2     1     2     3     4     5     6     7     8     9    10    12    15    20    24    30    40    60   120     ∞
   1   161   200   216   225   230   234   237   239   241   242   244   246   248   249   250   251   252   253   254
   2  18.5  19.0  19.2  19.2  19.3  19.3  19.4  19.4  19.4  19.4  19.4  19.4  19.4  19.5  19.5  19.5  19.5  19.5  19.5
   3  10.1  9.55  9.28  9.12  9.01  8.94  8.89  8.85  8.81  8.79  8.74  8.70  8.66  8.64  8.62  8.59  8.57  8.55  8.53
   4  7.71  6.94  6.59  6.39  6.26  6.16  6.09  6.04  6.00  5.96  5.91  5.86  5.80  5.77  5.75  5.72  5.69  5.66  5.63
   5  6.61  5.79  5.41  5.19  5.05  4.95  4.88  4.82  4.77  4.74  4.68  4.62  4.56  4.53  4.50  4.46  4.43  4.40  4.37
   6  5.99  5.14  4.76  4.53  4.39  4.28  4.21  4.15  4.10  4.06  4.00  3.94  3.87  3.84  3.81  3.77  3.74  3.70  3.67
   7  5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73  3.68  3.64  3.57  3.51  3.44  3.41  3.38  3.34  3.30  3.27  3.23
   8  5.32  4.46  4.07  3.84  3.69  3.58  3.50  3.44  3.39  3.35  3.28  3.22  3.15  3.12  3.08  3.04  3.01  2.97  2.93
   9  5.12  4.26  3.86  3.63  3.48  3.37  3.29  3.23  3.18  3.14  3.07  3.01  2.94  2.90  2.86  2.83  2.79  2.75  2.71
  10  4.96  4.10  3.71  3.48  3.33  3.22  3.14  3.07  3.02  2.98  2.91  2.85  2.77  2.74  2.70  2.66  2.62  2.58  2.54
  11  4.84  3.98  3.59  3.36  3.20  3.09  3.01  2.95  2.90  2.85  2.79  2.72  2.65  2.61  2.57  2.53  2.49  2.45  2.40
  12  4.75  3.89  3.49  3.26  3.11  3.00  2.91  2.85  2.80  2.75  2.69  2.62  2.54  2.51  2.47  2.43  2.38  2.34  2.30
  13  4.67  3.81  3.41  3.18  3.03  2.92  2.83  2.77  2.71  2.67  2.60  2.53  2.46  2.42  2.38  2.34  2.30  2.25  2.21
  14  4.60  3.74  3.34  3.11  2.96  2.85  2.76  2.70  2.65  2.60  2.53  2.46  2.39  2.35  2.31  2.27  2.22  2.18  2.13
  15  4.54  3.68  3.29  3.06  2.90  2.79  2.71  2.64  2.59  2.54  2.48  2.40  2.33  2.29  2.25  2.20  2.16  2.11  2.07
  16  4.49  3.63  3.24  3.01  2.85  2.74  2.66  2.59  2.54  2.49  2.42  2.35  2.28  2.24  2.19  2.15  2.11  2.06  2.01
  17  4.45  3.59  3.20  2.96  2.81  2.70  2.61  2.55  2.49  2.45  2.38  2.31  2.23  2.19  2.15  2.10  2.06  2.01  1.96
  18  4.41  3.55  3.16  2.93  2.77  2.66  2.58  2.51  2.46  2.41  2.34  2.27  2.19  2.15  2.11  2.06  2.02  1.97  1.92
  19  4.38  3.52  3.13  2.90  2.74  2.63  2.54  2.48  2.42  2.38  2.31  2.23  2.16  2.11  2.07  2.03  1.98  1.93  1.88
  20  4.35  3.49  3.10  2.87  2.71  2.60  2.51  2.45  2.39  2.35  2.28  2.20  2.12  2.08  2.04  1.99  1.95  1.90  1.84
  21  4.32  3.47  3.07  2.84  2.68  2.57  2.49  2.42  2.37  2.32  2.25  2.18  2.10  2.05  2.01  1.96  1.92  1.87  1.81
  22  4.30  3.44  3.05  2.82  2.66  2.55  2.46  2.40  2.34  2.30  2.23  2.15  2.07  2.03  1.98  1.94  1.89  1.84  1.78
  23  4.28  3.42  3.03  2.80  2.64  2.53  2.44  2.37  2.32  2.27  2.20  2.13  2.05  2.01  1.96  1.91  1.86  1.81  1.76
  24  4.26  3.40  3.01  2.78  2.62  2.51  2.42  2.36  2.30  2.25  2.18  2.11  2.03  1.98  1.94  1.89  1.84  1.79  1.73
  25  4.24  3.39  2.99  2.76  2.60  2.49  2.40  2.34  2.28  2.24  2.16  2.09  2.01  1.96  1.92  1.87  1.82  1.77  1.71
  30  4.17  3.32  2.92  2.69  2.53  2.42  2.33  2.27  2.21  2.16  2.09  2.01  1.93  1.89  1.84  1.79  1.74  1.68  1.62
  40  4.08  3.23  2.84  2.61  2.45  2.34  2.25  2.18  2.12  2.08  2.00  1.92  1.84  1.79  1.74  1.69  1.64  1.58  1.51
  60  4.00  3.15  2.76  2.53  2.37  2.25  2.17  2.10  2.04  1.99  1.92  1.84  1.75  1.70  1.65  1.59  1.53  1.47  1.39
 120  3.92  3.07  2.68  2.45  2.29  2.18  2.09  2.02  1.96  1.91  1.83  1.75  1.66  1.61  1.55  1.50  1.43  1.35  1.25
   ∞  3.84  3.00  2.60  2.37  2.21  2.10  2.01  1.94  1.88  1.83  1.75  1.67  1.57  1.52  1.46  1.39  1.32  1.22  1.00

Table for 0.01 probability level
v1 = degrees of freedom for numerator (columns)
v2 = degrees of freedom for denominator (rows)

  v2     1     2     3     4     5     6     7     8     9    10    12    15    20    24    30    40    60   120     ∞
   1  4052  5000  5403  5625  5764  5859  5928  5982  6023  6056  6106  6157  6209  6235  6261  6287  6313  6339  6366
   2  98.5  99.0  99.2  99.2  99.3  99.3  99.4  99.4  99.4  99.4  99.4  99.4  99.4  99.5  99.5  99.5  99.5  99.5  99.5
   3  34.1  30.8  29.5  28.7  28.2  27.9  27.7  27.5  27.3  27.2  27.1  26.9  26.7  26.6  26.5  26.4  26.3  26.2  26.1
   4  21.2  18.0  16.7  16.0  15.5  15.2  15.0  14.8  14.7  14.5  14.4  14.2  14.0  13.9  13.8  13.7  13.7  13.6  13.5
   5  16.3  13.3  12.1  11.4  11.0  10.7  10.5  10.3  10.2  10.1  9.89  9.72  9.55  9.47  9.38  9.29  9.20  9.11  9.02
   6  13.7  10.9  9.78  9.15  8.75  8.47  8.26  8.10  7.98  7.87  7.72  7.56  7.40  7.31  7.23  7.14  7.06  6.97  6.88
   7  12.2  9.55  8.45  7.85  7.46  7.19  6.99  6.84  6.72  6.62  6.47  6.31  6.16  6.07  5.99  5.91  5.82  5.74  5.65
   8  11.3  8.65  7.59  7.01  6.63  6.37  6.18  6.03  5.91  5.81  5.67  5.52  5.36  5.28  5.20  5.12  5.03  4.95  4.86
   9  10.6  8.02  6.99  6.42  6.06  5.80  5.61  5.47  5.35  5.26  5.11  4.96  4.81  4.73  4.65  4.57  4.48  4.40  4.31
  10  10.0  7.56  6.55  5.99  5.64  5.39  5.20  5.06  4.94  4.85  4.71  4.56  4.41  4.33  4.25  4.17  4.08  4.00  3.91
  11  9.65  7.21  6.22  5.67  5.32  5.07  4.89  4.74  4.63  4.54  4.40  4.25  4.10  4.02  3.94  3.86  3.78  3.69  3.60
  12  9.33  6.93  5.95  5.41  5.06  4.82  4.64  4.50  4.39  4.30  4.16  4.01  3.86  3.78  3.70  3.62  3.54  3.45  3.36
  13  9.07  6.70  5.74  5.21  4.86  4.62  4.44  4.30  4.19  4.10  3.96  3.82  3.66  3.59  3.51  3.43  3.34  3.25  3.17
  14  8.86  6.51  5.56  5.04  4.70  4.46  4.28  4.14  4.03  3.94  3.80  3.66  3.51  3.43  3.35  3.27  3.18  3.09  3.00
  15  8.68  6.36  5.42  4.89  4.56  4.32  4.14  4.00  3.89  3.80  3.67  3.52  3.37  3.29  3.21  3.13  3.05  2.96  2.87
  16  8.53  6.23  5.29  4.77  4.44  4.20  4.03  3.89  3.78  3.69  3.55  3.41  3.26  3.18  3.10  3.02  2.93  2.84  2.75
  17  8.40  6.11  5.19  4.67  4.34  4.10  3.93  3.79  3.68  3.59  3.46  3.31  3.16  3.08  3.00  2.92  2.83  2.75  2.65
  18  8.29  6.01  5.09  4.58  4.25  4.01  3.84  3.71  3.60  3.51  3.37  3.23  3.08  3.00  2.92  2.84  2.75  2.66  2.57
  19  8.19  5.93  5.01  4.50  4.17  3.94  3.77  3.63  3.52  3.43  3.30  3.15  3.00  2.92  2.84  2.76  2.67  2.58  2.49
  20  8.10  5.85  4.94  4.43  4.10  3.87  3.70  3.56  3.46  3.37  3.23  3.09  2.94  2.86  2.78  2.69  2.61  2.52  2.42
  21  8.02  5.78  4.87  4.37  4.04  3.81  3.64  3.51  3.40  3.31  3.17  3.03  2.88  2.80  2.72  2.64  2.55  2.46  2.36
  22  7.95  5.72  4.82  4.31  3.99  3.76  3.59  3.45  3.35  3.26  3.12  2.98  2.83  2.75  2.67  2.58  2.50  2.40  2.31
  23  7.88  5.66  4.76  4.26  3.94  3.71  3.54  3.41  3.30  3.21  3.07  2.93  2.78  2.70  2.62  2.54  2.45  2.35  2.26
  24  7.82  5.61  4.72  4.22  3.90  3.67  3.50  3.36  3.26  3.17  3.03  2.89  2.74  2.66  2.58  2.49  2.40  2.31  2.21
  25  7.77  5.57  4.68  4.18  3.86  3.63  3.46  3.32  3.22  3.13  2.99  2.85  2.70  2.62  2.53  2.45  2.36  2.27  2.17
  30  7.56  5.39  4.51  4.02  3.70  3.47  3.30  3.17  3.07  2.98  2.84  2.70  2.55  2.47  2.39  2.30  2.21  2.11  2.01
  40  7.31  5.18  4.31  3.83  3.51  3.29  3.12  2.99  2.89  2.80  2.66  2.52  2.37  2.29  2.20  2.11  2.02  1.92  1.80
  60  7.08  4.98  4.13  3.65  3.34  3.12  2.95  2.82  2.72  2.63  2.50  2.35  2.20  2.12  2.03  1.94  1.84  1.73  1.60
 120  6.85  4.79  3.95  3.48  3.17  2.96  2.79  2.66  2.56  2.47  2.34  2.19  2.03  1.95  1.86  1.76  1.66  1.53  1.38
   ∞  6.63  4.61  3.78  3.32  3.02  2.80  2.64  2.51  2.41  2.32  2.18  2.04  1.88  1.79  1.70  1.59  1.47  1.32  1.00

The table is read as follows. The F distribution has two numbers
associated with it called degrees of freedom. The so-called "degrees of
freedom in the numerator," "v1" ("n1" in some tables), is the number of
independent (right hand side) variables in the regression, the k of
equation (3). In our simple Egyptian example, v1 is
equal to 1. The corresponding degrees of freedom in the denominator, "v2"
(or "n2"), is the number of observations in the regression sample less the
number of coefficients estimated including the intercept, i.e., N-k-1; in the
Egyptian example v2 is 246 - 2 = 244. (These are the DF entries in the MODEL
and ERROR rows of the SAS output.) We want to find the
critical value F (v1, v2). Tables do not provide numbers for every possible
combination of degrees of freedom, but you can see from Table 5 that when v2
is greater than 30 the critical values don't change very much, so pick the
best approximation. For our example we choose F (1,120). We see that the
critical values for F (1,120) are 3.92 if the acceptable probability of a
mistake is .05 and 6.85 if the acceptable probability is
.01. By a mistake we mean rejecting the hypothesis that the coefficient is
zero when it is in fact zero, or rejecting a true hypothesis.18/
Now compare the F statistic in Table 4-B to the critical values from
Table 5. (Table 4-B has 1 and 98 degrees of freedom, so F (1,120) is again
the closest entry.) 38.501 is greater than both 3.92 (level = .05) and 6.85 (level =
.01) so we reject the hypothesis that the regression results are random with
less than 1 chance in 100 of making the mistake discussed above (since 38.501
is greater than 6.85, the critical value at the .01 level). Again, by
convention the 0.05 level is accepted as an indication that the result is
statistically significant.
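The table look-up can be sketched in Python as below. This is an illustration under stated assumptions: it covers only regressions with one independent variable, so one numerator degree of freedom (the MODEL DF shown in the SAS output); the critical values are copied from the first column of Table 5; and the names F_CRITICAL_V1_1, nearest_tabulated_v2 and strictest_level_passed are hypothetical. Choosing the nearest tabulated row follows the text's advice to "pick the best approximation"; a more cautious analyst might instead take the next lower tabulated row.

```python
# Critical values of F with one numerator degree of freedom (v1 = 1),
# copied from the first column of Table 5.
# Keys: denominator degrees of freedom (v2); values: (0.05 level, 0.01 level).
F_CRITICAL_V1_1 = {
    30: (4.17, 7.56),
    40: (4.08, 7.31),
    60: (4.00, 7.08),
    120: (3.92, 6.85),
}


def nearest_tabulated_v2(v2):
    """Pick the tabulated denominator degrees of freedom closest to v2."""
    return min(F_CRITICAL_V1_1, key=lambda row: abs(row - v2))


def strictest_level_passed(f_value, v2):
    """Return 0.01 or 0.05 if F exceeds that critical value, else None."""
    crit_05, crit_01 = F_CRITICAL_V1_1[nearest_tabulated_v2(v2)]
    if f_value > crit_01:
        return 0.01
    if f_value > crit_05:
        return 0.05
    return None


print(strictest_level_passed(38.501, 98))    # Table 4-B: 0.01
print(strictest_level_passed(49.052, 244))   # Table 3:   0.01
```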
18/   There is another kind of mistake: not rejecting the null
hypothesis that the coefficient is zero when in fact it is not zero, or
failing to reject a false hypothesis. Under certain assumptions the F
test will minimize the chance of this kind of error. See Blalock, pp.
91-96 and pp. 188-193.
Standard Errors of Regression Coefficients, and Their Use in Tests.
The standard error of the regression and its associated F-test tell
us whether, overall, our estimates are due to some real relationship between
the dependent variable and the independent variables, or whether our results
could be due merely to chance. When we have several independent variables, we
might want to know whether some of them are related to the dependent variable
and others are not. Alternatively, we might want to know how good our
estimate is: how close can we assume our estimate is to the true coefficient?
Each coefficient estimate is a random variable. Every time we draw
a new sample and estimate our regression, we get numerically different
results, even if the underlying population parameter (the true coefficient)
remains fixed. Thus we speak of the distribution of coefficient estimates.
Under certain conditions (chiefly that the regression estimated is the correct
model, i.e., contains all relevant variables and uses the correct mathematical
functional form of the relation) our coefficient estimates are unbiased, that
is, each time we draw a sample and estimate there is some difference between
our estimates and the (unknown) true parameter but over a number of samples and
estimates these differences cancel out. In other words, "on average" we will
get close to the true parameter.19/ The standard error of the coefficient
gives us a measure of the spread of these errors. Remarkably enough,
even though we do not know the true parameter, we can at least find out how
close to it we are likely to be.
19/   Having the correct model is a restrictive assumption, at least in the
strict sense of having all relevant variables, known without error, and
knowing the functional form exactly. In particular, many economic
variables are difficult to measure, and will almost always have
measurement errors. Fortunately it turns out that if the missing
variables are uncorrelated with the included variables, and if errors in
independent variables are uncorrelated with the (true) error of the
fully specified model, then the coefficient estimates are still
unbiased. See Theil (1971), ch. 3.
Again, we will skip the computational details since the computer
does the dirty work (see any statistics text or the SPSS manual, page 326).
Referring back to Table 3 we see that the standard error of the coefficient
of the log of income is 0.059072. Notice that this is much smaller than the
coefficient estimate of 0.413728. Intuitively, that indicates that our
estimate is a good one, i.e., one that is not too far from the true
parameter. But we can do better. We can formally test the hypothesis that
the true coefficient is actually some fixed number. We can also state that we
have a certain degree of confidence that the true parameter is within some
interval. We will explain by example.
First, suppose that we want to test the hypothesis that the true
(unknown) coefficient takes on some fixed value. A common example is to test
the hypothesis that the coefficient equals zero. This is common because it is
equivalent to testing whether or not the variable "makes a difference" in the
regression, but we could just as easily test, for example, the hypothesis that
the true coefficient equals one. If the errors are normally distributed, then
we can compute a test statistic as follows: (1) compute the difference
between the estimated parameter and the "maintained hypothesis," the
tentatively assumed value of the true parameter; (2) divide this
number by the standard error of the regression coefficient, to get a measure
of how large the difference is relative to the errors expected in the
estimates. This measure is known as the t-statistic.20/ The t-statistic for
our example from Table 3 is 7.004. Under the normality assumption,
statisticians have tabulated what a "large" value of this statistic is.21/
Table 6 reproduces a typical table. Most statistics books and packages
describe these tests as so-called t-tests, and most regression packages
automatically compute the t-test for the hypothesis that the coefficient is
zero, since this is the most common single hypothesis. It is unfortunately
confusing that SPSS is one of the few major packages which computes an
F-test for this hypothesis rather than the t-test, but the confusion is
unnecessary because both tests (t-test and F-test) give exactly the same
results. The numbers are different when you compute the test statistic, but
the critical values in the tables differ in exactly the corresponding way. In
fact, the F-statistic is the square of the t-statistic, so SPSS would have
printed out a value of 7.004 * 7.004, or about 49.06.22/ The table look-up
procedure is similar to that for
the F-test described above, except that v1, the degrees of freedom for the
numerator, is always one because we are testing one coefficient estimate at a
time.
20/   Some readers will be familiar with t-tests in another context: testing
the equality of two sample means. In fact the test for a "significant
coefficient" is conceptually very similar to a test of the equality of
means. The interested reader can pursue this in any advanced statistics
text.
Looking back to Table 3 we see that the critical value for F with 1
and 120 degrees of freedom is 3.92 at the .05 level and 6.85 at the .01 level
21/   The table look-up procedure is similar to the procedure used above for
the F-test so we will not repeat the example. As before, the researcher
selects a "significance level," by convention usually .05, which
represents the acceptable probability of a Type II error (not rejecting
the null hypothesis when in truth it is false). When using the t-test
one confronts the issue of whether to use the "one-tailed test" or the
"two-tailed test". A full discussion of this issue would take us too
far afield, so we simply recommend using the two-tailed test, which is
the more conservative procedure. See Blalock, ch. 10, especially pp.
127-128, for a more detailed discussion.
22/   The F-statistic always corresponds to the two-tailed test discussed in
the preceding footnote. To convince yourself, pick a significance
level, look up a few critical values for different degrees of freedom
for each test (using v1=1 for the F-test) and you will see that the
numbers in the F table are the square of the numbers in the t-table.


﻿- 32 -
Table 6
The t Distribution and the Normal Distribution a/

Pr (one tail)         .25      .10      .05     .025      .01     .005
Pr (both tails)       .50      .20      .10      .05      .02      .01
Degrees of Freedom
1                   1.000    3.078    6.314   12.706   31.821   63.657
2                    .816    1.886    2.920    4.303    6.965    9.925
3                    .765    1.638    2.353    3.182    4.541    5.841
4                    .741    1.533    2.132    2.776    3.747    4.604
5                    .727    1.476    2.015    2.571    3.365    4.032
6                    .718    1.440    1.943    2.447    3.143    3.707
7                    .711    1.415    1.895    2.365    2.998    3.499
8                    .706    1.397    1.860    2.306    2.896    3.355
9                    .703    1.383    1.833    2.262    2.821    3.250
10                   .700    1.372    1.812    2.228    2.764    3.169
11                   .697    1.363    1.796    2.201    2.718    3.106
12                   .695    1.356    1.782    2.179    2.681    3.055
13                   .694    1.350    1.771    2.160    2.650    3.012
14                   .692    1.345    1.761    2.145    2.624    2.977
15                   .691    1.341    1.753    2.131    2.602    2.947
16                   .690    1.337    1.746    2.120    2.583    2.921
17                   .689    1.333    1.740    2.110    2.567    2.898
18                   .688    1.330    1.734    2.101    2.552    2.878
19                   .688    1.328    1.729    2.093    2.539    2.861
20                   .687    1.325    1.725    2.086    2.528    2.845
21                   .686    1.323    1.721    2.080    2.518    2.831
22                   .686    1.321    1.717    2.074    2.508    2.819
23                   .685    1.319    1.714    2.069    2.500    2.807
24                   .685    1.318    1.711    2.064    2.492    2.797
25                   .684    1.316    1.708    2.060    2.485    2.787
26                   .684    1.315    1.706    2.056    2.479    2.779
27                   .684    1.314    1.703    2.052    2.473    2.771
28                   .683    1.313    1.701    2.048    2.467    2.763
29                   .683    1.311    1.699    2.045    2.462    2.756
30                   .683    1.310    1.697    2.042    2.457    2.750
40                   .681    1.303    1.684    2.021    2.423    2.704
60                   .679    1.296    1.671    2.000    2.390    2.660
120                  .677    1.289    1.658    1.980    2.358    2.617
∞ (Normal)           .674    1.282    1.645    1.960    2.326    2.576

Source: This table is abridged from E. S. Pearson and H. O. Hartley, Biometrika
Tables for Statisticians, Vol. 1 (1954), p. 138, with kind permission of the Syndics
of the Cambridge University Press, publishers for the Biometrika Society.
a/ The smaller probability shown at the head of each column is the area in one
tail; the larger probability is the area in both tails. Example: With 20 degrees of
freedom, a t value larger than 1.725 has a .05 probability and a t value exceeding
1.725 in absolute value has a .1 probability.

(recall that we use degrees of freedom 1, 120 since it is the closest entry in
the table to the correct number of degrees of freedom 1, 253). Since 44.12 is
greater than either of these numbers, we can reject the null hypothesis that
the (unknown) true coefficient is zero. In other words, the test indicates
that income has a statistically discernible effect on housing consumption
(rent).
When producing important reports, always check the computed
statistic against the table. In exploratory work, however, a useful rule of
thumb is that a t-statistic greater than 2 (or an F statistic greater than 4)
is greater than the critical value at an approximate significance level of
.05, a common level. The rule of thumb is a good approximation as long as you
have at least 30 degrees of freedom; with fewer, use the table even for
preliminary examination of the results. The rule is only correct for tests
involving a single coefficient.
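The comparison of a computed statistic with the tabulated critical values, and the rule of thumb, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the critical values are copied from Table 6 (120 degrees of freedom) and from the F values quoted in the text, the F statistic of 44.12 is the one reported above, and the function name rejects_null is hypothetical.

```python
# Critical values copied from the tables in this paper (120 degrees of freedom,
# the closest tabulated entry to the 253 degrees of freedom in the example).
T_CRITICAL = {0.05: 1.980, 0.01: 2.617}   # two-tailed t, from Table 6
F_CRITICAL = {0.05: 3.92, 0.01: 6.85}     # F with (1, 120) degrees of freedom


def rejects_null(statistic, critical_value):
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return abs(statistic) > critical_value


f_income = 44.12  # F statistic for the income coefficient, as reported in the text

for level in (0.05, 0.01):
    # For a test on a single coefficient, the F critical value is the square
    # of the two-tailed t critical value.
    t_squared = T_CRITICAL[level] ** 2
    print(level, round(t_squared, 2), F_CRITICAL[level],
          rejects_null(f_income, F_CRITICAL[level]))

# Rule of thumb for exploratory work with at least 30 degrees of freedom:
# |t| > 2 (equivalently F > 4) is roughly significant at the .05 level.
print(rejects_null(f_income, 4.0))
```

With fewer than 30 degrees of freedom the hard-coded values above no longer apply, and the appropriate row of the table should be used instead.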
If we wanted to test a different hypothesis we would set the test up
slightly differently. Suppose we want to test the hypothesis that the
coefficient of income is less than one.23/ Then the computer no longer prints
out the t or F statistic that we want, and we compute it by hand: the
difference between 1 and .4137 is .5863; dividing this result by the standard
error of the coefficient (.059072) yields 9.93. This is the t-statistic, and
we can compare it to the critical value in a t table as before, or we can
square it to get the equivalent F statistic (98.60) and compare this result to
the critical value from Table 3 (here 3.92 or 6.85, depending on whether we
choose a significance level of .05 or .01). Again the computed statistic
exceeds the critical value, so we reject the null hypothesis. The (unknown)
true coefficient of income is less than one, or more strictly, if the true
coefficient were as large as one, the probability of obtaining an estimate
this far below one would be less than .01 (the significance level).24/
23/   This is another common hypothesis because, as we will show below, this
kind of logarithmic regression coefficient is an elasticity, or measure
of responsiveness. If it is equal to one, then expenditures on housing
rise exactly proportionally with income. If less than one, non-housing
expenditures rise faster.
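The hand computation for a hypothesis other than zero can be written out as a short Python sketch. The coefficient and standard error are the ones quoted in the text; the function name t_statistic is hypothetical, and the critical values are the 120 degrees of freedom entries used above.

```python
def t_statistic(estimate, hypothesized, standard_error):
    """t-statistic for the null hypothesis that the true coefficient equals `hypothesized`."""
    return (estimate - hypothesized) / standard_error


b_income = 0.4137      # estimated income coefficient (rounded, as in the text)
se_income = 0.059072   # its standard error

t = t_statistic(b_income, 1.0, se_income)
f = t ** 2             # equivalent F statistic for a single coefficient

print(round(abs(t), 2))   # close to the 9.93 computed by hand
print(abs(t) > 2.617)     # compare with the .01 two-tailed critical value
print(f > 6.85)           # same comparison using the F table
```

The sign of t is negative here because the estimate lies below the hypothesized value; only its absolute size matters for the comparison with the table.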
Confidence Intervals: Coefficients.
Now let's look at a related use of the standard error of the
coefficient: the confidence interval. When we get an estimate, we get a
single number that is our "best guess" of the unknown true parameter. This is
known as a point estimate. Even if we have a very good estimate, the chance
that the true income coefficient is exactly .413728 is very small.
Intuitively, the chance that it is between, say, .4 and .5 is much greater.
The chance that the true parameter is between .35 and .55 is greater still.
In fact, we can compute the probability that the unknown true parameter is
within an interval, that is, how likely it is that the true number lies
between two numbers which bracket the estimate. This is an extremely useful
computation, because we often care little about small deviations from the
estimate but we may care a lot about large deviations. If the true parameter
is .39 instead of .43, who cares? It makes little difference for policy
makers who want to calculate the affordability of housing projects. But if
the true parameter is .13 or 1.13, we do care, because these numbers imply
very different answers to policy questions about affordability.25/
24/   In fact, had we chosen an even smaller significance level, say
.005 or .001 instead of .01, we might well still have rejected the null
hypothesis. Since the choice of significance level is so arbitrary (.05
is chosen merely by convention, not with reference to a well specified
loss function), many statisticians now prefer to compute the probability
of obtaining a test statistic at least as extreme as the one observed if
the null hypothesis were true, and to report that number. SAS computes
this number but SPSS does not (see the last column of Figure 3), and it
is difficult to compute by hand, so this paper emphasizes the older,
traditional method of hypothesis testing.
If we can make a precise statement about the probability that a true
parameter lies within an interval, we call that a confidence interval, or an
interval estimate. Confidence intervals can be constructed using the standard
errors of the coefficients, and the method of construction is derived from the
same basic theory of random variables that gave us the t and F tests.26/
The upper bound of the confidence interval is computed as:
(1) b + t * (standard error of b)
where b is the regression coefficient, and t is the relevant critical value
from the table. For our example, at the .05 level of significance, F is 3.92 so t
is 1.98, and we compute:
.413728 + [1.98 * (.059072)] = .530691
or about 0.53. The lower bound is computed using a minus sign instead of a
plus sign in equation (1), so the lower bound is .296765. Since we used a
significance level of .05, there is only a 5 out of 100 chance that the true
parameter lies outside of the interval between .30 and .53. Another way of
looking at this is to say that the probability that the true parameter is
within this interval is 1 - .05 = .95, so this is often called a 95 percent
confidence interval for the coefficient.
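The interval computation can be reproduced with a minimal Python sketch. The inputs are the estimate, standard error and critical value from the example above; the function name confidence_interval is hypothetical.

```python
def confidence_interval(estimate, standard_error, t_critical):
    """Return (lower, upper) bounds: estimate -/+ t_critical * standard_error."""
    half_width = t_critical * standard_error
    return estimate - half_width, estimate + half_width


# Income coefficient, its standard error, and the .05 two-tailed critical
# value for 120 degrees of freedom, all taken from the example in the text.
lower, upper = confidence_interval(0.413728, 0.059072, 1.98)
print(round(lower, 6), round(upper, 6))
```

A 99 percent interval would use the same function with the .01 critical value (2.617 in Table 6), which gives a wider interval.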
25/   Statistical significance is not the same as qualitative importance.
Throughout the analysis the researcher must bear in mind that he or she
needs to understand what magnitudes make a policy difference, as well as
whether a result is statistically significant. This is common sense,
and an example will make the idea clear. Suppose we had computed an
income elasticity of, say, 0.93 with a standard error of 0.03. Testing
the hypothesis that the true coefficient is actually 1, we get a
t-statistic of 2.33 (or an F of 5.44). For a large number of degrees of
freedom, this is significant at the .05 level. But in practical terms,
.93 is so close to 1 that the policy implications are practically the
same. Unfortunately there are never any hard and fast rules about what
is a qualitatively important difference -- as opposed to a statistically
significant difference -- but the analyst must never lose sight of the
difference between importance and significance.
26/   See Theil (1971) ch. 4.
Confidence Intervals for Predicted Values of the Dependent Variable.
When we predict the dependent variable we use all regression
coefficients simultaneously, so the confidence interval for the prediction or
forecast must make use of the joint distribution of all coefficients. This
problem is slightly more complicated, and the problem will probably not arise
in producing the Kenyan housing report, so we will not discuss it here.
Interested readers can consult Theil, pp. 130-137, or other econometric texts.
Section I.3.5      The Role of Functional Form (Transformations) in
Regression Analysis
The most natural specification of any relation in the ordinary least squares
regression framework (OLS) is a linear relation, such as:
(1) RENT = a + b*INCOME + c*HHSIZE + d*DISTANCE + u
where RENT and INCOME are self explanatory, HHSIZE is the number of persons in
the household, and DISTANCE is distance to the central business district, our
proxy for intrametropolitan housing prices; a, b, c and d are regression
coefficients to be estimated, and u is the residual, or estimated error term.
Concrete suggestions for actual specifications will be given in Part II.
Several variants of the linear functional form are commonly employed
in regression analysis. Each will be discussed briefly in turn, and then we
will compare their advantages and disadvantages, and make recommendations for
the Kenyan study. Only functional forms which can be easily estimated with
OLS regression (as computed with a package like SPSS) will be considered.
OLS is, of course, a linear technique, but that is not as
restrictive as it first seems because it is "linear in the parameters," i.e.,
the restriction is that the relationship between each coefficient and its
variable must be linear; variables themselves may be non-linear.
Let us illustrate this with a simplified example about the
relationship between household size and housing consumption (rents). Assume
for simplicity that household size is the only determinant of rent so we can
graph the relationship in two dimensions in Figure 5. These data were
artificially manufactured to emphasize "curvature". Suppose that we ran a
linear regression on these made-up data. We get the results displayed in
Table 7-A. Notice that the fit is poor (R-squared is only .016) and the
t-statistic for variable HHSIZE is not significant at commonly used levels.
In other words, because our relation was misspecified the regression results
show only weak evidence of a relationship even though visual inspection shows
a reasonably strong but "curved" (not linear) relationship.

FIGURE 5
PLOT OF MANUFACTURED DATA ILLUSTRATING HYPOTHETICAL
NONLINEAR RELATIONSHIP BETWEEN RENT AND HHSIZE
[Scatter plot not reproduced. Vertical axis: RENT, 0 to 1750. Horizontal axis: HHSIZE, 1 to 10.]

TABLE 7-A
REGRESSION EXAMPLE USING MANUFACTURED DATA
RENT AND HOUSEHOLD SIZE
SIMPLE LINEAR MODEL
DEP VARIABLE: RENT
                      SUM OF         MEAN
SOURCE    DF         SQUARES       SQUARE      F VALUE       PROB>F
MODEL      1          249946       249946        1.611       0.2074
ERROR     98        15207604       155180
C TOTAL   99        15457550
    ROOT MSE      393.928    R-SQUARE       0.0162
    DEP MEAN      808.574    ADJ R-SQ       0.0061
    C.V.         48.71893
                   PARAMETER     STANDARD    T FOR H0:
VARIABLE  DF        ESTIMATE        ERROR  PARAMETER=0    PROB > |T|
INTERCEP   1         702.977    92.057895        7.636       0.0001
HHSIZE     1       17.687810    13.936972        1.269       0.2074

TABLE 7-B
REGRESSION EXAMPLE USING MANUFACTURED DATA
RENT AND HOUSEHOLD SIZE
QUADRATIC MODEL
DEP VARIABLE: RENT
                      SUM OF         MEAN
SOURCE    DF         SQUARES       SQUARE      F VALUE       PROB>F
MODEL      2         3510483      1755241       14.251       0.0001
ERROR     97        11947067       123166
C TOTAL   99        15457550
    ROOT MSE      350.950    R-SQUARE       0.2271
    DEP MEAN      808.574    ADJ R-SQ       0.2112
    C.V.         43.40354
                   PARAMETER     STANDARD    T FOR H0:
VARIABLE  DF        ESTIMATE        ERROR  PARAMETER=0    PROB > |T|
INTERCEP   1       44.817509      151.952        0.295       0.7687
HHSIZE     1         322.909    60.607330        5.328       0.0001
HSIZESQ    1      -26.679109     5.185272       -5.145       0.0001

Power Terms.
However, one way to algebraically represent a "curved" relationship
is with a quadratic equation, that is, with an equation which includes the
square of the relevant independent variable. If we compute the new variable
HSIZESQ = HHSIZE ** 2, and add it to the regression, with our test data we
get the results in Table 7-B. Comparing them to 7-A, we see: (1) the fit
improves dramatically (R-squared rises to .227), and (2) both variables,
HHSIZE and HSIZESQ, have large t-statistics. Now our results with the correctly specified
functional form indicate that household size is a strong determinant of
housing consumption, albeit in a nonlinear fashion.
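The effect of adding a squared term can be demonstrated with a small, self-contained Python sketch. It does not use the paper's test data: the ten observations below are hypothetical and noise-free, chosen only to have an inverted-U shape, and the helper names solve and ols are illustrative. The regression is computed from the normal equations, which is adequate for a toy example of this size.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients and R-squared; an intercept column is added first."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, 1.0 - ss_error / ss_total


# Hypothetical, noise-free "manufactured" data with an inverted-U shape.
hhsize = [float(h) for h in range(1, 11)]
rent = [45.0 + 323.0 * h - 26.7 * h ** 2 for h in hhsize]
hsizesq = [h ** 2 for h in hhsize]              # the new variable HSIZESQ

_, r2_linear = ols([hhsize], rent)
coef_quad, r2_quadratic = ols([hhsize, hsizesq], rent)

print(round(r2_linear, 3), round(r2_quadratic, 3))
print([round(c, 1) for c in coef_quad])
```

On these made-up data the straight line explains well under a fifth of the variation in rent, while the quadratic model fits exactly because the data contain no noise. With real survey data the improvement would be less extreme, as in Tables 7-A and 7-B.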
Logarithms.
In fact, we have already briefly discussed another common
transformation used in regression models, logarithms. When we looked at the
plots in Figures 1 and 2, and chose the logarithmic transformation, we were
implicitly making a functional form decision. Appendix B discusses the ideas
behind logarithms in some detail, and will also delve into the economic
interpretation of logarithms, so we will have little to say about them here
except to note that they are one of the most important classes of variable
transformations.
Dummy Variables.
Sometimes we are interested in estimating the effect of a variable
which has no natural numerical representation. For example, suppose that we
believed that housing consumption depended partly on the sex of the household
head. How can we specify a regression that will permit us to test this
hypothesis? Dummy variables are a useful technique for estimating the effects
of a categorical variable. If an observation has the characteristic of
interest (is in a particular category) the variable takes on the value 1;
otherwise the variable takes on the value 0. For example, we could construct
a variable named FEMALE which took the value 1 if the household was
female-headed and 0 otherwise. Even if a variable can be represented by a
continuous number in a natural way, it is sometimes useful to code the data
in dummy variables, especially if (1) there are a limited number of
categories, and (2) there is reason to believe that the effect of the
variable varies with the level of the variable. For example, the effect of a
one-person change in household size on housing consumption might vary between
small and large households. If household size is entered directly in a linear
regression, the regression coefficient measures the average effect of a
change in household size across the entire sample of different household
sizes. Since only one number is estimated, if the true effects vary with
household size, the simple linear regression will not pick up the
differences. This will be discussed in more detail in Part II, with an
example; for now we want to concentrate on the mechanics of how these
variables are computed and interpreted.
Suppose we wanted to compute dummy variables for household size
categories. If we know which household sizes occur in the sample, we can just
compute a series of dummy variables such as HH1 (1 if household size equals 1,
0 otherwise), HH2 (1 if household size equals 2, 0 otherwise), HH3, HH4, HH5,
and so on.
There are two useful modifications to this simple procedure. First,
and most importantly, we must omit a dummy variable for one of the categories
in the regression procedure. The dummy variable measures the effect of being
in a particular class relative to not being in that class. There must be a
"base case" or a class to which the effects of the other classes are
compared. In other words, if we run a regression with dummy variables HH2,
HH3, HH4 and HH5, the coefficients of those variables are, respectively, the
estimates of the difference in housing consumption between 2-person households
and the base case (omitted category, here 1-person households), 3-person
households and that same base case, 4-person households and the base case, and
5-person households and the base case. If we had included the dummy HH1 in
the regression as well, and had tried to compute the results, the computer
program would have failed because there would be no base case against which to
compare the effects of other categories (the full set of dummy variables
would sum to one for every household, exactly duplicating the intercept).
There is a second useful modification to this procedure. It is
often the case with categorical variables that we may have only a few
observations in the extreme values; for example, we will likely have lots of
sample observations with 1, 2, 3, 4, perhaps even 10 individuals in a
household; but at some point there will be a fall-off in the number of sample
observations we can expect in some categories. In fact, we might not even be
sure that some categories will have any observations. Since there probably
isn't any fundamental difference between the effects of, say, the 12th
additional household member and the 13th, or even the 14th, we can use the
following trick: past a certain cutoff point, make the variable continuous
instead of a dummy variable, so that we can avoid having a dummy variable for
every possible category (some of which may not exist in our sample). Let's
suppose we determine the cutoff to be households of size 8. Then we can
create a new variable, HHGE9 which takes on the value of household size if
household size is greater than or equal to 9, and is zero otherwise.
Table 8 illustrates how this procedure works. The variables HH2,
HH3, HH4, and so on are constructed as dummy variables; and HHGE9 is a
continuous variable. Table 9 estimates a model using these dummy variables in
place of HHSIZE and HSIZESQ (from above). The interpretation of these
coefficients is as follows. We estimate that a one-person household pays 502
shillings in rent, because that is the (rounded) estimate of the intercept.
The intercept is the estimate for the base case, or omitted category. We
estimate that a two person household spends an average of 523 shillings (502
plus 21); a three person household 751 (502 plus 249), and so on. We estimate
a nine person household spends 718 shillings (502 plus 9 * 24), and a ten
person household 742 shillings (502 plus 10 * 24).

Table 8: Dummy Variable Coding Scheme
Number of
Persons         HH2    HH3    HH4    HH5    HH6    HH7    HH8  HHGE9
1                 0      0      0      0      0      0      0      0
2                 1      0      0      0      0      0      0      0
3                 0      1      0      0      0      0      0      0
4                 0      0      1      0      0      0      0      0
5                 0      0      0      1      0      0      0      0
6                 0      0      0      0      1      0      0      0
7                 0      0      0      0      0      1      0      0
8                 0      0      0      0      0      0      1      0
9                 0      0      0      0      0      0      0      9
10                0      0      0      0      0      0      0     10
11                0      0      0      0      0      0      0     11
12                0      0      0      0      0      0      0     12
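The coding scheme of Table 8 can be expressed as a short Python function. This is a sketch rather than the procedure actually used to prepare the data: the function name code_household_size and the constants are hypothetical, and the cutoff of 9 follows the example in the text.

```python
DUMMY_SIZES = range(2, 9)   # HH2 ... HH8; one-person households are the base case
CUTOFF = 9                  # from this size on, use the continuous variable HHGE9


def code_household_size(hhsize):
    """Return the regressors of Table 8 for one household, as a dict."""
    row = {"HH%d" % size: int(hhsize == size) for size in DUMMY_SIZES}
    # HHGE9 carries the household size itself for large households, else zero.
    row["HHGE9"] = hhsize if hhsize >= CUTOFF else 0
    return row


for persons in (1, 3, 8, 12):
    print(persons, code_household_size(persons))
```

Note that a one-person household gets zeros everywhere: it is the omitted base case, so its rent is captured by the intercept alone.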


﻿TABLE 9
REGRESSION EXAMPLE ILLUSTRATING USE OF DUMMY VARIABLES
RENT AND HOUSEHOLD SIZE
DEP VARIABLE: RENT
                      SUM OF         MEAN
SOURCE    DF         SQUARES       SQUARE      F VALUE       PROB>F
MODEL      8         3269505       408688        3.051       0.0044
ERROR     91        12188046       133935
C TOTAL   99        15457550
    ROOT MSE      365.971    R-SQUARE       0.2115
    DEP MEAN      808.574    ADJ R-SQ       0.1422
    C.V.         45.26127
                   PARAMETER     STANDARD    T FOR H0:
VARIABLE  DF        ESTIMATE        ERROR  PARAMETER=0    PROB > |T|
INTERCEP   1         501.970      162.513        3.089       0.0027
HH2        1       20.568941      193.834        0.106       0.9157
HH3        1         248.549      207.731        1.196       0.2346
HH4        1         377.665      207.731        1.818       0.0723
HH5        1         523.256      196.434        2.664       0.0091
HH6        1         606.550      199.509        3.040       0.0031
HH7        1         439.741      207.731        2.117       0.0370
HH8        1         348.476      193.834        1.798       0.0755
HHGE9      1       24.011529    18.805857        1.277       0.2049
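The interpretation of the Table 9 coefficients can be checked with a small Python sketch that turns the estimates into predicted rents. The numbers are copied from Table 9; the function name predicted_rent is hypothetical.

```python
# Parameter estimates copied from Table 9 (shillings of rent).
INTERCEPT = 501.970          # base case: one-person household
COEFFICIENTS = {
    "HH2": 20.568941, "HH3": 248.549, "HH4": 377.665, "HH5": 523.256,
    "HH6": 606.550, "HH7": 439.741, "HH8": 348.476, "HHGE9": 24.011529,
}


def predicted_rent(hhsize):
    """Predicted rent for a household of `hhsize` persons under the Table 8 coding."""
    if hhsize == 1:
        return INTERCEPT                                  # omitted category
    if hhsize <= 8:
        return INTERCEPT + COEFFICIENTS["HH%d" % hhsize]  # intercept plus dummy
    return INTERCEPT + COEFFICIENTS["HHGE9"] * hhsize     # HHGE9 equals household size


for persons in (1, 2, 3, 9, 10):
    print(persons, round(predicted_rent(persons)))
```

The rounded results match the figures quoted in the text: 502, 523, 751, 718 and 742 shillings.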


PART II: ECONOMIC MODELS FOR HOUSING MARKET ANALYSIS
Section   II.1     Introduction
The preceding pages have emphasized statistical techniques,
especially regression analysis, without much reference to what model we want
to apply the techniques to. A temptingly simple approach is to pick a
dependent variable of interest -- say housing consumption, as proxied by rent
-- and just put a lot of variables in the equation and pick out the best
fit. There are several problems with this approach. One important problem is
that, given enough totally unrelated variables, and running enough regressions,
eventually we'll hit upon a "significant" result purely by chance. Remember
that if we pick a .05 significance level, we are permitting 1 chance in 20
that we will observe an apparent relationship merely by chance (a Type I
error). If data are totally unrelated and we run 20 variables in regressions
we can expect to get one spurious "significant" result that is a mistake. A
second problem has even more practical importance. Many economic variables
happen to be related through "intervening" variables which can lead to
erroneous interpretation of statistical results. A few simple examples will
make this clear.
Suppose that you have the following information collected from a
sample of households:
(1)   Housing Consumption, measured by house value
(2)   Household income
(3)   Household size
(4)   Head's occupation
(5)   Whether or not the household participates in a government housing
program
(6)   For whom the head voted in the last election
Suppose further that you want to measure the effects of program
participation on housing consumption. Making a model consists of choosing the
variables that cause the household to choose a certain level of housing
consumption, and specifying the functional form of the relationship, e.g.,
whether it is linear. For the rest of this section we assume the variables
are related in a linear fashion so that we can concentrate on variable
selection.
It cannot be overemphasized that regression analysis is a
computational technique which is not by itself sufficient evidence of
causality (x causes y); rather, it demonstrates correlation (x and y often
occur together). Thinking about which variables "cause" other variables is
the essence of model building, and that which distinguishes the correct use of
statistics in social sciences from merely fitting the data at hand.
A variable is a cause of housing consumption if and only if house value
(our measure of consumption) can be changed by manipulating only that
variable, holding other variables constant.27/ Clearly, people with higher
incomes can be expected to consume more housing. Larger households may need
more space and, thus, consume more. Participation in the program may affect
housing consumption; this is the hypothesis of particular interest. It is
less clear that head's occupation will affect housing consumption, except
indirectly: occupation is a determinant of income, and income is a
determinant of housing consumption. We might assume that occupation and
housing consumption are only related through the intervening variable income,
and that once we control for income differences among occupations we will
observe no residual difference in value among occupations.
27/   Strictly speaking "variables" aren't causes or outcomes but our measures
of real world phenomena, and often imperfect measures at that.
Variables don't cause anything, they measure that which causes. For
ease of exposition, however, we often speak loosely of variables as if
they are the phenomena they measure.
Now consider household size. Larger households presumably require
more housing. They also have more wage earners, on average, and hence more
income. Therefore household size affects value both directly and through the
intervening variable income.
It is important to emphasize that strong relationships can exist
which don't belong in our model. Suppose there has been a recent election
with two candidates, and the only issue in the election is property taxes.
Mr. George favors a large increase in property taxes to finance municipal
government, while Mr. Rohatyn argues for cutting taxes and floating a bond
issue instead. Suppose further that all those who own large houses therefore
vote for Mr. Rohatyn, and all those who own small houses vote for
Mr. George. Clearly, there is a strong relationship between house value and
voting behavior, but voting behavior does not cause investment in housing. If
we regressed value against voting behavior, the procedure would give us a
"significant" coefficient for that behavior, but to interpret this as evidence
that voting is a determinant of investment is a mistake.  Mere statistical
techniques cannot substitute for common sense.
This is where economic theory becomes useful. Most economic theory
-- most social science theory of any kind -- can be very loosely thought of as
an attempt to put common sense on a more rigorous footing. Economics is a way
of looking at the world that helps us to decide which variables should be
important in explaining phenomena like consumption and investment. It would
be beyond the scope of this paper to discuss theory in detail, but we want to
use the next few pages to summarize the current consensus of economists about
a few simple models which can be estimated with the Kenyan data and which will
yield information that will help policymakers make better decisions about
housing programs and policies.
Section II.2       Composite Demand Models
The first paper in this series discussed alternative measures of
housing consumption in general terms.28/ That discussion will not be repeated
here, but for easy reference Table 10 replicates Exhibit 2 of that paper,
which summarizes different measures, their advantages, and their
disadvantages. This section will focus on models constructed from expenditure
measures, that is rents and house values. Later sections will deal with
individual characteristics and hedonic price models.
There are six essential issues which must be tackled when
constructing models of housing demand. Three are essentially measurement
issues: how to measure housing consumption, incomes, and prices. Another
issue is how to integrate related behavioral outcomes like tenure choice and
mobility into the demand relation. Finally there is the question of
alternative functional forms and the choice of estimating technique. Each
will be discussed in turn.
Section II.2.1    Measurement Issues
Measuring housing consumption. Ordinary demand analysis begins by
postulating a relationship between the quantity of a good demanded, its
relative price, the income of the household, and other things that may affect
demand such as household size. If for simplicity we forget about "other
things" for a minute, and postulate that the demand function is linear, this
model suggests that given household survey data we estimate a regression
equation of the form:
(1)      Q = a + b (I) + c (P) + u
where Q is the quantity of housing services demanded, I is the household's
income, P is the relative price of housing, and u is the residual from the
regression equation. The estimated parameters are a, b and c. Income and
price elasticities can be calculated from b and c, respectively.29/ The
problem, perhaps not apparent at first, is the following: What is Q? What is
I? What is P?
28/   Malpezzi, Bamberger and Mayo, pp. 5-9.

Table 10:   Measures of Housing Consumption

Measure: Expenditures
   Comments:      Product of price and quantity.
   Advantages:    Easily measured. Appropriate if price doesn't vary.
   Disadvantages: Have to assume constant price, or average to permit
                  differences to cancel out.

Measure: A. Rent
   Comments:      Flow measure of expenditures.
   Advantages:    Good measure of cost per period. Easy to measure for
                  renters.
   Disadvantages: Rarely measured for owners.

Measure: B. Value
   Comments:      Stock measure of expenditures.
   Advantages:    Closely related to cost of supplying a similar structure.
   Disadvantages: Rarely measured for renters. Sometimes inaccurate even
                  for owners.

Measure: Direct Quantity and Quality Measures
   Comments:      Examples include number of rooms, type of sanitation,
                  utilities present, lot size, and condition of structure.
                  Many possible measures exist.
   Advantages:    Some--like number of rooms--are easily measured. If you
                  have good quantity measures and expenditures can compute
                  prices as well.
   Disadvantages: Some--like condition of structure or lot size--are
                  difficult to measure. Focusing on one or a few may give
                  misleading results.

Measure: Multivariate Statistical Measures
   Comments:      Regression analysis is most commonly used. Often referred
                  to as hedonic index.
   Advantages:    In theory, good way to compute prices and quantities.
   Disadvantages: Difficult to use in practice.

The first problem is the measurement of consumption, Q. What is a
unit of housing services? Consider two renters. The rents they pay are not
consumption but are expenditures: R = P * Q. If I pay more rent than you,
does it mean I consume more housing, or do I pay a higher price? If we know
that both pay the same price, then rents are a good measure of
consumption.30/ Often studies use data from the same city and use rent as a
measure of consumption. But does the price of housing vary within the city?
That depends on whether by housing services you mean those services produced
solely by the structure, or those services produced jointly by structure and
location. If the former, then the price varies within the market, and the
resulting estimate is the so-called income consumption path, or Engel curve.
That is, the regression is:
(2)      R = P * Q = a + b (I) + u
where b has a slightly different interpretation than in (1): it reveals
expenditure behavior, not the direct relationship between quantity demanded,
and incomes and prices.
29/   The elasticity is a unitless measure of responsiveness:
E_I = (ΔQ/Q) / (ΔI/I) and E_P = (ΔQ/Q) / (ΔP/P), where E_I is the income
elasticity and E_P is the price elasticity. Since the regression
coefficients of I and P in a linear regression can be interpreted as
ΔQ/ΔI or ΔQ/ΔP, then for any level of Q, I and P we can calculate:
E_I = b * (I/Q) or E_P = c * (P/Q), where b and c are the regression
coefficients of income and price. In particular, note that for a linear
demand model, the elasticity varies for each observation. It is common
to present such variable elasticities evaluated at the mean value of the
variables, as in Table 2. However, if we estimate logarithmic models,
the elasticity is the coefficient. See Appendix B for details.
30/   House values are related to rents in the following way: a house is
worth the (discounted) sum of the future rents it can be expected to
command. The sum of rents is discounted since a dollar of rent today is
worth more than a dollar of rent a year from now (you can earn interest
on today's dollar for a year). Also the expected rent the dwelling
commands in the future might be different than today's rent. That's why
house values can change faster or slower than rents. Most of what we
say about the flow 'rent' can be applied to the stock concept 'value' so
most of the analysis concentrates on rents.
If the latter -- consider housing defined broadly as structure and
location -- then the price of housing including access is constant throughout
the market, denoted $\bar{P}$, and the regression estimated is:
(3)      R = \bar{P} * Q = a + b (I) + u
where the bar over P indicates it is fixed over the sample and the coefficient
b can be shown to reveal the relationship between quantity demanded and
income, as well as the income-expenditure relationship.
Gross versus net rent. Another issue which arises in consumption measurement
is the following. Consider nominal rent payments (question H-2). Some
renters pay for utilities separately, and some have utility charges included
in their monthly rent. Ideally, we would like to choose either gross rent
(rent for structure plus utilities payments) or net rent (rent minus the
imputed charges for utilities which are included in monthly rent). Gross rent
corresponds more closely to a total cost of shelter while net rent
facilitates comparisons with owner regressions based on house value, which
are net of utility payments. Unfortunately, good information on utility
payments is hard to collect.31/ One way to get around this problem is to use
nominal rent (question H-2) and to include a dummy variable in the regression
for households which have utilities included in rent. Since these households
presumably pay more to cover the utility payments, the coefficient of this
31/   See Follain and Malpezzi (1981).
dummy variable should be positive, and the other regression results will be
for a sort of net rent, since the dummy controls for the extra costs. The
other procedure is to compute gross rent as the sum of rent and utility
payments. This is the simpler procedure, although it distorts comparisons
between owners and renters.
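The dummy variable procedure can be illustrated with "manufactured" data. The sketch below is only illustrative: the variable names, the sample size, and the 0.20 premium for included utilities are all invented for the example, and log rent is used as the dependent variable, as in the tables later in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Manufactured data: log income and a 0/1 flag for "utilities included in rent".
log_income = rng.normal(6.0, 0.5, n)
util_included = rng.integers(0, 2, n)

# Assumed data-generating process: included utilities raise log rent by 0.20.
log_rent = 0.5 + 0.6 * log_income + 0.20 * util_included + rng.normal(0, 0.1, n)

# Ordinary least squares with an intercept, log income, and the utilities dummy.
X = np.column_stack([np.ones(n), log_income, util_included])
coef, *_ = np.linalg.lstsq(X, log_rent, rcond=None)
intercept, income_coef, dummy_coef = coef

print(f"income coefficient: {income_coef:.3f}")
print(f"utilities dummy:    {dummy_coef:.3f}")
```

With data built this way the dummy coefficient should come out positive, and the income coefficient then describes a sort of net rent, since the dummy absorbs the extra utility cost.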
Recommendation to CBS and MWH for Housing Consumption Measures in Regression
Models of Housing Demand.
1. For renters, use nominal rent (variable H-2) plus monthly
utility payments (H-4).
2. For owners, use house value for single family structures
(A-1=1). For owner-occupied multifamily units, use structure value but estimate
separately from single family units.
Measuring housing prices. This topic has been partly addressed in
the section on consumption; these two measurement issues are obviously
closely related. Common procedures followed in earlier studies have been:
(1) ignore prices, (2) assume that prices are constant throughout the sample,
which amounts to the same thing, (3) allow prices to vary within submarkets,
e.g., compute a place to place index for several locations in the total
sample, and (4) compute separate prices for each observation, using either
hedonic price index methods or some proxy such as distance to the
Central Business District.
Recommendation to CBS and MWH for Price Measures in Regression Models of
Housing Demand.
1. Prices vary from city to city. Therefore, estimate separate
equations for large cities with big samples (more than 150 degrees of
freedom). For smaller towns, pool observations but include dummy variables
for different towns.
2. Within a city or town, prices also vary with location. The
simplest model postulates that price varies with distance to Central Business
District. Therefore, to control for intrametropolitan price differences,
include distance to town center (C-15) as an independent variable if there is
sufficient variation in the distance variable to yield significant estimates.
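A pooled regression of this kind might be set up as in the following sketch. The three towns, their price differences, and all coefficients are manufactured for illustration; only the structure (one dummy per town except a reference town, plus distance to the town center) follows the recommendation above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
towns = np.array(["A", "B", "C"])            # hypothetical small towns
town = rng.choice(towns, size=n)
distance_km = rng.uniform(0, 10, size=n)     # distance to town center
log_cons = rng.normal(6.0, 0.5, size=n)      # log of total consumption

# Assumed "true" model used to manufacture the data.
town_effect = {"A": 0.0, "B": 0.3, "C": -0.2}
log_rent = (1.0 + 0.5 * log_cons - 0.02 * distance_km
            + np.array([town_effect[t] for t in town])
            + rng.normal(0, 0.1, size=n))

# One dummy per town except the first; the omitted town is the reference group.
dummies = np.column_stack([(town == t).astype(float) for t in towns[1:]])
X = np.column_stack([np.ones(n), log_cons, distance_km, dummies])

coef, *_ = np.linalg.lstsq(X, log_rent, rcond=None)
names = ["intercept", "log_cons", "distance_km", "town_B", "town_C"]
estimates = dict(zip(names, coef))

for name, value in estimates.items():
    print(f"{name:12s} {value: .4f}")
```

Each town coefficient is read relative to the omitted town. Including a dummy for every town together with the intercept would make the regressors perfectly collinear.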
Measuring incomes. Since adjusting the consumption of housing
services is so costly and undertaken so infrequently, it is commonly
postulated that the demand for housing is related to some expectation of the
household economic situation over a time period longer than the immediate
market period. Commonly researchers try to distinguish between current and
permanent income, where permanent income is adjusted to reflect long run
expectations about future income.32/ In other words, consumption does not
change as much from year-to-year as total income. People save in good years
and spend their savings or borrow in bad years. Rent changes even less than
total consumption, because it is so costly to move.
Since consumption is related to long-run or permanent income, this
suggests that current income is not the true determinant of housing
consumption; permanent income is.
In practice, there are three common ways in which researchers try to
proxy permanent income, which is never directly observable. The first,
advocated by Friedman in his seminal paper, is to use a weighted average of
past incomes as proxy for permanent income, where the weights reflect some
market discount rate. This approach requires panel data (a cross section of
households surveyed repeatedly over time). Most such panels have data for
three or four years at most, so the average used could be improved if longer
32/   The classic work on the permanent income hypothesis is Friedman
(1957).   A related hypothesis which yields similar qualitative
conclusions for the demand for durable goods is the life-cycle earnings
hypothesis (see Ando and Modigliani (1963)).
time series were available. Most studies using this approach assume a very
high discount rate. Also, note that the empirical implementation of
Friedman's theory is somewhat ad hoc, because the theory postulates that
consumption depends on future expectations, which may differ from past
experience.
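A minimal sketch of the first method follows. The function name, the three-year income history, and the two discount rates are hypothetical; the weights decline geometrically with the age of the observation, which is one common way of implementing weights based on a discount rate.

```python
def permanent_income_proxy(past_incomes, discount_rate):
    """Weighted average of past incomes, most recent year first.

    The weight on income k years back is 1 / (1 + discount_rate) ** k,
    so a higher discount rate puts more weight on recent income.
    """
    weights = [1.0 / (1.0 + discount_rate) ** k for k in range(len(past_incomes))]
    return sum(w * y for w, y in zip(weights, past_incomes)) / sum(weights)


# Hypothetical household: income this year, last year, and two years ago.
incomes = [1200.0, 900.0, 1000.0]

proxy_low = permanent_income_proxy(incomes, 0.10)   # modest discount rate
proxy_high = permanent_income_proxy(incomes, 5.0)   # very high discount rate

print(round(proxy_low, 1), round(proxy_high, 1))
```

With a very high discount rate the proxy is dominated by current income, which shows why a short panel combined with a high rate adds little to simply using current income.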
A second method is to use a first stage regression of current income
against age, education and other determinants of current income, and to use
the prediction from this equation as an instrumental variable proxying
permanent income. This method implicitly assumes that the relevant permanent
income measure varies over a person's lifetime.
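The first stage of the second method can be sketched as follows, again with manufactured data. The choice of determinants (age, age squared, years of schooling) and every coefficient used to generate the data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
age = rng.uniform(20, 65, n)
educ = rng.integers(0, 16, n).astype(float)   # years of schooling

# Manufactured log income: a systematic part plus transitory noise.
log_income = (4.0 + 0.06 * age - 0.0006 * age**2 + 0.05 * educ
              + rng.normal(0, 0.4, n))

# First stage: regress current income on its determinants.
Z = np.column_stack([np.ones(n), age, age**2, educ])
gamma, *_ = np.linalg.lstsq(Z, log_income, rcond=None)

# The fitted value serves as the proxy for (log) permanent income.
log_permanent = Z @ gamma

print("variance of current income:  ", round(float(np.var(log_income)), 4))
print("variance of permanent proxy: ", round(float(np.var(log_permanent)), 4))
```

The fitted values vary less than current income because the transitory component is left in the residual. They would then replace current income in the demand equation.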
The third empirical approach is straightforward and intuitively
appealing. Since households make decisions about consumption largely on the
basis of permanent income, and consumption is measurable, why not use
consumption as a proxy for permanent income? The assumption of this approach
is that changes in transitory income do not affect total consumption or
housing consumption.
Of these three approaches, the third is the most appealing. The first
approach requires time-series data which are unobtainable. The second can be
done with Kenyan data but is somewhat complicated. The third approach is the
simplest and can be easily implemented with the Kenyan data. Mayo and
Malpezzi (1983) show that this simple approach yields results similar to more
complicated techniques.
Recommendation to CBS and MWH on Measuring Income in Regression Models of
Housing Demand.
1. Since income is coded into categories (section G) but
consumption variables are continuous, and since consumption is an excellent
proxy for the theoretically preferred "permanent income", use total
consumption as our measure of income. For regression analysis, use the
natural logarithm of consumption.
Demographic Variables
Most economic texts focus on the role of prices and incomes in
determining patterns of demand. The underlying assumption is that other
determinants of demand, such as tastes, family composition and size, are "held
fixed". Empirical work requires that we include these kinds of variables in
our regression models so that this assumption is tenable.
The most important single demographic variable affecting housing
consumption is household size. Other candidates for inclusion in the analysis
are: age of household head; number of children (measured separately from
number of adults); and the sex of the head of household. Sometimes it is
hypothesized that tastes vary by income class or by tenure; this will be
discussed below.
In addition to selecting demographic variables for inclusion in the
regressions, we have to choose functional forms. Functional form was
discussed in Part I, but we should reemphasize here that several of the
demographic variables can be assumed to have different effects depending on
their level. For example, as household size increases, demand will usually
increase; but it may be the case that for extremely large families, food
expenditures become so large that they tend to "crowd out" additional housing
expenditures, so that we observe housing expenditures first rising then
falling with household size. In a similar fashion, housing expenditure might
first increase with head's age as the head reaches peak earning power and
contemplates a growing family; it might shrink with advancing age as children
form their own families. The point to note here is that the functional form
of the regression equation should be able to capture these nonlinearities.
Recommendation to CBS and MWH on Demographic Variables in Regression Models of
Housing Demand.
1. Household size. At a minimum, the estimated demand relations
should include a measure of household size. Since the effects of one
additional household member are probably different for large families than for
small families, a flexible functional form -- dummy variables or an additional
quadratic term -- may be useful. These will be discussed in more detail under
functional form.
2. Other candidates for inclusion include age of household head
(possibly as dummy categories), a dummy for female headed households, and
number of children. Some experimentation may be necessary, to determine which
variables make a difference in the Kenyan context.
Section II.2.2 Integrating the Effects of Tenure Choice and Mobility on
Housing Demand.
The earlier companion paper (Malpezzi, Bamberger and Mayo)
emphasized the relationships between moving and tenure, and the consumption of
housing. Much of the recent literature on housing demand in the developed
countries focuses on the relationship between tenure choice and demand. For
some time it has been common to estimate separate demand equations by tenure
group, and more recent work has tried to incorporate the simultaneity of the
tenure choice and housing demand decisions. Briefly, most studies of
developed country data show higher income elasticities for owners than for
renters, presumably because owner occupied housing is an investment good as
well as a consumption good, but the use of simultaneous methods has so far
demonstrated little impact on the size of the elasticities.
Of course, tenure in developing countries is often characterized by
much more complicated arrangements than is the case in developed countries,
with various forms of squatting and rent-with-deposit schemes (key money)
often prevalent. A general survey of different types of tenure arrangements
can be found in Doebele (1978). However, for the purposes of the Housing
Project it is recommended that the demand equations be estimated separately
for a simple tenure grouping: (1) private renters, (2) private single-family
unit owners, (3) private multifamily unit owners, and (4) subsidized or public
units. Some of these groupings will not have sufficient observations for
reliable estimation in any but the largest cities. For small towns two of
these groups might have to be collapsed with a dummy variable for the smaller
group.
Mobility is also important as an indicator of current housing
preferences. The actual choices of households that have moved recently, and
the projected choices of those about to move, may indicate the priorities that
households place on different housing and neighborhood features more
accurately than do the circumstances of the sitting population. Thus project
designers may wish to pay greater attention to the choices made by recent
movers or prospective movers (which can be ascertained by surveys) than to
the choices made by those who have neither moved recently nor intend to move.
Some recent studies have estimated separate demand regressions for
recent movers and long-term residents (see Mayo and Malpezzi, Appendix A, for
a discussion). We do not recommend this approach for the Kenyan study because
there are too few degrees of freedom to segment by this variable as well as
tenure and city. Instead, include length of tenure, and its square, directly
in the demand equation. This will correct for any bias due to possibly
different demands for long-term residents.
Summary of Recommendation to CBS and MWH on Tenure and Mobility in Housing
Demand Estimation.
1. Estimate separate equations for each tenure group. If there are
too few observations for a particular tenure group in a smaller town, include
them with another group and use a dummy variable for the smaller group.
2. Include the length of tenure, and its square, in the regression
equation.
Section II.2.3 Tying It All Together: Examples of Demand Equations Using
Egyptian Data
Now that we have discussed several variables and specification
issues in isolation, we will examine actual estimates which will illustrate
how these ideas work out in practice. However, bear in mind that this is an
example from a very atypical city in a different country. When similar
models are applied to data from Kenya, we will expect to find some important
differences in the results. The discussion is meant only to give the flavor
of looking over some results, and not to indicate "correct" results.
Tables 11 and 12 present estimates from demand equations whose
specification resembles the model we have discussed above. Notice the
following general points:
1. The dependent variables are different for renters (log gross
rent) and for owners (log house value). Rent is a flow concept (i.e. an
amount per month) while value is a stock concept (i.e. an amount paid once and
for all).
2. The models fit the data quite well. R-squared statistics are
around 0.4.

TABLE 11
CAIRO RENTER DEMAND EQUATIONS USING LOG OF GROSS RENT

DEP VARIABLE: LMGRENT

                     SUM OF           MEAN
SOURCE    DF        SQUARES         SQUARE      F VALUE      PROB>F
MODEL     11      58.583454       5.325769       14.660      0.0001
ERROR    241      87.552540       0.363289
C TOTAL  252        146.136

ROOT MSE     0.602734     R-SQUARE     0.4009
DEP MEAN     2.228022     ADJ R-SQ     0.3735
C.V.         27.05243

                  PARAMETER       STANDARD    T FOR H0:               VARIABLE
VARIABLE  DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|   LABEL
INTERCEP   1      -0.987718       0.620380       -1.592      0.1127   INTERCEPT
LCONSUME   1       0.481276       0.053391        9.014      0.0001   LOG CONSUMPTION
HHSIZE     1       0.011340       0.078861        0.144      0.8858   HOUSEHOLD SIZE
HSIZESQ    1   -0.000634716    0.006612508       -0.096      0.9236
AGERESP    1      -0.033588       0.017504       -1.919      0.0562   AGE OF HEAD
AGESQ      1   0.0004114552   0.0001908698        2.156      0.0321   AGE SQUARED
FEMALE     1       0.086849       0.096230        0.903      0.3677   FEMALE HEADED HOUSEHOLD
LINGER     1      -0.054956       0.012128       -4.531      0.0001   LENGTH OF TENURE
LNGRSQ     1   0.0007157815   0.0002916292        2.454      0.0148   LENGTH OF TENURE SQUARED
PUBHSG     1      -0.684564       0.198754       -3.444      0.0007   PUBLIC HOUSING DUMMY
GOVHSG     1       1.273605       0.377436        3.374      0.0009   GOVERNMENT SUBSIDY DUMMY
DIST       1    -0.00370299    0.008788988       -0.421      0.6739   DISTANCE TO CITY CENTER

TABLE 12
CAIRO OWNER DEMAND EQUATIONS USING LOG OF HOUSE VALUE

DEP VARIABLE: LVALUE

                     SUM OF           MEAN
SOURCE    DF        SQUARES         SQUARE      F VALUE      PROB>F
MODEL      9      27.531840       3.059093        3.450      0.0030
ERROR     41      36.352156       0.886638
C TOTAL   50      63.883996

ROOT MSE     0.941615     R-SQUARE     0.4310
DEP MEAN     8.963114     ADJ R-SQ     0.3061
C.V.         10.50544

                  PARAMETER       STANDARD    T FOR H0:               VARIABLE
VARIABLE  DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|   LABEL
INTERCEP   1      -0.380739       2.122989       -0.179      0.8586   INTERCEPT
LCONSUME   1       0.894612       0.189601        4.718      0.0001   LOG CONSUMPTION
HHSIZE     1       0.223190       0.210164        1.062      0.2945   HOUSEHOLD SIZE
HSIZESQ    1      -0.027146       0.016278       -1.668      0.1030
AGERESP    1    0.002294721       0.058554        0.039      0.9689   AGE OF HEAD
AGESQ      1   0.0002119295   0.0006270886        0.338      0.7371   AGE SQUARED
FEMALE     1       0.212448       0.427333        0.497      0.6217   FEMALE HEADED HOUSEHOLD
LINGER     1       0.020451       0.033929        0.603      0.5500   LENGTH OF TENURE
LNGRSQ     1   -0.000241492   0.0007379156       -0.327      0.7451   LENGTH OF TENURE SQUARED
DIST       1    0.004872892       0.031499        0.155      0.8778   DISTANCE TO CITY CENTER

3. Housing consumption rises with income. For owners, it rises
almost proportionately with income (the elasticity, 0.89, is close to 1) while
for renters it rises more slowly (the elasticity is only 0.48).
4. Housing consumption rises with household size (the coefficient
of HHSIZE is positive) but at a declining rate (the coefficient of HSIZESQ is
negative).
5. Age of the head affects rents (t-statistics are about 2 for both
age variables) but not owner consumption.
6. Sex of household head has little measured effect on housing
consumption (t-statistics are very small for both owners and renters).
7. Length of tenure has a strong effect on rents but little effect
on values.
8. For renters, those who live in public housing pay less (the
coefficient of PUBHSG is less than zero) but those who receive government
subsidies spend more on housing (GOVHSG is greater than zero).
9. The effect of distance to the city center, our proxy for housing
price, is negligible.
Let us examine each of these points in a little more detail. The relationship
between rents and values was discussed earlier in the paper. Values are
simply the present value of the (discounted) expected future rents the
building and the land command. As we all know, the value of a unit can be 50
or 100 or more times the monthly rent it could command. One of the advantages
of the logarithmic model is that it facilitates comparisons between the owner
and renter results because the coefficients can be interpreted as percentage
changes, and the difference in measurement units (stock versus flow) is
captured in the intercept term of the regression equation.
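The role of the intercept can be seen in a short derivation. Suppose, purely for illustration, that house value V is a constant multiple k of monthly rent R (k is an assumed capitalization factor, not an estimate from the survey), and that log rent depends on log income I with slope b:

```latex
\begin{align*}
V = kR \;&\Longrightarrow\; \ln V = \ln k + \ln R, \\
\ln R = a + b \ln I \;&\Longrightarrow\; \ln V = (a + \ln k) + b \ln I .
\end{align*}
```

Under this assumption the slope b is the same in both equations; only the intercept shifts, by ln k.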
Typical R-squared statistics for cross-section data in housing
market analysis range from .1 to .6. The fits reported here are quite good by
this informal rule of thumb.
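The R-squared and adjusted R-squared statistics can be recomputed from the sums of squares printed in the tables, which is a useful check when transcribing results. The sketch below uses the figures from Tables 11 and 12; the function names are our own, and the number of observations is taken as the total degrees of freedom plus one.

```python
def r_squared(ss_model, ss_total):
    """Share of the total sum of squares explained by the model."""
    return ss_model / ss_total


def adjusted_r_squared(r2, n_obs, n_regressors):
    """R-squared penalized for the number of regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Table 11 (renters): total DF 252, so 253 observations; 11 regressors.
r2_renters = r_squared(58.583454, 146.136)
adj_renters = adjusted_r_squared(r2_renters, 253, 11)

# Table 12 (owners): total DF 50, so 51 observations; 9 regressors.
r2_owners = r_squared(27.531840, 63.883996)
adj_owners = adjusted_r_squared(r2_owners, 51, 9)

print(round(r2_renters, 4), round(adj_renters, 4))
print(round(r2_owners, 4), round(adj_owners, 4))
```

The adjustment matters more for the owner equation, where only 51 observations support nine regressors.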
As discussed in Appendix B, the coefficient of a logarithmic
independent variable in a regression with a logarithmic dependent variable can
be directly interpreted as an elasticity, or unitless measure of
responsiveness. Therefore, we conclude that in Cairo a given percentage
increase in income raises housing consumption proportionately more for owners
than for renters. This is not surprising, since for owners housing is an
investment good as well as current consumption.
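To see what these elasticities imply, the following sketch computes the percentage change in housing consumption associated with a 10 percent rise in income, using the LCONSUME coefficients from Tables 11 and 12. The function name is our own, and the 10 percent change is an arbitrary illustration.

```python
def pct_change_in_housing(elasticity, pct_change_in_income):
    """Exact percentage change implied by a constant-elasticity (log-log) model."""
    return ((1.0 + pct_change_in_income / 100.0) ** elasticity - 1.0) * 100.0


renters = pct_change_in_housing(0.481276, 10.0)   # LCONSUME, Table 11
owners = pct_change_in_housing(0.894612, 10.0)    # LCONSUME, Table 12

print(f"renters: {renters:.1f} percent")
print(f"owners:  {owners:.1f} percent")
```

For small changes the result is close to the elasticity times the percentage change in income, which is why the coefficient is usually read directly as an elasticity.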
Most people expect housing consumption to increase with household
size, but for very large households housing consumption may begin to increase
more slowly or even decrease as more income is allocated to food. If this
hypothesis is correct we expect a positive coefficient for HHSIZE and a
negative coefficient for HSIZESQ.
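When the linear term is positive and the squared term negative, the household size at which predicted consumption peaks is minus the linear coefficient divided by twice the squared coefficient. The sketch below applies this to the coefficients in Tables 11 and 12; the function name is our own. Since the household size coefficients in those tables are not statistically significant, the resulting figures are very imprecise and serve only to show the arithmetic.

```python
def turning_point(linear_coef, squared_coef):
    """Value of x at which a + b*x + c*x**2 stops rising (requires c < 0)."""
    if squared_coef >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -linear_coef / (2.0 * squared_coef)


# Owner equation, Table 12: HHSIZE = 0.223190, HSIZESQ = -0.027146
peak_owner = turning_point(0.223190, -0.027146)

# Renter equation, Table 11: HHSIZE = 0.011340, HSIZESQ = -0.000634716
peak_renter = turning_point(0.011340, -0.000634716)

print(f"owners:  peak at about {peak_owner:.1f} persons")
print(f"renters: peak at about {peak_renter:.1f} persons")
```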
In general, the demographic variables (characteristics of the
household other than income) are less important for owners than for renters.
This is not surprising since higher adjustment costs presumably lead owners to
make longer term housing decisions which are less strongly related to current
demographic characteristics.
The reason the length of tenure variable has such a strong effect
on renters in Cairo is that there is a strong rent control law in that
market: the longer a household has occupied its unit, the further its rent
tends to lag behind current market levels. Rent control, if enforced, leads to
severe distortions and inefficiencies in the housing market. See Thibodeau
(1982) for a discussion.
Since there were not enough public housing units or subsidized
households to estimate a separate regression for these samples as recommended
above, the dummy variables PUBHSG and GOVHSG were included in the pooled rental
sample. The results indicate that public housing residents spend much less
than otherwise identical households (about half as much, since exp(-0.68) is
roughly 0.5). Those renters who receive subsidies apparently do spend them on
housing, since they spend more than three times as much on housing as
otherwise identical households (exp(1.27) is roughly 3.6).
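Because the dependent variable is the log of rent, the coefficient of a dummy variable is converted to a ratio of rents by exponentiating it. The sketch below does this for the two dummies in Table 11; the function name is our own.

```python
import math


def dummy_effect_ratio(coef):
    """Predicted rent with the dummy equal to 1, relative to rent with it equal to 0."""
    return math.exp(coef)


public_ratio = dummy_effect_ratio(-0.684564)    # PUBHSG, Table 11
subsidy_ratio = dummy_effect_ratio(1.273605)    # GOVHSG, Table 11

print(f"public housing residents pay {public_ratio:.2f} times the rent of others")
print(f"subsidized renters pay {subsidy_ratio:.2f} times the rent of others")
```

For coefficients this large the coefficient itself is a poor approximation to the percentage difference, so the exponentiated ratio is the safer figure to report.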
Distance to the city center was included as a proxy for housing
prices since close-in units presumably cost more per unit of housing services
than those farther out. But the effect of lower prices can be offset by
increased consumption of the quantity of housing services; since rent is price
times quantity, these two effects can cancel out. Also, when samples are
restricted to a few geographical areas to hold down costs, we may be left with
little variation in locational variables; this will also decrease the
reliability of the distance coefficient.
Regression models similar to those just presented can be estimated
using data from the larger cities and towns. For smaller towns, there may not
be enough degrees of freedom left to reliably estimate such models. It may be
necessary to estimate a simpler model in the smaller towns in order to
conserve degrees of freedom.
One simple model which has been used in past studies and found
to perform well is to regress the same dependent variables against the log of
income or consumption, household size, and household size squared (see Mayo
and Malpezzi, 1983). Table 13 presents estimates from this simple model,
using the same sample as Tables 11 and 12. Notice the key result: the
coefficient of log consumption, our proxy for permanent income, is very
stable. The household size variables do change, because the variables omitted
from this simple model are more highly correlated with household size than
with income or consumption. In other words, with the simple model we can
retain some confidence in the key income coefficients. Since the income or

TABLE 13-A
SIMPLE DEMAND EQUATION FOR RENTERS IN CAIRO

DEP VARIABLE: LMGRENT

                     SUM OF           MEAN
SOURCE    DF        SQUARES         SQUARE      F VALUE      PROB>F
MODEL      3      40.783117      13.594372       24.236      0.0001
ERROR    343        192.397       0.560924
C TOTAL  346        233.180

ROOT MSE     0.748949     R-SQUARE     0.1749
DEP MEAN     2.338085     ADJ R-SQ     0.1677
C.V.         32.03256

                  PARAMETER       STANDARD    T FOR H0:               VARIABLE
VARIABLE  DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|   LABEL
INTERCEP   1      -1.396345       0.521220       -2.679      0.0077   INTERCEPT
LCONSUME   1       0.450601       0.054375        8.287      0.0001   LOG CONSUMPTION
HHSIZE     1      -0.144916       0.073210       -1.979      0.0486   HOUSEHOLD SIZE
HSIZESQ    1       0.012216    0.006185292        1.975      0.0491

TABLE 13-B
SIMPLE DEMAND EQUATION FOR OWNERS IN CAIRO

DEP VARIABLE: LVALUE

                     SUM OF           MEAN
SOURCE    DF        SQUARES         SQUARE      F VALUE      PROB>F
MODEL      3      24.459881       8.153294        8.163      0.0002
ERROR     50      49.940011       0.998800
C TOTAL   53      74.399893

ROOT MSE     0.999400     R-SQUARE     0.3288
DEP MEAN     8.905308     ADJ R-SQ     0.2885
C.V.         11.22252

                  PARAMETER       STANDARD    T FOR H0:               VARIABLE
VARIABLE  DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|   LABEL
INTERCEP   1       1.557142       1.598742        0.974      0.3348   INTERCEPT
LCONSUME   1       0.837154       0.175473        4.771      0.0001   LOG CONSUMPTION
HHSIZE     1       0.040022       0.197403        0.203      0.8402   HOUSEHOLD SIZE
HSIZESQ    1      -0.014159       0.015588       -0.908      0.3681

consumption coefficient is the key result for affordability calculations, this
simple model can still provide valuable information for those towns which have
small sample sizes.
Another alternative is to pool several small towns, and include
dummy variables for each town but one.
Section II.3 Introduction to Hedonic Price Indexes
This section summarizes some recent advances in housing market
analysis. In particular, we will focus on the estimation of so-called hedonic
prices for housing and how they can be used to construct indexes and to
estimate demand and supply relationships. First, we will present an
introductory and intuitive explanation of hedonic price estimation. Then we
will present an empirical example.
Let us emphasize at the outset that we do not recommend that hedonic
analysis be included in the first basic reports produced by CBS and MWH. This
additional analysis will be time consuming and should be the focus of future
reports, perhaps with technical assistance from the Bank and perhaps with
collaboration of academic researchers. This brief introduction is included
because these models will be useful for future price index work.
II.3.1    Theoretical Basis
To a large extent, housing market analysis consists of comparing
different dwellings. For example, measuring inflation requires comparing the
price of housing today to that of some base period, but often in the interim
the housing stock has changed, through new construction, rehabilitation,
conversion, and demolition, so that we are actually comparing two different
groups of dwellings. Other examples abound, such as comparing the price of
housing in different locations, measuring the effects of racial or caste
discrimination in housing, and studying the effects of government subsidies
and tax policies on how we are sheltered. All require that we compare
different dwellings. Such comparisons are made daily not only by researchers,
but also by those interested in more effective government programs and by
bankers, developers, and landlords. In fact, each of us makes such comparisons
every time we move or consider moving.
The problem faced by anyone trying to analyze a housing market is
the well-known difficulty of making these comparisons. How are the rents for
two different dwellings in two different locations related?33/ Housing is not
a homogeneous good like wheat or oil, but can be thought of as a bundle of
diverse characteristics such as a number of rooms of certain types, in a
particular location, of a certain age, and so on. These specific
characteristics are more amenable to comparison, so one may compare dwellings
by comparing characteristics. Most people agree that comparing the value of,
say, two houses with the same number of rooms in nearby locations is easier
than comparing two dwellings with an unknown number of rooms in uncertain
locations, even though in practice the distinction between a "good"--housing--
and its "characteristics" or "attributes"--like the number of rooms--is very
much ad hoc.
The method of hedonic equations is one way expenditures on housing
can be decomposed into measurable prices and quantities so that rents for
different dwellings or for identical dwellings in different places can be
predicted. A hedonic equation is a regression of expenditures (rents or
values) on housing characteristics and will be explained in detail below.
33/   Of course for owners we usually measure expenditures by the stock
expenditure -- value -- rather than by the flow expenditure -- rent. For
now, assume everyone rents. We will return to this distinction later.
Briefly, the independent variables represent the individual characteristics of
the dwelling, and the regression coefficients are estimates of the implicit
prices of these characteristics. The results provide us with estimated prices
for housing characteristics, and we can then compare two dwellings by using
these prices as weights. For example, the estimated price for a variable
measuring number of rooms indicates the change in value or rent associated
with the addition or deletion of one room. It tells us in a dollars-and-cents
way how much "more house" is provided by a dwelling with an extra room.
Ordinarily we would prefer to estimate such a regression separately
in each market, where prices and quantities ideally clear. The definition of
markets will be addressed in some detail later.
Once we have estimated the implicit prices of measurable housing
characteristics in each market, we can select a standard set of
characteristics, or bundle, and price a dwelling meeting these specifications
in each market. In this manner we can construct price indexes for housing of
constant quality across markets. In a similar fashion we can use the results
from a particular market's regression to estimate how prices of identical
dwellings vary with location within a single market (e.g., with distance from
the city center) or even to decompose the differences in rent or house values
into price and quantity differences. Some simplified examples will make these
procedures clear.
The hedonic regression assumes that we know the determinants of a
unit's rent:
R = f (S, L, C), where
R = contract rent;
S = structural characteristics;
L = neighborhood characteristics, including location
within the market; and
C = contract conditions or characteristics, such as
utilities included in rent.
II.3.2    An Example
Suppose we estimate this relationship assuming a log-linear
functional form:
ln R = a + bS + cL + dC
where a, b, c and d are regression coefficients. Of course, in
practice there can be many variables included on the right hand side. Table
14 presents a sample hedonic regression using the Cairo data.
Since we only want to introduce the concept of the hedonic index,
Table 14 will not be discussed in detail. The coefficients of the independent
variables are interpreted as the approximate percentage change in rent from an
additional unit of the characteristic. For example, Table 14 indicates that
each additional room adds about 21 percent to the rent commanded by a
dwelling; a bath is worth about 14 percent; and so on. Detailed interpretation of hedonic indexes can
be found in Malpezzi, Ozanne and Thibodeau (1981).
The determinants of rents and values are of interest in their own
right to project designers and others. In addition, the results can be used
to compute place-to-place price indexes for a constant quality dwelling. Once
the coefficients have been estimated with a separate regression for each
market (city or town) we can predict the rent for the same unit in each market
as follows:
1. Pick a set of independent variables which describes the unit to
be priced. This is called the bundle.

TABLE 14
CAIRO RENTER HEDONIC EQUATION

MODEL: TWO
DEP VARIABLE: LGRENTI LOG GROSS RENT

                     SUM OF           MEAN
SOURCE    DF        SQUARES         SQUARE      F VALUE      PROB>F
MODEL     22      87.353286       3.970604       15.606      0.0001
ERROR    265      67.425505       0.254436
C TOTAL  287        154.779

ROOT MSE     0.504416     R-SQUARE     0.5644
DEP MEAN     2.264077     ADJ R-SQ     0.5282
C.V.         22.27912

                  PARAMETER       STANDARD    T FOR H0:               VARIABLE
VARIABLE  DF       ESTIMATE          ERROR  PARAMETER=0  PROB > |T|   LABEL
INTERCEP   1       1.307565       0.439738        2.974      0.0032   INTERCEPT
ROOMS      1       0.209695       0.028091        7.465      0.0001   NUMBER OF ROOMS FOR HH
BATH       1       0.137401       0.091340        1.504      0.1337   JUST WHAT YOU THINK
LE2STORY   1      -0.023454       0.085953       -0.273      0.7852   STRUCTURE LE 2 STORIES
AGE76      1       0.688762       0.148079        4.651      0.0001   AGE DUMMY BUILT POST 76
AGE7176    1       0.459552       0.114360        4.018      0.0001   BUILT 71 TO 76
AGE6070    1       0.262954       0.083423        3.152      0.0018   BUILT 60 TO 70
AGLAND     1       0.079783       0.223413        0.357      0.7213   AGRICULTURAL LAND
PAVED      1      -0.046628       0.101338       -0.460      0.6458   PAVED ROAD
UPRMDL     1       0.139769       0.089565        1.561      0.1198   UPPER OR MIDDLE CLASS DISTRICT
LINGER     1      -0.013274    0.003841981       -3.455      0.0006   LENGTH OF TENURE
FURN       1       0.618900       0.256550        2.412      0.0165   RENT INCLUDES FURNITURE
WPRIV      1       0.127770       0.093400        1.368      0.1725   PRIVATE WATER CONNECTION
SPUB2      1      -0.095058       0.134986       -0.704      0.4819   SEWER CONNECTION
ELEC       1       0.277901       0.264421        1.051      0.2942   ELECTRICITY DUMMY
SLITE      1      -0.087595       0.065469       -1.338      0.1821   STREET LIGHTS
SIDE       1       0.101721       0.104296        0.975      0.3303   SIDEWALKS
POOR       1      -0.027840       0.080172       -0.347      0.7287   BLDG BAD OR COLLAPSE
LSCAPE     1       0.297088       0.097969        3.032      0.0027   LANDSCAPE HI OR MED QUALITY
DFAC       1      -0.405620       0.307701       -1.318      0.1886   DUMMY COMMUNITY FACILITIES
NFAC       1       0.149828       0.060533        2.475      0.0139   NUMBER COMMUNITY FACILITIES
DESLAND    1      -0.037973       0.239745       -0.158      0.8743   DESERT LAND
DIST       1      -0.018855    0.008929309       -2.112      0.0357   DISTANCE TO CBD



2. For each regression in turn, multiply each coefficient by the
value chosen for this particular bundle, and sum them.
3. Since the original regressions were log-linear (at least in our
example), exponentiate this sum to get back to shillings.
Detailed discussion of these procedures can be found in Follain and Ozanne
(1981).
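The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Follain and Ozanne: the "Cairo" entries use four coefficients from Table 14 (leaving a variable out is the same as giving it a value of zero in the bundle), the "Town B" coefficients and the bundle values are invented, the function name predicted_rent is hypothetical, and the intercept is included in the sum by giving it a bundle value of 1.

```python
import math

# Step 1: the bundle. Values are hypothetical; INTERCEP is set to 1 so that
# the intercept is included in the sum.
bundle = {"INTERCEP": 1.0, "ROOMS": 3.0, "BATH": 1.0, "DIST": 5.0}

# Coefficients by market. "Cairo" uses four estimates from Table 14;
# "Town B" is invented purely for illustration.
coefs = {
    "Cairo":  {"INTERCEP": 1.307565, "ROOMS": 0.209695,
               "BATH": 0.137401, "DIST": -0.018855},
    "Town B": {"INTERCEP": 1.10, "ROOMS": 0.18, "BATH": 0.20, "DIST": -0.03},
}

def predicted_rent(market_coefs, bundle):
    # Step 2: multiply each coefficient by the bundle value and sum.
    log_rent = sum(market_coefs[v] * x for v, x in bundle.items())
    # Step 3: exponentiate to get back from logs to currency units.
    return math.exp(log_rent)

rents = {m: predicted_rent(c, bundle) for m, c in coefs.items()}
base = rents["Cairo"]   # market chosen as the base of the index
for market, rent in rents.items():
    print(f"{market}: rent {rent:.2f}, index {100 * rent / base:.1f}")
```

Dividing each market's predicted rent by the base market's gives the place-to-place index for the constant-quality bundle.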


PART III: COMPUTATIONAL TECHNIQUES
Section III.1: Preparing the Data For Analysis
This section will be brief, because many of the outstanding issues
are covered in Sae-Hau (1982), which is sent separately. Here we will
reiterate some of the points covered in the aide-memoire (Malpezzi to Ondorra,
July 1983). So far, this paper has emphasized techniques and models for
analysis. But as CBS and MWH staff would be the first to emphasize, good
analysis requires good data. Careful attention must be paid to the
preparation of the data for computer analysis. Three issues can be briefly
mentioned in this regard.
First, this is a complex survey with two levels of observation: the
household and the structure. It will often be necessary to cross-classify
responses by both levels of observation. For example, we might want to know
how many low-income people (household level information) live in units with
piped water (structure level information). For this kind of tabulation it is
necessary that household and structure variables be linked, and that the
counts be done using the correct weights (in this case, household level
weights). In other words, when the data are prepared the file structure must
link households and structures.
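A minimal sketch of such a linked tabulation follows. The record layout, field names and weights are all hypothetical; the only point is that each household record carries the identifier of its structure, so structure-level information can be looked up while the count uses household-level weights.

```python
# Hypothetical records; field names are illustrative, not the survey's own.
structures = {
    101: {"piped_water": True},
    102: {"piped_water": False},
}
households = [
    {"hh_id": 1, "structure_id": 101, "low_income": True,  "hh_weight": 40.0},
    {"hh_id": 2, "structure_id": 101, "low_income": False, "hh_weight": 40.0},
    {"hh_id": 3, "structure_id": 102, "low_income": True,  "hh_weight": 25.0},
]

def weighted_count(households, structures):
    """Estimated number of low-income households living in structures with
    piped water, using household-level weights."""
    total = 0.0
    for hh in households:
        structure = structures[hh["structure_id"]]   # household-structure link
        if hh["low_income"] and structure["piped_water"]:
            total += hh["hh_weight"]
    return total

print(weighted_count(households, structures))
```

This counts households; to count people, each weight would also be multiplied by the number of household members.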
Second, careful attention must be paid to how missing values are to
be treated. For example, suppose a respondent owns his dwelling, so he does
not answer the question "Amount of rent paid last month." If the current
practice of filling all non-responses with zeroes is followed, it is easy to
make computing errors such as including these zeroes in the computation of
averages. It is also more difficult to visually inspect data when the
zero-fill procedure is used. Finally, there are some questions for which zero
is a legitimate response, and using the zero-fill procedure means we can no
longer tell a non-response from a legitimate response of zero. For these
reasons it is recommended that non-responses be coded as blanks.
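The effect on an average can be seen in a small made-up example. In the sketch below the rents are invented, a blank is represented by Python's None, and the function name mean_of_responses is hypothetical; the fourth household is assumed to have genuinely paid no rent.

```python
# Rent paid last month for five hypothetical households; the third and fifth
# are owners, so the question does not apply to them.
rent_blank = [300.0, 450.0, None, 0.0, None]   # non-response coded as blank
rent_zero = [300.0, 450.0, 0.0, 0.0, 0.0]      # non-response zero-filled

def mean_of_responses(values):
    """Average over actual responses only; blanks (None) are skipped."""
    answered = [v for v in values if v is not None]
    return sum(answered) / len(answered) if answered else None

print(mean_of_responses(rent_blank))   # averages 300, 450 and the genuine 0
print(mean_of_responses(rent_zero))    # owners' zeroes pull the average down
```

With blanks, the genuine zero is kept and the owners are excluded; with zero-fill, the two cannot be told apart.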
Third, it is often necessary to recode responses for analysis.
These recodes should always be performed on copies of the original variables,
which are added to the file, and the original data should be left in unrecoded
form, so that if we want to change the recoding we can always go back to the
original data.
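As an illustration of recoding onto a copy, the sketch below adds a new variable and leaves the original code untouched. The records, the variable name income_group, and the cut-off low_max are hypothetical; the income codes follow the questionnaire in Appendix A (0 = under 500 KSh up to 8 = over 20,000 KSh, 9 = unknown).

```python
# Hypothetical household records carrying the questionnaire's income code.
records = [{"hh_id": 1, "income_code": 1},
           {"hh_id": 2, "income_code": 7},
           {"hh_id": 3, "income_code": 9}]

def add_income_group(record, low_max=2):
    """Return a copy of the record with a recoded variable added; the
    original income_code is left untouched."""
    out = dict(record)
    code = record["income_code"]
    if code == 9:
        out["income_group"] = None          # unknown stays missing
    else:
        out["income_group"] = "low" if code <= low_max else "other"
    return out

recoded = [add_income_group(r) for r in records]
print(recoded)
```

If the cut-off is later changed, the recode is simply rerun from income_code, which is still in the file.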
Section III.2: Computational Notes
The outstanding computational issue is the choice of computer
software. Choosing the correct software makes life easy for both the analyst
and the programmer. Given the enormous demands on computer personnel, the
correct choice of software shifts much of the burden to the machine. We
recommend that as a long-range strategy CBS investigate acquiring more
sophisticated software such as SAS (Statistical Analysis System), perhaps in
conjunction with a specialized table-producing set of software such as TPL. 34/
In the short run, we recommend a greater reliance on SPSS, which is
already installed on your system. SPSS will be particularly useful for
regression analysis, and the manual (Nie et al., 1975) is a good general
reference on statistics. The paper by Sae-Hau also discusses the use of SPSS
in some detail.
SPSS has several disadvantages. It does not have much flexibility
for managing data sets. It lacks several basic econometric procedures which
would be of interest for future work (but these procedures are not necessary
for the current reports). It is difficult to add or modify existing
procedures. Finally, it does not have many capabilities for computing order
statistics.
34/   These are examples of available software. SAS is highly recommended,
but we have no firm recommendation for a specific table-producing
language at this time.


In practice, then, FORTRAN or COBOL is often used to merge data
sets into a form which SPSS can handle. The lack of programming flexibility
is not a serious problem for the currently planned analysis, except that it
would be useful to have order statistics available. The FREQUENCIES
procedure does compute medians, however. If CBS decides to use other order
statistics, Appendix C lists some FORTRAN subroutines which sort data and
compute some order statistics. This code would probably require modification
for the specific problem at hand, but it may be useful as a guide.
For the currently planned reports, the lack of software probably
dictates that we rely on cross-tabulations and regression analysis for most of
the analysis, along with medians computed with FREQUENCIES in SPSS. For
future work, however, CBS will want to investigate software for the
computation of order statistics.
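To show what a sort-based order-statistic routine involves, here is a minimal Python sketch. It is not the FORTRAN code of Appendix C. It assumes the depth rule commonly given in exploratory data analysis texts (depth of the median is (n + 1)/2; depth of each hinge is (floor of the median depth + 1)/2), treats None as a blank non-response, and uses invented rents; the function names are hypothetical.

```python
import math

def value_at_depth(sorted_x, depth, from_top=False):
    """Value at a (possibly half-integer) depth counted from one end of the
    sorted batch; a half-integer depth averages the two neighbours."""
    lo = math.floor(depth)
    hi = math.ceil(depth)
    if from_top:
        return (sorted_x[-lo] + sorted_x[-hi]) / 2.0
    return (sorted_x[lo - 1] + sorted_x[hi - 1]) / 2.0

def median_and_hinges(x):
    """Sort the batch and return (lower hinge, median, upper hinge)."""
    s = sorted(v for v in x if v is not None)   # skip blank non-responses
    n = len(s)
    d_median = (n + 1) / 2.0
    d_hinge = (math.floor(d_median) + 1) / 2.0
    return (value_at_depth(s, d_hinge),
            value_at_depth(s, d_median),
            value_at_depth(s, d_hinge, from_top=True))

rents = [120, 300, 85, 450, None, 200, 150, 600]   # hypothetical KSh/month
print(median_and_hinges(rents))
```

Survey weights are ignored here; a weighted median would need the cumulative weights of the sorted observations instead of simple depths.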


APPENDIX A: KENYAN HOUSING QUESTIONNAIRE


Page 1
CENTRAL BUREAU OF STATISTICS
MINISTRY OF ECONOMIC PLANNING AND DEVELOPMENT
URBAN HOUSING SURVEY 1983
HOUSEHOLD QUESTIONNAIRE
[Scanned form. Legible content is listed below; card-column numbers and
answer boxes are omitted, and item numbers are shown only where legible.]
Identification: Town; Structure No.; H/H; Check digit; Form type
Control: Interviewed; Edited; Date; Enumerator/Clerk; Supervisor
A. SOCIO-ECONOMIC CHARACTERISTICS OF TENANT
1. Name of household head
2. Occupation of household head
Sex of head: 1 Male; 2 Female
Education level of head: 1 No schooling; 2 Primary; 3 Secondary;
   4 University
Household members (male, female): below 15 years; over 15 years; total
Period in this unit: years; months
B. DESCRIPTION OF RESIDENTIAL UNIT
Type of unit: 1 House; 2 Maisonette; 3 Flat; 4 Swahili; 5 Shanty; 6 Other
Ownership of unit: 1 Private; 2 Local Authority; 3 Central Govt.;
   4 Parastatal; 5 Company; 6 Unauthorised
Type of scheme: 1 Mortgage; 2 Private; 3 Tenant Purchase; 4 Rental;
   5 Site and Service; 6 Unauthorised
Number of rooms used
Rooms for [illegible]
Unit status: 1 Owner-occupied; 2 Rented; 3 Other
C. MOST COMMON CONSTRUCTION MATERIALS
Floor: 1 Earth; 2 Wood; 3 Terrazzo; 4 Concrete-Wood; 5 Concrete-Tiles;
   6 Concrete-[illegible]; 7 Concrete-Cement; 8 Other
Outer walls: 01 Bricks; 02 Blocks; 03 Stones; 04 Concrete; 05 Wood; 06 Tin;
   07 Corrugated Iron; 08 Mud-Cement; 09 Mud-Wood; 10 Cardboard; 11 Other
Roof: 1 Thatch; 2 Tin; 3 Tiles; 4 Concrete; 5 Corrugated Iron;
   6 Asbestos Sheets; 7 Other
D. FACILITIES
Water supply: provision for piped water; availability of piped water;
   hot water system
Toilet
Bathing
Kitchen
Lighting
Telephone
Domestic servants' rooms
Garbage disposal: where deposited; times collected per month
Cleanliness of streets


HOUSEHOLD QUESTIONNAIRE (continued)
Identification: Town; Structure No.; H/H; Check digit
Control: Interviewed; Edited; Date; Enumerator/Clerk; Supervisor
E. OPINIONS ABOUT NEIGHBOURHOOD
Opinion about your neighbourhood on the following:
   Security; H[illegible] environment; Human environment;
   Recreative facilities; Other (specify)
Code rate: 1 Worst; 2 Fair; 3 Indifferent; 4 Good; 5 Best
F. TRAVEL TIME AND COST
1. One-way distance to work, in km: head; spouse
2. Normal mode of travel: head; spouse
   1 Foot; 2 Bicycle; 3 Private car; 4 Matatu; 5 Bus; 6 Motorcycle; 7 Other
3. Total travel time, in minutes: head; spouse
4. Total normal daily cost, in K.Shillings: head to work; spouse to work;
   children to school; total household
G. INCOME AND EXPENDITURE
Gross monthly income from all sources, in KSh:
   0 Under 500; 1 501-1,000; 2 1,001-2,000; 3 2,001-4,000; 4 4,001-6,000;
   5 6,001-8,000; 6 8,001-10,000; 7 10,001-20,000; 8 Over 20,000; 9 Unknown
Number of income contributors
Total household expenditure last month, in K.Shillings:
   Food; Rent; Household requirements; Transport; Water/Light; Other; Total


CONFIDENTIAL
Page 3
HOUSEHOLD QUESTIONNAIRE (continued)
Identification: Town; Structure Number; H/H; Check digit; Form type
Control: Interviewed; Edited; Date; Enumerator/Clerk; Supervisor
H. FOR RENTERS ONLY
1. Year moved to the residential unit: Year ____
2. Amount of rent paid last month: ____ KSh/Month
3. Does rent include the following charges:
   Water: Yes / No
   Electricity: Yes / No
   Telephone: Yes / No
   Hire for furniture: Yes / No
   Domestic staff/askari/gardener: Yes / No
4. Imputed value for these charges: ____ KSh/Month
5. Net rent paid (after deducting these charges): ____ KSh/Month
6. If rent is subsidised, is it?
   Government subsidy; Subsidised by employer; For services rendered;
   Other; Not applicable
7. WILLINGNESS TO PAY FOR WATER
a) For units with piped water inside
   How much rent are you willing to pay for a similar unit but without
   piped water inside? ____ KSh/Month
b) For units with piped water within 25 metres
   How much rent are you willing to pay for a similar unit but with piped
   water inside the unit? ____ KSh/Month
c) For units with piped water more than 25 metres from the unit
   i. How much rent are you willing to pay for a similar unit but with
      piped water inside the unit? ____ KSh/Month
   ii. How much rent are you willing to pay for a similar unit but with a
      stand-pipe close to the unit and shared by less than 10 families?
      ____ KSh/Month


Page 4
HOUSEHOLD QUESTIONNAIRE (continued)
Identification: Structure; H/H
Control: Interviewed; Edited; Date; Enumerator/Clerk; Supervisor
I. FOR RECENT MOVERS (WITHIN THE PAST 12 MONTHS) ONLY: INFORMATION ON
   PREVIOUS RESIDENTIAL UNIT
1. Reason for moving ........
2. Amount of rent paid: ____ KSh/Month
3. Type of unit: 1 House; 2 Maisonette; 3 Flat; 4 Swahili; 5 Shanty; 6 Other
4. Number of rooms used for [illegible]
5. Floor: 1 Earth; 2 Wood; 3 Terrazzo; 4 Concrete-wood; 5 Concrete-tiles;
   6 Concrete-brick; 7 Concrete-cement; 8 Other
6. Outer walls: 01 Bricks; 02 Blocks; 03 Stones; 04 Concrete; 05 Wood;
   06 Tin; 07 Corrugated iron; 08 Mud-cement; 09 Mud-wood; 10 Cardboard;
   11 Other
7. Roof: 1 Thatch; 2 Tin; 3 Tiles; 4 Concrete; 5 Corrugated iron;
   6 Asbestos sheets; 7 Other
8. Water supply
   Provision for piped water: 1 Inside; 2 Outside (within 100m);
      3 Outside (beyond 100m); 4 None
   Availability of piped water: 1 Private; 2 Communal; 3 Private/Communal;
      4 None
   Hot-water system: 1 Yes; 2 No
9. Toilet facilities: 1 Private flush; 2 Private pit; 3 Communal flush;
   4 Communal pit; 5 Other; 6 None
10. Bathing facilities: 1 Private indoor; 2 Private outdoor;
   3 Communal indoor; 4 Communal outdoor; 5 Other; 6 None
11. Kitchen facilities: 1 Private; 2 Communal; 3 Private/Communal; 4 Other;
   5 None
12. Lighting facilities: 1 Electricity; 2 Paraffin lamp; 3 Other
13. Telephone: 1 Yes; 2 No
14. Domestic servants' quarters: 1 Yes; 2 No
J. FOR HOUSEHOLDS PLANNING TO MOVE ONLY: INFORMATION ON UNIT TO WHICH
   PLANNING TO MOVE
1. Reason for moving ........
2. Amount of rent expecting to pay: ____ KSh/Month
3 to 14. The same items and codes as items 3 to 14 of Section I, answered
   for the unit to which the household plans to move.


CENTRAL BUREAU OF STATISTICS
MINISTRY OF ECONOMIC PLANNING AND DEVELOPMENT
URBAN HOUSING SURVEY 1983
STRUCTURE QUESTIONNAIRE
[Scanned form. Legible content is listed below; card-column numbers and
answer boxes are omitted.]
Structure identification: Town; Structure No.; Check digit; Form; Town name
Control: Interviewed; Edited; Date; Enumerator/Clerk; Supervisor
A. DESCRIPTION OF STRUCTURE
1. Type of structure: 1 House; 2 Maisonette; 3 Block of Flats; 4 Swahili;
   5 Shanty; 6 Other
2. Type of scheme: 1 Private; 2 Mortgage; 3 Tenant Purchase;
   4 Sites and Service; 5 Unauthorised; 6 Other
3. Number of residential units: Complete; Incomplete; Occupied; Unoccupied;
   Owner-occupied; Rented
4. Rental income last month, in K.Shillings: Rented units;
   Vacant/Owner-occupied; Total
5. Age of structure in years: 1 Less than 5; 2 5 to less than 10;
   3 10 to less than 20; 4 Over 20
6. Floor: 1 Earth; 2 Wood; 3 Terrazzo; 4 Concrete-Wood; 5 Concrete-Tiles;
   6 Concrete-brick; 7 Concrete-Cement; 8 Other
7. Outer walls: 01 Bricks; 02 Blocks; 03 Stones; 04 Concrete; 05 Wood;
   06 Tin; 07 Corrugated Iron; 08 Mud-Cement; 09 Mud-Wood; 10 Cardboard;
   11 Other
8. Roof: 1 Thatch; 2 Tin; 3 Tiles; 4 Concrete; 5 Corrugated Iron;
   6 Asbestos Sheets; 7 Other
B. FACILITIES
1. Water supply
   Provision for piped water: 1 Inside; 2 Outside (within 100m);
      3 Outside (beyond 100m); 4 None
   Availability of piped water: 1 Private; 2 Communal; 3 Private/Communal;
      4 None
   Hot water system: 1 Yes; 2 No
2. Toilet facilities: 1 Private Flush; 2 Private Pit; 3 Communal Flush;
   4 Communal Pit; 5 Other; 6 None
3. Bathing facilities: 1 Private Indoor; 2 Private Outdoor;
   3 Communal Indoor; 4 Communal Outdoor; 5 Other; 6 None
4. Kitchen facilities: 1 Private; 2 Communal; 3 Private/Communal; 4 Other;
   5 None
5. Lighting facilities: 1 Electricity; 2 Paraffin Lamp; 3 Other
Garbage disposal
   Where deposited: 1 Private dustbin; 2 Communal dustbin; 3 Communal dump
   Frequency of collection per month
Cleanliness: 1 Clean; 2 Some garbage; 3 Very dirty
C. DISTANCE TO PUBLIC AMENITIES
1. Piped water; 2. Nearest nursery school; 3. Nearest primary school;
4. Nearest secondary school; 5. Church/[illegible]; 6. Hospital;
7. Clinic/Dispensary; 8. Police post/station; 9. Matatu/Bus stop;
10. Tarmac road; 11. Telephone; 12. Street lighting;
13. Market/Shopping centre; 14. Community [illegible]; 15. Town centre
Distance codes, in kilometres: 0 Within homestead; 1 Less than 1;
   2 1 to less than 2; 3 2 to less than 3; 4 3 to less than 4;
   5 4 to less than 5; 6 5 to less than 10; 7 Over 10 km


CONFIDENTIAL
CENTRAL BUREAU OF STATISTICS
MINISTRY OF ECONOMIC PLANNING AND DEVELOPMENT
URBAN HOUSING SURVEY 1983                                                                Page 2
C: W                                                          STRUCTURE QUESTIONNAIRE                                                       INTERVIEWED EDITED
TOWN   c    3    STRUCTURE  NU   HECK          FORM                                                                                 DATE
'--              UMBER  DIGIT       TYPEM
Ln0C3F010ENUMERATOR/
CLERK
SUPERVISOR
TO BE OBTAINED FROM A TRACEABLE OWNER/OWNER'S REPRESENTATIVE                                                                                         RESERVED FOR
COMPERE USE
TICK WHERE APPROPRIATE        OR FILL WITH APPROPRIATE NUMBER     e.g.                                                                               COMPUTER USE
D. SOCIO-ECONOMIC CHARACTERISTICS OF THE OWNER
1. Name of the owner ...........................................  SEX:  Male      Female
Postal Address.......................................
Residential Address.....................................
2. Occupation of the owner .............................
3. Level of formal education attained by owner
No Schooling      Primary      Secondary      University
4. Gross monthly income of the owner's household (from all sources) in KSh.
Under 500      501-1,000      1,001-2,000      2,001-4,000      4,001-6,000
6,001-8,000      8,001-10,000      10,001-20,000      Above 20,000      Unknown
5. Number of income contributors
E. ACQUISITION AND FINANCING OF LAND AND STRUCTURE
1. LAND
i) Land tenure:      Own      Lease      Other
ii) When acquired, was the plot:
Developed      Undeveloped
iii) Size of the plot ........................................ Acres
iv) Year of acquisition of land
v) Value of land when acquired ............................... KSh.
vi) Estimated current value of land .......................... KSh.
vii) Annual land rate (to Local Authorities) ................. KSh.
viii) Annual land rent (to Central Government) ............... KSh.
2. STRUCTURE
i) For the structure, do you have:
Title deed      Lease      Temporary occupancy licence      Other
ii) Was the structure:
1 Purchased      2 Gift      3 Inherited      4 Owner-built      5 Other
iii) Year of acquisition
iv) Amount paid .............................................. KSh.
v) Year structure completed
vi) Total legal fees paid .................................... KSh.
vii) Amount of architectural fees paid ....................... KSh.
viii) Construction costs (excludes legal and architectural fees) ... KSh.
ix) Estimated current value of structure ..................... KSh.
x) How was the acquisition financed?
1 Cash      2 Credit      3 Not applicable
xi) For Cash Only
a) How was it financed?
1 Savings      2 Gift      3 Sale of property      4 Loan      5 Other
b) If loan, source:
1 Commercial Bank Mortgage      2 Housing Finance Company      3 National Housing Corporation
4 Insurance Companies      5 Co-operatives      6 Other Financial Institutions
7 Employer      8 Relative      9 Other
xii) For Credit Only
a) Amount of downpayment ..................................... KSh.
b) Monthly mortgage payments ................................. KSh.
c) Period of mortgage payments ............................... Years
d) Source of finance:
1 Commercial Bank Mortgage      2 Housing Finance Company      3 National Housing Corporation
4 Insurance Companies      5 Co-operatives      6 Other Financial Institutions
7 Employer      8 Relative      9 Other
xiii) Do you own other residential structures?      Yes      No
xiv) Location of other residential structures:
1 In this town      2 In other towns      3 Both in this town and other towns      4 Not applicable

CENTRAL BUREAU OF STATISTICS
MINISTRY OF ECONOMIC PLANNING AND DEVELOPMENT
URBAN HOUSING SURVEY
1983
LISTING FORM
Town Stratum Cluster                                     Town Name.............
Stratum Name .
Serial   Structure   H/H   Total in   Name of Household Head   Remarks
No.      No.         Id    H'hold

APPENDIX B: INTRODUCTION TO LOGARITHMS AND ELASTICITY
There is no escape from the following fact: when first encountered,
logarithms are confusing. They have such desirable properties, however, that
it is well worth the effort to learn to use them. The next few pages serve as
an introduction to the subject, but realistically, only working with
logarithms for some time will make you completely comfortable with them. It's
worth the effort because they can make complicated regression models easy to
estimate.1/
Exponents and Logarithms
An exponent is the power to which a variable is to be raised, and
logarithms are just special exponents. In familiar power expressions such as
$x^2$ or $x^3$, the exponents are constants, but there is no reason why we can't
have variable exponents like $x^y$ (or as sometimes written, x ** y, where ** is
a common symbol for exponentiation). Suppose we believe that the relationship
between rent and income is of the following form:
(A-1) $R = k * I^{b}$
where k is a constant term and b is the exponent of income; both k and b are
to be estimated. R and I are, say, rent and income. Ordinary least squares
can't handle this problem as written, because the coefficient b and income are
not linearly related. We have to find a way to make b linearly related to I,
or more precisely, to some simple and computable function of I.

1/    Much of this annex is adapted from Chiang (1974), Chapter 10, and Mirer
(1983), Chapters 2 and 6.

Definition of logarithm. When we have two numbers such as 4 and 16
which are related exponentially:
(A-2) $4^{2} = 16$
then we define the exponent 2 to be the logarithm of 16 to the base 4, and can
rewrite A-2 as:
(A-3) $\log_{4} 16 = 2$
or in other words, the logarithm is the power (here 2) to which the base (here
4) must be raised to attain a particular number (16). A logarithm is simply
an exponent. "Log" is often written as the short form of logarithm. We can
also go back the other way, and exponentiate, or "find the antilog." We can
choose any base we want to. In practice, there are two common bases: base 10
(often called common logarithms) and a special base, the number 2.71828
(rounded to five decimal places), often denoted by the letter "e" and known as
the base of natural logarithms. In economics, we almost always use natural
logarithms, because this type has several desirable properties which we will
discuss later. The number e is a special constant, very much like the more
familiar number pi (3.14159), which turns up so often in geometry and other
mathematics. The
derivation of e requires calculus so we will not discuss it here.2/

2/    See Chiang (1974) or any calculus text for the derivation of e.

Table A-1 lists some representative numbers and their natural
logarithms. Of course, when analyzing data with computers we don't refer to
these tables, because the computer can automatically calculate the logs for
us.
Notice that every number "n" with a tabulated logarithm is greater than zero
(the entry for 0.0 is blank). One of the restrictions with logarithms is that
they cannot be computed for zero or negative numbers. Fortunately, this is not
a serious problem for our kind of work because most important economic
variables (for example, incomes, prices, rents, distances, household sizes)
are always positive.3/

Table A-1:  Natural Logarithms

  n      ln(n)        n      ln(n)        n      ln(n)
 0.0       --        4.5     1.5041       9.0    2.1972
 0.1    -2.3026      4.6     1.5261       9.1    2.2083
 0.2    -1.6094      4.7     1.5476       9.2    2.2192
 0.3    -1.2040      4.8     1.5686       9.3    2.2300
 0.4    -0.9163      4.9     1.5892       9.4    2.2407
 0.5    -0.6931      5.0     1.6094       9.5    2.2513
 0.6    -0.5108      5.1     1.6292       9.6    2.2618
 0.7    -0.3567      5.2     1.6487       9.7    2.2721
 0.8    -0.2231      5.3     1.6677       9.8    2.2824
 0.9    -0.1054      5.4     1.6864       9.9    2.2925
 1.0     0.0000      5.5     1.7047      10      2.3026
 1.1     0.0953      5.6     1.7228      11      2.3979
 1.2     0.1823      5.7     1.7405      12      2.4849
 1.3     0.2624      5.8     1.7579      13      2.5649
 1.4     0.3365      5.9     1.7750      14      2.6391
 1.5     0.4055      6.0     1.7918      15      2.7081
 1.6     0.4700      6.1     1.8083      16      2.7726
 1.7     0.5306      6.2     1.8245      17      2.8332
 1.8     0.5878      6.3     1.8405      18      2.8904
 1.9     0.6419      6.4     1.8563      19      2.9444
 2.0     0.6931      6.5     1.8718      20      2.9957
 2.1     0.7419      6.6     1.8871      25      3.2189
 2.2     0.7885      6.7     1.9021      30      3.4012
 2.3     0.8329      6.8     1.9169      35      3.5553
 2.4     0.8755      6.9     1.9315      40      3.6889
 2.5     0.9163      7.0     1.9459      45      3.8067
 2.6     0.9555      7.1     1.9601      50      3.9120
 2.7     0.9933      7.2     1.9741      55      4.0073
 2.8     1.0296      7.3     1.9879      60      4.0943
 2.9     1.0647      7.4     2.0015      65      4.1744
 3.0     1.0986      7.5     2.0149      70      4.2485
 3.1     1.1314      7.6     2.0281      75      4.3175
 3.2     1.1632      7.7     2.0412      80      4.3820
 3.3     1.1939      7.8     2.0541      85      4.4427
 3.4     1.2238      7.9     2.0669      90      4.4998
 3.5     1.2528      8.0     2.0794      95      4.5539
 3.6     1.2809      8.1     2.0919     100      4.6052
 3.7     1.3083      8.2     2.1041     200      5.2983
 3.8     1.3350      8.3     2.1163     300      5.7038
 3.9     1.3610      8.4     2.1282     400      5.9915
 4.0     1.3863      8.5     2.1401     500      6.2146
 4.1     1.4110      8.6     2.1518     600      6.3969
 4.2     1.4351      8.7     2.1633     700      6.5511
 4.3     1.4586      8.8     2.1748     800      6.6846
 4.4     1.4816      8.9     2.1861     900      6.8024

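Since the computer calculates logs for us, entries like those in Table A-1 are easy to regenerate. The following is a minimal sketch in Python (a language chosen here for illustration; it is not used elsewhere in this paper). The function name natural_log_or_none is hypothetical, and the handling of non-positive values simply mirrors the restriction described above.

```python
import math

def natural_log_or_none(n):
    """Return ln(n) rounded to four decimal places, or None if n <= 0."""
    if n <= 0:
        return None  # the logarithm is undefined for zero and negative numbers
    return round(math.log(n), 4)

# A few of the arguments tabulated in Table A-1
for n in [0.0, 0.5, 1.0, 4.0, 16, 100]:
    value = natural_log_or_none(n)
    print(f"{n:>6}  {'--' if value is None else format(value, '.4f')}")

# Going back the other way ("finding the antilog") uses exp()
print(round(math.exp(math.log(16)), 4))
```

Rounding to four places matches the precision of the table; in actual regression work the unrounded values would normally be used.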
It turns out that equations like A-1 can be easily re-expressed in
linear form if we use natural logarithms. If we perform the same mathematical
operation on both sides of A-1 the equality still holds. We can therefore
write:
(A-4) $\log_{e} R = \log_{e}(k * I^{b})$
$\log_{e}$, the natural log, is often written "ln" for brevity, so that
(A-5) $\ln R = \ln(k * I^{b})$
Now what? There are two important rules of logs which help simplify these
equations. These rules are so helpful that they are one of the major reasons
for using logs. They are:
Rule 1:    the log of the product of two numbers equals the sum of
the logs of the two numbers.
For example, $\ln(x * y) = \ln(x) + \ln(y)$
Rule 2:    the log of an exponential function equals the exponent
times the log of the variable, or $\ln(x^{a}) = a * \ln(x)$
We will not prove these rules (see Chiang or a calculus text) but we can
illustrate that they work by using examples from Table A-1.

3/    If there are a few legitimate negative numbers for some variable for
which you want to use logs, a common fix-up is to (1) add the absolute value
of the most negative number, plus one, to each observation, (2) then take
the log.

For example, Rule 1 says that:
$\ln(15) = \ln(3 * 5) = \ln(3) + \ln(5)$, or, from Table A-1:
2.7081 = 1.0986 + 1.6094 (to within rounding in the fourth decimal place),
which verifies Rule 1. To test Rule 2, try:
$\ln(16) = \ln(4^{2}) = 2 * \ln(4)$,
so check that:
2.7726 = 2 * 1.3863
and Rule 2 is verified. The reader can make up his own examples with the
numbers from Table A-1 to convince himself of the validity of these rules.
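The same two checks can be run without the table. This is a small illustrative sketch in Python using only the standard math module; the variable names are arbitrary, and the numbers are the ones used in the text (3, 5, 4 and the exponent 2).

```python
import math

x, y, a = 3.0, 5.0, 2.0

# Rule 1: the log of a product is the sum of the logs
lhs1, rhs1 = math.log(x * y), math.log(x) + math.log(y)

# Rule 2: the log of a power is the exponent times the log
lhs2, rhs2 = math.log(4.0 ** a), a * math.log(4.0)

print(f"Rule 1: {lhs1:.4f} vs {rhs1:.4f}")
print(f"Rule 2: {lhs2:.4f} vs {rhs2:.4f}")
```

Because the computer carries many more digits than the table, the two sides agree to well beyond four decimal places, so the small rounding discrepancies that arise when adding table entries do not appear.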
There is no reason why we can't use both rules on the same
problem. Let's return to equation A-5:
(A-5) $\ln R = \ln(k * I^{b})$
Now apply Rule 1 and get:
(A-6) $\ln R = \ln(k) + \ln(I^{b})$
and applying Rule 2 we get
(A-7) $\ln R = \ln(k) + b * \ln(I)$
and now we have transformed equation A-1 into a linear equation that can be
estimated using regression analysis! All we have to do is compute two
variables, the natural log of R (rent) and the natural log of I (income), and
use these new variables in the regression:
(A-8) $\ln R = a + b * \ln(I)$
The only difference between A-8 and A-7 (and hence A-1) is that the estimated
intercept, a, is the log of the original constant term, k. It can easily be
transformed back ($k = e^{a}$), but usually we are more interested in the
estimate of b, since that number can be interpreted as the percentage change
in rent given a one percent change in income.
If we have several right-hand side variables:
(A-9) $R = k * I^{b} * P^{c}$
where P is price, and c is a new parameter to be estimated, we can use the
regression:
(A-10) $\ln R = a + b * \ln(I) + c * \ln(P)$
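Also, we can add other types of variables such as dummy variables, linear
variables, and squared variables, e.g.:
(A-11) $\ln R = a + b * \ln(I) + c * \mathrm{HH} + d * \mathrm{HHSQ}$
where HH is household size, and HHSQ is household size, squared. HH and HHSQ
enter in levels, not logs: $\ln(\mathrm{HHSQ}) = 2 * \ln(\mathrm{HH})$, so
their logs would be perfectly collinear.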
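To make the estimation step concrete, here is a minimal sketch of the simple log-log regression A-8 in Python, using the textbook least-squares formulas for one right-hand side variable. The function name fit_log_log and the "manufactured" data (generated exactly from R = k * I**b with k = 2.0 and b = 0.6) are assumptions made for illustration; a real analysis would use a statistical package and survey data.

```python
import math

def fit_log_log(incomes, rents):
    """Least-squares fit of ln(R) = a + b * ln(I); returns (a, b)."""
    x = [math.log(i) for i in incomes]
    y = [math.log(r) for r in rents]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # slope: the income elasticity of rent
    a = y_bar - b * x_bar      # intercept: the log of the constant k
    return a, b

# Manufactured data lying exactly on R = k * I**b with k = 2.0 and b = 0.6
incomes = [500.0, 1000.0, 1500.0, 2000.0, 4000.0]
rents = [2.0 * i ** 0.6 for i in incomes]

a, b = fit_log_log(incomes, rents)
k = math.exp(a)   # transform the intercept back to the original constant
print(f"b = {b:.4f}, k = {k:.4f}")
```

Because the data contain no error term, the fit recovers b and k almost exactly; with real data the estimates would differ from the true values by sampling error.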
So far, so good. Now we know how to estimate linear logarithmic
models derived from nonlinear models like A-1 and A-9, which cannot be
estimated directly by ordinary regression. So what? Why would we want to do
that? It turns out that these models have several desirable properties.
First, models like A-1 are constant elasticity models; the proportional
effect of a change in I upon R is constant, and equal to b. This means that
the regression coefficient b is a very convenient summary of the
responsiveness of R to changes in I. If b happens to equal 1, a 1 percent
change in income implies a 1 percent change in rent; in other words, rents
rise proportionally with income, and the typical rent-to-income ratio is
constant as income changes. If b is zero, then there is no measured
relationship between rent and income. If b lies between 0 and 1, rents go up
with income, but not as fast as income goes up. Suppose we estimate b to be
.6. Then a 1 percent increase in income implies a .6 percent increase in
rent. This means that at higher incomes people pay higher levels of rent, but
since the increases are less than proportional, the rent-to-income ratio goes
down. This is confusing at first, and requires some thought.
Suppose we have three people:

                                 Proportional   Proportional
                                    Change         Change        Rent/
Person      Rent      Income      in Income       in Rent       Income
  1          200       1,000         --              --           .20
  2          260       1,500         50%             30%          .17
  3          320       2,000         33%             23%          .16

Note the following: The second person has 50 percent more income than the
first, and the third 33 percent more than the second. But rents rise less
than proportionally to income (30 percent and 23 percent). In other words,
the level of rents goes up while the rent-to-income ratio declines. Demand is
income-inelastic, in economics jargon. If we ran a regression like A-8 on a
sample which contained mostly people with this kind of consumption pattern,
then we could expect our estimate of b to be greater than zero but less than
one. Suppose instead that our sample contains mostly people like this:

                                 Proportional   Proportional
                                    Change         Change        Rent/
Person      Rent      Income      in Income       in Rent       Income
  1          200       1,000         --              --           .20
  2          300       1,500         50%             50%          .20
  3          400       2,000         33%             33%          .20

Here rents rise in the same proportion as incomes, the rent-to-income ratio is
constant, and the coefficient from a regression like A-8 should be close to
one. If we found rents going up faster than income, and rent-to-income ratios
increased with income, then b should be greater than one (demand is income
elastic).
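The arithmetic behind the two three-person examples can be reproduced with a short Python sketch. The function names describe and log_slope are hypothetical. Note that log_slope is only the slope of ln(rent) on ln(income) between the first and last person, a rough stand-in for the regression coefficient b and not a regression estimate.

```python
import math

def describe(rents, incomes):
    """Return rent-to-income ratios and successive proportional changes."""
    ratios = [r / i for r, i in zip(rents, incomes)]
    d_income = [incomes[j] / incomes[j - 1] - 1 for j in range(1, len(incomes))]
    d_rent = [rents[j] / rents[j - 1] - 1 for j in range(1, len(rents))]
    return ratios, d_income, d_rent

def log_slope(rents, incomes):
    """Slope of ln(rent) on ln(income) between the first and last person."""
    return math.log(rents[-1] / rents[0]) / math.log(incomes[-1] / incomes[0])

incomes = [1000, 1500, 2000]
inelastic_rents = [200, 260, 320]       # first example: ratio falls with income
proportional_rents = [200, 300, 400]    # second example: ratio stays constant

for label, rents in [("inelastic", inelastic_rents),
                     ("proportional", proportional_rents)]:
    ratios, d_income, d_rent = describe(rents, incomes)
    print(label)
    print("  rent/income:", [round(x, 2) for x in ratios])
    print("  income change (%):", [round(100 * x) for x in d_income])
    print("  rent change (%):", [round(100 * x) for x in d_rent])
    print("  log-log slope:", round(log_slope(rents, incomes), 2))
```

In the first example the slope lies between zero and one, and in the second it equals one, which is the pattern the text leads us to expect from the regression coefficient.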


A second advantage is that the log form is flexible, and thus does
not place undue restrictions on the shape of the relationship to be
estimated. Figure A-1, adapted from Mirer (1983, p. 103), demonstrates this
point (using Y and X instead of R and I as variable names).
A third advantage is that log models in economics often have more nearly
constant error variance than linear models. This means the following: the
errors, or differences between predicted rent and actual rent, are often found
to vary systematically with income when a simple linear model is used, with
larger errors typically found for higher income people. This problem, known
as heteroskedasticity, means that our hypothesis tests using t and F
statistics are unreliable. Log models usually do not suffer from this problem
as much as simple linear models.4/
4/    See, for example, Malpezzi, Ozanne and Thibodeau (1981), pp. 24-26.

Figure A-1
Flexibility of Logarithmic Functional Form
[Figure: five pairs of panels. The left panel of each pair plots
$Y = e^{\beta_0} X^{\beta_1}$ (Y against X); the right panel plots
$\ln Y = \beta_0 + \beta_1 \ln X$ (ln Y against ln X). The cases are
(a) $\beta_1 < 0$, (b) $\beta_1 = 0$, (c) $0 < \beta_1 < 1$,
(d) label not legible in this copy, presumably $\beta_1 = 1$, and
(e) $\beta_1 > 1$.]
The geometry of the log-linear relation depends on the sign of
$\beta_1$. When Y decreases as X increases [case (a), with $\beta_1 < 0$], it
is concave upward. When Y increases with X (with $\beta_1 > 0$), the concavity
may be upward or downward, depending on the magnitude of $\beta_1$. Although Y
is a nonlinear function of X, ln Y is a linear function of ln X; the slope of
that line is the same $\beta_1$ as in the original formulation. The parameter
$\beta_1$ is the elasticity of Y with respect to X.

APPENDIX C: FORTRAN SUBROUTINES FOR ORDER STATISTICS
The code in this appendix is from Velleman and Hoaglin (1981), which
is also recommended as a guide to exploratory data analysis. The first part
of the appendix lists the basic code. The second part provides some
explanation of initialization and programming conventions.
We are indebted to Velleman and Hoaglin for permission to use this
code. However, as they point out, this code was designed for other purposes
(see their text), and should be used as a guide to writing code for order
statistics rather than copied directly.
In particular, the Kenyan housing survey is not self-weighting, so
medians should be computed using sample weights. This is a straightforward
extension of this code, where the sum of the weights is substituted for the
number of observations.
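The weighted extension can be sketched briefly. The example below is written in Python, not Fortran, purely for compactness, and is an illustration of the idea, not code from Velleman and Hoaglin. The function name weighted_median is hypothetical, and the tie rule (averaging two neighboring values when exactly half the weight lies at or below a value) is one reasonable convention chosen so that unit weights reproduce the ordinary median.

```python
def weighted_median(values, weights):
    """Median of `values` where observation i carries sample weight weights[i]."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    # Sort on the value, carrying each weight along with it (the same idea
    # as the pair sort PSORT listed in this appendix).
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0   # the sum of weights replaces the number of observations
    cumulative = 0.0
    for idx, (value, weight) in enumerate(pairs):
        cumulative += weight
        if cumulative > half:
            return value
        if cumulative == half:
            # Exactly half the weight lies at or below this value: average
            # with the next value, as for an even number of observations.
            return (value + pairs[idx + 1][0]) / 2.0
    raise AssertionError("unreachable when all weights are positive")

# Three rents with unequal sample weights
print(weighted_median([150, 200, 400], [10, 30, 5]))
```

With non-integer weights the exact-equality test will rarely be triggered, so in practice the function usually returns the first value at which the cumulative weight passes half the total.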


      BLOCK DATA
C
C  CHARS  CONTAINS THE SYMBOLS OF THE STANDARD FORTRAN CHARACTER SET,
C  AND  CHA - CHPT  ARE THE CORRESPONDING INDICES INTO  CHARS.
C  PUTCHR  IS THE PRIMARY USER OF THIS TRANSLATION VECTOR.
C
      COMMON /CHARIO/ CHARS, CMAX,
     1 CHA, CHB, CHC, CHD, CHE, CHF, CHG, CHH, CHI, CHJ, CHK,
     2 CHL, CHM, CHN, CHO, CHP, CHQ, CHR, CHS, CHT, CHU, CHV,
     3 CHW, CHX, CHY, CHZ, CH0, CH1, CH2, CH3, CH4, CH5, CH6,
     4 CH7, CH8, CH9, CHBL, CHEQ, CHPLUS, CHMIN, CHSTAR, CHSLSH,
     5 CHLPAR, CHRPAR, CHCOMA, CHPT
C
C
      INTEGER CHARS(46), CMAX
      INTEGER CHA, CHB, CHC, CHD, CHE, CHF, CHG, CHH, CHI
      INTEGER CHJ, CHK, CHL, CHM, CHN, CHO, CHP, CHQ, CHR
      INTEGER CHS, CHT, CHU, CHV, CHW, CHX, CHY, CHZ
      INTEGER CH0, CH1, CH2, CH3, CH4, CH5, CH6, CH7, CH8, CH9
      INTEGER CHBL, CHEQ, CHPLUS, CHMIN, CHSTAR, CHSLSH
      INTEGER CHLPAR, CHRPAR, CHCOMA, CHPT
      DATA CHARS( 1),CHARS( 2),CHARS( 3),CHARS( 4) /1HA,1HB,1HC,1HD/
      DATA CHARS( 5),CHARS( 6),CHARS( 7),CHARS( 8) /1HE,1HF,1HG,1HH/
      DATA CHARS( 9),CHARS(10),CHARS(11),CHARS(12) /1HI,1HJ,1HK,1HL/
      DATA CHARS(13),CHARS(14),CHARS(15),CHARS(16) /1HM,1HN,1HO,1HP/
      DATA CHARS(17),CHARS(18),CHARS(19),CHARS(20) /1HQ,1HR,1HS,1HT/
      DATA CHARS(21),CHARS(22),CHARS(23),CHARS(24) /1HU,1HV,1HW,1HX/
      DATA CHARS(25),CHARS(26),CHARS(27),CHARS(28) /1HY,1HZ,1H0,1H1/
      DATA CHARS(29),CHARS(30),CHARS(31),CHARS(32) /1H2,1H3,1H4,1H5/
      DATA CHARS(33),CHARS(34),CHARS(35),CHARS(36) /1H6,1H7,1H8,1H9/
      DATA CHARS(37),CHARS(38),CHARS(39),CHARS(40) /1H ,1H=,1H+,1H-/
      DATA CHARS(41),CHARS(42),CHARS(43),CHARS(44) /1H*,1H/,1H(,1H)/
      DATA CHARS(45),CHARS(46)                     /1H,,1H./
      DATA CMAX /46/
      DATA CHA,CHB,CHC,CHD,CHE,CHF         / 1, 2, 3, 4, 5, 6/
      DATA CHG,CHH,CHI,CHJ,CHK,CHL         / 7, 8, 9,10,11,12/
      DATA CHM,CHN,CHO,CHP,CHQ,CHR         /13,14,15,16,17,18/
      DATA CHS,CHT,CHU,CHV,CHW,CHX         /19,20,21,22,23,24/
      DATA CHY,CHZ,CH0,CH1,CH2,CH3         /25,26,27,28,29,30/
      DATA CH4,CH5,CH6,CH7,CH8,CH9         /31,32,33,34,35,36/
      DATA CHBL,CHEQ,CHPLUS,CHMIN          /37,38,39,40/
      DATA CHSTAR,CHSLSH,CHLPAR,CHRPAR     /41,42,43,44/
      DATA CHCOMA,CHPT                     /45,46/
C
C
      END

      SUBROUTINE CINIT(IOUNIT, IPMIN, IPMAX, IEPSI, IMAXIN, ERR)
C
      INTEGER IOUNIT, IPMIN, IPMAX, IMAXIN, ERR
      REAL IEPSI
C
C  INITIALIZATION, TO BE CALLED AT START OF ANY  MAIN PROGRAM
C  WHICH CALLS ONE OF THE EDA SUBROUTINES (EITHER DIRECTLY OR
C  INDIRECTLY).
C
C  IOUNIT  IS THE NUMBER OF THE UNIT TO WHICH OUTPUT IS DIRECTED.
C  IPMIN   IS THE LEFT MARGIN.
C  IPMAX   IS THE RIGHT MARGIN.
C  IEPSI   IS THE MACHINE-RELATED EPSILON.
C  IMAXIN  IS THE MAXIMUM PERMITTED INTEGER VALUE
C
C  ERR  IS THE (USUAL) ERROR FLAG, INDICATING WHETHER
C  THE ROUTINE EXECUTED SUCCESSFULLY.
C
      COMMON /CHRBUF/ P, PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
      COMMON /NUMBRS/ EPSI, MAXINT
C
      INTEGER P(130), PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
      REAL EPSI, MAXINT
C
C  LOCAL VARIABLES
C
      INTEGER BLANK, I
      DATA BLANK /1H /
C
C
      ERR = 6
      IF(IPMIN .LT. 1) GO TO 999
      IF(IPMAX .GT. 130) GO TO 999
      IF(IPMAX .LE. IPMIN) GO TO 999
      ERR = 7
      IF((1.0 + IEPSI) .LE. 1.0) GO TO 999
      ERR = 0
      OUNIT = IOUNIT
      PMIN = IPMIN
      OUTPTR = IPMIN
      MAXPTR = IPMIN
      PMAX = IPMAX
      EPSI = IEPSI
      MAXINT = FLOAT(IMAXIN)
C
      DO 50 I = 1, 130
        P(I) = BLANK
   50 CONTINUE
C
  999 RETURN
      END

      SUBROUTINE PUTCHR(POSN, CHAR, ERR)
C
      INTEGER POSN, CHAR, ERR
C
C  PLACE THE CHARACTER  CHAR  AT POSITION  POSN  IN
C  THE OUTPUT LINE  P .  IF  POSN = 0 , PLACE  CHAR  IN THE
C  NEXT AVAILABLE POSITION IN  P .  MAXPTR  IS TO BE INITIAL-
C  IZED TO  PMIN , AND  PRINT  MUST RESET IT.
C
      COMMON /CHARIO/ CHARS, CMAX,
     1 CHA, CHB, CHC, CHD, CHE, CHF, CHG, CHH, CHI, CHJ, CHK,
     2 CHL, CHM, CHN, CHO, CHP, CHQ, CHR, CHS, CHT, CHU, CHV,
     3 CHW, CHX, CHY, CHZ, CH0, CH1, CH2, CH3, CH4, CH5, CH6,
     4 CH7, CH8, CH9, CHBL, CHEQ, CHPLUS, CHMIN, CHSTAR, CHSLSH,
     5 CHLPAR, CHRPAR, CHCOMA, CHPT
C
      COMMON /CHRBUF/ P, PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
C
      INTEGER CHARS(46), CMAX
      INTEGER CHA, CHB, CHC, CHD, CHE, CHF, CHG, CHH, CHI
      INTEGER CHJ, CHK, CHL, CHM, CHN, CHO, CHP, CHQ, CHR
      INTEGER CHS, CHT, CHU, CHV, CHW, CHX, CHY, CHZ
      INTEGER CH0, CH1, CH2, CH3, CH4, CH5, CH6, CH7, CH8, CH9
      INTEGER CHBL, CHEQ, CHPLUS, CHMIN, CHSTAR, CHSLSH
      INTEGER CHLPAR, CHRPAR, CHCOMA, CHPT
      INTEGER P(130), PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
C
      IF(CHAR .GT. 0 .AND. CHAR .LE. CMAX) GO TO 10
      ERR = 4
      RETURN
   10 IF(POSN .NE. 0) OUTPTR = MAX0(PMIN, POSN)
      OUTPTR = MIN0(OUTPTR, PMAX)
      P(OUTPTR) = CHARS(CHAR)
      MAXPTR = MAX0(MAXPTR, OUTPTR)
      OUTPTR = OUTPTR + 1
      RETURN
      END
      INTEGER FUNCTION WDTHOF(I)
      INTEGER I
C  FIND THE NUMBER OF CHARACTERS NEEDED TO PRINT  I
      INTEGER IA, IQ, ND
C
      IA = IABS(I)
      ND = 1
      IF(I .LT. 0) ND = 2
   10 IQ = IA/10
      IF(IQ .EQ. 0) GO TO 20
      IA = IQ
      ND = ND + 1
      GO TO 10
   20 WDTHOF = ND
      RETURN
      END

      SUBROUTINE PUTNUM(POSN, N, W, ERR)
C
      INTEGER POSN, N, W, ERR
C  PLACE THE CHARACTER REPRESENTATION OF THE INTEGER  N
C  RIGHT-JUSTIFIED IN A FIELD  W  SPACES WIDE STARTING
C  AT POSITION  POSN  IN THE OUTPUT LINE  P .
C
C  THE VARIABLES  IP, INUM,  AND  IW  ARE INTERNAL VERSIONS
C  OF  POSN, N,  AND  W .  WE PROCEED BY EXTRACTING THE
C  DIGITS OF  N , STARTING WITH THE LOW-ORDER DIGIT,
C  AND STACKING THEM IN  DSTK .  ( ND  COUNTS THE DIGITS.)
C  ONCE WE HAVE COLLECTED ALL THE DIGITS (AND KNOW THAT
C  W  SPACES ARE SUFFICIENT), WE SKIP OVER ANY UNNEEDED
C  SPACES, PUT OUT A MINUS SIGN IF NEEDED, AND THEN PUT OUT
C  THE DIGITS, STARTING WITH THE HIGH-ORDER ONE.
C
C  THIS ROUTINE CALLS  PUTCHR  AND DEPENDS ON HAVING DIGITS
C  0 THROUGH 9 IN CONSECUTIVE ELEMENTS OF  CHARS  IN THE
C  COMMON BLOCK  CHARIO , STARTING AT  CH0 = 27.  IT ALSO
C  ASSUMES THAT THE MINUS SIGN IS AT  CHMIN = 40  IN  CHARS.
C
      INTEGER CHD, CH0, CHMIN, DSTK(20), INUM, IP, IQ, IW, ND
C
      COMMON /CHRBUF/ P, PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
      INTEGER P(130), PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
C
      DATA CH0, CHMIN /27, 40/
C
C
      IW = W
      IF(N .LT. 0) IW = IW - 1
      INUM = IABS(N)
C
C  EXTRACT AND STACK THE DIGITS OF  INUM, CHECKING
C  TO SEE THAT  N  FITS IN  W  SPACES.
C
      ND = 1
   10 IQ = INUM/10
      DSTK(ND) = INUM - IQ * 10
      IF(ND .LE. 20 .AND. ND .LE. IW) GO TO 20
      ERR = 2
      GO TO 999
   20 IF(IQ .EQ. 0) GO TO 30
      INUM = IQ
      ND = ND + 1
      GO TO 10
C
C  UNSTACK THE DIGITS FROM  DSTK  AND PUT THEM OUT.
C  NOTE THAT WHEN  N  IS NEGATIVE, A MINUS SIGN MUST BE
C  INSERTED IN THE SPACE BEFORE THE FIRST DIGIT.  DECREASING
C  IW  BY 1 IN THE INITIALIZATION HAS PROVIDED A SPACE
C  FOR THE MINUS SIGN.
C
   30 IP = POSN
      IF(IP .EQ. 0) IP = OUTPTR
      IP = IP + IW - ND
      IF(N .GE. 0) GO TO 40
      CALL PUTCHR(IP, CHMIN, ERR)
      IP = IP + 1
   40 CHD = CH0 + DSTK(ND)
      CALL PUTCHR(IP, CHD, ERR)
      IF(ND .EQ. 1) GO TO 50
      ND = ND - 1
      IP = IP + 1
      GO TO 40
   50 CONTINUE
C
  999 RETURN
      END
      SUBROUTINE PRINT
C
C  PRINT THE OUTPUT LINE  P  ON UNIT  OUNIT .  (MAXPTR
C  INDICATES THE RIGHTMOST POSITION WHICH HAS BEEN USED
C  IN THIS LINE).  THEN RESET  P  TO SPACES, AND  MAXPTR  AND
C  OUTPTR  TO  PMIN.
C
      COMMON /CHRBUF/ P, PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
C
      INTEGER P(130), PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
C
C  LOCAL VARIABLES
C
      INTEGER BLANK, I
C
      DATA BLANK /1H /
C
      WRITE(OUNIT, 10) (P(I), I=1, MAXPTR)
   10 FORMAT(1X, 130A1)
C
      DO 20 I = 1, MAXPTR
        P(I) = BLANK
   20 CONTINUE
C
      OUTPTR = PMIN
      MAXPTR = PMIN
C
      RETURN
      END

      SUBROUTINE SORT(Y, N, ERR)
C
      INTEGER N, ERR
      REAL Y(N)
C
C  SHELL SORT  N  VALUES IN  Y()  FROM SMALLEST TO LARGEST.
C
C  NOTE THAT LOCAL SYSTEM SORT UTILITIES ARE LIKELY TO BE
C  MORE EFFICIENT, AND SHOULD BE SUBSTITUTED WHENEVER POSSIBLE.
C
C  LOCAL VARIABLES
C
      INTEGER I, J, J1, GAP, NMG
      REAL TEMP
C
      IF(N .GE. 1) GO TO 10
      ERR = 1
      GO TO 999
   10 IF(N .EQ. 1) GO TO 999
C
C  ONE ELEMENT IS ALWAYS SORTED
C
      GAP = N
   20 GAP = GAP/2
      NMG = N - GAP
      DO 40 J1 = 1, NMG
        I = J1 + GAP
C
C  DO  J = J1, 1, -GAP
C
        J = J1
   30   IF (Y(J) .LE. Y(I)) GO TO 40
C
C  SWAP OUT-OF-ORDER PAIR
C
        TEMP = Y(I)
        Y(I) = Y(J)
        Y(J) = TEMP
C
C  KEEP OLD POINTER FOR NEXT TIME THROUGH
C
        I = J
        J = J - GAP
        IF (J .GE. 1) GO TO 30
   40 CONTINUE
      IF (GAP .GT. 1) GO TO 20
  999 RETURN
      END

      SUBROUTINE PSORT(ON, WITH, N, ERR)
C
      INTEGER N, ERR
      REAL ON(N), WITH(N)
C
C  PAIR SHELL SORT  N  VALUES IN  ON()  FROM SMALLEST TO LARGEST
C  CARRYING ALONG THE VALUES IN  WITH().
C
C  NOTE THAT LOCAL SYSTEM SORT UTILITIES ARE LIKELY TO BE
C  MORE EFFICIENT, AND SHOULD BE SUBSTITUTED WHENEVER POSSIBLE.
C
C  LOCAL VARIABLES
C
      INTEGER I, J, J1, GAP, NMG
      REAL TON, TWITH
      IF(N .GE. 1) GO TO 10
      ERR = 1
      GO TO 999
   10 IF(N .EQ. 1) GO TO 999
C
C  ONE ELEMENT IS ALWAYS SORTED
C
      GAP = N
   20 GAP = GAP/2
      NMG = N - GAP
      DO 40 J1 = 1, NMG
        I = J1 + GAP
C
C  DO  J = J1, 1, -GAP
C
        J = J1
   30   IF (ON(J) .LE. ON(I)) GO TO 40
C
C  SWAP OUT-OF-ORDER PAIR
C
        TON = ON(I)
        ON(I) = ON(J)
        ON(J) = TON
        TWITH = WITH(I)
        WITH(I) = WITH(J)
        WITH(J) = TWITH
C
C  KEEP OLD POINTER FOR NEXT TIME THROUGH
C
        I = J
        J = J - GAP
        IF (J .GE. 1) GO TO 30
   40 CONTINUE
      IF (GAP .GT. 1) GO TO 20
  999 RETURN
      END

      SUBROUTINE YINFO(Y, N, MED, HL, HH, ADJL, ADJH, IADJL, IADJH,
     1 STEP, ERR)
C
C  GET GENERAL INFORMATION ABOUT  Y().  USEFUL FOR PLOT SCALING.
C  SORTS  Y()  AND RETURNS IT SORTED.  ALSO RETURNS
C      MED  = MEDIAN
C      HL   = LOW HINGE            HH   = HI HINGE
C      ADJL = LOW ADJACENT VALUE   ADJH = HI ADJ VALUE
C      IADJL= ITS INDEX (LOCATN)   IADJH= ITS INDEX
C
      INTEGER N, IADJL, IADJH, ERR
      REAL Y(N), MED, HL, HH, ADJL, ADJH, STEP
C
C  LOCAL VARIABLES
C
      REAL HFENCE, LFENCE
      INTEGER J, K, TEMP1, TEMP2
C
      CALL SORT(Y, N, ERR)
      IF (ERR .NE. 0) GO TO 999
      K = N
      J = (K/2) + 1
C
      TEMP1 = N + 1 - J
      MED = (Y(J) + Y(TEMP1))/2.0
C
      K = (K+1)/2
      J = (K/2) + 1
      TEMP1 = K + 1 - J
      HL = (Y(J) + Y(TEMP1))/2.0
      TEMP1 = N - K + J
      TEMP2 = N + 1 - J
      HH = (Y(TEMP1) + Y(TEMP2))/2.0
C
      STEP = (HH - HL)*1.5
      HFENCE = HH + STEP
      LFENCE = HL - STEP
C
C  FIND ADJACENT VALUES
C
      IADJL = 0
   20 IADJL = IADJL + 1
      IF (Y(IADJL) .LE. LFENCE) GO TO 20
      ADJL = Y(IADJL)
C
      IADJH = N + 1
   30 IADJH = IADJH - 1
      IF (Y(IADJH) .GE. HFENCE) GO TO 30
      ADJH = Y(IADJH)
  999 RETURN
      END
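
For readers who want to check YINFO's index arithmetic outside Fortran, the same calculations can be sketched in Python. This is an illustrative restatement, not part of the Velleman and Hoaglin code: the function name letter_values and the helper mid are hypothetical, and the sketch omits the error flag and the returned index positions.

```python
def letter_values(y):
    """Median, hinges, step, and adjacent values, following YINFO's indexing."""
    ys = sorted(y)
    n = len(ys)
    if n < 1:
        raise ValueError("need at least one value")

    def mid(lo, hi):
        # average of the values at two 1-based positions in the sorted data
        return (ys[lo - 1] + ys[hi - 1]) / 2.0

    j = n // 2 + 1
    med = mid(j, n + 1 - j)
    k = (n + 1) // 2
    j = k // 2 + 1
    low_hinge = mid(j, k + 1 - j)
    high_hinge = mid(n - k + j, n + 1 - j)
    step = 1.5 * (high_hinge - low_hinge)
    low_fence, high_fence = low_hinge - step, high_hinge + step
    # adjacent values: the most extreme data values strictly inside the fences
    adj_low = next(v for v in ys if v > low_fence)
    adj_high = next(v for v in reversed(ys) if v < high_fence)
    return med, low_hinge, high_hinge, step, adj_low, adj_high

print(letter_values([1, 2, 3, 4, 5, 6, 7, 8, 50]))
```

In this example the value 50 lies beyond the high fence, so the high adjacent value is 8, which is how the routine separates outlying values from the main body of the data.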


SUBROUTINE NPOSW(HI? LO, NIGNOS, NN, MAXP, MZERO, PTOTL, FRACT,
1 UNIT, NPW, ERR)
C
C FIND A NICE (I.E., SIMPLE) DATA-UNITS VALUE TO ASSIGN TO ONE PLOT
C  POSITION IN ONE DIMENSION OF A PLOT. A PLOT POSITION IS TYPICALLY
C  ONE CHARACTER POSITION HORIZONTALLY, OR ONE LINE VERTICALLY.
C
C  ON ENTRY:
C HI, LO ARE THE HIGH AND LOW EDGES OF THE DATA RANGE TO BE PLOTTED.
C  NICNOS IS A VECTOR OF LENGTH   NN CONTAINING NICE MANTISSAS FOR
C           THE PLOT UNIT.
C  MAXP    IS THE MAXIMUM NUMBER OF PLOT POSITIONS ALLOWED IN THIS
C           DIMENSION OF THE PLOT.
C  MZERO   IS .TRUE. IF A POSITION LABELED -0 US ALLOWED IN THIS
C           DIMENSION, .FALSE. OTHERWISE.
C
C ON EXIT:
C PTOTL    HOLDS THE TOTAL NUMBER OF PLOT POSITIONS TC BE USED IN
C           THIS DIMENSION.  (MUST BE  .LE. MAXP.)
C  FRACT   IS THE MANTISSA OF THE NICE POSITION WIDTH. IT IS
C           SELECTED FROM THE NUMBERS IN NICNOS.
C  UNIT    IS AN INTEGER POWER OF 10 SUCH THAT NPW = FFACT * UNIT.
C NPW      IS THE NICE POSITION WIDTH. ONE PLOT POSITION WIDTH
C           WILL REPRESENT  A DATA-SPACE DISTANCE OF NPW.
C
C
INTEGER NN, MAXP, PTOTL, ERR
REAL HI, LO, NICNOS(NN), FRACT, UNIT, NPW
LOGICAL MZERO
C
C FUNCTIONS
INTEGER FLOOR, INTFN
C
C LOCAL VARIABLES
C
INTEGER I
REAL APRXW
C
IF (MAXP .GT. 0) GO TO 5
ERR = 8
GO TO 999
5 APRXW = (HI - LO)/FLOAT(MAXP)
IF(APRXW .GT. 0.0) GO TO 10
C
C HI .LE. LO IS AN ERROR
C
ERR = 9
GO TO 999
10 UNIT = 10.0**FLOOR(ALOG10(APRXW))
FRACT = APRXW/UNIT
DO 20 I = 1, NN
IF(FRACT .LE. NICNOS(I)) GO TO 30
20 CONTINUE
30 FRACT = NICNOS(I)
NPW = FRACT * UNIT
PTOTL = INTFN(HI/NPW, ERR) - INTFN(LO/NPW, ERR) + 1
IF(ERR .NE. 0) GO TO 999
C
C IF MINUS ZERO POSITION POSSIBLE AND SGN(HI) .NE. SGN(LO), ALLOW IT.
C
IF(MZERO .AND. (HI*LO .LT. 0.0 .OR. HI .EQ. 0.0)) PTOTL=PTOTL+1
C
C  PTOTL POSITIONS REQUIRED WITH THIS WIDTH -- FEW ENOUGH?
C
IF(PTOTL .LE. MAXP) GO TO 999
C
C TOO MANY POSITIONS NEEDED, SO BUMP NPW UP ONE NICE NUMBER
C
I = I+1
IF(I .LE. NN) GO TO 30
I = 1
UNIT = UNIT * 10.0
GO TO 30
999 RETURN
END
INTEGER FUNCTION INTFN(X, ERR)
C
C  FIND THE INTEGER EQUAL TO OR NEXT CLOSER TO ZERO THAN X.
C
C CHECKS TO SEE THAT   X  IS NOT TOO LARGE TO FIT IN AN
C  INTEGER VARIABLE.
C
REAL X
INTEGER ERR
C
COMMON /NUMBRS/ EPSI, MAXINT
REAL EPSI, MAXINT
C
IF( ABS(X) .LE. MAXINT) GO TO 10
C
C X   IS TOO LARGE IN MAGNITUDE TO FIT IN AN INTEGER,
C  RETURN THE LARGEST LEGAL INTEGER AND SET THE ERROR FLAG.
C
ERR = 3
INTFN = IFIX( SIGN(MAXINT, X) )
GO TO 999
C
10 INTFN = INT((1.0 + EPSI) * X)
999 RETURN
END
INTEGER FUNCTION FLOOR(Y)
REAL Y
C FIND FLOOR(Y), THE LARGEST INTEGER NOT EXCEEDING Y
C
FLOOR = INT(Y)
IF(Y .LT. 0.0 .AND. Y .NE. FLOAT(FLOOR)) FLOOR = FLOOR - 1
RETURN
END
REAL FUNCTION MEDIAN(Y, N)
C  FIND THE MEDIAN OF THE SORTED VALUES Y(1), ..., Y(N).
INTEGER N
REAL Y(N)
C  LOCAL VARIABLES
INTEGER MPTR, MPT2
C
MPTR = (N/2) + 1
MPT2 = N-MPTR+1
MEDIAN = (Y(MPTR) + Y(MPT2))/2.0
RETURN
END
REAL FUNCTION GAU(Z)
REAL Z
C THIS FUNCTION CALCULATES THE VALUE OF THE STANDARD
C  GAUSSIAN CUMULATIVE DISTRIBUTION FUNCTION AT  Z.
C THE ALGORITHM USES APPROXIMATIONS GIVEN BY STEPHEN E. DERENZO
C  IN MATHEMATICS OF COMPUTATION, V. 31 (1977), PP. 214-225
C
C  LOCAL VARIABLES
REAL P, PI, X
C
X = ABS(Z)
IF(X .GT. 5.5) GO TO 10
P = EXP(-((83.0 * X + 351.0) * X + 562.0) * X /
1   (703.0 + 165.0 * X))
GO TO 20
C
10 PI = 4.0 * ATAN(1.0)
P = SQRT(2.0/PI) * EXP(-(X * X/2.0 +
1    0.94/(X * X))) / X
C
C  THE APPROXIMATIONS YIELD VALUES OF THE HALF-NORMAL TAIL AREA.
C TRANSLATE THAT INTO THE VALUE OF THE GAUSSIAN C.D.F. AND
C  ALLOW FOR THE SIGN OF Z.
C
20 GAU = P/2.0
IF(Z .GT. 0.0) GAU = 1.0 - GAU
C
RETURN
END
C.2 FORTRAN
We hardly need to explain our decision to provide programs in FORTRAN:
it is the most nearly universal of all scientific programming languages. We
cannot, however, pretend that developing these programs was a labor of love.
A reader who examines them carefully will find segments that are awkward or
tedious because FORTRAN is ill-suited to the programming needs of modern
data analysis. For example, the output capabilities of FORTRAN are far too
rigid for the graphic and semi-graphic displays that are common in
exploratory data analysis. On the whole, however, the advantages of making these
programs as widely available as possible outweighed the difficulties of
FORTRAN.
If programs are to be widely used, they must be portable. That is, it
must be possible to move them from one computing environment to another
with an absolute minimum number of changes. Fortunately for us, others have
laid substantial groundwork in developing portable (or, strictly speaking,
semi-portable) FORTRAN programs. As a result, a number of practices that
facilitate portability are well-established, and computer software to support
the most valuable of them is available. In this part of the appendix we briefly
describe the practices we have followed and the role they have played in the
development of our programs.
Consistency of style is also important for any set of programs that are
intended to be used (and read) together. Thus we also describe the particular
conventions we have chosen to follow. These range from simple choices that
affect only the appearance of the printed programs to overall decisions that
affect the structure and interrelations among all the programs in this book.
Related to interconnections is the question of just how one might
customarily use these programs. We briefly discuss and illustrate two
approaches to this.
And finally there are the utility routines, which perform a variety of
essential services for the data analysis routines presented in Chapters 1
through 9. Listings for the utility routines appear in Appendix B.
Portability
A fully portable program or subroutine can be moved gracefully from one
computing machine to another. And even though the computers are of
different manufacture and have different systems software, the program
compiles without errors, executes without errors, and produces identically the
same results on both. This is the ideal situation. Unfortunately, it can rarely be
attained in practice; but with reasonable effort a good approximation to it is
possible. The two primary obstacles to overcome are differences among
dialects of the FORTRAN language and differences in characteristics of the
arithmetic hardware. (One must also contend with variations in system
conventions, but these are generally less serious.)
The solution to the problem of dialects is conceptually quite simple:
One uses only a subset of FORTRAN that is handled in the same way by
essentially all known systems. In practice it is all too easy to slip back
unknowingly into using some facility or construction which is acceptable in
one's own environment but unacceptable in certain others. To avoid this, we
have restricted our FORTRAN to a particular subset known as PFORT. This
is an attractive solution because this subset of FORTRAN is supported by a
piece of software, the PFORT Verifier (Ryder 1974), that takes a
FORTRAN program as input and reports on all its departures from this
subset of the language. Especially valuable is the Verifier's ability to process a
main program and all associated subroutines and to identify potential
difficulties of communication among them, including misuse of COMMON.
When a particular construction is acceptable in many (but not all)
dialects of FORTRAN, it is tempting to use it (especially when it would
make the programs easier to understand) and then to announce, "The
programs conform to PFORT, except for . . . ." For example, subscript
expressions of the form N + 1 - I are common (as in LVALS, MEDPOL, and RGCOMP),
but the strict FORTRAN definition of subscript expressions is too restrictive
to permit this form. We have decided to avoid such complications and adhere
to PFORT. Thus we can state that all the FORTRAN programs in this book
have been processed by the PFORT Verifier without any warning messages.
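The workaround for restricted subscripts can be sketched as follows. This is an illustration written for FORTRAN 77, not one of the book's programs; the program name SUBSCR, the data values, and the variables K and S are assumptions made for the example. The subscript N + 1 - I is computed into an INTEGER variable first, so that only a simple variable appears inside the parentheses, much as the MEDIAN function computes MPT2 before using it.

```fortran
      PROGRAM SUBSCR
C  SKETCH ONLY.  PAIR EACH VALUE WITH ITS MIRROR IMAGE FROM THE
C  OTHER END OF THE ARRAY WITHOUT WRITING Y(N+1-I) AS A SUBSCRIPT.
      INTEGER N, I, K
      REAL Y(5), S
      DATA Y /1.0, 2.0, 4.0, 8.0, 16.0/
C
      N = 5
      DO 10 I = 1, 2
C  COMPUTE THE SUBSCRIPT FIRST, THEN USE THE SIMPLE VARIABLE K
        K = N + 1 - I
        S = Y(I) + Y(K)
        WRITE(6, 100) I, K, S
  100   FORMAT(1X, 2I4, F8.1)
   10 CONTINUE
      END
```

Run as written, the sketch pairs positions (1, 5) and (2, 4) and should print the sums 17.0 and 10.0. Output unit 6 is assumed to be the terminal or printer.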
The problem of arithmetic hardware characteristics is somewhat more
difficult than the problem of language dialects. Fortunately, EDA techniques
generally involve much less numerical computation than one finds in most
mathematical software. In fact, our programs need only two machine-related
constants: an epsilon, whose role was described earlier, and the REAL value of
the largest valid integer. We have isolated these as the variables EPSI and
MAXINT in the COMMON block NUMBRS so that they can be set once at
initialization. The initialization subroutine, CINIT, takes care of this.
CINIT, which should be called before any of the other FORTRAN
routines in this book, also sets several other variables that may vary from
installation to installation or from run to run:
OUNIT   the FORTRAN unit number for output (often unit 6),
PMIN    the left margin in the output line,
PMAX    the right margin in the output line.
In CINIT, the corresponding subroutine arguments all begin with the letter I to
indicate that they are initialization values. CINIT performs several basic checks
on these and then completes the initialization process. In the course of a
SUBROUTINE CINIT(IOUNIT, IPMIN, IPMAX, IEPSI, IMAXIN, ERR)
C
INTEGER IOUNIT, IPMIN, IPMAX, IMAXIN, ERR
REAL IEPSI
C
C  INITIALIZATION, TO BE CALLED AT START OF ANY   MAIN PROGRAM
C WHICH CALLS ONE OF THE EDA SUBROUTINES (EITHER DIRECTLY OR
C  INDIRECTLY).
C
C  IOUNIT  IS THE NUMBER OF THE UNIT TO WHICH OUTPUT IS DIRECTED.
C  IPMIN   IS THE LEFT MARGIN.
C  IPMAX   IS THE RIGHT MARGIN.
C  IEPSI   IS THE MACHINE-RELATED EPSILON.
C  IMAXIN  IS THE MAXIMUM PERMITTED INTEGER VALUE
C
C  ERR IS THE (USUAL) ERROR FLAG, TO INDICATE WHETHER
C  THE ROUTINE EXECUTED SUCCESSFULLY.
C
COMMON /CHRBUF/ P, PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
COMMON /NUMBRS/ EPSI, MAXINT
C
INTEGER P(130), PMAX, PMIN, OUTPTR, MAXPTR, OUNIT
REAL EPSI, MAXINT
C
C  LOCAL VARIABLES
C
INTEGER BLANK, I
DATA BLANK /1H /
C
C
ERR = 6
IF(IPMIN .LT. 1) GO TO 999
IF(IPMAX .GT. 130) GO TO 999
IF(IPMAX .LE. IPMIN) GO TO 999
ERR = 7
IF((1.0 + IEPSI) .LE. 1.0) GO TO 999
ERR = 0
OUNIT = IOUNIT
PMIN = IPMIN
OUTPTR = IPMIN
MAXPTR = IPMIN
PMAX = IPMAX
EPSI = IEPSI
MAXINT = FLOAT(IMAXIN)
C
DO 50 I = 1, 130
P(I) = BLANK
50 CONTINUE
C
999 RETURN
END
sequence of analyses, using several of the programs in this book, a user may
reset the initialization variables by again calling CINIT. Of course, this causes
the previous values of these variables to be lost, and it causes the output line to
be set to all blanks, but it has no other side effects.
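One way to choose a value for the epsilon argument can be sketched as follows. This is an illustration in FORTRAN 77 and not part of the original programs; the program name EPSDEM and the halving strategy are assumptions, and the value the original authors recommend is not reproduced here. The sketch looks for a power of two that still passes the test CINIT applies, namely that 1.0 + IEPSI exceeds 1.0.

```fortran
      PROGRAM EPSDEM
C  SKETCH ONLY.  FIND A POWER OF TWO, EPS, THAT STILL PASSES THE
C  CINIT CHECK  (1.0 + EPS) .GT. 1.0  ON THE MACHINE IN USE.
      REAL EPS, TRIAL
C
      EPS = 1.0
C  KEEP HALVING WHILE 1.0 + EPS/2.0 IS STILL DISTINCT FROM 1.0
   10 TRIAL = 1.0 + EPS/2.0
      IF (TRIAL .LE. 1.0) GO TO 20
      EPS = EPS/2.0
      GO TO 10
   20 WRITE(6, 100) EPS
  100 FORMAT(1X, 'CANDIDATE EPSILON = ', E12.5)
      END
```

The result could then be passed to the initialization routine with a call such as CALL CINIT(6, 1, 72, EPS, 32767, ERR), where the unit number, the margins, and the integer limit are illustrative values only and must be replaced by ones suited to the installation. On machines that hold intermediate results in extended precision, the loop may return a smaller value than expected, so the printed candidate is best treated as a lower bound.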
Stream Output
FORTRAN requires that the programmer specify the contents and format of
a line of output, essentially when the program is written. (While it is possible
for a running program to read a format specification or to construct one, it is
extremely difficult to program this in a portable way.) Because EDA displays,
such as the boxplot, depend heavily on the data, we usually can be no more
specific about the output format than to say that a line will contain a number
of characters: some digits, some symbols, and some blank spaces. As the
program executes, it must determine the format for a line and the character
that occupies each position on the line. For example, stem-and-leaf displays
come in three different formats, and each requires different characters in
special positions on the line. Thus the program needs to build each output line
a few characters at a time.
This style of output, which allows the program to determine the format
and contents of the output line as it goes along, is known as stream output.
Because such output capabilities are not a part of the FORTRAN language,
we have written special subroutines to simulate (in a rudimentary but portable
way) the features that we need to produce our EDA displays. Often, we have
used standard FORTRAN output.
The important variables for our stream output subroutines reside in the
COMMON block CHRBUF. At the heart of our simple stream output is the array P,
in which we construct a line of output. Our initialization routine, CINIT, sets P to
all blanks. Any routine needing to construct an output line can do so by storing
characters (alphabetic, numeric, or special symbols) in P; this is usually done
with the subroutines PUTCHR and PUTNUM. When the line is complete, the
routine PRINT writes out the contents of P and resets P to blanks.
The routine PUTCHR places a character in P, either at the position
specified by the argument POSN or at the next available position (if POSN is
zero). PUTCHR keeps track of the last print position used and the rightmost
non-blank position in the line.
The routine PUTNUM places into P the characters for an integer, N. The
calling program must specify the width, W, of the field (number of characters)
where the number should appear, and its starting position on the line. PUTNUM
translates the integer into the appropriate sequence of numerals and uses
PUTCHR to place them in P. Applications of PUTNUM include placing the depth
counts and the stems on each line of a stem-and-leaf display.
Finally, the integer function WDTHOF receives an integer, I, and returns
the number of characters (including a minus sign if I is negative) required to
print it. We use this information in printing the depth counts and stems in a
stem-and-leaf display.
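The idea behind stream output can be sketched in a few lines. The program below is an illustration only and does not reproduce PUTCHR, PUTNUM, or PRINT, whose argument lists are not given in this section. The names STRDEM, LINE, DIGIT, LEAF, and NEXT are assumptions, and for brevity the sketch uses the FORTRAN 77 CHARACTER type, whereas the book's routines store characters in INTEGER words. A blank line buffer is filled a few characters at a time and then written with a single fixed FORMAT.

```fortran
      PROGRAM STRDEM
C  SKETCH ONLY.  BUILD ONE LINE OF A STEM-AND-LEAF STYLE DISPLAY
C  A FEW CHARACTERS AT A TIME, THEN WRITE THE WHOLE BUFFER.
      CHARACTER*1 LINE(30), DIGIT(10)
      INTEGER I, NEXT, LEAF(5)
      DATA DIGIT /'0','1','2','3','4','5','6','7','8','9'/
      DATA LEAF /0, 2, 2, 5, 9/
C  START WITH AN ALL-BLANK LINE
      DO 10 I = 1, 30
        LINE(I) = ' '
   10 CONTINUE
C  STEM 4 IN POSITION 3, A SEPARATOR IN POSITION 5
      LINE(3) = DIGIT(5)
      LINE(5) = '*'
C  APPEND ONE LEAF AT A TIME AT THE NEXT FREE POSITION
      NEXT = 7
      DO 20 I = 1, 5
        LINE(NEXT) = DIGIT(LEAF(I)+1)
        NEXT = NEXT + 1
   20 CONTINUE
C  ONE FIXED FORMAT SERVES FOR ANY LINE BUILT THIS WAY
      WRITE(6, 100) LINE
  100 FORMAT(1X, 30A1)
      END
```

The point of the design is that the FORMAT statement never changes: all decisions about what goes where are made at run time by storing into the buffer, which is the role the array P plays in the programs described above.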
Conventions
To promote clarity of these programs and to preserve their portability, we have
followed several conventions. None of these has especially sweeping
consequences, but we list them here so that they will be clear to the reader and
user.
Input/Output. Our subroutines do no input. Reading of data is the
responsibility of the user, who is in the best position to deal with features of the input
process that may depend on the particular version of FORTRAN or on the
devices where data are stored. It is customary to isolate output operations so
that they do not appear in computational subroutines. We have done this
where appropriate; but, of course, it makes no sense when the EDA technique
is primarily a display (as in stem-and-leaf, boxplot, condensed plotting, and
coded tables).
Scratch Storage. When a technique uses temporary storage whose size
depends on the number of data values, our routines are structured so that the
user supplies this storage through the argument list. (PLOT, for example,
requires two work arrays of length N because it must sort the data points into
order on y while preserving the (x,y) pairs.) In this way we avoid any built-in
restriction on the amount of data that can be handled, and we make it
straightforward to accommodate the storage limitations that the user's system
may impose.
Characters. When we must work with characters, we store them, one
character to the word, in INTEGER variables or arrays. This may waste a certain
amount of space, but it is strongly preferable to dealing with heavy
dependence on the number of characters that can be stored in a word on the user's
particular machine. It further avoids the arithmetic that would be required to
pack and unpack characters stored several to the word. The character set that
we have used is the bare minimum FORTRAN character set: the 26 letters,
the 10 digits, the 9 symbols = + - * / ( ) , . and the blank space. This
facilitates portability, but it is not much to work with in building displays. In
BASIC we are able to assume the much larger ASCII character set, and the
advantages are evident when one compares the BASIC and FORTRAN
versions of the displays.
Dimensioning in Subroutines. When a subroutine argument is an array, our
declaration for it uses its actual dimensions, as in "REAL Y(N), . . ." in STMNLF.
We have not used "dummy" dimensions, as in "REAL A(1)" seen in some
programs.
Errors. We attempt to detect a variety of errors that a user might make, and
we communicate information on them through the INTEGER variable ERR, which
appears as the last argument of many of the subroutines. If no error condition
exists, ERR has the value 0. Otherwise, a positive value identifies the error
condition. (These error numbers are defined in Exhibit C-1.)
Exhibit C-1 FORTRAN Program Error Codes
Code    Subroutine          Meaning
1       SORT                N <= 0; nothing to sort
2       PSORT               N <= 0; nothing to sort
3       INTFN               X > MAXINT; argument passed is too large to be
                            "fixed" as an integer variable
4       PUTCHR              Illegal character code
5       PUTNUM              Number won't fit in space provided
6       CINIT               Violated 0 < IPMIN < IPMAX < 130 in setting page
                            margins
7       CINIT               EPSI too small; 1.0 + EPSI = 1.0
8       NPOSW               No room allowed for plot
9       NPOSW               HI < LOW
11      STMNLF              N <= 1
12      STEMP               Bad internal value; bad nice numbers?
13      STMNLF              Page too narrow for display
21      LVALS               Violated 2 <= N <= 24576
22      LVPRNT              Violated 3 <= NLV <= 15; too many letter values
23      LVPRNT              Page width < 64 positions, not enough room
31      BOXES               N < 1
41      PLOT                N < 5
42      PLOT                Violated 5 <= LINES <= 40
                            or 1 <= CHRS <= 10
44      PLOT                XMIN > XMAX
45      PLOT                YMIN > YMAX
                            Errors 44 and 45 are possible if incorrect
                            plot bounds have been specified in the
                            subroutine call.
51      RLINE               N < 6
52      RLINE               No iterations specified
53      RLINE               All x-values equal; no line possible
54      RLINE               Split is too uneven for resistance
61      RSM                 N < 7
62      RUNMED              Insufficient workspace room
63      RUNMED              Internal error; error in sort program?
                            This error can occur if a system sort utility is
                            substituted for the supplied SORT subroutine,
                            but used incorrectly.
71      CTBL                Zero dimensions for table
72      CTBL                Too many columns to fit on page
81      MEDPOL or TWCVS     Zero dimensions for table
82      MEDPOL              No half-steps specified
83      MEDPOL              Illegal start parameters
85      MEDPOL              Table is empty
88      TWCVS               Zero grand effect; can't compute comparison
                            values
91      RGCOMP              L <= 2; too few bins
92      RGCOMP              One of the hinges falls in the left-open bin or in
                            the right-open bin
93      RGPRNT              Page too narrow for rootogram table
94      RGPRNT              Room for rootogram table but not for graphic
                            display
Exits. Each of our subroutines has a single exit, the RETURN statement
immediately preceding the END statement. In most subroutines this RETURN
bears the statement number 999.
Output FORMAT statements. We place each FORMAT statement
immediately after the first WRITE statement that uses it. For our programs, which do
not use the same FORMAT statement in many different and widely separated
WRITE statements and often rely on the stream output routines described
earlier, this leads to much better readability than if we grouped all FORMAT
statements at the end of the subroutine.
Declared Identifiers. We do not rely on "implicit typing" to determine
(according to its first letter) whether an identifier is INTEGER or REAL. Instead,
we explicitly declare all the identifiers used in each subprogram, except for the
standard FORTRAN functions. We strongly endorse this practice, which a
few FORTRAN compilers support by issuing a warning message for any
undeclared identifier, because it aids greatly in eliminating misspelled names.
(The PFORT Verifier, for example, lists all the identifiers in each program
unit, so that such errors stand out.)
Indentation. We find that it is generally easier to follow the logic of a
program when statements within a DO loop or following an IF statement are
indented slightly, and we have used this device throughout our programs.
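Several of these conventions can be seen together in a small sketch. It was written for this purpose and is not one of the book's routines: the names CONVDM and WMED, the error code 1 for N less than 1, and the insertion sort are all assumptions. The subroutine takes its scratch storage W through the argument list, declares its arrays with their actual dimension N, declares every identifier, reports problems through ERR as the last argument, and leaves through a single RETURN numbered 999. The data are the first five MINCOME values of the Cairo renter data.

```fortran
      PROGRAM CONVDM
C  SKETCH ONLY.  CALL A ROUTINE WRITTEN TO THE CONVENTIONS ABOVE.
      INTEGER ERR
      REAL Y(5), W(5), MED
      DATA Y /68.50, 30.06, 117.60, 58.66, 45.40/
C
      CALL WMED(Y, 5, W, MED, ERR)
      IF (ERR .NE. 0) GO TO 20
      WRITE(6, 100) MED
  100 FORMAT(1X, 'MEDIAN = ', F8.2)
      GO TO 999
   20 WRITE(6, 110) ERR
  110 FORMAT(1X, 'ERROR CODE ', I3)
  999 STOP
      END
      SUBROUTINE WMED(Y, N, W, MED, ERR)
C  MEDIAN OF Y(1), ..., Y(N), LEAVING Y UNCHANGED.
C  W IS SCRATCH STORAGE OF LENGTH N SUPPLIED BY THE CALLER.
C  ERR = 1 (AN ASSUMED CODE) IF N .LT. 1, OTHERWISE ERR = 0.
      INTEGER N, ERR
      REAL Y(N), W(N), MED
C  LOCAL VARIABLES
      INTEGER I, J, K
      REAL T
C
      ERR = 1
      IF (N .LT. 1) GO TO 999
      ERR = 0
      DO 10 I = 1, N
        W(I) = Y(I)
   10 CONTINUE
C  INSERTION SORT OF THE WORK ARRAY
      IF (N .LT. 2) GO TO 50
      DO 40 I = 2, N
        T = W(I)
        J = I - 1
   20   IF (J .LT. 1) GO TO 30
        IF (W(J) .LE. T) GO TO 30
        K = J + 1
        W(K) = W(J)
        J = J - 1
        GO TO 20
   30   K = J + 1
        W(K) = T
   40 CONTINUE
C  SAME MIDPOINT RULE AS THE MEDIAN FUNCTION LISTED EARLIER
   50 J = (N/2) + 1
      K = N + 1 - J
      MED = (W(J) + W(K))/2.0
  999 RETURN
      END
```

For the five values supplied, the sorted work array has 58.66 in the middle position, so the program should report a median of 58.66 while leaving Y in its original order.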
References
Isaacs, Gerald L. 1976. "BASIC REVISITED, An Update to Interdialect
Translatability of the BASIC Programming Language." CONDUIT, The University of
Iowa, Iowa City.
Ryder, B.G. 1974. "The PFORT Verifier." Software-Practice and Experience
4:359-377.
APPENDIX D: SUGGESTIONS FOR FURTHER READING
Statistical Techniques
Blalock (1960) is a widely available general statistics text. In
particular, the chapter on hypothesis testing is quite good. Other good
general statistics texts include Snedecor and Cochrane (1967) and the SPSS
manual (Nie et al., 1975). Draper and Smith (1966) is a good general text on
regression analysis which is widely available.
Probably the most widely available econometrics text is Theil
(1971). Because of its availability, it has been cited several times in this
paper. It is, however, a difficult book. A better introduction to
econometrics which may be available is Wonnacott and Wonnacott (1970), or
perhaps Maddala (1977). Mirer (1983) is an excellent introduction, but it is
a recent book and perhaps more difficult to find.
Housing Market Analysis
An excellent general introduction is Quigley (1979). This paper
summarizes recent economic analysis of housing markets, but focuses on
developed country data. Chapter 6 of Linn (1979) summarizes housing market
analysis with reference to developing countries in general. Mayo et al.
(1982) is a good case study of a particular housing market (Cairo).
Mayo (1981) reviews the housing demand literature in the U.S., and
highlights the permanent income and price elasticity issue. Mayo and Malpezzi
(1983), and Keare and Jimenez (1983) review housing demand in developing
countries.
Mayo and Malpezzi (1983) also review the effects of tenure on housing
demand. A good basic reference on mobility is Goodman (1978).
Hedonic models are discussed in detail in Malpezzi et al. (1981), and
a good example of their application to place-to-place price indexes is Follain

and Ozanne (1979). An interesting application to developing country data
(Korea) is Follain et al. (1982).
Computational Techniques
Velleman and Hoaglin (1981) provide many useful computer programs
and good discussions of exploratory data analysis. The SPSS manual is another
obvious source of computational information. Sae-Hau (1982) is highly
recommended for advice about data preparation. Information about SAS can be
obtained by writing directly to the SAS Institute Inc., Box 8000, Cary, North
Carolina, 27511, U.S.A.
APPENDIX E: DATA USED FOR SIMPLE EXAMPLES
These data are of two kinds. Some examples used actual survey data
from Cairo, Egypt. These data are documented in Mayo et al. (1982). Other data
were manufactured by computer in order to emphasize particular points. All
the manufactured data, and the Cairo data used for the simple examples in Part
I, are reproduced in this appendix so that the reader can get a feel for what
these data look like, and can replicate the examples.
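For readers who wish to manufacture comparable data themselves, the following FORTRAN 77 sketch follows the variable definitions given in the Data Appendix: X uniform between 0 and 10, E1 and E2 normal with mean 0 and variances 1 and 3, and Y1 and Y2 equal to 2 + .75 X plus the respective error term. How the original data were generated is not stated in this paper, so everything else here is an assumption: the program name MKDATA, the small multiplicative congruential generator (multiplier 171, modulus 30269), the seed, the sample size of 10, and the use of a sum of 12 uniform draws minus 6 as an approximate standard normal. The sketch will not reproduce the tabulated values.

```fortran
      PROGRAM MKDATA
C  SKETCH ONLY.  MANUFACTURE X, E1, E2, Y1 AND Y2 AS DEFINED IN THE
C  DATA APPENDIX.  THE GENERATOR, SEED AND SAMPLE SIZE ARE ASSUMED.
      INTEGER SEED, I, J, K
      REAL X, Z(2), E1, E2, Y1, Y2
C
      SEED = 12345
      DO 30 I = 1, 10
C  X IS UNIFORM BETWEEN 0 AND 10
        SEED = MOD(171*SEED, 30269)
        X = 10.0*FLOAT(SEED)/30269.0
C  Z(1), Z(2) ARE ROUGHLY N(0,1): SUM OF 12 UNIFORMS, MINUS 6
        DO 20 K = 1, 2
          Z(K) = -6.0
          DO 10 J = 1, 12
            SEED = MOD(171*SEED, 30269)
            Z(K) = Z(K) + FLOAT(SEED)/30269.0
   10     CONTINUE
   20   CONTINUE
C  E1 HAS VARIANCE 1, E2 HAS VARIANCE 3
        E1 = Z(1)
        E2 = SQRT(3.0)*Z(2)
        Y1 = 2.0 + 0.75*X + E1
        Y2 = 2.0 + 0.75*X + E2
        WRITE(6, 100) X, E1, E2, Y1, Y2
  100   FORMAT(1X, 5F10.4)
   30 CONTINUE
      END
```

Because E2 has three times the variance of E1, a regression of Y2 on X from data made this way should show the same underlying line as Y1 but with a visibly looser fit, which is the kind of contrast manufactured data are useful for.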
DATA APPENDIX
Variable Definitions
1. Data used for Figure 5 and Tables 7 and 9.
RENT - household rent paid.
HHSIZE - household size (continuous).
HSIZESQ - household size, squared (continuous).
HH2 - 1 if 2 person household; 0 otherwise.
HH3 - 1 if 3 person household; 0 otherwise.
HH8 - 1 if 8 person household; 0 otherwise.
HHGE9 - household size if greater than or equal to 9; 0 otherwise.
2. Data used for Figure 4 and Table 4.
E1 - manufactured "error term," normally distributed with mean 0 and
variance 1.
E2 - manufactured "error term," normally distributed with mean 0 and
variance 3.
X - manufactured variable, uniformly distributed between 0 and 10.
Y1 - computed as follows:
Y1 = 2 + .75 X + E1.
Y2 - computed as follows:
Y2 = 2 + .75 X + E2.
3. Cairo renter data.
HHSIZE - household size.
HSIZESQ - household size, squared.
TOTINC - total family income (not used).
OWN - 0 if renter (all zeros, not used).
MINCOME - monthly permanent income (consumption), in Egyptian pounds.
MGRENT - monthly gross rent (rent plus utilities).
DIST - distance from central business district (not used).
LMINCOME - log of monthly income.
LMGRENT - log of monthly rent.
RY - rent-to-income ratio.
RESIDUAL - residual, or estimated error, from simple log demand equation
(Table 3, Figure 3).
PREDICT - predicted log rent from simple demand equation. Notice that
RESIDUAL = LMGRENT - PREDICT.
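The relationships among the Cairo variables can be checked directly against the table. The FORTRAN 77 sketch below is an illustration only; the program name CHKOBS and the shortened variable names are assumptions. It takes MINCOME, MGRENT, and PREDICT for observation 1 and recomputes the derived columns, assuming (as the tabulated values suggest) that the logs are natural logs and that RY is MGRENT divided by MINCOME.

```fortran
      PROGRAM CHKOBS
C  SKETCH ONLY.  RECOMPUTE THE DERIVED COLUMNS FOR OBSERVATION 1
C  OF THE CAIRO RENTER DATA FROM MINCOME, MGRENT AND PREDICT.
      REAL MINCOM, MGRENT, PREDCT
      REAL LMINC, LMGR, RY, RESID
C
      MINCOM = 68.50
      MGRENT = 2.200
      PREDCT = 2.09424
C  NATURAL LOGS OF INCOME AND RENT
      LMINC = ALOG(MINCOM)
      LMGR = ALOG(MGRENT)
C  RENT-TO-INCOME RATIO
      RY = MGRENT/MINCOM
C  RESIDUAL = LMGRENT - PREDICT
      RESID = LMGR - PREDCT
      WRITE(6, 100) LMINC, LMGR, RY, RESID
  100 FORMAT(1X, 3F10.5, F10.4)
      END
```

To within rounding, the four numbers printed should agree with the entries for observation 1 in the table: 4.22683, 0.78846, 0.03212, and -1.3058. The same check applied to other rows is a quick way to catch keying errors when the data are re-entered.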
CAIRO RENTER DATA, RESIDUALS, AND PREDICTED RENTS
FROM FIGURE 3 AND TABLES 1-3
OBS    HHSIZE   HSIZESQ    TOTINC   OWN    MINCOME     MGRENT   DIST     LMINCOME   LMGRENT      RY      RESIDUAL    PREDICT
1      5         25         0     0       68.50     2.200    28.50   4.22683     0.78846   0.03212    -1.3058    2.09424
2       6        36         83     0       30.06      1.470   28.50    3.40320    0.38526    0.04890   -1.3248     1.71011
3       4         16       376     0      117.60      1.750   28.50    4.76729    0.55962    0.01488   -1.8130     2.37263
4       7        49         58     0       58.66      4.993   21.97    4.07176    1.60800    0.08511   -0.3590     1.96704
5       4         16        55     0       45.40      4.516   21.97    3.81551    1.50767    0.09948   -0.4646     1.97226
6       6        36         35     0       49.17      2.520   21.97    3.89528    0.92426    0.05125   -0.9928     1.91710
7       5        25         65     0       69.99      6.140   21.97    4.24835    1.81482    0.08773   -0.2885     2.10329
8       7        49         65     0       87.30      3.300   21.97    4.46935    1.19392    0.03780   -0.9404     2.13429
9       7        49         85     0       87.50      4.500   21.97    4.47164    1.50408    0.05143   -0.6312     2.13525
10       5        25        187     0       95.00     13.250   21.97   4.55388     2.58400   0.13947     0.3522     2.23181
11      4         16        106     0       99.80     15.310   21.97   4.60317     2.72851   0.15341     0.4249    2.30359
12      5         25        220     0      150.00     5.400     7.72   5.01064     1.68640   0.03600    -0.7376    2.42395
13       2         4         80     0       90.00    27.227     7.72   4.49981     3.30423   0.30253     0.9019     2.40231
14       3         9        270     0      300.00    21.500     7.72    5.70378    3.06805   0.07167     0.2371     2.83098
15       3         9        250     0      150.00    54.160     7.72    5.01064    3.99194   0.36107     1.4525     2.53940
16      6         36         91     0      165.95     10.006    7.72    5.11169    2.30318   0.06030    -0.1256     2.42879
17       6        36         55     0       37.60     7.100     7.72    3.62700    1.96009   0.18883     0.1558     1.80425
18      3          9         94     0       95.00     11.500    7.72   4.55388     2.44235   0.12105     0.0951    2.34727
19      4         16        410     0      124.50     13.000    7.72    4.82431    2.56495   0.10442     0.1683     2.39661
20       3         9         40     0       39.85      4.850    7.72    3.68512    1.57898    0.12171   -0.4028     1.98182
21       7        49         75     0       78.75      5.750    7.72    4.36628    1.74920    0.07302   -0.3417     2.09093
22       3         9         50     0       45.85      7.200    7.72    3.82538    1.97408    0.15703   -0.0667     2.04082
23      12       144        210     0      150.30     40.800    4.01    5.01263    3.70868    0.27146    1.2668     2.44191
24      12       144        100     0      106.00      8.000    4.01    4.66344    2.07944    0.07547   -0.2156     2.29502
25       1         1         10     0       47.00     4.650     4.01    3.85015    1.53687    0.09894   -0.6833     2.22018
26       4        16        100     0       70.00      6.000    4.01    4.24850    1.79176    0.08571   -0.3626     2.15139
27       2         4         35     0       35.00     6.150     4.01    3.55535    1.81645    0.17571   -0.1886     2.00502
28       7        49         90     0       69.75      7.750    4.01    4.24492    2.04769    0.11111    0.0078     2.03988
29       3         9         70     0      123.00     8.250     4.01    4.81218    2.11021    0.06707   -0.3457     2.45593
30       6        36         48     0       58.00      7.500    4.01    4.06044    2.01490    0.12931    0.0283     1.98658
31       7        49         66     0      325.00      6.100    4.01    5.78383    1.80829    0.01877   -0.8789     2.68723
32       5        25         60     0       80.00     13.000    4.01    4.38203    2.56495    0.16250    0.4054     2.15952
33       9        81         80     0        1.20      2.650    3.86    0.18232    0.97456    2.20833    0.6521     0.32245
34       5        25         40     0       40.00     28.630    3.86    3.68888    3.35446    0.71575    1.4865     1.86795
35       3         9         26     0       35.00     29.070    3.86    3.55535    3.36971    0.83057    1.4425     1.92723
36       8        64         85     0      267.00      9.473    1.48    5.58725    2.24845    0.03548   -0.3452     2.59361
37       5        25        170     0      180.00     21.130    1.48    5.19296    3.05069    0.11739    0.5500     2.50065
38       4         16      1000     0      400.00     37.640    0.60    5.99146    3.62807    0.09410    0.7405     2.88758
39       3         9        232     0      450.00     38.440    0.60    6.10925    3.64910    0.08542    0.6476     3.00154
40       3         9        180     0      200.00     27.000    0.60    5.29832    3.29584    0.13500    0.6354     2.66042
41       4        16         50     0      500.00     24.000    0.60    6.21461    3.17805    0.04800    0.1966     2.98145
42       4         16       620     0      600.00    156.000    0.60    6.39693    5.04986    0.26000    1.9917     3.05814
43       4        16        250     0      150.00     20.750    0.60    5.01064    3.03255    0.13833    0.5576     2.47499
44      10       100        163     0      163.00      7.950    1.78    5.09375    2.07317    0.04877   -0.3311     2.40428
45       3         9         35     0       50.00     14.080    1.78    3.91202    2.64476    0.28160    0.5675     2.07727
46       4         16        25     0       20.00      5.500    1.78    2.99573    1.70475    0.27500    0.0773     1.62744
47       4        16         25     0       40.00      1.770    1.78    3.68888    0.57098    0.04425   -1.3480     1.91899
48       7        49        100     0      300.00      5.850    1.54    5.70378    1.76644    0.01950   -0.8871     2.65356
49       4         16        60     0      120.00      4.650    1.54    4.78749    1.53687    0.03875   -0.8443    2.38113
50       6        36         65     0       85.15     32.500    1.54    4.44441    3.48124    0.38168    1.3331     2.14810
51       4        16         54     0       55.00      7.200    1.54    4.00733    1.97408    0.13091   -0.0789     2.05295
52       6        36         40     0       40.00     26.200    1.54    3.68888    3.26576    0.65500    1.4355     1.83028
53      10       100        100     0      271.50      7.750    1.54    5.60396    2.04769    0.02855   -0.5712     2.61891
54       8        64        120     0      150.50    109.200    1.54    5.01396    4.69318    0.72558    2.3407     2.35246
55       3         9       210       0     148.00    72.6500    1.34   4.99721     4.28565   0.490878      1.7519    2.53376
56       5        25        90       0      99.50     7.8745    1.34   4.60016     2.06363   0.079141     -0.1877    2.25128
57       4         16       85       0      80.00     6.7500    1.34   4.38203     1.90954   0.084375     -0.3010    2.21057
58       5        25       100       0     100.00    10.2500    1.34   4.60517     2.32728   0.102500      0.0739    2.25339
59       9        81        60       0     102.25     7.4000    2.38   4.62742     2.00148   0.072372     -0.1908    2.19230
60       4         16       60       0     150.00     3.7666    2.38   5.01064     1.32617    0.025111    -1.1488    2.47499
61       2         4        70       0      60.00     5.7500    2.38   4.09434     1.74920    0.095833    -0.4825    2.23175
62      11        121      110       0     188.00     6.1500    2.38   5.23644     1.81645    0.032713    -0.6770    2.49350
63       2         4        29       0      30.00     3.1000    2.38   3.40120     1.13140   0.103333     -0.8088    1.94017
64       2         4        50       0      55.50     4.5000    2.38   4.01638     1.50408   0.081081     -0.6949    2.19895
65       9        81       165       0     152.20     8.3166    3.12   5.02520     2.11825    0.054643    -0.2414    2.35963
66       3         9       195       0     100.00     8.3000    3.12   4.60517     2.11626    0.083000    -0.2526    2.36884
67       3         9        50       0      40.50    13.2500    3.12   3.70130     2.58400    0.327160     0.5954    1.98863
68       2         4       100       0      90.00     4.0500    3.12   4.49981     1.39872    0.045000    -1.0036    2.40231
69       5         25       58       0      47.36     5.2100    3.12   3.85778     1.65058    0.110008    -0.2884    1.93900
70       4         16       10       0      23.75     1.8500    2.38   3.16758     0.61519    0.077895    -1.0845    1.69970
71       4         16       90       0      94.85     5.3000    2.38   4.55230     1.66771   0.055878     -0.6145    2.28219
72       5        25       118       0     112.50     7.8000    2.38   4.72295     2.05412   0.069333     -0.2488    2.30294
73       8        64        30       0      24.00     7.0000    2.38   3.17805     1.94591    0.291667     0.3657    1.58017
74       7        49       140       0     134.10    10.6000    2.38   4.89859     2.36085    0.079045     0.0460    2.31485
75       6        36       108       0     278.35     7.6500    2.38   5.62888     2.03471    0.027483    -0.6116    2.64635
76       6        36       120       0     120.00     7.1500    3.27   4.78749     1.96711   0.059583     -0.3253    2.29242
77       3         9        32       0      40.00     2.6000    5.34   3.68888     0.95551    0.065000    -1.0279    1.98340
78       3         9        54       0     40.00      3.1000    5.34   3.68888     1.13140   0.077500     -0.8520    1.98340
79       5        25        90       0      33.35     8.1000    5.34   3.50706     2.09186    0.242879     0.3004    1.79146
80       5        25       195       0     138.00     7.1500    5.34   4.92725     1.96711    0.051812    -0.4218    2.38888
81       8        64       200       0     195.86    13.1100    5.34   5.27740     2.57338    0.066936     0.1101    2.46327
82       7        49        90       0      90.00     8.7500    5.34   4.49981     2.16905    0.097222     0.0220    2.14710
83       5         25      100       0     100.00     8.9000    5.34   4.60517     2.18605    0.089000    -0.0673    2.25339
84       4         16       85       0      65.00     4.1000    5.34   4.17439     1.41099    0.063077    -0.7122    2.12322
85       2         4        56       0      50.00     2.3000    2.97    3.91202    0.83291    0.046000    -1.3221    2.15505
86       3         9        50       0      44.50     2.7500    2.97    3.79549    1.01160    0.061798    -1.0166    2.02825
87       4         16       90       0      64.00    12.5000    2.97   4.15888     2.52573    0.195313     0.4090    2.11670
88       6         36      125       0     125.00     6.2500    2.97   4.82831     1.83258    0.050000    -0.4770    2.30959
89       2         4        40       0      40.00     4.2500    2.97   3.68888     1.44692    0.106250    -0.6143    2.06119
90       5         25       35       0      60.00     8.6500    2.97   4.09434     2.15756    0.144167     0.1190    2.03851
91       5         25      200       0     200.00    10.2500    2.97    5.29832    2.32728    0.051250    -0.2177    2.54497
92       7        49       150       0     150.00     6.4000    2.97    5.01064    1.85630    0.042667    -0.5057    2.36198
93       7        49       140       0     102.00     7.1500    2.97   4.62497     1.96711    0.070098    -0.2326    2.19975
94       4         16      130       0     119.00     7.6500    2.97   4.77912     2.03471    0.064286    -0.3429    2.37761
95       7        49        75       0      75.00     5.7500    2.97   4.31749     1.74920    0.076667    -0.3212    2.07041
96      10       100       110       0     136.00     7.7500    2.97    4.91265    2.04769    0.056985    -0.2804    2.32810
97       9         81       55       0      55.00     9.0000    2.97   4.00733     2.19722    0.163636     0.2658    1.93146
98       7        49        80       0      80.00    44.4000    2.97    4.38203    3.79324    0.555000     1.6957    2.09756
99      10        100       48       0      49.00     3.1000    5.20    3.89182    1.13140    0.063265    -0.7673    1.89871
100       7        49       100       0     133.50     7.4500    5.20   4.89410     2.00821   0.055805     -0.3047    2.31296
101       3         9       150       0     125.00    12.6500    5.20   4.82831     2.53766   0.101200      0.0749    2.46271
102       5        25       250       0     154.00     7.1500    5.20   5.03695     1.96711   0.046429     -0.4679    2.43502
103       5        25       130       0     90.00     11.2500    5.20   4.49981     2.42037   0.126000      0.2113    2.20907
104      8         64        80       0     81.50     11.7500    5.20   4.40060     2.46385   0.144172      0.3694    2.09445
105       6        36        50       0     50.00      5.3100    5.20   3.91202     1.66959   0.106200     -0.2546    1.92415
106       5        25       200       0    200.00     15.5000    5.20   5.29832     2.74084   0.077500      0.1959    2.54497
107       4        16       150       0     130.00    13.0000    5.20   4.86753     2.56495   0.100000      0.1502    2.41480
108       3         9        60       0     53.00      5.5000    5.20   3.97029     1.70475   0.103774     -0.3970    2.10178
109      5        25         43       0      68.50      8.400     5.20   4.22683     2.12823   0.122628      0.0340    2.09424
110      4         16        85       0      67.00      6.000     5.20   4.20469     1.79176   0.089552     -0.3442    2.13597
111      7        49        100       0      96.00      8.200     5.20   4.56435     2.10413   0.085417     -0.0701    2.17425
112      8        64         70       0      70.00     11.000     5.20   4.24850     2.39790   0.157143      0.3674    2.03046
113      6        36         75       0       9.25      6.250     5.20   2.22462     1.83258   0.675676      0.6182    1.21433
114      6        36         85       0      60.00      9.150     5.20   4.09434     2.21375   0.152500      0.2129    2.00084
115      7        49         40       0      56.50      8.249     3.27   4.03424     2.11009   0.146000      0.1588    1.95126
116      5        25         43       0      52.50     10.500     3.27   3.96081     2.35138   0.200000      0.3690    1.98234
117      5        25         70       0      104.40     5.000     3.27   4.64823     1.60944   0.047893     -0.6621    2.27150
118      8        64         50       0      61.15      8.500     3.27   4.11333     2.14007   0.139002      0.1665    1.97360
119      3         9         70       0     116.50      9.500     3.27   4.75789     2.25129   0.081545     -0.1818    2.43309
120      4         16        70       0      91.00     10.000     3.27   4.51086     2.30259   0.109890      0.0378    2.26476
121      5        25         70       0      70.00      7.500     3.27   4.24850     2.01490   0.107143     -0.0885    2.10335
122      5        25         65       0      60.00      5.000     3.27   4.09434     1.60944   0.083333     -0.4291    2.03851
123      4         16       107       0     630.00      3.000     3.27   6.44572     1.09861   0.004762     -1.9801    3.07867
124      7        49         95       0      80.75      4.750     3.27   4.39136     1.55814   0.058824     -0.5433    2.10148
125      6        36         90       0      120.00     4.000     3.27   4.78749     1.38629   0.033333     -0.9061    2.29242
126      8        64        150       0      150.00     4.750     3.27   5.01064     1.55814   0.031667     -0.7929    2.35106
127      7        49         70       0      40.00      4.750     3.27   3.68888     1.55814   0.118750     -0.2478    1.80598
128      5        25          7       0      70.00      4.500     3.27   4.24850     1.50408   0.064286     -0.5993    2.10335
129      5        25         50       0      70.00      3.850     3.27   4.24850     1.34807   0.055000     -0.7553    2.10335
130      8        64         35       0      32.50      6.900     3.27   3.48124     1.93152   0.212308      0.2238    1.70771
131      7        49        130       0      128.00     7.150     3.27   4.85203     1.96711   0.055859     -0.3282    2.29527
132      6        36         85       0      87.35      6.100     3.27   4.46992     1.80829   0.069834     -0.3505    2.15883
133      8        64        240       0      120.00     7.550     3.27   4.78749     2.02155   0.062917     -0.2356    2.25719
134      6        36         90       0      90.00      2.450     3.27   4.49981     0.89609   0.027222     -1.2753    2.17140
135      4         16        85       0      60.00      4.900     5.34   4.09434     1.58921   0.081667    -0.5003    2.08955
136      7        49        130       0      100.00     4.270     5.34   4.60517     1.45161   0.042700     -0.7398    2.19142
137      5        25         45       0      44.00      9.300     5.34   3.78419     2.23001   0.211364      0.3220    1.90804
138      3         9         80       0      75.00      8.550     5.34   4.31749     2.14593   0.114000     -0.1019    2.24783
139      6        36         90       0      125.00     7.750     5.34   4.82831     2.04769   0.062000     -0.2619    2.30959
140      3         9         75       0      80.00     10.750     5.34   4.38203     2.37491   0.134375      0.0999    2.27498
141      7        49         65       0      65.00     23.648     5.05   4.17439     3.16330   0.363822      1.1531    2.01021
142      5        25         60       0      60.00      3.600     5.05   4.09434     1.28093   0.060000     -0.7576    2.03851
143      4         16       115       0      100.00    16.520     5.05   4.60517     2.80457   0.165200      0.5001    2.30443
144      4         16        85       0      100.00    15.800     5.05   4.60517     2.76001   0.158000      0.4556    2.30443
145      5        25        200       0     400.00     20.300     5.05   5.99146     3.01062   0.050750      0.1741    2.83654
146      4         16       302       0     254.56     20.976     6.23   5.53954     3.04338   0.082401      0.3459    2.69748
147      5        25         40       0      40.20     20.550     6.23   3.69387     3.02286   0.511194      1.1528    1.87005
148      7        49        150       0      150.00    21.750     6.23   5.01064     3.07961   0.145000      0.7176    2.36198
149      5        25        360       0      147.00    18.881     6.23   4.99043     2.93816   0.128442      0.5227    2.41545
150      4         16       380       0     250.00     19.010     6.23   5.52146     2.94497   0.076040      0.2551    2.68987
151      5        25        200       0     500.00    224.100    11.58   6.21461     5.41209   0.448200      2.4817    2.93041
152      7        49        100       0      148.50    20.240    11.58   5.00058     3.00766   0.136296      0.6499    2.35776
153      3         9         95       0      140.50    18.000    11.58   4.94521     2.89037   0.128114      0.3785    2.51188
154      1         1        680       0      162.00    28.250    11.58   5.08760     3.34109   0.174383      0.6004    2.74072
155      4        16        500       0     293.20     44.800    11.58   5.68085     3.80221   0.152797      1.0453    2.75692
156      6        36        240       0     263.86     20.660    11.58   5.57542     3.02820   0.078299      0.4043    2.62386
157      5        25        998       0     352.75     21.000    11.58   5.86576     3.04452   0.059532      0.2609    2.78366
158      3         9        185       0      165.80    27.650    11.58   5.11078     3.31963   0.166767      0.7381    2.58153
159      4         16       160       0      124.00     7.500     8.91   4.82028     2.01490   0.060484     -0.3800    2.39492
160      7        49        300       0      133.50    11.250     8.91   4.89410     2.42037   0.084270      0.1074    2.31296
161      8        64         50       0      40.00      7.750     8.91   3.68888     2.04769   0.193750      0.2526    1.79506
162      9        81         38       0      40.00     11.128     8.91   3.68888     2.40946   0.278200      0.6120    1.79750
163      7         49        157      0     120.00      7.2998     8.91    4.78749     1.98785   0.06083    -0.28027    2.26812
164      4         16         80      0      118.50     8.0000     8.91    4.77491     2.07944   0.06751    -0.29639    2.37583
165      5         25        150      0      150.00     9.5000     8.91    5.01064     2.25129   0.06333    -0.17266    2.42395
166      7         49        40       0      80.50      3.7000     8.31    4.38826     1.30833   0.04596    -0.79184    2.10018
167      6         36        200      0     209.50     20.6235     8.31    5.34472     3.02643   0.09844     0.49961    2.52682
168      4         16        160      0     100.00      8.2500     8.31    4.60517     2.11021   0.08250    -0.19422    2.30443
169      4         16        60       0      74.50      9.1500     8.31    4.31080     2.21375   0.12282     0.03315    2.18060
170      7         49        120      0      100.13    11.3320     8.31    4.60647     2.42763   0.11317     0.23566    2.19197
171      8         64        130      0      116.50    21.3000     8.31    4.75789     3.05871   0.18283     0.81397    2.24474
172      8         64        120      0      98.14      7.3900     8.31    4.58640     2.00013   0.07530    -0.17247    2.17260
173      5         25        125      0      86.50      6.7500     8.31    4.46014     1.90954   0.07803    -0.28284    2.19239
174      5         25        100      0     100.00      5.3000     8.31    4.60517     1.66771   0.05300    -0.58568    2.25339
175      4         16        200      0      79.00     11.2500     8.31    4.36945     2.42037   0.14241     0.21509    2.20527
176      7         49        60       0      100.00    26.9500    11.28    4.60517     3.29398   0.26950     1.10256    2.19142
177      2          4        440      0      60.00     30.5900    11.28    4.09434     3.42067   0.50983      1.18892   2.23175
178      2          4        60       0      60.00     14.1600    11.28    4.09434     2.65042   0.23600     0.41867    2.23175
179      2          4        90       0      72.95      6.4500    11.28    4.28977     1.86408   0.08842    -0.44988    2.31396
180      3          9        69       0      83.00      8.7500    11.28    4.41884     2.16905   0.10542    -0.12141    2.29046
181      6         36        60       0      59.50      4.7500    11.28    4.08598     1.55814   0.07983    -0.43918    1.99732
182      2          4        105      0      85.15      13.9900   11.28    4.44441     2.63834   0.16430     0.25934    2.37901
183      4         16         80      0      95.00     10.2500    11.28    4.55388     2.32728   0.10789     0.04442    2.28285
184      6         36         80      0      84.50     13.5000    11.28    4.43675     2.60269   0.15976     0.45781    2.14487
185      5         25        40       0      70.00     12.1500    11.28    4.24850     2.49733   0.17357     0.39398    2.10335
186      9         81        120      0     240.60      9.8500    11.28    5.48314     2.28747   0.04094    -0.26479    2.55226
187      4         16        195      0     200.00      9.2500    11.28    5.29832     2.22462   0.04625    -0.37138    2.59601
188      3          9         90      0      130.00     8.6000    11.28    4.86753     2.15176   0.06615    -0.32745    2.47921
189      5         25        105      0        1.52    11.9000    11.28    0.41871     2.47654   7.82895      1.98420   0.49234
190      4         16        150      0     122.00     27.0400    11.28    4.80402     3.29732   0.22164     0.90924    2.38808
191      2          4         60      0      60.00     18.2000    11.28    4.09434     2.90142   0.30333     0.66967    2.23175
192      2          4         70      0      70.00     18.1500    11.28    4.24850     2.89867   0.25929     0.60208    2.29659
193      6         36         45      0      45.00      5.5000    11.28    3.80666     1.70475   0.12222    -0.17508    1.87983
194      3          9         50      0      25.00     12.7900    11.28    3.21888     2.54866   0.51160     0.76297     1.78569
195      3          9         70      0      59.84     20.0700    11.28    4.09167     2.99923   0.33539     0.84639    2.15284
196      5         25        100      0      123.00    44.8000    11.28    4.81218     3.80221   0.36423      1.46174   2.34047
197      2          4        310      0      118.65    19.8000    11.28    4.77618     2.98568   0.16688     0.46712    2.51856
198      4         16         35      0      35.00      6.1000     9.50    3.55535     1.80829   0.17429    -0.05453    1.86282
199      4         16         50      0      50.30      7.0000     9.50    3.91801     1.94591   0.13917    -0.06946    2.01537
200      6         36        310       0      70.00     9.0000     9.50    4.24850     2.19722    0.12857    0.13154    2.06568
201      3          9         60       0      58.20     9.2000     9.50    4.06389     2.21920    0.15808    0.07805    2.14115
202      6         36         60       0      79.50     6.0000     9.50    4.37576     1.79176    0.07547    -0.32746   2.11922
203      7         49        120       0     120.00     7.0000     9.50    4.78749     1.94591    0.05833    -0.32221   2.26812
204      5         25         65       0      72.15     14.0000    9.50    4.27875     2.63906    0.19404    0.52298    2.11608
205      5         25        120       0     96.35      15.9782    9.50    4.56799     2.77123    0.16583     0.53348   2.23775
206      2          4         81       0      80.10     16.0000    9.50    4.38328     2.77259    0.19975    0.41930    2.35329
207      6         36         80       0      87.00     9.0000     9.50    4.46591     2.19722    0.10345    0.04008    2.15714
208      8         64         95       0      74.00     11.6640    9.50    4.30407     2.45651    0.15762    0.40267    2.05384
209      5         25         54       0     44.80      8.7500     9.50    3.80221     2.16905    0.19531     0.25343    1.91562
210      5         25         18       0      58.75     4.2500    15.44    4.07329     1.44692    0.07234    -0.58273   2.02965
211      5         25         60       0      56.20     9.0000    15.44    4.02892     2.19722    0.16014    0.18624    2.01099
212      3          9         75       0      38.50     4.0000     9.20    3.65066     1.38629    0.10390    -0.58103    1.96733
213      6         36        150       0     200.06     9.0000     6.83    5.29862     2.19722    0.04499    -0.31020   2.50742
214      4         16         40       0      75.60     7.2500     6.83    4.32546     1.98100    0.09590    -0.20577   2.18677
215      5         25         50       0      82.20     15.2000    6.83    4.40916     2.72130    0.18491     0.55036   2.17094
216      5         25         50       0     132.70     11.7826    6.83    4.88809     2.46662    0.08879     0.09422   2.37240
217      7         49       200       0       94.00    19.0000    6.83    4.54329     2.94444    0.20213     0.7790    2.16539
218      4         16        60       0      797.00    29.9500    6.83    6.68085     3.39953    0.03758     0.2220    3.17758
219      6         36        45       0       56.25    56.5000    6.83    4.02981     4.03424    1.00444     2.0605    1.97369
220      4         16        80       0       84.00     7.0000    6.83    4.43082     1.94591    0.08333    -0.2852    2.23109
221      5         25        60       0      108.20     4.7000    6.83    4.68398     1.54756    0.04344    -0.7390    2.28654
222      3          9        100      0       58.50     7.7000    6.83    4.06903     2.04122    0.13162    -0.1021    2.14331
223      3          9        60       0       54.00     6.7500    6.83    3.98898     1.90954    0.12500    -0.2001    2.10964
224      3          9        54       0       49.00     4.0000    6.83    3.89182     1.38629    0.08163    -0.6825    2.06877
225      4         16        160      0       51.00     8.8300    6.83    3.93183     2.17816    0.17314     0.1570    2.02119
226      4         16        65       0      101.25     8.2500    6.83    4.61759     2.11021    0.08148    -0.1994    2.30966
227      8         64        58       0       79.00     6.2000    6.83    4.36945     1.82455    0.07848    -0.2568    2.08134
228      3          9        77       0       70.00     6.6500    5.34    4.24850     1.89462    0.09500     -0.3242   2.21881
229      4         16        60       0       53.60     3.6000    5.34    3.98155     1.28093    0.06716    -0.7612    2.04210
230      6         36        70       0      121.50     8.7500    5.34    4.79991     2.16905    0.07202    -0.1286    2.29764
231      3          9       850       0      400.35    34.2500    3.41    5.99234     3.53369    0.08555     0.5813    2.95236
232      2          4        50       0      250.00    25.0000    3.41    5.52146     3.21888    0.10000     0.3868    2.83207
233      3          9       560       0      300.00    21.8150    3.41    5.70378     3.08260    0.07272     0.2516    2.83098
234      1          1        100      0       75.25    12.5000    2.67    4.32082     2.52573    0.16611     0.1076    2.41817
235      9         81        65       0      123.63    11.3866    2.67    4.81729     2.43244    0.09210     0.1603    2.27217
236      2          4        70       0       71.35     6.0000    2.67    4.26760     1.79176    0.08409     -0.5129   2.30463
237      6         36        30       0       71.00     6.5000    2.67    4.26268     1.87180    0.09155    -0.1998    2.07165
238      1          1        33       0       30.00     4.1000    4.60    3.40120     1.41099    0.13667     -0.6203   2.03133
239      5         25        80       0       74.70     7.7500    4.60    4.31348     2.04769    0.10375    -0.0830    2.13069
240      6         36        56       0       63.00    11.4500    4.60    4.14313     2.43799    0.18175     0.4166    2.02136
241      7         49        100      0       64.50    11.3100    4.60    4.16667     2.42569    0.17535     0.4187    2.00696
242      5         25         10      0       30.00     4.0000    4.60    3.40120     1.38629    0.13333    -0.3606    1.74693
243      7         49        59       0       90.00     5.6200    4.60    4.49981     1.72633    0.06244    -0.4208    2.14710
244      5         25        100      0      107.00    16.0000   20.78    4.67283     2.77259    0.14953     0.4907    2.28185
245      7         49        63       0      232.75     1.7500   20.78    5.44996     0.55962    0.00752    -1.9872    2.5469
246      8         64        70       0       89.00     5.0000   20.78    4.48864     1.60944    0.05618    -0.5220    2.13148
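
The derived columns in the Cairo renter listing can be checked against the raw ones. The sketch below assumes definitions that are inferred from the listed values rather than stated in this appendix: HSIZESQ is HHSIZE squared, LMINCOME and LMGRENT are natural logarithms of MINCOME and MGRENT, RY is the rent-to-income ratio MGRENT/MINCOME, and RESIDUAL is LMGRENT minus PREDICT. The function name is illustrative only.

```python
import math

def derived_columns(hhsize, mincome, mgrent, predict):
    """Recompute the derived columns from the raw ones.

    Assumed (inferred) definitions:
      HSIZESQ  = HHSIZE ** 2
      LMINCOME = ln(MINCOME),  LMGRENT = ln(MGRENT)
      RY       = MGRENT / MINCOME
      RESIDUAL = LMGRENT - PREDICT
    """
    lmincome = math.log(mincome)
    lmgrent = math.log(mgrent)
    return {
        "HSIZESQ": hhsize ** 2,
        "LMINCOME": lmincome,
        "LMGRENT": lmgrent,
        "RY": mgrent / mincome,
        "RESIDUAL": lmgrent - predict,
    }

# Observation 44 as listed: HHSIZE 10, MINCOME 163.00, MGRENT 7.950, PREDICT 2.40428
row = derived_columns(hhsize=10, mincome=163.00, mgrent=7.950, predict=2.40428)
for name, value in row.items():
    print(f"{name:9s} {value:.5f}")
```

A check of this kind is also a convenient way to catch transcription errors in a listing: a row whose recomputed logarithm, ratio, or residual disagrees with the printed value beyond rounding deserves a second look.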


﻿DATA APPENDIX
MANUFACTURED DATA USED FOR FIGURE 4 AND TABLE 4
OBS         Y1         Y2      X         E1         E2
1     6.2572     4.7017   7.30486    -1.2214    -2.7769
2     3.7961     1.2889    2.76033   -0.2742    -2.7814
3     3.9352     7.1287    2.92823   -0.2609     2.9325
4     6.7654     2.7448    4.78812     1.1743   -2.8463
5     5.7980     6.8462    3.95260    0.8336     1.8818
6     2.9036    -1.8579    1.41929   -0.1608    -4.9224
7     4.8417     3.3707    3.94397   -0.1163    -1.5873
8     5.2678     3.8073    6.38352   -1.5199    -2.9803
9     7.0222     8.0525    7.74733   -0.7883     0.2420
10    11.0499    9.4522    9.42398     1.9819     0.3842
11     5.9499    6.1467    8.91342    -2.7352    -2.5384
12     7.2163    4.8124    7.81591    -0.6456    -3.0495
13     4.1508    5.0737    2.04735     0.6153     1.5382
14     9.1109    9.2515    9.83824    -0.2677    -0.1272
15     0.9573    -2.6896    1.32915   -2.0395    -5.6864
16     9.0866    10.3918   8.94844     0.3753     1.6805
17     5.9646    6.8467    6.45608    -0.8775     0.0046
18     8.2261    -0.1643   7.41044     0.6683    -7.7222
19     6.4915    5.6458    7.25698    -0.9512    -1.7969
20     7.2152     0.0384    8.12568   -0.8791    -8.0559
21     7.9828     7.0470    8.22832   -0.1885    -1.1243
22     2.9310     4.8966    3.41728   -1.6320     0.3336
23     5.7897     5.5461    4.24225    0.6080     0.3644
24     8.3074     8.6137    9.51738   -0.8306    -0.5244
25     8.2678     3.8932    8.67647   -0.2396    -4.6141
26     3.5404     4.9051    5.50754   -2.5903    -1.2255
27     6.6281     6.9439    5.21601    0.7161     1.0319
28     7.1956     8.1062    5.46228     1.0988    2.0095
29     5.4800     1.2020    4.57682    0.0473    -4.2306
30     2.2808     5.2443    2.54581   -1.6285     1.3349
31     8.0023     7.9911    7.40131    0.4513     0.4401
32     5.3462     6.1389    3.81178    0.4874     1.2800
33     5.1822    11.6926    4.57574   -0.2496     6.2607
34     6.1736    -2.9086    4.45625    0.8314    -8.2508
35     8.5563     5.4878   6.21945      1.8918   -1.1768
36     4.1185     4.2269   0.27373      1.9132    2.0216
37     1.3947     5.5413   0.59601    -1.0523     3.0942
38     5.6674     3.0515    7.06635   -1.6324    -4.2183
39     6.2627     7.5628    4.19621     1.1155    2.4157
40     5.2715     3.2858    5.75619   -1.0457    -3.0313
41     6.8996     9.6865    4.24299     1.7174    4.5043
42     3.0212     6.5399    1.95067   -0.4418     3.0769
43     3.8518     8.1030    4.86813   -1.7993     2.4519
44     8.7223    11.3889    8.71114    0.1890     2.8556
45     8.2958    18.8834    8.21094    0.1376    10.7252
46     1.9373    -2.5850    1.27457   -1.0186    -5.5409
47     3.4920     2.4327    1.68873    0.2254    -0.8338
48     4.0539    -1.7246    2.47829    0.1952    -5.5833
49     3.3034    -0.9916    2.68922   -0.7135    -5.0085
50     7.9437     9.3611    7.63800    0.2152     1.6326
51     2.7190    -0.7560    1.93477   -0.7321    -4.2070
52     8.2728    11.1079    7.62877    0.5512     3.3863
53     8.3185     6.6138    6.76854     1.2421   -0.4626
54     8.5273     5.0214    8.91924   -0.1621    -3.6681
55     6.6156     4.7760    5.61250    0.4062    -1.4334
56     9.8167    10.4806    9.30891    0.8350     1.4989
57     5.7180     9.2328    4.84727    0.0826     3.5973
58     7.5164     9.3803    8.11860   -0.5726     1.2914
59     8.7414     4.7444    9.36393   -0.2816    -4.2786
60     9.8128     2.6768    9.58399    0.6248    -6.5112
61     8.9143     9.4339    8.03762    0.8861     1.4057
62     8.7137    11.3970    8.20226    0.5620     3.2453
63     7.3125    11.5217    5.37305     1.2827    5.4919
64     5.0852     4.6769    4.78197   -0.5013    -0.9096
65     3.1596     7.6258    0.57190    0.7307     5.1968
66     3.2792     7.3876    1.94593   -0.1802     3.9282
67     5.1313     7.0664    5.27208   -0.8227     1.1123
68     7.9128     9.9160    7.83728    0.0349     2.0381
69     3.6012     6.1118    1.20883    0.6946     3.2051
70     7.8628     6.1583    6.77867    0.7788    -0.9257
71     8.2659    15.6029    9.04776   -0.5199     6.8171
72     7.0373     4.2921    5.63454    0.8114    -1.9338
73    10.0857    10.8948    9.63971    0.8559     1.6650
74     6.4872     4.8879    4.62658     1.0173   -0.5820
75     6.9800    16.5513    8.94754   -1.7306     7.8406
76     4.4206     0.9233    1.35229     1.4064   -2.0909
77     7.2138     8.7318    7.89473   -0.7073     0.8107
78     9.1808    10.7612    6.75661    2.1133     3.6937
79     9.9127     6.7066    8.38590     1.6232   -1.5828
80     2.7530     3.6563    1.80679    -0.6020    0.3012
81     6.1451     7.9387    6.74919   -0.9168     0.8768
82     4.3602     6.3511    3.62027   -0.3550     1.6359
83     7.0077     6.1692    5.85053    0.6198    -0.2187
84     8.7111    10.3237    9.79575   -0.6357     0.9769
85     6.6527    12.4448    7.12975   -0.6945     5.0975
86    10.4635     7.2518    9.69738     1.1905   -2.0213
87     3.1937     8.0237    3.80862   -1.6628     3.1672
88     0.9650     4.7521    1.44205   -2.1166     1.6706
89     7.9280     5.5978    6.52070     1.0375   -1.2927
90     3.8328    -1.0384    3.33984   -0.6721    -5.5433
91     2.8134     6.7692    2.61311    -1.1465    2.8094
92     7.9368     9.5197    8.53754   -0.4664     1.1165
93     2.6352     2.2180    0.37018    0.3576    -0.0596
94     2.8469     5.0158    1.61536   -0.3647     1.8042
95     8.3037     7.9988    9.40893    -0.7530   -1.0579
96     7.3079    10.4642    5.88674    0.8929     4.0492
97     7.7162     8.4037    8.49647    -0.6561    0.0314
98    -1.1956    -0.1309    0.15125   -3.3090    -2.2444
99     3.1250     4.3293    2.03925    -0.4044    0.7999
100     6.2774     1.9409   3.65065     1.5394    -2.7971
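
The manufactured values listed above appear to follow a simple linear rule. This is an inference from the numbers themselves, not a statement taken from this appendix: each Y looks like 2 + 0.75 X plus its own error term, with E1 paired with Y1 and E2 with Y2. The sketch below checks that assumed rule against two listed observations; the intercept, slope, and function name are all assumptions for illustration.

```python
def implied_y(x, e, intercept=2.0, slope=0.75):
    """Y implied by the assumed rule Y = intercept + slope * X + E."""
    return intercept + slope * x + e

# (OBS, Y1, Y2, X, E1, E2) copied from the listing
rows = [
    (2, 3.7961, 1.2889, 2.76033, -0.2742, -2.7814),
    (3, 3.9352, 7.1287, 2.92823, -0.2609, 2.9325),
]

for obs, y1, y2, x, e1, e2 in rows:
    gap1 = y1 - implied_y(x, e1)
    gap2 = y2 - implied_y(x, e2)
    # Gaps near zero (within rounding of the printed digits) support the rule.
    print(f"OBS {obs}: Y1 gap {gap1:+.4f}, Y2 gap {gap2:+.4f}")
```

If the rule holds, the two series share the same systematic part and differ only in the spread of their errors, which is visible in the much wider range of the E2 column.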


﻿DATA APPENDIX
MANUFACTURED DATA USED FOR FIGURE 5 AND TABLES 7, 9
OBS       RENT   HHSIZE    HSIZESQ   HH2    HH3   HH4    HH5    HH6   HH7    HH8    HHGE9
1     622.31      8        64       0      0     0      0      0     0      1       0
2     521.86      3          9      0      1     0      0      0     0      0       0
3    1093.25      3          9      0      1     0      0      0     0      0       0
4     715.37      5         25      0      0     0      1      0     0      0       0
5    1088.18      4         16      0      0      1     0      0     0      0       0
6     107.76      2          4      1      0     0      0      0     0      0       0
7     741.27      4         16      0      0      1     0      0     0      0       0
8     671.97      7         49      0      0     0      0      0      1     0       0
9     924.20      8         64      0      0     0      0      0     0      1       0
10    638.42      10       100       0      0     0      0     0      0      0      10
11     546.16      9        81       0      0     0      0      0     0      0       9
12    595.05       8        64       0      0     0      0     0      0      1      0
13     953.82      3         9       0      1     0      0      0     0      0       0
14    587.28      10       100       0      0     0      0     0      0      0      10
15     31.36       2         4       1      0     0      0     0      0      0      0
16    968.05       9        81       0      0     0      0     0      0      0       9
17    970.46       7        49       0      0     0      0      0     1      0       0
18     127.78      8        64       0      0     0      0      0     0      1       0
19    720.31       8        64       0      0     0      0     0      0      1      0
20      -5.59      9         81      0      0     0      0      0     0      0       9
21     687.57      9         81      0      0     0      0      0     0      0       9
22     933.36      4         16      0      0      1     0      0     0      0       0
23    1036.44      5         25      0      0     0      1      0     0      0       0
24     547.56     10        100      0      0     0      0      0     0      0      10
25     338.59      9         81      0      0     0      0      0     0      0       9
26     927.45      6         36      0      0     0      0      1     0      0       0
27    1153.19      6         36      0      0     0      0      1     0      0       0
28    1250.95      6         36      0      0     0      0      1     0      0       0
29     576.94      5         25      0      0     0      1      0     0      0       0
30     933.49      3          9      0      1     0      0      0     0      0       0
31     944.01      8         64      0      0     0      0      0     0      1       0
32    1028.00      4         16      0      0      1     0      0     0      0       0
33    1626.07      5         25      0      0     0      1      0     0      0       0
34     174.92      5         25      0      0     0      1      0     0      0       0
35     852.32      7         49      0      0     0      0      0      1     0       0
36     502.16      1          1      0      0     0      0      0     0      0       0
37     609.42      1          1      0      0     0      0      0     0      0       0
38     475.17      8         64      0      0     0      0      0     0      1       0
39    1241.57      5         25      0      0     0      1      0     0      0       0
40     746.87      6         36      0      0     0      0      1     0      0       0
41    1450.43      5         25      0      0     0      1      0     0      0       0
42     907.69      2          4      1      0     0      0      0     0      0       0
43    1245.19      5         25      0      0     0      1      0     0      0       0
44    1085.56      9         81      0      0     0      0      0     0      0       9
45    1372.52      9         81      0      0     0      0      0     0      0       9
46      45.91      2          4      1      0     0      0      0     0      0       0
47     516.62      2          4      1      0     0      0      0     0      0       0
48     241.67      3          9      0      1     0      0      0     0      0       0
49     299.15      3          9      0      1     0      0      0     0      0       0
50    1063.26      8         64      0      0     0      0      0     0      1       0
51     179.30      2          4      1      0     0      0      0     0      0       0
52    1238.63      8         64      0      0     0      0      0     0      1       0
53     923.74      7         49      0      0     0      0      0      1     0       0
54     433.19      9         81      0      0     0      0      0     0      0       9
55     906.66      6         36      0      0     0      0      1     0      0       0
56     749.89     10        100      0      0     0      0      0     0      0      10
57    1359.73      5         25      0      0     0      1      0     0      0       0
58     929.14      9         81      0      0     0      0      0     0      0       9
59     172.14     10        100      0      0     0      0      0     0      0      10
60     -51.12     10        100      0      0     0      0      0     0      0      10
61     940.57      9         81      0      0     0      0      0     0      0       9
62    1124.53      9         81      0      0     0      0      0     0      0       9
63    1599.19      6         36      0      0     0      0      1     0      0       0
64     909.04      5         25      0      0     0      1      0     0      0       0
65     819.68      1          1      0      0     0      0      0     0      0       0
66     992.82      2          4      1      0     0      0      0     0      0       0
67    1161.23      6         36      0      0     0      0      1     0      0       0
68    1103.81      8         64      0      0     0      0      0     0      1       0
69     920.51      2          4      1      0     0      0      0     0      0       0
70     877.43      7         49      0      0     0      0      0      1     0       0
71    1281.71     10        100      0      0     0      0      0     0      0      10
72     856.62      6         36      0      0     0      0      1     0      0       0
73     766.50     10        100      0      0     0      0      0     0      0      10
74     941.80      5         25      0      0     0      1      0     0      0       0
75    1584.06      9         81      0      0     0      0      0     0      0       9
76     390.91      2          4      1      0     0      0      0     0      0       0
77     981.07      8         64      0      0      0     0      0     0      1       0
78    1339.37      7         49      0      0     0      0      0      1     0       0
79     641.72      9         81      0      0     0      0      0     0      0       9
80     630.12      2          4      1      0     0      0      0     0      0       0
81    1057.68      7         49      0      0      0     0      0      1     0       0
82    1063.59      4         16      0      0      1     0      0     0      0       0
83    1028.13      6         36      0      0     0      0      1     0      0       0
84     697.69     10        100      0      0     0      0      0     0      0      10
85    1409.75      8         64      0      0     0      0      0     0      1       0
86     397.87     10        100      0      0      0     0      0     0      0      10
87    1216.72      4         16      0      0      1     0      0     0      0       0
88     767.06      2          4       1     0     0      0      0     0      0       0
89     840.73      7         49      0      0     0      0      0      1     0       0
90     345.67      4         16      0      0      1     0      0     0      0       0
91    1080.94      3          9      0      1      0     0      0      0     0       0
92     911.65      9         81      0      0      0     0      0      0     0       9
93     294.04      1          1      0      0     0      0      0      0     0       0
94     780.42      2          4      1      0      0     0      0     0      0       0
95     494.21     10        100      0      0      0     0      0     0      0      10
96    1454.92      6         36      0      0      0     0      1      0     0       0
97     803.14      9         81      0      0     0      0      0     0      0       9
98      75.56      1          1      0      0      0     0      0      0     0       0
99     879.99      3          9      0      1      0     0      0      0     0       0
100    620.29       4         16      0      0      1     0      0     0      0       0


APPENDIX F
OUTLINE OF SUGGESTED TABLES FOR URBAN HOUSING SURVEY REPORT
Introduction
This outline is intended as a basis for discussion in the final
selection of tables.
General Comments
The following is a partial list of tables which can be produced for
each city and for the larger towns in the 1983 Urban Housing Survey. In
addition to weighted counts, it is important that (1) weighted proportions,
and (2) the unweighted number of observations in each cell be included, so
that the reliability of the estimates is self-documented. The most important
criterion variables for the tables include structure type, age of the
structure, number of residential units, tenure, and income class. The design
of the tables will have to be modified for larger and smaller samples. For
example, there are eight income classes by two tenure groups by four water
outcomes, or 64 cells. This may be no problem in Nairobi, but in the smaller
towns we will have to use a smaller number of income classes and collapse the
water outcomes because of the smaller sample size. For example, we could use
four income classes and compute a recoded variable (1 = private and/or
communal piped water, 2 = no piped water). The classifications will need to
be collapsed in many other cases as well. For example, there are six
structure types and many possible values for "Number of Residential Units."
These criterion variables will have to be collapsed into a manageable number
of categories.
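The collapsing described above can be sketched in Python. This is only a
minimal illustration, not the survey's actual processing: the record layout
(income_class, water, weight), the water codes, and the pairing of eight
income classes into four are assumptions made for the example, and tenure is
omitted for brevity. Each cell carries the weighted count, the weighted
proportion within its income class, and the unweighted number of
observations.

```python
from collections import defaultdict

# Hypothetical water-supply codes; the survey's own codes may differ.
PIPED = {"private", "communal"}

def collapse_income(income_class):
    """Pair adjacent classes: 1-2 -> 1, 3-4 -> 2, 5-6 -> 3, 7-8 -> 4."""
    return (income_class + 1) // 2

def recode_water(water):
    """1 = private and/or communal piped water, 2 = no piped water."""
    return 1 if water in PIPED else 2

def cell_table(records):
    """Return {(income, water): (weighted count, weighted row share, n)}."""
    wsum = defaultdict(float)       # weighted count per cell
    n = defaultdict(int)            # unweighted observations per cell
    row_total = defaultdict(float)  # weighted count per income class
    for r in records:
        cell = (collapse_income(r["income_class"]), recode_water(r["water"]))
        wsum[cell] += r["weight"]
        n[cell] += 1
        row_total[cell[0]] += r["weight"]
    return {c: (wsum[c], wsum[c] / row_total[c[0]], n[c]) for c in wsum}

# Made-up records, for illustration only.
records = [
    {"income_class": 1, "water": "private", "weight": 10.0},
    {"income_class": 2, "water": "none", "weight": 30.0},
    {"income_class": 2, "water": "communal", "weight": 20.0},
    {"income_class": 8, "water": "private", "weight": 5.0},
]
table = cell_table(records)
for cell in sorted(table):
    print(cell, table[cell])
```

Reporting the unweighted n next to each weighted figure lets the reader see
at a glance which cells rest on very few interviews.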
The attached computer printout shows one possible format for a table
which looks at the percentage of households with piped water by income class
and tenure for Cairo. It is only one possible suggested format, and other
formats can be designed.
Many tables specify two criterion variables, e.g., income class and
tenure. It is extremely useful to break the tables out by (1) both criteria
(income and tenure) together and (2) each separately. See the attached
example.
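As a rough sketch of this breakout, the Python fragment below computes the
weighted percentage of households with piped water three ways: by income
class and tenure together, by income class alone, and by tenure alone. The
field names (income_class, tenure, piped, weight) and the records are
invented for illustration and are not taken from the survey.

```python
from collections import defaultdict

def percent_piped(records, keys):
    """Weighted percent of households with piped water, grouped by `keys`."""
    total = defaultdict(float)
    piped = defaultdict(float)
    for r in records:
        cell = tuple(r[k] for k in keys)
        total[cell] += r["weight"]
        if r["piped"]:
            piped[cell] += r["weight"]
    return {c: 100.0 * piped[c] / total[c] for c in total}

# Made-up records, for illustration only.
records = [
    {"income_class": "low", "tenure": "owner", "piped": True, "weight": 2.0},
    {"income_class": "low", "tenure": "renter", "piped": False, "weight": 6.0},
    {"income_class": "high", "tenure": "owner", "piped": True, "weight": 3.0},
    {"income_class": "high", "tenure": "renter", "piped": True, "weight": 1.0},
    {"income_class": "high", "tenure": "renter", "piped": False, "weight": 1.0},
]

by_both = percent_piped(records, ("income_class", "tenure"))
by_income = percent_piped(records, ("income_class",))
by_tenure = percent_piped(records, ("tenure",))
```

The one-way tables are computed from the records rather than by averaging
the two-way percentages, so each margin is weighted correctly.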
Another point to remember is that the sample size needed to reliably
estimate proportions is less than that needed to reliably estimate means or
medians (see, for example, Kish, Survey Sampling, ch. 2). It may be necessary
to collapse categories further for means and medians. A common rule of thumb
is to not report medians or averages for cells containing fewer than 25 sample
observations. Weighted counts should still be reported for small cells so
that tables sum up correctly.
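The rule of thumb above might be applied as in the following Python sketch.
The weighted median used here (the smallest value at which the cumulative
weight reaches half of the total weight) is one common definition chosen for
illustration, and the function names are hypothetical. The weighted count
and the unweighted n are always reported; the median is left blank (None)
when the cell has fewer than 25 sample observations.

```python
MIN_CELL_N = 25  # rule-of-thumb threshold quoted in the text

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("cannot take the median of an empty cell")

def cell_summary(values, weights):
    """Always report the weighted count and n; suppress the median if n < 25."""
    n = len(values)
    median = weighted_median(values, weights) if n >= MIN_CELL_N else None
    return {"weighted_count": sum(weights), "n": n, "median": median}

# Made-up cells, for illustration only.
large_cell = cell_summary(list(range(1, 31)), [1.0] * 30)
small_cell = cell_summary([100.0, 200.0, 300.0], [1.0, 1.0, 5.0])
```

Because the weighted count is kept even for suppressed cells, row and column
totals still add up.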
Comments on Table Specifications and Outline
Since the number of tables can become quite large, we recommend
producing several reports of varying detail for different audiences. One
possible work plan would include: (1) the production of a full set of tables
for each city, arranged by city, which can be used for reference and by
specialists and planners working in a particular town; (2) a one-volume report
following the outline of the table specifications (outline modified as
required) which presents conclusions and sample tables from representative
cities; and (3) an executive summary of no more than 25 pages which summarizes
the key findings of the survey for policymakers.
Regression models similar to those described in Section 1.3 can be
the topic of separate reports since the estimation of these models is likely
to be time consuming.
The following outline lists suggested tables by chapter. The
chapters refer to the one-volume report; chapter 1 can be modified to stand on
its own as the executive summary.
Chapter 1: Introduction and Summary
- describe the survey
- summarize main results by chapter
Chapter 2: Characteristics of the Current Housing Stock
A. Characteristics of Structures
1. Distribution of structures by type of structure
(e.g., house, maisonette).
2. Number of residential units by type of
structure.
3. Number of residents by type of structure, and
by age of structure.
4. Type of construction materials used in outer
walls, by type of structure.
5. Type of construction materials used for roofs,
by type of structure.
6. Type of construction materials used for floors,
by type of structure.
7. Age of structure, by type of structure.
8. Median estimated value of structure by type and
by age.
9. Ownership of unit, by structure type.
10. Type of scheme of structure, by structure type.
B. Characteristics of Households
1. Tenure, by income class.
2. Type of water supply, by income class and
tenure.
3. Type of sanitation, by income class and tenure.
4. Type of lighting, by income class and tenure.
5. Type of scheme, by income class and tenure.
6. Type of ownership, by income class and tenure.
7. Number of households reporting income from
rent, by income class and tenure.
8. Type of kitchen, by income class and tenure.
9. Type of bathing facilities, by income class and
tenure.
10. Number of rooms, median rent per room, and
median persons per room, by income class and
tenure.
11. Garbage disposal - frequency of collection and
type of disposal, by income class and tenure.
12. Proportion of units which contain servants'
quarters, by income class and tenure.
13. Median expenditures on food, rent, household
requirements, transport, water/light, and total
expenditure, by income class.
14. Tables on distances, cost of travel, mode of
transport to public amenities may be useful for
future reports.
15. Tables on opinions about neighborhood for
future reports.
Chapter 3: Estimating Housing Demand
A. Revealed Preferences of Recent Movers
1. Number of movers by previous and current
tenure.
2. Number of moves by income class and tenure.
3. Comparing the previous residential unit with
the current one by:
(a) type of unit (e.g., movement from flat to
house)
(b) number of rooms
(c) rent
(d) water supply
B. Revealed Preferences of Households Planning to Move
1. Number of planned moves by current and expected
tenure.
2. Number of planned moves by income class.
3. Comparing the current residential unit to the
one the household plans to move to, by:
(a) type of unit
(b) number of rooms
(c) rent
(d) water supply
C. Renters' Affordability and Willingness to Pay
1. Median monthly contract rent and gross rent
(rent plus utilities) paid by income class.
2. Table based on Table 2 of Analyzing an Urban
Housing Survey, using income class in place of
"Income Quintiles," and using rent-to-
consumption ratio in place of "Rent-to-Income
Ratio." Consumption must be used in place of
income because we do not have continuous income
measures but we do have continuous measures of
consumption.
D. The Housing Consumption of Owner Occupants
1. Median current value of owner-occupied
structures, by income class, structure type,
and number of residential units.
2. Table based on C-2 above, substituting current
value of structure for rent, and classified by
structure type and number of residential units.
Chapter 4: Housing Supply
A. Characteristics of Housing Supply*
1. Type of scheme (included in Chapter 2)
2. Type of loan (owners only) by income class
3. Source of finance (owners only) by income class
4. Land tenure (owners only) by income class
* MWH and planning department should give their
suggestions for additional tables.
B. Characteristics of the Rental Market
1. Proportion of total housing units for rental
2. Median monthly rent (contract and gross) by
number of rooms
3. Proportion of residential units where part of
the unit is sublet.