Applied Econometrics Normal Distribution


Applied Econometrics

Lecture 1: Normal Distribution

For many random variables, the probability distribution is a specific bell-shaped curve, called the

normal curve, or Gaussian curve. This is the most common and useful distribution in statistics.

1) Standard normal distribution

The standard normal distribution has the following probability density function:

$$Y = P(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^{2}}$$

Features of the curve are:

1) As z moves away from zero in either direction, z² increases in the negative exponent. Therefore, P(z) decreases, approaching 0 symmetrically in both tails.

2) The mean, which is zero (μ = 0), is the balancing point or the center of symmetry.

3) The standard deviation is one (σ = 1)

Example 1.1: If z has a standard normal distribution, find P(−2 < z < 2).¹

Solution: P(−2 < z < 2) = 1 − P(z < −2) − P(z > 2) = 1 − 2(0.023) = 0.954
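The table lookup in Example 1.1 can be checked numerically. The following is a minimal sketch that assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

# P(-2 < z < 2) as the difference of two cumulative probabilities.
p_inside = z.cdf(2.0) - z.cdf(-2.0)

# The same probability written as 1 minus the two tails, as in the solution.
p_tails = 1.0 - z.cdf(-2.0) - (1.0 - z.cdf(2.0))

print(round(p_inside, 4))  # about 0.9545; the table value 0.023 gives 0.954
```

The small difference from 0.954 comes only from rounding the tail probability to three decimals in the table.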

2) General normal distribution

The general normal distribution, with mean μ and standard deviation σ, has the following probability density function:

$$Y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$

The quantity Y, which is the height of the curve at any point along the scale of X, is known as the

probability density of that particular value of the variable quantity, X.

Example 2.1: The local authorities in a certain city install 2,000 electric lamps in the streets of the city. If these lamps have an average life of 1,000 burning hours, with a standard deviation of 200 hours, how many of the lamps might be expected to fail in the first 700 burning hours?

¹ If z is continuous, P(z ≥ c) = P(z > c). In other words, ≥ and > can be used interchangeably for any continuous random variable.

Written by Nguyen Hoang Bao May 17, 2004


Solution: We want the probability corresponding to the area under the normal curve below the standardized value t = (700 − 1,000)/200 = −1.5. Because the curve is symmetric, the area below −1.5 equals the area above +1.5, so we ignore the sign and enter the table at 1.5 to find that the probability of a life shorter than 700 hours is P = 0.067. Hence the expected number of failures is 2,000 × 0.067 = 134.
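As a check on the table lookup in Example 2.1, here is a short sketch that assumes Python 3.8+ and its `statistics.NormalDist` class. The names `lamp_life` and `n_lamps` are illustrative and not part of the original notes.

```python
from statistics import NormalDist

# Lamp life in burning hours, with the mean and spread given in Example 2.1.
lamp_life = NormalDist(mu=1000.0, sigma=200.0)
n_lamps = 2000

# Standardized value of 700 hours: (700 - 1000) / 200 = -1.5.
t = (700.0 - lamp_life.mean) / lamp_life.stdev

# Probability that a single lamp fails before 700 hours.
p_fail = lamp_life.cdf(700.0)

# Expected number of failures among the 2,000 lamps.
expected_failures = n_lamps * p_fail

print(t, round(p_fail, 3), round(expected_failures))  # -1.5 0.067 134
```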

Example 2.2: What number of lamps may be expected to fail between 900 and 1,300 burning hours?

Solution:

• Lamps failing before 900 hours: the corresponding value is t = (900 − 1,000)/200 = −0.5. Entering the table with this value of t, we find that the probability of failure below 900 hours is P = 0.309.

• Lamps failing after 1,300 hours: the corresponding value is t = (1,300 − 1,000)/200 = 1.5. Entering the table with this value of t, we find that the probability of failure over 1,300 hours is P = 0.067.

• Hence the probability of failure outside the limits 900 to 1,300 hours is 0.309 + 0.067 = 0.376. It follows that the number of lamps we may expect to fail outside these limits is 2,000 × 0.376 = 752. But we were asked for the number likely to fail inside the stated limits. This is 2,000 − 752 = 1,248.
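The three steps of Example 2.2 can be reproduced directly from the cumulative distribution function. This is a sketch under the same assumption as before (Python 3.8+ with `statistics.NormalDist`); the variable names are illustrative.

```python
from statistics import NormalDist

lamp_life = NormalDist(mu=1000.0, sigma=200.0)
n_lamps = 2000

# Lower tail: failure before 900 hours (t = -0.5).
p_below_900 = lamp_life.cdf(900.0)            # about 0.309

# Upper tail: failure after 1,300 hours (t = 1.5).
p_above_1300 = 1.0 - lamp_life.cdf(1300.0)    # about 0.067

# Probability of failing inside the limits, computed directly.
p_inside = lamp_life.cdf(1300.0) - lamp_life.cdf(900.0)

expected_inside = n_lamps * p_inside
print(round(expected_inside))  # about 1,249
```

The unrounded result is about 1,249 lamps, against 1,248 from the three-decimal table values; the gap is rounding error only.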

Example 2.3: After what period of burning hours would you expect that 10% of the lamps would

have failed?

Solution: We want the value of t corresponding to a lower-tail probability P = 0.1. Looking along the table, we find that when t = 1.25 the tail probability is P = 0.106, which is near enough for our purpose of prediction. Hence we may take it that about 10% of the lamps will have failed by 1.25 standard deviations below the mean. Since one standard deviation equals 200 hours, about 10% of the lamps will fail before 1,000 − 1.25 × 200 = 1,000 − 250 = 750 hours.
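Example 2.3 reads the table "backwards", from a probability to a value of t. The sketch below does the same with the inverse cumulative distribution function, assuming Python 3.8+ and `statistics.NormalDist.inv_cdf`; the names are illustrative.

```python
from statistics import NormalDist

lamp_life = NormalDist(mu=1000.0, sigma=200.0)

# Standardized value with 10% of the area below it.
t_10 = NormalDist().inv_cdf(0.10)     # about -1.28

# Burning hours by which 10% of the lamps are expected to have failed.
hours_10 = lamp_life.inv_cdf(0.10)    # about 744 hours

print(round(t_10, 2), round(hours_10))
```

The exact quantile is about −1.28 rather than the tabulated −1.25, which moves the answer from 750 to roughly 744 hours; for prediction purposes the two agree.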

3) Moment-based characteristics of a distribution

First moment

Mean > Median: the distribution is skewed to the right

Mean ≅ Median ≅ Mode: the distribution is approximately symmetric

Mean < Median: the distribution is skewed to the left


Second moment

The spread of a distribution is measured by its standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{n-1}}$$

Third moment

Coefficient of skewness, where S is the sample standard deviation:

$$a_3 = \frac{1}{n S^{3}} \sum_{i=1}^{n} (X_i - \bar{X})^{3}$$

• Cubing preserves the sign of a deviation but inflates larger deviations proportionally much more than smaller ones. If the distribution is symmetric, the negative and positive cubed deviations cancel each other out.

• The cube of the standard deviation in the denominator standardizes the measure and so removes its dimension (i.e., it does not depend on the units in which the variable is measured).

• If a₃ > 0, the distribution is skewed to the right (its long tail is to the right) and the mean is greater than the median.

• If a₃ ≅ 0, the distribution is approximately symmetric (as a normal distribution is) and the mean is approximately equal to the median.

• If a₃ < 0, the distribution is skewed to the left (its long tail is to the left) and the mean is smaller than the median.
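The coefficient of skewness can be computed in a few lines. The sketch below follows the formula in these notes (deviations cubed, divided by n times the cube of the n − 1 standard deviation); the function names and the two small data sets are hypothetical and serve only as illustration.

```python
import math
from statistics import median


def sample_std(xs):
    """Sample standard deviation S with the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def skewness(xs):
    """Coefficient of skewness a3 = (1 / (n * S**3)) * sum((x - mean)**3)."""
    n = len(xs)
    mean = sum(xs) / n
    s = sample_std(xs)
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)


# Hypothetical data: one symmetric sample and one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 20.0]

a3_symmetric = skewness(symmetric)   # cubed deviations cancel: 0
a3_right = skewness(right_skewed)    # positive: long tail to the right

mean_right = sum(right_skewed) / len(right_skewed)
print(a3_symmetric, a3_right > 0, mean_right > median(right_skewed))
```

In the right-skewed sample the single large value pulls the mean (about 4.8) above the median (2), in line with the rule for a₃ > 0.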

Fourth moment

Coefficient of kurtosis, where S is the sample standard deviation:

$$a_4 = \frac{1}{n S^{4}} \sum_{i=1}^{n} (X_i - \bar{X})^{4}$$

• Fourth powers make every deviation positive and inflate larger deviations even more than cubes or squares do.

• Heavy tails therefore inflate the numerator proportionally more than the denominator: the fatter the tails, the higher the kurtosis.

• The fourth power of the standard deviation in the denominator standardizes the measure and renders it dimensionless.


• If a₄ > 3, the distribution has heavier tails than a normal distribution.

• If a₄ < 3, the distribution has thinner tails than a normal distribution. The extreme case is a rectangular (uniform) distribution, which has a body but no tails.
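To see the benchmark value of 3 at work, the sketch below computes a₄ for three simulated samples. The simulated data, the seed, and the choice of a double-exponential (Laplace) distribution as the heavy-tailed case are assumptions made for illustration; they do not come from the original notes.

```python
import math
import random


def kurtosis(xs):
    """Coefficient of kurtosis a4 = (1 / (n * S**4)) * sum((x - mean)**4)."""
    n = len(xs)
    mean = sum(xs) / n
    # Sample standard deviation with the n - 1 divisor, as in the notes.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 4 for x in xs) / (n * s ** 4)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

normal_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform_sample = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Double-exponential draws: an exponential magnitude with a random sign.
heavy_sample = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)]

k_normal = kurtosis(normal_sample)    # close to 3
k_uniform = kurtosis(uniform_sample)  # close to 1.8, below 3 (no tails)
k_heavy = kurtosis(heavy_sample)      # well above 3 (heavy tails)

print(round(k_normal, 2), round(k_uniform, 2), round(k_heavy, 2))
```

The theoretical values are 3 for the normal, 1.8 for the rectangular and 6 for the double-exponential distribution; the sample values scatter around them.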

Table 3.1: Moment-based characteristics of a distribution

| Moment | Measure | Population | Sample | X ∼ N(0,1) |
|---|---|---|---|---|
| First moment | Center | $E(X) = \mu$ | $\bar{X} = (1/n) \sum X_i$ | 0 |
| Second moment | Spread | $E(X-\mu)^2 = \sigma^2$ | $S^2 = [1/(n-1)] \sum (X_i - \bar{X})^2$ | 1 |
| Third moment | Skewness | $(1/\sigma^3)\, E(X-\mu)^3$ | $a_3 = (1/nS^3) \sum (X_i - \bar{X})^3$ | 0 |
| Fourth moment | Kurtosis | $(1/\sigma^4)\, E(X-\mu)^4$ | $a_4 = (1/nS^4) \sum (X_i - \bar{X})^4$ | 3 |

4) The skewness – kurtosis (Jarque – Bera) test for normality

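The null hypothesis of normality, H₀, is stated in terms of the population skewness α₃ and the population kurtosis α₄:

$$H_0: \alpha_3 = 0 \ \text{and} \ \alpha_4 = 3$$

against

$$H_1: \alpha_3 \neq 0 \ \text{or} \ \alpha_4 \neq 3 \ \text{or both}$$

The relevant test statistic is BJ, which under H₀ follows (in large samples) a chi-square distribution with two degrees of freedom:

$$BJ = \frac{n}{6}\, a_3^{2} + \frac{n}{24}\, (a_4 - 3)^{2}$$

If BJ > 5.99 (the 5% critical value of the chi-square distribution with two degrees of freedom), normality is formally rejected.

If BJ ≤ 5.99, normality cannot be rejected; this does not prove that the data are normally distributed.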
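The test can be sketched in code as follows. The statistic uses the a₃ and a₄ formulas of these notes (with the n − 1 standard deviation); the function name, the simulated exponential sample and the seed are hypothetical. The p-value line relies on the fact that a chi-square variable with two degrees of freedom has upper-tail probability exp(−x/2), which is why the 5% critical value is about 5.99.

```python
import math
import random


def jarque_bera(xs):
    """BJ = (n / 6) * a3**2 + (n / 24) * (a4 - 3)**2."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    a3 = sum((x - mean) ** 3 for x in xs) / (n * s ** 3)
    a4 = sum((x - mean) ** 4 for x in xs) / (n * s ** 4)
    return (n / 6.0) * a3 ** 2 + (n / 24.0) * (a4 - 3.0) ** 2


CRITICAL_5PCT = 5.99  # chi-square, two degrees of freedom, 5% level

# Hypothetical strongly right-skewed sample (exponential draws).
rng = random.Random(1)
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

bj = jarque_bera(skewed_sample)
p_value = math.exp(-bj / 2.0)       # upper-tail area of chi-square(2)
reject_normality = bj > CRITICAL_5PCT

print(round(bj, 1), reject_normality)
```

For a sample this skewed the statistic is far above 5.99, so normality is rejected. For a sample drawn from a normal distribution the statistic would fall below 5.99 in roughly 95% of cases.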

5) Transformations towards normality

If the data are unimodal but skewed, a data transformation is called for to correct for the skewness in the data. To do this we rely on the ladder of power transformations, which enables us to correct for differences in the direction of skewness (positive or negative) and its strength. Often, but not always, a transformation renders the transformed data symmetric, and, hopefully, also more normal in shape. If so, the classical model of inference about the population mean using the sample mean as estimator can again be used. Table 5.1 illustrates the hierarchy of these power transformations and their impact on the skewness in the data.


Table 5.1: Ladder of Power to Reduce Skewness

| Power p | Transformation | Effect on skewness |
|---|---|---|
| 3 | X³ | Reduces extreme negative skewness |
| 2 | X² | Reduces negative skewness |
| 1 | X | Leaves data unchanged |
| 0 | ln X | Reduces positive skewness |
| −1 | 1/X | Reduces extreme positive skewness |

The power used in a transformation need not be an integer; fractional powers (for example p = 1/2, the square root) can be used as well. The choice of an appropriate transformation often involves a trade-off between one which is ideal for the purposes of data analysis and one which performs reasonably well on this count but also has the advantage that it lends itself to a more straightforward interpretation (in substantive terms) of the results.
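Moving down the ladder can be illustrated on simulated data. In the sketch below, the helper `power_transform`, the log-normal "income" sample and the seed are hypothetical choices made for illustration; p = 0 is taken to mean ln X, as in Table 5.1, and the data must be strictly positive for the logarithm and the fractional power to be defined.

```python
import math
import random


def skewness(xs):
    """Coefficient of skewness a3 with the n - 1 standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum((x - mean) ** 3 for x in xs) / (n * s ** 3)


def power_transform(xs, p):
    """Ladder-of-powers transform: X**p, with p = 0 meaning ln X."""
    if p == 0:
        return [math.log(x) for x in xs]
    return [x ** p for x in xs]


# Hypothetical positively skewed, strictly positive data (log-normal draws).
rng = random.Random(2)
incomes = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(5000)]

# Step down the ladder and record the skewness left after each power.
skew_by_power = {p: skewness(power_transform(incomes, p)) for p in (1, 0.5, 0)}

for p, a3 in skew_by_power.items():
    print(p, round(a3, 2))
```

For this sample the square root removes part of the positive skewness and the logarithm removes nearly all of it, since the logarithm of a log-normal variable is normal. With real data the best power has to be found by trial, and it may lie between the steps shown.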



Workshop 1: Normal Distribution

1) Phil and Kim Bell do not know whether to buy a house now or wait a year, in which case a price

increase may put a house beyond their reach. Their best guess is that, if they wait a year, the

price increase will be approximately normal, with a mean of 8% and, reflecting the uncertainty

of the market, a standard deviation of 10%.

1.1) If the price increase exceeds 25% they feel they will be unable to afford a house. What is

the chance of this?

1.2) On the other hand, if the price drops, they will have won their gamble handsomely. What is

the chance of this?

2) Using the data file SOCECON (with the world socioeconomic data for 1990) on the diskette,

make histograms and compute means, medians and modes for the following variables:

GNP (gross national product) per capita

HDI (human development index)

FERT (fertility rate)

LEXPM and LEXPF (male and female life expectancy)

POPGRWTH (population growth rate)

In each case, discuss the different averages in the light of the shape of the empirical distribution.

Would you say that any of the distributions is reasonably symmetrical and bell-shaped?

3) Collect the macroeconomic indicators Y (GDP), I (Investment), C (Consumption), X (Exports) and M (Imports) at constant prices from the World Development Indicators 2003 for 200 countries, then:

3.1) Make histograms and compute means, medians and modes for the above variables

3.2) Calculate the coefficients of skewness and kurtosis

3.3) Use the Jarque – Bera test for normality of each variable

3.4) Transform each variable towards normality


4) Collect data on life expectancy (LE) and GDP per capita (Y) for 200 countries (WDI 2003), then:

4.1) Plot the histogram (frequency graph) for each of your two samples (life expectancy and income per capita)

4.2) Calculate the mean, mode, and median for each of your two samples

4.3) Calculate the skewness and kurtosis for each of your two samples

4.4) Use the Jarque – Bera test for normality for each of your two samples

4.5) In each case, find the most appropriate transformation so that the data are approximately normal

4.6) Calculate the regression coefficients from regressing LE on Y using the following functional forms:

$LE = a_0 + a_1 Y$

$\ln(LE) = b_0 + b_1 Y$

$LE = c_0 + c_1 \ln Y$

$\ln(LE) = d_0 + d_1 \ln Y$

and compare their coefficients of determination

4.7) Which of the models you have estimated best fits the data? Discuss your results

4.8) Is there a clear direction of causality between LE and Y?
