STATISTICS: The study of methods for collecting,
organizing, and analyzing data
oDescriptive Statistics: Procedures used to organize and
present data in a convenient and communicable form
oInferential Statistics: Procedures employed to
arrive at broader conclusions or inferences about
populations on the basis of samples
POPULATION: The complete set of actual or
potential elements about which inferences are made
SAMPLE: A subset of the population selected using
some sampling method
oSampling methods
Cluster sample: A population is divided into
groups called clusters; some clusters are randomly
selected, and every member in them is observed
Stratified sample: The population is divided into
strata, and a fixed number of elements of each
stratum are selected for the sample
Simple random sample: A sample selected so that
each possible sample of the same size has an equal
probability of being selected; used for most
elementary inference
VARIABLE: An attribute of elements of a population
or sample that can be measured; ex: height, weight,
IQ, hair color, and pulse rate are some of the many
variables that can be measured for people
DATA: Values of variables that have been
observed
oTypes of data
Qualitative (or "categorical") data are descriptive
but not numeric; ex: your gender, your birthplace,
the color of an automobile
Quantitative data take numeric values
Discrete data take counting numbers (0, 1, 2, ...) as
values, usually representing things that can be
counted; ex: the number of fleas on a dog, the
number of times a professor is late in a semester
Continuous data can take a range of numeric
values, not just counting numbers; ex: the height of
a child, the weight of a bag of beans, the amount of
time a professor is late
oLevels of measurement
Qualitative data can be measured at the:
oNominal level: Values are just names, without any
order; ex: color of a car, major in college
oOrdinal level: Values have some natural order;
ex: high school class (freshman/sophomore/
junior/senior), military rank
Quantitative data can be measured at the:
oInterval level: Numeric data with no natural zero
point; intervals (differences) are meaningful, but
ratios are not; ex: temperature in Fahrenheit
degrees; 80°F is 20°F hotter than 60°F, but it is not
150% as hot
oRatio level: Numeric data for which there is a
true zero; both intervals and ratios are
meaningful; ex: weight, length, duration, most
physical properties
STATISTIC: A numeric measure computed
from sample data, used to describe the sample and to
estimate the corresponding population parameter
PARAMETER: A numeric measure that describes a
population; parameters are usually not computed, but
are inferred from sample statistics
FREQUENCY DISTRIBUTION
Provides the frequency (number of times observed)
of each value of a variable
MEASURES OF DISPERSION
SUM OF SQUARES (SS): The sum of squared
deviations from the mean
oPopulation SS: Σ(xᵢ − μ)² or Σxᵢ² − (Σxᵢ)²/N
Table #1: Students in a driving class are polled
regarding the number of accidents they've had:

x (# of accidents)   f (frequency)   RF (relative frequency)
5                     2              0.0351
4                     3              0.0526
3                     9              0.1579
2                    15              0.2632
1                    16              0.2807
0                    12              0.2105
Table #2: The scores on a midterm exam are grouped
into classes:

class    f    cumulative freq.
90-99    4    80
80-89   18    76
70-79   31    58
60-69   19    27
50-59    7     8
40-49    1     1
MEAN: Most commonly used measure of central
tendency, usually meant by "average"; sensitive to
extreme values
SAMPLE MEAN
x̄ = (1/n) Σxᵢ
oTrimmed mean: Computed discarding some
number of the highest and lowest values; less
sensitive than the ordinary mean
oWeighted mean: Computed with a weight
multiplied to each value, making some values
influence the mean more heavily than others:
x̄w = Σwᵢxᵢ / Σwᵢ
MEDIAN: Value that divides the set so the same
number of observations lie on each side of it; less
sensitive to extreme values; for an odd number of
values, it is the middle value; for an even number, it
is the average of the middle two; ex: in Table 1
(n = 57), the median is the 29th ordered
observation, or 2
MODE: Observation that occurs with the greatest
frequency; ex: in Table 1, the mode is 1
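The central tendency measures for Table 1 can be checked with a short sketch in Python's statistics module, expanding the frequency table into the raw 57 observations:

```python
from statistics import mean, median, mode

# Table 1 expanded: each value x repeated f times
freq = {5: 2, 4: 3, 3: 9, 2: 15, 1: 16, 0: 12}
data = [x for x, f in freq.items() for _ in range(f)]

print(len(data))             # 57 observations
print(mode(data))            # 1, the most frequent value (f = 16)
print(median(data))          # 2, the 29th of the 57 ordered values
print(round(mean(data), 2))  # 1.67; the mean is pulled up by the few large values
```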
POPULATION MEAN
μ = (1/N) Σxᵢ
oSample SS: Σ(xᵢ − x̄)² or Σxᵢ² − n·x̄²
VARIANCE: The average of squared differences
between observations and their mean
oPopulation variance: σ² = (1/N) Σ(xᵢ − μ)²
oSample variance: s² = [1/(n − 1)] Σ(xᵢ − x̄)²
oVariances for grouped data (G classes with
midpoints mᵢ and frequencies fᵢ):
Population: σ² = (1/N) Σ fᵢ(mᵢ − μ)²
Sample: s² = [1/(n − 1)] Σ fᵢ(mᵢ − x̄)²
STANDARD DEVIATION: The square root of the
variance; unlike variance, it has the same units as the
original data and is more commonly used;
ex: Pop. S.D.: σ = √[(1/N) Σ(xᵢ − μ)²]
STANDARD SCORES: Also known as z-scores;
the standard score of a value is the directed number
of standard deviations from the mean at which the
value is found; that is, z = (x − μ)/σ
oA positive z-score indicates a value greater than the
mean; a negative z-score indicates a value less than
the mean; a z-score of zero indicates the mean value
oConverting every value in a data set or distribution
to a z-score is called standardization; once a data set
or distribution has been standardized, it has a new
mean μ = 0 and a new standard deviation σ = 1
RELATIVE FREQUENCY DISTRIBUTION: Each
frequency is divided by the total number of observa-
tions to produce the proportion or percentage of the
data set having that value; ex: third column of Table 1
CUMULATIVE FREQUENCY DISTRIBUTION:
Frequencies count all observations at a particular value
or class and all those less; ex: third column of Table 2
GROUPED FREQUENCY DISTRIBUTION:
Values of the variable are grouped into classes;
ex: Table 2
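The dispersion formulas above translate directly into code; a minimal sketch (the data list is hypothetical, chosen so the population variance comes out to a round number):

```python
from math import sqrt

def pop_variance(xs):
    """Population variance: sigma^2 = (1/N) * sum((x - mu)^2)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sample variance: s^2 = (1/(n-1)) * sum((x - xbar)^2), the unbiased form."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def z_score(x, mu, sigma):
    """Directed number of standard deviations of x from the mean."""
    return (x - mu) / sigma

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical values; mean = 5
print(pop_variance(data))          # 4.0
print(sqrt(pop_variance(data)))    # sigma = 2.0, in the same units as the data
print(z_score(9, 5, 2.0))          # 2.0 standard deviations above the mean
```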
GRAPHING TECHNIQUES
BAR GRAPH: A graph that uses bars to indicate the
frequency of occurrence of observations
oHistogram: A bar graph used with quantitative,
continuous variables
FREQUENCY CURVE: A graph representing a
frequency distribution in the form of a continuous
line that traces a histogram
oCumulative frequency curve: A continuous line
that traces a histogram where bars in all the lower
classes are stacked up in the adjacent higher class;
cannot have a negative slope
oSymmetric curve: The frequency curve is unchanged
if rotated around its center; median = mean
oNormal curve: Bell-shaped curve; symmetric
oSkewed curve: Deviates from symmetry; the
frequency curve is shifted, with a longer "tail" to the
left (mean < median) or to the right (mean > median)
[Figure: SKEWED CURVE; example frequency curves with a longer tail to the left and to the right]
PROBABILITY
A measure of the likelihood of a random event; the
long-term relative frequency with which an outcome
or event occurs
Probability of occurrence of Event A:
P(A) = (number of outcomes favoring Event A) / (total number of outcomes)
·Sample space: All possible simple outcomes of an
experiment
• Relationships between events
Exhaustive: 2 or more events are said to be
exhaustive if they represent all possible outcomes
oSymbolically, P(A or B or ...) = 1
Nonexhaustive: Two or more events are said to be
nonexhaustive if they do not exhaust all possible
outcomes
Mutually exclusive: Events that cannot occur
simultaneously: P(A and B) = 0, and P(A or B) =
P(A) + P(B); ex: males, females
Nonmutually exclusive: Events that can occur
simultaneously: P(A or B) = P(A) + P(B) − P(A and
B); ex: males, brown eyes
Independent: Events whose probability is
unaffected by occurrence or nonoccurrence of each
other: P(A|B) = P(A); P(B|A) = P(B); and P(A and
B) = P(A)P(B); ex: gender and eye color
Dependent: Events whose probability changes
depending upon the occurrence or nonoccurrence
of each other: P(A|B) differs from P(A); P(B|A)
differs from P(B); and P(A and B) = P(A)·P(B|A) =
P(B)·P(A|B); ex: race and eye color
•Joint probabilities: Probability that 2 or more
events occur simultaneously
•Marginal (or unconditional) probabilities: Found
by summing joint probabilities over the other events
•Conditional probabilities: Probability of A given
the occurrence of B, written P(A|B)
•Ex: Given the numbers 1 to 9 as observations in a
sample space:
-Events mutually exclusive and complementary;
ex: P(all odd numbers); P(all even numbers)
-Events mutually exclusive but not complementary;
ex: P(an even number); P(the numbers 7 and 5)
-Events neither mutually exclusive nor exhaustive;
ex: P(an even number or a 2)
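The rules above can be checked on the 1-to-9 sample space with exact fractions (a sketch; the event definitions mirror the examples in the text):

```python
from fractions import Fraction

def p(event):
    """P(A) = outcomes favoring A / total outcomes, over the digits 1..9."""
    return Fraction(sum(1 for x in range(1, 10) if event(x)), 9)

even = lambda x: x % 2 == 0
odd = lambda x: x % 2 == 1
two = lambda x: x == 2

# Complementary (mutually exclusive and exhaustive) events sum to 1:
print(p(even) + p(odd))                                    # 1
# "An even number or a 2" is not mutually exclusive, so subtract the overlap:
print(p(even) + p(two) - p(lambda x: even(x) and two(x)))  # 4/9
```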
A random variable takes numeric values randomly,
with probabilities specified by a probability distri
bution (or density) function
• Discrete random variables: Take only distinct
values (as with quantitative data)
• Binomial distribution: A model for the number (x)
of successes in a series of n independent trials, where
each trial results in success with probability p or
failure with probability 1 − p; ex: the number (x) of
heads ("successes") obtained in 12 (n) tosses of a
fair (probability of heads = p = 0.5) coin
P(x) = nCx · p^x (1 − p)^(n−x), where P(x) is the
probability of exactly x successes out of n trials with
a constant probability p of success on each trial;
nCx = n!/[(n − x)! x!]
Binomial mean: μ = np
Binomial variance: σ² = np(1 − p)
As n increases, the binomial approaches the
normal distribution
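The coin-toss example can be computed directly from the formula (a sketch using the standard library's binomial coefficient):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x) = nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 12, 0.5                  # 12 tosses of a fair coin
print(binom_pmf(6, n, p))       # P(exactly 6 heads) = 924/4096, about 0.226
print(n * p)                    # binomial mean: np = 6.0
print(n * p * (1 - p))          # binomial variance: np(1-p) = 3.0
```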
•Hypergeometric distribution:
-Represents the number of successes from a series
of n trials where each trial results in success or
failure
-Like the binomial, except that each trial is drawn
from a small population with N elements split
between N₁ successes and N₂ failures
-Then the probability of splitting the n trials
between x₁ successes and x₂ failures is:
P(x₁ and x₂) = [N₁!/(x₁!(N₁ − x₁)!)] · [N₂!/(x₂!(N₂ − x₂)!)] / [N!/(n!(N − n)!)]
-Hypergeometric mean: μ₁ = E(x₁) = nN₁/N
-Hypergeometric variance:
σ² = [(N − n)/(N − 1)] · [nN₁/N] · [N₂/N]
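The hypergeometric probability is a ratio of binomial coefficients, which makes it easy to sketch; the population split below (7 successes, 13 failures) is hypothetical:

```python
from math import comb

def hypergeom_pmf(x1, n, N1, N2):
    """P(x1 successes in n draws) = C(N1,x1) * C(N2,n-x1) / C(N1+N2,n)."""
    x2 = n - x1
    return comb(N1, x1) * comb(N2, x2) / comb(N1 + N2, n)

N1, N2, n = 7, 13, 5              # hypothetical: 5 draws, 7 successes among N = 20
N = N1 + N2
print(hypergeom_pmf(2, n, N1, N2))                  # P(exactly 2 successes)
print(n * N1 / N)                                   # mean nN1/N = 1.75
print((N - n) / (N - 1) * (n * N1 / N) * (N2 / N))  # variance formula above
```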
·Poisson distribution: A model for the number of
occurrences of an event x = 0, 1, 2, ..., counted over
some fixed interval of space or time rather than
some fixed number of trials; the parameter is the
average number of occurrences, λ; for
x = 0, 1, 2, 3, ... and λ > 0:
P(x) = (λ^x · e^(−λ)) / x!, otherwise P(x) = 0
Poisson mean and variance: both equal λ
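A sketch of the Poisson pmf, confirming numerically that the mean and variance both come out to λ (the rate λ = 3 is hypothetical):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 3.0                       # hypothetical: average of 3 occurrences per interval
print(poisson_pmf(0, lam))      # P(no occurrences) = e^-3, about 0.0498
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
var = sum((x - lam) ** 2 * poisson_pmf(x, lam) for x in range(100))
print(round(mean, 6), round(var, 6))   # both approximately 3.0 = lambda
```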
FREQUENCY TABLE

          Event C   Event D   Totals
Event E      52        35        87
Event F      62        71       133
Totals      114       106       220

Ex: Joint probability of C and E:
P(C and E) = 52/220 = 0.24
JOINT, MARGINAL & CONDITIONAL
PROBABILITY TABLE

             Event C      Event D      Marginal prob.   Conditional prob.
Event E       0.24         0.16          0.40           P(C|E)=0.60; P(D|E)=0.40
Event F       0.28         0.32          0.60           P(C|F)=0.47; P(D|F)=0.53
Marginal
probability   0.52         0.48          1.00
Conditional   P(E|C)=0.46  P(E|D)=0.33
probability   P(F|C)=0.54  P(F|D)=0.67
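The joint, marginal, and conditional values in the table can be recomputed from the frequency counts (a sketch; the cell names mirror the table above):

```python
# Counts from the frequency table (row event, column event) -> count
counts = {("C", "E"): 52, ("D", "E"): 35,
          ("C", "F"): 62, ("D", "F"): 71}
total = sum(counts.values())           # 220

def joint(a, b):
    """P(A and B): joint probability of a column event and a row event."""
    return counts[(a, b)] / total

def marginal_row(b):
    """P(B): sum the joint probabilities across the row."""
    return sum(c for (a2, b2), c in counts.items() if b2 == b) / total

def conditional(a, given_b):
    """P(A|B) = P(A and B) / P(B)."""
    return joint(a, given_b) / marginal_row(given_b)

print(round(joint("C", "E"), 2))        # 0.24
print(round(marginal_row("E"), 2))      # 0.4  (87/220)
print(round(conditional("C", "E"), 2))  # 0.6  (52/87)
```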
 A continuous random variable may take on any
value along an uninterrupted interval of a number line
 Probabilities are measured only over intervals,
never for single values; the probability that a
continuous random variable falls between two
values is exactly equal to the area under the density
curve between those two values
•Normal distribution: Bell curve; a distribution
whose values cluster symmetrically around the
mean (also median and mode); common in nature
and important in making inferences
-The density curve is the graph of:
f(x) = [1/(σ√(2π))] · e^(−(x − μ)²/(2σ²))
where f(x) = frequency (density) at a given value
σ = standard deviation of the normal distribution
μ = the mean of the normal distribution
x = value of the normally distributed variable
·Standard normal distribution: A normal distri-
bution with a mean of 0 and a standard deviation of 1;
values following a normal distribution can be
transformed to the standard normal distribution by
using z-scores [see Measures of Dispersion, page 1]
STATISTICAL INFERENCE
• In order to make inferences about a population,
which is unobserved, a random sample is drawn
The sample is used to compute statistics, which are
then used to draw probability conclusions about the
parameters of the population
Population (unobserved) --random sampling--> Sample (observed)
        |                                         |
   measured by                               measured by
        v                                         v
Parameters (unknown) <--statistical inference-- Statistics (known)
BIASED & UNBIASED
ESTIMATORS
• Unbiased estimator of a parameter: An estimator
(sample statistic) with an average value equal to the
value of the parameter; ex: the sample mean is an
unbiased estimator of the population mean; the
average value of all possible sample means is the
population mean; all other factors being equal, an
unbiased estimator is preferable to a biased one
• Biased estimator of a parameter: An estimator
(sample statistic) whose average value does not equal
the value of the parameter; ex: the median is a biased
estimator, since the average of sample medians is not
always equal to the population median; variance
calculated from a sample, dividing by n, is a biased
estimator of the population variance; however, when
calculated with n − 1, it is unbiased
-Note: Estimators themselves present only one source
of bias; even when an unbiased estimator is
used, bias in the sample (elements not all
equally likely to be chosen) may still be present
-Elementary methods of inference assume
unbiased sampling
Sampling distribution: The probability distribution
of a sample statistic that would result from drawing
all possible samples of a given size from some
population; because samples are drawn at random,
every sample statistic is a random variable and
has a probability distribution that can be described
using mean and standard deviation
·Standard error: The standard deviation of the
estimator; do not confuse this with the standard
deviation of the sample itself; the standard error
measures the variability in the estimates around their
expected value, while the standard deviation of the
sample reflects the variability within the sample
around the sample mean
-The standard deviation of all possible sample means
of a given sample size, drawn from the same
population, is called the standard error of the
sample mean
-If the population standard deviation σ is known, the
standard error is: σx̄ = σ/√n
-Usually, the population standard deviation σ is
unknown and is estimated by s; in this case, the
estimated standard error is: sx̄ = s/√n
-Note: in either case, the standard error of the sample
mean decreases as sample size is increased; a larger
sample provides more reliable information about the
population
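The shrinking standard error is easy to see numerically (a sketch; σ = 50 is a hypothetical population standard deviation):

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Quadrupling n halves the standard error; a 100x larger n shrinks it 10x:
print(standard_error(50, 25))    # 10.0
print(standard_error(50, 100))   # 5.0
print(standard_error(50, 2500))  # 1.0
```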
HYPOTHESIS TESTING
•In a hypothesis test, sample data are used to accept or
reject a null hypothesis (H₀) in favor of an
alternative hypothesis (H₁); the significance level
at which the null hypothesis can be rejected
indicates how much evidence the sample
provides against the null hypothesis
•Null hypothesis (H₀): Always specifies a
value (the null hypothesis value) for a
population parameter; the null hypothesis is
assumed to be true; this assumption underlies
the computations for the hypothesis test; ex:
H₀: "a coin is unbiased," that is, the proportion
of heads is 0.5: H₀: p = 0.5
•Alternative hypothesis (H₁): Never specifies a
value for a parameter; the alternative hypothesis
states that a population parameter has some value
different from the one specified under the null
hypothesis; ex: H₁: A coin is biased; that is, the
proportion of heads is not 0.5: H₁: p ≠ 0.5
1. Two-tailed (or nondirectional): An alternative
hypothesis (H₁) that states only that the
population parameter is simply different from
the one specified under H₀; two-tailed probability
is employed; ex: to use sample data to test
whether the population mean pulse rate is
different from 65, we would use the two-tailed
hypothesis test H₀: μ = 65 vs. H₁: μ ≠ 65
2. One-tailed (or directional): An alternative
hypothesis (H₁) that states that the population
parameter is greater than (right-tailed) or less
than (left-tailed) the value specified under H₀; one-
tailed probability is employed; ex: to use sample
data to test whether the population mean pulse rate
is greater than 65, we would use the right-tailed
hypothesis test H₀: μ = 65 vs. H₁: μ > 65
•The alternative hypothesis H₁ is also
sometimes known as the "research hypothesis,"
as only claims expressed as alternative
hypotheses can be positively asserted
• Level of significance: The probability of observing
sample results as extreme or more extreme than
those actually observed, under the assumption the
null hypothesis is true; if this probability is small
enough, we conclude there is sufficient evidence to
reject the null hypothesis; two basic approaches:
1. Fixed significance level (traditional method): A
level of significance α is predetermined; commonly
used significance levels are 0.01, 0.05, and 0.10
•The smaller the significance level α, the higher the
standard for rejecting H₀; critical value(s) for the
test statistic are determined such that the probability
of the test statistic being farther from zero than the
critical value (in one or two tails, depending on H₁)
is α; if the test statistic falls beyond the critical
value (in the rejection region), then H₀ can be
rejected at that fixed significance level α
2. Observed significance level (p-value method):
The test statistic is computed using
the sample data, then the appropriate probability
distribution is used to find the probability of
observing a sample statistic that differs at least
that much from the null hypothesis value for the
population parameter (the probability value, or
p-value); the smaller the p-value, the better the
evidence against H₀
·This method is more commonly used by
computer applications
·The p-value also represents the smallest signifi-
cance level α at which H₀ can be rejected; thus,
p-value results can be used with a fixed signifi-
cance level by rejecting H₀ if p-value ≤ α
• Generally, the larger (farther from zero, positive
or negative) the value of the test statistic, the
smaller the p-value will be, providing better
evidence against the null hypothesis in favor of the
alternative
• Notion of indirect proof: Through traditional
hypothesis testing, the null hypothesis can never be
proven true; ex: if we toss a coin 200 times and heads
comes up exactly 100 times, we have no evidence
the coin is biased, but cannot prove the coin is fair;
because of the random nature of sampling, it is
possible to flip an unfair coin 200 times and get
exactly 100 heads, just as it is possible to draw a
sample from a population with mean 104.5 and find
a sample mean of 101; failing to reject the null
hypothesis does not prove it true, and rejecting it
does not prove it false
·Two types of errors
-Type I error: Rejecting H₀ when it is actually true;
the probability of a type I error is given by the
significance level α; type I error is generally more
prominent, as it can be controlled
-Type II error: Failing to reject H₀ when it is
actually false; the probability of a type II error is
denoted β; type II error is often (foolishly)
disregarded: it is difficult to measure or control, as
β depends on the unknown true value of the
parameter in question
                    True Status of H₀
Decision         H₀ True             H₀ False
Accept H₀        Correct (1 − α)     Type II error (β)
Reject H₀        Type I error (α)    Correct (1 − β)
CENTRAL LIMIT THEOREM
(for sample mean x̄)
If x₁, x₂, x₃, ..., xₙ is a simple random sample of n
elements from a large (infinite) population, with
mean μ and standard deviation σ, then the distri-
bution of x̄ takes on the bell-shaped distribution of a
normal random variable as n increases, and the distri-
bution of the ratio:
z = (x̄ − μ)/(σ/√n)
approaches the standard
normal distribution as n goes to infinity; in practice,
a normal approximation is acceptable for samples of
size 30 or larger
INFERENCE FOR
POPULATION MEAN USING
THE Z-STATISTIC
(σ KNOWN)
Requires that the sample must be drawn from a
normal distribution or have a sample size (n) of at
least 30
• Used when the population standard deviation σ is
known: If σ is known (treated as a constant, not
random) and the above conditions are met, then the
distribution of the sample mean follows a normal
distribution, and the test statistic z follows a
standard normal distribution; note that this is rarely
the case in reality, and the t-distribution is more
widely used
·The test statistic is
z = (x̄ − μ)/σx̄
where μ =
population mean (either known or hypothesized
under H₀) and σx̄ = σ/√n
·Critical region: The portion of the area under the
curve which includes those values of the test statistic
that provide sufficient evidence for the rejection of
the null hypothesis
-The most often used significance levels are 0.01,
0.05, and 0.1; for a one-tailed test using the z-statistic,
these correspond to z-values of 2.33, 1.65, and 1.28
respectively (positive values for a right-tailed test,
negative for a left-tailed test)
•For a two-tailed test, the critical region for α =
0.01 is split into two equal outer areas marked by
z-values of ±2.58; for α = 0.05, the critical values
of z are ±1.96, and for α = 0.10, the critical values
are ±1.65
Ex 1: Given a population with σ = 50, a simple
random sample of n = 100 values is chosen with a
sample mean x̄ of 255; test using the p-value
method H₀: μ = 250 vs. H₁: μ > 250; is there
sufficient evidence to reject the null hypothesis?
• In this case, the test statistic z =
(255 − 250)/(50/√100) = 1.00
•Looking at Table A, the area given for z = 1.00 is
0.3413; the area to its right (since H₁ is ">", this
is a right-tailed test) is 0.5 − 0.3413 = 0.1587, or
15.87%
·This is the p-value: the probability, if H₀ is true
(that is, if μ = 250), of obtaining a sample mean of
255 or greater; it also represents the smallest
significance level α at which H₀ can be rejected
•Since, even if H₀ is true, the probability of
obtaining a sample mean ≥ 255 from this
population with a sample of size n = 100 is about
16%, it is quite plausible that H₀ is true; there is
not very good evidence to support the alternative
hypothesis that the population mean is greater
than 250, so we fail to reject H₀
·It can't even be rejected at the weakest common
significance level of α = 0.10, since 0.1587 >
0.10; remember, this doesn't prove the population
mean to be equal to 250; we just haven't accumu-
lated sufficient evidence against the claim
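Ex 1 can be reproduced without Table A by computing the normal tail area with the error function (a sketch; `phi` is a helper name, not part of the source):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Ex 1: sigma = 50, n = 100, xbar = 255; H0: mu = 250 vs. H1: mu > 250
z = (255 - 250) / (50 / sqrt(100))
p_value = 1 - phi(z)           # right-tailed area beyond z
print(z)                       # 1.0
print(round(p_value, 4))       # 0.1587 -> fail to reject H0, even at alpha = 0.10
```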
Ex 2: A simple random sample of size n = 25 is
taken from a population following a normal distri-
bution with σ = 15; the sample mean x̄ is 95; use
the p-value method to test H₀: μ = 100 vs. H₁: μ ≠
100; is there sufficient evidence to reject the claim
that the population mean is 100 at a significance
level α of 0.10? At α = 0.05?
•In this case, the test statistic z = (95 −
100)/(15/√25) = −5/3 ≈ −1.67
·Since the normal curve is symmetric, we can look
up a z-score of 1.67; the value in Table A is
0.4525, that is, P(0 < z < 1.67) = P(−1.67 < z < 0)
= 0.4525
-Thus, P(z < −1.67) = P(z > 1.67) = 0.5 − 0.4525 =
0.0475
• Since this is a two-tailed test (H₁: μ ≠ 100), the p-
value is twice this area, or 0.095
• Since the p-value = 0.095 < 0.10 = α, there is
sufficient evidence to reject the null hypothesis at
a significance level α of 0.10; but in the second
case, the p-value = 0.095 > 0.05 = α, so the
sample data are not strong enough to reject at the
stricter (0.05) level of significance
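The two-tailed version of Ex 2 doubles the one-tail area (a sketch; the exact p-value differs slightly from the table-based 0.095 because the table rounds z to 1.67):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Ex 2: sigma = 15, n = 25, xbar = 95; H0: mu = 100 vs. H1: mu != 100
z = (95 - 100) / (15 / sqrt(25))
p_value = 2 * phi(-abs(z))     # two-tailed: double the single-tail area
print(round(z, 2))             # -1.67
print(p_value)                 # about 0.0956: reject at alpha = 0.10, not at 0.05
```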
INFERENCE FOR POPULATION MEAN
USING THE t-STATISTIC
(σ UNKNOWN)
Requires that the sample must be drawn from a normal distribution or
have a sample size (n) of at least 30
·When σ is not known (as is usually the case) it is estimated by s,
the sample standard deviation
•Because of the variability of both estimates (the sample mean as well
as the sample standard deviation), the test statistic follows not a z-dis-
tribution, but a t-distribution
·Comparison between t- and z-distributions
Although both distributions are symmetric about a mean of zero, the t-
distribution is more spread out than the normal distribution, producing
a larger critical value of t as the boundary for the rejection region
The t-distribution is characterized by its degrees of freedom (df),
referring to the number of values that are free to vary after placing cer-
tain restrictions on the data
•For example, if we know that a sample of size 4 produces a mean of
87, we know that the sum of the numbers is 4 × 87 = 348; this tells us
nothing about the individual values in the sample (there are an infi-
nite number of ways to get four numbers to add up to 348), but as soon
as we've chosen three of them, the fourth is determined
•For instance, the first number might be 84, the second 98, and the
third 81; but if the first three numbers are 84, 98, and 81, then the
fourth must be 85, the only number producing the known sample
mean; that is, there are n − 1 = 3 degrees of freedom in this example
-For a test about a population mean, the t-statistic follows a t-distribution
with n − 1 df
•As df increases, the t-distribution approaches the standard normal z-
distribution
The test statistic t used for testing hypotheses about a population mean
is: t = (x̄ − μ)/sx̄, where μ = population mean under H₀ and sx̄ = s/√n
Note: This is not so different from the test statistic z used when σ is known!
Ex: A simple random sample of size 25 is taken from a population following a
normal distribution, with a sample mean 42 and sample standard deviation
7.5; test at a fixed significance level α = 0.05: H₀: μ = 45 vs. H₁: μ < 45
·This is a left-tailed test (H₁: μ < 45), so the critical value and rejection region
will be negative
·Consulting Table B to find the appropriate critical value, with df = n − 1 = 24,
produces a critical value of −1.711; the null hypothesis can be rejected at α =
0.05 if the value of the test statistic t < −1.711
·The test statistic t = (42 − 45)/(7.5/√25) = −3/1.5 = −2; since this is less than
the critical value of −1.711, H₀ is rejected at α = 0.05

Table A
Normal Curve Areas (area from mean to z)

z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990

Table B
Critical Values of t (values indicate area to the right of tₐ)

        A*:  0.1     0.05    0.025   0.01    0.005
        B*:  0.2     0.1     0.05    0.02    0.01
df
 1          3.078   6.314  12.706  31.821  63.657
 2          1.886   2.920   4.303   6.965   9.925
 3          1.638   2.353   3.182   4.541   5.841
 4          1.533   2.132   2.776   3.747   4.604
 5          1.476   2.015   2.571   3.365   4.032
 6          1.440   1.943   2.447   3.143   3.707
 7          1.415   1.895   2.365   2.998   3.499
 8          1.397   1.860   2.306   2.896   3.355
 9          1.383   1.833   2.262   2.821   3.250
10          1.372   1.812   2.228   2.764   3.169
11          1.363   1.796   2.201   2.718   3.106
12          1.356   1.782   2.179   2.681   3.055
13          1.350   1.771   2.160   2.650   3.012
14          1.345   1.761   2.145   2.624   2.977
15          1.341   1.753   2.131   2.602   2.947
16          1.337   1.746   2.120   2.583   2.921
17          1.333   1.740   2.110   2.567   2.898
18          1.330   1.734   2.101   2.552   2.878
19          1.328   1.729   2.093   2.539   2.861
20          1.325   1.725   2.086   2.528   2.845
21          1.323   1.721   2.080   2.518   2.831
22          1.321   1.717   2.074   2.508   2.819
23          1.319   1.714   2.069   2.500   2.807
24          1.318   1.711   2.064   2.492   2.797
25          1.316   1.708   2.060   2.485   2.787
26          1.315   1.706   2.056   2.479   2.779
27          1.314   1.703   2.052   2.473   2.771
28          1.313   1.701   2.048   2.467   2.763
29          1.311   1.699   2.045   2.462   2.756
30          1.310   1.697   2.042   2.457   2.750
∞           1.282   1.645   1.960   2.326   2.576

A* = Level of significance for one-tailed test
B* = Level of significance for two-tailed test
Note: The t-distribution is a robust alternative to the z-distribution when testing for the
population mean: inferences are likely to be valid even if the population distribution is
far from normal; however, the larger the departure from normality in the population,
the larger the sample size needed for a valid hypothesis test using either distribution
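The worked t-test example above reduces to a few lines (a sketch; the critical value is read from Table B rather than computed):

```python
from math import sqrt

# Worked example: n = 25, xbar = 42, s = 7.5
# H0: mu = 45 vs. H1: mu < 45 (left-tailed), alpha = 0.05
n, xbar, s, mu0 = 25, 42, 7.5, 45
t = (xbar - mu0) / (s / sqrt(n))
critical = -1.711                # Table B: df = n - 1 = 24, one-tailed 0.05
print(t)                         # -2.0
print(t < critical)              # True -> reject H0 at alpha = 0.05
```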
CONFIDENCE INTERVALS
Confidence interval: Interval within which a population parameter is likely to be
found; determined by sample data and a chosen level of confidence (1 − α, where α
refers to the level of significance)
·Common confidence levels are 90%, 95%, and 99%, just as common levels of
significance are 0.10, 0.05, and 0.01
• (1 − α) confidence interval for μ:
x̄ − z(α/2)·(σ/√n) ≤ μ ≤ x̄ + z(α/2)·(σ/√n), where z(α/2) is the value of the standard
normal variable z that puts an area α/2 in each tail of the distribution
·A t-statistic should be used in place of the z-statistic when σ is unknown and s
must be used as an estimate
·Ex: Given x̄ = 108, s = 15, and n = 26, estimate a 95% confidence interval for the
population mean
-Since the population variance is unknown, the t-distribution is used
-The resulting interval, using a t-value of 2.060 from Table B (row 25 of the mid-
dle column), is approximately 102 to 114
-Consequently, any null hypothesis that μ is between 102 and 114 is tenable based
on this sample
-Any hypothesized μ below 102 or above 114 would be rejected at 0.05 significance
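The interval in the example can be recomputed directly (a sketch; the t critical value 2.060 comes from Table B, df = 25):

```python
from math import sqrt

# 95% t confidence interval for the mean: xbar = 108, s = 15, n = 26
xbar, s, n = 108, 15, 26
t_crit = 2.060                   # Table B: df = 25, two-tailed 0.05
half_width = t_crit * s / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width
print(round(lo), round(hi))      # approximately 102 to 114
```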
COMPARING
POPULATION MEANS
~
Sampling distribution of the difference
between means: If a number of pairs of sam
III pIes were taken from the same population or
II from two different populations, then:
The distribution of differences between
"
...
pairs of sample means tends to be normal
(zdistribution)
 The mean of these differences between means
~ f.i x I  X 2 is equal to the difference between
the population means, that IS 111  112
Independent samples
 We are testing whether or not two samples
are drawn from populations with the same
mean, that is, HO: 111 = 112, versus a one or
twotailed alternative
 When 01 and 02 are known, the test statistic
z follows a standard normal distribution
under the null hypothesis
 The standard error of the difference between
Z
=
means
CY x
I
Homogeneity of variances (a criterion for the pooled
2sample ttest): The condition that the variances of
two populations are equal; to establish homo~eneity
of variances, test HO: 0)2 =
vs. H f 0) oF 022
(note that this is e~uivalent to testing
HO: o)2 / 02 2 = ) vs. Hfo) /°2 2 oF 1)
 Under the null hypothesis, the test statistic s 12/S22
follows an Fdistribution with degrees of freedom:
(n I  I , n2  1); if the test statistic exceeds the critical
value in Table C, then the null hypothesis can be
rejected at the indicated level of significance
 - Where (μ1 − μ2) represents the hypothesized difference in means, the following statistic can be used for hypothesis tests:
   z = [(x̄1 − x̄2) − (μ1 − μ2)] / σ_(x̄1 − x̄2)
 - When σ1 and σ2 are unknown, which is usually the case, substitute s1 and s2 for σ1 and σ2, respectively, in the above formulas, and use the t-distribution with df = n1 + n2 − 2
• Pooled t-test
 - Both populations have normal distributions
 - n < 30
 - Requires homogeneity of variance: σ1 and σ2 are not known but are assumed equal (a risky assumption!)
 - Many statisticians do not recommend the t-distribution with pooled standard error; the above approach is more conservative
 - The hypothesis test may be 2-tailed (= vs. ≠) or 1-tailed: μ1 ≤ μ2 and the alternative is μ1 > μ2 (or μ1 ≥ μ2 and the alternative is μ1 < μ2)
 - Degrees of freedom (df): (n1 − 1) + (n2 − 1) = n1 + n2 − 2
 - Use the formula below for estimating σ_(x̄1 − x̄2) to determine s_(x̄1 − x̄2)
 - Determine the critical region for rejection by assigning an acceptable level of significance and looking at the t-table with df = n1 + n2 − 2
 - Use the following formula for the estimated standard error:
   s_(x̄1 − x̄2) = √{ [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) · [(n1 + n2) / (n1 n2)] }
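The pooled standard-error formula above can be exercised directly; this sketch uses invented sample data and only the standard library:

```python
import math

g1 = [12.1, 11.6, 12.9, 13.3, 12.0, 11.8]
g2 = [10.9, 11.2, 12.0, 10.7, 11.5]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(g1), len(g2)
# s_(xbar1 - xbar2) per the formula above
pooled = ((n1 - 1) * sample_var(g1) + (n2 - 1) * sample_var(g2)) / (n1 + n2 - 2)
se = math.sqrt(pooled * (n1 + n2) / (n1 * n2))
t_stat = (mean(g1) - mean(g2)) / se   # H0: mu1 = mu2
df = n1 + n2 - 2
print(f"t = {t_stat:.3f}, df = {df}")  # compare to the t-table critical value
```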
• Matched pairs: When making repeated measurements of the same elements, we can test for the mean difference
 - For instance, clients of a weight-loss program might be weighed before and after the program, and a significant mean difference ascribed to the program's effectiveness
 - Standard error of the mean difference; general formula: s_d̄ = s_d / √n, where s_d is the standard deviation of the differences
 - We can then test H0: μd = 0 versus a one- or two-tailed alternative by using a t-test statistic
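The weight-loss illustration above can be sketched as a paired t computation; the before/after weights are invented for illustration:

```python
import math

before = [180, 195, 210, 172, 166, 188]
after  = [174, 190, 204, 171, 160, 179]

d = [b - a for b, a in zip(before, after)]   # paired differences
n = len(d)
dbar = sum(d) / n                            # mean difference
sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
se = sd / math.sqrt(n)                       # s_d / sqrt(n), per the formula above
t_stat = (dbar - 0) / se                     # H0: mu_d = 0
print(f"mean difference = {dbar:.2f}, t = {t_stat:.2f}, df = {n - 1}")
```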
[Table C, Critical Values of F: top row of each pair = α .05, bottom row = α .01; columns give degrees of freedom for the numerator (1-10), rows give degrees of freedom for the denominator; the numeric table is not reproduced here]
• Correlation: A relationship between two variables
 - The correlation coefficient r (also known as the "Pearson Product-Moment Correlation Coefficient") is a measure of the linear (straight-line) relationship between two quantitative variables
 - Ex: Given observations of two variables X and Y, we can compute their corresponding sums of squares:
   SSx = Σ(x − x̄)² and SSy = Σ(y − ȳ)²
 - The formulas for the Pearson correlation (r) are given below
ANALYSIS OF VARIANCE (ANOVA)
• Purpose: To determine whether any significant difference exists between more than two group means
 - Indicates the possibility of an overall mean effect of the experimental treatments; does not specify which of the means are different
• ANOVA: Consists of obtaining independent estimates from population subgroups
 - The total sum of squares is partitioned into known components of variation
• Partition of variances
 - Between-group variance (BGV): Reflects the magnitude of the difference(s) among the group means
 - Within-group variance (WGV): Reflects the dispersion within each treatment group; also referred to as the error term
• Test
 - When the BGV is large relative to the WGV, the F-ratio will also be large
   BGV = Σ nᵢ(x̄ᵢ − x̄tot)² / (k − 1), where x̄ᵢ = mean of the i-th treatment group and x̄tot = mean of all n values across all k treatment groups
   WGV = (SS1 + SS2 + ... + SSk) / (n − k), where the SS's are the sums of squares [see Measures of Central Tendency, page 1] of each subgroup's values around the subgroup mean
SAMPLING DISTRIBUTION OF A PROPORTION
 - In random samples of size n, the sample proportion p̂ fluctuates around the population proportion π with a variance of π(1 − π)/n and a proportion standard error of √(π(1 − π)/n)
 - As the sample size increases, p̂ concentrates more around its target mean; it also gets closer to the normal distribution, in which case: z = (p̂ − π) / √(π(1 − π)/n)
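The normal approximation above is a two-line computation; the counts here are invented to illustrate testing H0: π = 0.5:

```python
import math

n, successes = 100, 60
p_hat = successes / n
pi0 = 0.5                                 # hypothesized population proportion
se = math.sqrt(pi0 * (1 - pi0) / n)       # sqrt(pi(1 - pi)/n) under H0
z = (p_hat - pi0) / se
print(f"z = {z:.2f}")                     # compare to +/-1.96 at alpha = 0.05
```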
USING F-RATIO: F = BGV/WGV
 - Degrees of freedom are k − 1 for the numerator and n − k for the denominator
 - If BGV > WGV, the experimental treatments are responsible for the large differences among group means; compare F to the critical values in Table C
 - Null hypothesis: The group sample means are all estimates of a common population mean; that is, H0: μ1 = μ2 = μ3 = ... = μk for all k treatment groups, vs. H1: at least one pair of means is different (determining which pair(s) are different requires follow-up testing)
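The BGV/WGV partition above can be computed directly for a small one-way layout; the three treatment groups below are invented for illustration:

```python
groups = [
    [4.2, 4.8, 5.1, 4.5],
    [5.9, 6.1, 5.6, 6.4],
    [4.9, 5.2, 5.0, 5.3],
]
k = len(groups)                            # number of treatment groups
n = sum(len(g) for g in groups)            # total number of values
grand = sum(sum(g) for g in groups) / n    # grand mean, x_tot

means = [sum(g) / len(g) for g in groups]
# BGV = sum n_i (xbar_i - xbar_tot)^2 / (k - 1)
bgv = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
# WGV = (SS1 + ... + SSk) / (n - k), SS_i taken around each subgroup mean
wgv = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means)) / (n - k)
f_ratio = bgv / wgv
print(f"F = {f_ratio:.2f} with df = ({k - 1}, {n - k})")  # compare to Table C
```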
The formulas for the Pearson correlation (r):
   r = Σ(x − x̄)(y − ȳ) / √(SSx · SSy)
or, in computational form:
   r = [Σxy − (Σx)(Σy)/n] / √{[Σx² − (Σx)²/n] [Σy² − (Σy)²/n]}
 - Note that −1 ≤ r ≤ 1 for any data set; when r = 1, the data are said to have perfect positive correlation: if plotted, they would form a straight line with positive (upward) slope; when r = −1, the data are said to have perfect negative correlation: if plotted, they would form a straight line with negative (downward) slope; if r = 0, the data are said to have no linear correlation (it is possible, of course, that they are related in some other way)
 - Note: It is possible, of course, for a random sample from a population with zero correlation to produce by chance a sample with r ≠ 0!
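The computational form above maps directly onto running sums; this sketch uses five invented (x, y) pairs:

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)

# r = [sum xy - (sum x)(sum y)/n] / sqrt{[sum x^2 - (sum x)^2/n][sum y^2 - (sum y)^2/n]}
r = (sxy - sx * sy / n) / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
print(f"r = {r:.4f}")   # near +1: a strong positive linear relationship
```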
CHI-SQUARE (χ²) TESTS
 - Most widely used nonparametric test
 - The χ² mean = its degrees of freedom
 - The χ² variance = twice its degrees of freedom
 - Can be used to test independence, homogeneity, and goodness-of-fit
 - The square of a standard normal variable is a chi-square variable with df = 1
 - Like the t-distribution, the shape of the distribution depends on the value of df
DEGREES OF FREEDOM (df) COMPUTATION
 - If chi-square tests for goodness-of-fit to a hypothesized distribution (uses a frequency distribution), df = g − 1, where g = number of groups, or classes, in the frequency distribution
 - If chi-square tests for homogeneity or independence (uses a two-way contingency table), df = (# of rows − 1)(# of columns − 1)
REGRESSION: A method for predicting values of one variable (the outcome, or dependent, variable) on the basis of the values of one or more independent, or predictor, variables; fitting a regression model is the process of using sample data to determine an equation to represent the relationship
GOODNESS-OF-FIT TEST: To apply the chi-square distribution in this manner, the test statistic is expressed as:
   χ² = Σ (fo − fe)² / fe
where fo = observed frequency of the variable, fe = expected frequency (based on the hypothesized population distribution)
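The goodness-of-fit statistic above is a one-liner over the (fo, fe) pairs; the die-roll counts here are invented for illustration:

```python
# H0: the die is fair, so each expected count fe = total/6
observed = [8, 12, 9, 11, 10, 10]          # fo for faces 1 through 6
expected = [sum(observed) / 6] * 6         # fe = 10 each under H0

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1                     # g - 1 groups
print(f"chi-square = {chi2:.2f}, df = {df}")  # small value: no evidence of bias
```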
TESTS OF CONTINGENCY: Application of chi-square tests to two separate populations to test statistical independence of attributes
TESTS OF HOMOGENEITY: Application of chi-square tests to two samples to test if they came from populations with like distributions
RUNS TEST: Tests whether a sequence (to comprise a sample) is random; the following equations are applied:
   μR = 2n1n2 / (n1 + n2) + 1 and
   sR = √{ 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)²(n1 + n2 − 1)] }
where:
   μR = mean number of runs
   n1 = number of outcomes of one type
   n2 = number of outcomes of the other type
   sR = standard deviation of the distribution of the number of runs
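The two runs-test equations above can be applied to any two-outcome sequence; the coin-flip string below is invented for illustration:

```python
import math

seq = "HHTHTTHTHH"
n1 = seq.count("H")
n2 = seq.count("T")
# A new run starts at every change of symbol
runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

mu_r = 2 * n1 * n2 / (n1 + n2) + 1
s_r = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z = (runs - mu_r) / s_r    # approximately standard normal for larger n1, n2
print(f"runs = {runs}, mu_R = {mu_r:.2f}, s_R = {s_r:.2f}, z = {z:.2f}")
```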
HYPOTHESIS TEST FOR LINEAR CORRELATION
With a simple random sample of size n producing a sample correlation coefficient r, it is possible to test for the linear correlation in the population, ρ; that is, we conduct the hypothesis test H0: ρ = ρ0 versus a right-, left-, or two-tailed alternative; usually we are interested in determining whether there is any linear correlation at all; that is, ρ0 = 0
The test statistic is:
   t = (r − ρ0) / √((1 − r²)/(n − 2))
which follows a t-distribution with n − 2 degrees of freedom under H0; this hypothesis test assumes that the sample is drawn from a population with a bivariate normal distribution
• Ex: A simple random sample of size 27 produces a correlation coefficient r = −0.41; is there sufficient evidence at α = 0.05 of a negative linear relationship?
 - Since we're testing for a negative linear relationship, we need a left-tailed test: H0: ρ = 0 vs. H1: ρ < 0; the critical value can be found from the t-distribution with n − 2 = 25 df and one-tailed α = 0.05; since this is a left-tailed test, we take the negative: −1.708; that is, if the test statistic is less than −1.708, we conclude that there is sufficient evidence of a negative linear relationship
 - The test statistic t = −0.41 / √((1 − (−0.41)²)/(27 − 2)) ≈ −2.248, allowing us to reject the null hypothesis of no linear correlation and support the alternative hypothesis of a negative linear correlation at α = 0.05
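The example's arithmetic can be reproduced directly from n and r:

```python
import math

n, r = 27, -0.41
# t = (r - rho0) / sqrt((1 - r^2)/(n - 2)), with rho0 = 0
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))
print(f"t = {t_stat:.3f}")   # about -2.248, beyond the critical value -1.708
```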
ISBN-13: 9781572229440
ISBN-10: 1572229446

SIMPLE LINEAR REGRESSION
In a simple linear regression model, we use only one predictor variable and assume that the relationship to the outcome variable is linear; that is, the graph of the regression equation is that of a straight line (we often refer to the "regression line"); for the entire population, the model can be expressed as:
   y = β0 + β1x + e
 - y is called the dependent variable (or outcome variable), as it is assumed to depend on a linear relationship to x
 - x is the independent variable, also called the predictor variable
 - β0 is the intercept of the regression line; that is, the predicted value for y when x = 0
 - β1 is the slope of the regression line: the marginal change in y per unit change in x
 - e refers to random error; the error term is assumed to follow a normal distribution with a mean of zero and constant variation; that is, there should be no increase or decrease in dispersion for different regions along the regression line; in addition, it is assumed that error terms are independent for different (x, y) observations
On the basis of sample data, we find estimates b0 and b1 of the intercept β0 and slope β1; this gives us the estimated (or sample) regression equation:
   ŷ = b0 + b1x
The parameter estimates b0 and b1 can be derived in a variety of ways; one of the most common is known as the method of least squares; least-squares estimates minimize the sum of squared differences between predicted and actual values of the dependent variable y
For a simple linear regression model, the least-squares estimates of the intercept and slope are:
   estimated slope = b1 = SSxy / SSx
   estimated intercept = b0 = ȳ − b1x̄
These estimates (and other calculations in regression) involve sums of squares:
   SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n
   SSx = Σ(x − x̄)² = Σ(x²) − (Σx)²/n
   SSy = Σ(y − ȳ)² = Σ(y²) − (Σy)²/n
Ex: A simple random sample of 8 cars provides the
following data on engine displacement (x) and
highway mileage (y); fit a simple linear regression
model
  (displacement)  (mileage)
        x       y      x²      y²      xy
       5.7     18    32.49    324    102.6
       2.5     19     6.25    361     47.5
       3.8     20    14.44    400     76.0
       2.8     19     7.84    361     53.2
       4.6     17    21.16    289     78.2
       1.6     32     2.56   1024     51.2
       1.6     29     2.56    841     46.4
       1.4     30     1.96    900     42.0
SUMS:  24     184    89.26   4500    497.1

Fitting a model entails computing the least-squares estimates b0 and b1; note that there are 8 observations; that is, n = 8
First, SSxy = Σxy − (Σx)(Σy)/n = −54.9, SSx = Σ(x²) − (Σx)²/n = 17.26, and SSy = Σ(y²) − (Σy)²/n = 268
Then the estimated slope is b1 = SSxy/SSx = −3.18, and the estimated intercept is b0 = ȳ − b1x̄ = 32.54
The estimated regression model, then, is: mileage = 32.54 − 3.18 · displacement
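The fit above can be recomputed from the eight (displacement, mileage) pairs with the sums-of-squares formulas:

```python
xs = [5.7, 2.5, 3.8, 2.8, 4.6, 1.6, 1.6, 1.4]   # displacement
ys = [18, 19, 20, 19, 17, 32, 29, 30]           # highway mileage
n = len(xs)

ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
b1 = ss_xy / ss_x                       # estimated slope, about -3.18
b0 = sum(ys) / n - b1 * sum(xs) / n     # estimated intercept, about 32.54
print(f"mileage = {b0:.2f} + ({b1:.2f}) * displacement")
```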
SIGNIFICANCE OF A REGRESSION MODEL
We can assess the significance of the model by testing to see if the sample provides sufficient evidence of a linear relationship in the population; that is, we conduct the hypothesis test H0: β1 = 0 versus H1: β1 ≠ 0; this is exactly equivalent to testing for linear correlation in the population: H0: ρ = 0 versus H1: ρ ≠ 0; the test for correlation is somewhat simpler:
   The correlation coefficient r = SSxy / √(SSx · SSy) = −0.8072
   The test statistic t = (r − 0) / √((1 − r²)/(n − 2)) = −3.350
Consulting the t-table with degrees of freedom = n − 2 = 6, we obtain a critical value of 3.143 at α = 0.02 and a critical value of 3.707 at α = 0.01; since we have a two-tailed test, we should consider the absolute value of the test statistic, which exceeds 3.143 but does not exceed 3.707; that is, we can reject H0 at α = 0.02 but not at α = 0.01, so the p-value is between 0.02 and 0.01 (the actual p-value, which can be found using computer applications, is 0.0154); this is a reasonably significant model
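The r and t values quoted above follow directly from the sums of squares computed earlier:

```python
import math

ss_xy, ss_x, ss_y, n = -54.9, 17.26, 268, 8

r = ss_xy / math.sqrt(ss_x * ss_y)                  # about -0.8072
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))      # about -3.350, df = 6
print(f"r = {r:.4f}, t = {t_stat:.3f}")
```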
LINEAR DETERMINATION
Regression models are also assessed by the coefficient of linear determination, r²; this represents the proportion of total variation in y that is explained by the regression model; the coefficient of linear determination can be calculated in a variety of ways; the easiest is to compute r² = (r)²; that is, the coefficient of determination is the square of the coefficient of correlation
RESIDUALS
The difference between an observed and a fitted value of y, (y − ŷ), is called a residual; examining the residuals is useful to identify outliers (observations far from the regression line, representing unusual values for x and y) and to check the assumptions of the model
NOTICE TO STUDENT: This QuickStudy® guide covers the basics of Introductory Statistics. Due to its condensed format, however, use it as a Statistics guide and not as a replacement for assigned course work.
All rights reserved. No part of this publication may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission from the publisher.
© 2002, 2005 BarCharts, Inc., Boca Raton, FL 0308