Bruce L. Bowerman

Miami University

Richard T. O’Connell

Miami University

Emily S. Murphree

Miami University

Business Statistics in Practice

Using Data, Modeling, and Analytics

EIGHTH EDITION

with major contributions by

Steven C. Huchendorf

University of Minnesota

Dawn C. Porter

University of Southern California

Patrick J. Schur

Miami University

bow49461_fm_i–xxi.indd 1

20/11/15 4:06 pm

BUSINESS STATISTICS IN PRACTICE: USING DATA, MODELING, AND ANALYTICS, EIGHTH EDITION

Published by McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121. Copyright © 2017 by McGraw-Hill

Education. All rights reserved. Printed in the United States of America. Previous editions © 2014, 2011, and

2009. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a

database or retrieval system, without the prior written consent of McGraw-Hill Education, including, but not

limited to, in any network or other electronic storage or transmission, or broadcast for distance learning.

Some ancillaries, including electronic and print components, may not be available to customers outside the

United States.

This book is printed on acid-free paper.

1 2 3 4 5 6 7 8 9 0 DOW/DOW 1 0 9 8 7 6

ISBN 978-1-259-54946-5

MHID 1-259-54946-1

Senior Vice President, Products & Markets: Kurt L. Strand

Vice President, General Manager, Products & Markets: Marty Lange

Vice President, Content Design & Delivery: Kimberly Meriwether David

Managing Director: James Heine

Senior Brand Manager: Dolly Womack

Director, Product Development: Rose Koos

Product Developer: Camille Corum

Marketing Manager: Britney Hermsen

Director of Digital Content: Doug Ruby

Digital Product Developer: Tobi Philips

Director, Content Design & Delivery: Linda Avenarius

Program Manager: Mark Christianson

Content Project Managers: Harvey Yep (Core) / Bruce Gin (Digital)

Buyer: Laura M. Fuller

Design: Srdjan Savanovic

Content Licensing Specialists: Ann Marie Jannette (Image) / Beth Thole (Text)

Cover Image: ©Sergei Popov, Getty Images and ©teekid, Getty Images

Compositor: MPS Limited

Printer: R. R. Donnelley

All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.

Library of Congress Control Number: 2015956482

The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a website does

not indicate an endorsement by the authors or McGraw-Hill Education, and McGraw-Hill Education does not

guarantee the accuracy of the information presented at these sites.

www.mhhe.com


ABOUT THE AUTHORS

Bruce L. Bowerman

Bruce L. Bowerman is emeritus professor

of information systems and

analytics at Miami University in

Oxford, Ohio. He received his

Ph.D. degree in statistics from

Iowa State University in 1974,

and he has over 40 years of

experience teaching basic statistics, regression analysis, time series forecasting,

survey sampling, and design of experiments to both

undergraduate and graduate students. In 1987 Professor

Bowerman received an Outstanding Teaching award

from the Miami University senior class, and in 1992 he

received an Effective Educator award from the Richard

T. Farmer School of Business Administration. Together

with Richard T. O’Connell, Professor Bowerman has

written 23 textbooks. These include Forecasting, Time

Series, and Regression: An Applied Approach (also

coauthored with Anne B. Koehler); Linear Statistical

Models: An Applied Approach; Regression Analysis:

Unified Concepts, Practical Applications, and Computer Implementation (also coauthored with Emily

S. Murphree); and Experimental Design: Unified

Concepts, Practical Applications, and Computer Implementation (also coauthored with Emily S. Murphree).

The first edition of Forecasting and Time Series earned

an Outstanding Academic Book award from Choice

magazine. Professor Bowerman has also published a

number of articles in applied stochastic processes, time

series forecasting, and statistical education. In his spare

time, Professor Bowerman enjoys watching movies and

sports, playing tennis, and designing houses.

Richard T. O’Connell

Richard T. O’Connell is emeritus professor of information systems

and analytics at Miami University in Oxford, Ohio. He has

more than 35 years of experience teaching basic statistics,

statistical quality control and

process improvement, regression analysis, time series forecasting, and design of experiments to both undergraduate and graduate business

students. He also has extensive consulting experience

and has taught workshops dealing with statistical process control and process improvement for a variety of

companies in the Midwest. In 2000 Professor O’Connell

received an Effective Educator award from the

Richard T. Farmer School of Business Administration.

Together with Bruce L. Bowerman, he has written 23

textbooks. These include Forecasting, Time Series, and

Regression: An Applied Approach (also coauthored

with Anne B. Koehler); Linear Statistical Models:

An Applied Approach; Regression Analysis: Unified

Concepts, Practical Applications, and Computer Implementation (also coauthored with Emily S. Murphree);

and Experimental Design: Unified Concepts, Practical

Applications, and Computer Implementation (also

coauthored with Emily S. Murphree). Professor

O’Connell has published a number of articles in the

area of innovative statistical education. He is one of the

first college instructors in the United States to integrate

statistical process control and process improvement

methodology into his basic business statistics course.

He (with Professor Bowerman) has written several

articles advocating this approach. He has also given

presentations on this subject at meetings such as the

Joint Statistical Meetings of the American Statistical Association and the Workshop on Total Quality

Management: Developing Curricula and Research

Agendas (sponsored by the Production and Operations

Management Society). Professor O’Connell received

an M.S. degree in decision sciences from Northwestern University in 1973. In his spare time, Professor

O’Connell enjoys fishing, collecting 1950s and 1960s

rock music, and following the Green Bay Packers and

Purdue University sports.

Emily S. Murphree

Emily S. Murphree is emerita professor

of statistics at Miami University

in Oxford, Ohio. She received

her Ph.D. degree in statistics

from the University of North

Carolina and does research in

applied probability. Professor

Murphree received Miami’s

College of Arts and Science Distinguished Educator

Award in 1998. In 1996, she was named one of Oxford’s

Citizens of the Year for her work with Habitat for Humanity and for organizing annual Sonia Kovalevsky

Mathematical Sciences Days for area high school girls.

In 2012 she was recognized as “A Teacher Who Made a

Difference” by the University of Kentucky.


AUTHORS’ PREVIEW

Business Statistics in Practice: Using Data, Modeling, and Analytics, Eighth Edition, provides a unique and flexible framework for teaching the introductory course in business statistics. This framework features:

• A new theme of statistical modeling introduced in Chapter 1 and used throughout the text.

• Continuing case studies that facilitate student learning by presenting new concepts in the context of familiar situations.

• Business improvement conclusions—highlighted in yellow and designated by icons BI in the page margins—that explicitly show how statistical analysis leads to practical business decisions.

• A substantial and innovative presentation of business analytics and data mining that provides instructors with a choice of different teaching options.

• Improved and easier to understand discussions of probability, probability modeling, traditional statistical inference, and regression and time series modeling.

• Use of Excel (including the Excel add-in MegaStat) and Minitab to carry out traditional statistical analysis and descriptive analytics. Use of JMP and the Excel add-in XLMiner to carry out predictive analytics.

• Many new exercises, with increased emphasis on students doing complete statistical analyses on their own.

We now discuss how these features are implemented in the book’s 18 chapters.

Chapters 1, 2, and 3: Introductory concepts and statistical modeling. Graphical and numerical descriptive methods. In Chapter 1 we discuss data, variables, populations, and how to select random and other types of samples (a topic formerly discussed in Chapter 7). A new section introduces statistical modeling by defining what a statistical model is and by using The Car Mileage Case to preview specifying a normal probability model describing the mileages obtained by a new midsize car model (see Figure 1.6):

[Figure 1.6: A Histogram of the 50 Mileages and the Normal Probability Curve — (a) a histogram of the 50 mileages (percent versus mpg); (b) the normal probability curve. The accompanying excerpt, from Section 1.4, Random Sampling, Three Case Studies That Illustrate Statistical Inference, reads:]

…all mileages achieved by the new midsize cars, the population histogram would look “bell-shaped.” This leads us to “smooth out” the sample histogram and represent the population of all mileages by the bell-shaped probability curve in Figure 1.6(b). One type of bell-shaped probability curve is a graph of what is called the normal probability distribution (or normal probability model), which is discussed in Chapter 6. Therefore, we might conclude that the statistical model describing the sample of 50 mileages in Table 1.7 states that this sample has been (approximately) randomly selected from a population of car mileages that is described by a normal probability distribution. We will see in Chapters 7 and 8 that this statistical model and probability theory allow us to conclude that we are “95 percent” confident that the sampling error in estimating the population mean mileage by the sample mean mileage is no more than .23 mpg. Because we have seen in Example 1.4 that the mean of the sample of n = 50 mileages in Table 1.7 is 31.56 mpg, this implies that we are 95 percent confident that the true population mean EPA combined mileage for the new midsize model is between 31.56 − .23 = 31.33 mpg and 31.56 + .23 = 31.79 mpg. (The exact reasoning behind and meaning of this statement is given in Chapter 8, which discusses confidence intervals.) Because we are 95 percent confident that the population mean EPA combined mileage is at least 31.33 mpg, we have strong statistical evidence that this not only meets, but slightly exceeds, the tax credit standard of 31 mpg and thus that the new midsize model deserves the tax credit.

Throughout this book we will encounter many situations where we wish to make a statistical inference about one or more populations by using sample data. Whenever we make assumptions about how the sample data are selected and about the population(s) from which the sample data are selected, we are specifying a statistical model that will lead to making what we hope are valid statistical inferences. In Chapters 13, 14, and 15 these models become complex and not only specify the probability distributions describing the sampled populations but also specify how the means of the sampled populations are related to each other through one or more predictor variables. For example, we might relate mean, or expected, sales of a product to the predictor variables advertising expenditure and price. In order to relate a response variable such as sales to one or more predictor variables so that we can explain and predict values of the response variable, we sometimes use a statistical technique called regression analysis and specify a regression model.

The idea of building a model to help explain and predict is not new. Sir Isaac Newton’s equations describing motion and gravitational attraction help us understand bodies in motion and are used today by scientists plotting the trajectories of spacecraft. Despite their successful use, however, these equations are only approximations to the exact nature of motion. Seventeenth-century Newtonian physics has been superseded by the more sophisticated twentieth-century physics of Einstein and Bohr. But even with the refinements of …

In Chapters 2 and 3 we begin to formally discuss the statistical analysis used in statistical modeling and the statistical inferences that can be made using statistical models. For example, in Chapter 2 (graphical descriptive methods) we show how to construct the histogram of car mileages shown in Chapter 1, and in Chapter 3 (numerical descriptive methods) we use this histogram to help explain the Empirical Rule. As illustrated in Figure 3.15, this rule gives tolerance intervals providing estimates of the “lowest” and “highest” mileages that the new midsize car model should be expected to get in combined city and highway driving:

[Figure 3.15: Estimated Tolerance Intervals in the Car Mileage Case — the histogram of the 50 mileages, with estimated tolerance intervals shown below it for the mileages of 68.26 percent, 95.44 percent, and 99.73 percent of all individual cars.]

Figure 3.15 depicts these estimated tolerance intervals, which are shown below the histogram. Because the difference between the upper and lower limits of each estimated tolerance interval is fairly small, we might conclude that the variability of the individual car mileages around the estimated mean mileage of 31.6 mpg is fairly small. Furthermore, the interval [x̄ ± 3s] = [29.2, 34.0] implies that almost any individual car that a customer might purchase this year will obtain a mileage between 29.2 mpg and 34.0 mpg.

Before continuing, recall that we have rounded x̄ and s to one decimal point accuracy in order to simplify our initial example of the Empirical Rule. If, instead, we calculate the Empirical Rule intervals by using x̄ = 31.56 and s = .7977 and then round the interval endpoints to one decimal place accuracy at the end of the calculations, we obtain the same intervals as obtained above. In general, however, rounding intermediate calculated results can lead to inaccurate final results. Because of this, throughout this book we will avoid greatly rounding intermediate results.

We next note that if we actually count the number of the 50 mileages in Table 3.1 that are contained in each of the intervals [x̄ ± s] = [30.8, 32.4], [x̄ ± 2s] = [30.0, 33.2], and [x̄ ± 3s] = [29.2, 34.0], we find that these intervals contain, respectively, 34, 48, and 50 of the 50 mileages. The corresponding sample percentages—68 percent, 96 percent, and 100 percent—are close to the theoretical percentages—68.26 percent, 95.44 percent, and 99.73 percent—that apply to a normally distributed population. This is further evidence that the population of all mileages is (approximately) normally distributed and thus that the Empirical Rule holds for this population.

To conclude this example, we note that the automaker has studied the combined city and highway mileages of the new model because the federal tax credit is based on these combined mileages. When reporting fuel economy estimates for a particular car model to the public, however, the EPA realizes that the proportions of city and highway driving vary from purchaser to purchaser. Therefore, the EPA reports both a combined mileage estimate and separate city and highway mileage estimates to the public (see Table 3.1(b) on page 137).
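The arithmetic quoted in the Car Mileage Case excerpt above is easy to reproduce. The following is a minimal sketch, not from the book, that reuses only the reported sample statistics — x̄ = 31.56 mpg, s = .7977 mpg, and the stated .23 mpg margin of error, whose derivation the text defers to Chapters 7 and 8:

```python
# Reproduces the Car Mileage Case arithmetic quoted in the excerpt.
# The sample statistics and the .23 mpg margin of error are taken from
# the text; nothing here derives them.
xbar, s, margin = 31.56, 0.7977, 0.23

# 95% confidence interval for the population mean mileage.
ci = (round(xbar - margin, 2), round(xbar + margin, 2))
print("95% CI for the mean mileage:", ci)

# Empirical Rule tolerance intervals [x-bar +/- k*s], rounding only the
# final endpoints, as the excerpt recommends.
def tolerance_interval(k):
    return (round(xbar - k * s, 1), round(xbar + k * s, 1))

for k in (1, 2, 3):
    print(f"[x-bar +/- {k}s] =", tolerance_interval(k))
```

These reproduce the intervals quoted in the excerpt: [31.33, 31.79] for the population mean, and [30.8, 32.4], [30.0, 33.2], and [29.2, 34.0] for the tolerance intervals.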


Chapters 1, 2, and 3: Six optional sections discussing business analytics and data mining. The Disney Parks Case is used in an optional section of Chapter 1 to introduce how business analytics and data mining are used to analyze big data. This case considers how Walt Disney World in Orlando, Florida, uses MagicBands worn by many of its visitors to collect massive amounts of real-time location, riding pattern, and purchase history data. These data help Disney improve visitor experiences and tailor its marketing messages to different types of visitors. At its Epcot park, Disney helps visitors choose their next ride by continuously summarizing predicted waiting times for seven popular rides on large screens in the park. Disney management also uses the riding pattern data it collects to make planning decisions, as is shown by the following business improvement conclusion from Chapter 1: …As a matter of fact, Channel 13 News in Orlando reported on March 6, 2015—during the writing of this case—that Disney had announced plans to add a third Soarin’ “theatre” (a virtual ride) in order to shorten visitor waiting times.

The Disney Case is also used in an optional section of Chapter 2 to help discuss descriptive analytics. Specifically, Figure 2.36 shows a bullet graph summarizing predicted waiting times for the seven Epcot rides posted by Disney on February 21, 2015, at 3 p.m., and Figure 2.37 shows a treemap illustrating fictitious visitor ratings of the seven Epcot rides. Other graphics discussed in the optional section on descriptive analytics include gauges, sparklines, data drill-down graphics, and combining graphics to form dashboards illustrating a business’s key performance indicators. For example, Figure 2.35 is a dashboard showing eight “flight on time” bullet graphs and three “flight utilization” gauges for an airline.

[Figure 2.35: A Dashboard of the Key Performance Indicators for an Airline — “flights on time” bullet graphs for arrivals and departures in the Midwest, Northeast, Pacific, and South; “fleet utilization” gauges for the short-haul, international, and regional fleets; and monthly charts of average load factor, breakeven load factor, fuel costs, and total costs.]

The accompanying excerpt from Chapter 2 explains these graphics. Bullet graphs compare a single primary measure to a target, or objective, which is represented by a symbol on the bullet graph. The bullet graph of Disney’s predicted waiting times uses five colors ranging from dark green to red and signifying short (0 to 20 minutes) to very long (80 to 100 minutes) predicted waiting times. This bullet graph does not compare the predicted waiting times to an objective. However, the bullet graphs located in the upper left of the dashboard in Figure 2.35 (representing the percentages of on-time arrivals and departures for the airline) do display objectives represented by short vertical black lines. For example, consider the bullet graphs representing the percentages of on-time arrivals and departures in the Midwest. The airline’s objective was to have 80 percent of midwestern arrivals be on time. The approximately 75 percent of actual midwestern arrivals that were on time is in the airline’s light brown “satisfactory” region of the bullet graph, but this 75 percent does not reach the 80 percent objective.

[Figure 2.36: Excel Output of a Bullet Graph of Disney’s Predicted Waiting Times (in minutes) for the Seven Epcot Rides Posted at 3 p.m. on February 21, 2015. DS DisneyTimes]

Treemaps. We next discuss treemaps, which help visualize two variables. Treemaps display information in a series of clustered rectangles, which represent a whole. The sizes of the rectangles represent a first variable, and treemaps use color to characterize the various rectangles within the treemap according to a second variable. For example, suppose (as a purely hypothetical example) that Disney gave visitors at Epcot the voluntary opportunity to use their personal computers or smartphones to rate as many of the seven Epcot rides as desired on a scale from 0 to 5. Here, 0 represents “poor,” 1 represents “fair,” 2 represents “good,” 3 represents “very good,” 4 represents “excellent,” and 5 represents “superb.” Figure 2.37(a) gives the number of ratings and the mean rating for each ride on a particular day. (These data are completely fictitious.) Figure 2.37(b) shows the Excel output of a treemap, where the size and color of the rectangle for a particular ride represent, respectively, the total number of ratings and the mean rating for the ride. The colors range from dark green (signifying a mean rating near the “superb,” or 5, level) to white (signifying a mean rating near the “fair,” or 1, level), as shown by the color scale on the treemap. Note that six of the seven rides are rated to be at least “good,” four of the seven rides are rated to be at least “very good,” and one ride is rated as “fair.” Many treemaps use a larger range of colors (ranging, say, from dark green to red), but the Excel app we used to obtain Figure 2.37(b) gave the range of colors shown in that figure. Also, note that treemaps are frequently used to display hierarchical information (information that could be displayed as a tree, where different branchings would be used to show the hierarchical information). For example, Disney could have visitors voluntarily rate the rides in each of its four Orlando parks—Disney’s Magic Kingdom, Epcot, Disney’s Animal Kingdom, and Disney’s Hollywood Studios. A treemap would be constructed by breaking a large …

[Figure 2.37: The Number of Ratings and the Mean Rating for Each of Seven Rides at Epcot (0 = Poor, 1 = Fair, 2 = Good, 3 = Very Good, 4 = Excellent, 5 = Superb) and Excel Output of a Treemap of the Numbers of Ratings and the Mean Ratings.]

(a) The number of ratings and the mean ratings (DS DisneyRatings):

Ride                                  Number of Ratings    Mean Rating
Soarin’                               2572                 4.815
Test Track presented by Chevrolet     2045                 4.247
Mission: Space orange                 1589                 3.408
Mission: Space green                   467                 3.116
The Seas with Nemo & Friends          1157                 2.712
Living With The Land                   725                 2.186
Spaceship Earth                        697                 1.319

(b) Excel output of the treemap [not reproduced here].

Chapter 3 contains four optional sections that discuss six methods of predictive analytics. The methods discussed are explained in an applied and practical way by using the numerical descriptive statistics previously discussed in Chapter 3. These methods are:

• Classification tree modeling and regression tree modeling (see Section 3.7 and the following figures): [Panels from Figure 3.28 (continued) show (e) an XLMiner classification tree using coupon redemption training data, with splits on the predictors Card and Purchases; (f) growing the tree in (e); (g) the XLMiner best pruned classification tree using the validation data; and (i) an XLMiner regression tree for Fresh demand (for Exercise 3.57(a), DS Fresh2), together with node proportions, error reports, and classification confusion matrices.]
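The classification tree idea previewed here can be illustrated with a tiny sketch. The split variables and the 39.925 purchases threshold echo the Card Upgrade tree reproduced in the book's figures, but the leaf classifications below are invented for illustration and are not read from the book's output:

```python
# Hypothetical classification tree for a card-upgrade decision: each
# internal node tests one predictor against a split value chosen during
# tree growing. The leaf labels are assumptions, not the book's results.
def classify_upgrade(purchases, plat_profile):
    """Predict upgrade (1) or no upgrade (0) for a cardholder."""
    if purchases < 39.925:        # root split on purchases
        if plat_profile == 1:     # cardholder fits the platinum profile
            return 1
        return 0
    return 1                      # heavy purchasers predicted to upgrade

print(classify_upgrade(25.0, 0))  # prints 0
print(classify_upgrade(45.0, 0))  # prints 1
```

Growing a tree chooses such splits to make the nodes as pure as possible; pruning with validation data, as in the best-pruned-tree panels, then trims splits that do not generalize.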

177

Decision Trees: Classification Trees and Regression Trees (Optional)

JMP Output of a Classification Tree for the Card Upgrade Data

Figure 3.26

DS

CardUpgrade

Arrival

Departure

Midwest

50 60 70 80 90 100 50 60 70 80 90 100

1

Upgrade

0.50

0

0.25

^

0.00

^

^

^

Purchases

,39.925

Purchases.539.925

PlatProfile

Purchases.526.185

(0)

PlatProfile(1)

Purchases.532.45

Purchases,26.185

Purchases,32.45

All Rows

used to obtain Figure 2.37(b) gave the range of colors shown in that figure. Also, note that

treemaps are frequently used to display hierarchical information (information that could

be displayed as a tree, where different branchings would be used to show the hierarchical

information). For example, Disney could have visitors voluntarily rate the rides in each

of its four Orlando parks—Disney’s Magic Kingdom, Epcot, Disney’s Animal Kingdom,

and Disney’s Hollywood Studios. A treemap would be constructed by breaking a large

[Figure residue from the decision-tree outputs reproduced on this page:
— A JMP partition tree for Upgrade (for Exercise 3.57(a)): RSquare 0.640, N = 40, 4 splits, with splits on Purchases (for example, Purchases >= 32.45, Purchases < 26.185, and Purchases < 39.925) and on PlatProfile(0)/PlatProfile(1), together with node counts, G^2 values, LogWorth values, level rates and probabilities for levels 0 and 1, and estimated probabilities for two customers (Cust. 1 and Cust. 2).
— An XLMiner pruning summary for part (h), pruning the tree in (e): % error for trees with 4, 3, 2, 1, and 0 decision nodes (6.25, 6.25, 6.25, 6.25, and 62.5), best-pruned and minimum-error trees, classification confusion matrices (predicted versus actual class), and an error report (class 0: 6 cases; class 1: 10 cases; overall: 16 cases, 3 errors).
— A JMP regression tree for Exercise 3.56(d) and (e), DS Fresh2: splits on PriceDif and AdvExp (for example, AdvExp at 5.525, 6.65, and 7.075) with predicted demand values such as 8.022, 8.587, and 8.826.]
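The "Classification Confusion Matrix" and "Error Report" panels in outputs like these simply tally, for each actual class, how many cases were misclassified. A minimal stand-in computation (the five cases below are invented, not the exercise's card-upgrade data):

```python
from collections import Counter

def error_report(actual, predicted):
    """Per-class case counts, error counts, and % error, plus an overall row,
    in the spirit of XLMiner's 'Error Report' panel."""
    cases = Counter(actual)                      # cases per actual class
    errors = Counter(a for a, p in zip(actual, predicted) if a != p)
    report = {c: (cases[c], errors[c], 100.0 * errors[c] / cases[c])
              for c in sorted(cases)}
    total, wrong = len(actual), sum(errors.values())
    report["Overall"] = (total, wrong, 100.0 * wrong / total)
    return report

# invented example: five customers, classes 0 (no upgrade) / 1 (upgrade)
print(error_report([0, 0, 0, 1, 1], [0, 0, 1, 1, 1]))
```

The overall % error is the quantity the pruning table tracks as decision nodes are removed.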

Chapter 2  Descriptive Statistics: Tabular and Graphical Methods and Descriptive Analytics — Descriptive Analytics (Optional)

Figure 2.35  Excel Output of a Bullet Graph of Disney's Predicted Waiting Times (in minutes) for the Seven Epcot Rides Posted at 3 p.m. on February 21, 2015  DS DisneyTimes

Figure 2.36  A Dashboard of the Key Performance Indicators for an Airline

• Hierarchical clustering and k-means clustering (see Section 3.8 and the following figures):

Chapter 3  Descriptive Statistics: Numerical Methods and Some Predictive Analytics

XLMiner Output for Exercise 3.61  DS SportsRatings

(a) The Minitab output: a dendrogram of the sports (complete linkage), grouping boxing, skiing, swimming, ping-pong, handball, tennis, track & field, golf, bowling, basketball, hockey, football, and baseball at similarity levels between 33.33 and 100.00.

(b) The JMP output: the cluster ID assigned to each sport (for example, Boxing in cluster 5 and Basketball in cluster 4), each sport's distance from every cluster centroid, the centroids of each cluster (that is, the six mean values, on the six perception scales, of the cluster's members), the average distance of each cluster's members from the cluster centroid, and the distances between the cluster centroids.

a  Use the output to summarize the members of each cluster.

b  By using the members of each cluster and the cluster centroids, discuss the basic differences between the clusters. Also, discuss how this k-means cluster analysis leads to the same practical conclusions about how to improve the popularities of baseball and tennis that have been obtained using the previously discussed hierarchical clustering.
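The JMP k-means output reports, for each observation, its distance to every cluster centroid; the algorithm itself just alternates nearest-centroid assignment with centroid recomputation. A bare-bones sketch (two invented rating dimensions instead of the six perception scales used in the exercise):

```python
import math

def kmeans(points, centroids, iters=10):
    """Plain k-means: assign each point to its nearest centroid (Euclidean
    distance, the 'Dist. Clust' columns in the JMP output), then move each
    centroid to the mean of its members; repeat."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else c
                     for cl, c in zip(clusters, centroids)]
    return centroids, clusters

# invented 2-D ratings: two tight groups of three "sports" each
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
cents, groups = kmeans(pts, centroids=[(0, 0), (10, 10)])
```

With these starting centroids the two groups separate after one pass, and each final centroid is the mean of its three members.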

[Figure residue: the k-means cluster summary for the sports ratings, with cluster sizes (#Obs), average within-cluster distances, distances between clusters, and the centroids of Cluster-1 through Cluster-5 on the six perception scales (Fast, Compl, Team, Easy, Ncon, Opp), interleaved with fragments of the Minitab factor analysis output of Figure 3.35, "Minitab Output of a Factor Analysis of the Applicant Data (4 Factors Used)."]

LO3-10  Interpret the information provided by a factor analysis (Optional).

We will illustrate k-means clustering by using a real data mining project. For confidentiality purposes, we will consider a fictional grocery chain. However, the conclusions reached are real. Consider, then, the Just Right grocery chain, which has 2.3 million store loyalty card holders. Store managers are interested in clustering their customers into various subgroups whose shopping habits tend to be similar. They expect to find that certain customers tend to buy many cooking basics like oil, flour, eggs, rice, and raw chickens, while others are buying prepared items from the deli, salad bar, and frozen food aisle. Perhaps there are other important categories like calorie-conscious, vegetarian, or premium-quality shoppers. The executives don't know what the clusters are and hope the data will enlighten them. They choose to concentrate on 100 important products offered in their stores. Suppose that product 1 is fresh strawberries, product 2 is olive oil, product 3 is hamburger buns, and product 4 is potato chips. For each customer having a Just Right loyalty card, they will know the […]

In the real world, companies such as Amazon and Netflix sell or rent thousands or even millions of items and find association rules based on millions of customers. In order to make obtaining association rules manageable, these companies break products for which they are obtaining association rules into various categories (for example, comedies or thrillers) and hierarchies (for example, a hierarchy related to how new the product is).

Exercises for Section 3.10

CONCEPTS

3.66  What is the purpose of association rules?

3.67  Discuss the meanings of the terms support percentage, confidence percentage, and lift ratio.

3.68  In the previous XLMiner output, show how the lift ratio of 1.1111 (rounded) for the recommendation of C to renters of B has been calculated. Interpret this lift ratio.

METHODS AND APPLICATIONS

3.69  The XLMiner output of an association rule analysis of the DVD renters data, using a specified support percentage of 40 percent and a specified confidence percentage of 70 percent, is shown below.  DS DVDRent
a  Summarize the recommendations based on a lift ratio greater than 1.
b  Consider the recommendation of DVD B based on having rented C & E. (1) Identify and interpret the support for C & E. Do the same for the support for C & E & B. (2) Show how the Confidence% of 80 has been calculated. (3) Show how the Lift Ratio of 1.1429 (rounded) has been calculated.

Rule: If all Antecedent items are purchased, then with Confidence percentage Consequent items will also be purchased.
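Exercises 3.68 and 3.69(b) come down to the same two formulas: Confidence% is (support for x & y) divided by (support for x), and the lift ratio is that confidence divided by (support for y ÷ number of transactions). A quick check using the counts legible in the XLMiner output for the rule C & E → B (supports 5, 7, and 4; the printed lift values imply 10 renters in all, which is an inference on our part):

```python
def rule_stats(s_x, s_y, s_xy, n):
    """Confidence% and lift ratio for an association rule x -> y,
    computed from raw support counts and the number of transactions n."""
    confidence = s_xy / s_x              # fraction of x-renters who also took y
    lift = confidence / (s_y / n)        # compare against the base rate of y
    return 100.0 * confidence, lift

# rule C & E -> B: support(C & E) = 5, support(B) = 7, support(C & E & B) = 4
conf_pct, lift = rule_stats(s_x=5, s_y=7, s_xy=4, n=10)
print(conf_pct, round(lift, 4))   # 80.0 and 1.1429, matching the output
```

A lift ratio above 1 means renters of the antecedent take the consequent more often than customers in general do, which is why the exercises focus on rules with lift greater than 1.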

• Factor analysis and association rule mining (see Sections 3.9 and 3.10 and the following figures):

[Table residue: the XLMiner association rules for the DVD renters data, with columns Row ID, Confidence%, Antecedent (x), Consequent (y), Support for x, Support for y, Support for x & y, and Lift Ratio. Legible rows include rows 1 and 2: 71.43%, B → A and A → B, supports 7, 7, and 5, lift 1.0204; row 4: 77.78%, C → B, supports 9, 7, and 7, lift 1.1111; row 12: 80%, C & E → B, supports 5, 7, and 4, lift 1.1429; and row 13: 100%, B & E → C, supports 4, 9, and 4, lift 1.1111.]

3.9  Factor Analysis (Optional and Requires Section 3.4)

Factor analysis starts with a large number of correlated variables and attempts to find fewer underlying uncorrelated factors that describe the "essential aspects" of the large number of correlated variables. To illustrate factor analysis, suppose that a personnel officer has interviewed and rated 48 job applicants for sales positions on the following 15 variables:

1  Form of application letter    6  Lucidity    11  Ambition
2  Appearance    7  Honesty    12  Grasp
3  Academic ability    8  Salesmanship    13  Potential
4  Likability    9  Experience    14  Keenness to join
5  Self-confidence    10  Drive    15  Suitability

Figure 3.35  Minitab Output of a Factor Analysis of the Applicant Data (4 Factors Used)

[Figure residue: a principal component factor analysis of the correlation matrix, showing unrotated factor loadings and communalities and Varimax-rotated factor loadings and communalities for Var 1 through Var 15 on Factor1 through Factor4 (for example, Var 1 loads 0.447, −0.619, −0.376, and −0.121 unrotated, with communality 0.739), together with each factor's variance and % Var.]

[…] as follows: Factor 1, "extroverted personality"; Factor 2, "experience"; Factor 3, "agreeable personality"; Factor 4, "academic ability." Variable 2 (appearance) does not load heavily on any factor and thus is its own factor, as Factor 6 on the Minitab output in Figure 3.34 indicated is true. Variable 1 (form of application letter) loads heavily on Factor 2 ("experience"). In summary, there is not much difference between the 7-factor and 4-factor solutions. We might therefore conclude that the 15 variables can be reduced to the following five uncorrelated factors: "extroverted personality," "experience," "agreeable personality," "academic ability," and "appearance." This conclusion helps the personnel officer focus on the "essential characteristics" of a job applicant. Moreover, if a company analyst wishes at a later date to use a tree diagram or regression analysis to predict sales performance on the basis of the characteristics of salespeople, the analyst can simplify the prediction modeling procedure by using the five uncorrelated factors instead of the original 15 correlated variables as potential predictor variables.

In general, in a data mining project where we wish to predict a response variable and in which there are an extremely large number of potential correlated predictor variables, it can be useful to first employ factor analysis to reduce the large number of potential correlated predictor variables to fewer uncorrelated factors that we can use as potential predictor variables.

Chapter Summary

We began this chapter by presenting and comparing several measures of central tendency. We defined the population mean, and we saw how to estimate the population mean by using a sample mean. We also defined the median and mode, and we compared the mean, median, and mode for symmetrical distributions and for distributions that are skewed to the right or left. We then studied measures of variation (or spread). We defined the range, variance, and standard deviation, and we saw how to estimate a population variance and standard deviation by using a sample. We learned that a good way to interpret the standard deviation when a population is (approximately) normally distributed is to use the Empirical Rule, and we studied Chebyshev's Theorem, which gives us intervals containing reasonably large fractions of the population units no matter what the population's shape might be. We also saw that, when a data set is highly skewed, it is best to use percentiles and quartiles to measure variation, and we learned how to construct a box-and-whiskers plot by using the quartiles. After learning how to measure and depict central tendency and variability, we presented various optional topics. First, we discussed several numerical measures of the relationship between two variables. These included the covariance, the correlation coefficient, and the least squares line. We then introduced the concept of a weighted mean and also explained how to compute descriptive statistics for grouped data. In addition, we showed how to calculate the geometric mean and demonstrated its interpretation. Finally, we used the numerical methods of this chapter to give an introduction to four important techniques of predictive analytics: decision trees, cluster analysis, factor analysis, and association rules.

We believe that an early introduction to predictive analytics (in Chapter 3) will make statistics seem more useful and relevant from the beginning and thus motivate students to be more interested in the entire course. However, our presentation gives instructors various choices. This is because, after covering the introduction to business analytics in Chapter 1, the five optional sections on descriptive analytics and predictive analytics in Chapters 2 and 3 can be covered in any order without loss of continuity. Therefore, the instructor can choose which of the six optional analytics sections to cover early, as part of the main flow of Chapters 1–3, and which to discuss later. We recommend that sections chosen to be discussed later be covered after Chapter 14, which presents the further predictive analytics topics of multiple linear regression, logistic regression, and neural networks.

Chapters 4–8: Probability and probability modeling. Discrete and continuous probability distributions. Sampling distributions and confidence intervals. Chapter 4 discusses probability by featuring a new discussion of probability modeling and by using motivating examples—The Crystal Cable Case and a real-world example of gender discrimination at a pharmaceutical company—to illustrate the probability rules. Chapters 5 and 6 give more concise discussions of discrete and continuous probability distributions (models) and feature practical examples illustrating the "rare event approach" to making a statistical inference. In Chapter 7, The Car Mileage Case is used to introduce sampling distributions and motivate the Central Limit Theorem (see Figures 7.1, 7.3, and 7.5). In Chapter 8, the automaker in The Car Mileage Case uses a confidence interval procedure specified by the Environmental Protection Agency (EPA) to find the EPA estimate of a new midsize model's true mean mileage and determine if the new midsize model deserves a federal tax credit (see Figure 8.2).
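Returning to the factor analysis output of Figure 3.35, one column is worth recomputing by hand: a variable's communality is the sum of its squared loadings on the retained factors, i.e., the fraction of that variable's variance the four factors reproduce. For Var 1, whose unrotated loadings read 0.447, −0.619, −0.376, and −0.121:

```python
# Var 1's unrotated loadings on Factor1-Factor4, as printed in Figure 3.35
loadings = [0.447, -0.619, -0.376, -0.121]

# communality = sum of squared loadings = share of Var 1's variance explained
communality = sum(l * l for l in loadings)
print(round(communality, 3))   # 0.739, the value shown in the output
```

The same check works for any row of the loadings table, rotated or unrotated, since Varimax rotation leaves each variable's communality unchanged.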


m for the

population

of six preThis sample mean is the point estimate of the mean has

mileage

been working

to improve

gas mileages,

we cannot assume that we know the true value of

production cars and is the preliminary mileage estimate

for the new

midsize

that

was

m for the

new

midsize model. However, engineering data might

the population

mean

mileagemodel

indicate that the spread of individual car mileages for the automaker’s midsize cars is the

reported at the auto shows.

same from

modelthe

andnew

year midsize

to year. Therefore, if the mileages for previous models

When the auto shows were over, the automaker decided

to model

furthertostudy

hadtests.

a standard

to .8 mpg,

it might be reasonable to assume that the standard

model by subjecting the four auto show cars to various

Whendeviation

the EPAequal

mileage

test was

deviation of the mileages for the new model will also equal .8 mpg. Such an assumption

performed, the four cars obtained mileages of 29 mpg, 31 mpg, 33 mpg, and 34 mpg. Thus,

would, of course, be questionable, and in most real-world situations there would probably not

the mileages obtained by the six preproduction cars were

29 mpg,

31 mpg,

32 mpg,

be an actual

basis30

formpg,

knowing

s. However,

assuming that s is known will help us to illustrate

33 mpg, and 34 mpg. The probability distribution sampling

of this population

sixinindividual

car

distributions,ofand

later chapters

we will see what to do when s is unknown.

mileages is given in Table 7.1 and graphed in Figure 7.1(a). The mean of the population of

C EXAMPLE 7.2 The Car Mileage Case: Estimating Mean Mileage

A Probability Distribution Describing thePart

Population

Six Individual

Car Mileages

Consider

the infinite population of the mileages of all of the new

1: BasicofConcepts

Table 7.1

Individual Car Mileage

Probability

Figure 7.1

30

31

1y6

1y6

Probability

0.20

29

30

1/6

1/6

1/6

1/6

31

32

33

34

0.15

0.10

0.05

0.00

Individual Car Mileage

(b) A graph of the probability distribution describing the

population of 15 sample means

3/15

0.20

Probability

0.15

2/15 2/15

2/15 2/15

0.10

1/15

1/15

29.5

30

1/15

1/15

33

33.5

0.05

0.00

29

30.5

31

31.5

32

32.5

1y6

1y6

Sampling Distribution of the Sample Mean x When n 5 5, and (3) the Sampling

(a) A graph of the probability distribution describing the

population of six individual car mileages

1/6

1y6

re 7.3

A Comparison of (1) the Population of All Individual Car Mileages, (2) the

T a b l e 7 .F2i g uThe

Population

of Sample Means

A Comparison of Individual Car

Mileages and Sample Means

1/6

midsize cars that could potentially be produced by this year’s manufacturing process. If

33

34

we assume32that this population

is normally distributed with mean m and standard deviation

29

1y6

34

of the

Mean x When n 5 50

(a) The population of theDistribution

15 samples

of nSample

5 2 car

mileages and corresponding sample means

Car

Mileages

Sample

Mean

(a) The population of individual mileages

Sample

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

(b) A probability distribution describing the

of sampling

the sample mean x¯ when n 5 50

population(c)ofThe

15sampling

sample distribution

means: the

distribution of the sample mean

Sample

Mean

Frequency

Probability

29.5

30

30.5

31

31.5

32

32.5

33

33.5

1

1

2

2

3

2

2

1

1

1y15

1y15

2y15

2y15

3y15

2y15

2y15

1y15

1y15

7.1  The Sampling Distribution of the Sample Mean

Figure 7.5  The Central Limit Theorem Says That the Larger the Sample Size Is, the More Nearly Normally Distributed Is the Population of All Possible Sample Means

[Figure omitted. Its panels show: (a) the normal distribution describing the population of all individual car mileages, which has mean μ and standard deviation σ = .8 (scale of gas mileages); (b) the sampling distribution of the sample mean x̄ when n = 5, the normal distribution describing the population of all possible sample means when the sample size is 5, where μ_x̄ = μ and σ_x̄ = σ/√n = .8/√5 = .358; (c) the sampling distribution of the sample mean x̄ when n = 50, where μ_x̄ = μ and σ_x̄ = σ/√n = .8/√50 = .113 (scale of sample means, x̄).]

[Interleaved here from Table 7.2(b), the 15 samples of n = 2 car mileages and their sample means: (29, 30) → 29.5; (29, 31) → 30; (29, 32) → 30.5; (29, 33) → 31; (29, 34) → 31.5; (30, 31) → 30.5; (30, 32) → 31; (30, 33) → 31.5; (30, 34) → 32; (31, 32) → 31.5; (31, 33) → 32; (31, 34) → 32.5; (32, 33) → 32.5; (32, 34) → 33; (33, 34) → 33.5.]

8.1  z-Based Confidence Intervals for a Population Mean: σ Known

Figure 8.2  Three 95 Percent Confidence Intervals for μ

[Figure omitted. It shows the population of all individual car mileages with mean μ; three samples of n = 50 car mileages give x̄ = 31.56, x̄ = 31.2, and x̄ = 31.68. The probability is .95 that x̄ will be within plus or minus 1.96σ_x̄ = .22 of μ, and the three resulting intervals are 31.56 ± .22 = [31.34, 31.78], 31.2 ± .22 = [30.98, 31.42], and 31.68 ± .22 = [31.46, 31.90].]

[Also interleaved here: panels from a figure showing (a) several sampled populations and (b) the corresponding populations of all possible sample means for different sample sizes (n = 2, n = 6, and n = 30).]

How large must the sample size be for the sampling distribution of x̄ to be approximately normal? In general, the more skewed the probability distribution of the sampled population, the larger the sample size must be for the population of all possible sample means to be approximately normally distributed. For some sampled populations, particularly those described by symmetric distributions, the population of all possible sample means is approximately normally distributed for a fairly small sample size. In addition, studies indicate that, if the sample size is at least 30, then for most sampled populations the population of all possible sample means is approximately normally distributed. In this book, whenever the sample size n is at least 30, we will assume that the sampling distribution of x̄ is approximately a normal distribution. Of course, if the sampled population is exactly normally distributed, the sampling distribution of x̄ is exactly normal for any sample size.
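The sample-size guideline above is easy to check by simulation. A minimal sketch (not from the book): draw many sample means of size n = 30 from a skewed exponential population with mean 1 and standard deviation 1, and confirm that the sample means center at the population mean with spread σ/√n.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# A skewed (exponential) population: individual values are far from normal.
n, reps = 30, 20000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

avg = sum(means) / reps
sd = math.sqrt(sum((m - avg) ** 2 for m in means) / (reps - 1))

# The sample means center near the population mean (1.0) and have spread
# close to sigma/sqrt(n) = 1/sqrt(30) ≈ .18, even though the sampled
# population itself is strongly skewed.
print(round(avg, 2), round(sd, 2))
```

Increasing n tightens the spread by the factor 1/√n, which is exactly what Figure 7.5 illustrates for the mileage example.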

Chapters 9–12: Hypothesis testing. Two-sample procedures. Experimental design and analysis of variance. Chi-square tests. Chapter 9 discusses hypothesis testing and begins with a new section on formulating statistical hypotheses. Three cases—The e-billing Case, The Trash Bag Case, and The Valentine's Day Chocolate Case—are then used in a new section that explains the critical value and p-value approaches to testing a hypothesis about a population mean. A new summary box visually illustrating these approaches is presented in the middle of this section (rather than at the end, as in previous editions) so that more of the section can be devoted to developing the summary box and showing how to use it. In addition, a five-step hypothesis testing procedure emphasizes that successfully using any of the book's hypothesis testing summary boxes requires simply identifying the alternative hypothesis being tested and then looking in the summary box for the corresponding critical value rule and/or p-value (see the next page).

EXAMPLE 7.3  The e-billing Case: Reducing Mean Bill Payment Time

Recall that a management consulting firm has installed a new computer-based electronic billing system in a Hamilton, Ohio, trucking company. Because of the previously discussed advantages of the new billing system, and because the trucking company's clients are receptive to using this system, the management consulting firm believes that the new system will reduce the mean bill payment time by more than 50 percent. The mean payment time using the old billing system was approximately equal to, but no less than, 39 days. Therefore, if μ denotes the new mean payment time, the consulting firm believes that μ will be less than 19.5 days. To assess whether μ is less than 19.5 days, the consulting firm has randomly selected a sample of n = 65 invoices processed using the new billing system and has determined the payment times for these invoices. The mean of the 65 payment times is x̄ = 18.1077 days, which is less than 19.5 days. Therefore, we ask the following question: If…

3  In statement 1 we showed that the probability is .95 that the sample mean x̄ will be within plus or minus 1.96σ_x̄ = .22 of the population mean μ. In statement 2 we showed that x̄ being within plus or minus .22 of μ is the same as the interval [x̄ ± .22] containing μ. Combining these results, we see that the probability is .95 that the sample mean x̄ will be such that the interval

[x̄ ± 1.96σ_x̄] = [x̄ ± .22]

contains the population mean μ.

Statement 3 says that, before we randomly select the sample, there is a .95 probability that we will obtain an interval [x̄ ± .22] that contains the population mean μ. In other words, 95 percent of all intervals that we might obtain contain μ, and 5 percent of these intervals do not contain μ. For this reason, we call the interval [x̄ ± .22] a 95 percent confidence interval for μ. To better understand this interval, we must realize that, when we actually select the sample, we will observe one particular sample from the extremely large number of possible samples. Therefore, we will obtain one particular confidence interval from the extremely large number of possible confidence intervals. For example, recall that when the automaker randomly selected the sample of n = 50 cars and tested them as prescribed by the EPA, the automaker obtained the sample of 50 mileages given in Table 1.7. The mean of this sample is x̄ = 31.56 mpg, and a histogram constructed using this sample (see Figure 2.9 on page 66) indicates that the population of all individual car mileages is normally distributed. It follows that a 95 percent confidence interval for the population mean mileage μ of the new midsize model is

[x̄ ± .22] = [31.56 ± .22] = [31.34, 31.78]

Because we do not know the true value of μ, we do not know for sure whether this interval contains μ. However, we are 95 percent confident that this interval contains μ. That is, we are 95 percent confident that μ is between 31.34 mpg and 31.78 mpg. What we mean by "95 percent confident" is that we hope that the confidence interval [31.34, 31.78] is one of the 95 percent of all confidence intervals that contain μ and not one of the 5 percent of all confidence intervals that do not contain μ. Here, we say that 95 percent is the confidence…
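The interval [x̄ ± 1.96σ_x̄] can be reproduced directly. A minimal sketch using the mileage example's values (σ = .8, n = 50, x̄ = 31.56):

```python
import math

# z-based 95% confidence interval for μ when σ is known.
sigma, n, x_bar = 0.8, 50, 31.56
z_975 = 1.96                        # z-point putting .025 in each normal tail

margin = z_975 * sigma / math.sqrt(n)        # 1.96 * σ_x̄
interval = (round(x_bar - margin, 2), round(x_bar + margin, 2))
print(round(margin, 2))    # 0.22
print(interval)            # (31.34, 31.78)
```

Replacing 1.96 with another normal point (for example 2.575 for 99 percent) widens the interval, which is the trade-off between confidence level and precision discussed in Chapter 8.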

Chapter 9  Hypothesis Testing

p-value is a right-tailed p-value. This p-value, which we have previously computed, is the area under the standard normal curve to the right of the computed test statistic value z. In the next two subsections we will discuss using the critical value rules and p-values in the summary box to test a "less than" alternative hypothesis (Ha: μ < μ0) and a "not equal to" alternative hypothesis (Ha: μ ≠ μ0). Moreover, throughout this book we will (formally or informally) use the five steps below to implement the critical value and p-value approaches to hypothesis testing.

The Five Steps of Hypothesis Testing

1  State the null hypothesis H0 and the alternative hypothesis Ha.
2  Specify the level of significance α.
3  Plan the sampling procedure and select the test statistic.

Using a critical value rule:
4  Use the summary box to find the critical value rule corresponding to the alternative hypothesis.
5  Collect the sample data, compute the value of the test statistic, and decide whether to reject H0 by using the critical value rule. Interpret the statistical results.

Using a p-value rule:
4  Use the summary box to find the p-value corresponding to the alternative hypothesis. Collect the sample data, compute the value of the test statistic, and compute the p-value.
5  Reject H0 at level of significance α if the p-value is less than α. Interpret the statistical results.

9.3  t Tests about a Population Mean: σ Unknown

LO9-4  Use critical values and p-values to perform a t test about a population mean when σ is unknown.

If we do not know σ (which is usually the case), we can base a hypothesis test about μ on the sampling distribution of

(x̄ − μ) / (s/√n)

If the sampled population is normally distributed (or if the sample size is large—at least 30), then this sampling distribution is exactly (or approximately) a t distribution having n − 1 degrees of freedom. This leads to the following results:

A t Test about a Population Mean: σ Unknown

Null Hypothesis:  H0: μ = μ0

Test Statistic:  t = (x̄ − μ0)/(s/√n)   with df = n − 1

Assumptions:  Normal population, or large sample size

Critical Value Rule (reject H0 at level of significance α): reject H0 if t > tα when Ha: μ > μ0; if t < −tα when Ha: μ < μ0; or if |t| > tα/2 when Ha: μ ≠ μ0.

p-Value (reject H0 if p-value < α): the p-value is the area under the t curve to the right of t when Ha: μ > μ0; the area to the left of t when Ha: μ < μ0; or twice the area to the right of |t| when Ha: μ ≠ μ0.

Testing a "less than" alternative hypothesis

We have seen in the e-billing case that to study whether the new electronic billing system reduces the mean bill payment time by more than 50 percent, the management consulting firm will test H0: μ = 19.5 versus Ha: μ < 19.5 (step 1). A Type I error (concluding that Ha: μ < 19.5 is true when H0: μ = 19.5 is true) would result in the consulting firm overstating the benefits of the new billing system, both to the company in which it has been installed and to other companies that are considering installing such a system. Because the consulting firm desires to have only a 1 percent chance of doing this, the firm will set α equal to .01 (step 2). To perform the hypothesis test, we will randomly select a sample of n = 65 invoices paid using the new billing system and calculate the mean x̄ of the payment times of these invoices. Then, because the sample size is large, we will utilize the test statistic in the summary box (step 3):

z = (x̄ − 19.5) / (s/√n)

A value of the test statistic z that is less than zero results when x̄ is less than 19.5. This provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be less than 19.5. To decide how much less than zero the value of the test statistic must be to reject H0 in favor of Ha at level of significance α, we note that Ha: μ < 19.5 is of the form Ha: μ < μ0, and we look in the summary box under the critical value rule heading Ha: μ < μ0. The critical value rule that we find is a left-tailed critical value rule and says to do the following:

Place the probability of a Type I error, α, in the left-hand tail of the standard normal curve and use the normal table to find the critical value −zα. Here −zα is the negative of the normal point zα. That is, −zα is the point on the horizontal axis under the standard normal curve that gives a left-hand tail area equal to α.

Reject H0: μ = 19.5 in favor of Ha: μ < 19.5 if and only if the computed value of the test statistic z is less than the critical value −zα (step 4). Because α equals .01, the critical value −zα is −z.01 = −2.33 [see Table A.3 and Figure 9.3(a)].

EXAMPLE 9.4  The Commercial Loan Case: Mean Debt-to-Equity Ratio

One measure of a company's financial health is its debt-to-equity ratio. This quantity is defined to be the ratio of the company's corporate debt to the company's equity. If this ratio is too high, it is one indication of financial instability. For obvious reasons, banks often monitor the financial health of companies to which they have extended commercial loans. Suppose that, in order to reduce risk, a large bank has decided to initiate a policy limiting the mean debt-to-equity ratio for its portfolio of commercial loans to being less than 1.5. In order to assess whether the mean debt-to-equity ratio μ of its (current) commercial loan portfolio is less than 1.5, the bank will test the null hypothesis H0: μ = 1.5 versus the alternative hypothesis Ha: μ < 1.5. In this situation, a Type I error (rejecting H0: μ = 1.5 when H0: μ = 1.5 is true) would result in the bank concluding that the mean debt-to-equity ratio of its commercial loan portfolio is less than 1.5 when it is not. Because the bank wishes to be very sure that it does not commit this Type I error, it will perform the hypothesis test H0 versus Ha by using a .01 level of significance. To perform the test, the bank randomly selects a sample of 15 of its commercial loan accounts. Audits of these companies result in the following debt-to-equity ratios (arranged in increasing order): 1.05, 1.11, 1.19, 1.21, 1.22, 1.29, 1.31, 1.32, 1.33, 1.37, 1.41, 1.45, 1.46, 1.65, and 1.78.  DS DebtEq

[Stem-and-leaf display in the page margin: 1.0 | 5;  1.1 | 1 9;  1.2 | 1 2 9;  1.3 | 1 2 3 7;  1.4 | 1 5 6;  1.5 | ;  1.6 | 5;  1.7 | 8]

The mound-shaped stem-and-leaf display of these ratios is given in the page margin and indicates that the population of all debt-to-equity ratios is (approximately) normally distributed. It follows that it is appropriate to calculate the value of the test statistic t in the summary box. Furthermore, because the alternative hypothesis Ha: μ < 1.5 says to use…

9.4  z Tests about a Population Proportion

In order to see how to test this kind of hypothesis, remember that when n is large, the sampling distribution of

(p̂ − p0) / √(p0(1 − p0)/n)

is approximately a standard normal distribution. Let p0 denote a specified value between 0 and 1 (its exact value will depend on the problem), and consider testing the null hypothesis H0: p = p0. We then have the following result:

A Large Sample Test about a Population Proportion

Null Hypothesis:  H0: p = p0

Test Statistic:  z = (p̂ − p0) / √(p0(1 − p0)/n)

Assumptions:  np0 ≥ 5 and n(1 − p0) ≥ 5³

Critical Value Rule (reject H0 at level of significance α): reject H0 if z > zα when Ha: p > p0; if z < −zα when Ha: p < p0; or if |z| > zα/2 when Ha: p ≠ p0.

p-Value (reject H0 if p-value < α): the p-value is the area under the standard normal curve to the right of z when Ha: p > p0; the area to the left of z when Ha: p < p0; or twice the area to the right of |z| when Ha: p ≠ p0.
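The t test set up in Example 9.4 can be carried through numerically with the 15 sampled debt-to-equity ratios. A minimal sketch; the left-tail critical value t.01 = 2.624 for n − 1 = 14 degrees of freedom is taken from a t table:

```python
import math
import statistics

# One-sample t test of H0: μ = 1.5 vs. Ha: μ < 1.5 at α = .01,
# using the sampled debt-to-equity ratios from the commercial loan example.
ratios = [1.05, 1.11, 1.19, 1.21, 1.22, 1.29, 1.31, 1.32,
          1.33, 1.37, 1.41, 1.45, 1.46, 1.65, 1.78]
mu_0 = 1.5

n = len(ratios)
x_bar = statistics.mean(ratios)
s = statistics.stdev(ratios)               # sample standard deviation
t = (x_bar - mu_0) / (s / math.sqrt(n))    # test statistic from the summary box

t_crit = 2.624          # t.01 with 14 degrees of freedom (from a t table)
reject = t < -t_crit    # left-tailed critical value rule for Ha: μ < μ0
print(round(t, 2), reject)   # -3.16 True
```

Because t falls below −t.01, the bank can reject H0 at the .01 level and conclude that the mean debt-to-equity ratio of its portfolio is less than 1.5.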

Hypothesis testing summary boxes are featured throughout Chapter 9, Chapter 10 (two-sample procedures), Chapter 11 (one-way, randomized block, and two-way analysis of variance), Chapter 12 (chi-square tests of goodness of fit and independence), and the remainder of the book. In addition, emphasis is placed throughout on estimating practical importance after testing for statistical significance.

Chapters 13–18: Simple and multiple regression analysis. Model building. Logistic regression and neural networks. Time series forecasting. Control charts. Nonparametric statistics. Decision theory. Chapters 13–15 present predictive analytics methods that are based on parametric regression and time series models. Specifically, Chapter 13 and the first seven sections of Chapter 14 discuss simple and basic multiple regression analysis by using a more streamlined organization and The Tasty Sub Shop (revenue prediction) Case (see Figure 14.4). The next five sections of Chapter 14 present five advanced modeling topics that can be covered in any order without loss of continuity: dummy variables (including a discussion of interaction); quadratic variables and quantitative interaction variables; model building and the effects of multicollinearity; residual analysis and diagnosing outlying and influential observations; and logistic regression (see Figure 14.36). The last section of Chapter 14 discusses neural networks and has logistic regression as a prerequisite. This section shows why neural network modeling is particularly useful when analyzing big data and how neural network models are used to make predictions (see Figures 14.37 and 14.38). Chapter 15 discusses time series forecasting, including Holt–Winters' exponential smoothing models, and refers readers to Appendix B (at the end of the book), which succinctly discusses the Box–Jenkins methodology. The book concludes with Chapter 16 (a clear discussion of control charts and process capability), Chapter 17 (nonparametric statistics), and Chapter 18 (decision theory, another useful predictive analytics topic).

EXAMPLE 9.6  The Cheese Spread Case: Improving Profitability

We have seen that the cheese spread producer wishes to test H0: p = .10 versus Ha: p < .10, where p is the proportion of all current purchasers who would stop buying the cheese spread if the new spout were used. The producer will use the new spout if H0 can be rejected in favor of Ha at the .01 level of significance. To perform the hypothesis test, we will randomly select n = 1,000 current purchasers of the cheese spread, find the proportion (p̂) of these purchasers who would stop buying the cheese spread if the new spout were used, and calculate the value of the test statistic z in the summary box. Then, because the alternative hypothesis Ha: p < .10 says to use the left-tailed critical value rule in the summary box, we will reject H0: p = .10 if the value of z is less than −zα = −z.01 = −2.33. (Note that using this procedure is valid because np0 = 1,000(.10) = 100 and n(1 − p0) = 1,000(1 − .10) = 900 are both at least 5.)³ Suppose that when the sample is randomly selected, we find that 63 of the 1,000 current purchasers say they would stop buying the cheese spread if the new spout were used. Because p̂ = 63/1,000 = .063, the value of the test statistic is

z = (p̂ − p0) / √(p0(1 − p0)/n) = (.063 − .10) / √(.10(1 − .10)/1,000) = −3.90

Because z = −3.90 is less than −z.01 = −2.33, we reject H0: p = .10 in favor of Ha: p < .10. That is, we conclude (at an α of .01) that the proportion of all current purchasers who would stop buying the cheese spread if the new spout were used is less than .10. It follows that the company will use the new spout. Furthermore, the point estimate p̂ = .063 says we estimate that 6.3 percent of all current customers would stop buying the cheese spread if the new spout were used.  BI

[Figure in the page margin: with α = .01 the rejection point is −z.01 = −2.33; the p-value, the area under the standard normal curve to the left of z = −3.90, is .00005.]

³ Some statisticians suggest using the more conservative rule that both np0 and n(1 − p0) must be at least 10.
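The cheese spread test statistic can be reproduced in a few lines. A minimal sketch using the summary-box formula and the example's numbers:

```python
import math

# Large-sample z test of H0: p = .10 vs. Ha: p < .10 (cheese spread numbers).
p0, n, stop_count = 0.10, 1000, 63
p_hat = stop_count / n

# Validity check from the summary box: np0 and n(1 − p0) both at least 5.
assert n * p0 >= 5 and n * (1 - p0) >= 5

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
reject = z < -2.33                  # left-tailed rule with −z.01 = −2.33
print(round(z, 2), reject)          # -3.9 True
```

Note that the denominator uses the hypothesized p0, not p̂, because the test statistic's distribution is derived under the assumption that H0 is true.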

Figure 14.4  Excel and Minitab Outputs of a Regression Analysis of the Tasty Sub Shop Revenue Data in Table 14.1 Using the Model y = β0 + β1x1 + β2x2 + ε

[Excel and Minitab outputs omitted. Keyed to the annotations in the figure: 1 b0 = 125.289; 2 b1 = 14.1996; 3 b2 = 22.8107; 4 s_bj = the standard error of the estimate bj (40.9333, 0.9100, and 5.7692); 5 t statistics (3.06, 15.60, and 3.95); 6 p-values for the t statistics (0.0183, 1.07E-06, and 0.0055); 7 s = 36.6856, the standard error; 8 R² = 98.10%; 9 adjusted R² = 97.56%; 10 explained variation = 486,355.7; 11 SSE = unexplained variation = 9,420.8; 12 total variation = 495,776.5; 13 F(model) statistic = 180.69; 14 p-value for F(model) = 9.46E-07; 15 ŷ = 956.606, the point prediction when x1 = 47.3 and x2 = 7; 16 s_ŷ = 15.0476, the standard error of the estimate ŷ; 17 95% confidence interval (921.024, 992.188) when x1 = 47.3 and x2 = 7; 18 95% prediction interval (862.844, 1050.37) when x1 = 47.3 and x2 = 7; 19 95% confidence intervals for the βj. The Minitab output also reports VIFs of 1.18 for each predictor.]

[Additional interleaved residue from Figure 14.36, the Minitab logistic regression output for the credit card upgrade data (Deviance Table with sources Regression, Purchases, PlatProfile, Error, and Total), omitted; that figure is summarized with the logistic regression discussion.]
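The point prediction reported in Figure 14.4 follows directly from the least squares point estimates. A minimal sketch; because the reported estimates are rounded, the result matches the reported Fit of 956.606 only up to rounding:

```python
# Point prediction from the Tasty Sub Shop revenue model
# y-hat = b0 + b1*x1 + b2*x2, using the estimates reported in Figure 14.4.
b0, b1, b2 = 125.289, 14.1996, 22.8107
population, bus_rating = 47.3, 7     # the Setting used in the output

y_hat = b0 + b1 * population + b2 * bus_rating
print(round(y_hat, 1))               # about 956.6, i.e. $956,600 of yearly revenue
```

The 95% confidence and prediction intervals around this point prediction require s and the leverage of the new point, which the software computes from the design matrix.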

…residual—the difference between the restaurant's observed and predicted yearly revenues—fairly small (in magnitude). We define the least squares point estimates to be the values of b0, b1, and b2 that minimize SSE, the sum of squared residuals for the 10 restaurants.

The formula for the least squares point estimates of the parameters in a multiple regression model is expressed using a branch of mathematics called matrix algebra. This formula is presented in Bowerman, O'Connell, and Koehler (2005). In the main body of this book, we will rely on Excel and Minitab to compute the needed estimates. For example, consider the Excel and Minitab outputs in Figure 14.4. The Excel output tells us that the least squares point estimates of b0, b1, and b2 in the Tasty Sub Shop revenue model are b0 = 125.289, b1 = 14.1996, and b2 = 22.8107 (see 1, 2, and 3). The point estimate b1 = 14.1996 of β1 says that we estimate that mean yearly revenue increases by $14,199.60 when the population size increases by 1,000 residents and the business rating does not change. The point estimate…

Figure 14.37  The Single Layer Perceptron

[Diagram omitted. An input layer x1, x2, . . . , xk feeds m hidden nodes, where ℓv = hv0 + hv1x1 + hv2x2 + · · · + hvkxk and Hv(ℓv) = (e^ℓv − 1)/(e^ℓv + 1); the hidden node outputs feed the linear combination L = β0 + β1H1(ℓ1) + β2H2(ℓ2) + · · · + βmHm(ℓm); and the output layer function is g(L) = 1/(1 + e^−L) if the response variable is qualitative and g(L) = L if the response variable is quantitative.]

1  An input layer consisting of the predictor variables x1, x2, . . . , xk under consideration.

2  A single hidden layer consisting of m hidden nodes. At the vth hidden node, for v = 1, 2, . . . , m, we form a linear combination ℓv of the k predictor variables:

ℓv = hv0 + hv1x1 + hv2x2 + · · · + hvkxk

Here, hv0, hv1, . . . , hvk are unknown parameters that must be estimated from the sample data. Having formed ℓv, we then specify a hidden node function Hv(ℓv) of ℓv. This hidden node function, which is also called an activation function, is usually nonlinear. The activation function used by JMP is

Hv(ℓv) = (e^ℓv − 1)/(e^ℓv + 1)

[Noting that (e^2x − 1)/(e^2x + 1) is the hyperbolic tangent function of the variable x, it follows that Hv(ℓv) is the hyperbolic tangent function of x = .5ℓv.] For example, at nodes 1, 2, . . . , m, we specify…
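The hyperbolic tangent identity for the activation function can be checked numerically. A small sketch verifying Hv(ℓv) = tanh(.5ℓv), including one of the hidden node values (2.0399995) that appears in the Figure 14.38 calculations:

```python
import math

# JMP's activation function H(ℓ) = (e^ℓ − 1)/(e^ℓ + 1), which equals tanh(ℓ/2).
def H(ell):
    return (math.exp(ell) - 1) / (math.exp(ell) + 1)

# Verify the identity at several points, including ℓ̂1 = 2.0399995 from Figure 14.38.
for ell in (-2.0, -0.5, 0.0, 1.132562, 2.0399995):
    assert abs(H(ell) - math.tanh(0.5 * ell)) < 1e-12

print(round(H(2.0399995), 4))   # 0.7699, matching the reported H1(ℓ̂1)
```

Because tanh is bounded between −1 and 1, each hidden node output is a squashed, nonlinear transform of its linear combination, which is what lets the network represent nonlinear response surfaces.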

Figure 14.36  Minitab Output of a Logistic Regression of the Credit Card Upgrade Data

[Minitab output omitted. Key results: coefficient estimates Constant = −10.68 (SE Coef 4.19, Z −2.55, P 0.011), Purchases = 0.2264 (SE Coef 0.0921, Z 2.46, P 0.014), and PlatProfile = 3.84 (SE Coef 1.62, Z 2.38, P 0.017), each predictor with VIF 1.59; deviance R² = 65.10% (adjusted 61.47%); AIC = 25.21; goodness-of-fit tests Deviance (DF 37, P 0.993), Pearson (DF 37, P 0.998), and Hosmer–Lemeshow (DF 8, P 0.919); odds ratio estimate 1.2541 for Purchases, 95% CI (1.0469, 1.5024); odds ratio estimate 46.7564 for PlatProfile level 1 relative to level 0, 95% CI (1.9693, 1110.1076); fitted upgrade probability 0.943012 (SE Fit 0.0587319, 95% CI (0.660211, 0.992954)) for the setting Purchases = 42.571, PlatProfile = 1, and 0.742486 (SE Fit 0.250558, 95% CI (0.181013, 0.974102)) for Purchases = 51.835, PlatProfile = 0. Also stranded here from the Figure 14.4 Minitab output: the regression equation Revenue = 125.3 + 14.200 Population + 22.81 Bus_Rating and R-sq(pred) = 96.31%.]

The odds ratio estimate of 1.25 for Purchases says that for each increase of $1,000 in last year's purchases by a Silver card holder, we estimate that the Silver card holder's odds of upgrading increase by 25 percent. The odds ratio estimate of 46.76 for PlatProfile says that we estimate that the odds of upgrading for a Silver card holder who conforms to the bank's Platinum profile are 46.76 times larger than the odds of upgrading for a Silver card holder who does not conform to the bank's Platinum profile, if both Silver card holders had the same amount of purchases last year. Finally, the bottom of the Minitab output says that we estimate that

• The upgrade probability for a Silver card holder who had purchases of $42,571 last year and conforms to the bank's Platinum profile is

e^(−10.68 + .2264(42.571) + 3.84(1)) / (1 + e^(−10.68 + .2264(42.571) + 3.84(1))) = .9430

• The upgrade probability for a Silver card holder who had purchases of $51,835 last year and does not conform to the bank's Platinum profile is

e^(−10.68 + .2264(51.835) + 3.84(0)) / (1 + e^(−10.68 + .2264(51.835) + 3.84(0))) = .7425

Neural Networks (Optional)

The idea behind neural network modeling is to represent the response variable as a nonlinear function of linear combinations of the predictor variables. The simplest but most widely used neural network model is called the single-hidden-layer, feedforward neural network. This model, which is also sometimes called the single-layer perceptron, is motivated (like all neural network models) by the connections of the neurons in the human brain. As illustrated in Figure 14.37, this model involves the input layer, hidden layer, and output layer described above.

Figure 14.38  JMP Output of Neural Network Estimation for the Credit Card Upgrade Data  (DS CardUpgrade)

[JMP output omitted. The model, NTanH(3) with Validation: Random Holdback, has parameter estimates H1_1: Intercept = −4.34324, Purchases = 0.113579, PlatProfile:0 = 0.495872; H1_2: Intercept = −2.28505, Purchases = 0.062612, PlatProfile:0 = 0.172119; H1_3: Intercept = −1.1118, Purchases = 0.023852, PlatProfile:0 = 0.93322; Upgrade(0): H1_1 = −201.382, H1_2 = 236.2743, H1_3 = 81.97204, Intercept = −7.26818. The worked calculations for Silver card holder 42 (Purchases = 51.835, JDPlatProfile = 1) are:

ℓ̂1 = ĥ10 + ĥ11(Purchases) + ĥ12(JDPlatProfile) = −4.34324 + .113579(51.835) + .495872(1) = 2.0399995, and H1(ℓ̂1) = (e^2.0399995 − 1)/(e^2.0399995 + 1) = .7698664933

ℓ̂2 = ĥ20 + ĥ21(Purchases) + ĥ22(JDPlatProfile) = −2.28505 + .062612(51.835) + .172119(1) = 1.132562, and H2(ℓ̂2) = (e^1.132562 − 1)/(e^1.132562 + 1) = .5126296505

ℓ̂3 = ĥ30 + ĥ31(Purchases) + ĥ32(JDPlatProfile) = −1.1118 + .023852(51.835) + .93322(1) = 1.0577884, and H3(ℓ̂3) = (e^1.0577884 − 1)/(e^1.0577884 + 1) = .4845460029

L̂ = b0 + b1H1(ℓ̂1) + b2H2(ℓ̂2) + b3H3(ℓ̂3) = −7.26818 − 201.382(.7698664933) + 236.2743(.5126296505) + 81.97204(.4845460029) = −1.464996

g(L̂) = 1/(1 + e^−L̂) = 1/(1 + e^1.464996) = .1877344817

The output also lists, for Silver card holders 1, 17, 33, 40, 41, and 42, the observed upgrade status, Purchases, PlatProfile, the estimated probabilities that Upgrade = 0 and Upgrade = 1, the three hidden node values H1_1, H1_2, and H1_3, and the "most likely" predicted value.]

…card holders who have not yet been sent an upgrade offer and for whom we wish to estimate the probability of upgrading. Silver card holder 42 had purchases last year of $51,835 (Purchases = 51.835) and did not conform to the bank's Platinum profile (PlatProfile = 0). Because PlatProfile = 0, we have JDPlatProfile = 1. Figure 14.38 shows the parameter estimates for the neural network model based on the training data set and how they are used to estimate the probability that Silver card holder 42 would upgrade. Note that because the response variable Upgrade is qualitative, the output layer function is g(L) = 1/(1 + e^−L). The final result obtained in the calculations, g(L̂) = .1877344817, is an estimate of the probability that Silver card holder 42 would not upgrade (Upgrade = 0). This implies that the estimate of the probability that Silver card holder 42 would upgrade is 1 − .1877344817 = .8122655183. If we predict a Silver card holder would upgrade if and only if his or her upgrade probability is at least .5, then Silver card holder 42 is predicted to upgrade (as is Silver card holder 41). JMP uses the model fit to the training data set to calculate an upgrade probability estimate for each of the 67 percent of the Silver card holders in the training data set and for each of the 33 percent of the Silver card holders in the validation data set. If a particular Silver card holder's upgrade probability estimate is at least .5, JMP predicts an upgrade for the card holder and assigns a "most likely" qualitative value of 1 to the card holder. Otherwise, JMP assigns a "most likely" qualitative value of 0 to the card holder. At the bottom of Figure 14.38, we show the results of JMP doing this for Silver card holders 1, 17, 33, and 40. Specifically, JMP predicts an upgrade (1) for card holders 17 and 33, but only card holder 33 did upgrade. JMP predicts a nonupgrade (0) for card holders 1 and 40, and neither of these card holders upgraded. The "confusion matrices" in Figure 14.39 summarize…

WHAT SOFTWARE IS AVAILABLE

MEGASTAT® FOR MICROSOFT EXCEL® 2003,

2007, AND 2010 (AND EXCEL: MAC 2011)

MegaStat is a full-featured Excel add-in by J. B. Orris of Butler University that is available

with this text. It performs statistical analyses within an Excel workbook. It does basic

functions such as descriptive statistics, frequency distributions, and probability calculations,

as well as hypothesis testing, ANOVA, and regression.

MegaStat output is carefully formatted. Ease-of-use features include AutoExpand for quick

data selection and Auto Label detect. Since MegaStat is easy to use, students can focus on

learning statistics without being distracted by the software. MegaStat is always available

from Excel’s main menu. Selecting a menu item pops up a dialog box. MegaStat works with

all recent versions of Excel.

MINITAB®

Minitab® Student Version 17 is available to help students solve the business statistics exercises in the text. This software is available in the student version and can be packaged with

any McGraw-Hill business statistics text.

TEGRITY CAMPUS: LECTURES 24/7

Tegrity Campus is a service that makes class time available 24/7. With Tegrity Campus, you

can automatically capture every lecture in a searchable format for students to review when

they study and complete assignments. With a simple one-click start-and-stop process, you

capture all computer screens and corresponding audio. Students can replay any part of any

class with easy-to-use browser-based viewing on a PC or Mac.

Educators know that the more students can see, hear, and experience class resources, the

better they learn. In fact, studies prove it. With Tegrity Campus, students quickly recall key

moments by using Tegrity Campus’s unique search feature. This search helps students efficiently find what they need, when they need it, across an entire semester of class recordings.

Help turn all your students’ study time into learning moments immediately supported by your

lecture. To learn more about Tegrity, watch a two-minute Flash demo at http://tegritycampus.mhhe.com.


ACKNOWLEDGMENTS

We wish to thank many people who have helped to

make this book a reality. As indicated on the title page,

we thank Professor Steven C. Huchendorf, University

of Minnesota; Dawn C. Porter, University of Southern

California; and Patrick J. Schur, Miami University; for

major contributions to this book. We also thank Susan

Cramer of Miami University for very helpful advice on

writing this new edition.

We also wish to thank the people at McGraw-Hill

for their dedication to this book. These people include senior brand manager Dolly Womack, who has been
extremely helpful to the authors; senior development

editor Camille Corum, who has shown great dedication

to the improvement of this book; content project manager Harvey Yep, who has very capably and diligently

guided this book through its production and who has

been a tremendous help to the authors; and our former

executive editor Steve Scheutz, who always greatly

supported our books. We also thank executive editor

Michelle Janicek for her tremendous help in developing this new edition; our former executive editor Scott

Isenberg for the tremendous help he has given us in

developing all of our McGraw-Hill business statistics

books; and our former executive editor Dick Hercher,

who persuaded us to publish with McGraw-Hill.

We also wish to thank Sylvia Taylor and Nicoleta

Maghear, Hampton University, for accuracy checking Connect content; Patrick Schur, Miami University,

for developing learning resources; Ronny Richardson,

Kennesaw State University, for revising the instructor

PowerPoints and developing new guided examples and

learning resources; Denise Krallman, Miami University,

for updating the Test Bank; and James Miller, Dominican University, and Anne Drougas, Dominican University, for developing learning resources for the new

business analytics content. Most importantly, we wish

to thank our families for their acceptance, unconditional

love, and support.


DEDICATION

Bruce L. Bowerman

To my wife, children, sister, and other family members:

Drena

Michael, Jinda, Benjamin, and Lex

Asa, Nicole, and Heather

Susan

Barney, Fiona, and Radeesa

Daphne, Chloe, and Edgar

Gwyneth and Tony

Callie, Bobby, Marmalade, Randy, and Penney

Clarence, Quincy, Teddy, Julius, Charlie, Sally, Milo, Zeke,

Bunch, Big Mo, Ozzie, Harriet, Sammy, Louise, Pat, Taylor,

and Jamie

Richard T. O’Connell

To my children and grandchildren:

Christopher, Bradley, Sam, and Joshua

Emily S. Murphree

To Kevin and the Math Ladies


CHAPTER-BY-CHAPTER

REVISIONS FOR 8TH EDITION

Chapter 1

• Initial example made clearer.

• Two new graphical examples added to better introduce quantitative and qualitative variables.

• How to select random (and other types of) samples moved from Chapter 7 to Chapter 1 and combined with examples introducing statistical inference.

• New subsection on statistical modeling added.

• More on surveys and errors in surveys moved from Chapter 7 to Chapter 1.

• New optional section introducing business analytics and data mining added.

• Sixteen new exercises added.

Chapter 2

• Thirteen new data sets added for this chapter on

graphical descriptive methods.

• Fourteen new exercises added.

• New optional section on descriptive analytics

added.

Chapter 3

• Twelve new data sets added for this chapter on

numerical descriptive methods.

• Twenty-three new exercises added.

• Four new optional sections on predictive analytics

added:

one section on classification trees and regression trees;

one section on hierarchical clustering and

k-means clustering;

one section on factor analysis;

one section on association rule mining.

Chapter 4

• New subsection on probability modeling added.

• Exercises updated in this and all subsequent

chapters.

Chapter 5

• Discussion of general discrete probability dis-

tributions, the binomial distribution, the Poisson

distribution, and the hypergeometric distribution

simplified and shortened.

Chapter 6

• Discussion of continuous probability distributions

and normal plots simplified and shortened.

Chapter 7

• This chapter covers the sampling distribution of

the sample mean and the sampling distribution of

the sample proportion; as stated above, the material

on how to select samples and errors in surveys has

been moved to Chapter 1.

Chapter 8

• No significant changes when discussing confidence

intervals.

Chapter 9

• Discussion of formulating the null and alternative

hypotheses completely rewritten and expanded.

• Discussion of using critical value rules and

p-values to test a population mean completely

rewritten; development of and instructions

for using hypothesis testing summary boxes

improved.

• Short presentation of the logic behind finding

the probability of a Type II error when testing

a two-sided alternative hypothesis now

accompanies the general formula for calculating

this probability.

Chapter 10

• Statistical inference for a single population variance

and comparing two population variances moved

from its own chapter (the former Chapter 11) to

Chapter 10.

• More explicit examples of using hypothesis testing

summary boxes when comparing means, proportions, and variances.

Chapter 11

• New exercises for one-way, randomized block, and

two-way analysis of variance, with added emphasis

on students doing complete statistical analyses.


Chapter 12

• No significant changes when discussing chi-square

tests.

Chapter 13

• Discussion of basic simple linear regression analysis streamlined, with discussion of r2 moved up and discussions of t and F tests combined into one section.

• Section on residual analysis significantly shortened and improved.

• New exercises, with emphasis on students doing complete statistical analyses on their own.

Chapter 14

• Discussion of R2 moved up.

• Discussion of backward elimination added.

• New subsection on model validation and PRESS added.

• Section on logistic regression expanded.

• New section on neural networks added.

• New exercises, with emphasis on students doing complete statistical analyses on their own.

Chapter 15

• Discussion of the Box–Jenkins methodology slightly expanded and moved to Appendix B (at the end of the book).

• New time series exercises, with emphasis on students doing complete statistical analyses on their own.

Chapters 16, 17, and 18

• No significant changes. (These were the former Chapters 17, 18, and 19 on control charts, nonparametrics, and decision theory.)


BRIEF CONTENTS

Chapter 1 An Introduction to Business Statistics and Analytics 2
Chapter 2 Descriptive Statistics: Tabular and Graphical Methods and Descriptive Analytics 54
Chapter 3 Descriptive Statistics: Numerical Methods and Some Predictive Analytics 134
Chapter 4 Probability and Probability Models 220
Chapter 5 Discrete Random Variables 254
Chapter 6 Continuous Random Variables 288
Chapter 7 Sampling Distributions 326
Chapter 8 Confidence Intervals 346
Chapter 9 Hypothesis Testing 382
Chapter 10 Statistical Inferences Based on Two Samples 428
Chapter 11 Experimental Design and Analysis of Variance 464
Chapter 12 Chi-Square Tests 504
Chapter 13 Simple Linear Regression Analysis 530
Chapter 14 Multiple Regression and Model Building 590
Chapter 15 Time Series Forecasting and Index Numbers 680
Chapter 16 Process Improvement Using Control Charts 726
Chapter 17 Nonparametric Methods 778
Chapter 18 Decision Theory 808
Appendix A Statistical Tables 828
Appendix B An Introduction to Box–Jenkins Models 852
Answers to Most Odd-Numbered Exercises 863
References 871
Photo Credits 873
Index 875

Continuous Random Variables

Sampling Distributions

Confidence Intervals

Hypothesis Testing

Chapter 10

Statistical Inferences Based on

Two Samples

428

Chapter 11

464

Chapter 12

504

Experimental Design and Analysis

of Variance

Chi-Square Tests

Statistical Tables

An Introduction to Box–Jenkins Models


CONTENTS

Chapter 1
An Introduction to Business Statistics and Analytics
1.1 ■ Data 3
1.2 ■ Data Sources, Data Warehousing, and Big Data 6
1.3 ■ Populations, Samples, and Traditional Statistics 8
1.4 ■ Random Sampling, Three Case Studies That Illustrate Statistical Inference, and Statistical Modeling 10
1.5 ■ Business Analytics and Data Mining (Optional) 21
1.6 ■ Ratio, Interval, Ordinal, and Nominative Scales of Measurement (Optional) 25
1.7 ■ Stratified Random, Cluster, and Systematic Sampling (Optional) 27
1.8 ■ More about Surveys and Errors in Survey Sampling (Optional) 29
Appendix 1.1 ■ Getting Started with Excel 36
Appendix 1.2 ■ Getting Started with MegaStat 43
Appendix 1.3 ■ Getting Started with Minitab 46

Chapter 2
Descriptive Statistics: Tabular and Graphical Methods and Descriptive Analytics
2.1 ■ Graphically Summarizing Qualitative Data 55
2.2 ■ Graphically Summarizing Quantitative Data 61
2.3 ■ Dot Plots 75
2.4 ■ Stem-and-Leaf Displays 76
2.5 ■ Contingency Tables (Optional) 81
2.6 ■ Scatter Plots (Optional) 87
2.7 ■ Misleading Graphs and Charts (Optional) 89
2.8 ■ Descriptive Analytics (Optional) 92
Appendix 2.1 ■ Tabular and Graphical Methods Using Excel 103
Appendix 2.2 ■ Tabular and Graphical Methods Using MegaStat 121
Appendix 2.3 ■ Tabular and Graphical Methods Using Minitab 125

Chapter 3
Descriptive Statistics: Numerical Methods and Some Predictive Analytics
Part 1 ■ Numerical Methods of Descriptive Statistics
3.1 ■ Describing Central Tendency 135
3.2 ■ Measures of Variation 145
3.3 ■ Percentiles, Quartiles, and Box-and-Whiskers Displays 155
3.4 ■ Covariance, Correlation, and the Least Squares Line (Optional) 161
3.5 ■ Weighted Means and Grouped Data (Optional) 166
3.6 ■ The Geometric Mean (Optional) 170
Part 2 ■ Some Predictive Analytics (Optional)
3.7 ■ Decision Trees: Classification Trees and Regression Trees (Optional) 172
3.8 ■ Cluster Analysis and Multidimensional Scaling (Optional) 184
3.9 ■ Factor Analysis (Optional and Requires Section 3.4) 192
3.10 ■ Association Rules (Optional) 198
Appendix 3.1 ■ Numerical Descriptive Statistics Using Excel 207
Appendix 3.2 ■ Numerical Descriptive Statistics Using MegaStat 210
Appendix 3.3 ■ Numerical Descriptive Statistics Using Minitab 212
Appendix 3.4 ■ Analytics Using JMP 216

Chapter 4
Probability and Probability Models
4.1 ■ Probability, Sample Spaces, and Probability Models 221
4.2 ■ Probability and Events 224
4.3 ■ Some Elementary Probability Rules 229
4.4 ■ Conditional Probability and Independence 235
4.5 ■ Bayes’ Theorem (Optional) 243
4.6 ■ Counting Rules (Optional) 247

Chapter 5
Discrete Random Variables
5.1 ■ Two Types of Random Variables 255
5.2 ■ Discrete Probability Distributions 256
5.3 ■ The Binomial Distribution 263
5.4 ■ The Poisson Distribution (Optional) 274
5.5 ■ The Hypergeometric Distribution (Optional) 278
5.6 ■ Joint Distributions and the Covariance (Optional) 280
Appendix 5.1 ■ Binomial, Poisson, and Hypergeometric Probabilities Using Excel 284
Appendix 5.2 ■ Binomial, Poisson, and Hypergeometric Probabilities Using MegaStat 286
Appendix 5.3 ■ Binomial, Poisson, and Hypergeometric Probabilities Using Minitab 287

Chapter 6
Continuous Random Variables
6.1 ■ Continuous Probability Distributions 289
6.2 ■ The Uniform Distribution 291
6.3 ■ The Normal Probability Distribution 294
6.4 ■ Approximating the Binomial Distribution by Using the Normal Distribution (Optional) 310
6.5 ■ The Exponential Distribution (Optional) 313
6.6 ■ The Normal Probability Plot (Optional) 316
Appendix 6.1 ■ Normal Distribution Using Excel 321
Appendix 6.2 ■ Normal Distribution Using MegaStat 322
Appendix 6.3 ■ Normal Distribution Using Minitab 323

Chapter 7
Sampling Distributions
7.1 ■ The Sampling Distribution of the Sample Mean 327
7.2 ■ The Sampling Distribution of the Sample Proportion 339
7.3 ■ Derivation of the Mean and the Variance of the Sample Mean (Optional) 342

Chapter 8
Confidence Intervals
8.1 ■ z-Based Confidence Intervals for a Population Mean: σ Known 347
8.2 ■ t-Based Confidence Intervals for a Population Mean: σ Unknown 355
8.3 ■ Sample Size Determination 364
8.4 ■ Confidence Intervals for a Population Proportion 367
8.5 ■ Confidence Intervals for Parameters of Finite Populations (Optional) 373
Appendix 8.1 ■ Confidence Intervals Using Excel 379
Appendix 8.2 ■ Confidence Intervals Using MegaStat 380
Appendix 8.3 ■ Confidence Intervals Using Minitab 381

Chapter 9
Hypothesis Testing
9.1 ■ The Null and Alternative Hypotheses and Errors in Hypothesis Testing 383
9.2 ■ z Tests about a Population Mean: σ Known 390
9.3 ■ t Tests about a Population Mean: σ Unknown 402
9.4 ■ z Tests about a Population Proportion 406
9.5 ■ Type II Error Probabilities and Sample Size Determination (Optional) 411
9.6 ■ The Chi-Square Distribution 417
9.7 ■ Statistical Inference for a Population Variance (Optional) 418
Appendix 9.1 ■ One-Sample Hypothesis Testing Using Excel 424
Appendix 9.2 ■ One-Sample Hypothesis Testing Using MegaStat 425
Appendix 9.3 ■ One-Sample Hypothesis Testing Using Minitab 426

Chapter 10
Statistical Inferences Based on Two Samples
10.1 ■ Comparing Two Population Means by Using Independent Samples 429
10.2 ■ Paired Difference Experiments 439
10.3 ■ Comparing Two Population Proportions by Using Large, Independent Samples 445
10.4 ■ The F Distribution 451
10.5 ■ Comparing Two Population Variances by Using Independent Samples 453
Appendix 10.1 ■ Two-Sample Hypothesis Testing Using Excel 459
Appendix 10.2 ■ Two-Sample Hypothesis Testing Using MegaStat 460
Appendix 10.3 ■ Two-Sample Hypothesis Testing Using Minitab 462

Chapter 11
Experimental Design and Analysis of Variance
11.1 ■ Basic Concepts of Experimental Design 465
11.2 ■ One-Way Analysis of Variance 467
11.3 ■ The Randomized Block Design 479
11.4 ■ Two-Way Analysis of Variance 485
Appendix 11.1 ■ Experimental Design and Analysis of Variance Using Excel 497
Appendix 11.2 ■ Experimental Design and Analysis of Variance Using MegaStat 498
Appendix 11.3 ■ Experimental Design and Analysis of Variance Using Minitab 500

Chapter 12
Chi-Square Tests
12.1 ■ Chi-Square Goodness-of-Fit Tests 505
12.2 ■ A Chi-Square Test for Independence 514
Appendix 12.1 ■ Chi-Square Tests Using Excel 523
Appendix 12.2 ■ Chi-Square Tests Using MegaStat 525
Appendix 12.3 ■ Chi-Square Tests Using Minitab 527

Chapter 13
Simple Linear Regression Analysis
13.1 ■ The Simple Linear Regression Model and the Least Squares Point Estimates 531
13.2 ■ Simple Coefficients of Determination and Correlation 543
13.3 ■ Model Assumptions and the Standard Error 548
13.4 ■ Testing the Significance of the Slope and y-Intercept 551
13.5 ■ Confidence and Prediction Intervals 559
13.6 ■ Testing the Significance of the Population Correlation Coefficient (Optional) 564
13.7 ■ Residual Analysis 565
Appendix 13.1 ■ Simple Linear Regression Analysis Using Excel 583
Appendix 13.2 ■ Simple Linear Regression Analysis Using MegaStat 585
Appendix 13.3 ■ Simple Linear Regression Analysis Using Minitab 587

Chapter 14
Multiple Regression and Model Building
14.1 ■ The Multiple Regression Model and the Least Squares Point Estimates 591
14.2 ■ R2 and Adjusted R2 601
14.3 ■ Model Assumptions and the Standard Error 603
14.4 ■ The Overall F Test 605
14.5 ■ Testing the Significance of an Independent Variable 607
14.6 ■ Confidence and Prediction Intervals 611
14.7 ■ The Sales Representative Case: Evaluating Employee Performance 614
14.8 ■ Using Dummy Variables to Model Qualitative Independent Variables (Optional) 616
14.9 ■ Using Squared and Interaction Variables (Optional) 625
14.10 ■ Multicollinearity, Model Building, and Model Validation (Optional) 631
14.11 ■ Residual Analysis and Outlier Detection in Multiple Regression (Optional) 642
14.12 ■ Logistic Regression (Optional) 647
14.13 ■ Neural Networks (Optional) 653
Appendix 14.1 ■ Multiple Regression Analysis Using Excel 666
Appendix 14.2 ■ Multiple Regression Analysis Using MegaStat 668
Appendix 14.3 ■ Multiple Regression Analysis Using Minitab 671
Appendix 14.4 ■ Neural Network Analysis in JMP 677

Chapter 15
Time Series Forecasting and Index Numbers
15.1 ■ Time Series Components and Models 681
15.2 ■ Time Series Regression 682
15.3 ■ Multiplicative Decomposition 691
15.4 ■ Simple Exponential Smoothing 699
15.5 ■ Holt–Winters’ Models 704
15.6 ■ Forecast Error Comparisons 712
15.7 ■ Index Numbers 713
Appendix 15.1 ■ Time Series Analysis Using Excel 722
Appendix 15.2 ■ Time Series Analysis Using MegaStat 723
Appendix 15.3 ■ Time Series Analysis Using Minitab 725

Chapter 16
Process Improvement Using Control Charts
16.1 ■ Quality: Its Meaning and a Historical Perspective 727
16.2 ■ Statistical Process Control and Causes of Process Variation 731
16.3 ■ Sampling a Process, Rational Subgrouping, and Control Charts 734
16.4 ■ x̄ and R Charts 738
16.5 ■ Comparison of a Process with Specifications: Capability Studies 754
16.6 ■ Charts for Fraction Nonconforming 762
16.7 ■ Cause-and-Effect and Defect Concentration Diagrams (Optional) 768
Appendix 16.1 ■ Control Charts Using MegaStat 775
Appendix 16.2 ■ Control Charts Using Minitab 776

Chapter 17
Nonparametric Methods
17.1 ■ The Sign Test: A Hypothesis Test about the Median 780
17.2 ■ The Wilcoxon Rank Sum Test 784
17.3 ■ The Wilcoxon Signed Ranks Test 789
17.4 ■ Comparing Several Populations Using the Kruskal–Wallis H Test 794
17.5 ■ Spearman’s Rank Correlation Coefficient 797
Appendix 17.1 ■ Nonparametric Methods Using MegaStat 802
Appendix 17.2 ■ Nonparametric Methods Using Minitab 805

Chapter 18
Decision Theory
18.1 ■ Introduction to Decision Theory 809
18.2 ■ Decision Making Using Posterior Probabilities 815
18.3 ■ Introduction to Utility Theory 823

Appendix A ■ Statistical Tables 828
Appendix B ■ An Introduction to Box–Jenkins Models 852
Answers to Most Odd-Numbered Exercises 863
References 871
Photo Credits 873
Index 875


Business Statistics in Practice

Using Modeling, Data, and Analytics

EIGHTH EDITION


CHAPTER 1


An Introduction to Business Statistics and Analytics

Learning Objectives

When you have mastered the material in this chapter, you will be able to:

LO1-1 Define a variable.
LO1-2 Describe the difference between a quantitative variable and a qualitative variable.
LO1-3 Describe the difference between cross-sectional data and time series data.
LO1-4 Construct and interpret a time series (runs) plot.
LO1-5 Identify the different types of data sources: existing data sources, experimental studies, and observational studies.
LO1-6 Explain the basic ideas of data warehousing and big data.
LO1-7 Describe the difference between a population and a sample.
LO1-8 Distinguish between descriptive statistics and statistical inference.
LO1-9 Explain the concept of random sampling and select a random sample.
LO1-10 Explain the basic concept of statistical modeling.
LO1-11 Explain some of the uses of business analytics and data mining (Optional).
LO1-12 Identify the ratio, interval, ordinal, and nominative scales of measurement (Optional).
LO1-13 Describe the basic ideas of stratified random, cluster, and systematic sampling (Optional).
LO1-14 Describe basic types of survey questions, survey procedures, and sources of error (Optional).

Chapter Outline

1.1 Data
1.2 Data Sources, Data Warehousing, and Big Data
1.3 Populations, Samples, and Traditional Statistics
1.4 Random Sampling, Three Case Studies That Illustrate Statistical Inference, and Statistical Modeling
1.5 Business Analytics and Data Mining (Optional)
1.6 Ratio, Interval, Ordinal, and Nominative Scales of Measurement (Optional)
1.7 Stratified Random, Cluster, and Systematic Sampling (Optional)
1.8 More about Surveys and Errors in Survey Sampling (Optional)


The subject of statistics involves the study of how to collect, analyze, and interpret

data. Data are facts and figures from which

conclusions can be drawn. Such conclusions are

important to the decision making of many professions and organizations. For example, economists

use conclusions drawn from the latest data on unemployment and inflation to help the government

make policy decisions. Financial planners use recent

trends in stock market prices and economic conditions to make investment decisions. Accountants use

sample data concerning a company’s actual sales revenues to assess whether the company’s claimed sales

revenues are valid. Marketing professionals and

data miners help businesses decide which products

to develop and market and which consumers to

target in marketing campaigns by using data

that reveal consumer preferences. Production supervisors use manufacturing data to evaluate, control,

and improve product quality. Politicians rely on data

from public opinion polls to formulate legislation

and to devise campaign strategies. Physicians and

hospitals use data on the effectiveness of drugs and

surgical procedures to provide patients with the best

possible treatment.

In this chapter we begin to see how we collect

and analyze data. As we proceed through the chapter, we introduce several case studies. These case

studies (and others to be introduced later) are revisited throughout later chapters as we learn the statistical methods needed to analyze them. Briefly, we

will begin to study four cases:

The Cell Phone Case: A bank estimates its cellular phone costs and decides whether to outsource management of its wireless resources by studying the calling patterns of its employees.

The Marketing Research Case: A beverage company investigates consumer reaction to a new bottle design for one of its popular soft drinks.

The Car Mileage Case: To determine if it qualifies for a federal tax credit based on fuel economy, an automaker studies the gas mileage of its new midsize model.

The Disney Parks Case: Walt Disney World Parks and Resorts in Orlando, Florida, manages Disney parks worldwide and uses data gathered from its guests to give these guests a more “magical” experience and increase Disney revenues and profits.

1.1 Data

LO1-1

Data sets, elements, and variables

Define a variable.

We have said that data are facts and figures from which conclusions can be drawn. Together,

the data that are collected for a particular study are referred to as a data set. For example,

Table 1.1 is a data set that gives information about the new homes sold in a Florida luxury

home development over a recent three-month period. Potential home buyers could choose

either the “Diamond” or the “Ruby” home model design and could have the home built on

either a lake lot or a treed lot (with no water access).

In order to understand the data in Table 1.1, note that any data set provides information

about some group of individual elements, which may be people, objects, events, or other

entities. The information that a data set provides about its elements usually describes one or

more characteristics of these elements.

Any characteristic of an element is called a variable.

Table 1.1  A Data Set Describing Five Home Sales  DS HomeSales

Home  Model Design  Lot Type  List Price  Selling Price
1     Diamond       Lake      $494,000    $494,000
2     Ruby          Treed     $447,000    $398,000
3     Diamond       Treed     $494,000    $440,000
4     Diamond       Treed     $494,000    $469,000
5     Ruby          Lake      $447,000    $447,000



LO1-2 Describe the difference between a quantitative variable and a qualitative variable.

Table 1.2  2014 MLB Payrolls  DS MLB

Team                    2014 Payroll ($ millions)
Los Angeles Dodgers          235
New York Yankees             204
Philadelphia Phillies        180
Boston Red Sox               163
Detroit Tigers               162
Los Angeles Angels           156
San Francisco Giants         154
Texas Rangers                136
Washington Nationals         135
Toronto Blue Jays            133
Arizona Diamondbacks         113
Cincinnati Reds              112
St. Louis Cardinals          111
Atlanta Braves               111
Baltimore Orioles            107
Milwaukee Brewers            104
Colorado Rockies              96
Seattle Mariners              92
Kansas City Royals            92
Chicago White Sox             91
San Diego Padres              90
New York Mets                 89
Chicago Cubs                  89
Minnesota Twins               86
Oakland Athletics             83
Cleveland Indians             83
Pittsburgh Pirates            78
Tampa Bay Rays                77
Miami Marlins                 48
Houston Astros                45

Source: http://baseball.about.com/od/newsrumors/fl/2014-Major-League-Baseball-Team-Payrolls.htm (accessed January 14, 2015).


For the data set in Table 1.1, each sold home is an element, and four variables are used to

describe the homes. These variables are (1) the home model design, (2) the type of lot on

which the home was built, (3) the list (asking) price, and (4) the (actual) selling price. Moreover, each home model design came with “everything included”—specifically, a complete,

luxury interior package and a choice (at no price difference) of one of three different architectural exteriors. The builder made the list price of each home solely dependent on the model

design. However, the builder gave various price reductions for homes built on treed lots.

The data in Table 1.1 are real (with some minor changes to protect privacy) and were provided by a business executive—a friend of the authors—who recently received a promotion

and needed to move to central Florida. While searching for a new home, the executive and his

family visited the luxury home community and decided they wanted to purchase a Diamond

model on a treed lot. The list price of this home was $494,000, but the developer offered to

sell it for an “incentive” price of $469,000. Intuitively, the incentive price’s $25,000 savings

off list price seemed like a good deal. However, the executive resisted making an immediate decision. Instead, he decided to collect data on the selling prices of new homes recently

sold in the community and use the data to assess whether the developer might accept a lower

offer. In order to collect “relevant data,” the executive talked to local real estate professionals

and learned that new homes sold in the community during the previous three months were

a good indicator of current home value. Using real estate sales records, the executive also

learned that five of the community’s new homes had sold in the previous three months. The

data given in Table 1.1 are the data that the executive collected about these five homes.

When the business executive examined Table 1.1, he noted that homes on lake lots had sold

at their list price, but homes on treed lots had not. Because the executive and his family wished

to purchase a Diamond model on a treed lot, the executive also noted that two Diamond models on treed lots had sold in the previous three months. One of these Diamond models had

sold for the incentive price of $469,000, but the other had sold for a lower price of $440,000.

Hoping to pay the lower price for his family’s new home, the executive offered $440,000 for

the Diamond model on the treed lot. Initially, the home builder turned down this offer, but two

days later the builder called back and accepted the offer. The executive had used data to buy

the new home for $54,000 less than the list price and $29,000 less than the incentive price!
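The executive’s comparison is simple arithmetic, and it can be mirrored in a few lines of code. The sketch below is ours, not the book’s (the names home_sales, comparables, and offer are invented for illustration); it stores the five sales from Table 1.1 and reproduces the savings figures quoted above.

```python
# Table 1.1: five home sales (model design, lot type, list price, selling price).
home_sales = [
    ("Diamond", "Lake",  494_000, 494_000),
    ("Ruby",    "Treed", 447_000, 398_000),
    ("Diamond", "Treed", 494_000, 440_000),
    ("Diamond", "Treed", 494_000, 469_000),
    ("Ruby",    "Lake",  447_000, 447_000),
]

# Selling prices of Diamond models on treed lots -- the comparable sales.
comparables = [sale for (model, lot, _, sale) in home_sales
               if model == "Diamond" and lot == "Treed"]

offer = min(comparables)                 # the executive offered the lowest comparable price
savings_vs_list = 494_000 - offer        # savings off the $494,000 list price
savings_vs_incentive = 469_000 - offer   # savings off the $469,000 incentive price

print(offer, savings_vs_list, savings_vs_incentive)  # 440000 54000 29000
```

The filter isolates exactly the two comparable sales the executive relied on, $440,000 and $469,000, and the lowest of them drives the offer.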

Quantitative and qualitative variables

For any variable describing an element in a data set, we carry out a measurement to assign

a value of the variable to the element. For example, in the real estate example, real estate

sales records gave the actual selling price of each home to the nearest dollar. As another

example, a credit card company might measure the time it takes for a cardholder’s bill to be

paid to the nearest day. Or, as a third example, an automaker might measure the gasoline

mileage obtained by a car in city driving to the nearest one-tenth of a mile per gallon by

conducting a mileage test on a driving course prescribed by the Environmental Protection

Agency (EPA). If the possible values of a variable are numbers that represent quantities (that

is, “how much” or “how many”), then the variable is said to be quantitative. For example,

(1) the actual selling price of a home, (2) the payment time of a bill, (3) the gasoline mileage of a car, and (4) the 2014 payroll of a Major League Baseball team are all quantitative

variables. Considering the last example, Table 1.2 in the page margin gives the 2014 payroll

(in millions of dollars) for each of the 30 Major League Baseball (MLB) teams. Moreover,

Figure 1.1 portrays the team payrolls as a dot plot. In this plot, each team payroll is shown

Figure 1.1  A Dot Plot of 2014 MLB Payrolls (Payroll Is a Quantitative Variable)
[Dot plot over a horizontal axis labeled “2014 Payroll (in millions of dollars),” with tick marks from 40 to 240 in steps of 20.]
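A rough version of a dot plot like Figure 1.1 can even be produced in plain text. The sketch below is ours, not from the book’s software appendices: it bins a subset of the payrolls from Table 1.2 into $20 million intervals and prints one dot per team in each interval.

```python
# A text-based dot plot of 2014 MLB payrolls (in millions of dollars),
# using a subset of the values from Table 1.2 for illustration.
payrolls = [235, 204, 180, 163, 162, 156, 154, 136, 135, 133,
            113, 112, 111, 111, 107, 104, 96, 92, 92, 91]

def dot_plot(values, bin_width=20):
    """Group values into bins of width bin_width and draw one dot per value."""
    bins = {}
    for v in values:
        left = (v // bin_width) * bin_width   # left edge of the bin containing v
        bins[left] = bins.get(left, 0) + 1
    lines = []
    for left in sorted(bins):
        lines.append(f"{left:>3}-{left + bin_width - 1:<3} " + "." * bins[left])
    return "\n".join(lines)

print(dot_plot(payrolls))
```

Each row corresponds to a $20 million interval, so clusters of teams with similar payrolls show up as longer runs of dots, just as they do in the printed figure.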

