Analysis

of Messy Data

VOLUME III:

ANALYSIS OF COVARIANCE

George A. Milliken

Dallas E. Johnson

CHAPMAN & HALL/CRC

A CRC Press Company

Boca Raton London New York Washington, D.C.


Library of Congress Cataloging-in-Publication Data

Milliken, George A., 1943–

Analysis of messy data / George A. Milliken, Dallas E. Johnson.

2 v. : ill. ; 24 cm.

Includes bibliographies and indexes.

Contents: v. 1. Designed experiments -- v. 2. Nonreplicated

experiments.

Vol. 2 has imprint: New York : Van Nostrand Reinhold.

ISBN 0-534-02713-X (v. 1) : $44.00 -- ISBN 0-442-24408-8 (v. 2)

1. Analysis of variance. 2. Experimental design. 3. Sampling

(Statistics) I. Johnson, Dallas E., 1938– . II. Title.

QA279 .M48 1984

519.5′352--dc19

84-000839

This book contains information obtained from authentic and highly regarded sources. Reprinted material

is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable

efforts have been made to publish reliable data and information, but the author and the publisher cannot

assume responsibility for the validity of all materials or for the consequences of their use.

Apart from any fair dealing for the purpose of research or private study, or criticism or review, as permitted

under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored

or transmitted, in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without the prior permission

in writing of the publishers, or in the case of reprographic reproduction only in accordance with the

terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with the

terms of the license issued by the appropriate Reproduction Rights Organization outside the UK.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for

creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC

for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are

used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2002 by Chapman & Hall/CRC

No claim to original U.S. Government works

International Standard Book Number 1-584-88083-X

Library of Congress Card Number 84-000839

Printed in the United States of America 1 2 3 4 5 6 7 8 9 0

Printed on acid-free paper


Table of Contents

Chapter 1 Introduction to the Analysis of Covariance
1.1 Introduction
1.2 The Covariate Adjustment Process
1.3 A General AOC Model and the Basic Philosophy
References

Chapter 2 One-Way Analysis of Covariance – One Covariate in a Completely Randomized Design Structure
2.1 The Model
2.2 Estimation
2.3 Strategy for Determining the Form of the Model
2.4 Comparing the Treatments or Regression Lines
2.4.1 Equal Slopes Model
2.4.2 Unequal Slopes Model-Covariate by Treatment Interaction
2.5 Confidence Bands about the Difference of Two Treatments
2.6 Summary of Strategies
2.7 Analysis of Covariance Computations via the SAS® System
2.7.1 Using PROC GLM and PROC MIXED
2.7.2 Using JMP®
2.8 Conclusions
References
Exercise

Chapter 3 Examples: One-Way Analysis of Covariance – One Covariate in a Completely Randomized Design Structure
3.1 Introduction
3.2 Chocolate Candy – Equal Slopes
3.2.1 Analysis Using PROC GLM
3.2.2 Analysis Using PROC MIXED
3.2.3 Analysis Using JMP®
3.3 Exercise Programs and Initial Resting Heart Rate – Unequal Slopes
3.4 Effect of Diet on Cholesterol Level: An Exception to the Basic Analysis of Covariance Strategy
3.5 Change from Base Line Analysis Using Effect of Diet on Cholesterol Level Data
3.6 Shoe Tread Design Data for Exception to the Basic Strategy

© 2002 by CRC Press LLC


3.7 Equal Slopes within Groups of Treatments and Unequal Slopes between Groups
3.8 Unequal Slopes and Equal Intercepts – Part 1
3.9 Unequal Slopes and Equal Intercepts – Part 2
References
Exercises

Chapter 4 Multiple Covariates in a One-Way Treatment Structure in a Completely Randomized Design Structure
4.1 Introduction
4.2 The Model
4.3 Estimation
4.4 Example: Driving A Golf Ball with Different Shafts
4.5 Example: Effect of Herbicides on the Yield of Soybeans – Three Covariates
4.6 Example: Models That Are Quadratic Functions of the Covariate
4.7 Example: Comparing Response Surface Models
Reference
Exercises

Chapter 5 Two-Way Treatment Structure and Analysis of Covariance in a Completely Randomized Design Structure
5.1 Introduction
5.2 The Model
5.3 Using the SAS® System
5.3.1 Using PROC GLM and PROC MIXED
5.3.2 Using JMP®
5.4 Example: Average Daily Gains and Birth Weight – Common Slope
5.5 Example: Energy from Wood of Different Types of Trees – Some Unequal Slopes
5.6 Missing Treatment Combinations
5.7 Example: Two-Way Treatment Structure with Missing Cells
5.8 Extensions
Reference
Exercises

Chapter 6 Beta-Hat Models
6.1 Introduction
6.2 The Beta-Hat Model and Analysis
6.3 Testing Equality of Parameters
6.4 Complex Treatment Structures
6.5 Example: One-Way Treatment Structure
6.6 Example: Two-Way Treatment Structure


6.7 Summary
Exercises

Chapter 7 Variable Selection in the Analysis of Covariance Model
7.1 Introduction
7.2 Procedure for Equal Slopes
7.3 Example: One-Way Treatment Structure with Equal Slopes Model
7.4 Some Theory
7.5 When Slopes are Possibly Unequal
References
Exercises

Chapter 8 Comparing Models for Several Treatments
8.1 Introduction
8.2 Testing Equality of Models for a One-Way Treatment Structure
8.3 Comparing Models for a Two-Way Treatment Structure
8.4 Example: One-Way Treatment Structure with One Covariate
8.5 Example: One-Way Treatment Structure with Three Covariates
8.6 Example: Two-Way Treatment Structure with One Covariate
8.7 Discussion
References
Exercises

Chapter 9 Two Treatments in a Randomized Complete Block Design Structure
9.1 Introduction
9.2 Complete Block Designs
9.3 Within Block Analysis
9.4 Between Block Analysis
9.5 Combining Within Block and Between Block Information
9.6 Determining the Form of the Model
9.7 Common Slope Model
9.8 Comparing the Treatments
9.8.1 Equal Slopes Models
9.8.2 Unequal Slopes Model
9.9 Confidence Intervals about Differences of Two Regression Lines
9.9.1 Within Block Analysis
9.9.2 Combined Within Block and Between Block Analysis
9.10 Computations for Model 9.1 Using the SAS® System
9.11 Example: Effect of Drugs on Heart Rate
9.12 Summary
References
Exercises


Chapter 10 More Than Two Treatments in a Blocked Design Structure
10.1 Introduction
10.2 RCB Design Structure – Within and Between Block Information
10.3 Incomplete Block Design Structure – Within and Between Block Information
10.4 Combining Between Block and Within Block Information
10.5 Example: Five Treatments in RCB Design Structure
10.6 Example: Balanced Incomplete Block Design Structure with Four Treatments
10.7 Example: Balanced Incomplete Block Design Structure with Four Treatments Using JMP®
10.8 Summary
References
Exercises

Chapter 11 Covariate Measured on the Block in RCB and Incomplete Block Design Structures
11.1 Introduction
11.2 The Within Block Model
11.3 The Between Block Model
11.4 Combining Within Block and Between Block Information
11.5 Common Slope Model
11.6 Adjusted Means and Comparing Treatments
11.6.1 Common Slope Model
11.6.2 Non-Parallel Lines Model
11.7 Example: Two Treatments
11.8 Example: Four Treatments in RCB
11.9 Example: Four Treatments in BIB
11.10 Summary
References
Exercises

Chapter 12 Random Effects Models with Covariates
12.1 Introduction
12.2 The Model
12.3 Estimation of the Variance Components
12.4 Changing Location of the Covariate Changes the Estimates of the Variance Components
12.5 Example: Balanced One-Way Treatment Structure
12.6 Example: Unbalanced One-Way Treatment Structure
12.7 Example: Two-Way Treatment Structure
12.8 Summary
References
Exercises


Chapter 13 Mixed Models
13.1 Introduction
13.2 The Matrix Form of the Mixed Model
13.3 Fixed Effects Treatment Structure
13.4 Estimation of Fixed Effects and Some Small Sample Size Approximations
13.5 Fixed Treatments and Locations Random
13.6 Example: Two-Way Mixed Effects Treatment Structure in a CRD
13.7 Example: Treatments are Fixed and Locations are Random with a RCB at Each Location
References
Exercises

Chapter 14 Analysis of Covariance Models with Heterogeneous Errors
14.1 Introduction
14.2 The Unequal Variance Model
14.3 Tests for Homogeneity of Variances
14.3.1 Levene’s Test for Equal Variances
14.3.2 Hartley’s F-Max Test for Equal Variances
14.3.3 Bartlett’s Test for Equal Variances
14.3.4 Likelihood Ratio Test for Equal Variances
14.4 Estimating the Parameters of the Regression Model
14.4.1 Least Squares Estimation
14.4.2 Maximum Likelihood Methods
14.5 Determining the Form of the Model
14.6 Comparing the Models
14.6.1 Comparing the Nonparallel Lines Models
14.6.2 Comparing the Parallel Lines Models
14.7 Computational Issues
14.8 Example: One-Way Treatment Structure with Unequal Variances
14.9 Example: Two-Way Treatment Structure with Unequal Variances
14.10 Example: Treatments in Multi-location Trial
14.11 Summary
References
Exercises

Chapter 15 Analysis of Covariance for Split-Plot and Strip-Plot Design Structures
15.1 Introduction
15.2 Some Concepts
15.3 Covariate Measured on the Whole Plot or Large Size of Experimental Unit
15.4 Covariate is Measured on the Small Size of Experimental Unit


15.5 Covariate is Measured on the Large Size of Experimental Unit and a Covariate is Measured on the Small Size of Experimental Unit
15.6 General Representation of the Covariate Part of the Model
15.6.1 Covariate Measured on Large Size of Experimental Unit
15.6.2 Covariate Measured on the Small Size of Experimental Units
15.6.3 Summary of General Representation
15.7 Example: Flour Milling Experiment – Covariate Measured on the Whole Plot
15.8 Example: Cookie Baking
15.9 Example: Teaching Methods with One Covariate Measured on the Large Size Experimental Unit and One Covariate Measured on the Small Size Experimental Unit
15.10 Example: Comfort Study in a Strip-Plot Design with Three Sizes of Experimental Units and Three Covariates
15.11 Conclusions
References
Exercises

Chapter 16 Analysis of Covariance for Repeated Measures Designs
16.1 Introduction
16.2 The Covariance Part of the Model – Selecting R
16.3 Covariance Structure of the Data
16.4 Specifying the Random and Repeated Statements for PROC MIXED of the SAS® System
16.5 Selecting an Adequate Covariance Structure
16.6 Example: Systolic Blood Pressure Study with Covariate Measured on the Large Size Experimental Unit
16.7 Example: Oxide Layer Development Experiment with Three Sizes of Experimental Units Where the Repeated Measure is at the Middle Size of Experimental Unit and the Covariate is Measured on the Small Size Experimental Unit
16.8 Conclusions
References
Exercises

Chapter 17 Analysis of Covariance for Nonreplicated Experiments
17.1 Introduction
17.2 Experiments with A Single Covariate
17.3 Experiments with Multiple Covariates
17.4 Selecting Non-null and Null Partitions
17.5 Estimating the Parameters
17.6 Example: Milling Flour Using Three Factors Each at Two Levels
17.7 Example: Baking Bread Using Four Factors Each at Two Levels
17.8 Example: Hamburger Patties with Four Factors Each at Two Levels


17.9 Example: Strength of Composite Material Coupons with Two Covariates
17.10 Example: Effectiveness of Paint on Bricks with Unequal Slopes
17.11 Summary
References
Exercises

Chapter 18 Special Applications of Analysis of Covariance
18.1 Introduction
18.2 Blocking and Analysis of Covariance
18.3 Treatments Have Different Ranges of the Covariate
18.4 Nonparametric Analysis of Covariance
18.4.1 Heart Rate Data from Exercise Programs
18.4.2 Average Daily Gain Data from a Two-Way Treatment Structure
18.5 Crossover Design with Covariates
18.6 Nonlinear Analysis of Covariance
18.7 Effect of Outliers
References
Exercises


Preface

Analysis of covariance is a statistical procedure that enables one to incorporate

information about concomitant variables into the analysis of a response variable.

Sometimes this is done in an attempt to reduce experimental error. Other times it is

done to better understand the phenomenon being studied. The approach used in this

book is that the analysis of covariance model is described as a method of comparing

a series of regression models — one for each of the levels of a factor or combinations

of levels of factors being studied. Since covariance models are regression models,

analysts can use all of the methods of regression analysis to deal with problems

such as lack of fit, outliers, etc. The strategies described in this book will enable the

reader to appropriately formulate and analyze various kinds of covariance models.

When covariates are measured and incorporated into the analysis of a response

variable, the main objective of analysis of covariance is to compare treatments or

treatment combinations at common values of the covariates. This is particularly true

when the experimental units assigned to each of the treatment combinations may

have differing values of the covariates. Comparing treatments is dependent on the

form of the covariance model and thus care must be taken so that mistakes are not

made when drawing conclusions.

The goal of this book is to present the structure and philosophy for using the

analysis of covariance by including descriptions of methodologies, illustrating the

methodologies by analyzing numerous data sets, and occasionally furnishing some

theory when required. Our aim is to provide data analysts with tools for analyzing

data with covariates and to enable them to appropriately interpret the results.

Some of the methods and techniques described in this book are not available in

other books, but two issues of Biometrics (1957, Volume 13, Number 3, and 1982,

Volume 38, Number 3) were dedicated to the topic of analysis of covariance. The

topics presented are among those that we, as consulting statisticians, have found to

be most helpful in analyzing data when covariates are available for possible inclusion

in the analysis.

Readers of this book will learn how to:

• Formulate appropriate analysis of covariance models

• Simplify analysis of covariance models

• Compare levels of a factor or combinations of levels of factors when

the model involves covariates

• Construct and analyze a model with two or more factors in the treatment

structure

• Analyze two-way treatment structures with missing cells

• Compare models using the beta-hat model

• Perform variable selection within the analysis of covariance model


• Analyze models with blocking in the design structure and use combined

intra-block and inter-block information about the slopes of the regression

models

• Use random statements in PROC MIXED to specify random coefficient

regression models

• Carry out the analysis of covariance in a mixed model framework

• Incorporate unequal treatment variances into the analysis

• Specify the analysis of covariance models for split-plot, strip-plot and

repeated measures designs both in terms of the regression models and the

covariance structures of the repeated measures

• Incorporate covariates into the analysis of nonreplicated experiments, thus

extending some of the results in Analysis of Messy Data, Volume II

The last chapter consists of a collection of examples that deal with (1) using the

covariate to form blocks, (2) crossover designs, (3) nonparametric analysis of covariance, (4) using a nonlinear model for the covariate model, and (5) the process of

examining mixed analysis of covariance models for possible outliers.

The approach used in this book is similar to that used in the first two volumes.

Each topic is covered from a practical viewpoint, emphasizing the implementation

of the methods much more than the theory behind the methods. Some theory has

been presented for some of the newer methodologies. The book utilizes the procedures of the SAS® system and JMP® software packages to carry out the computations

and few computing formulae are presented. Either SAS® system code or JMP® menus

are presented for the analysis of the data sets in the examples. The data in the

examples (except for those using chocolate chips) were generated to simulate real

world applications that we have encountered in our consulting experiences.

This book is intended for everyone who analyzes data. The reader should have

a knowledge of analysis of variance and regression analysis as well as basic statistical

ideas including randomization, confidence intervals, and hypothesis testing. The first

four chapters contain the information needed to form a basic philosophy for using

the analysis of covariance with a one-way treatment structure and should be read

by everyone. As one progresses through the book, the topics become more complex

by going from designs with blocking to split-plot and repeated measures designs.

Before reading about a particular topic in the later chapters, read the first four

chapters. Knowledge of Chapters 13 and 14 from Analysis of Messy Data, Volume I:

Designed Experiments would be useful for understanding the part of Chapter 5

involving missing cells. The information in Chapters 4 through 9 of Analysis of

Messy Data, Volume II: Nonreplicated Experiments is useful for comprehending the

topics discussed in Chapter 17.

This book is the culmination of more than 25 years of writing. The earlier

editions of this manuscript were slanted toward providing an appropriate analysis

of split-plot type designs by using fixed effects software such as PROC GLM of the

SAS® system. With the development of mixed models software, such as PROC

MIXED of the SAS® system and JMP®, the complications of the analysis of split-plot type designs disappeared and thus enabled the manuscript to be completed

without including the difficult computations that are required when using fixed


effects software. Over the years, several colleagues made important contributions.

Discussions with Shie-Shien Yang were invaluable for the development of the variable selection process described in Chapter 7. Vicki Landcaster and Marie Loughin

read some of the earlier versions and provided important feedback. Discussions with

James Schwenke, Kate Ash, Brian Fergen, Kevin Chartier, Veronica Taylor, and

Mike Butine were important for improving the chapters involving combining intra- and inter-block information and the strategy for the analysis of repeated measures

designs. Finally, we cannot express enough our thanks to Jane Cox who typed many

of the initial versions of the chapters. If it were not for Jane’s skills with the word

processor, the task of finishing this book would have been much more difficult.

We dedicate this volume to all who have made important contributions to our

personal and professional lives. This includes our wives, Janet and Erma Jean, our

children, Scott and April and Kelly and Mark, and our parents and parents-in-law

who made it possible for us to pursue our careers as statisticians. We were both

fortunate to study with Franklin Graybill and we thank him for making sure that we

were headed in the right direction when our careers began.


1

Introduction to the

Analysis of Covariance

1.1 INTRODUCTION

The statistical procedure termed analysis of covariance has been used in several

contexts. The most common description of analysis of covariance is to adjust the

analysis for variables that could not be controlled by the experimenter. For example,

if a researcher wishes to compare the effect that ten different chemical weed control

treatments have on yield of a specific wheat variety, the researcher may wish to

control for the differential effects of a fertility trend occurring in the field and for

the number of wheat plants per plot that happen to emerge after planting. The

differential effects of a fertility trend can possibly be removed by using a randomized

complete block design structure, but it may not be possible to control the number

of wheat plants per plot (unless the seeds are sown thickly and then the emerging

plants are thinned to a given number of plants per plot). The researcher wishes to

compare the treatments as if each treatment were grown on plots with the same

average fertility level and as if every plot had the same number of wheat plants. The

use of a randomized complete block design structure in which the blocks are constructed such that the fertility levels of plots within a block are very similar will

enable the treatments to be compared by averaging over the fertility levels, but the

analysis of covariance is a procedure which can compare treatment means after first

adjusting for the differential number of wheat plants per plot. The adjustment

procedure involves constructing a model that describes the relationship between

yield and the number of wheat plants per plot for each treatment, which is in the

form of a regression model. The regression models, one for each level of the

treatment, are then compared at a predetermined common number of wheat plants

per plot.

1.2 THE COVARIATE ADJUSTMENT PROCESS

To demonstrate the type of adjustment process that is being carried out when the

analysis of covariance methodology is applied, the set of data in Table 1.1 is used

in which there are two treatments and five plots per treatment in a completely

randomized design structure. Treatment 1 is a chemical application to control the

growth of weeds and Treatment 2 is a control without any chemicals to control the

weeds. The data in Table 1.1 consist of the yield of wheat plants of a specific variety

from plots of identical size along with the number of wheat plants that emerged


TABLE 1.1
Yield and Plants per Plot Data for the Example in Section 1.2

          Treatment 1                          Treatment 2
Yield per plot   Plants per plot     Yield per plot   Plants per plot
      951              126                 930              135
      957              128                 790              119
      776              107                 764              110
     1033              142                 989              140
      840              120                 740              102

FIGURE 1.1 Plot of the data for the two treatments, with the “X” denoting the respective means. (Vertical axis: yield per plot; horizontal axis: treatment number.)

after planting per plot. The researcher wants to compare the yields of the two

treatments for the condition when there are 125 plants per plot.

Figure 1.1 is a graphical display of the plot yields for each of the treatments

where the circles represent the data points for Treatment 1 and the boxes represent

the data points for Treatment 2. An “X” is used to mark the means of each of the

treatments.

If the researcher uses the two-sample t-test or one-way analysis of variance to compare the two treatments without taking into account information about the number of plants per plot, a t statistic of 1.02 or an F statistic of 1.05 is obtained, indicating the two treatment means are not significantly different (p = 0.3361). The results of the analysis are in Table 1.2, in which the estimated standard error of the difference of the two treatment means is 67.23.
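The unadjusted comparison can be checked by hand from the Table 1.1 yields. The following is a minimal Python sketch, not the SAS or JMP analysis used in this book; the variable and function names are illustrative assumptions. It computes the pooled two-sample t statistic and the equivalent one-way F statistic. The p-value is omitted because the Python standard library has no t distribution.

```python
import math

# Yields per plot from Table 1.1
yield_trt1 = [951, 957, 776, 1033, 840]
yield_trt2 = [930, 790, 764, 989, 740]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n1, n2 = len(yield_trt1), len(yield_trt2)
diff = mean(yield_trt1) - mean(yield_trt2)

# Pooled error sum of squares and mean square (n1 + n2 - 2 = 8 df)
sse = sum_sq_dev(yield_trt1) + sum_sq_dev(yield_trt2)
df_error = n1 + n2 - 2
mse = sse / df_error

# Standard error of the difference of the two treatment means
se_diff = math.sqrt(mse * (1 / n1 + 1 / n2))
t_stat = diff / se_diff
f_stat = t_stat ** 2  # with two groups, the one-way F equals t squared

print(f"difference = {diff:.1f}, SE = {se_diff:.2f}, "
      f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```

The printed values can be compared with the estimate, standard error, t value and F value reported in Table 1.2.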


TABLE 1.2
Analysis of Variance Table and Means for Comparing the Yields of the Two Treatments Where No Information about the Number of Plants per Plot is Used

Source             df          SS          MS   FValue    ProbF
Model               1    11833.60    11833.60     1.05   0.3361
Error               8    90408.40    11301.05
Corrected total     9   102242.00

Source   df   SS (type III)         MS   FValue    ProbF
TRT       1        11833.60   11833.60     1.05   0.3361

Parameter       Estimate   StdErr   t Value    Probt
Trt 1 – Trt 2       68.8    67.23      1.02   0.3361

TRT   LSMean   ProbtDiff
1     911.40      0.3361
2     842.60

FIGURE 1.2 Plot of the data and the estimated regression models for the two treatments. (Vertical axis: yield per plot; horizontal axis: number of plants per plot; the two lines are labeled Treatment 1 model and Treatment 2 model.)

The next step is to investigate the relationship between the yield per plot and

the number of plants per plot. Figure 1.2 is a display of the data where the number

of plants is on the horizontal axis and the yield is on the vertical axis. The circles

denote the data for Treatment 1 and the boxes denote the data for Treatment 2. The

two lines on the graph, denoted by Treatment 1 model and Treatment 2 model, were

computed from the data by fitting the model $y_{ij} = \alpha_i + \beta x_{ij} + \varepsilon_{ij}$, i = 1, 2 and j = 1,


TABLE 1.3
Analysis of Covariance to Provide the Estimates of the Slope and Intercepts to be Used in Adjusting the Data

Source         df           SS           MS    FValue    ProbF
Model           3   7787794.74   2595931.58   3167.28   0.0000
Error           7      5737.26       819.61
Uncorr Total   10   7793532.00

Source   df   SS(Type III)         MS   FValue    ProbF
TRT       2        4964.18    2482.09     3.03   0.1128
Plants    1       84671.14   84671.14   103.31   0.0000

Parameter       Estimate   StdErr   tValue    Probt
Trt 1 – Trt 2      44.73    18.26     2.45   0.0441

Parameter   Estimate   StdErr   tValue    Probt
TRT 1         29.453   87.711     0.34   0.7469
TRT 2        -15.281   85.369    -0.18   0.8630
Plants         7.078    0.696    10.16   0.0000

2, …, 5, a model with different intercepts and common or equal slopes. The results

are included in Table 1.3.
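As a cross-check on the estimates in Table 1.3, the common slope model can be fit with ordinary least squares in closed form. The following Python sketch uses the Table 1.1 data. It is not the SAS code used in this book, and the function and variable names are illustrative assumptions. The slope is the pooled within-treatment cross product divided by the pooled within-treatment sum of squares of the covariate, and each intercept is the treatment mean yield minus the slope times the treatment mean number of plants.

```python
import math

# (plants per plot, yield per plot) pairs from Table 1.1
data = {
    1: [(126, 951), (128, 957), (107, 776), (142, 1033), (120, 840)],
    2: [(135, 930), (119, 790), (110, 764), (140, 989), (102, 740)],
}


def fit_common_slope(groups):
    """Least squares fit of y_ij = alpha_i + beta * x_ij + e_ij."""
    sxx = sxy = syy = 0.0
    means = {}
    for trt, obs in groups.items():
        n = len(obs)
        x_bar = sum(x for x, _ in obs) / n
        y_bar = sum(y for _, y in obs) / n
        means[trt] = (x_bar, y_bar, n)
        # Pool the within-treatment sums of squares and cross products
        sxx += sum((x - x_bar) ** 2 for x, _ in obs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in obs)
        syy += sum((y - y_bar) ** 2 for _, y in obs)
    beta = sxy / sxx
    alpha = {trt: y_bar - beta * x_bar
             for trt, (x_bar, y_bar, _) in means.items()}
    sse = syy - beta * sxy
    n_total = sum(n for _, _, n in means.values())
    df_error = n_total - len(groups) - 1  # t intercepts plus one slope
    return alpha, beta, sse, df_error, sxx, means


alpha, beta, sse, df_error, sxx, means = fit_common_slope(data)
mse = sse / df_error

# With equal slopes the two lines are parallel, so their difference
# and its standard error are the same at every number of plants.
(x1, _, n1), (x2, _, n2) = means[1], means[2]
diff = alpha[1] - alpha[2]
se_diff = math.sqrt(mse * (1 / n1 + 1 / n2 + (x1 - x2) ** 2 / sxx))
t_stat = diff / se_diff

print(f"alpha1 = {alpha[1]:.3f}, alpha2 = {alpha[2]:.3f}, beta = {beta:.3f}")
print(f"SSE = {sse:.2f} on {df_error} df, difference = {diff:.2f}, "
      f"SE = {se_diff:.2f}, t = {t_stat:.2f}")
```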

Now analysis of covariance is used to compare the two treatments when there

are 125 plants per plot. The process of the analysis of covariance is to slide or move

the observations from a given treatment along the estimated regression model (parallel to the model) to intersect the vertical line at 125 plants per plot. This sliding

is demonstrated in Figure 1.3 where the solid circles represent the adjusted data for

Treatment 1 and the solid boxes represent the adjusted data for Treatment 2.

The lines join the open circles to the solid circles and join the open boxes to

the solid boxes. The lines indicate that the respective data points slid to the vertical

line at which there are 125 plants per plot.

The adjusted data are computed by

$$y_{Aij} = y_{ij} - (\hat{\alpha}_i + \hat{\beta} x_{ij}) + (\hat{\alpha}_i + \hat{\beta} \cdot 125) = y_{ij} + \hat{\beta}(125 - x_{ij}).$$

The terms $y_{ij} - (\hat{\alpha}_i + \hat{\beta} x_{ij})$, i = 1, 2 and j = 1, 2, …, 5, are the residuals or deviations of the observations from the estimated regression models. The preliminary computations of the adjusted yields are in Table 1.4. These adjusted yields are the predicted yields of the plots as if each plot had 125 plants.
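The sliding of each observation to 125 plants per plot can be reproduced directly from the data in Table 1.1. The Python sketch below is an illustration under stated assumptions rather than the computation used in this book: the names `pooled_slope` and `TARGET_PLANTS` are hypothetical, and the slope is re-estimated from the data rather than copied from Table 1.3. Each yield is shifted by the estimated slope times the distance of its plot from 125 plants.

```python
# (plants per plot, yield per plot) pairs from Table 1.1
data = {
    1: [(126, 951), (128, 957), (107, 776), (142, 1033), (120, 840)],
    2: [(135, 930), (119, 790), (110, 764), (140, 989), (102, 740)],
}
TARGET_PLANTS = 125  # common covariate value used for the comparison


def pooled_slope(groups):
    """Common slope estimated from within-treatment deviations."""
    sxx = sxy = 0.0
    for obs in groups.values():
        n = len(obs)
        x_bar = sum(x for x, _ in obs) / n
        y_bar = sum(y for _, y in obs) / n
        sxx += sum((x - x_bar) ** 2 for x, _ in obs)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in obs)
    return sxy / sxx


beta_hat = pooled_slope(data)

# Adjusted yield: y + beta_hat * (125 - x), i.e., slide each point
# parallel to the fitted line until it reaches 125 plants per plot.
adjusted = {
    trt: [y + beta_hat * (TARGET_PLANTS - x) for x, y in obs]
    for trt, obs in data.items()
}
adjusted_means = {trt: sum(v) / len(v) for trt, v in adjusted.items()}

for trt, values in adjusted.items():
    print(trt, [round(v, 3) for v in values], round(adjusted_means[trt], 3))
```

The adjusted values can be compared with the last column of Table 1.4, and the adjusted means with the least squares means in Table 1.5.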

The next step is to compare the two treatments through the adjusted yield values

by computing a two-sample t statistic or the F statistic from a one-way analysis of

variance. The results of these analyses are in Table 1.5.

A problem with this analysis is that it treats the adjusted yields as if they were directly observed data, so the degrees of freedom for error are not reduced to account for estimating the slope of the regression lines. Hence the final step is to recalculate the statistics


FIGURE 1.3 Plot of the data and estimated regression models showing how to compute adjusted yield values at 125 plants per plot. (Vertical axis: yield per plot; horizontal axis: number of plants per plot; the observations slide parallel to the regression line to meet the vertical line at 125 plants per plot.)

TABLE 1.4
Preliminary Computations Used in Computing Adjusted Data for Each Treatment as If All Plots Had 125 Plants per Plot

Treatment   Yield Per Plot   Plants Per Plot   Residual   Adjusted Yield
1                      951               126    29.6905          943.922
1                      957               128    21.534           935.765
1                      776               107   -10.8232          903.408
1                     1033               142    -1.5611          912.67
1                      840               120   -38.8402          875.391
2                      930               135   -10.2795          859.218
2                      790               119   -37.0279          832.469
2                      764               110     0.6761          870.173
2                      989               140    13.3294          882.827
2                      740               102    33.3019          902.799

by changing the degrees of freedom for error in Table 1.5 from 8 to 7 (the cost of estimating the slope). The error sum of squares is identical in Tables 1.3 and 1.5, but the error mean square in Table 1.5 is based on 8 degrees of freedom instead of 7. To account for this change in degrees of freedom in Table 1.5, the estimated standard error for comparing the two treatments needs to be multiplied by $\sqrt{8/7}$, the t statistic needs to be multiplied by $\sqrt{7/8}$, and the F statistic needs to be multiplied by 7/8. The recalculated statistics are presented in Table 1.6. Here the estimated standard error of the difference between the two means is 18.11, a 3.7-fold reduction over the analysis that ignores the information from the covariate. Thus,


TABLE 1.5
Analysis of the Adjusted Yields (Too Many Degrees of Freedom for Error)

Source            df         SS        MS   FValue    ProbF
Model              1    5002.83   5002.83     6.98   0.0297
Error              8    5737.26    717.16
Corrected Total    9   10740.09

Source   df   SS (Type III)        MS   FValue    ProbF
TRT       1         5002.83   5002.83     6.98   0.0297

Parameter       Estimate   StdErr    t Value    Probt
Trt 1 – Trt 2     44.734   16.937   2.641197   0.0297

TRT    LSMean   ProbtDiff
1     914.231      0.0297
2     869.497

TABLE 1.6
Recalculated Statistics to Reflect the Loss of Error Degrees of Freedom Due to Estimating the Slope before Computing the Adjusted Yields

Recalculated estimated standard error   18.11
Recalculated t-statistic                 2.47
Recalculated F-statistic                 6.10
Recalculated significance level        0.0428

by taking into account the linear relationship between the yield of the plot and the

number of plants in that plot, there is a tremendous reduction in the variability of

the data. In fact, the analysis of the adjusted data shows there is a significant

difference between the yields of the two treatments when adjusting for the unequal

number of plants per plot (p = 0.0428), whereas the analysis of variance in Table 1.2

did not indicate a significant difference between the treatments (p = 0.3361).

The final issue is that, because this analysis of the adjusted data overlooks the fact that the

slope has been estimated, the estimated standard error of the difference of two means

is slightly smaller than the estimated standard error obtained from the analysis

of covariance. The estimated standard error of the difference of the two means as

computed from the analysis of covariance in Table 1.3 is 18.26, compared with 18.11

for the analysis of the adjusted data. Thus the two analyses are not quite identical.
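The degrees-of-freedom correction that turns Table 1.5 into Table 1.6 can be checked numerically. The sketch below is only an illustration: it starts from the values printed in Table 1.5, the variable names are ours, and the p-value is obtained by integrating the t density with Simpson's rule (an assumption made to keep the example free of external libraries, not the method used by the software in the text).

```python
import math

# Values reported in Table 1.5, where the error line was given 8 df.
ss_trt, ss_error = 5002.83, 5737.26
estimate, se_8df = 44.734, 16.937
df_table, df_correct = 8, 7   # one error df is lost by estimating the slope

# The error mean square grows by 8/7, so the standard error grows by
# sqrt(8/7), the t statistic shrinks by sqrt(7/8), and F shrinks by 7/8.
scale = df_table / df_correct
se_7df = se_8df * math.sqrt(scale)
t_7df = (estimate / se_8df) / math.sqrt(scale)
f_7df = (ss_trt / (ss_error / df_table)) / scale


def t_two_sided_p(t_value, df, steps=2000):
    """Two-sided p-value for a t statistic via Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t_value)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    central_area = 2 * total * h / 3       # P(-|t| < T < |t|)
    return 1 - central_area


p_7df = t_two_sided_p(t_7df, df_correct)
print(round(se_7df, 2), round(t_7df, 2), round(f_7df, 2), round(p_7df, 4))
```

The printed values should agree with Table 1.6 to the precision shown there.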

This example shows the power of being able to use information about covariates

or independent variables to make decisions about the treatments being included in


the study. The analysis of covariance uses a model to adjust the data as if all the

observations are from experimental units with identical values of the covariates.

A typical discussion of analysis of covariance indicates that the analyst should

include the number of plants as a term in the model so that term accounts for

variability in the observed yields, i.e., the variance of the model is reduced. If

including the number of plants in the model reduces the variability enough, then it

is used to adjust the data before the variety means are compared. It is important to

remember that there is a model being assumed when the covariate or covariates are

included in a model.

1.3 A GENERAL AOC MODEL AND THE BASIC PHILOSOPHY

In this text, the analysis of covariance is described in more generality than that of

adjusting for variation due to uncontrollable variables. The analysis of covariance

is defined as a method for comparing several regression surfaces or lines, one for

each treatment or treatment combination, where a different regression surface is

possibly used to describe the data for each treatment or treatment combination.

A one-way treatment structure with t treatments in a completely randomized

design structure (Milliken and Johnson, 1992) is used as a basis for setting up the

definitions for the analysis of covariance model. The experimental situation involves

selecting N experimental units from a population of experimental units and measuring k characteristics x1ij, x2ij, …, xkij on each experimental unit. The variables x1ij,

x2ij, …, xkij are called covariates, independent variables, or concomitant variables.

It is important to measure the values of the covariates before the treatments are

applied to the experimental units so that the levels of the treatments do not affect

the values of the covariates. At a minimum, the values of the covariates should not

be affected by the applied levels of the treatments. In the chemical weed treatment

experiment, the number of plants per plot is determined after a particular treatment is applied

to a plot, so the value of the covariate (number of plants per plot) could not be

determined before the treatments were applied to the plots. If the germination rate is

affected by the applied treatments, then the number of plants per plot cannot be used

as a covariate in the conventional manner (see Chapter 2 for further discussion). After

the set of experimental units is selected and the values of the covariates are determined

(when possible), randomly assign ni experimental units to treatment i, where

$N = \sum_{i=1}^{t} n_i$. One generally assigns equal numbers of experimental units to the levels

of the treatment, but equal numbers of experimental units per level of the treatment

are not necessary. After an experimental unit is subjected to its specific level of the

treatment, measure the response or dependent variable, which is denoted by yij.

Thus the variables used in the discussions are summarized as:

yij is the dependent measure

x1ij is the first independent variable or covariate

x2ij is the second independent variable or covariate

xkij is the kth independent variable or covariate


At this point, the experimental design is a one-way treatment structure with

t treatments in a completely randomized design structure with k covariates. If there

is a linear relationship between the mean of y for the ith treatment and the k covariates

or independent variables, an analysis of covariance model can be expressed as:

$$y_{ij} = \beta_{0i} + \beta_{1i} x_{1ij} + \beta_{2i} x_{2ij} + \cdots + \beta_{ki} x_{kij} + \varepsilon_{ij} \qquad (1.1)$$

for i = 1, 2, …, t, and j = 1, 2, …, ni, where the εij ~ iid N(0, σ²), i.e., the εij are

independently and identically distributed normal random variables with mean 0 and

variance σ². The important thing to note about this model is that the mean of the

y values from a given treatment depends on the values of the x’s as well as on the

treatment applied to the experimental units.

The analysis of covariance is a strategy for making decisions about the form of

the covariance model through testing a series of hypotheses and then making treatment comparisons by comparing the estimated responses from the final regression

models. Two important hypotheses that help simplify the regression models are

H01: βh1 = βh2 = … = βht = 0 vs. Ha1: (not H01), that is, all the treatments' slopes for the hth covariate are zero, h = 1, 2, …, k, or

H02: βh1 = βh2 = … = βht vs. Ha2: (not H02), that is, the slopes for the hth covariate are equal across the treatments, meaning the surfaces are parallel in the direction of the hth covariate, h = 1, 2, …, k.

The analysis of covariance model in Equation 1.1 is a combination of an analysis

of variance model and a regression model. The analysis of covariance model is part

of an analysis of variance model since the intercepts and slopes are functions of the

levels of the treatments. The analysis of covariance model is also part of a regression

model since the model for each treatment is a regression model.

An experiment is designed to purchase a certain number of degrees of freedom

for error (generally without the covariates) and the experimenter is willing to sell

some of those degrees of freedom for good or effective covariates which will help

reduce the magnitude of the error variance. The philosophy in this book is to select

the simplest possible expression for the covariate part of the model before making

treatment comparisons.

This process of model building to determine the simplest adequate form of the

regression models follows the principle of parsimony and helps guard against foolishly selling degrees of freedom for error to retain unnecessary covariate terms in

the model. Thus the strategy for analysis of covariance begins with testing hypotheses

such as H01 and H02 to make decisions about the form of the covariate or regression

part of the model. Once the form of the covariate part of the model is finalized, the

treatments are compared by comparing the regression surfaces at predetermined

values of the covariates.
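For the special case of a single covariate, hypotheses of the form H01 and H02 can be tested by comparing the residual sum of squares of the full model (separate intercepts and slopes) with that of each reduced model. The following sketch shows one standard full-versus-reduced model computation; it is not necessarily the form of the computations used later in the text, and the function names and the two small data sets are invented for illustration.

```python
def corrected_sums(x, y):
    """Return (Sxx, Sxy, Syy), the corrected sums of squares and cross products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxx, sxy, syy


def slope_hypothesis_tests(groups):
    """F statistics for H01 (all slopes zero) and H02 (equal slopes), one covariate."""
    t = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    sums = [corrected_sums(x, y) for x, y in groups]

    # Full model: a separate intercept and slope for every treatment.
    ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    df_full = n_total - 2 * t
    ms_full = ss_full / df_full

    total_sxx = sum(s[0] for s in sums)
    total_sxy = sum(s[1] for s in sums)
    total_syy = sum(s[2] for s in sums)

    # Reduced model under H02: separate intercepts, one common slope.
    ss_equal = total_syy - total_sxy ** 2 / total_sxx
    # Reduced model under H01: separate intercepts, all slopes zero.
    ss_zero = total_syy

    return {
        "F_H01": ((ss_zero - ss_full) / t) / ms_full,
        "df_H01": (t, df_full),
        "F_H02": ((ss_equal - ss_full) / (t - 1)) / ms_full,
        "df_H02": (t - 1, df_full),
    }


# Hypothetical data: (covariate values, responses) for two treatments.
example_groups = [
    ([1, 2, 3, 4], [2, 4, 5, 7]),
    ([1, 2, 3, 4], [1, 4, 6, 9]),
]
tests = slope_hypothesis_tests(example_groups)
print(tests)
```

Each F statistic would be compared with an F distribution having the listed numerator and denominator degrees of freedom. A large F for H02 suggests that the lines are not parallel, in which case the treatments must be compared at several values of the covariate.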

The structure of the following chapters leads one through the forest of analysis

of covariance by starting with the simple model with one covariate and building

through the complex process involving analysis of covariance in split-plot and


repeated measures designs. Other topics discussed are multiple covariates, experiments involving blocks, and graphical methods for comparing the models for the

various treatments.

Chapter 2 discusses the simple analysis of covariance model involving a one-way treatment structure in a completely randomized design structure with one

covariate and Chapter 3 contains several examples demonstrating the strategies for

situations involving one covariate. Chapter 4 presents a discussion of the analysis

of covariance models involving more than one covariate which includes polynomial

regression models. Models involving two-way treatment structures, both balanced

and unbalanced, are discussed in Chapter 5. A method of comparing parameters via

beta-hat models is described in Chapter 6. Chapter 7 describes a method for variable

selection in the analysis of covariance where many possible covariates were measured. Chapter 8 discusses methods for testing the equality of several regression

models.

The next set of chapters (9 through 11) discuss analysis of covariance in the

randomized complete block and incomplete block design structures. The analysis

of data where the values of a characteristic are used to construct blocks is described,

i.e., where the value of the covariate is the same for all experimental units in a block.

In the analysis of covariance context, inter- or between block information about the

intercepts and slopes is required to extract all available information about the regression lines or surfaces from the data. Usual analysis methods extract only the intrablock information from the data. A mixed models analysis involving methods of

moments and maximum likelihood estimation of the variance components provides

combined estimates of the parameters and should be used for blocked experiments.

Chapter 12 describes models where the levels of the treatments are random effects

(Littell et al., 1996). The models in Chapter 12 include random coefficient models.

Chapter 13 provides a discussion of mixed models with covariates and Chapter 14

presents a discussion of unequal variance models.

Chapters 15 and 16 discuss problems with applying the analysis of covariance

to experiments involving repeated measures and split-plot design structures. One

has to consider the size of experimental unit on which the covariate is measured.

Cases are discussed where the covariate is measured on the large size of an experimental unit and when the covariate is measured on the small size of an experimental

unit. Several examples of split-plot and repeated measures designs are presented. A

process of selecting the simplest covariance structure for the repeated measures part

of the model and the simplest covariate (regression model) part of the model is

described. The analysis of covariance in the nonreplicated experiment is discussed

in Chapter 17. The half-normal plot methodology (Milliken and Johnson, 1989) is

used to determine the form of the covariate part of the model and to determine which

effects are to be included in the intercept part of the model.

Finally, several special applications of analysis of covariance are presented in

Chapter 18, including using the covariate to construct blocks, crossover designs,

nonlinear models, nonparametric analysis of covariance, and a process for examining mixed models for possible outliers in the data set.

The procedures of the SAS® system (1989, 1996, and 1997) and JMP® (2000)

are used to demonstrate how to use software to carry out the analysis of covariance


computations. Analysis of covariance has been the subject of two issues

of Biometrics: Volume 13, Number 3, in 1957 and Volume 38, Number 3, in 1982.

The collection of papers in these two issues presents discussions of widely diverse

applications of analysis of covariance.

REFERENCES

Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996) SAS® System for

Mixed Models, SAS Institute Inc., Cary, NC.

Milliken, G. A. and Johnson, D. E. (1989) Analysis of Messy Data, Volume II: Nonreplicated

Experiments, Chapman & Hall, London.

Milliken, G. A. and Johnson, D. E. (1992) Analysis of Messy Data, Volume I: Designed

Experiments, Chapman & Hall, London.

SAS Institute Inc. (1989) SAS/STAT® User’s Guide, Version 6, Fourth Edition, Volume 2,

Cary, NC.

SAS Institute Inc. (1996) SAS/STAT® Software: Changes and Enhancements Through Release

6.11, Cary, NC.

SAS Institute Inc. (1997) SAS/STAT® Software: Changes and Enhancements Through Release

6.12, Cary, NC.

SAS Institute Inc. (2000) JMP® Statistics and Graphics Guide, Version 4, Cary, NC.


2

One-Way Analysis

of Covariance —

One Covariate in a

Completely Randomized

Design Structure

2.1 THE MODEL

Suppose you have N homogeneous experimental units and you randomly divide

them into t groups of ni units each, where $\sum_{i=1}^{t} n_i = N$. Each of the t treatments of a

one-way treatment structure is randomly assigned to one group of experimental

units, providing a one-way treatment structure in a completely randomized design

structure. It is assumed that the experimental units are subjected to their assigned

treatments independently of each other. Let yij (dependent variable) denote the jth

observation from the ith treatment and xij denote the covariate (independent variable)

corresponding to the (i,j)th experimental unit. As in Chapter 1, the values of the

covariate are not to be influenced by the levels of the treatment. The best case is

where the values of the covariate are determined before the treatments are assigned.

In any case, it is a good strategy to use the analysis of variance to check to see if

there are differences among the treatment covariate means (see Chapter 18).

Assume that the mean of yij can be expressed as a linear function of the covariate,

xij, with possibly a different linear function being required for each treatment. It is

important to note that the mean of an observation from the ith treatment group depends

on the value of the covariate as well as the treatment. In analysis of variance, the

mean of an observation from the ith treatment group depends only on the treatment.

The analysis of covariance model for a one-way treatment structure with one

covariate in a completely randomized design structure is

$$y_{ij} = \alpha_i + \beta_i X_{ij} + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, n_i \qquad (2.1)$$

where the mean of yi for a given value of X is $\mu_{Y_i \mid X} = \alpha_i + \beta_i X$. For making inferences,

it is assumed that εij ~ iid N(0, σ²). Model 2.1 has t intercepts (α1, …, αt), t slopes

(β1, …, βt), and one variance σ², i.e., the model represents a collection of simple

linear regression models with a different model for each level of the treatment.


Before analyzing this model, make sure that the data from each treatment can

in fact be described by a simple linear regression model. Various regression diagnostics should be run on the data before continuing. The equal variance assumption

should also be checked (see Chapter 14). If the simple linear regression model is

not adequate to describe the data for each treatment, then another model must be

selected before continuing with the analysis of covariance.

The analysis of covariance is a process of comparing the regression models and

then making decisions about the various parameters of the models. The process

involves comparing the t slopes, comparing the distances between the regression lines

(surfaces) at preselected values of X, and possibly comparing the t intercepts. The

analysis of covariance computations are typically presented in summation notation

with little emphasis on interpretations. In this and the following chapters, the various

covariance models are expressed in terms of matrices (see Chapter 6 of Milliken and

Johnson, 1992) and their interpretations are discussed. Software is used as the mode

of doing the analysis of covariance computations. The matrix form of Model 2.1 is

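$$
\begin{bmatrix}
y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tn_t}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_{1n_1} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & x_{21} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 1 & x_{2n_2} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & x_{t1} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & x_{tn_t}
\end{bmatrix}
\begin{bmatrix}
\alpha_1 \\ \beta_1 \\ \alpha_2 \\ \beta_2 \\ \vdots \\ \alpha_t \\ \beta_t
\end{bmatrix}
+ \varepsilon
\qquad (2.2)
$$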

which is expressed in the form of a linear model as y = Xβ + ε. The vector y denotes

the observations ordered by observation within each treatment, the 2t × 1 vector β

denotes the collection of slopes and intercepts, the matrix X is the design matrix,

and the vector ε represents the random errors.
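The block structure of the design matrix in Equation 2.2 may be easier to see in code. The sketch below builds X for a small case; the function name `design_matrix` and the covariate values are hypothetical, and a plain list of lists stands in for a matrix type.

```python
def design_matrix(covariates_by_treatment):
    """Build the N x 2t design matrix of Model 2.1.

    Columns come in pairs (intercept indicator, covariate), one pair per
    treatment, matching the parameter order (alpha_1, beta_1, ..., alpha_t, beta_t).
    """
    t = len(covariates_by_treatment)
    rows = []
    for i, x_values in enumerate(covariates_by_treatment):
        for x in x_values:
            row = [0.0] * (2 * t)
            row[2 * i] = 1.0           # intercept column for treatment i
            row[2 * i + 1] = float(x)  # covariate column for treatment i
            rows.append(row)
    return rows


# Hypothetical covariate values: n_1 = 2 units on treatment 1, n_2 = 3 on treatment 2.
X = design_matrix([[3, 5], [4, 6, 8]])
for row in X:
    print(row)
```

Because each observation has nonzero entries only in the two columns of its own treatment, X'X is block diagonal, which is why the estimates can be obtained by fitting each treatment separately.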

2.2 ESTIMATION

The least squares estimator of the parameter vector β is $\hat{\beta} = (X'X)^{-1}X'y$, but the least

squares estimator of β can also be obtained by fitting the simple linear regression

model to the data from each treatment and computing the least squares estimator of

each pair of parameters (αi, βi). For data from the ith treatment, fit the model

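$$
\begin{bmatrix} y_{i1} \\ \vdots \\ y_{in_i} \end{bmatrix}
=
\begin{bmatrix} 1 & x_{i1} \\ \vdots & \vdots \\ 1 & x_{in_i} \end{bmatrix}
\begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix}
+ \varepsilon_i ,
\qquad (2.3)
$$

which is expressed as $y_i = X_i \boldsymbol{\beta}_i + \varepsilon_i$ with $\boldsymbol{\beta}_i = (\alpha_i, \beta_i)'$. The least squares estimator of $\boldsymbol{\beta}_i$ is

$\hat{\boldsymbol{\beta}}_i = (X_i'X_i)^{-1}X_i'y_i$, the same as the estimator obtained for a simple linear regression model.

The estimates of the slope βi and the intercept αi in summation notation are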

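$$
\hat{\beta}_i = \frac{\sum_{j=1}^{n_i} x_{ij} y_{ij} - n_i \bar{x}_{i\cdot} \bar{y}_{i\cdot}}{\sum_{j=1}^{n_i} x_{ij}^2 - n_i \bar{x}_{i\cdot}^2}
$$

and

$$
\hat{\alpha}_i = \bar{y}_{i\cdot} - \hat{\beta}_i \bar{x}_{i\cdot} .
$$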
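As a small check on the summation formulas, the sketch below computes the slope and intercept for one treatment directly from those sums. The function name `fit_line` and the four data points are made up for illustration.

```python
def fit_line(x, y):
    """Slope and intercept for one treatment from the summation formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    numerator = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    denominator = sum(a * a for a in x) - n * x_bar ** 2
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for a single treatment.
alpha_hat, beta_hat = fit_line([2, 4, 6, 8], [3, 7, 8, 12])
print(alpha_hat, beta_hat)
```

Repeating the call once per treatment gives the full vector of estimates, since each treatment's pair of parameters is estimated only from that treatment's data.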

The residual sum of squares for the ith model is

$$
SSRes_i = \sum_{j=1}^{n_i} \left( y_{ij} - \hat{\alpha}_i - \hat{\beta}_i x_{ij} \right)^2 .
$$

There are ni – 2 degrees of freedom associated with SSResi since the ith model

involves two parameters. After testing the equality of the treatment variances (see

Chapter 14) and deciding there is not enough evidence to conclude the variances

are unequal, the residual sum of squares for Model 2.1 can be obtained by pooling

the residual sums of squares for each of the t models, i.e., sum the SSResi together

to obtain

$$
SSRes = \sum_{i=1}^{t} SSRes_i . \qquad (2.4)
$$

The pooled residual sum of squares, SSRes, is based on the pooled degrees of

freedom, computed and denoted by

$$
\text{d.f.}_{SSRes} = \sum_{i=1}^{t} (n_i - 2) = \sum_{i=1}^{t} n_i - 2t = N - 2t .
$$

The best estimate of the variance of the experimental units is $\hat{\sigma}^2 = SSRes/(N - 2t)$.

The sampling distribution of $(N - 2t)\hat{\sigma}^2/\sigma^2$ is central chi-square with (N – 2t) degrees

of freedom. The sampling distribution of the least squares estimator, $\hat{\beta}' = (\hat{\alpha}_1, \hat{\beta}_1, \ldots, \hat{\alpha}_t, \hat{\beta}_t)$,

is normal with mean $\beta' = (\alpha_1, \beta_1, \ldots, \alpha_t, \beta_t)$ and variance-covariance

matrix $\sigma^2 (X'X)^{-1}$, which can be written as

$$
\sigma^2 (X'X)^{-1} = \sigma^2
\begin{bmatrix}
(X_1'X_1)^{-1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & (X_t'X_t)^{-1}
\end{bmatrix}
\qquad (2.5)
$$
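The pooling step and the block-diagonal covariance structure can be illustrated together. In the sketch below, each treatment is fit separately, the residual sums of squares are pooled over N − 2t degrees of freedom, and the standard errors of each intercept and slope are read from the diagonal of the corresponding 2 × 2 block. The function name `fit_group` and the data are hypothetical, and the 2 × 2 inverse is written out by hand rather than taken from a matrix library.

```python
import math


def fit_group(x, y):
    """Fit one treatment; return (intercept, slope, SSRes_i, inverse of X_i'X_i)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    det = n * sum_xx - sum_x ** 2          # determinant of X_i'X_i
    slope = (n * sum_xy - sum_x * sum_y) / det
    intercept = (sum_y - slope * sum_x) / n
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    xtx_inv = [[sum_xx / det, -sum_x / det],
               [-sum_x / det, n / det]]
    return intercept, slope, ss_res, xtx_inv


# Hypothetical data: (covariate values, responses) for t = 2 treatments.
groups = [
    ([1, 2, 3, 4], [2, 4, 5, 7]),
    ([1, 2, 3, 4], [1, 4, 6, 9]),
]
fits = [fit_group(x, y) for x, y in groups]

# Pool the residual sums of squares over N - 2t degrees of freedom.
n_total = sum(len(x) for x, _ in groups)
df_res = n_total - 2 * len(groups)
ss_res = sum(fit[2] for fit in fits)
sigma2_hat = ss_res / df_res

# Standard errors (intercept, slope) from the diagonal of each block.
std_errors = [
    (math.sqrt(sigma2_hat * inv[0][0]), math.sqrt(sigma2_hat * inv[1][1]))
    for _, _, _, inv in fits
]
print(sigma2_hat, std_errors)
```

Note that every standard error uses the pooled variance estimate, so the equal variance assumption matters even though the intercepts and slopes themselves are estimated treatment by treatment.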

of Messy Data

VOLUME III:

ANALYSIS OF COVARIANCE

George A. Milliken

Dallas E. Johnson

CHAPMAN & HALL/CRC

A CRC Pr ess Compan y

Boca Raton London Ne w York Washington, D.C.

C0317fm frame Page 4 Monday, July 16, 2001 7:52 AM

Library of Congress Cataloging-in-Publication Data

Milliken, George A., 1943–

Analysis of messy data / George A. Milliken, Dallas E. Johnson.

2 v. : ill. ; 24 cm.

Includes bibliographies and indexes.

Contents: v. 1. Designed experiments -- v. 2. Nonreplicated

experiments.

Vol. 2 has imprint: New York : Van Nostrand Reinhold.

ISBN 0-534-02713-X (v. 1) : $44.00 -- ISBN 0-442-24408-8 (v. 2)

1. Analysis of variance. 2. Experimental design. 3. Sampling

(Statistics) I. Johnson, Dallas E., 1938– . II. Title.

QA279 .M48 1984

519.5′352--dc19

84-000839

This book contains information obtained from authentic and highly regarded sources. Reprinted material

is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable

efforts have been made to publish reliable data and information, but the author and the publisher cannot

assume responsibility for the validity of all materials or for the consequences of their use.

Apart from any fair dealing for the purpose of research or private study, or criticism or review, as permitted

under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored

or transmitted, in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without the prior permission

in writing of the publishers, or in the case of reprographic reproduction only in accordance with the

terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with the

terms of the license issued by the appropriate Reproduction Rights Organization outside the UK.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for

creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC

for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are

used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2002 by Chapman & Hall/CRC

No claim to original U.S. Government works

International Standard Book Number 1-584-88083-X

Library of Congress Card Number 84-000839

Printed in the United States of America 1 2 3 4 5 6 7 8 9 0

Printed on acid-free paper

C0317fm frame Page 5 Monday, June 25, 2001 1:04 PM

Table of Contents

Chapter 1

Introduction to the Analysis of Covariance

1.1

Introduction

1.2

The Covariate Adjustment Process

1.3

A General AOC Model and the Basic Philosophy

References

Chapter 2

One-Way Analysis of Covariance — One Covariate in a

Completely Randomized Design Structure

2.1

2.2

2.3

2.4

The Model

Estimation

Strategy for Determining the Form of the Model

Comparing the Treatments or Regression Lines

2.4.1 Equal Slopes Model

2.4.2 Unequal Slopes Model-Covariate by Treatment Interaction

2.5

Confidence Bands about the Difference of Two Treatments

2.6

Summary of Strategies

2.7

Analysis of Covariance Computations via the SAS® System

2.7.1 Using PROC GLM and PROC MIXED

2.7.2 Using JMP®

2.8

Conclusions

References

Exercise

Chapter 3

3.1

3.2

3.3

3.4

3.5

3.6

Examples: One-Way Analysis of Covariance — One Covariate

in a Completely Randomized Design Structure

Introduction

Chocolate Candy — Equal Slopes

3.2.1 Analysis Using PROC GLM

3.2.2 Analysis Using PROC MIXED

3.2.3 Analysis Using JMP®

Exercise Programs and Initial Resting Heart Rate — Unequal Slopes

Effect of Diet on Cholesterol Level: An Exception to the Basic

Analysis of Covariance Strategy

Change from Base Line Analysis Using Effect of Diet on Cholesterol

Level Data

Shoe Tread Design Data for Exception to the Basic Strategy

© 2002 by CRC Press LLC

C0317fm frame Page 6 Monday, June 25, 2001 1:04 PM

3.7

Equal Slopes within Groups of Treatments and Unequal Slopes

between Groups

3.8

Unequal Slopes and Equal Intercepts — Part 1

3.9

Unequal Slopes and Equal Intercepts — Part 2

References

Exercises

Chapter 4

Multiple Covariates in a One-Way Treatment Structure in a

Completely Randomized Design Structure

4.1

4.2

4.3

4.4

4.5

Introduction

The Model

Estimation

Example: Driving A Golf Ball with Different Shafts

Example: Effect of Herbicides on the Yield of Soybeans — Three

Covariates

4.6 Example: Models That Are Quadratic Functions of the Covariate

4.7 Example: Comparing Response Surface Models

Reference

Exercises

Chapter 5

Two-Way Treatment Structure and Analysis of Covariance in

a Completely Randomized Design Structure

5.1

5.2

5.3

Introduction

The Model

Using the SAS® System

5.3.1 Using PROC GLM and PROC MIXED

5.3.2 Using JMP®

5.4

Example: Average Daily Gains and Birth Weight — Common Slope

5.5

Example: Energy from Wood of Different Types of Trees — Some

Unequal Slopes

5.6

Missing Treatment Combinations

5.7

Example: Two-Way Treatment Structure with Missing Cells

5.8

Extensions

Reference

Exercises

Chapter 6

6.1

6.2

6.3

6.4

6.5

6.6

Beta-Hat Models

Introduction

The Beta-Hat Model and Analysis

Testing Equality of Parameters

Complex Treatment Structures

Example: One-Way Treatment Structure

Example: Two-Way Treatment Structure

© 2002 by CRC Press LLC

C0317fm frame Page 7 Monday, June 25, 2001 1:04 PM

6.7

Summary

Exercises

Chapter 7

Variable Selection in the Analysis of Covariance Model

7.1

Introduction

7.2

Procedure for Equal Slopes

7.3

Example: One-Way Treatment Structure with Equal Slopes Model

7.4

Some Theory

7.5

When Slopes are Possibly Unequal

References

Exercises

Chapter 8

Comparing Models for Several Treatments

8.1

Introduction

8.2

Testing Equality of Models for a One-Way Treatment Structure

8.3

Comparing Models for a Two-Way Treatment Structure

8.4

Example: One-Way Treatment Structure with One Covariate

8.5

Example: One-Way Treatment Structure with Three Covariates

8.6

Example: Two-Way Treatment Structure with One Covariate

8.7

Discussion

References

Exercises

Chapter 9

Two Treatments in a Randomized Complete Block Design

Structure

9.1

9.2

9.3

9.4

9.5

9.6

9.7

9.8

Introduction

Complete Block Designs

Within Block Analysis

Between Block Analysis

Combining Within Block and Between Block Information

Determining the Form of the Model

Common Slope Model

Comparing the Treatments

9.8.1 Equal Slopes Models

9.8.2 Unequal Slopes Model

9.9

Confidence Intervals about Differences of Two Regression Lines

9.9.1 Within Block Analysis

9.9.2 Combined Within Block and Between Block Analysis

9.10 Computations for Model 9.1 Using the SAS® System

9.11 Example: Effect of Drugs on Heart Rate

9.12 Summary

References

Exercises

© 2002 by CRC Press LLC

C0317fm frame Page 8 Monday, June 25, 2001 1:04 PM

Chapter 10 More Than Two Treatments in a Blocked Design Structure

10.1

10.2

10.3

Introduction

RCB Design Structure — Within and Between Block Information

Incomplete Block Design Structure — Within and Between Block

Information

10.4 Combining Between Block and Within Block Information

10.5 Example: Five Treatments in RCB Design Structure

10.6 Example: Balanced Incomplete Block Design Structure with Four

Treatments

10.7 Example: Balanced Incomplete Block Design Structure with Four

Treatments Using JMP®

10.8 Summary

References

Exercises

Chapter 11 Covariate Measured on the Block in RCB and Incomplete

Block Design Structures

11.1

11.2

11.3

11.4

11.5

11.6

Introduction

The Within Block Model

The Between Block Model

Combining Within Block and Between Block Information

Common Slope Model

Adjusted Means and Comparing Treatments

11.6.1 Common Slope Model

11.6.2 Non-Parallel Lines Model

11.7 Example: Two Treatments

11.8 Example: Four Treatments in RCB

11.9 Example: Four Treatments in BIB

11.10 Summary

References

Exercises

Chapter 12 Random Effects Models with Covariates

12.1

12.2

12.3

12.4

Introduction

The Model

Estimation of the Variance Components

Changing Location of the Covariate Changes the Estimates of the

Variance Components

12.5 Example: Balanced One-Way Treatment Structure

12.6 Example: Unbalanced One-Way Treatment Structure

12.7 Example: Two-Way Treatment Structure

12.8 Summary

References

Exercises

© 2002 by CRC Press LLC

C0317fm frame Page 9 Monday, June 25, 2001 1:04 PM

Chapter 13 Mixed Models

13.1

13.2

13.3

13.4

Introduction

The Matrix Form of the Mixed Model

Fixed Effects Treatment Structure

Estimation of Fixed Effects and Some Small Sample Size

Approximations

13.5 Fixed Treatments and Locations Random

13.6 Example: Two-Way Mixed Effects Treatment Structure in a CRD

13.7 Example: Treatments are Fixed and Locations are Random with a

RCB at Each Location

References

Exercises

Chapter 14 Analysis of Covariance Models with Heterogeneous Errors

14.1

14.2

14.3

Introduction

The Unequal Variance Model

Tests for Homogeneity of Variances

14.3.1 Levene’s Test for Equal Variances

14.3.2 Hartley’s F-Max Test for Equal Variances

14.3.3 Bartlett’s Test for Equal Variances

14.3.4 Likelihood Ratio Test for Equal Variances

14.4 Estimating the Parameters of the Regression Model

14.4.1 Least Squares Estimation

14.4.2 Maximum Likelihood Methods

14.5 Determining the Form of the Model

14.6 Comparing the Models

14.6.1 Comparing the Nonparallel Lines Models

14.6.2 Comparing the Parallel Lines Models

14.7 Computational Issues

14.8 Example: One-Way Treatment Structure with Unequal Variances

14.9 Example: Two-Way Treatment Structure with Unequal Variances

14.10 Example: Treatments in Multi-location Trial

14.11 Summary

References

Exercises

Chapter 15 Analysis of Covariance for Split-Plot and Strip-Plot Design

Structures

15.1

15.2

15.3

15.4

Introduction

Some Concepts

Covariate Measured on the Whole Plot or Large Size of Experimental

Unit

Covariate is Measured on the Small Size of Experimental Unit

© 2002 by CRC Press LLC

C0317fm frame Page 10 Monday, June 25, 2001 1:04 PM

15.5

Covariate is Measured on the Large Size of Experimental Unit and a

Covariate is Measured on the Small Size of Experimental Unit

15.6 General Representation of the Covariate Part of the Model

15.6.1 Covariate Measured on Large Size of Experimental Unit

15.6.2 Covariate Measured on the Small Size of Experimental Units

15.6.3 Summary of General Representation

15.7 Example: Flour Milling Experiment — Covariate Measured on the

Whole Plot

15.8 Example: Cookie Baking

15.9 Example: Teaching Methods with One Covariate Measured on the

Large Size Experimental Unit and One Covariate Measured on the

Small Size Experimental Unit

15.10 Example: Comfort Study in a Strip-Plot Design with Three Sizes of

Experimental Units and Three Covariates

15.11 Conclusions

References

Exercises

Chapter 16 Analysis of Covariance for Repeated Measures Designs

16.1

16.2

16.3

16.4

Introduction

The Covariance Part of the Model — Selecting R

Covariance Structure of the Data

Specifying the Random and Repeated Statements for PROC MIXED

of the SAS® System

16.5 Selecting an Adequate Covariance Structure

16.6 Example: Systolic Blood Pressure Study with Covariate Measured

on the Large Size Experimental Unit

16.7 Example: Oxide Layer Development Experiment with Three Sizes

of Experimental Units Where the Repeated Measure is at the Middle

Size of Experimental Unit and the Covariate is Measured on the

Small Size Experimental Unit

16.8 Conclusions

References

Exercises

Chapter 17 Analysis of Covariance for Nonreplicated Experiments

17.1

17.2

17.3.

17.4

17.5

17.6

17.7

17.8

Introduction

Experiments with A Single Covariate

Experiments with Multiple Covariates

Selecting Non-null and Null Partitions

Estimating the Parameters

Example: Milling Flour Using Three Factors Each at Two Levels

Example: Baking Bread Using Four Factors Each at Two Levels

Example: Hamburger Patties with Four Factors Each at Two Levels

© 2002 by CRC Press LLC

C0317fm frame Page 11 Monday, June 25, 2001 1:04 PM

17.9

Example: Strength of Composite Material Coupons with Two

Covariates

17.10 Example: Effectiveness of Paint on Bricks with Unequal Slopes

17.11 Summary

References

Exercises

Chapter 18 Special Applications of Analysis of Covariance

18.1

18.2

18.3

18.4

Introduction

Blocking and Analysis of Covariance

Treatments Have Different Ranges of the Covariate

Nonparametric Analysis of Covariance

18.4.1 Heart Rate Data from Exercise Programs

18.4.2 Average Daily Gain Data from a Two-Way Treatment

Structure

18.5 Crossover Design with Covariates

18.6 Nonlinear Analysis of Covariance

18.7 Effect of Outliers

References

Exercises

© 2002 by CRC Press LLC

C0317fm frame Page 13 Monday, June 25, 2001 1:04 PM

Preface

Analysis of covariance is a statistical procedure that enables one to incorporate

information about concomitant variables into the analysis of a response variable.

Sometimes this is done in an attempt to reduce experimental error. Other times it is

done to better understand the phenomenon being studied. The approach used in this

book is that the analysis of covariance model is described as a method of comparing

a series of regression models — one for each of the levels of a factor or combinations

of levels of factors being studied. Since covariance models are regression models,

analysts can use all of the methods of regression analysis to deal with problems

such as lack of fit, outliers, etc. The strategies described in this book will enable the

reader to appropriately formulate and analyze various kinds of covariance models.

When covariates are measured and incorporated into the analysis of a response

variable, the main objective of analysis of covariance is to compare treatments or

treatment combinations at common values of the covariates. This is particularly true

when the experimental units assigned to each of the treatment combinations may

have differing values of the covariates. Comparing treatments is dependent on the

form of the covariance model and thus care must be taken so that mistakes are not

made when drawing conclusions.

The goal of this book is to present the structure and philosophy for using the

analysis of covariance by including descriptions of methodologies, illustrating the

methodologies by analyzing numerous data sets, and occasionally furnishing some

theory when required. Our aim is to provide data analysts with tools for analyzing

data with covariates and to enable them to appropriately interpret the results.

Some of the methods and techniques described in this book are not available in

other books, but two issues of Biometrics (1957, Volume 13, Number 3, and 1982,

Volume 38, Number 3) were dedicated to the topic of analysis of covariance. The

topics presented are among those that we, as consulting statisticians, have found to

be most helpful in analyzing data when covariates are available for possible inclusion

in the analysis.

Readers of this book will learn how to:

• Formulate appropriate analysis of covariance models

• Simplify analysis of covariance models

• Compare levels of a factor, or combinations of levels of factors, when

the model involves covariates

• Construct and analyze a model with two or more factors in the treatment

structure

• Analyze two-way treatment structures with missing cells

• Compare models using the beta-hat model

• Perform variable selection within the analysis of covariance model


• Analyze models with blocking in the design structure and use combined

intra-block and inter-block information about the slopes of the regression

models

• Use random statements in PROC MIXED to specify random coefficient

regression models

• Carry out the analysis of covariance in a mixed model framework

• Incorporate unequal treatment variances into the analysis

• Specify the analysis of covariance models for split-plot, strip-plot and

repeated measures designs both in terms of the regression models and the

covariance structures of the repeated measures

• Incorporate covariates into the analysis of nonreplicated experiments, thus

extending some of the results in Analysis of Messy Data, Volume II

The last chapter consists of a collection of examples that deal with (1) using the

covariate to form blocks, (2) crossover designs, (3) nonparametric analysis of covariance, (4) using a nonlinear model for the covariate model, and (5) the process of

examining mixed analysis of covariance models for possible outliers.

The approach used in this book is similar to that used in the first two volumes.

Each topic is covered from a practical viewpoint, emphasizing the implementation

of the methods much more than the theory behind the methods. Some theory has

been presented for some of the newer methodologies. The book uses the procedures of the SAS® system and JMP® software packages to carry out the computations

and few computing formulae are presented. Either SAS® system code or JMP® menus

are presented for the analysis of the data sets in the examples. The data in the

examples (except for those using chocolate chips) were generated to simulate real

world applications that we have encountered in our consulting experiences.

This book is intended for everyone who analyzes data. The reader should have

a knowledge of analysis of variance and regression analysis as well as basic statistical

ideas including randomization, confidence intervals, and hypothesis testing. The first

four chapters contain the information needed to form a basic philosophy for using

the analysis of covariance with a one-way treatment structure and should be read

by everyone. As one progresses through the book, the topics become more complex

by going from designs with blocking to split-plot and repeated measures designs.

Before reading about a particular topic in the later chapters, read the first four

chapters. Knowledge of Chapters 13 and 14 from Analysis of Messy Data, Volume I:

Designed Experiments would be useful for understanding the part of Chapter 5

involving missing cells. The information in Chapters 4 through 9 of Analysis of

Messy Data, Volume II: Nonreplicated Experiments is useful for comprehending the

topics discussed in Chapter 17.

This book is the culmination of more than 25 years of writing. The earlier

editions of this manuscript were slanted toward providing an appropriate analysis

of split-plot type designs by using fixed effects software such as PROC GLM of the

SAS® system. With the development of mixed models software, such as PROC

MIXED of the SAS® system and JMP®, the complications of the analysis of split-plot type designs disappeared and thus enabled the manuscript to be completed

without including the difficult computations that are required when using fixed


effects software. Over the years, several colleagues made important contributions.

Discussions with Shie-Shien Yang were invaluable for the development of the variable selection process described in Chapter 7. Vicki Landcaster and Marie Loughin

read some of the earlier versions and provided important feedback. Discussions with

James Schwenke, Kate Ash, Brian Fergen, Kevin Chartier, Veronica Taylor, and

Mike Butine were important for improving the chapters involving combining intra- and inter-block information and the strategy for the analysis of repeated measures

designs. Finally, we cannot express enough our thanks to Jane Cox who typed many

of the initial versions of the chapters. If it were not for Jane’s skills with the word

processor, the task of finishing this book would have been much more difficult.

We dedicate this volume to all who have made important contributions to our

personal and professional lives. This includes our wives, Janet and Erma Jean, our

children, Scott and April and Kelly and Mark, and our parents and parents-in-law

who made it possible for us to pursue our careers as statisticians. We were both

fortunate to study with Franklin Graybill and we thank him for making sure that we

were headed in the right direction when our careers began.


1

Introduction to the Analysis of Covariance

1.1 INTRODUCTION

The statistical procedure termed analysis of covariance has been used in several

contexts. The most common description of analysis of covariance is to adjust the

analysis for variables that could not be controlled by the experimenter. For example,

if a researcher wishes to compare the effect that ten different chemical weed control

treatments have on yield of a specific wheat variety, the researcher may wish to

control for the differential effects of a fertility trend occurring in the field and for

the number of wheat plants per plot that happen to emerge after planting. The

differential effects of a fertility trend can possibly be removed by using a randomized

complete block design structure, but it may not be possible to control the number

of wheat plants per plot (unless the seeds are sown thickly and then the emerging

plants are thinned to a given number of plants per plot). The researcher wishes to

compare the treatments as if each treatment were grown on plots with the same

average fertility level and as if every plot had the same number of wheat plants. The

use of a randomized complete block design structure in which the blocks are constructed such that the fertility levels of plots within a block are very similar will

enable the treatments to be compared by averaging over the fertility levels, but the

analysis of covariance is a procedure which can compare treatment means after first

adjusting for the differential number of wheat plants per plot. The adjustment

procedure involves constructing a model that describes the relationship between

yield and the number of wheat plants per plot for each treatment, which is in the

form of a regression model. The regression models, one for each level of the

treatment, are then compared at a predetermined common number of wheat plants

per plot.

1.2 THE COVARIATE ADJUSTMENT PROCESS

To demonstrate the type of adjustment process that is being carried out when the

analysis of covariance methodology is applied, the set of data in Table 1.1 is used

in which there are two treatments and five plots per treatment in a completely

randomized design structure. Treatment 1 is a chemical application to control the

growth of weeds and Treatment 2 is a control without any chemicals to control the

weeds. The data in Table 1.1 consist of the yield of wheat plants of a specific variety

from plots of identical size along with the number of wheat plants that emerged

after planting per plot. The researcher wants to compare the yields of the two treatments for the condition when there are 125 plants per plot.

TABLE 1.1
Yield and Plants per Plot Data for the Example in Section 1.2

Treatment 1                          Treatment 2
Yield per plot   Plants per plot     Yield per plot   Plants per plot
951              126                 930              135
957              128                 790              119
776              107                 764              110
1033             142                 989              140
840              120                 740              102

FIGURE 1.1 Plot of the data for the two treatments, with the “X” denoting the respective means. (Axes: treatment number, 1 and 2, horizontal; yield per plot, 700 to 1100, vertical.)

Figure 1.1 is a graphical display of the plot yields for each of the treatments

where the circles represent the data points for Treatment 1 and the boxes represent

the data points for Treatment 2. An “X” is used to mark the means of each of the

treatments.

If the researcher uses the two-sample t-test or one-way analysis of variance to

compare the two treatments without taking information into account about the

number of plants per plot, a t statistic of 1.02 or an F statistic of 1.05 is obtained,

indicating the two treatment means are not significantly different (p = 0.3361). The

results of the analysis are in Table 1.2 in which the estimated standard error of the

difference of the two treatment means is 67.23.
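The unadjusted comparison can be sketched in a few lines of Python. This is only an illustration under the usual equal-variance assumption, using the yields from Table 1.1; the variable names are ours and are not part of the SAS output summarized in Table 1.2. The p-value is omitted because it requires a t distribution routine that the standard library does not provide.

```python
import math

# Yields per plot from Table 1.1 (the covariate is deliberately ignored here).
yield_trt1 = [951, 957, 776, 1033, 840]
yield_trt2 = [930, 790, 764, 989, 740]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations about the sample mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


n1, n2 = len(yield_trt1), len(yield_trt2)
diff = mean(yield_trt1) - mean(yield_trt2)

# Pooled error variance on n1 + n2 - 2 = 8 degrees of freedom.
df_error = n1 + n2 - 2
ss_error = sum_sq_dev(yield_trt1) + sum_sq_dev(yield_trt2)
ms_error = ss_error / df_error

se_diff = math.sqrt(ms_error * (1 / n1 + 1 / n2))
t_stat = diff / se_diff
f_stat = t_stat ** 2  # with two groups, the one-way ANOVA F equals t squared

print(f"difference = {diff:.1f}")
print(f"std. error = {se_diff:.2f}")
print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```

The printed difference, standard error, t, and F should agree with the corresponding entries of Table 1.2 to the number of digits shown there.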


TABLE 1.2
Analysis of Variance Table and Means for Comparing the Yields of the Two Treatments Where No Information about the Number of Plants per Plot is Used

Source            df   SS          MS         FValue   ProbF
Model              1   11833.60    11833.60   1.05     0.3361
Error              8   90408.40    11301.05
Corrected total    9   102242.00

Source   df   SS (type III)   MS         FValue   ProbF
TRT       1   11833.60        11833.60   1.05     0.3361

Parameter       Estimate   StdErr   t Value   Probt
Trt 1 – Trt 2   68.8       67.23    1.02      0.3361

TRT   LSMean   ProbtDiff
1     911.40   0.3361
2     842.60

FIGURE 1.2 Plot of the data and the estimated regression models for the two treatments. (Axes: number of plants per plot, 100 to 150, horizontal; yield per plot, 700 to 1100, vertical. The two lines are labeled Treatment 1 model and Treatment 2 model.)

The next step is to investigate the relationship between the yield per plot and

the number of plants per plot. Figure 1.2 is a display of the data where the number

of plants is on the horizontal axis and the yield is on the vertical axis. The circles

denote the data for Treatment 1 and the boxes denote the data for Treatment 2. The

two lines on the graph, denoted by Treatment 1 model and Treatment 2 model, were

computed from the data by fitting the model $y_{ij} = \alpha_i + \beta x_{ij} + \varepsilon_{ij}$, i = 1, 2 and j = 1, 2, …, 5, a model with different intercepts and common or equal slopes. The results are included in Table 1.3.

TABLE 1.3
Analysis of Covariance to Provide the Estimates of the Slope and Intercepts to be Used in Adjusting the Data

Source         df   SS           MS           FValue    ProbF
Model           3   7787794.74   2595931.58   3167.28   0.0000
Error           7   5737.26      819.61
Uncorr Total   10   7793532.00

Source   df   SS (Type III)   MS         FValue   ProbF
TRT       2   4964.18         2482.09    3.03     0.1128
Plants    1   84671.14        84671.14   103.31   0.0000

Parameter       Estimate   StdErr   tValue   Probt
Trt 1 – Trt 2   44.73      18.26    2.45     0.0441

Parameter   Estimate   StdErr   tValue   Probt
TRT 1       29.453     87.711   0.34     0.7469
TRT 2       –15.281    85.369   –0.18    0.8630
Plants      7.078      0.696    10.16    0.0000

Now analysis of covariance is used to compare the two treatments when there

are 125 plants per plot. The process of the analysis of covariance is to slide or move

the observations from a given treatment along the estimated regression model (parallel to the model) to intersect the vertical line at 125 plants per plot. This sliding

is demonstrated in Figure 1.3 where the solid circles represent the adjusted data for

Treatment 1 and the solid boxes represent the adjusted data for Treatment 2.

The lines join the open circles to the solid circles and join the open boxes to

the solid boxes. The lines indicate that the respective data points slid to the vertical

line at which there are 125 plants per plot.
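The regression lines along which the observations slide come from the equal-slopes fit reported in Table 1.3. As a rough sketch, assuming ordinary least squares and the Table 1.1 data, the common slope is the pooled within-treatment cross product divided by the pooled within-treatment sum of squares of the covariate. The closed-form expressions below are the standard ones for a parallel-lines model; the names are illustrative and are not taken from the book's SAS code.

```python
import math

# Table 1.1 data: (plants per plot, yield per plot) for each treatment.
data = {
    1: [(126, 951), (128, 957), (107, 776), (142, 1033), (120, 840)],
    2: [(135, 930), (119, 790), (110, 764), (140, 989), (102, 740)],
}


def mean(values):
    return sum(values) / len(values)


# Treatment means and pooled within-treatment sums of squares and products.
x_bar, y_bar = {}, {}
sxx = sxy = syy = 0.0
for trt, rows in data.items():
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar[trt], y_bar[trt] = mean(xs), mean(ys)
    sxx += sum((x - x_bar[trt]) ** 2 for x in xs)
    sxy += sum((x - x_bar[trt]) * (y - y_bar[trt]) for x, y in rows)
    syy += sum((y - y_bar[trt]) ** 2 for y in ys)

# Least squares estimates for y_ij = alpha_i + beta * x_ij + e_ij.
beta_hat = sxy / sxx
alpha_hat = {trt: y_bar[trt] - beta_hat * x_bar[trt] for trt in data}

# Error sum of squares: 10 observations minus 3 parameters leaves 7 df.
n_total = sum(len(rows) for rows in data.values())
df_error = n_total - (len(data) + 1)
ss_error = syy - beta_hat * sxy
ms_error = ss_error / df_error

# With equal slopes the lines are parallel, so alpha_1 - alpha_2 is the
# treatment difference at every value of the covariate, including 125.
diff = alpha_hat[1] - alpha_hat[2]
se_diff = math.sqrt(
    ms_error
    * (1 / len(data[1]) + 1 / len(data[2]) + (x_bar[1] - x_bar[2]) ** 2 / sxx)
)

print(f"slope = {beta_hat:.3f}")
print(f"intercepts = {alpha_hat[1]:.3f}, {alpha_hat[2]:.3f}")
print(f"difference = {diff:.2f}, std. error = {se_diff:.2f}, t = {diff / se_diff:.2f}")
```

The estimates should reproduce the slope, intercepts, error sum of squares, and treatment difference shown in Table 1.3.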

The adjusted data are computed by

$$y_{Aij} = y_{ij} - (\hat{\alpha}_i + \hat{\beta} x_{ij}) + (\hat{\alpha}_i + \hat{\beta}\,125) = y_{ij} + \hat{\beta}(125 - x_{ij})$$

The terms $y_{ij} - (\hat{\alpha}_i + \hat{\beta} x_{ij})$, i = 1, 2 and j = 1, 2, …, 5, are the residuals or deviations of the observations from the estimated regression models. The preliminary computations of the adjusted yields are in Table 1.4. These adjusted yields are the predicted yields of the plots as if each plot had 125 plants.

The next step is to compare the two treatments through the adjusted yield values

by computing a two-sample t statistic or the F statistic from a one-way analysis of

variance. The results of these analyses are in Table 1.5.
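The sliding step and the analysis of the adjusted yields can be sketched as follows. This is a minimal illustration that recomputes the common slope from the Table 1.1 data and then treats the adjusted yields as ordinary observations, which is exactly the naive analysis with 8 error degrees of freedom. The constant `TARGET_PLANTS` and the other names are ours.

```python
import math

TARGET_PLANTS = 125  # common covariate value at which treatments are compared

# Table 1.1 data: (plants per plot, yield per plot) for each treatment.
data = {
    1: [(126, 951), (128, 957), (107, 776), (142, 1033), (120, 840)],
    2: [(135, 930), (119, 790), (110, 764), (140, 989), (102, 740)],
}


def mean(values):
    return sum(values) / len(values)


# Common slope from the pooled within-treatment sums of squares and products.
sxx = sxy = 0.0
for rows in data.values():
    xb = mean([x for x, _ in rows])
    yb = mean([y for _, y in rows])
    sxx += sum((x - xb) ** 2 for x, _ in rows)
    sxy += sum((x - xb) * (y - yb) for x, y in rows)
beta_hat = sxy / sxx

# Slide each observation parallel to its regression line to x = 125.
adjusted = {
    trt: [y + beta_hat * (TARGET_PLANTS - x) for x, y in rows]
    for trt, rows in data.items()
}

# Naive two-sample analysis of the adjusted yields (8 error df, one too many).
adj_mean = {trt: mean(vals) for trt, vals in adjusted.items()}
ss_error = sum(
    (v - adj_mean[trt]) ** 2 for trt, vals in adjusted.items() for v in vals
)
n1, n2 = len(adjusted[1]), len(adjusted[2])
naive_df = n1 + n2 - 2
se_naive = math.sqrt(ss_error / naive_df * (1 / n1 + 1 / n2))
t_naive = (adj_mean[1] - adj_mean[2]) / se_naive

for trt, vals in adjusted.items():
    print(trt, [round(v, 3) for v in vals])
print(f"adjusted means = {adj_mean[1]:.3f}, {adj_mean[2]:.3f}")
print(f"std. error = {se_naive:.3f}, t = {t_naive:.3f}, F = {t_naive ** 2:.2f}")
```

The adjusted yields should match the last column of Table 1.4, and the summary statistics should match Table 1.5.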

A problem with this analysis is that it treats the adjusted data as if they were unadjusted

observations, so there is no reduction in the degrees of freedom for error due to estimating

the slope of the regression lines. Hence the final step is to recalculate the statistics

by changing the degrees of freedom for error in Table 1.5 from 8 to 7 (the cost of estimating the slope). The sum of squares error is identical for both Tables 1.3 and 1.5, but the error sum of squares from Table 1.5 is based on 8 degrees of freedom instead of 7. To account for this change in degrees of freedom in Table 1.5, the estimated standard error for comparing the two treatments needs to be multiplied by $\sqrt{8/7}$, the t statistic needs to be multiplied by $\sqrt{7/8}$, and the F statistic needs to be multiplied by 7/8. The recalculated statistics are presented in Table 1.6. Here the estimated standard error of the difference between the two means is 18.11, a 3.7-fold reduction over the analysis that ignores the information from the covariate. Thus, by taking into account the linear relationship between the yield of the plot and the number of plants in that plot, there is a tremendous reduction in the variability of the data. In fact, the analysis of the adjusted data shows there is a significant difference between the yields of the two treatments when adjusting for the unequal number of plants per plot (p = 0.0428), whereas the analysis of variance in Table 1.2 did not indicate a significant difference between the treatments (p = 0.3361). The final issue is that since this analysis of the adjusted data overlooks the fact that the slope has been estimated, the estimated standard error of the difference of two means is a little small as compared to the estimated standard error one gets from the analysis of covariance. The estimated standard error of the difference of the two means as computed from the analysis of covariance in Table 1.3 is 18.26 as compared to 18.11 for the analysis of the adjusted data. Thus the two analyses are not quite identical.

FIGURE 1.3 Plot of the data and estimated regression models showing how to compute adjusted yield values at 125 plants per plot. (Axes: number of plants per plot, 100 to 150, horizontal; yield per plot, 700 to 1100, vertical. Annotation: slide observations parallel to regression line to meet the line of 125 plants per plot.)

TABLE 1.4
Preliminary Computations Used in Computing Adjusted Data for Each Treatment as If All Plots Had 125 Plants per Plot

Treatment   Yield Per Plot   Plants Per Plot   Residual    Adjusted Yield
1           951              126               29.6905     943.922
1           957              128               21.534      935.765
1           776              107               –10.8232    903.408
1           1033             142               –1.5611     912.67
1           840              120               –38.8402    875.391
2           930              135               –10.2795    859.218
2           790              119               –37.0279    832.469
2           764              110               0.6761      870.173
2           989              140               13.3294     882.827
2           740              102               33.3019     902.799

TABLE 1.5
Analysis of the Adjusted Yields (Too Many Degrees of Freedom for Error)

Source            df   SS         MS        FValue   ProbF
Model              1   5002.83    5002.83   6.98     0.0297
Error              8   5737.26    717.16
Corrected Total    9   10740.09

Source   df   SS (Type III)   MS        FValue   ProbF
TRT       1   5002.83         5002.83   6.98     0.0297

Parameter       Estimate   StdErr   t Value    Probt
Trt 1 – Trt 2   44.734     16.937   2.641197   0.0297

TRT   LSMean    ProbtDiff
1     914.231   0.0297
2     869.497

TABLE 1.6
Recalculated Statistics to Reflect the Loss of Error Degrees of Freedom Due to Estimating the Slope before Computing the Adjusted Yields

Recalculated estimated standard error   18.11
Recalculated t-statistic                2.47
Recalculated F-statistic                6.10
Recalculated significance level         0.0428

This example shows the power of being able to use information about covariates

or independent variables to make decisions about the treatments being included in


the study. The analysis of covariance uses a model to adjust the data as if all the

observations are from experimental units with identical values of the covariates.
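The degrees-of-freedom correction that leads from Table 1.5 to Table 1.6 is simple arithmetic, sketched below. The inputs are the values reported in Tables 1.2 and 1.5. The only assumption is that the F statistic is taken as the square of the unrounded t statistic rather than the rounded 6.98.

```python
import math

# Reported values: Table 1.5 (adjusted yields, 8 error df) and Table 1.2.
se_naive = 16.937
t_naive = 2.641197
se_unadjusted = 67.23
df_naive = 8     # error df used by the analysis of the adjusted yields
df_correct = 7   # error df after paying one df for the estimated slope

# The error mean square is SSE / df, so replacing 8 by 7 inflates it by 8/7
# and inflates any standard error built from it by sqrt(8/7).
scale = math.sqrt(df_naive / df_correct)
se_corrected = se_naive * scale
t_corrected = t_naive / scale
f_corrected = t_naive ** 2 * df_correct / df_naive

# How much smaller the corrected standard error is than the unadjusted one.
fold_reduction = se_unadjusted / se_corrected

print(f"std. error = {se_corrected:.2f}")
print(f"t = {t_corrected:.2f}, F = {f_corrected:.2f}")
print(f"fold reduction = {fold_reduction:.1f}")
```

These values should agree with Table 1.6 and with the 3.7-fold reduction quoted in the text.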

A typical discussion of analysis of covariance indicates that the analyst should

include the number of plants as a term in the model so that term accounts for

variability in the observed yields, i.e., the variance of the model is reduced. If

including the number of plants in the model reduces the variability enough, then it

is used to adjust the data before the treatment means are compared. It is important to

remember that there is a model being assumed when the covariate or covariates are

included in a model.

1.3 A GENERAL AOC MODEL AND

THE BASIC PHILOSOPHY

In this text, the analysis of covariance is described in more generality than that of

adjusting for variation due to uncontrollable variables. The analysis of covariance

is defined as a method for comparing several regression surfaces or lines, one for

each treatment or treatment combination, where a different regression surface is

possibly used to describe the data for each treatment or treatment combination.

A one-way treatment structure with t treatments in a completely randomized

design structure (Milliken and Johnson, 1992) is used as a basis for setting up the

definitions for the analysis of covariance model. The experimental situation involves

selecting N experimental units from a population of experimental units and measuring k characteristics x1ij, x2ij, …, xkij on each experimental unit. The variables x1ij,

x2ij, …, xkij are called covariates or independent variables or concomitant variables.

It is important to measure the values of the covariates before the treatments are applied to the experimental units so that the levels of the treatments do not affect the values of the covariates. At a minimum, the values of the covariate should not be affected by the applied levels of the treatments. In the chemical weed treatment experiment, the number of plants per plot is observed after a particular treatment has been applied to a plot, so the value of the covariate (number of plants per plot) could not be determined before the treatments were applied to the plots. If the germination rate is affected by the applied treatments, then the number of plants per plot cannot be used as a covariate in the conventional manner (see Chapter 2 for further discussion). After the set of experimental units is selected and the values of the covariates are determined (when possible), randomly assign $n_i$ experimental units to treatment i, where $N = \sum_{i=1}^{t} n_i$. One generally assigns equal numbers of experimental units to the levels of the treatment, but equal numbers of experimental units per level of the treatment are not necessary. After an experimental unit is subjected to its specific level of the treatment, measure the response or dependent variable, which is denoted by $y_{ij}$.

Thus the variables used in the discussions are summarized as:

$y_{ij}$ is the dependent measure

$x_{1ij}$ is the first independent variable or covariate

$x_{2ij}$ is the second independent variable or covariate

$x_{kij}$ is the kth independent variable or covariate


At this point, the experimental design is a one-way treatment structure with

t treatments in a completely randomized design structure with k covariates. If there

is a linear relationship between the mean of y for the ith treatment and the k covariates

or independent variables, an analysis of covariance model can be expressed as:

$$y_{ij} = \beta_{0i} + \beta_{1i}x_{1ij} + \beta_{2i}x_{2ij} + \cdots + \beta_{ki}x_{kij} + \varepsilon_{ij} \qquad (1.1)$$

for i = 1, 2, …, t, and j = 1, 2, …, $n_i$, and the $\varepsilon_{ij} \sim$ iid $N(0, \sigma^2)$, i.e., the $\varepsilon_{ij}$ are independently identically distributed normal random variables with mean 0 and variance $\sigma^2$. The important thing to note about this model is that the mean of the

y values from a given treatment depends on the values of the x’s as well as on the

treatment applied to the experimental units.

The analysis of covariance is a strategy for making decisions about the form of

the covariance model through testing a series of hypotheses and then making treatment comparisons by comparing the estimated responses from the final regression

models. Two important hypotheses that help simplify the regression models are

$H_{01}$: $\beta_{h1} = \beta_{h2} = \cdots = \beta_{ht} = 0$ vs. $H_{a1}$: (not $H_{01}$), that is, all the treatments’ slopes for the hth covariate are zero, h = 1, 2, …, k, or

$H_{02}$: $\beta_{h1} = \beta_{h2} = \cdots = \beta_{ht}$ vs. $H_{a2}$: (not $H_{02}$), that is, the slopes for the hth covariate are equal across the treatments, meaning the surfaces are parallel in the direction of the hth covariate, h = 1, 2, …, k.

The analysis of covariance model in Equation 1.1 is a combination of an analysis

of variance model and a regression model. The analysis of covariance model is part

of an analysis of variance model since the intercepts and slopes are functions of the

levels of the treatments. The analysis of covariance model is also part of a regression

model since the model for each treatment is a regression model.

An experiment is designed to purchase a certain number of degrees of freedom

for error (generally without the covariates) and the experimenter is willing to sell

some of those degrees of freedom for good or effective covariates which will help

reduce the magnitude of the error variance. The philosophy in this book is to select

the simplest possible expression for the covariate part of the model before making

treatment comparisons.

This process of model building to determine the simplest adequate form of the

regression models follows the principle of parsimony and helps guard against foolishly selling degrees of freedom for error to retain unnecessary covariate terms in

the model. Thus the strategy for analysis of covariance begins with testing hypotheses

such as H01 and H02 to make decisions about the form of the covariate or regression

part of the model. Once the form of the covariate part of the model is finalized, the

treatments are compared by comparing the regression surfaces at predetermined

values of the covariates.
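For a single covariate, the two hypotheses can be examined by comparing error sums of squares from nested models. The sketch below is only an illustration of that idea using the Table 1.1 data: the full model has a separate intercept and slope for each treatment, the model under H02 has a common slope, and the model under H01 has no covariate. The resulting F statistics are our own computation and are not reported in the text; in practice they would be referred to F distributions with the stated degrees of freedom.

```python
# Table 1.1 data: (plants per plot, yield per plot) for each treatment.
data = {
    1: [(126, 951), (128, 957), (107, 776), (142, 1033), (120, 840)],
    2: [(135, 930), (119, 790), (110, 764), (140, 989), (102, 740)],
}


def corrected_sums(rows):
    """Return (Sxx, Sxy, Syy) computed about the treatment means."""
    n = len(rows)
    xb = sum(x for x, _ in rows) / n
    yb = sum(y for _, y in rows) / n
    sxx = sum((x - xb) ** 2 for x, _ in rows)
    sxy = sum((x - xb) * (y - yb) for x, y in rows)
    syy = sum((y - yb) ** 2 for _, y in rows)
    return sxx, sxy, syy


per_trt = {trt: corrected_sums(rows) for trt, rows in data.items()}
t = len(data)
n_total = sum(len(rows) for rows in data.values())

# Full model: separate intercept and slope per treatment (2t parameters).
sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in per_trt.values())
df_full = n_total - 2 * t
ms_full = sse_full / df_full

# Pooled within-treatment sums used by both reduced models.
sxx_w = sum(s[0] for s in per_trt.values())
sxy_w = sum(s[1] for s in per_trt.values())
syy_w = sum(s[2] for s in per_trt.values())

# Under H02: separate intercepts, one common slope (t + 1 parameters).
sse_common_slope = syy_w - sxy_w ** 2 / sxx_w
f_equal_slopes = ((sse_common_slope - sse_full) / (t - 1)) / ms_full

# Under H01: separate intercepts, all slopes zero (t parameters).
sse_no_covariate = syy_w
f_zero_slopes = ((sse_no_covariate - sse_full) / t) / ms_full

print(f"H02 (equal slopes): F = {f_equal_slopes:.3f} on {t - 1} and {df_full} df")
print(f"H01 (zero slopes):  F = {f_zero_slopes:.3f} on {t} and {df_full} df")
```

A small F for H02 together with a large F for H01 would point toward the common-slope model used in Section 1.2, which is the kind of simplification the strategy aims for before treatments are compared.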

The structure of the following chapters leads one through the forest of analysis

of covariance by starting with the simple model with one covariate and building

through the complex process involving analysis of covariance in split-plot and


repeated measures designs. Other topics discussed are multiple covariates, experiments involving blocks, and graphical methods for comparing the models for the

various treatments.

Chapter 2 discusses the simple analysis of covariance model involving a one-way treatment structure in a completely randomized design structure with one

covariate and Chapter 3 contains several examples demonstrating the strategies for

situations involving one covariate. Chapter 4 presents a discussion of the analysis

of covariance models involving more than one covariate which includes polynomial

regression models. Models involving two-way treatment structures, both balanced

and unbalanced, are discussed in Chapter 5. A method of comparing parameters via

beta-hat models is described in Chapter 6. Chapter 7 describes a method for variable

selection in the analysis of covariance where many possible covariates were measured. Chapter 8 discusses methods for testing the equality of several regression

models.

The next set of chapters (9 through 11) discuss analysis of covariance in the

randomized complete block and incomplete block design structures. The analysis

of data where the values of a characteristic are used to construct blocks is described,

i.e., where the value of the covariate is the same for all experimental units in a block.

In the analysis of covariance context, inter- or between block information about the

intercepts and slopes is required to extract all available information about the regression lines or surfaces from the data. Usual analysis methods extract only the intra-block information from the data. A mixed models analysis involving methods of

moments and maximum likelihood estimation of the variance components provides

combined estimates of the parameters and should be used for blocked experiments.

Chapter 12 describes models where the levels of the treatments are random effects

(Littell et al., 1996). The models in Chapter 12 include random coefficient models.

Chapter 13 provides a discussion of mixed models with covariates and Chapter 14

presents a discussion of unequal variance models.

Chapters 15 and 16 discuss problems with applying the analysis of covariance

to experiments involving repeated measures and split-plot design structures. One

has to consider the size of experimental unit on which the covariate is measured.

Cases are discussed where the covariate is measured on the large size of an experimental unit and when the covariate is measured on the small size of an experimental

unit. Several examples of split-plot and repeated measures designs are presented. A

process of selecting the simplest covariance structure for the repeated measures part

of the model and the simplest covariate (regression model) part of the model is

described. The analysis of covariance in the nonreplicated experiment is discussed

in Chapter 17. The half-normal plot methodology (Milliken and Johnson, 1989) is

used to determine the form of the covariate part of the model and to determine which

effects are to be included in the intercept part of the model.

Finally, several special applications of analysis of covariance are presented in

Chapter 18, including using the covariate to construct blocks, crossover designs,

nonlinear models, nonparametric analysis of covariance, and a process for examining mixed models for possible outliers in the data set.

The procedures of the SAS® system (1989, 1996, and 1997) and JMP® (2000)

are used to demonstrate how to use software to carry out the analysis of covariance


computations. The topic of analysis of covariance has been the topic of two volumes

of Biometrics, Volume 13, Number 3, in 1957 and Volume 38, Number 3, in 1982.

The collection of papers in these two volumes present discussions of widely diverse

applications of analysis of covariance.

REFERENCES

Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996) SAS® System for

Mixed Models, SAS Institute Inc., Cary, NC.

Milliken, G. A. and Johnson, D. E. (1989) Analysis of Messy Data, Volume II: Nonreplicated

Experiments, Chapman & Hall, London.

Milliken, G. A. and Johnson, D. E. (1992) Analysis of Messy Data, Volume I: Designed

Experiments, Chapman & Hall, London.

SAS Institute Inc. (1989) SAS/STAT® User’s Guide, Version 6, Fourth Edition, Volume 2,

Cary, NC.

SAS Institute Inc. (1996) SAS/STAT® Software: Changes and Enhancements Through Release

6.11, Cary, NC.

SAS Institute Inc. (1997) SAS/STAT® Software: Changes and Enhancements Through Release

6.12, Cary, NC.

SAS Institute Inc. (2000) JMP® Statistics and Graphics Guide, Version 4, Cary, NC.


2

One-Way Analysis of Covariance: One Covariate in a Completely Randomized Design Structure

2.1 THE MODEL

Suppose you have N homogeneous experimental units and you randomly divide them into t groups of $n_i$ units each where $\sum_{i=1}^{t} n_i = N$. Each of the t treatments of a

one-way treatment structure is randomly assigned to one group of experimental

units, providing a one-way treatment structure in a completely randomized design

structure. It is assumed that the experimental units are subjected to their assigned

treatments independently of each other. Let yij (dependent variable) denote the jth

observation from the ith treatment and xij denote the covariate (independent variable)

corresponding to the (i,j)th experimental unit. As in Chapter 1, the values of the

covariate are not to be influenced by the levels of the treatment. The best case is

where the values of the covariate are determined before the treatments are assigned.

In any case, it is a good strategy to use the analysis of variance to check to see if

there are differences among the treatment covariate means (see Chapter 18).

Assume that the mean of yij can be expressed as a linear function of the covariate,

xij, with possibly a different linear function being required for each treatment. It is

important to note that the mean of an observation from the ith treatment group depends

on the value of the covariate as well as the treatment. In analysis of variance, the

mean of an observation from the ith treatment group depends only on the treatment.

The analysis of covariance model for a one-way treatment structure with one

covariate in a completely randomized design structure is

$$y_{ij} = \alpha_i + \beta_i X_{ij} + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, t, \quad j = 1, 2, \ldots, n_i \tag{2.1}$$

where the mean of $y_i$ for a given value of X is $\mu_{Y_i \mid X} = \alpha_i + \beta_i X$. For making inferences,

it is assumed that $\varepsilon_{ij} \sim \text{iid } N(0, \sigma^2)$. Model 2.1 has t intercepts ($\alpha_1, \ldots, \alpha_t$), t slopes

($\beta_1, \ldots, \beta_t$), and one variance $\sigma^2$, i.e., the model represents a collection of simple

linear regression models with a different model for each level of the treatment.
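To make the structure of Model 2.1 concrete, here is a minimal Python sketch that generates data from it. The function name `simulate_model_2_1`, the parameter values, and the covariate values are all assumptions made for this illustration; none of them come from the text.

```python
import random

def simulate_model_2_1(alphas, betas, x_by_treatment, sigma, seed=0):
    """Simulate y_ij = alpha_i + beta_i * x_ij + e_ij with e_ij ~ iid N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    y_by_treatment = []
    for alpha_i, beta_i, x_i in zip(alphas, betas, x_by_treatment):
        # Each treatment has its own intercept and slope but shares sigma.
        y_i = [alpha_i + beta_i * x + rng.gauss(0.0, sigma) for x in x_i]
        y_by_treatment.append(y_i)
    return y_by_treatment

# Hypothetical parameter and covariate values for t = 2 treatments.
alphas = [1.0, 2.0]
betas = [0.5, -0.25]
x_by_treatment = [[2.0, 4.0, 6.0], [4.0, 8.0]]
y_by_treatment = simulate_model_2_1(alphas, betas, x_by_treatment, sigma=1.0)
for i, y_i in enumerate(y_by_treatment, start=1):
    print(f"treatment {i}: {[round(v, 2) for v in y_i]}")
```

Setting `sigma=0.0` returns the treatment means $\alpha_i + \beta_i x_{ij}$ exactly, which shows that the mean depends on both the treatment and the covariate value.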


Before analyzing this model, make sure that the data from each treatment can

in fact be described by a simple linear regression model. Various regression diagnostics should be run on the data before continuing. The equal variance assumption

should also be checked (see Chapter 14). If the simple linear regression model is

not adequate to describe the data for each treatment, then another model must be

selected before continuing with the analysis of covariance.

The analysis of covariance is a process of comparing the regression models and

then making decisions about the various parameters of the models. The process

involves comparing the t slopes, comparing the distances between the regression lines

(surfaces) at preselected values of X, and possibly comparing the t intercepts. The

analysis of covariance computations are typically presented in summation notation

with little emphasis on interpretations. In this and the following chapters, the various

covariance models are expressed in terms of matrices (see Chapter 6 of Milliken and

Johnson, 1992) and their interpretations are discussed. Software is used to carry out

the analysis of covariance computations. The matrix form of Model 2.1 is

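$$
\begin{bmatrix}
y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tn_t}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_{1n_1} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & x_{21} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 1 & x_{2n_2} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & x_{t1} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & x_{tn_t}
\end{bmatrix}
\begin{bmatrix}
\alpha_1 \\ \beta_1 \\ \alpha_2 \\ \beta_2 \\ \vdots \\ \alpha_t \\ \beta_t
\end{bmatrix}
+ \varepsilon, \tag{2.2}
$$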

which is expressed in the form of a linear model as y = Xβ + ε. The vector y denotes

the observations ordered by observation within each treatment, the 2t × 1 vector β

denotes the collection of slopes and intercepts, the matrix X is the design matrix,

and the vector ε represents the random errors.
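The layout of the design matrix can be sketched in a few lines of Python. The helper names `design_matrix` and `cross_product` and the covariate values below are hypothetical and serve only to illustrate the structure. Each treatment contributes its own intercept column and covariate column, so X′X turns out to be block diagonal with one 2 × 2 block per treatment.

```python
def design_matrix(x_by_treatment):
    """Build X with one (intercept, covariate) column pair per treatment."""
    t = len(x_by_treatment)
    rows = []
    for i, x_i in enumerate(x_by_treatment):
        for x in x_i:
            row = [0.0] * (2 * t)
            row[2 * i] = 1.0      # intercept column for treatment i
            row[2 * i + 1] = x    # covariate column for treatment i
            rows.append(row)
    return rows

def cross_product(X):
    """Return X'X as a list of lists."""
    p = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]

# Hypothetical covariate values for t = 2 treatments with n_1 = 3 and n_2 = 2.
x_by_treatment = [[1.0, 2.0, 3.0], [2.0, 4.0]]
X = design_matrix(x_by_treatment)
XtX = cross_product(X)
for row in XtX:
    print(row)
```

Every entry of X′X that pairs a column of one treatment with a column of another is zero, because no row of X has nonzero entries for two treatments at once.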

2.2 ESTIMATION

The least squares estimator of the parameter vector β is $\hat{\beta} = (X'X)^{-1}X'y$, but the least

squares estimator of β can also be obtained by fitting the simple linear regression

model to the data from each treatment and computing the least squares estimator of

each pair of parameters (αi, βi). For data from the ith treatment, fit the model

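$$
\begin{bmatrix} y_{i1} \\ \vdots \\ y_{in_i} \end{bmatrix}
=
\begin{bmatrix} 1 & x_{i1} \\ \vdots & \vdots \\ 1 & x_{in_i} \end{bmatrix}
\begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix}
+ \varepsilon_i, \tag{2.3}
$$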


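which is expressed as $\mathbf{y}_i = X_i \boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i$ with $\boldsymbol{\beta}_i = (\alpha_i, \beta_i)'$. The least squares estimator of $\boldsymbol{\beta}_i$ is $\hat{\boldsymbol{\beta}}_i = (X_i'X_i)^{-1}X_i'\mathbf{y}_i$, the same as the estimator obtained for a simple linear regression model.

The estimates of the slope $\beta_i$ and the intercept $\alpha_i$ in summation notation are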

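$$
\hat{\beta}_i = \frac{\sum_{j=1}^{n_i} x_{ij} y_{ij} - n_i \bar{x}_{i.} \bar{y}_{i.}}{\sum_{j=1}^{n_i} x_{ij}^2 - n_i \bar{x}_{i.}^2}
$$

and

$$
\hat{\alpha}_i = \bar{y}_{i.} - \hat{\beta}_i \bar{x}_{i.}.
$$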
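The summation formulas translate directly into code. The Python sketch below fits the line for a single treatment. The function name `fit_line` and the data are hypothetical, and the code assumes the covariate values within a treatment are not all equal, since otherwise the denominator is zero.

```python
def fit_line(x, y):
    """Return (alpha_hat, beta_hat) for one treatment from the summation formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope estimate.
    s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in x) - n * x_bar ** 2
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical data for one treatment (not from the text).
x_i = [1.0, 2.0, 3.0]
y_i = [2.0, 2.0, 5.0]
alpha_hat, beta_hat = fit_line(x_i, y_i)
print(f"intercept = {alpha_hat:.3f}, slope = {beta_hat:.3f}")
```

Applying `fit_line` to each treatment in turn gives the full set of t intercepts and t slopes.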

The residual sum of squares for the ith model is

$$
\mathrm{SSRes}_i = \sum_{j=1}^{n_i} \left( y_{ij} - \hat{\alpha}_i - \hat{\beta}_i x_{ij} \right)^2.
$$

There are ni – 2 degrees of freedom associated with SSResi since the ith model

involves two parameters. After testing the equality of the treatment variances (see

Chapter 14) and deciding there is not enough evidence to conclude the variances

are unequal, the residual sum of squares for Model 2.1 can be obtained by pooling

the residual sums of squares for each of the t models, i.e., summing the $\mathrm{SSRes}_i$

to obtain

$$
\mathrm{SSRes} = \sum_{i=1}^{t} \mathrm{SSRes}_i. \tag{2.4}
$$

The pooled residual sum of squares, SSRes, is based on the pooled degrees of

freedom, computed and denoted by

$$
\mathrm{d.f.}_{\mathrm{SSRes}} = \sum_{i=1}^{t} (n_i - 2) = \sum_{i=1}^{t} n_i - 2t = N - 2t.
$$
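The pooling step can be sketched as follows, assuming the equal variance check has already been passed. The function names `fit_line` and `residual_ss` and the data are hypothetical. The code fits each treatment separately, sums the residual sums of squares, and sums the per-treatment degrees of freedom.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one treatment."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in x) - n * x_bar ** 2
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

def residual_ss(x, y):
    """Residual sum of squares SSRes_i for one treatment."""
    alpha_hat, beta_hat = fit_line(x, y)
    return sum((b - alpha_hat - beta_hat * a) ** 2 for a, b in zip(x, y))

# Hypothetical (x_i, y_i) data for t = 2 treatments (not from the text).
data = [
    ([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 4.0]),
    ([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]),
]
ss_res = sum(residual_ss(x, y) for x, y in data)   # pooled SSRes
df_res = sum(len(x) - 2 for x, _ in data)          # equals N - 2t
print(f"SSRes = {ss_res:.3f} on {df_res} d.f.")
```

With N = 7 observations and t = 2 treatments, the pooled degrees of freedom agree with N − 2t = 3.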

The best estimate of the variance of the experimental units is $\hat{\sigma}^2 = \mathrm{SSRes}/(N - 2t)$. The sampling distribution of $(N - 2t)\hat{\sigma}^2/\sigma^2$ is central chi-square with $(N - 2t)$ degrees of freedom. The sampling distribution of the least squares estimator, $\hat{\beta}' = (\hat{\alpha}_1, \hat{\beta}_1, \ldots, \hat{\alpha}_t, \hat{\beta}_t)$, is normal with mean $\beta' = (\alpha_1, \beta_1, \ldots, \alpha_t, \beta_t)$ and variance-covariance matrix $\sigma^2 (X'X)^{-1}$, which can be written as

$$
\sigma^2 (X'X)^{-1} = \sigma^2
\begin{bmatrix}
(X_1'X_1)^{-1} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & (X_t'X_t)^{-1}
\end{bmatrix} \tag{2.5}
$$
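The block-diagonal form means the covariance matrix can be assembled one treatment at a time. The Python sketch below does so using the closed-form inverse of each 2 × 2 matrix $X_i'X_i$. The function names, the covariate values, and the value used for the variance estimate are hypothetical. Substituting an estimate for $\sigma^2$ gives an estimated covariance matrix rather than the true one.

```python
import math

def inverse_xtx(x):
    """Closed-form inverse of X_i'X_i when X_i has columns (1, x_ij)."""
    n = len(x)
    sum_x = sum(x)
    sum_xx = sum(v * v for v in x)
    det = n * sum_xx - sum_x ** 2  # zero only when all x values are equal
    return [[sum_xx / det, -sum_x / det],
            [-sum_x / det, n / det]]

def estimated_covariance(x_by_treatment, sigma2_hat):
    """Assemble sigma2_hat * (X'X)^{-1} block by block."""
    t = len(x_by_treatment)
    cov = [[0.0] * (2 * t) for _ in range(2 * t)]
    for i, x in enumerate(x_by_treatment):
        block = inverse_xtx(x)
        for a in range(2):
            for b in range(2):
                cov[2 * i + a][2 * i + b] = sigma2_hat * block[a][b]
    return cov

# Hypothetical covariate values and variance estimate (not from the text).
x_by_treatment = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]]
sigma2_hat = 1.1
cov = estimated_covariance(x_by_treatment, sigma2_hat)
# Standard errors in the order (alpha_1, beta_1, alpha_2, beta_2).
std_errors = [math.sqrt(cov[k][k]) for k in range(len(cov))]
print([round(se, 4) for se in std_errors])
```

The zero off-diagonal blocks show that estimates from different treatments are uncorrelated, while the intercept and slope within a treatment are generally correlated.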
