An Introduction to Multivariate Statistical Analysis

Third Edition

T. W. ANDERSON

Stanford University

Department of Statistics

Stanford, CA

WILEY-INTERSCIENCE

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: permreq@wiley.com.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data

Anderson, T. W. (Theodore Wilbur), 1918-

An introduction to multivariate statistical analysis / Theodore W. Anderson.-- 3rd ed.

p. cm.-- (Wiley series in probability and mathematical statistics)

Includes bibliographical references and index.

ISBN 0-471-36091-0 (cloth: acid-free paper)

1. Multivariate analysis. I. Title. II. Series.

QA278.A516 2003

519.5'35--dc21

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

2002034317

To

DOROTHY

Contents

Preface to the Third Edition

Preface to the Second Edition

Preface to the First Edition

1 Introduction

1.1. Multivariate Statistical Analysis

1.2. The Multivariate Normal Distribution

2 The Multivariate Normal Distribution

2.1. Introduction

2.2. Notions of Multivariate Distributions

2.3. The Multivariate Normal Distribution

2.4. The Distribution of Linear Combinations of Normally Distributed Variates; Independence of Variates; Marginal Distributions

2.5. Conditional Distributions and Multiple Correlation Coefficient

2.6. The Characteristic Function; Moments

2.7. Elliptically Contoured Distributions

Problems

3 Estimation of the Mean Vector and the Covariance Matrix

3.1. Introduction

3.2. The Maximum Likelihood Estimators of the Mean Vector and the Covariance Matrix

3.3. The Distribution of the Sample Mean Vector; Inference Concerning the Mean When the Covariance Matrix Is Known

3.4. Theoretical Properties of Estimators of the Mean Vector

3.5. Improved Estimation of the Mean

3.6. Elliptically Contoured Distributions

Problems

4 The Distributions and Uses of Sample Correlation Coefficients

4.1. Introduction

4.2. Correlation Coefficient of a Bivariate Sample

4.3. Partial Correlation Coefficients; Conditional Distributions

4.4. The Multiple Correlation Coefficient

4.5. Elliptically Contoured Distributions

Problems

5 The Generalized T²-Statistic

5.1. Introduction

5.2. Derivation of the Generalized T²-Statistic and Its Distribution

5.3. Uses of the T²-Statistic

5.4. The Distribution of T² under Alternative Hypotheses; The Power Function

5.5. The Two-Sample Problem with Unequal Covariance Matrices

5.6. Some Optimal Properties of the T²-Test

5.7. Elliptically Contoured Distributions

Problems

6 Classification of Observations

6.1. The Problem of Classification

6.2. Standards of Good Classification

6.3. Procedures of Classification into One of Two Populations with Known Probability Distributions

6.4. Classification into One of Two Known Multivariate Normal Populations

6.5. Classification into One of Two Multivariate Normal Populations When the Parameters Are Estimated

6.6. Probabilities of Misclassification

6.7. Classification into One of Several Populations

6.8. Classification into One of Several Multivariate Normal Populations

6.9. An Example of Classification into One of Several Multivariate Normal Populations

6.10. Classification into One of Two Known Multivariate Normal Populations with Unequal Covariance Matrices

Problems

7 The Distribution of the Sample Covariance Matrix and the Sample Generalized Variance

7.1. Introduction

7.2. The Wishart Distribution

7.3. Some Properties of the Wishart Distribution

7.4. Cochran's Theorem

7.5. The Generalized Variance

7.6. Distribution of the Set of Correlation Coefficients When the Population Covariance Matrix Is Diagonal

7.7. The Inverted Wishart Distribution and Bayes Estimation of the Covariance Matrix

7.8. Improved Estimation of the Covariance Matrix

7.9. Elliptically Contoured Distributions

Problems

8 Testing the General Linear Hypothesis; Multivariate Analysis of Variance

8.1. Introduction

8.2. Estimators of Parameters in Multivariate Linear Regression

8.3. Likelihood Ratio Criteria for Testing Linear Hypotheses about Regression Coefficients

8.4. The Distribution of the Likelihood Ratio Criterion When the Hypothesis Is True

8.5. An Asymptotic Expansion of the Distribution of the Likelihood Ratio Criterion

8.6. Other Criteria for Testing the Linear Hypothesis

8.7. Tests of Hypotheses about Matrices of Regression Coefficients and Confidence Regions

8.8. Testing Equality of Means of Several Normal Distributions with Common Covariance Matrix

8.9. Multivariate Analysis of Variance

8.10. Some Optimal Properties of Tests

8.11. Elliptically Contoured Distributions

Problems

9 Testing Independence of Sets of Variates

9.1. Introduction

9.2. The Likelihood Ratio Criterion for Testing Independence of Sets of Variates

9.3. The Distribution of the Likelihood Ratio Criterion When the Null Hypothesis Is True

9.4. An Asymptotic Expansion of the Distribution of the Likelihood Ratio Criterion

9.5. Other Criteria

9.6. Step-Down Procedures

9.7. An Example

9.8. The Case of Two Sets of Variates

9.9. Admissibility of the Likelihood Ratio Test

9.10. Monotonicity of Power Functions of Tests of Independence of Sets

9.11. Elliptically Contoured Distributions

Problems

10 Testing Hypotheses of Equality of Covariance Matrices and Equality of Mean Vectors and Covariance Matrices

10.1. Introduction

10.2. Criteria for Testing Equality of Several Covariance Matrices

10.3. Criteria for Testing That Several Normal Distributions Are Identical

10.4. Distributions of the Criteria

10.5. Asymptotic Expansions of the Distributions of the Criteria

10.6. The Case of Two Populations

10.7. Testing the Hypothesis That a Covariance Matrix Is Proportional to a Given Matrix; The Sphericity Test

10.8. Testing the Hypothesis That a Covariance Matrix Is Equal to a Given Matrix

10.9. Testing the Hypothesis That a Mean Vector and a Covariance Matrix Are Equal to a Given Vector and Matrix

10.10. Admissibility of Tests

10.11. Elliptically Contoured Distributions

Problems

11 Principal Components

11.1. Introduction

11.2. Definition of Principal Components in the Population

11.3. Maximum Likelihood Estimators of the Principal Components and Their Variances

11.4. Computation of the Maximum Likelihood Estimates of the Principal Components

11.5. An Example

11.6. Statistical Inference

11.7. Testing Hypotheses about the Characteristic Roots of a Covariance Matrix

11.8. Elliptically Contoured Distributions

Problems

12 Canonical Correlations and Canonical Variables

12.1. Introduction

12.2. Canonical Correlations and Variates in the Population

12.3. Estimation of Canonical Correlations and Variates

12.4. Statistical Inference

12.5. An Example

12.6. Linearly Related Expected Values

12.7. Reduced Rank Regression

12.8. Simultaneous Equations Models

Problems

13 The Distributions of Characteristic Roots and Vectors

13.1. Introduction

13.2. The Case of Two Wishart Matrices

13.3. The Case of One Nonsingular Wishart Matrix

13.4. Canonical Correlations

13.5. Asymptotic Distributions in the Case of One Wishart Matrix

13.6. Asymptotic Distributions in the Case of Two Wishart Matrices

13.7. Asymptotic Distribution in a Regression Model

13.8. Elliptically Contoured Distributions

Problems

14 Factor Analysis

14.1. Introduction

14.2. The Model

14.3. Maximum Likelihood Estimators for Random Orthogonal Factors

14.4. Estimation for Fixed Factors

14.5. Factor Interpretation and Transformation

14.6. Estimation for Identification by Specified Zeros

14.7. Estimation of Factor Scores

Problems

15 Patterns of Dependence; Graphical Models

15.1. Introduction

15.2. Undirected Graphs

15.3. Directed Graphs

15.4. Chain Graphs

15.5. Statistical Inference

Appendix A Matrix Theory

A.1. Definition of a Matrix and Operations on Matrices

A.2. Characteristic Roots and Vectors

A.3. Partitioned Vectors and Matrices

A.4. Some Miscellaneous Results

A.5. Gram-Schmidt Orthogonalization and the Solution of Linear Equations

Appendix B Tables

B.1. Wilks' Likelihood Criterion: Factors C(p, m, M) to Adjust to χ²_{p,m}, where M = n - p + 1

B.2. Tables of Significance Points for the Lawley-Hotelling Trace Test

B.3. Tables of Significance Points for the Bartlett-Nanda-Pillai Trace Test

B.4. Tables of Significance Points for the Roy Maximum Root Test

B.5. Significance Points for the Modified Likelihood Ratio Test of Equality of Covariance Matrices Based on Equal Sample Sizes

B.6. Correction Factors for Significance Points for the Sphericity Test

B.7. Significance Points for the Modified Likelihood Ratio Test Σ = Σ₀

References

Index

Preface to the Third Edition

For some forty years the first and second editions of this book have been used by students to acquire a basic knowledge of the theory and methods of multivariate statistical analysis. The book has also served a wider community of statisticians in furthering their understanding and proficiency in this field.

Since the second edition was published, multivariate analysis has been developed and extended in many directions. Rather than attempting to cover, or even survey, the enlarged scope, I have elected to elucidate several aspects that are particularly interesting and useful for methodology and comprehension.

Earlier editions included some methods that could be carried out on an adding machine! In the twenty-first century, however, computational techniques have become so highly developed and improvements come so rapidly that it is impossible to include all of the relevant methods in a volume on the general mathematical theory. Some aspects of statistics exploit computational power, such as the resampling technologies; these are not covered here.

The definition of multivariate statistics implies the treatment of variables that are interrelated. Several chapters are devoted to measures of correlation and tests of independence. A new chapter, "Patterns of Dependence; Graphical Models," has been added. A so-called graphical model is a set of vertices or nodes identifying observed variables together with a new set of edges suggesting dependences between variables. The algebra of such graphs is an outgrowth and development of path analysis and the study of causal chains. A graph may represent a sequence in time or logic and may suggest causation of one set of variables by another set.

Another new topic systematically presented in the third edition is that of elliptically contoured distributions. The multivariate normal distribution, which is characterized by the mean vector and covariance matrix, has a limitation that the fourth-order moments of the variables are determined by the first- and second-order moments. The class of elliptically contoured distributions relaxes this restriction. A density in this class has contours of equal density which are ellipsoids, as does a normal density, but the set of fourth-order moments has one further degree of freedom. This topic is expounded by the addition of sections to appropriate chapters.

Reduced rank regression, developed in Chapters 12 and 13, provides a method of reducing the number of regression coefficients to be estimated in the regression of one set of variables on another. This approach includes the limited-information maximum-likelihood estimator of an equation in a simultaneous equations model.

The preparation of the third edition has benefited from advice and comments of readers of the first and second editions as well as from reviewers of the current revision. In addition to readers of the earlier editions listed in those prefaces, I want to thank Michael Perlman and Kathy Richards for their assistance in getting this manuscript ready.

T. W. ANDERSON

Stanford, California

February 2003

Preface to the Second Edition

Twenty-six years have passed since the first edition of this book was published. During that time great advances have been made in multivariate statistical analysis, particularly in the areas treated in that volume. This new edition purports to bring the original edition up to date by substantial revision, rewriting, and additions. The basic approach has been maintained, namely, a mathematically rigorous development of statistical methods for observations consisting of several measurements or characteristics of each subject and a study of their properties. The general outline of topics has been retained.

The method of maximum likelihood has been augmented by other considerations. In point estimation of the mean vector and covariance matrix, alternatives to the maximum likelihood estimators that are better with respect to certain loss functions, such as Stein and Bayes estimators, have been introduced. In testing hypotheses, likelihood ratio tests have been supplemented by other invariant procedures. New results on distributions and asymptotic distributions are given; some significance points are tabulated. Properties of these procedures, such as power functions, admissibility, unbiasedness, and monotonicity of power functions, are studied. Simultaneous confidence intervals for means and covariances are developed. A chapter on factor analysis replaces the chapter sketching miscellaneous results in the first edition. Some new topics, including simultaneous equations models and linear functional relationships, are introduced. Additional problems present further results.

It is impossible to cover all relevant material in this book; what seems most important has been included. For a comprehensive listing of papers until 1966 and books until 1970 the reader is referred to A Bibliography of Multivariate Statistical Analysis by Anderson, Das Gupta, and Styan (1972). Further references can be found in Multivariate Analysis: A Selected and Abstracted Bibliography, 1957-1972 by Subrahmaniam and Subrahmaniam (1973).

I am in debt to many students, colleagues, and friends for their suggestions and assistance; they include Yasuo Amemiya, James Berger, Byoung-Seon Choi, Arthur Cohen, Margery Cruise, Somesh Das Gupta, Kai-Tai Fang, Gene Golub, Aaron Han, Takeshi Hayakawa, Jogi Henna, Huang Hsu, Fred Huffer, Mituaki Huzii, Jack Kiefer, Mark Knowles, Sue Leurgans, Alex McMillan, Masashi No, Ingram Olkin, Kartik Patel, Michael Perlman, Allen Sampson, Ashis Sen Gupta, Andrew Siegel, Charles Stein, Patrick Strout, Akimichi Takemura, Joe Verducci, Marlos Viana, and Y. Yajima. I was helped in preparing the manuscript by Dorothy Anderson, Alice Lundin, Amy Schwartz, and Pat Struse. Special thanks go to Johanne Thiffault and George P. H. Styan for their precise attention. Support was contributed by the Army Research Office, the National Science Foundation, the Office of Naval Research, and IBM Systems Research Institute.

Seven tables of significance points are given in Appendix B to facilitate carrying out test procedures. Tables 1, 5, and 7 are Tables 47, 50, and 53, respectively, of Biometrika Tables for Statisticians, Vol. 2, by E. S. Pearson and H. O. Hartley; permission of the Biometrika Trustees is hereby acknowledged. Table 2 is made up from three tables prepared by A. W. Davis and published in Biometrika (1970a), Annals of the Institute of Statistical Mathematics (1970b), and Communications in Statistics, B. Simulation and Computation (1980). Tables 3 and 4 are Tables 6.3 and 6.4, respectively, of Concise Statistical Tables, edited by Ziro Yamauti (1977) and published by the Japanese Standards Association; this book is a concise version of Statistical Tables and Formulas with Computer Applications, JSA-1972. Table 6 is Table 3 of The Distribution of the Sphericity Test Criterion, ARL 72-0154, by B. N. Nagarsenker and K. C. S. Pillai, Aerospace Research Laboratories (1972).

The author is indebted to the authors and publishers listed above for permission to reproduce these tables.

T. W. ANDERSON

Stanford, California

June 1984

Preface to the First Edition

This book has been designed primarily as a text for a two-semester course in multivariate statistics. It is hoped that the book will also serve as an introduction to many topics in this area to statisticians who are not students and will be used as a reference by other statisticians.

For several years the book in the form of dittoed notes has been used in a two-semester sequence of graduate courses at Columbia University; the first six chapters constituted the text for the first semester, emphasizing correlation theory. It is assumed that the reader is familiar with the usual theory of univariate statistics, particularly methods based on the univariate normal distribution. A knowledge of matrix algebra is also a prerequisite; however, an appendix on this topic has been included.

It is hoped that the more basic and important topics are treated here, though to some extent the coverage is a matter of taste. Some of the more recent and advanced developments are only briefly touched on in the last chapter.

The method of maximum likelihood is used to a large extent. This leads to reasonable procedures; in some cases it can be proved that they are optimal. In many situations, however, the theory of desirable or optimum procedures is lacking.

Over the years during which this manuscript has been developed, a number of students and colleagues have been of considerable assistance. Allan Birnbaum, Harold Hotelling, Jacob Horowitz, Howard Levene, Ingram Olkin, Gobind Seth, Charles Stein, and Henry Teicher are to be mentioned particularly. Acknowledgements are also due to other members of the Graduate Mathematical Statistics Society at Columbia University for aid in the preparation of the manuscript in dittoed form. The preparation of this manuscript was supported in part by the Office of Naval Research.

T. W. ANDERSON

Center for Advanced Study in the Behavioral Sciences

Stanford, California

December 1957

CHAPTER 1

Introduction

1.1. MULTIVARIATE STATISTICAL ANALYSIS

Multivariate statistical analysis is concerned with data that consist of sets of measurements on a number of individuals or objects. The sample data may be heights and weights of some individuals drawn randomly from a population of school children in a given city, or the statistical treatment may be made on a collection of measurements, such as lengths and widths of petals and lengths and widths of sepals of iris plants taken from two species, or one may study the scores on batteries of mental tests administered to a number of students.

The measurements made on a single individual can be assembled into a column vector. We think of the entire vector as an observation from a multivariate population or distribution. When the individual is drawn randomly, we consider the vector as a random vector with a distribution or probability law describing that population. The set of observations on all individuals in a sample constitutes a sample of vectors, and the vectors set side by side make up the matrix of observations.† The data to be analyzed then are thought of as displayed in a matrix or in several matrices.
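A minimal sketch of this convention, assuming NumPy; the heights and weights below are hypothetical numbers chosen only for illustration. A table printed one individual per row is transposed so that each individual's measurements become a column vector, and the N columns side by side form the p × N matrix of observations.

```python
import numpy as np

# Hypothetical table, one row per individual: (height in cm, weight in kg).
table = [
    [132.0, 29.5],
    [141.0, 35.0],
    [128.5, 27.0],
    [137.0, 31.5],
]

# Transpose so that each individual is a column vector x_alpha of length p;
# the N columns side by side form the p x N matrix of observations.
X = np.array(table).T
p, N = X.shape

# The first observation kept as a p x 1 column vector (index list keeps 2-D shape).
x_0 = X[:, [0]]
```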

We shall see that it is helpful in visualizing the data and understanding the methods to think of each observation vector as constituting a point in a Euclidean space, each coordinate corresponding to a measurement or variable. Indeed, an early step in the statistical analysis is plotting the data; since most statisticians are limited to two-dimensional plots, two coordinates of the observation are plotted in turn.

†When data are listed on paper by individual, it is natural to print the measurements on one individual as a row of the table; then one individual corresponds to a row vector. Since we prefer to operate algebraically with column vectors, we have chosen to treat observations in terms of column vectors. (In practice, the basic data set may well be on cards, tapes, or disks.)

An Introduction to Multivariate Statistical Analysis, Third Edition. By T. W. Anderson. ISBN 0-471-36091-0 Copyright © 2003 John Wiley & Sons, Inc.

Characteristics of a univariate distribution of essential interest are the mean as a measure of location and the standard deviation as a measure of variability; similarly the mean and standard deviation of a univariate sample are important summary measures. In multivariate analysis, the means and variances of the separate measurements, for distributions and for samples, have corresponding relevance. An essential aspect, however, of multivariate analysis is the dependence between the different variables. The dependence between two variables may involve the covariance between them, that is, the average products of their deviations from their respective means. The covariance standardized by the corresponding standard deviations is the correlation coefficient; it serves as a measure of degree of dependence. A set of summary statistics is the mean vector (consisting of the univariate means) and the covariance matrix (consisting of the univariate variances and bivariate covariances). An alternative set of summary statistics with the same information is the mean vector, the set of standard deviations, and the correlation matrix. Similar parameter quantities describe location, variability, and dependence in the population or for a probability distribution. The multivariate normal distribution is completely determined by its mean vector and covariance matrix, and the sample mean vector and covariance matrix constitute a sufficient set of statistics.
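The summary statistics just listed can be computed directly from the p × N matrix of observations. The sketch below assumes NumPy and uses the divisor N (the "average products" of deviations); using N - 1 instead rescales the covariance matrix but leaves the correlation matrix unchanged. The data are simulated, and the last line checks that the standard deviations and the correlation matrix carry the same information as the covariance matrix.

```python
import numpy as np

def summary_statistics(X):
    """Summary statistics of a p x N data matrix (one column per observation)."""
    N = X.shape[1]
    xbar = X.mean(axis=1)              # mean vector (univariate means)
    D = X - xbar[:, None]              # deviations from the respective means
    S = D @ D.T / N                    # covariance matrix: average products of deviations
    sd = np.sqrt(np.diag(S))           # standard deviations
    R = S / np.outer(sd, sd)           # correlation matrix
    return xbar, S, sd, R

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))           # simulated data, p = 3 and N = 50
xbar, S, sd, R = summary_statistics(X)

# Same information: the covariance matrix is recovered as diag(sd) R diag(sd).
S_rebuilt = np.diag(sd) @ R @ np.diag(sd)
```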

The measurement and analysis of dependence between variables, between sets of variables, and between variables and sets of variables are fundamental to multivariate analysis. The multiple correlation coefficient is an extension of the notion of correlation to the relationship of one variable to a set of variables. The partial correlation coefficient is a measure of dependence between two variables when the effects of other correlated variables have been removed. The various correlation coefficients computed from samples are used to estimate corresponding correlation coefficients of distributions. In this book tests of hypotheses of independence are developed. The properties of the estimators and test procedures are studied for sampling from the multivariate normal distribution.
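One convenient way to compute the population partial and multiple correlations from a covariance matrix Σ goes through its inverse; this is a standard matrix identity, offered as a sketch rather than as the derivation used later in the book. With P = Σ⁻¹, the partial correlation of variables i and j given all the others is -P_ij / sqrt(P_ii P_jj), and the squared multiple correlation of variable i with the rest is 1 - 1/(σ_ii P_ii). The 3 × 3 matrix is a made-up example; substituting a sample covariance matrix gives the sample coefficients.

```python
import numpy as np

def partial_correlation(Sigma, i, j):
    """Correlation of variables i and j with all other variables held fixed."""
    P = np.linalg.inv(Sigma)
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

def multiple_correlation(Sigma, i):
    """Multiple correlation of variable i with the set of all other variables."""
    P = np.linalg.inv(Sigma)
    return np.sqrt(1.0 - 1.0 / (Sigma[i, i] * P[i, i]))

# Hypothetical correlation matrix of three variables.
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])

r12_3 = partial_correlation(Sigma, 0, 1)   # variables 1 and 2, variable 3 removed
R1_23 = multiple_correlation(Sigma, 0)     # variable 1 on variables 2 and 3
```

For three variables the first result agrees with the familiar formula (ρ₁₂ - ρ₁₃ρ₂₃)/sqrt[(1 - ρ₁₃²)(1 - ρ₂₃²)], which is a quick way to check it.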

A number of statistical problems arising in multivariate populations are straightforward analogs of problems arising in univariate populations; the suitable methods for handling these problems are similarly related. For example, in the univariate case we may wish to test the hypothesis that the mean of a variable is zero; in the multivariate case we may wish to test the hypothesis that the vector of the means of several variables is the zero vector. The analog of the Student t-test for the first hypothesis is the generalized T²-test. The analysis of variance of a single variable is adapted to vector observations; in regression analysis, the dependent quantity may be a vector variable. A comparison of variances is generalized into a comparison of covariance matrices.
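As a sketch of the generalized T²-statistic for the hypothesis that the mean vector equals a given μ₀ (μ₀ = 0 in the example above), assuming NumPy and a p × N data matrix: T² = N(x̄ - μ₀)'S⁻¹(x̄ - μ₀), where S is the sample covariance matrix with divisor N - 1. Under the hypothesis and normal sampling, (N - p)T²/[p(N - 1)] has an F-distribution with p and N - p degrees of freedom. The function name `hotelling_t2` is hypothetical and the data are simulated.

```python
import numpy as np

def hotelling_t2(X, mu0):
    """Generalized T^2 for H: mean vector = mu0, from a p x N data matrix X."""
    p, N = X.shape
    d = X.mean(axis=1) - mu0                    # xbar - mu0
    S = np.atleast_2d(np.cov(X))                # sample covariance, divisor N - 1
    t2 = N * d @ np.linalg.solve(S, d)          # N (xbar - mu0)' S^{-1} (xbar - mu0)
    F = (N - p) / (p * (N - 1)) * t2            # F(p, N - p) under H with normal sampling
    return t2, F

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 20))                    # simulated sample, p = 3, N = 20
t2, F = hotelling_t2(X, np.zeros(3))
```

For p = 1 the statistic reduces to the square of the Student t-statistic, which gives one simple check of the function.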

The test procedures of univariate statistics are generalized to the multivariate case in such ways that the dependence between variables is taken into account. These methods may not depend on the coordinate system; that is, the procedures may be invariant with respect to linear transformations that leave the null hypothesis invariant. In some problems there may be families of tests that are invariant; then choices must be made. Optimal properties of the tests are considered.
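The invariance just described can be checked numerically for the T²-statistic: transforming every observation by x → Cx + b with a nonsingular C, and the hypothesized mean μ₀ in the same way, leaves T² unchanged. The helper `hotelling_t2`, the matrix C, the shift b, and the simulated data are all arbitrary choices for illustration, assuming NumPy.

```python
import numpy as np

def hotelling_t2(X, mu0):
    """T^2 = N (xbar - mu0)' S^{-1} (xbar - mu0) for a p x N data matrix X."""
    N = X.shape[1]
    d = X.mean(axis=1) - mu0
    S = np.atleast_2d(np.cov(X))                # divisor N - 1
    return N * d @ np.linalg.solve(S, d)

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 15))
mu0 = np.zeros(3)

C = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])                 # nonsingular (determinant 5)
b = np.array([1.0, -2.0, 0.5])

t2_original = hotelling_t2(X, mu0)
# Transform both the data and the hypothesis; the statistic should not change.
t2_transformed = hotelling_t2(C @ X + b[:, None], C @ mu0 + b)
```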

For some other purposes, however, it may be important to select a coordinate system so that the variates have desired statistical properties. One might say that they involve characterizations of inherent properties of normal distributions and of samples. These are closely related to the algebraic problems of canonical forms of matrices. An example is finding the normalized linear combination of variables with maximum or minimum variance (finding principal components); this amounts to finding a rotation of axes that carries the covariance matrix to diagonal form. Another example is characterizing the dependence between two sets of variates (finding canonical correlations). These problems involve the characteristic roots and vectors of various matrices. The statistical properties of the corresponding sample quantities are treated.
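A minimal sketch, assuming NumPy, of the two canonical-form computations just described. Principal components come from the characteristic roots and vectors of Σ: the orthogonal matrix A of characteristic vectors is the rotation that carries Σ to diagonal form. For canonical correlations between the first p₁ variates and the rest, one numerically convenient route (an assumption of this sketch, not necessarily the book's procedure) whitens each set with a Cholesky factor and takes singular values; their squares are the roots of Σ₁₁⁻¹Σ₁₂Σ₂₂⁻¹Σ₂₁. The covariance matrix is randomly generated.

```python
import numpy as np

def principal_components(Sigma):
    """Characteristic roots (descending) and orthogonal A with A' Sigma A diagonal."""
    roots, A = np.linalg.eigh(Sigma)            # eigh returns roots in ascending order
    order = np.argsort(roots)[::-1]
    return roots[order], A[:, order]

def canonical_correlations(Sigma, p1):
    """Canonical correlations between the first p1 variates and the remaining ones."""
    S11, S12, S22 = Sigma[:p1, :p1], Sigma[:p1, p1:], Sigma[p1:, p1:]
    L1 = np.linalg.cholesky(S11)                # S11 = L1 L1'
    L2 = np.linalg.cholesky(S22)                # S22 = L2 L2'
    # K = L1^{-1} S12 L2'^{-1}; its singular values are the canonical correlations.
    K = np.linalg.solve(L1, np.linalg.solve(L2, S12.T).T)
    return np.linalg.svd(K, compute_uv=False)

rng = np.random.default_rng(7)
B = rng.normal(size=(4, 4))
Sigma = B @ B.T + np.eye(4)                     # an arbitrary positive definite matrix

roots, A = principal_components(Sigma)
rho = canonical_correlations(Sigma, p1=2)
```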

Some statistical problems arise in models in which means and covariances are restricted. Factor analysis may be based on a model with a (population) covariance matrix that is the sum of a positive definite diagonal matrix and a positive semidefinite matrix of low rank; linear structural relationships may have a similar formulation. The simultaneous equations system of econometrics is another example of a special model.
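The factor-analysis covariance structure can be written down directly: with a p × m matrix of loadings Λ (m small) and a positive definite diagonal matrix Ψ, the covariance matrix is Σ = ΛΛ' + Ψ. The numbers below are hypothetical, and the sketch, assuming NumPy, only checks the structural claims: ΛΛ' is positive semidefinite of rank m, Σ is positive definite, and the off-diagonal covariances come entirely from the low-rank part.

```python
import numpy as np

# Hypothetical loadings: p = 5 observed variables, m = 2 factors.
Lambda = np.array([[0.9, 0.0],
                   [0.8, 0.1],
                   [0.7, 0.3],
                   [0.1, 0.8],
                   [0.0, 0.6]])
Psi = np.diag([0.2, 0.3, 0.4, 0.3, 0.5])   # positive definite diagonal matrix

low_rank = Lambda @ Lambda.T               # positive semidefinite, rank m = 2
Sigma = low_rank + Psi                     # covariance matrix of the factor model
```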

1.2. THE MULTIVARIATE NORMAL DISTRIBUTION

The statistical methods treated in this book can be developed and evaluated in the context of the multivariate normal distribution, though many of the procedures are useful and effective when the distribution sampled is not normal. A major reason for basing statistical analysis on the normal distribution is that this probabilistic model approximates well the distribution of continuous measurements in many sampled populations. In fact, most of the methods and theory have been developed to serve statistical analysis of data.

Mathematicians such as Adrian (1808), Laplace (1811), Plana (1813), Gauss (1823), and Bravais (1846) studied the bivariate normal density. Francis Galton, the geneticist, introduced the ideas of correlation, regression, and homoscedasticity in the study of pairs of measurements, one made on a parent and one in an offspring. [See, e.g., Galton (1889).] He enunciated the theory of the multivariate normal distribution as a generalization of observed properties of samples.

Karl Pearson and others carried on the development of the theory and use of different kinds of correlation coefficients† for studying problems in genetics, biology, and other fields. R. A. Fisher further developed methods for agriculture, botany, and anthropology, including the discriminant function for classification problems. In another direction, analysis of scores of mental tests led to a theory, including factor analysis, the sampling theory of which is based on the normal distribution. In these cases, as well as in agricultural experiments, in engineering problems, in certain economic problems, and in other fields, the multivariate normal distributions have been found to be sufficiently close approximations to the populations so that statistical analyses based on these models are justified.

The univariate normal distribution arises frequently because the effect studied is the sum of many independent random effects. Similarly, the multivariate normal distribution often occurs because the multiple measurements are sums of small independent effects. Just as the central limit theorem leads to the univariate normal distribution for single variables, so does the general central limit theorem for several variables lead to the multivariate normal distribution.
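A small simulation, not a proof, can illustrate the multivariate central limit theorem. Each observation below is a scaled sum of K independent small vector effects Au, where u has independent, skewed (centered exponential) components; the mixing matrix A, the value of K, and the seed are arbitrary assumptions, and NumPy is assumed. If the sum is close to bivariate normal with covariance Σ = AA', its squared Mahalanobis distance is close to χ² with 2 degrees of freedom, whose 95% point is -2 log 0.05 ≈ 5.99, so about 95% of the simulated points should fall inside that ellipse.

```python
import numpy as np

rng = np.random.default_rng(3)
K, R = 100, 10_000                      # effects per observation, replications
A = np.array([[1.0, 0.0],
              [0.6, 0.8]])              # hypothetical mixing matrix

# Each small effect is A u, where u has independent centered exponential
# components (mean 0, variance 1, skewed); no single effect is normal.
U = rng.exponential(size=(R, K, 2)) - 1.0
Y = (U.sum(axis=1) / np.sqrt(K)) @ A.T  # R observations of the scaled sum, shape (R, 2)

Sigma = A @ A.T                         # covariance of each scaled sum
Sinv = np.linalg.inv(Sigma)
d2 = np.einsum("ij,jk,ik->i", Y, Sinv, Y)   # squared Mahalanobis distances

# For a bivariate normal, P(d2 <= c) = 1 - exp(-c / 2); c95 is the 95% point.
c95 = -2.0 * np.log(0.05)
coverage = np.mean(d2 <= c95)
```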

Statistical theory based on the normal distribution has the advantage that the multivariate methods based on it are extensively developed and can be studied in an organized and systematic way. This is due not only to the need for such methods because they are of practical use, but also to the fact that normal theory is amenable to exact mathematical treatment. The suitable methods of analysis are mainly based on standard operations of matrix algebra; the distributions of many statistics involved can be obtained exactly or at least characterized; and in many cases optimum properties of procedures can be deduced.

The point of view in this book is to state problems of inference in terms of the multivariate normal distributions, develop efficient and often optimum methods in this context, and evaluate significance and confidence levels in these terms. This approach gives coherence and rigor to the exposition, but, by its very nature, cannot exhaust consideration of multivariate statistical analysis. The procedures are appropriate to many nonnormal distributions, but their adequacy may be open to question. Roughly speaking, inferences about means are robust because of the operation of the central limit theorem, but inferences about covariances are sensitive to normality, the variability of sample covariances depending on fourth-order moments.

†For a detailed study of the development of the ideas of correlation, see Walker (1931).
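The contrast between means and covariances can be seen in a simulation, assuming NumPy. The two populations below (a hypothetical choice) both have mean 0 and variance 1, but fourth moments μ₄ = 3 (normal) and μ₄ = 6 (Laplace with scale 1/√2). The sample mean has variance 1/N in both cases, whereas the unbiased sample variance has variance (1/N)[μ₄ - (N - 3)σ⁴/(N - 1)], which for N = 50 is about 0.041 in the normal case and about 0.101 in the Laplace case.

```python
import numpy as np

rng = np.random.default_rng(4)
N, reps = 50, 20_000                     # sample size and number of simulated samples

# Two hypothetical populations with mean 0 and variance 1 but different fourth moments.
populations = {
    "normal": rng.normal(size=(reps, N)),                          # mu4 = 3
    "laplace": rng.laplace(scale=1 / np.sqrt(2), size=(reps, N)),  # mu4 = 6
}

results = {}
for name, x in populations.items():
    var_of_mean = x.mean(axis=1).var()          # spread of the sample means
    var_of_s2 = x.var(axis=1, ddof=1).var()     # spread of the sample variances
    results[name] = (var_of_mean, var_of_s2)
    print(f"{name:8s} var(mean) = {var_of_mean:.4f}   var(s^2) = {var_of_s2:.4f}")
```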

This inflexibility of normal methods with respect to moments of order greater than two can be reduced by including a larger class of elliptically contoured distributions. In the univariate case the normal distribution is determined by the mean and variance; higher-order moments and properties such as peakedness and long tails are functions of the mean and variance. Similarly, in the multivariate case the means and covariances or the means, variances, and correlations determine all of the properties of the distribution. That limitation is alleviated in one respect by consideration of a broad class of elliptically contoured distributions. That class maintains the dependence structure, but permits more general peakedness and long tails. This study leads to more robust methods.
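A familiar member of the elliptically contoured class is the multivariate t distribution. It is obtained by dividing a normal vector by an independent random scale, the square root of χ²_ν/ν. The Python sketch below illustrates the point just made: the correlation, and hence the dependence structure, matches that of the underlying normal vector. The marginal kurtosis, however, is 3 + 6/(ν − 4) = 4 for ν = 10 rather than the normal value 3, which means longer tails. The values ν = 10 and ρ = 0.5, the sample size, and the seed are arbitrary assumptions.

```python
import math
import random

def bivariate_t(rho, nu, rng):
    """One draw from a bivariate t distribution with nu degrees of freedom
    and correlation rho: a normal vector divided by an independent
    sqrt(chi-square / nu) scale factor shared by both coordinates."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = z1
    x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2   # normal with corr rho
    chi2 = rng.gammavariate(nu / 2.0, 2.0)           # chi-square, nu d.f.
    w = math.sqrt(nu / chi2)                         # common random scale
    return w * x1, w * x2

def corr_and_kurtosis(pairs):
    """Sample correlation and kurtosis of the first coordinate."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s11 = sum((a - m1) ** 2 for a, _ in pairs) / n
    s22 = sum((b - m2) ** 2 for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs) / n
    m4 = sum((a - m1) ** 4 for a, _ in pairs) / n
    return s12 / math.sqrt(s11 * s22), m4 / s11 ** 2

rng = random.Random(7)                 # arbitrary seed
rho, nu, n = 0.5, 10.0, 100000
pairs = [bivariate_t(rho, nu, rng) for _ in range(n)]
corr, kurt = corr_and_kurtosis(pairs)
# Same correlation as the underlying normal vector, but marginal
# kurtosis 3 + 6 / (nu - 4) = 4 instead of the normal value 3.
print(f"correlation {corr:.3f} (rho = {rho}); kurtosis {kurt:.2f}")
```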

The development of computer technology has revolutionized multivariate statistics in several respects. As in univariate statistics, modern computers permit the evaluation of observed variability and significance of results by resampling methods, such as the bootstrap and cross-validation. Such methodology reduces reliance on tables of significance points and eliminates some restrictions imposed by the normal distribution.
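Here is a minimal sketch of the bootstrap idea in Python, applied to a simulated bivariate normal sample. The standard error of the sample correlation is estimated by resampling observation pairs with replacement, with no table consulted. The result is compared with the large-sample normal-theory value (1 − r²)/√n. The sample size, ρ, number of resamples, and seed are arbitrary assumptions for illustration.

```python
import math
import random

def correlation(xs, ys):
    """Sample correlation coefficient of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_se(xs, ys, n_boot, rng):
    """Nonparametric bootstrap: resample observation pairs with replacement
    and use the spread of the recomputed correlations as a standard error."""
    n = len(xs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(correlation([xs[i] for i in idx], [ys[i] for i in idx]))
    mean = sum(stats) / n_boot
    return math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))

rng = random.Random(42)                      # arbitrary seed
n, rho = 200, 0.6
xs, ys = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

r = correlation(xs, ys)
se_boot = bootstrap_se(xs, ys, n_boot=1000, rng=rng)
se_normal = (1.0 - r ** 2) / math.sqrt(n)    # large-sample normal theory
print(f"r = {r:.3f}; bootstrap SE {se_boot:.4f}; normal-theory SE {se_normal:.4f}")
```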

Nonparametric techniques are available when nothing is known about the underlying distributions. Space does not permit inclusion of these topics, or of other considerations of data analysis such as the treatment of outliers and transformations of variables to approximate normality and homoscedasticity.

The availability of modern computer facilities makes possible the analysis of large data sets, and that ability permits the application of multivariate methods to new areas, such as image analysis, and more effective analysis of data, such as meteorological data. Moreover, new problems of statistical analysis arise, such as sparseness of parameter or data matrices. Because hardware and software development is so explosive and programs require specialized knowledge, we are content to make a few remarks here and there about computation. Packages of statistical programs are available for most of the methods.

CHAPTER 2

The Multivariate Normal Distribution

2.1. INTRODUCTION

In this chapter we discuss the multivariate normal distribution and some of its properties. Section 2.2 considers the fundamental notions of multivariate distributions: the definition by means of multivariate density functions, marginal distributions, conditional distributions, expected values, and moments. In Section 2.3 the multivariate normal distribution is defined; the parameters are shown to be the means, variances, and covariances or the means, variances, and correlations of the components of the random vector. In Section 2.4 it is shown that linear combinations of normal variables are normally distributed and hence that marginal distributions are normal. In Section 2.5 we see that conditional distributions are also normal, with means that are linear functions of the conditioning variables; the coefficients are regression coefficients. The conditional variances, covariances, and correlations (the last called partial correlations) are constants, not depending on the values of the conditioning variables. The multiple correlation coefficient is the maximum correlation between a scalar random variable and a linear combination of other random variables; it is a measure of association between one variable and a set of others. The fact that marginal and conditional distributions of normal distributions are normal makes the treatment of this family of distributions coherent. In Section 2.6 the characteristic function, moments, and cumulants are discussed. In Section 2.7 elliptically contoured distributions are defined; the properties of the normal distribution are extended to this larger class of distributions.
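As an informal numerical preview of Sections 2.4 and 2.5, the Python sketch below applies the standard formulas for a partitioned covariance matrix, stated here without derivation. Let X1 be scalar and X(2) = (X2, X3)'. The regression coefficients are β = Σ22⁻¹σ21, so the conditional mean is μ1 + β'(x(2) − μ(2)). The conditional variance σ11 − σ12Σ22⁻¹σ21 does not involve the conditioning values. The multiple correlation is R = √(σ12Σ22⁻¹σ21/σ11).

The covariance matrix is a hypothetical example. The final loop checks the maximizing property: X1 has correlation R with β'X(2), and no other combination tried exceeds it.

```python
import math

# Hypothetical covariance matrix of (X1, X2, X3); any positive definite
# matrix would serve.
sigma = [[4.0, 2.0, 1.2],
         [2.0, 3.0, 0.5],
         [1.2, 0.5, 2.0]]

s11 = sigma[0][0]                      # Var(X1)
s12 = [sigma[0][1], sigma[0][2]]       # Cov(X1, X2), Cov(X1, X3)
S22 = [[sigma[1][1], sigma[1][2]],     # covariance matrix of (X2, X3)
       [sigma[2][1], sigma[2][2]]]

# beta = S22^{-1} sigma_21 using the explicit inverse of a 2 x 2 matrix.
det = S22[0][0] * S22[1][1] - S22[0][1] * S22[1][0]
inv = [[S22[1][1] / det, -S22[0][1] / det],
       [-S22[1][0] / det, S22[0][0] / det]]
beta = [inv[0][0] * s12[0] + inv[0][1] * s12[1],
        inv[1][0] * s12[0] + inv[1][1] * s12[1]]

explained = s12[0] * beta[0] + s12[1] * beta[1]   # sigma_12 S22^{-1} sigma_21
cond_var = s11 - explained      # Var(X1 | X2, X3): free of conditioning values
R = math.sqrt(explained / s11)  # multiple correlation of X1 with (X2, X3)

def corr_with_combination(a):
    """Correlation between X1 and a[0] * X2 + a[1] * X3."""
    cov = a[0] * s12[0] + a[1] * s12[1]
    var = sum(a[i] * S22[i][j] * a[j] for i in range(2) for j in range(2))
    return cov / math.sqrt(s11 * var)

print("beta:", beta, "conditional variance:", cond_var, "R:", R)
for a in (beta, [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(a, corr_with_combination(a))   # never exceeds R; equals R at beta
```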



2.2. NOTIONS OF MULTIVARIATE DISTRIBUTIONS

2.2.1. Joint Distributions

In this section we shall consider the notions of joint distributions of several variables, derived marginal distributions of subsets of variables, and derived conditional distributions. First consider the case of two (real) random variables† X and Y. Probabilities of events defined in terms of these variables can be obtained by operations involving the cumulative distribution function (abbreviated as cdf),

$$F(x,y) = \Pr\{X \le x,\ Y \le y\}, \tag{1}$$

defined for every pair of real numbers (x, y). We are interested in cases where F(x, y) is absolutely continuous; this means that the following partial derivative exists almost everywhere:

$$\frac{\partial^2 F(x,y)}{\partial x\,\partial y} = f(x,y), \tag{2}$$

and

$$F(x,y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u,v)\, du\, dv. \tag{3}$$
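To make (2) and (3) concrete, take the hypothetical density f(x, y) = x + y on the unit square 0 ≤ x, y ≤ 1, with f = 0 elsewhere. Integrating as in (3) gives F(x, y) = (x²y + xy²)/2 inside the square. The Python sketch below checks numerically that a central mixed difference of F recovers f, as in (2). It also checks that a midpoint-rule integral of f reproduces F, as in (3); the midpoint rule is exact here because f is linear. The density and the step sizes are illustrative assumptions.

```python
def f(x, y):
    """Hypothetical density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def F(x, y):
    """Its cdf from (3): (x^2 y + x y^2) / 2 after clipping to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return (x * x * y + x * y * y) / 2.0

def mixed_partial(G, x, y, h=1e-4):
    """Central-difference approximation to the mixed partial in (2)."""
    return (G(x + h, y + h) - G(x + h, y - h)
            - G(x - h, y + h) + G(x - h, y - h)) / (4.0 * h * h)

def F_by_integration(x, y, m=200):
    """Midpoint-rule version of (3) for 0 <= x, y <= 1 (f is 0 below 0)."""
    hx, hy = x / m, y / m
    return hx * hy * sum(f((i + 0.5) * hx, (j + 0.5) * hy)
                         for i in range(m) for j in range(m))

for x, y in [(0.2, 0.3), (0.5, 0.5), (0.9, 0.1)]:
    print(x, y, round(mixed_partial(F, x, y), 6), f(x, y))
print(F(1.0, 1.0), F(0.7, 0.4), F_by_integration(0.7, 0.4))
```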

The nonnegative function f(x, y) is called the density of X and Y. The pair of random variables (X, Y) defines a random point in a plane. The probability that (X, Y) falls in a rectangle is

$$\begin{aligned}
\Pr\{x \le X \le x+\Delta x,\ y \le Y \le y+\Delta y\}
&= F(x+\Delta x, y+\Delta y) - F(x+\Delta x, y) - F(x, y+\Delta y) + F(x, y) \\
&= \int_{y}^{y+\Delta y} \int_{x}^{x+\Delta x} f(u,v)\, du\, dv
\end{aligned} \tag{4}$$

(Δx > 0, Δy > 0). The probability of the random point (X, Y) falling in any set E for which the following integral is defined (that is, any measurable set E) is

$$\Pr\{(X,Y) \in E\} = \iint_E f(x,y)\, dx\, dy. \tag{5}$$
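The Python sketch below continues with the hypothetical density f(x, y) = x + y on the unit square. It evaluates the rectangle probability in (4) from differences of F. It also evaluates the probability of the nonrectangular set E = {x + y ≤ 1} in (5) by a midpoint-rule sum over grid cells whose centers lie in E. The exact values are 0.013 for the rectangle [0.2, 0.3] × [0.3, 0.5] and 1/3 for E; the grid size is an arbitrary choice.

```python
def f(x, y):
    """Hypothetical density: x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def F(x, y):
    """Corresponding cdf (x^2 y + x y^2) / 2 after clipping to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return (x * x * y + x * y * y) / 2.0

def rectangle_prob(x, y, dx, dy):
    """Equation (4): probability of the rectangle from cdf differences."""
    return F(x + dx, y + dy) - F(x + dx, y) - F(x, y + dy) + F(x, y)

def prob_of_set(indicator, m=400):
    """Equation (5) by the midpoint rule on the unit square (where f lives):
    add f at the centers of grid cells whose centers lie in E."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h
        for j in range(m):
            v = (j + 0.5) * h
            if indicator(u, v):
                total += f(u, v)
    return total * h * h

p_rect = rectangle_prob(0.2, 0.3, 0.1, 0.2)            # exact value 0.013
p_triangle = prob_of_set(lambda u, v: u + v <= 1.0)    # exact value 1/3
print(p_rect, f(0.2, 0.3) * 0.1 * 0.2, p_triangle)     # f dx dy is a rough guide
```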

†In Chapter 2 we shall distinguish between random variables and running variables by use of capital and lowercase letters, respectively. In later chapters we may be unable to hold to this convention because of other complications of notation.




CHAPTER 1

Introduction

1.1. MULTIVARIATE STATISTICAL ANALYSIS

Multivariate statistical analysis is concerned with data that consist of sets of

measurements on a number of individuals or objects. The sample data may

be heights an~ weights of some individuals drawn randomly from a population of school children in a given city, or the statistical treatment may be

made on a collection of measurements, such as lengths and widths of petals

and lengths and widths of sepals of iris plants taken from two species, or one

may study the scores on batteries of mental tests administered to a number of

students.

The measurements made on a single individual can be assembled into a

column vector. We think of the entire vector as an observation from a

multivariate population or distribution. When the individual is drawn randomly, we consider the vector as a random vector with a distribution or

probability law describing that population. The set of observations on all

individuals in a sample constitutes a sample of vectors, and the vectors set

side by side make up the matrix of observations. t The data to be analyzed

then are thought of as displayed in a matrix or in several matrices.

We shall see that it is helpful in visualizing the data and understanding the

methods to think of each observation vector as constituting a point in a

Euclidean space, each coordinate corresponding to a measurement or variable. Indeed, an early step in the statistical analysis is plotting the data; since

tWhen data are listed on paper by individual, it is natural to print the measurements on one

individual as a row of the table; then one individual corresponds to a row vector. Since we prefer

to operate algebraically with column vectors, we have chosen to treat observations in terms of

column vectors. (In practice, the basic data set may weD be on cards, tapes, or di.sks.)

An Introductihn to MuItiuanate Statistical Analysis, Third Edmon. By T. W. Anderson

ISBN 0-471-36091-0 Copyright © 2003 John Wiley & Sons. Inc.

1

INTRODUCTION

most statisticians are limited to two-dimensional plots, two coordinates of the

observation are plotted in turn.

Characteristics of a univariate distribution of essential interest are the

mean as a measure of location and the standard deviation as a measure of

variability; similarly the mean and standard deviation of a univariate sample

are important summary measures. In multivariate analysis, the means and

variances of the separate measurements-for distributions and for samples

-have corresponding relevance. An essential aspect, however, of multivariate analysis is the dependence between the different variables. The dependence between two variables may involve the covariance between them, that

is, the average products of their deviations from their respective means. The

covariance standardized by the corresponding standard deviations is the

correlation coefficient; it serves as a measure of degree of depend~nce. A set

of summary statistics is the mean vector (consisting of the univariate means)

and the covariance matrix (consisting of the univariate variances and bivariate covariances). An alternative set of summary statistics with the same

information is the mean vector, the set of' standard deviations, and the

correlation matrix. Similar parameter quantities describe location, variability,

and dependence in the population or for a probability distribution. The

multivariate nonnal distribution is completely determined by its mean vector

and covariance matrix~ and the sample mean vector and covariance matrix

constitute a sufficient set of statistics.

The measurement and analysis of dependence between variables~ between

sets of variables, and between variables and sets of variables are fundamental

to multivariate analysis. The multiple correlation coefficient is an extension

of the notion of correlation to the relationship of one variable to a set of

variables. The partial correlation coefficient is a measure of dependence

between two variables when the effects of other correlated variables have

been removed. The various correlation coefficients computed from samples

are used to estimate corresponding correlation coefficientS of distributions.

In this hook tests or hypothe~es or independence are developed. The properties of the estimators and test proredures are studied for sampling from the

multivariate normal distribution.

A number of statistical problems arising in multivariate populations are

straightforward analogs of problems arising in univariate populations; the

suitable methods for handling these problems are similarly related. For

example, ill the univariate case we may wish to test the hypothesis that the

mean of a variable is zero; in the multivariate case we may wish to test the

hypothesis that the vector of the means of several variables is the zero vector.

The analog of the Student t-test for the first hypOthesis is the generalized

T 2-test. The analysis of variance of a single variable is adapted to vector

1.2 THE ML"LTIVARIATE NORMAL DISTRIBUTION

3

observations; in regression analysis, the dependent quantity may be a vector

variable. A comparison of variances is generalized into a comparison of

covariance matrices.

The test procedures of univariate statistics are generalized to the multivariate case in such ways that the dependence between variables is taken into account. These methods may not depend on the coordinate system; that is, the procedures may be invariant with respect to linear transformations that leave the null hypothesis invariant. In some problems there may be families of tests that are invariant; then choices must be made. Optimal properties of the tests are considered.

For some other purposes, however, it may be important to select a coordinate system so that the variates have desired statistical properties. One might say that these purposes involve characterizations of inherent properties of normal distributions and of samples. These are closely related to the algebraic problems of canonical forms of matrices. An example is finding the normalized linear combination of variables with maximum or minimum variance (finding principal components); this amounts to finding a rotation of axes that carries the covariance matrix to diagonal form. Another example is characterizing the dependence between two sets of variates (finding canonical correlations). These problems involve the characteristic roots and vectors of various matrices. The statistical properties of the corresponding sample quantities are treated.
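In two dimensions the rotation that carries a covariance matrix to diagonal form has a closed form, so the idea can be sketched without an eigenvalue routine. The sketch assumes a 2 × 2 covariance matrix with entries σ11, σ12, σ22 and uses a hypothetical function name; the angle θ satisfies tan 2θ = 2σ12/(σ11 − σ22), and the first rotated coordinate is the normalized linear combination with maximum variance.

```python
import math
from typing import Tuple


def principal_axes_2x2(s11: float, s12: float, s22: float) -> Tuple[float, float, float, float]:
    """Rotate the axes so that a 2 x 2 covariance matrix becomes diagonal.

    Returns (theta, var1, var2, cov12) for the rotated coordinates
        Y1 =  cos(theta) X1 + sin(theta) X2
        Y2 = -sin(theta) X1 + cos(theta) X2,
    where var1 >= var2 are the characteristic roots and cov12 is zero up to rounding.
    """
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    c, s = math.cos(theta), math.sin(theta)
    var1 = c * c * s11 + 2.0 * c * s * s12 + s * s * s22
    var2 = s * s * s11 - 2.0 * c * s * s12 + c * c * s22
    cov12 = c * s * (s22 - s11) + s12 * (c * c - s * s)
    return theta, var1, var2, cov12


theta, v1, v2, c12 = principal_axes_2x2(2.0, 1.0, 2.0)
print(math.degrees(theta), v1, v2, c12)  # about 45 degrees, 3, 1, 0
```

The variances of the rotated coordinates sum to the trace of the covariance matrix and multiply to its determinant, as the characteristic roots must.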

Some statistical problems arise in models in which means and covariances are restricted. Factor analysis may be based on a model with a (population) covariance matrix that is the sum of a positive definite diagonal matrix and a positive semidefinite matrix of low rank; linear structural relationships may have a similar formulation. The simultaneous equations system of econometrics is another example of a special model.
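The factor-analysis structure Σ = ΛΛ′ + Ψ can be made concrete with a small sketch, assuming a one-factor model for three variables with illustrative loadings Λ and uniquenesses Ψ (made-up values; hypothetical function names). ΛΛ′ is positive semidefinite with rank equal to the number of factors, and adding the positive definite diagonal matrix Ψ makes Σ positive definite, which a Cholesky factorization confirms.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def factor_covariance(loadings: Sequence[Sequence[float]], psi: Sequence[float]) -> Matrix:
    """Sigma = Lambda Lambda' + Psi for a p x m loading matrix and diagonal Psi."""
    p, m = len(loadings), len(loadings[0])
    return [[sum(loadings[i][k] * loadings[j][k] for k in range(m))
             + (psi[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def is_positive_definite(a: Matrix) -> bool:
    """Try a Cholesky factorization; it succeeds exactly when a symmetric matrix is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


# One common factor for three variables (illustrative numbers).
lam = [[0.9], [0.8], [0.7]]
psi = [0.19, 0.36, 0.51]
sigma = factor_covariance(lam, psi)
for row in sigma:
    print([round(v, 4) for v in row])
print(is_positive_definite(sigma))  # True
```

Every off-diagonal covariance comes from the common factor alone (for example σ12 = λ1λ2 = 0.72), while each diagonal entry adds the variable's own uniqueness.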

1.2. THE MULTIVARIATE NORMAL DISTRIBUTION

The statistical methods treated in this book can be developed and evaluated in the context of the multivariate normal distribution, though many of the procedures are useful and effective when the distribution sampled is not normal. A major reason for basing statistical analysis on the normal distribution is that this probabilistic model approximates well the distribution of continuous measurements in many sampled populations. In fact, most of the methods and theory have been developed to serve statistical analysis of data.

Mathematicians such as Adrain (1808), Laplace (1811), Plana (1813), Gauss (1823), and Bravais (1846) studied the bivariate normal density. Francis Galton, the geneticist, introduced the ideas of correlation, regression, and homoscedasticity in the study of pairs of measurements, one made on a parent and one on an offspring. [See, e.g., Galton (1889).] He enunciated the theory of the multivariate normal distribution as a generalization of observed properties of samples.

Karl Pearson and others carried on the development of the theory and use of different kinds of correlation coefficients† for studying problems in genetics, biology, and other fields. R. A. Fisher further developed methods for agriculture, botany, and anthropology, including the discriminant function for classification problems. In another direction, analysis of scores of mental tests led to a theory, including factor analysis, the sampling theory of which is based on the normal distribution. In these cases, as well as in agricultural experiments, in engineering problems, in certain economic problems, and in other fields, the multivariate normal distributions have been found to be sufficiently close approximations to the populations so that statistical analyses based on these models are justified.

The univariate normal distribution arises frequently because the effect studied is the sum of many independent random effects. Similarly, the multivariate normal distribution often occurs because the multiple measurements are sums of small independent effects. Just as the central limit theorem leads to the univariate normal distribution for single variables, so does the general central limit theorem for several variables lead to the multivariate normal distribution.
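The central-limit argument can be illustrated by simulation under assumptions chosen purely for illustration: each of two measurements is the sum of 30 small independent effects, uniform on (−1/2, 1/2), and 15 of those effects are shared by both measurements. Each sum then has variance 30/12 = 2.5 and the pair has correlation 15/30 = 0.5. The sketch (hypothetical names, fixed seed) checks that the sample correlation is near 0.5 and that about 68% of the first measurements fall within one standard deviation of zero, as for a normal variable.

```python
import math
import random
from typing import List, Tuple


def simulate_pairs(n_obs: int, n_shared: int = 15, n_own: int = 15,
                   seed: int = 12345) -> List[Tuple[float, float]]:
    """Each measurement is a sum of small uniform effects; n_shared effects are common to both."""
    rng = random.Random(seed)

    def effects(k: int) -> float:
        return sum(rng.uniform(-0.5, 0.5) for _ in range(k))

    pairs = []
    for _ in range(n_obs):
        common = effects(n_shared)
        pairs.append((common + effects(n_own), common + effects(n_own)))
    return pairs


def correlation(pairs: List[Tuple[float, float]]) -> float:
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    return s12 / math.sqrt(s11 * s22)


pairs = simulate_pairs(20000)
r = correlation(pairs)
sd = math.sqrt(30 / 12)  # standard deviation of a sum of 30 uniform(-1/2, 1/2) effects
within_one_sd = sum(abs(a) <= sd for a, _ in pairs) / len(pairs)
print(round(r, 3), round(within_one_sd, 3))  # near 0.5 and 0.683
```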

Statistical theory based on the normal distribution has the advantage that the multivariate methods based on it are extensively developed and can be studied in an organized and systematic way. This is due not only to the need for such methods because they are of practical use, but also to the fact that normal theory is amenable to exact mathematical treatment. The suitable methods of analysis are mainly based on standard operations of matrix algebra; the distributions of many statistics involved can be obtained exactly or at least characterized; and in many cases optimum properties of procedures can be deduced.

The point of view in this book is to state problems of inference in terms of the multivariate normal distributions, develop efficient and often optimum methods in this context, and evaluate significance and confidence levels in these terms. This approach gives coherence and rigor to the exposition, but, by its very nature, cannot exhaust consideration of multivariate statistical analysis. The procedures are appropriate to many nonnormal distributions, but their adequacy may be open to question. Roughly speaking, inferences about means are robust because of the operation of the central limit theorem, but inferences about covariances are sensitive to normality, the variability of sample covariances depending on fourth-order moments.

†For a detailed study of the development of the ideas of correlation, see Walker (1931).

This inflexibility of normal methods with respect to moments of order

greater than two can be reduced by including a larger class of elliptically

contoured distributions. In the univariate case the normal distribution is

determined by the mean and variance; higher-order moments and properties

such as peakedness and long tails are functions of the mean and variance.

Similarly, in the multivariate case the means and covariances or the means,

variances, and correlations determine all of the properties of the distribution.

That limitation is alleviated in one respect by consideration of a broad class

of elliptically contoured distributions. That class maintains the dependence

structure, but permits more general peakedness and long tails. This study

leads to more robust methods.

The development of computer technology has revolutionized multivariate statistics in several respects. As in univariate statistics, modern computers permit the evaluation of observed variability and significance of results by resampling methods, such as the bootstrap and cross-validation. Such methodology reduces reliance on tables of significance points and eliminates some restrictions of the normal distribution.
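A minimal sketch of the bootstrap applied to the sample correlation coefficient: resample the N observation pairs with replacement, recompute r each time, and read off a percentile interval. The data, seed, number of resamples, and function names are illustrative assumptions, and the percentile interval is only one of several bootstrap intervals in use.

```python
import math
import random
from typing import List, Sequence, Tuple


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlation(xs: Sequence[float], ys: Sequence[float], n_boot: int = 2000,
                          level: float = 0.95, seed: int = 2003) -> Tuple[float, float, List[float]]:
    """Percentile bootstrap interval for the correlation coefficient."""
    rng = random.Random(seed)
    n = len(xs)
    reps: List[float] = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # r is undefined if a resampled variable is constant
        reps.append(correlation(bx, by))
    reps.sort()
    alpha = (1.0 - level) / 2.0
    lo = reps[int(alpha * n_boot)]
    hi = reps[int((1.0 - alpha) * n_boot) - 1]
    return lo, hi, reps


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 4.0, 8.0, 5.0, 9.0, 7.0, 10.0, 8.0]  # made-up data
r_hat = correlation(xs, ys)
lo, hi, reps = bootstrap_correlation(xs, ys)
print(round(r_hat, 3), (round(lo, 3), round(hi, 3)))
```

No table of the sampling distribution of r is consulted; the interval comes entirely from the resampled data.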

Nonparametric techniques are available when nothing is known about the underlying distributions. Space does not permit inclusion of these topics as well as other considerations of data analysis, such as treatment of outliers and transformations of variables to approximate normality and homoscedasticity.

The availability of modern computer facilities makes possible the analysis of large data sets, and that ability permits the application of multivariate methods to new areas, such as image analysis, and more effective analysis of data, such as meteorological data. Moreover, new problems of statistical analysis arise, such as sparseness of parameter or data matrices. Because hardware and software development is so explosive and programs require specialized knowledge, we are content to make a few remarks here and there about computation. Packages of statistical programs are available for most of the methods.

CHAPTER 2

The Multivariate Normal Distribution

2.1. INTRODUCTION

In this chapter we discuss the multivariate normal distribution and some of its properties. In Section 2.2 are considered the fundamental notions of multivariate distributions: the definition by means of multivariate density functions, marginal distributions, conditional distributions, expected values, and moments. In Section 2.3 the multivariate normal distribution is defined; the parameters are shown to be the means, variances, and covariances or the means, variances, and correlations of the components of the random vector. In Section 2.4 it is shown that linear combinations of normal variables are normally distributed and hence that marginal distributions are normal. In Section 2.5 we see that conditional distributions are also normal with means that are linear functions of the conditioning variables; the coefficients are regression coefficients. The conditional variances, covariances, and correlations (the latter called partial correlations) are constants; that is, they do not depend on the values of the conditioning variables. The multiple correlation coefficient is the maximum correlation between a scalar random variable and a linear combination of other random variables; it is a measure of association between one variable and a set of others. The fact that marginal and conditional distributions of normal distributions are normal makes the treatment of this family of distributions coherent. In Section 2.6 the characteristic function, moments, and cumulants are discussed. In Section 2.7 elliptically contoured distributions are defined; the properties of the normal distribution are extended to this larger class of distributions.
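The conditional-distribution facts summarized above for Section 2.5 can be previewed numerically. Assuming the standard partitioned-normal formulas for conditioning on a single variable, the conditional mean of the remaining components is μ1 + σ12(x − μ2)/σ22 and the conditional covariance matrix is Σ11 − σ12σ12′/σ22. The function name and the equicorrelated example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def condition_on_last(mu: Sequence[float], sigma: Sequence[Sequence[float]],
                      x_last: float) -> Tuple[List[float], List[float], Matrix]:
    """Conditional distribution of (X1, ..., X_{p-1}) given X_p = x_last for a normal vector.

    Returns the regression coefficients, the conditional mean vector, and the
    conditional covariance matrix; only the mean depends on x_last.
    """
    q = len(mu) - 1
    s_qq = sigma[q][q]
    beta = [sigma[i][q] / s_qq for i in range(q)]  # regression coefficients on X_p
    cond_mean = [mu[i] + beta[i] * (x_last - mu[q]) for i in range(q)]
    cond_cov = [[sigma[i][j] - sigma[i][q] * sigma[j][q] / s_qq for j in range(q)]
                for i in range(q)]
    return beta, cond_mean, cond_cov


mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
beta, m, c = condition_on_last(mu, sigma, 2.0)
partial_r = c[0][1] / math.sqrt(c[0][0] * c[1][1])  # partial correlation of X1, X2 given X3
print(beta, m, c, partial_r)
```

Changing the conditioning value moves the conditional mean along the regression line but leaves the conditional covariance matrix, and hence the partial correlation, unchanged.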

An Introduction to Multivariate Statistical Analysis, Third Edition. By T. W. Anderson. ISBN 0-471-36091-0. Copyright © 2003 John Wiley & Sons, Inc.


2.2. NOTIONS OF MULTIVARIATE DISTRIBUTIONS

2.2.1. Joint Distributions

In this section we shall consider the notions of joint distributions of several variables, derived marginal distributions of subsets of variables, and derived conditional distributions. First consider the case of two (real) random variables† X and Y. Probabilities of events defined in terms of these variables can be obtained by operations involving the cumulative distribution function (abbreviated as cdf),

$$F(x, y) = \Pr\{X \le x,\ Y \le y\}, \tag{1}$$

defined for every pair of real numbers (x, y). We are interested in cases where F(x, y) is absolutely continuous; this means that the following partial derivative exists almost everywhere:

$$\frac{\partial^2 F(x, y)}{\partial x\, \partial y} = f(x, y), \tag{2}$$

and

$$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v)\, du\, dv. \tag{3}$$
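Equations (2) and (3) can be checked numerically for one concrete case, assuming X and Y are independent standard normal variables, so that F(x, y) = Φ(x)Φ(y) and f(x, y) = φ(x)φ(y). The sketch (hypothetical names) approximates the mixed partial derivative in (2) by a central difference and the double integral in (3) by a midpoint rule truncated at −8, where the remaining normal tail is negligible.

```python
import math
from typing import Callable

Func2 = Callable[[float, float], float]


def std_normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def std_normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def joint_cdf(x: float, y: float) -> float:
    """F(x, y) for independent standard normal X and Y (an assumed example)."""
    return std_normal_cdf(x) * std_normal_cdf(y)


def joint_pdf(x: float, y: float) -> float:
    return std_normal_pdf(x) * std_normal_pdf(y)


def mixed_partial(F: Func2, x: float, y: float, h: float = 1e-3) -> float:
    """Central-difference approximation to the mixed partial derivative in (2)."""
    return (F(x + h, y + h) - F(x + h, y - h) - F(x - h, y + h) + F(x - h, y - h)) / (4.0 * h * h)


def cdf_from_density(f: Func2, x: float, y: float, lower: float = -8.0, n: int = 400) -> float:
    """Midpoint-rule approximation to the double integral in (3), truncated at `lower`."""
    hu, hv = (x - lower) / n, (y - lower) / n
    total = 0.0
    for i in range(n):
        u = lower + (i + 0.5) * hu
        for j in range(n):
            v = lower + (j + 0.5) * hv
            total += f(u, v)
    return total * hu * hv


x0, y0 = 0.5, -0.3
print(mixed_partial(joint_cdf, x0, y0), joint_pdf(x0, y0))
print(cdf_from_density(joint_pdf, x0, y0), joint_cdf(x0, y0))
```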

The nonnegative function f(x, y) is called the density of X and Y. The pair of random variables (X, Y) defines a random point in a plane. The probability that (X, Y) falls in a rectangle is

$$\begin{aligned} \Pr\{x \le X \le x + \Delta x,\ y \le Y \le y + \Delta y\} &= F(x + \Delta x, y + \Delta y) - F(x + \Delta x, y) - F(x, y + \Delta y) + F(x, y) \\ &= \int_{y}^{y + \Delta y} \int_{x}^{x + \Delta x} f(u, v)\, du\, dv \end{aligned} \tag{4}$$

($\Delta x > 0$, $\Delta y > 0$). The probability of the random point (X, Y) falling in any set E for which the following integral is defined (that is, any measurable set E) is

$$\Pr\{(X, Y) \in E\} = \iint_E f(x, y)\, dx\, dy. \tag{5}$$
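Under the same illustrative assumption (X and Y independent standard normal; hypothetical names), the sketch below evaluates the rectangle probability in (4) by the four-term combination of cdf values and compares it with the product of the two one-dimensional probabilities, which must agree under independence. It then approximates (5) for a set E that is not a rectangle, the unit disk, where the exact answer is 1 − e^(−1/2) because X² + Y² has a chi-squared distribution with 2 degrees of freedom.

```python
import math
from typing import Callable


def std_normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def joint_cdf(x: float, y: float) -> float:
    """F(x, y) for independent standard normal X and Y (assumed example)."""
    return std_normal_cdf(x) * std_normal_cdf(y)


def joint_pdf(x: float, y: float) -> float:
    """f(x, y) for independent standard normal X and Y."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def rectangle_prob(x: float, y: float, dx: float, dy: float) -> float:
    """Pr{x <= X <= x + dx, y <= Y <= y + dy} from cdf values, as in (4)."""
    return joint_cdf(x + dx, y + dy) - joint_cdf(x + dx, y) - joint_cdf(x, y + dy) + joint_cdf(x, y)


def prob_in_set(indicator: Callable[[float, float], bool],
                half_width: float = 1.0, n: int = 400) -> float:
    """Midpoint-rule approximation to the integral of f over E, as in (5).

    E must lie inside the square [-half_width, half_width]^2; `indicator`
    returns True for points in E.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        u = -half_width + (i + 0.5) * h
        for j in range(n):
            v = -half_width + (j + 0.5) * h
            if indicator(u, v):
                total += joint_pdf(u, v)
    return total * h * h


p_rect = rectangle_prob(-0.5, 0.2, 1.0, 0.8)
p_indep = (std_normal_cdf(0.5) - std_normal_cdf(-0.5)) * (std_normal_cdf(1.0) - std_normal_cdf(0.2))
p_disk = prob_in_set(lambda u, v: u * u + v * v <= 1.0)
print(p_rect, p_indep)
print(p_disk, 1.0 - math.exp(-0.5))
```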

†In Chapter 2 we shall distinguish between random variables and running variables by use of capital and lowercase letters, respectively. In later chapters we may be unable to hold to this convention because of other complications of notation.
