Running head: DATA ANALYSIS SUN COAST PROJECT

Data Analysis Sun Coast Project

Nguyen Tien Thanh

ID: 280113

Columbia Southern University


Data Analysis: Correlation, Regression, t Test, and ANOVA

Sun Coast Remediation's data meet the assumptions of, and are appropriate for, parametric statistical procedures. To draw further conclusions, this assignment applies the following analyses: correlation analysis, simple regression analysis, multiple regression analysis, an independent samples t test, a dependent (paired) samples t test, and ANOVA. The results and conclusions from these analyses will support sound decision making.

Correlation Analysis

The hypotheses:

H01: There is no relationship between particulate matter (PM) size and the number of employee sick days.

HA1: There is a relationship between particulate matter (PM) size and the number of employee sick days.

Data output results from Excel ToolPak:

Correlation

| | Microns | Mean annual sick days per employee |
|---|---|---|
| Microns | 1 | |
| Mean annual sick days per employee | -0.715984185 | 1 |

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.715984185 |
| R Square | 0.512633354 |
| Adjusted R Square | 0.507807941 |
| Standard Error | 1.327783455 |
| Observations | 103 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 187.2953239 | 187.2953239 | 106.2361758 | 1.89059E-17 |
| Residual | 101 | 178.0638994 | 1.763008905 | | |
| Total | 102 | 365.3592233 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 10.08144483 | 0.315156969 | 31.9886464 | 1.16929E-54 | 9.456258184 | 10.70663148 | 9.456258184 | 10.70663148 |
| Microns | -0.522376554 | 0.050681267 | -10.30709347 | 1.89059E-17 | -0.622914554 | -0.421838554 | -0.622914554 | -0.421838554 |

The Pearson correlation coefficient is r = -0.716, which means that particulate matter size, as measured in microns, is strongly and negatively correlated with mean annual sick days per employee. The coefficient of determination is r² = 0.51, which means that 51% of the variability in employee sick days is explained by particulate matter size.

The p value for microns is 1.89E-17, which is smaller than the alpha of 0.05. Because the p value is smaller than alpha, the null hypothesis is rejected and the alternative hypothesis is accepted: there is a statistically significant relationship between particulate matter size and employee sick days.
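As a rough check on these figures, the t statistic for a Pearson correlation can be recovered from r and the sample size alone. The sketch below is illustrative and is not part of the Excel output; the function name `pearson_t` is our own, and it assumes the standard formula t = r·sqrt(n − 2)/sqrt(1 − r²).

```python
import math


def pearson_t(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Values taken from the Excel output reported above.
r = -0.715984185
n = 103

r_squared = r * r            # share of variability explained
t_stat = pearson_t(r, n)     # test statistic for the correlation

print(f"r^2 = {r_squared:.4f}")
print(f"t   = {t_stat:.4f} on {n - 2} degrees of freedom")
```

The result should agree with the t Stat that Excel reports for Microns, because in a simple regression the test of the slope and the test of the correlation are equivalent.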

Simple Regression Analysis

Restate the hypotheses:

H02: There is no relationship between safety training expenditure and lost time hours.

HA2: There is a relationship between safety training expenditure and lost time hours.

Data output results from Excel ToolPak:

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.939559324 |
| R Square | 0.882771723 |
| Adjusted R Square | 0.882241279 |
| Standard Error | 24.61328875 |
| Observations | 223 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 1008202.105 | 1008202.105 | 1664.210687 | 7.6586E-105 |
| Residual | 221 | 133884.8903 | 605.8139831 | | |
| Total | 222 | 1142086.996 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 273.449419 | 2.665261963 | 102.5975768 | 2.1412E-188 | 268.1968373 | 278.7020007 | 268.1968373 | 278.7020007 |
| Safety training expenditure | -0.143367741 | 0.003514368 | -40.79473848 | 7.6586E-105 | -0.150293705 | -0.136441778 | -0.150293705 | -0.136441778 |

The value of Multiple R is 0.94, close to 1, which means that there is a strong correlation between safety training expenditure and lost time hours. The value of R square (R²) is 0.88, which indicates that 88% of the variation in lost time hours is explained by the regression model. This is a high R².
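To show where these summary values come from, the short sketch below recomputes R², Multiple R, and F from the sums of squares in the ANOVA portion of the Excel output. It is an illustrative check only; the variable names are our own.

```python
# Sums of squares and degrees of freedom from the Excel ANOVA output above.
ss_regression = 1008202.105
ss_residual = 133884.8903
df_regression, df_residual = 1, 221

ss_total = ss_regression + ss_residual

# R squared is the share of total variation captured by the regression.
r_squared = ss_regression / ss_total
multiple_r = r_squared ** 0.5

# F is the ratio of the regression mean square to the residual mean square.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R^2 = {r_squared:.4f}, Multiple R = {multiple_r:.4f}, F = {f_stat:.2f}")
```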

The p value of 7.66E-105 is smaller than the alpha of 0.05, so the null hypothesis is rejected and the alternative hypothesis is accepted: there is a relationship between safety training expenditure and lost time hours.

The coefficient for safety training expenditure is -0.143, indicating a negative relationship: lost time hours decrease as safety training expenditure increases. The model can be expressed as a predictive equation:

Y = a + bX

Lost time hours = 273.45 + (-0.143)(safety training expenditure).
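The predictive equation can be applied directly, as in the minimal sketch below. The function name and the example expenditure values (0, 500, and 1000) are hypothetical and chosen only for illustration; the units are assumed to be those of the original data set, which the Excel output does not state.

```python
# Coefficients from the Excel regression output above.
INTERCEPT = 273.449419   # a: predicted lost time hours at zero expenditure
SLOPE = -0.143367741     # b: change in lost time hours per unit of expenditure


def predict_lost_time_hours(expenditure: float) -> float:
    """Apply Y = a + bX for a given safety training expenditure."""
    return INTERCEPT + SLOPE * expenditure


# Hypothetical expenditure values, for illustration only.
for expenditure in (0.0, 500.0, 1000.0):
    hours = predict_lost_time_hours(expenditure)
    print(f"expenditure = {expenditure:7.1f} -> lost time hours = {hours:.2f}")
```

Predictions are only meaningful within the range of expenditures observed in the data; extrapolating far beyond it can yield implausible values such as negative hours.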

Multiple Regression Analysis

Restate the hypotheses:

H03: There is no relationship between decibel level and frequency, angle in degrees, chord length, velocity, and displacement.

HA3: There is a relationship between decibel level and frequency, angle in degrees, chord length, velocity, and displacement.

Data output results from Excel ToolPak:

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.601841822 |
| R Square | 0.362213579 |
| Adjusted R Square | 0.360083364 |
| Standard Error | 5.51856585 |
| Observations | 1503 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 25891.88784 | 5178.377569 | 170.0361467 | 2.1289E-143 |
| Residual | 1497 | 45590.48986 | 30.45456904 | | |
| Total | 1502 | 71482.3777 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 126.8224555 | 0.623820253 | 203.2996763 | 0 | 125.5988009 | 128.0461101 | 125.5988009 | 128.0461101 |
| Frequency (Hz) | -0.0011169 | 4.7551E-05 | -23.48846042 | 4.0652E-104 | -0.001210174 | -0.001023627 | -0.001210174 | -0.001023627 |
| Angle in Degrees | 0.047342353 | 0.037308069 | 1.268957462 | 0.204653501 | -0.025839288 | 0.120523993 | -0.025839288 | 0.120523993 |
| Chord Length | -5.495318335 | 2.927962181 | -1.876840613 | 0.060734309 | -11.23866234 | 0.248025671 | -11.23866234 | 0.248025671 |
| Velocity (Meters per Second) | 0.083239634 | 0.009300188 | 8.950317436 | 1.02398E-18 | 0.064996851 | 0.101482417 | 0.064996851 | 0.101482417 |
| Displacement | -240.5059086 | 16.51902666 | -14.55932686 | 5.20583E-45 | -272.9088041 | -208.103013 | -272.9088041 | -208.103013 |

The value of Multiple R is 0.60, which means that frequency, angle in degrees, chord length, velocity, and displacement are together moderately correlated with decibel level. R square (R²) is 0.36, which means that 36% of the variability in decibel level is explained by frequency, angle in degrees, chord length, velocity, and displacement. This is a relatively weak R².
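With five predictors, the Adjusted R Square in the Excel output is the more conservative figure, since it penalizes R² for the number of predictors. The sketch below reproduces it from R², the number of observations, and the number of predictors, assuming the standard adjustment formula; the function name is our own.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values from the Excel output above: R Square, Observations, and 5 predictors.
adj_r2 = adjusted_r_squared(r_squared=0.362213579, n_obs=1503, n_predictors=5)

print(f"Adjusted R^2 = {adj_r2:.6f}")
```

Because the sample is large relative to the number of predictors, the adjustment is very small here.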

Comparing the p value of each variable with an alpha of 0.05:

For Frequency (Hz), the p value of 4.07E-104 is less than 0.05; therefore, frequency is a statistically significant predictor of decibel level.

For Angle in Degrees, the p value of 0.20 is greater than 0.05; therefore, angle in degrees is not a statistically significant predictor of decibel level.

For Chord Length, the p value of 0.06 is greater than 0.05; therefore, chord length is not a statistically significant predictor of decibel level.

For Velocity (meters per second), the p value of 1.02E-18 is less than 0.05; therefore, velocity is a statistically significant predictor of decibel level.

For Displacement, the p value of 5.21E-45 is less than 0.05; therefore, displacement is a statistically significant predictor of decibel level.

In summary, frequency, velocity, and displacement each have a statistically significant relationship with decibel level. The coefficients for frequency (-0.001) and displacement (-240.5) are negative, indicating that decibel level decreases as either variable increases, holding the other variables constant. The coefficient for velocity is 0.083, indicating a positive relationship between velocity and decibel level.

The predictive equation is expressed as follows:

Y = a + b1X1 + b2X2 + … + bnXn

Decibel level = 126.8 + (-0.001)(Frequency (Hz)) + 0.083(Velocity) + (-240.5)(Displacement).
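The equation can be written as a small function, as sketched below. Following the text, only the three statistically significant predictors are included. The function name, the dictionary keys, and the example input values are hypothetical and serve only to illustrate the calculation.

```python
# Intercept and coefficients from the Excel output above,
# limited to the predictors found statistically significant.
INTERCEPT = 126.8224555
COEFFICIENTS = {
    "frequency_hz": -0.0011169,
    "velocity_mps": 0.083239634,
    "displacement": -240.5059086,
}


def predict_decibel_level(frequency_hz: float, velocity_mps: float, displacement: float) -> float:
    """Apply Y = a + b1*X1 + b2*X2 + b3*X3 to one set of predictor values."""
    inputs = {
        "frequency_hz": frequency_hz,
        "velocity_mps": velocity_mps,
        "displacement": displacement,
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in inputs.items())


# Hypothetical input values, for illustration only.
example = predict_decibel_level(frequency_hz=1000.0, velocity_mps=50.0, displacement=0.01)
print(f"Predicted decibel level: {example:.2f}")
```

Note that the coefficients were estimated with all five predictors in the model; a model refitted with only the three significant predictors could yield somewhat different values.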

Independent Sample t Test

Restate the hypotheses:

H04: The revised new employee training is not more effective than the prior training.

HA4: The revised new employee training is more effective than the prior training.

Data output results from Excel ToolPak:

t-Test: Two-Sample Assuming Unequal Variances

| | Prior Training | Revised Training |
|---|---|---|
| Mean | 69.79032258 | 84.77419355 |
| Variance | 122.004495 | 26.96456901 |
| Observations | 62 | 62 |
| Hypothesized Mean Difference | 0 | |
| df | 87 | |
| t Stat | -9.666557191 | |
| P(T<=t) one-tail | 9.69914E-16 | |
| t Critical one-tail | 1.662557349 | |
| P(T<=t) two-tail | 1.93983E-15 | |
| t Critical two-tail | 1.987608282 | |

The mean of the Prior Training group (69.8) is lower than the mean of the Revised Training group (84.8). Because the hypotheses are directional, the one-tailed p value of 9.70E-16 applies; it is far smaller than the alpha of 0.05, as is the two-tailed p value of 1.94E-15. Thus, the null hypothesis is rejected and the alternative hypothesis is accepted: the revised new employee training is more effective than the prior training.
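The t statistic and degrees of freedom in the Excel output can be reproduced from the group means, variances, and sample sizes. The sketch below assumes the unequal-variances (Welch) form of the test, as named in the Excel output, with the Welch-Satterthwaite approximation for the degrees of freedom; the function name is our own.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for a two-sample t test assuming unequal variances."""
    v1 = var1 / n1   # squared standard error of the first mean
    v2 = var2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from the Excel output: Prior Training, then Revised Training.
t_stat, df = welch_t(69.79032258, 122.004495, 62, 84.77419355, 26.96456901, 62)

print(f"t = {t_stat:.4f}, df = {df:.2f} (Excel reports the rounded value)")
```

The negative sign of t simply reflects the order of subtraction (prior minus revised); a lower prior mean is consistent with the revised training being more effective.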

Dependent Sample t Test

Restate the hypotheses:

H05: There is no increase in blood lead level from the pre-exposure baseline measurements.

HA5: There is an increase in blood lead level from the pre-exposure baseline measurements.

Data output results from Excel ToolPak:

t-Test: Paired Two Sample for Means

| | Pre-Exposure μg/dL | Post-Exposure μg/dL |
|---|---|---|
| Mean | 32.85714286 | 33.28571429 |
| Variance | 150.4583333 | 155.5 |
| Observations | 49 | 49 |
| Pearson Correlation | 0.992236043 | |
| Hypothesized Mean Difference | 0 | |
| df | 48 | |
| t Stat | -1.929802563 | |
| P(T<=t) one-tail | 0.029776357 | |
| t Critical one-tail | 1.677224196 | |
| P(T<=t) two-tail | 0.059552714 | |
| t Critical two-tail | 2.010634758 | |

There is a very slight increase in the mean blood lead level, from 32.9 μg/dL pre-exposure to 33.3 μg/dL post-exposure. The two-tailed p value of 0.0596 is greater than the alpha of 0.05, so on the two-tailed test the null hypothesis is not rejected: there is no statistically significant difference in blood lead levels between the pre-exposure and post-exposure measurements. Because the hypotheses are directional, however, the one-tailed p value of 0.0298 in the same output is below 0.05, so this conclusion depends on using the two-tailed test.
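The paired t statistic can be reconstructed from the summary figures in the Excel output, because the variance of the paired differences follows from the two variances and the Pearson correlation between the measurements. The sketch below is an illustrative check under that identity; the function name is our own.

```python
import math


def paired_t_from_summary(mean1, var1, mean2, var2, r, n):
    """Paired t statistic from means, variances, correlation, and pair count."""
    # Var(X1 - X2) = Var(X1) + Var(X2) - 2 * r * SD(X1) * SD(X2)
    var_diff = var1 + var2 - 2.0 * r * math.sqrt(var1 * var2)
    standard_error = math.sqrt(var_diff / n)
    return (mean1 - mean2) / standard_error


# Summary statistics from the Excel output: pre-exposure, then post-exposure.
t_stat = paired_t_from_summary(
    mean1=32.85714286, var1=150.4583333,
    mean2=33.28571429, var2=155.5,
    r=0.992236043, n=49,
)

print(f"t = {t_stat:.4f} on {49 - 1} degrees of freedom")
```

The very high correlation between the two measurements is what makes the paired test sensitive to a mean change of less than half a unit.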

ANOVA

Restate the hypotheses:

H06: There are no differences in return on investment among air monitoring, soil remediation, water reclamation, and health and safety training.

HA6: There are differences in return on investment among air monitoring, soil remediation, water reclamation, and health and safety training.

Data output results from Excel ToolPak:

Anova: Single Factor

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| A = Air | 20 | 178 | 8.9 | 9.357894737 |
| B = Soil | 20 | 182 | 9.1 | 3.042105263 |
| C = Water | 20 | 140 | 7 | 6.631578947 |
| D = Training | 20 | 108 | 5.4 | 1.410526316 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 182.8 | 3 | 60.93333333 | 11.92310333 | 1.75888E-06 | 2.72494392 |
| Within Groups | 388.4 | 76 | 5.110526316 | | | |
| Total | 571.2 | 79 | | | | |

The group averages differ noticeably: air monitoring (8.9), soil remediation (9.1), water reclamation (7.0), and health and safety training (5.4). Moreover, the ANOVA p value of 1.76E-06 is less than the alpha of 0.05, and F (11.92) exceeds F crit (2.72); therefore, the null hypothesis is rejected and the alternative hypothesis is accepted: there are statistically significant differences in return on investment among air monitoring, soil remediation, water reclamation, and health and safety training.
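The F statistic can be rebuilt from the SUMMARY portion of the Excel output alone. The sketch below computes the between-groups and within-groups sums of squares from each group's count, average, and variance; it is an illustrative check, and the variable names are our own.

```python
# (count, average, variance) for each group, from the Excel SUMMARY output.
groups = {
    "Air": (20, 8.9, 9.357894737),
    "Soil": (20, 9.1, 3.042105263),
    "Water": (20, 7.0, 6.631578947),
    "Training": (20, 5.4, 1.410526316),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between groups: spread of the group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within groups: pooled spread of observations around their own group mean.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, F = {f_stat:.4f}")
```

A significant F indicates only that at least one group mean differs; identifying which pairs differ would require a follow-up (post hoc) comparison, which is outside the scope of this output.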



