MBA 5652 UNIT VI hong

Running head: DATA ANALYSIS SUN COAST PROJECT 1

Data Analysis Sun Coast Project
Nguyen Tien Thanh
ID: 280113
Columbia Southern University


Data Analysis: Correlation, Regression, t Test, and ANOVA
The Sun Coast Remediation data meet the assumptions of parametric statistical procedures and
are appropriate for them. To draw further conclusions, this assignment applies correlation
analysis, simple regression analysis, multiple regression analysis, an independent samples
t test, a paired samples t test, and ANOVA. The results and conclusions of these analyses will
support sound decision-making.
Correlation Analysis
The hypotheses:
H01: There is no relationship between particulate matter (PM) size and the number of employee sick days.
HA1: There is a relationship between particulate matter (PM) size and the number of employee sick days.
Data output results from Excel Toolpak:
                                      Microns         Mean annual sick days per employee
Microns                               1
Mean annual sick days per employee    -0.715984185    1

Regression Statistics
Multiple R           0.715984185
R Square             0.512633354
Adjusted R Square    0.507807941
Standard Error       1.327783455
Observations         103

ANOVA
              df     SS             MS             F              Significance F
Regression    1      187.2953239    187.2953239    106.2361758    1.89059E-17
Residual      101    178.0638994    1.763008905
Total         102    365.3592233

             Coefficients    Standard Error    t Stat          P-value
Intercept    10.08144483     0.315156969       31.9886464      1.16929E-54
Microns      -0.522376554    0.050681267       -10.30709347    1.89059E-17

             Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept    9.456258184     10.70663148     9.456258184     10.70663148
Microns      -0.622914554    -0.421838554    -0.622914554    -0.421838554

The Pearson correlation coefficient is r = -0.716, which means that particulate matter size,
as measured in microns, is strongly and negatively correlated with mean annual sick days per
employee. The coefficient of determination is r2 = 0.51, which means that 51% of the
variability in employee sick days is explained by particulate matter size.
The p value for microns is 1.89E-17, which is smaller than the alpha of 0.05. When the p value
is smaller than alpha, the null hypothesis is rejected and the alternative hypothesis is
accepted: there is a statistically significant relationship between particulate matter size
and employee sick days.
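As a cross-check on the interpretation above, the following minimal sketch recomputes r2 and the test statistic directly from the reported r and sample size. It assumes the standard t test for a Pearson correlation with n - 2 degrees of freedom; the variable names are illustrative and are not part of the Excel output.

```python
import math

# Values copied from the Excel output above.
r = -0.715984185   # Pearson correlation, microns vs. mean annual sick days
n = 103            # number of observations

r_squared = r ** 2                                   # share of variance explained
df = n - 2                                           # degrees of freedom for the test of r
t_stat = r * math.sqrt(df) / math.sqrt(1 - r_squared)
f_stat = t_stat ** 2                                 # with one predictor, F equals t squared

print(f"r^2 = {r_squared:.4f}, t = {t_stat:.3f}, F = {f_stat:.2f}")
```

Up to rounding, the computed t and F should agree with the t Stat for Microns and the F value in the ANOVA table.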
Simple Regression Analysis
Restate the hypotheses:
H02: There is no relationship between the safety training expenditure and the lost time hours.
HA2: There is a relationship between the safety training expenditure and the lost time hours.
Data output results from Excel Toolpak:
Regression Statistics
Multiple R           0.939559324
R Square             0.882771723
Adjusted R Square    0.882241279
Standard Error       24.61328875
Observations         223

ANOVA
              df     SS             MS             F              Significance F
Regression    1      1008202.105    1008202.105    1664.210687    7.6586E-105
Residual      221    133884.8903    605.8139831
Total         222    1142086.996

                               Coefficients    Standard Error    t Stat          P-value
Intercept                      273.449419      2.665261963       102.5975768     2.1412E-188
Safety training expenditure    -0.143367741    0.003514368       -40.79473848    7.6586E-105

                               Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept                      268.1968373     278.7020007     268.1968373     278.7020007
Safety training expenditure    -0.150293705    -0.136441778    -0.150293705    -0.136441778

The value of Multiple R is 0.94, which is close to 1 and indicates a strong correlation between
the safety training expenditure and the lost time hours. The value of R square (R2) is 0.88,
which indicates that 88% of the variation in the lost time hours is explained by the regression
model. This is a high R2.
The p value of 7.66E-105 is smaller than the alpha of 0.05, so the null hypothesis is rejected
and the alternative hypothesis is accepted. There is a relationship between the safety
training expenditure and the lost time hours.
The coefficient for safety training expenditure is -0.143, indicating a negative relationship:
each additional unit of safety training expenditure is associated with about 0.143 fewer lost
time hours. The model can be expressed as a predictive equation:
Y = a + bX
Lost time hours = 273.45 + (-0.143)(safety training expenditure).
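The predictive equation can be applied as in the minimal sketch below. The function name and the example expenditure values are hypothetical and chosen only for illustration; the coefficients are the unrounded ones from the Excel output, and expenditure is assumed to be in the same units as the data set.

```python
# Coefficients copied from the simple regression output above.
INTERCEPT = 273.449419
SLOPE = -0.143367741  # change in lost time hours per unit of expenditure


def predict_lost_time_hours(expenditure: float) -> float:
    """Return the fitted lost time hours for a given safety training expenditure."""
    return INTERCEPT + SLOPE * expenditure


# Hypothetical expenditure values, not taken from the data set.
for spend in (0, 500, 1000):
    print(spend, round(predict_lost_time_hours(spend), 2))
```

Predictions are only meaningful for expenditure values within the range observed in the sample.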
Multiple Regression Analysis
Restate the hypotheses:
H03: There is no relationship between frequency, angle in degrees, chord length, velocity,
displacement and decibel level.
HA3: There is a relationship between frequency, angle in degrees, chord length, velocity,
displacement and decibel level.
Data output results from Excel Toolpak:
Regression Statistics
Multiple R           0.601841822
R Square             0.362213579
Adjusted R Square    0.360083364
Standard Error       5.51856585
Observations         1503

ANOVA
              df      SS             MS             F              Significance F
Regression    5       25891.88784    5178.377569    170.0361467    2.1289E-143
Residual      1497    45590.48986    30.45456904
Total         1502    71482.3777

                                Coefficients    Standard Error    t Stat          P-value
Intercept                       126.8224555     0.623820253       203.2996763     0
Frequency (Hz)                  -0.0011169      4.7551E-05        -23.48846042    4.0652E-104
Angle in Degrees                0.047342353     0.037308069       1.268957462     0.204653501
Chord Length                    -5.495318335    2.927962181       -1.876840613    0.060734309
Velocity (Meters per Second)    0.083239634     0.009300188       8.950317436     1.02398E-18
Displacement                    -240.5059086    16.51902666       -14.55932686    5.20583E-45

                                Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept                       125.5988009     128.0461101     125.5988009     128.0461101
Frequency (Hz)                  -0.001210174    -0.001023627    -0.001210174    -0.001023627
Angle in Degrees                -0.025839288    0.120523993     -0.025839288    0.120523993
Chord Length                    -11.23866234    0.248025671     -11.23866234    0.248025671
Velocity (Meters per Second)    0.064996851     0.101482417     0.064996851     0.101482417
Displacement                    -272.9088041    -208.103013     -272.9088041    -208.103013


The value of Multiple R is 0.60, which means that frequency, angle in degrees, chord length,
velocity, and displacement taken together are moderately correlated with decibel level.
R square (R2) is 0.36, which means that 36% of the variability in the decibel level is
explained by frequency, angle in degrees, chord length, velocity, and displacement. This is a
weak R2.
Using an alpha of 0.05 to compare with the p value of each variable:
for Frequency (Hz), the p value of 4.07E-104 < 0.05; therefore, Frequency is a statistically
significant predictor of decibel level.
for Angle in Degrees, the p value of 0.20 > 0.05; therefore, Angle in Degrees is not a
statistically significant predictor of decibel level.
for Chord Length, the p value of 0.06 > 0.05; therefore, Chord Length is not a statistically
significant predictor of decibel level.
for Velocity (meters per second), the p value of 1.02E-18 < 0.05; therefore, Velocity is a
statistically significant predictor of decibel level.
and for Displacement, the p value of 5.21E-45 < 0.05; therefore, Displacement is a
statistically significant predictor of decibel level.
In summary, there is a statistically significant relationship between frequency, velocity,
displacement and decibel level. The coefficients for frequency (-0.001) and displacement
(-240.5) indicate a negative relationship between each of these variables and the decibel
level. The coefficient for velocity (0.083) indicates a positive relationship between velocity
and the decibel level.
The predictive equation is expressed as follows:
Y = a + b1X1 + b2X2 +…+ bnXn
Decibel level = 126.8 + (-0.001)(Frequency (Hz)) + (-240.5)(Displacement) +
0.083(Velocity).
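The screening of predictors against alpha and the resulting equation can be sketched as follows. The dictionary keys, function name, and example input values are hypothetical; the coefficients and p values are copied from the Excel output. As in the equation above, the sketch keeps the coefficients of the full five-predictor model and simply omits the non-significant terms, which is a simplification rather than a refitted model.

```python
ALPHA = 0.05
INTERCEPT = 126.8224555

# predictor: (coefficient, p value), as reported in the Excel output above
TERMS = {
    "frequency_hz": (-0.0011169, 4.0652e-104),
    "angle_degrees": (0.047342353, 0.204653501),
    "chord_length": (-5.495318335, 0.060734309),
    "velocity": (0.083239634, 1.02398e-18),
    "displacement": (-240.5059086, 5.20583e-45),
}

# Keep only the predictors whose p value is below alpha.
significant = {name: coef for name, (coef, p) in TERMS.items() if p < ALPHA}


def predict_decibels(inputs: dict) -> float:
    """Apply the predictive equation using only the significant predictors."""
    return INTERCEPT + sum(coef * inputs[name] for name, coef in significant.items())


# Hypothetical input values, not taken from the data set.
example = {"frequency_hz": 1000.0, "velocity": 50.0, "displacement": 0.01}
print(sorted(significant), round(predict_decibels(example), 2))
```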
Independent Sample t Test
Restate the hypotheses:
H04: The revised new employee training is not more effective than the prior training.
HA4: The revised new employee training is more effective than the prior training.
Data output results from Excel Toolpak:
t-Test: Two-Sample Assuming Unequal Variances

                                Prior Training    Revised Training
Mean                            69.79032258       84.77419355
Variance                        122.004495        26.96456901
Observations                    62                62
Hypothesized Mean Difference    0
df                              87
t Stat                          -9.666557191
P(T<=t) one-tail                9.69914E-16
t Critical one-tail             1.662557349
P(T<=t) two-tail                1.93983E-15
t Critical two-tail             1.987608282


The mean value of the Prior Training group (69.8) is lower than the mean value of the Revised
Training group (84.8). In addition, the two-tailed p value of 1.94E-15 (one-tailed: 9.70E-16)
is smaller than the alpha of 0.05. Thus, the null hypothesis is rejected and the alternative
hypothesis is accepted. The revised new employee training is more effective than the prior
training.
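The t statistic and degrees of freedom in the output can be reproduced from the summary statistics alone. The sketch below assumes the usual Welch (unequal variances) formulas, which is what the Excel tool name suggests; the variable names are illustrative.

```python
import math

# Summary statistics copied from the Excel output above.
mean_prior, var_prior, n_prior = 69.79032258, 122.004495, 62
mean_revised, var_revised, n_revised = 84.77419355, 26.96456901, 62

# Squared standard error of each group mean.
se2_prior = var_prior / n_prior
se2_revised = var_revised / n_revised

# Welch t statistic for the difference in means (prior minus revised).
t_stat = (mean_prior - mean_revised) / math.sqrt(se2_prior + se2_revised)

# Welch-Satterthwaite approximation of the degrees of freedom.
welch_df = (se2_prior + se2_revised) ** 2 / (
    se2_prior ** 2 / (n_prior - 1) + se2_revised ** 2 / (n_revised - 1)
)

print(f"t = {t_stat:.4f}, df = {welch_df:.2f} (rounded: {round(welch_df)})")
```

The negative sign of t reflects only the order of subtraction: the prior group mean is lower than the revised group mean.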
Dependent Sample t Test
Restate the hypotheses:
H05: There is no increase in blood lead level from pre-exposure baseline measurements.
HA5: There is an increase in blood lead level from pre-exposure baseline measurements.
Data output results from Excel Toolpak:
t-Test: Paired Two Sample for Means

                                Pre-Exposure μg/dL    Post-Exposure μg/dL
Mean                            32.85714286           33.28571429
Variance                        150.4583333           155.5
Observations                    49                    49
Pearson Correlation             0.992236043
Hypothesized Mean Difference    0
df                              48
t Stat                          -1.929802563
P(T<=t) one-tail                0.029776357
t Critical one-tail             1.677224196
P(T<=t) two-tail                0.059552714
t Critical two-tail             2.010634758


There is a very slight increase in the mean value, from 32.9 μg/dL at Pre-Exposure to
33.3 μg/dL at Post-Exposure. However, the two-tailed p value of 0.060 is greater than the
alpha of 0.05, so the null hypothesis is not rejected and the alternative hypothesis is not
supported: there is no statistically significant difference in blood lead levels between the
Pre-Exposure and Post-Exposure measurements. Note that this conclusion rests on the two-tailed
p value; the one-tailed p value reported in the same output (0.030) is below 0.05.
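The paired t statistic can be recovered from the summary statistics in the output, because the variance of the paired differences follows from the two variances and the Pearson correlation. The sketch below assumes that standard identity; the variable names are illustrative.

```python
import math

# Summary statistics copied from the Excel output above.
mean_pre, var_pre = 32.85714286, 150.4583333
mean_post, var_post = 33.28571429, 155.5
pearson_r = 0.992236043
n = 49

# Var(pre - post) = Var(pre) + Var(post) - 2 * r * SD(pre) * SD(post)
var_diff = var_pre + var_post - 2 * pearson_r * math.sqrt(var_pre * var_post)

# Standard error of the mean difference, then the paired t statistic.
se_diff = math.sqrt(var_diff / n)
t_stat = (mean_pre - mean_post) / se_diff
df = n - 1

print(f"mean difference = {mean_pre - mean_post:.4f}, t = {t_stat:.4f}, df = {df}")
```

The very high correlation between the two measurements is what makes a mean difference of less than half a unit produce a t statistic close to the critical value.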
ANOVA
Restate the hypotheses:
H06: There are no differences in return-on-investment between air monitoring, soil remediation,
water reclamation, and health and safety training.
HA6: There are differences in return-on-investment between air monitoring, soil remediation,
water reclamation, and health and safety training.
Data output results from Excel Toolpak:
Anova: Single Factor

SUMMARY
Groups          Count    Sum    Average    Variance
A = Air         20       178    8.9        9.357894737
B = Soil        20       182    9.1        3.042105263
C = Water       20       140    7          6.631578947
D = Training    20       108    5.4        1.410526316

ANOVA
Source of Variation    SS       df    MS             F              P-value        F crit
Between Groups         182.8    3     60.93333333    11.92310333    1.75888E-06    2.72494392
Within Groups          388.4    76    5.110526316
Total                  571.2    79


There are clear differences between the average values for air monitoring (8.9), soil
remediation (9.1), water reclamation (7.0), and health and safety training (5.4). In addition,
the ANOVA p value of 1.76E-06 is smaller than the alpha of 0.05, and the F value of 11.92
exceeds the critical F of 2.72. Therefore, the null hypothesis is rejected and the alternative
hypothesis is accepted: there are statistically significant differences in
return-on-investment between air monitoring, soil remediation, water reclamation, and health
and safety training.
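The sums of squares and the F value in the ANOVA table can be rebuilt from the SUMMARY block alone. The sketch below assumes the standard one-way ANOVA decomposition; the dictionary layout and variable names are illustrative.

```python
# group: (count, average, variance), copied from the SUMMARY table above
groups = {
    "Air": (20, 8.9, 9.357894737),
    "Soil": (20, 9.1, 3.042105263),
    "Water": (20, 7.0, 6.631578947),
    "Training": (20, 5.4, 1.410526316),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

# Between-groups variation: spread of group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-groups variation: pooled spread inside each group.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, F = {f_stat:.4f}")
```

A significant F only indicates that at least one group mean differs; identifying which groups differ would require a follow-up comparison that the output above does not include.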





