# Exercises in Statistical Analysis for Decision Making

I) Topic:
1. ANALYSIS FOR DECISION MAKING
One of Philip Mahn's investments will mature soon, and he is now considering how to reinvest
the proceeds, worth 30,000\$. He is considering two investment schemes: entrusting the money
to a mutual fund that invests in securities (MF), or buying a certificate of
deposit with a term of one year (CD). The certificate of deposit guarantees interest
of 8%. The estimated return on the mutual fund is 16%, 9%, or -2%,
depending on whether market conditions are good, average, or bad. The probabilities of
good, average, and bad market conditions are estimated at 0.1, 0.85, and 0.05,
respectively.
a. Build a payoff matrix for this problem.
b. Which investment scheme will be chosen according to the maximax criterion?
c. Which investment scheme will be chosen according to the maximin criterion?
d. Which investment scheme will be chosen according to the minimax regret criterion?
e. Which investment scheme will be chosen according to the EMV (expected monetary value) criterion?
f. Which investment scheme will be chosen according to the EOL (expected opportunity loss) criterion?
g. How much would Philip be prepared to pay to obtain a totally accurate market
forecast?
2. REGRESSION

The personnel director of a small manufacturing company has collected data on the salary (Y)
earned by machinists working for the company, together with information on average
performance (X1) over a three-year period, length of service (years, X2), and the number
of machines assigned (X3) (file dat9-19.xls).
He wishes to build a regression model that estimates the average salary
each employee may expect to receive according to performance, length of
service (years), and the number of machines assigned.

a. Draw scatter diagrams showing the relationship between salary and each of the
independent variables. Which type of relationship does each diagram suggest?
b. If the personnel director wishes to build a regression model using only one
independent variable to estimate salary, which independent variable should be used?
c. If the personnel director wishes to build a regression model using only two
independent variables to estimate salary, which independent variables should be used?
d. Compare the adjusted R2 obtained in question b and question c with
the adjusted R2 of the model with all three independent variables.
Which model would you recommend the director use?
e. Suppose the director wishes to use a regression model with all three
independent variables. What is the regression equation?
1. ANALYSIS FOR DECISION MAKING:
a. Building the payoff matrix:
An amount of 30,000\$ can be invested either in a mutual fund (MF) or in a certificate of deposit with a
term of one year (CD). The CD bears annual interest of 8%; the return on the MF
is 16%, 9%, or -2% depending on good, average, or bad conditions in the securities market,
with respective probabilities of 0.1, 0.85, and 0.05.
The returns on investment in the MF and in the CD, and the larger of the two under each
market condition, are as follows:
The return on investment in the MF:
- Good = 30,000\$ x 16% = 4,800\$
- Average = 30,000\$ x 9% = 2,700\$
- Bad = 30,000\$ x (-2%) = -600\$
The return on investment in the CD:
- Good = 30,000\$ x 8% = 2,400\$
- Average = 30,000\$ x 8% = 2,400\$
- Bad = 30,000\$ x 8% = 2,400\$
The maximum return of MF and CD under each market condition:
- Max good(MF, CD) = Max(4800, 2400) = 4800
- Max average(MF, CD) = Max(2700, 2400) = 2700
- Max bad(MF, CD) = Max((600), 2400) = 2400
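The payoff calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `INVESTMENT`, `RATES`, and `payoff_matrix` are illustrative and not part of the exercise.

```python
# Illustrative sketch: build the payoff matrix from the rates of return.
INVESTMENT = 30_000
STATES = ["good", "average", "bad"]
# Rates of return per market state, taken from the problem statement.
RATES = {
    "MF": {"good": 0.16, "average": 0.09, "bad": -0.02},
    "CD": {"good": 0.08, "average": 0.08, "bad": 0.08},
}

def payoff_matrix(investment, rates):
    """Return {scheme: {state: payoff in dollars}}."""
    return {
        scheme: {state: round(investment * r, 2) for state, r in by_state.items()}
        for scheme, by_state in rates.items()
    }

payoffs = payoff_matrix(INVESTMENT, RATES)
# Best payoff available under each state of nature.
best_per_state = {s: max(payoffs[k][s] for k in payoffs) for s in STATES}
print(payoffs)
print(best_per_state)
```

Rounding to cents keeps floating-point noise out of the dollar amounts.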
The investment payoff matrix is as follows (investment: 30,000\$):

| State of nature | Good | Average | Bad |
|---|---|---|---|
| MF (\$) | 4,800 | 2,700 | (600) |
| CD (\$) | 2,400 | 2,400 | 2,400 |
| Max (state of nature) (\$) | 4,800 | 2,700 | 2,400 |
| Probability | 0.10 | 0.85 | 0.05 |
| MF rate of return r | 0.16 | 0.09 | (0.02) |
| CD rate of return r | 0.08 | 0.08 | 0.08 |

b. Which investment scheme will be chosen according to the maximax criterion:

Decision analysis following the maximax principle:
- MAX MF = MAX(Good, Average, Bad) = MAX(4800, 2700, (600)) = 4800
- MAX CD = MAX(Good, Average, Bad) = MAX(2400, 2400, 2400) = 2400
- MAX(MF, CD) = MAX(4800, 2400) = 4800

According to this criterion, Philip Mahn will invest in the MF. This investment produces the
highest gain if the market turns out well, but a loss if it turns out badly:
under good market conditions he earns a profit of 4,800\$, whereas
under bad market conditions the investment loses 600\$.
Conclusion: the MF investment scheme will be chosen according to the maximax criterion.
c. Which investment scheme will be chosen according to maximin criteria:

| State of nature | Good | Average | Bad | Decision analysis following the maximin principle |
|---|---|---|---|---|
| MF (\$) | 4,800 | 2,700 | (600) | (600) |
| CD (\$) | 2,400 | 2,400 | 2,400 | 2,400 |
| Max (state of nature) (\$) | 4,800 | 2,700 | 2,400 | 2,400 |
| Probability | 0.10 | 0.85 | 0.05 | |
| MF rate of return r | 0.16 | 0.09 | (0.02) | |
| CD rate of return r | 0.08 | 0.08 | 0.08 | |

Investment: 30,000\$.

- MIN MF = MIN(Good, Average, Bad) = MIN(4800, 2700, (600)) = (600)
- MIN CD = MIN(Good, Average, Bad) = MIN(2400, 2400, 2400) = 2400
- MAX(MIN MF, MIN CD) = MAX((600), 2400) = 2400
According to this criterion, Philip Mahn will invest in the CD. This investment is safer,
although its best-case profit is lower than that of the maximax choice (2,400 < 4,800).
Conclusion: the CD investment scheme will be chosen according to the maximin criterion.
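The maximax and maximin rules used in parts b and c differ only in whether each scheme is judged by its best or its worst payoff. A minimal Python sketch, with the payoff values copied from the matrix above and function names chosen here for illustration:

```python
# Payoffs in dollars, copied from the payoff matrix.
PAYOFFS = {
    "MF": {"good": 4800, "average": 2700, "bad": -600},
    "CD": {"good": 2400, "average": 2400, "bad": 2400},
}

def maximax(payoffs):
    """Optimistic rule: pick the scheme whose best-case payoff is largest."""
    best = {k: max(v.values()) for k, v in payoffs.items()}
    choice = max(best, key=best.get)
    return choice, best[choice]

def maximin(payoffs):
    """Pessimistic rule: pick the scheme whose worst-case payoff is largest."""
    worst = {k: min(v.values()) for k, v in payoffs.items()}
    choice = max(worst, key=worst.get)
    return choice, worst[choice]

print(maximax(PAYOFFS))  # ('MF', 4800)
print(maximin(PAYOFFS))  # ('CD', 2400)
```

Neither rule uses the probabilities; they depend only on the extreme payoffs of each scheme.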
d. Which investment scheme will be chosen according to minimax regret criteria :

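| State of nature | Good | Average | Bad |
|---|---|---|---|
| MF (\$) | 4,800 | 2,700 | (600) |
| CD (\$) | 2,400 | 2,400 | 2,400 |
| Max (state of nature) (\$) | 4,800 | 2,700 | 2,400 |
| Probability | 0.10 | 0.85 | 0.05 |
| MF rate of return r | 0.16 | 0.09 | (0.02) |
| CD rate of return r | 0.08 | 0.08 | 0.08 |

Investment: 30,000\$.

Decision analysis following the minimax regret principle:

| Regret (\$) | Good | Average | Bad | Max regret |
|---|---|---|---|---|
| MF | - | - | 3,000 | 3,000 |
| CD | 2,400 | 300 | - | 2,400 |
| Min | | | | 2,400 |
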
The regret is the amount by which the profit of a given investment (MF or CD) under particular
market conditions (good, average, or bad) falls short of the highest profit available under
the same market conditions.
The regret for each scheme under good market conditions is:
- Regret good MF = Max good(MF, CD) - Good MF = 4800 - 4800 = 0
- Regret good CD = Max good(MF, CD) - Good CD = 4800 - 2400 = 2400
The regret for each scheme under average market
conditions is:
- Regret average MF = Max average(MF, CD) - Average MF = 2700 - 2700 = 0
- Regret average CD = Max average(MF, CD) - Average CD = 2700 - 2400 = 300
The regret for each scheme under bad market conditions is:
- Regret bad MF = Max bad(MF, CD) - Bad MF = 2400 - (-600) = 3000
- Regret bad CD = Max bad(MF, CD) - Bad CD = 2400 - 2400 = 0
The regret matrix is as follows:
| Minimax regret (\$) | Good | Average | Bad | Max |
|---|---|---|---|---|
| MF | 0 | 0 | 3,000 | 3,000 |
| CD | 2,400 | 300 | 0 | 2,400 |
| Minimax | | | | 2,400 |


According to this criterion, Philip Mahn will invest in the CD: its largest possible regret
(2,400\$, under good market conditions) is smaller than the largest possible regret of the MF
(3,000\$, under bad market conditions).
Conclusion: the CD investment scheme will be chosen according to the minimax regret criterion.
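The regret matrix and the minimax regret choice can be checked with a short Python sketch. The helper names `regret_table` and `minimax_regret` are assumptions made for illustration; the payoffs are those of the matrix above.

```python
PAYOFFS = {
    "MF": {"good": 4800, "average": 2700, "bad": -600},
    "CD": {"good": 2400, "average": 2400, "bad": 2400},
}

def regret_table(payoffs):
    """Regret = best payoff available in a state minus the scheme's payoff."""
    states = list(next(iter(payoffs.values())))
    best = {s: max(p[s] for p in payoffs.values()) for s in states}
    return {k: {s: best[s] - p[s] for s in states} for k, p in payoffs.items()}

def minimax_regret(payoffs):
    """Pick the scheme whose largest regret is smallest."""
    worst = {k: max(r.values()) for k, r in regret_table(payoffs).items()}
    choice = min(worst, key=worst.get)
    return choice, worst[choice]

print(regret_table(PAYOFFS))
print(minimax_regret(PAYOFFS))  # ('CD', 2400)
```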
e. Which investment scheme will be chosen according to EMV criteria:
The EMV criterion selects the scheme with the highest expected monetary value.
The EMV of an investment is the probability-weighted mean of the profits it earns under the different market conditions.
Because it looks only at the mean, the chosen scheme can still carry risk, and accepting that risk is up to the decision maker.
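| State of nature | Good | Average | Bad | Decision analysis following the EMV criterion |
|---|---|---|---|---|
| MF (\$) | 4,800 | 2,700 | (600) | 2,745 |
| CD (\$) | 2,400 | 2,400 | 2,400 | 2,400 |
| Max (state of nature) (\$) | 4,800 | 2,700 | 2,400 | 2,745 |
| Probability | 0.10 | 0.85 | 0.05 | |
| MF rate of return r | 0.16 | 0.09 | (0.02) | |
| CD rate of return r | 0.08 | 0.08 | 0.08 | |

Investment: 30,000\$.
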

EMV MF = ∑ Profit MF(good, average, bad) x Probability(good, average, bad)
= (4800 x 0.1) + (2700 x 0.85) + (-600 x 0.05) = 2745
EMV CD = ∑ Profit CD(good, average, bad) x Probability(good, average, bad)
= (2400 x 0.1) + (2400 x 0.85) + (2400 x 0.05) = 2400
MAX EMV(MF, CD) = Max(2745, 2400) = 2745
Conclusion: the MF investment scheme will be chosen according to the EMV criterion.
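The EMV calculation is a probability-weighted sum, which the following Python sketch reproduces. The function name `emv` is illustrative; payoffs and probabilities are those given in the exercise.

```python
PAYOFFS = {
    "MF": {"good": 4800, "average": 2700, "bad": -600},
    "CD": {"good": 2400, "average": 2400, "bad": 2400},
}
PROBS = {"good": 0.10, "average": 0.85, "bad": 0.05}

def emv(payoff_by_state, probs):
    """Expected monetary value: sum of payoff x probability over all states."""
    return sum(payoff_by_state[s] * probs[s] for s in probs)

emvs = {scheme: emv(p, PROBS) for scheme, p in PAYOFFS.items()}
best = max(emvs, key=emvs.get)
for scheme, value in emvs.items():
    print(f"EMV {scheme} = {value:.2f}")
print("chosen:", best)
```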
f. Which investment scheme will be chosen according to EOL criteria:

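| State of nature | Good | Average | Bad |
|---|---|---|---|
| MF (\$) | 4,800 | 2,700 | (600) |
| CD (\$) | 2,400 | 2,400 | 2,400 |
| Max (state of nature) (\$) | 4,800 | 2,700 | 2,400 |
| Probability | 0.10 | 0.85 | 0.05 |
| MF rate of return r | 0.16 | 0.09 | (0.02) |
| CD rate of return r | 0.08 | 0.08 | 0.08 |

Investment: 30,000\$.

Decision analysis following the EOL criterion:

| Regret (\$) | Good | Average | Bad | EOL |
|---|---|---|---|---|
| MF | - | - | 3,000 | 150 |
| CD | 2,400 | 300 | - | 495 |
| Min | | | | 150 |
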

The EOL criterion selects the scheme with the lowest expected regret, or opportunity loss.
The expected opportunity loss of a scheme is the probability-weighted mean of its regrets under each
market condition.
We have: EOL MF = ∑ Regret MF(good, average, bad) x Probability(good, average, bad)
= (0 x 0.1) + (0 x 0.85) + (3000 x 0.05) = 150\$
EOL CD = ∑ Regret CD(good, average, bad) x Probability(good, average, bad)
= (2400 x 0.1) + (300 x 0.85) + (0 x 0.05) = 495\$
MIN EOL(MF, CD) = MIN(150, 495) = 150\$
Conclusion: according to the EOL criterion, the MF investment scheme will be chosen
because it has the lowest expected opportunity loss (150\$). This agrees with the EMV criterion,
under which MF has the largest EMV (2,745\$).
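The EOL figures can be checked by weighting each regret by its probability. A minimal sketch, assuming the same payoffs and probabilities as above (the function name `eol` is chosen here for illustration):

```python
PAYOFFS = {
    "MF": {"good": 4800, "average": 2700, "bad": -600},
    "CD": {"good": 2400, "average": 2400, "bad": 2400},
}
PROBS = {"good": 0.10, "average": 0.85, "bad": 0.05}

def eol(payoffs, probs):
    """Expected opportunity loss per scheme: sum of regret x probability."""
    best = {s: max(p[s] for p in payoffs.values()) for s in probs}
    return {
        scheme: sum((best[s] - p[s]) * probs[s] for s in probs)
        for scheme, p in payoffs.items()
    }

eols = eol(PAYOFFS, PROBS)
choice = min(eols, key=eols.get)
for scheme, value in eols.items():
    print(f"EOL {scheme} = {value:.2f}")
print("chosen:", choice)
```

For every scheme, EMV plus EOL equals the same constant (the expected payoff under perfect information), so the minimum-EOL scheme is always the maximum-EMV scheme.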
g. How much will Philip be prepared to pay so as to obtain totally accurate market
forecast?

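| State of nature | Good | Average | Bad | Decision analysis: EMV |
|---|---|---|---|---|
| MF (\$) | 4,800 | 2,700 | -600 | 2,745 |
| CD (\$) | 2,400 | 2,400 | 2,400 | 2,400 |
| Max (state of nature) (\$) | 4,800 | 2,700 | 2,400 | 2,745 |
| Probability | 0.1 | 0.85 | 0.05 | |
| MF rate of return r | 0.16 | 0.09 | -0.02 | |
| CD rate of return r | 0.08 | 0.08 | 0.08 | |

Investment: 30,000\$.
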

When no information is available, the best expected value is EMV0 = 2,745 (investing in the MF).
When a totally accurate forecast is available, Philip picks the best scheme for each market condition, so the expected value is
4,800 x 0.1 + 2,700 x 0.85 + 2,400 x 0.05 = 2,895
Expected value of perfect information: 2,895 - 2,745 = 150\$
Conclusion: Philip would be willing to pay up to 150\$ for a market forecast that is
100% accurate.
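The expected value of perfect information (EVPI) can be computed directly from the payoff matrix. The sketch below is illustrative only; `evpi` is a name chosen here, and the inputs are the payoffs and probabilities of the exercise.

```python
PAYOFFS = {
    "MF": {"good": 4800, "average": 2700, "bad": -600},
    "CD": {"good": 2400, "average": 2400, "bad": 2400},
}
PROBS = {"good": 0.10, "average": 0.85, "bad": 0.05}

def evpi(payoffs, probs):
    """Return (expected value with perfect information, best EMV, EVPI)."""
    # With a perfect forecast, the best scheme is chosen in every state.
    ev_perfect = sum(max(p[s] for p in payoffs.values()) * probs[s] for s in probs)
    # Without a forecast, one scheme must be chosen for all states.
    best_emv = max(sum(p[s] * probs[s] for s in probs) for p in payoffs.values())
    return ev_perfect, best_emv, ev_perfect - best_emv

ev_perfect, best_emv, value_of_info = evpi(PAYOFFS, PROBS)
print(f"{ev_perfect:.2f} - {best_emv:.2f} = {value_of_info:.2f}")
```

The result equals the minimum EOL found in part f, as expected.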
2. REGRESSION:
We define:
- Y is salary.
- X1 is average performance.
- X2 is years of working.
- X3 is the number of machines assigned (certifications).
The salary data are as follows:
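| Obs | Y (Salary) | X1 (Avg Perf.) | X2 (Years) | X3 (Certifi. assigned) |
|---|---|---|---|---|
| 1 | 48.20 | 3.50 | 9 | 6 |
| 2 | 55.30 | 5.30 | 20 | 6 |
| 3 | 53.70 | 5.10 | 18 | 7 |
| 4 | 61.80 | 5.80 | 33 | 7 |
| 5 | 56.40 | 4.20 | 31 | 8 |
| 6 | 52.50 | 6.00 | 13 | 6 |
| 7 | 54.00 | 6.80 | 25 | 6 |
| 8 | 55.70 | 5.50 | 30 | 4 |
| 9 | 45.10 | 3.10 | 5 | 6 |
| 10 | 67.90 | 7.20 | 47 | 8 |
| 11 | 53.20 | 4.50 | 25 | 5 |
| 12 | 46.80 | 4.90 | 11 | 6 |
| 13 | 58.30 | 8.00 | 23 | 8 |
| 14 | 59.10 | 6.50 | 35 | 7 |
| 15 | 57.80 | 6.60 | 39 | 5 |
| 16 | 48.60 | 3.70 | 21 | 4 |
| 17 | 49.20 | 6.20 | 7 | 6 |
| 18 | 63.00 | 7.00 | 40 | 7 |
| 19 | 53.00 | 4.00 | 35 | 6 |
| 20 | 50.90 | 4.50 | 23 | 4 |
| 21 | 55.40 | 5.90 | 33 | 5 |
| 22 | 51.80 | 5.60 | 27 | 4 |
| 23 | 60.20 | 4.80 | 34 | 8 |
| 24 | 50.10 | 3.90 | 15 | 5 |
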
a. Draw a scatter diagram. Which type of relationship does each diagram suggest?
- Scatter diagram of salary against average performance:
  [Figure: Correlation between salary and performance]
- Scatter diagram of salary against length of service (years of working):
  [Figure: Correlation between salary and years of working]
- Scatter diagram of salary against the number of machines assigned (certifications):
  [Figure: Correlation between salary and certification]

Overall, of the three fitted regression lines, the one relating
salary to years of working fits the data most closely.

b. If the personnel director wishes to set up a regression model using only one
independent variable to estimate salary, which independent variable should be
used?
Y = f(X1, X2, X3) + ε

- Regression of salary on average performance (X1):
Y = f(X1)

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.667096 |
| R Square | 0.445017 |
| Adjusted R Square | 0.41979 |
| Standard Error | 4.169847 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 306.7323 | 306.7323 | 17.64084 | 0.00037 |
| Residual | 22 | 382.5277 | 17.38762 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 39.34766 | 3.706664 | 10.61538 | 4.03E-10 | 31.66051 | 47.03481 |
| X Variable 1 (X1) | 2.827808 | 0.673271 | 4.200101 | 0.00037 | 1.431528 | 4.224088 |


- Regression of salary on years of working (X2):
Y = f(X2)

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.858558 |
| R Square | 0.737122 |
| Adjusted R Square | 0.725173 |
| Standard Error | 2.869837 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 508.0688 | 508.0688 | 61.68907 | 8E-08 |
| Residual | 22 | 181.1912 | 8.235962 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 44.04785 | 1.453995 | 30.29435 | 1.97E-19 | 41.03245 | 47.06325 |
| X Variable 1 (X2) | 0.418784 | 0.053319 | 7.854239 | 8E-08 | 0.308206 | 0.529362 |


- Regression of salary on the number of machines assigned (certifications, X3):
Y = f(X3)

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.558288 |
| R Square | 0.311685 |
| Adjusted R Square | 0.280398 |
| Standard Error | 4.643802 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 214.8323 | 214.8323 | 9.962127 | 0.00458 |
| Residual | 22 | 474.4278 | 21.5649 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 40.595 | 4.506323 | 9.008454 | 7.79E-09 | 31.24946 | 49.94054 |
| X Variable 3 (X3) | 2.3175 | 0.73425 | 3.156284 | 0.00458 | 0.79476 | 3.84024 |


Comparing the three single-variable salary regressions (on average performance, years of
working, and number of machines assigned):
- Y on X1: R2 = 0.445
- Y on X2: R2 = 0.737
- Y on X3: R2 = 0.312
If the HR manager wants to build a regression model using only one
independent variable to predict salary, the variable to use is
the number of years of working, X2, which has the highest R2.
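The three single-variable fits can be reproduced without a spreadsheet. The sketch below uses only the Python standard library. The rows are transcribed from the data table above, and the helper name `simple_ols` is an assumption made for illustration, not part of the original Excel workflow.

```python
# (Y salary, X1 avg. performance, X2 years, X3 machines), from the data table.
DATA = [
    (48.2, 3.5, 9, 6), (55.3, 5.3, 20, 6), (53.7, 5.1, 18, 7), (61.8, 5.8, 33, 7),
    (56.4, 4.2, 31, 8), (52.5, 6.0, 13, 6), (54.0, 6.8, 25, 6), (55.7, 5.5, 30, 4),
    (45.1, 3.1, 5, 6), (67.9, 7.2, 47, 8), (53.2, 4.5, 25, 5), (46.8, 4.9, 11, 6),
    (58.3, 8.0, 23, 8), (59.1, 6.5, 35, 7), (57.8, 6.6, 39, 5), (48.6, 3.7, 21, 4),
    (49.2, 6.2, 7, 6), (63.0, 7.0, 40, 7), (53.0, 4.0, 35, 6), (50.9, 4.5, 23, 4),
    (55.4, 5.9, 33, 5), (51.8, 5.6, 27, 4), (60.2, 4.8, 34, 8), (50.1, 3.9, 15, 5),
]

def simple_ols(x, y):
    """Least-squares line y = b0 + b1*x; returns (b0, b1, R squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)

y = [row[0] for row in DATA]
fits = {f"X{j}": simple_ols([row[j] for row in DATA], y) for j in (1, 2, 3)}
best = max(fits, key=lambda name: fits[name][2])
for name, (b0, b1, r2) in fits.items():
    print(f"{name}: Y = {b0:.3f} + {b1:.3f}*{name}, R2 = {r2:.3f}")
print("best single predictor:", best)
```

Ranking by R2 is sufficient here because all three models have the same number of predictors.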
c. If the personnel director wishes to set up a regression model only using two
independent variables to estimate salary, which independent variables should be
used:
- Regression model using only the two independent variables X1, X2 to predict salary

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.909801 |
| R Square | 0.827738 |
| Adjusted R Square | 0.811332 |
| Standard Error | 2.377806 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 570.5268 | 285.2634 | 50.4537 | 9.55E-09 |
| Residual | 21 | 118.7332 | 5.653963 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 38.25083 | 2.119773 | 18.04478 | 2.9E-14 | 33.84252 | 42.65914 |
| X Variable 1 (X1) | 1.443021 | 0.434165 | 3.323666 | 0.003226 | 0.540124 | 2.345917 |
| X Variable 2 (X2) | 0.341248 | 0.049959 | 6.83056 | 9.4E-07 | 0.237353 | 0.445144 |


- Regression model using only the two independent variables X2, X3 to predict salary

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.931807362 |
| R Square | 0.868264959 |
| Adjusted R Square | 0.855718765 |
| Standard Error | 2.079373694 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 598.4603 | 299.2302 | 69.20544 | 5.71E-10 |
| Residual | 21 | 90.79969 | 4.323795 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 35.84885269 | 2.079774 | 17.2369 | 7.17E-14 | 31.52373 | 40.17398 |
| X Variable 1 (X2) | 0.374942513 | 0.039805 | 9.419387 | 5.46E-09 | 0.292163 | 0.457722 |
| X Variable 2 (X3) | 1.548867849 | 0.338753 | 4.572263 | 0.000165 | 0.844392 | 2.253343 |


- Regression model using only the two independent variables X1, X3 to predict salary

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.754612825 |
| R Square | 0.569440516 |
| Adjusted R Square | 0.528434851 |
| Standard Error | 3.759226299 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 392.4926 | 196.2463 | 13.88687 | 0.000144 |
| Residual | 21 | 296.7674 | 14.13178 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 32.89954553 | 4.244763 | 7.75062 | 1.36E-07 | 24.07208 | 41.72701 |
| X Variable 1 (X1) | 2.288043945 | 0.645309 | 3.545657 | 0.001914 | 0.946051 | 3.630037 |
| X Variable 2 (X3) | 1.556725388 | 0.631928 | 2.463454 | 0.022481 | 0.24256 | 2.870891 |


The table below summarizes R2 and adjusted R2 for the two-variable models:

| No. | Two-variable model | R2 | Adjusted R2 |
|---|---|---|---|
| 1 | Y = f(X1, X2); X1: 1.44 (t Stat: 3.32)***; X2: 0.34 (t Stat: 6.83)*** | 0.828 | 0.811 |
| 2 | Y = f(X2, X3); X2: 0.37 (t Stat: 9.42)***; X3: 1.55 (t Stat: 4.57)*** | 0.868 | 0.856 |
| 3 | Y = f(X1, X3); X1: 2.29 (t Stat: 3.55)***; X3: 1.56 (t Stat: 2.46)** | 0.569 | 0.528 |

Therefore, if the HR manager wants to build a regression model using only two
independent variables to predict salary, the two variables to use are X2 and X3, the pair with the highest R2 and adjusted R2.
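The comparison of the two-variable models can be reproduced with ordinary least squares solved through the normal equations. The sketch below is standard-library Python written for illustration: the helper names `solve` and `ols` are assumptions, and the source itself used Excel's regression tool, not this code.

```python
from itertools import combinations

# (Y salary, X1 avg. performance, X2 years, X3 machines), from the data table.
DATA = [
    (48.2, 3.5, 9, 6), (55.3, 5.3, 20, 6), (53.7, 5.1, 18, 7), (61.8, 5.8, 33, 7),
    (56.4, 4.2, 31, 8), (52.5, 6.0, 13, 6), (54.0, 6.8, 25, 6), (55.7, 5.5, 30, 4),
    (45.1, 3.1, 5, 6), (67.9, 7.2, 47, 8), (53.2, 4.5, 25, 5), (46.8, 4.9, 11, 6),
    (58.3, 8.0, 23, 8), (59.1, 6.5, 35, 7), (57.8, 6.6, 39, 5), (48.6, 3.7, 21, 4),
    (49.2, 6.2, 7, 6), (63.0, 7.0, 40, 7), (53.0, 4.0, 35, 6), (50.9, 4.5, 23, 4),
    (55.4, 5.9, 33, 5), (51.8, 5.6, 27, 4), (60.2, 4.8, 34, 8), (50.1, 3.9, 15, 5),
]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, cols):
    """Fit Y on the chosen X columns; returns (coefficients, R2, adjusted R2)."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    y = [row[0] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(X, y))
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - len(cols) - 1)
    return beta, r2, adj

results = {cols: ols(DATA, cols) for cols in combinations((1, 2, 3), 2)}
best = max(results, key=lambda c: results[c][2])
for cols, (beta, r2, adj) in results.items():
    print(cols, [round(b, 4) for b in beta], round(r2, 3), round(adj, 3))
print("best pair:", best)
```

The normal equations are adequate for a data set this small. A QR-based solver would be the more robust choice for larger or ill-conditioned problems.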

d. Compare the adjusted R2 obtained in question b and question c
with the adjusted R2 of the model with all three independent
variables. Which model would you recommend the director use?
SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.955797409 |
| R Square | 0.913548688 |
| Adjusted R Square | 0.900580991 |
| Standard Error | 1.726085624 |
| Observations | 24 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 629.6725683 | 209.8909 | 70.44803 | 8.28E-11 |
| Residual | 20 | 59.58743165 | 2.979372 | | |
| Total | 23 | 689.26 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 32.92115589 | 1.949026211 | 16.89108 | 2.64E-13 | 28.85556 | 36.98675 |
| X Variable 1 (X1) | 1.057787097 | 0.326811986 | 3.236684 | 0.004135 | 0.376069 | 1.739505 |
| X Variable 2 (X2) | 0.325173417 | 0.036445032 | 8.922297 | 2.08E-08 | 0.24915 | 0.401196 |
| X Variable 3 (X3) | 1.299180285 | 0.291588118 | 4.455532 | 0.000243 | 0.690938 | 1.907422 |


We have the following table:

| Question | Regression model | R2 | Adjusted R2 |
|---|---|---|---|
| b | Y = f(X2); X2: 0.42 (t Stat: 7.85)*** | 0.737 | 0.725 |
| c | Y = f(X2, X3); X2: 0.37 (t Stat: 9.42)***; X3: 1.55 (t Stat: 4.57)*** | 0.868 | 0.856 |
| d | Y = f(X1, X2, X3); X1: 1.06 (t Stat: 3.24)***; X2: 0.33 (t Stat: 8.92)***; X3: 1.30 (t Stat: 4.46)*** | 0.914 | 0.901 |

Comparing the adjusted R2 obtained in questions b and c with that of the model
with all three independent variables, the HR director should
use the three-variable model: its R2 (0.914) and adjusted
R2 (0.901) are the highest in the table
above, and every coefficient in it is statistically significant.
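Adjusted R2 penalizes R2 for each extra predictor, which is why it is the fairer yardstick when models have different numbers of variables. The sketch below recomputes it from the R2 values reported in the Excel outputs, with n = 24 observations. The function name `adjusted_r2` is chosen here for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

N_OBS = 24
# R2 values as reported in the regression outputs: (R2, number of predictors).
MODELS = {
    "X2": (0.737122, 1),
    "X2, X3": (0.868264959, 2),
    "X1, X2, X3": (0.913548688, 3),
}

adjusted = {name: adjusted_r2(r2, N_OBS, k) for name, (r2, k) in MODELS.items()}
for name, value in adjusted.items():
    print(f"Y = f({name}): adjusted R2 = {value:.6f}")
print("recommended:", max(adjusted, key=adjusted.get))
```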
e. Suppose the director wishes to use a regression model with all of the three
independent variables, what is the regression equation?
Based on the results of the three-variable regression, the equation is as follows:
Y = b0 + b1X1 + b2X2 + b3X3
Y = 32.921 + 1.058X1 + 0.325X2 + 1.299X3
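As a usage illustration, the fitted equation can be wrapped in a small function and applied to one employee. The coefficients are those reported in the three-variable output. The function name `predict_salary` and the choice of observation 1 as the example are assumptions made here.

```python
# Coefficients from the three-variable regression output.
COEF = {
    "intercept": 32.92115589,
    "X1": 1.057787097,   # average performance
    "X2": 0.325173417,   # years of working
    "X3": 1.299180285,   # number of machines assigned
}

def predict_salary(x1, x2, x3):
    """Estimated salary from the fitted three-variable equation."""
    return (COEF["intercept"] + COEF["X1"] * x1
            + COEF["X2"] * x2 + COEF["X3"] * x3)

# Observation 1 in the data table: X1 = 3.5, X2 = 9, X3 = 6, actual Y = 48.20.
estimate = predict_salary(3.5, 9, 6)
print(f"estimated salary: {estimate:.2f}, residual: {48.20 - estimate:.2f}")
```

The estimate is in the same units as the salary column of the data table.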

-THE END-
