Exercises in Statistical Analysis for Decision Making
I) Topic:
1. ANALYSIS FOR DECISION MAKING
One of Philip Mahn's investments will fall due soon, and he is now considering how to
reinvest the proceeds of $30,000. He is considering two investment schemes: entrusting the
money to a mutual fund that invests in securities (MF), or investing in a one-year
certificate of deposit (CD). The certificate of deposit guarantees interest of 8%. The
estimated return on the mutual fund is 16%, 9% or -2%, depending on whether market
conditions are good, average or bad. The probabilities of good, average and bad market
conditions are estimated at 0.1, 0.85 and 0.05, respectively.
a. Please build a payoff matrix for this question
b. Which investment scheme will be chosen according to maximax criteria?
c. Which investment scheme will be chosen according to maximin criteria?
d. Which investment scheme will be chosen according to the criteria of lowest regret?
e. Which investment scheme will be chosen according to EMV criteria?
f. Which investment scheme will be chosen according to EOL criteria?
g. How much will Philip be prepared to pay so as to obtain totally accurate market
forecast?
2. REGRESSION
The personnel director of a small manufacturing company has collected data on the salary (Y)
earned by machinists working for the company, together with information on average
performance (X1) over the three-year period, length of service (years, X2), and the number
of machines assigned (X3) (file dat919.xls).
He wishes to build a regression model to produce an estimate of average salary that
each employee may expect to receive according to the level of work completion, length of
service (years), and the number of machines assigned.
a. Please draw scatter diagrams showing the relationship between salary and
independent variables. Which type of relationship does each diagram suggest for you?
b. If the personnel director wishes to build a regression model using only one
independent variable to estimate salary, which independent variable should be used?
c. If the personnel director wishes to set up a regression model only using two
independent variables to estimate salary, which independent variables should be used?
d. Please compare Adjusted R2 indicator produced in question b and question c with
such indicator produced by the model with all of the three independent variables.
Which model will you recommend this director to use?
e. Suppose the director wishes to use a regression model with all of the three
independent variables, what is the regression equation?
II) Answers:
1. ANALYSIS FOR DECISION MAKING:
a. Building the payoff matrix:
An amount of $30,000 can be invested either in a mutual fund (MF) or in a one-year
certificate of deposit (CD). The CD bears annual interest of 8%. The MF returns 16%, 9% or
-2% under good, average or bad conditions of the securities market, whose probabilities are
0.10, 0.85 and 0.05, respectively.
The returns on MF and CD, and the best return available under each market condition, are
as follows:
Return on investment in MF:
Good = $30,000 x 16% = $4,800
Average = $30,000 x 9% = $2,700
Bad = $30,000 x (-2%) = -$600
Return on investment in CD:
Good = $30,000 x 8% = $2,400
Average = $30,000 x 8% = $2,400
Bad = $30,000 x 8% = $2,400
Best return of MF and CD under each market condition:
Max good(MF, CD) = Max(4800, 2400) = 4800
Max average(MF, CD) = Max(2700, 2400) = 2700
Max bad(MF, CD) = Max(-600, 2400) = 2400
The investment payoff matrix is as follows:

Scheme    Good     Average   Bad
MF        4,800    2,700     (600)
CD        2,400    2,400     2,400
b. Which investment scheme will be chosen according to maximax criteria :
Investment: 30,000

Description             Good     Average   Bad       Maximax
MF                      4,800    2,700     (600)     4,800
CD                      2,400    2,400     2,400     2,400
Max(State of Nature)    4,800    2,700     2,400     4,800
Probability             0.10     0.85      0.05
MF r(%)                 0.16     0.09      (0.02)
CD r(%)                 0.08     0.08      0.08
MAX MF = MAX MF(Good, Average, Bad) = MAX(4800, 2700, -600) = 4800
MAX CD = MAX CD(Good, Average, Bad) = MAX(2400, 2400, 2400) = 2400
MAX (MAX MF, MAX CD) = MAX(4800, 2400) = 4800
According to this criterion, Philip Mahn will invest in MF. This investment produces a high
gain if the market turns out as hoped, but a substantial shortfall if it does not: under
good market conditions he earns a profit of $4,800, whereas under bad market conditions
the investment incurs a loss of $600.
Conclusion: the MF investment scheme will be chosen according to the maximax criterion.
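The maximax rule can be checked with a few lines of Python. This is only a sketch: the dictionary `payoffs` and the other variable names are illustrative assumptions, and the numbers are the payoffs computed in part a.

```python
# Payoffs in dollars for each scheme, ordered (good, average, bad) market.
payoffs = {
    "MF": [4800, 2700, -600],
    "CD": [2400, 2400, 2400],
}

# Maximax: take each scheme's best payoff, then pick the largest of those.
best_case = {scheme: max(values) for scheme, values in payoffs.items()}
maximax_choice = max(best_case, key=best_case.get)

print(best_case)
print("Maximax choice:", maximax_choice)
```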
c. Which investment scheme will be chosen according to maximin criteria:
Investment: 30,000

Description             Good     Average   Bad       Maximin
MF                      4,800    2,700     (600)     (600)
CD                      2,400    2,400     2,400     2,400
Max(State of Nature)    4,800    2,700     2,400     2,400
Probability             0.10     0.85      0.05
MF r(%)                 0.16     0.09      (0.02)
CD r(%)                 0.08     0.08      0.08
MIN MF = MIN MF(Good, Average, Bad) = MIN(4800, 2700, -600) = -600
MIN CD = MIN CD(Good, Average, Bad) = MIN(2400, 2400, 2400) = 2400
MAX (MIN MF, MIN CD) = MAX(-600, 2400) = 2400
According to this criterion, Philip Mahn will invest in CD. This investment is safer,
although its guaranteed profit is lower than the best case selected by the maximax
criterion (2,400 < 4,800).
Conclusion: the CD investment scheme will be chosen according to the maximin criterion.
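The same check for the maximin rule, again as a sketch with illustrative variable names and the payoffs from part a:

```python
# Payoffs in dollars for each scheme, ordered (good, average, bad) market.
payoffs = {
    "MF": [4800, 2700, -600],
    "CD": [2400, 2400, 2400],
}

# Maximin: take each scheme's worst payoff, then pick the largest of those.
worst_case = {scheme: min(values) for scheme, values in payoffs.items()}
maximin_choice = max(worst_case, key=worst_case.get)

print(worst_case)
print("Maximin choice:", maximin_choice)
```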
d. Which investment scheme will be chosen according to the minimax regret criterion:
Decision analysis following the principle of minimax regret
Investment: 30,000

                        Payoff                       Regret
Description             Good     Average   Bad       Good     Average   Bad      Max regret
MF                      4,800    2,700     (600)     0        0         3,000    3,000
CD                      2,400    2,400     2,400     2,400    300       0        2,400
Max(State of Nature)    4,800    2,700     2,400
Probability             0.10     0.85      0.05
MF r(%)                 0.16     0.09      (0.02)
CD r(%)                 0.08     0.08      0.08

Min of the maximum regrets: 2,400
The regret (opportunity loss) of a scheme under a given market condition is the difference
between the highest payoff available under that condition and the payoff of the scheme
under the same condition.
Regret under good market conditions:
Regret good MF = Max good(MF, CD) - Payoff good MF = 4800 - 4800 = 0
Regret good CD = Max good(MF, CD) - Payoff good CD = 4800 - 2400 = 2400
Regret under average market conditions:
Regret average MF = Max average(MF, CD) - Payoff average MF = 2700 - 2700 = 0
Regret average CD = Max average(MF, CD) - Payoff average CD = 2700 - 2400 = 300
Regret under bad market conditions:
Regret bad MF = Max bad(MF, CD) - Payoff bad MF = 2400 - (-600) = 3000
Regret bad CD = Max bad(MF, CD) - Payoff bad CD = 2400 - 2400 = 0
The regret matrix is as follows:
Regret matrix    Good     Average   Bad      Max regret
MF               0        0         3,000    3,000
CD               2,400    300       0        2,400
Minimax regret (decision):                   2,400
According to this criterion, Philip Mahn will invest in CD: its maximum regret ($2,400,
incurred if the market turns out to be good) is lower than the maximum regret of MF
($3,000, incurred if the market turns out to be bad).
Conclusion: the CD investment scheme will be chosen according to the minimax regret criterion.
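The regret calculation can be reproduced as follows. This is a minimal sketch with illustrative names; it rebuilds the regret values from the payoffs of part a and then applies the minimax rule.

```python
# Payoffs in dollars for each scheme, ordered (good, average, bad) market.
payoffs = {
    "MF": [4800, 2700, -600],
    "CD": [2400, 2400, 2400],
}

# Best payoff available under each market condition (column-wise maximum).
best_per_state = [max(column) for column in zip(*payoffs.values())]

# Regret = best payoff under that condition minus the scheme's own payoff.
regret = {
    scheme: [best - value for best, value in zip(best_per_state, values)]
    for scheme, values in payoffs.items()
}

# Minimax regret: smallest of the per-scheme maximum regrets.
max_regret = {scheme: max(values) for scheme, values in regret.items()}
minimax_choice = min(max_regret, key=max_regret.get)

print(regret)
print(max_regret)
print("Minimax regret choice:", minimax_choice)
```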
e. Which investment scheme will be chosen according to EMV criteria:
The EMV criterion selects the scheme with the highest expected monetary value. The EMV of
an investment is the mean of its profits over the market conditions, weighted by the
probability of each condition. It is a criterion for decisions under risk, because it
relies on the estimated probabilities of the market conditions.
Investment: 30,000

Description             Good     Average   Bad       EMV
MF                      4,800    2,700     (600)     2,745
CD                      2,400    2,400     2,400     2,400
Max(State of Nature)    4,800    2,700     2,400     2,745
Probability             0.10     0.85      0.05
MF r(%)                 0.16     0.09      (0.02)
CD r(%)                 0.08     0.08      0.08
EMV MF = ∑ Profit MF (good, average, bad) x Probability (good, average, bad)
= (4800 x 0.10) + (2700 x 0.85) + (-600 x 0.05) = 2745
EMV CD = ∑ Profit CD (good, average, bad) x Probability (good, average, bad)
= (2400 x 0.10) + (2400 x 0.85) + (2400 x 0.05) = 2400
MAX EMV (MF, CD) = Max(2745, 2400) = 2745
Conclusion: the MF investment scheme will be chosen according to the EMV criterion.
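A short sketch of the EMV calculation, with illustrative variable names and the payoffs and probabilities given in the problem:

```python
# Payoffs in dollars, ordered (good, average, bad), and the matching probabilities.
payoffs = {
    "MF": [4800, 2700, -600],
    "CD": [2400, 2400, 2400],
}
probabilities = [0.10, 0.85, 0.05]

# EMV = probability-weighted sum of the payoffs of a scheme.
emv = {
    scheme: sum(p * value for p, value in zip(probabilities, values))
    for scheme, values in payoffs.items()
}
emv_choice = max(emv, key=emv.get)

for scheme, value in emv.items():
    print(scheme, round(value, 2))
print("EMV choice:", emv_choice)
```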
f. Which investment scheme will be chosen according to EOL criteria:
Decision analysis following the EOL criterion
Investment: 30,000

                        Payoff                       Regret
Description             Good     Average   Bad       Good     Average   Bad      EOL
MF                      4,800    2,700     (600)     0        0         3,000    150
CD                      2,400    2,400     2,400     2,400    300       0        495
Max(State of Nature)    4,800    2,700     2,400
Probability             0.10     0.85      0.05
MF r(%)                 0.16     0.09      (0.02)
CD r(%)                 0.08     0.08      0.08

Min EOL: 150
The EOL criterion selects the scheme with the lowest expected regret (opportunity loss).
The expected opportunity loss of a scheme is the mean of its regret values over the market
conditions, weighted by the probability of each condition.
We have: EOL MF = ∑ Regret MF (good, average, bad) x Probability (good, average, bad)
= (0 x 0.10) + (0 x 0.85) + (3000 x 0.05) = $150
EOL CD = ∑ Regret CD (good, average, bad) x Probability (good, average, bad)
= (2400 x 0.10) + (300 x 0.85) + (0 x 0.05) = $495
MIN EOL (MF, CD) = MIN(150, 495) = $150
Conclusion: according to the EOL criterion, the MF investment scheme will be chosen
because it has the lowest expected opportunity loss ($150). This agrees with the EMV
criterion, under which MF has the largest EMV ($2,745).
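The EOL figures can be reproduced by weighting the regret values with the probabilities. The sketch below uses illustrative names and recomputes the regrets from the payoffs.

```python
# Payoffs in dollars, ordered (good, average, bad), and the matching probabilities.
payoffs = {
    "MF": [4800, 2700, -600],
    "CD": [2400, 2400, 2400],
}
probabilities = [0.10, 0.85, 0.05]

# Regret = best payoff under each condition minus the scheme's own payoff.
best_per_state = [max(column) for column in zip(*payoffs.values())]
regret = {
    scheme: [best - value for best, value in zip(best_per_state, values)]
    for scheme, values in payoffs.items()
}

# EOL = probability-weighted sum of the regrets of a scheme.
eol = {
    scheme: sum(p * r for p, r in zip(probabilities, values))
    for scheme, values in regret.items()
}
eol_choice = min(eol, key=eol.get)

for scheme, value in eol.items():
    print(scheme, round(value, 2))
print("EOL choice:", eol_choice)
```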
g. How much will Philip be prepared to pay so as to obtain totally accurate market
forecast?
Investment: 30,000

Description              Good     Average   Bad       EMV
MF                       4,800    2,700     (600)     2,745
CD                       2,400    2,400     2,400     2,400
Max (state of nature)    4,800    2,700     2,400     2,745
Probability              0.10     0.85      0.05
MF r (%)                 0.16     0.09      (0.02)
CD r (%)                 0.08     0.08      0.08
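When no additional information is available, the best expected value is EMV0 = 2,745
(investing in MF).
When a totally accurate forecast is available, Philip can choose the best scheme for each
market condition, so the expected value with perfect information is:
EMV = Max(Good) x 0.10 + Max(Average) x 0.85 + Max(Bad) x 0.05
= 4,800 x 0.10 + 2,700 x 0.85 + 2,400 x 0.05 = 2,895
Expected value of perfect information: 2,895 - 2,745 = $150
Conclusion: Philip would be willing to pay up to $150 for a market forecast that is
100% accurate.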
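The expected value of perfect information can be checked with the sketch below. The variable names are illustrative; the numbers are the payoffs and probabilities used throughout this question.

```python
# Payoffs in dollars, ordered (good, average, bad), and the matching probabilities.
payoffs = {
    "MF": [4800, 2700, -600],
    "CD": [2400, 2400, 2400],
}
probabilities = [0.10, 0.85, 0.05]

# Best EMV without any forecast.
emv = {
    scheme: sum(p * value for p, value in zip(probabilities, values))
    for scheme, values in payoffs.items()
}
emv_without_info = max(emv.values())

# With a perfect forecast, the best scheme is picked under every condition.
best_per_state = [max(column) for column in zip(*payoffs.values())]
ev_with_perfect_info = sum(p * best for p, best in zip(probabilities, best_per_state))

evpi = ev_with_perfect_info - emv_without_info
print(round(ev_with_perfect_info, 2), round(emv_without_info, 2), round(evpi, 2))
```

The result is the same amount as the lowest EOL in part f, which is expected: EVPI always equals the minimum expected opportunity loss.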
2. REGRESSION:
We define:
Y is salary.
X1 is average performance.
X2 is years of working.
X3 is the number of machines assigned (certifications).
We have the salary data as follows:

Obs    Y (Salary)    X1 (Avg Perf.)    X2 (Years)    X3 (Certifi.)
1      48.20         3.50              9             6
2      55.30         5.30              20            6
3      53.70         5.10              18            7
4      61.80         5.80              33            7
5      56.40         4.20              31            8
6      52.50         6.00              13            6
7      54.00         6.80              25            6
8      55.70         5.50              30            4
9      45.10         3.10              5             6
10     67.90         7.20              47            8
11     53.20         4.50              25            5
12     46.80         4.90              11            6
13     58.30         8.00              23            8
14     59.10         6.50              35            7
15     57.80         6.60              39            5
16     48.60         3.70              21            4
17     49.20         6.20              7             6
18     63.00         7.00              40            7
19     53.00         4.00              35            6
20     50.90         4.50              23            4
21     55.40         5.90              33            5
22     51.80         5.60              27            4
23     60.20         4.80              34            8
24     50.10         3.90              15            5
a. Draw a scatter diagram. Which type of relationship does each diagram suggest?
Scatter diagram of salary against average performance:
[Figure: CORRELATION BETWEEN SALARY AND PERFORMANCE]
Scatter diagram of salary against length of service (years of working):
[Figure: CORRELATION BETWEEN SALARY AND YEARS OF WORKING]
Scatter diagram of salary against the number of machines assigned (certifications):
[Figure: CORRELATION BETWEEN SALARY AND CERTIFICATION]
All three diagrams suggest a positive linear relationship with salary (the slopes fitted
in part b are all positive). Among the three estimated regression lines, the line relating
salary to years of working fits the data most closely.
b. If the personnel director wishes to set up a regression model using only one
independent variable to estimate salary, which independent variable should be
used?
The general model is Y = f(X1, X2, X3) + ε
Regression of salary (Y) on average performance (X1, Avg Perf.):
Y = f(X1)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.667096
R Square             0.445017
Adjusted R Square    0.41979
Standard Error       4.169847
Observations         24

ANOVA
              df    SS          MS          F           Significance F
Regression    1     306.7323    306.7323    17.64084    0.00037
Residual      22    382.5277    17.38762
Total         23    689.26

                     Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept            39.34766       3.706664         10.61538    4.03E-10    31.66051    47.03481
X Variable 1 (X1)    2.827808       0.673271         4.200101    0.00037     1.431528    4.224088
Regression of salary (Y) on years of experience (X2, Years):
Y = f(X2)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.858558
R Square             0.737122
Adjusted R Square    0.725173
Standard Error       2.869837
Observations         24

ANOVA
              df    SS          MS          F           Significance F
Regression    1     508.0688    508.0688    61.68907    8E-08
Residual      22    181.1912    8.235962
Total         23    689.26

                     Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept            44.04785       1.453995         30.29435    1.97E-19    41.03245    47.06325
X Variable 1 (X2)    0.418784       0.053319         7.854239    8E-08       0.308206    0.529362
Regression of salary (Y) on the number of machines assigned (X3, Certification):
Y = f(X3)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.558288
R Square             0.311685
Adjusted R Square    0.280398
Standard Error       4.643802
Observations         24

ANOVA
              df    SS          MS          F           Significance F
Regression    1     214.8323    214.8323    9.962127    0.00458
Residual      22    474.4278    21.5649
Total         23    689.26

                     Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept            40.595         4.506323         9.008454    7.79E-09    31.24946    49.94054
X Variable 3 (X3)    2.3175         0.73425          3.156284    0.00458     0.79476     3.84024
Comparing the three single-variable regression models of salary on performance, years of
experience and number of machines assigned:
- Y on X1 has R2 = 0.445
- Y on X2 has R2 = 0.737
- Y on X3 has R2 = 0.312
If the personnel director wants to build a regression model using only one independent
variable to estimate salary, the variable to use is the number of years of experience,
X2, because it gives the highest R2.
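The three single-variable fits can be reproduced without a spreadsheet. The sketch below is a plain least-squares calculation in standard Python; the function name `simple_fit` and the list names are illustrative assumptions, and the numbers are the 24 observations listed in this section.

```python
# The 24 observations: salary (y), average performance (x1),
# years of working (x2) and machines assigned (x3).
y = [48.2, 55.3, 53.7, 61.8, 56.4, 52.5, 54.0, 55.7, 45.1, 67.9, 53.2, 46.8,
     58.3, 59.1, 57.8, 48.6, 49.2, 63.0, 53.0, 50.9, 55.4, 51.8, 60.2, 50.1]
x1 = [3.5, 5.3, 5.1, 5.8, 4.2, 6.0, 6.8, 5.5, 3.1, 7.2, 4.5, 4.9,
      8.0, 6.5, 6.6, 3.7, 6.2, 7.0, 4.0, 4.5, 5.9, 5.6, 4.8, 3.9]
x2 = [9, 20, 18, 33, 31, 13, 25, 30, 5, 47, 25, 11,
      23, 35, 39, 21, 7, 40, 35, 23, 33, 27, 34, 15]
x3 = [6, 6, 7, 7, 8, 6, 6, 4, 6, 8, 5, 6,
      8, 7, 5, 4, 6, 7, 6, 4, 5, 4, 8, 5]


def simple_fit(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = a + b*x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


fits = {name: simple_fit(x, y) for name, x in [("X1", x1), ("X2", x2), ("X3", x3)]}
best = max(fits, key=lambda name: fits[name][2])

for name, (intercept, slope, r_squared) in fits.items():
    print(name, round(intercept, 4), round(slope, 4), round(r_squared, 4))
print("Highest R2:", best)
```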
c. If the personnel director wishes to set up a regression model only using two
independent variables to estimate salary, which independent variables should be
used:
Regression model using two independent variables, X1 and X2, to estimate salary:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.909801
R Square             0.827738
Adjusted R Square    0.811332
Standard Error       2.377806
Observations         24

ANOVA
              df    SS          MS          F          Significance F
Regression    2     570.5268    285.2634    50.4537    9.55E-09
Residual      21    118.7332    5.653963
Total         23    689.26

                     Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept            38.25083       2.119773         18.04478    2.9E-14     33.84252    42.65914
X Variable 1 (X1)    1.443021       0.434165         3.323666    0.003226    0.540124    2.345917
X Variable 2 (X2)    0.341248       0.049959         6.83056     9.4E-07     0.237353    0.445144
Regression model using two independent variables, X2 and X3, to estimate salary:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.931807362
R Square             0.868264959
Adjusted R Square    0.855718765
Standard Error       2.079373694
Observations         24

ANOVA
              df    SS          MS          F           Significance F
Regression    2     598.4603    299.2302    69.20544    5.71E-10
Residual      21    90.79969    4.323795
Total         23    689.26

                     Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept            35.84885269    2.079774         17.2369     7.17E-14    31.52373    40.17398
X Variable 1 (X2)    0.374942513    0.039805         9.419387    5.46E-09    0.292163    0.457722
X Variable 2 (X3)    1.548867849    0.338753         4.572263    0.000165    0.844392    2.253343
Regression model using two independent variables, X1 and X3, to estimate salary:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.754612825
R Square             0.569440516
Adjusted R Square    0.528434851
Standard Error       3.759226299
Observations         24

ANOVA
              df    SS          MS          F           Significance F
Regression    2     392.4926    196.2463    13.88687    0.000144
Residual      21    296.7674    14.13178
Total         23    689.26

                     Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept            32.89954553    4.244763         7.75062     1.36E-07    24.07208    41.72701
X Variable 1 (X1)    2.288043945    0.645309         3.545657    0.001914    0.946051    3.630037
X Variable 2 (X3)    1.556725388    0.631928         2.463454    0.022481    0.24256     2.870891
The table below summarizes the R2 and adjusted R2 of the two-variable models:

No.   Two-variable model   Coefficients (t Stat)                         R2       Adjusted R2
1     Y = f(X1, X2)        b1: 1.44 (3.32)***; b2: 0.34 (6.83)***        0.828    0.811
2     Y = f(X2, X3)        b2: 0.37 (9.42)***; b3: 1.55 (4.57)***        0.868    0.856
3     Y = f(X1, X3)        b1: 2.29 (3.55)***; b3: 1.56 (2.46)**         0.569    0.528
Therefore, if the personnel director wants to build a regression model using only two
independent variables to estimate salary, the two variables to use are X2 and X3, which
give the highest adjusted R2 (0.856).
d. Please compare the Adjusted R2 indicator produced in question b and question c
with such indicator produced by the model with all of the three independent
variables. Which model will you recommend this director to use?
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.955797409
R Square             0.913548688
Adjusted R Square    0.900580991
Standard Error       1.726085624
Observations         24

ANOVA
              df    SS             MS          F           Significance F
Regression    3     629.6725683    209.8909    70.44803    8.28E-11
Residual      20    59.58743165    2.979372
Total         23    689.26

                Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept       32.92115589    1.949026211      16.89108    2.64E-13    28.85556    36.98675
X Variable 1    1.057787097    0.326811986      3.236684    0.004135    0.376069    1.739505
X Variable 2    0.325173417    0.036445032      8.922297    2.08E-08    0.24915     0.401196
X Variable 3    1.299180285    0.291588118      4.455532    0.000243    0.690938    1.907422
We have the following table:

Question   Regression model      Coefficients (t Stat)                                          R2       Adjusted R2
b          Y = f(X2)             b2: 0.42 (7.85)***                                             0.737    0.725
c          Y = f(X2, X3)         b2: 0.37 (9.42)***; b3: 1.55 (4.57)***                         0.868    0.856
d          Y = f(X1, X2, X3)     b1: 1.06 (3.24)***; b2: 0.33 (8.92)***; b3: 1.30 (4.46)***     0.914    0.901
Comparing the adjusted R2 obtained in questions b and c with that of the model using all
three independent variables, the three-variable model has the highest adjusted R2 (about
0.90, against about 0.73 and 0.86), and its R2 is about 0.91. All three of its coefficients
are also statistically significant (P-values below 0.01). The personnel director should
therefore use the model with three independent variables.
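Adjusted R2 penalizes R2 for the number of independent variables, which is why it is the fair basis for comparing models of different sizes. The sketch below applies the standard formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), to the R Square values reported in the regression outputs. The function name is an illustrative assumption.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment of R2 for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# R Square values as reported in the outputs, with the number of predictors.
models = {
    "Y = f(X2)": (0.737122, 1),
    "Y = f(X2, X3)": (0.868264959, 2),
    "Y = f(X1, X2, X3)": (0.913548688, 3),
}

adjusted = {
    name: adjusted_r_squared(r2, n_obs=24, n_predictors=k)
    for name, (r2, k) in models.items()
}
recommended = max(adjusted, key=adjusted.get)

for name, value in adjusted.items():
    print(name, round(value, 6))
print("Highest adjusted R2:", recommended)
```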
e. Suppose the director wishes to use a regression model with all of the three
independent variables, what is the regression equation?
Based on the regression output for the three-variable model, the equation is as follows:
Y = b0 + b1X1 + b2X2 + b3X3
Y = 32.92 + 1.06X1 + 0.33X2 + 1.30X3
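The coefficients can be re-estimated directly from the 24 observations. The sketch below solves the normal equations (X'X)b = X'y in standard Python. The helper `solve` and all variable names are illustrative assumptions, not part of the original spreadsheet analysis, and the employee used in the prediction at the end is hypothetical.

```python
# The 24 observations: salary (y), average performance (x1),
# years of working (x2) and machines assigned (x3).
y = [48.2, 55.3, 53.7, 61.8, 56.4, 52.5, 54.0, 55.7, 45.1, 67.9, 53.2, 46.8,
     58.3, 59.1, 57.8, 48.6, 49.2, 63.0, 53.0, 50.9, 55.4, 51.8, 60.2, 50.1]
x1 = [3.5, 5.3, 5.1, 5.8, 4.2, 6.0, 6.8, 5.5, 3.1, 7.2, 4.5, 4.9,
      8.0, 6.5, 6.6, 3.7, 6.2, 7.0, 4.0, 4.5, 5.9, 5.6, 4.8, 3.9]
x2 = [9, 20, 18, 33, 31, 13, 25, 30, 5, 47, 25, 11,
      23, 35, 39, 21, 7, 40, 35, 23, 33, 27, 34, 15]
x3 = [6, 6, 7, 7, 8, 6, 6, 4, 6, 8, 5, 6,
      8, 7, 5, 4, 6, 7, 6, 4, 5, 4, 8, 5]


def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix with a leading 1 for the intercept.
rows = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
k = len(rows[0])

# Normal equations: (X'X) b = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * target for r, target in zip(rows, y)) for i in range(k)]
b0, b1, b2, b3 = solve(xtx, xty)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))

# Estimated salary for a hypothetical machinist: performance 6.0,
# 20 years of service, 6 machines assigned.
estimate = b0 + b1 * 6.0 + b2 * 20 + b3 * 6
print(round(estimate, 2))
```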
THE END