MINISTRY OF EDUCATION AND TRAINING
MINISTRY OF NATIONAL DEFENCE
ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY

VU DINH MINH

RESEARCH ON DEVELOPMENT OF SEMI-SUPERVISED CLUSTERING ALGORITHMS USING FUZZY MIN-MAX NEURAL NETWORK AND THEIR APPLICATIONS

Specialization: Mathematical Foundation for Informatics
Code: 9 46 01 10

SUMMARY OF PhD THESIS IN MATHEMATICS

Hanoi, 2019
This thesis has been completed at:
ACADEMY OF MILITARY SCIENCE AND TECHNOLOGY
Scientific supervisors:
1. Assoc. Prof. Dr Le Ba Dung
2. Dr Nguyen Doan Cuong
Reviewer 1: Assoc. Prof. Dr Bui Thu Lam
Military Technical Academy
Reviewer 2: Assoc. Prof. Phung Trung Nghia
Thai Nguyen University
Reviewer 3: Dr Nguyen Do Van
Academy of Military Science and Technology
The thesis was defended at the Doctoral Evaluating Council at Academy level, held at the Academy of Military Science and Technology at ..... date ……., 2019
The thesis can be found at:
- The library of the Academy of Military Science and Technology
- Vietnam National Library
INTRODUCTION
1. The necessity of the thesis
Fuzzy semi-supervised clustering is an extension of fuzzy clustering that uses prior knowledge to increase the quality of clusters. Prior information, also known as additional information, is intended to guide, monitor and control the clustering process.
The fuzzy min-max neural network (FMNN) model proposed by Patrick K. Simpson combines the advantages of fuzzy logic, artificial neural networks and fuzzy min-max theory to solve classification and clustering problems. FMNN is an incremental learning model based on fuzzy hyperboxes, with the ability to process large data sets.
Liver disease diagnosis based on data from liver enzyme test results can be formulated as a pattern recognition problem, and the use of FMNN is considered an effective approach. One of the reasons that FMNN is used in disease diagnostic support is its ability to generate simple if…then decision rules. Each hyperbox of the FMNN transforms into a rule described by quantizing the min and max values of the data attributes.
However, the FMNN itself still has many shortcomings that lead to difficulties and limit its practical application. Research on FMNN focuses on major directions such as improving the network structure, optimizing parameters, reducing the number of hyperboxes in the network, improving the learning method, or incorporating other methods to improve quality.
Based on the research on FMNN's development process, and to improve the efficiency of FMNN, the thesis focuses on proposing and improving methodology through semi-supervised learning methods. In the new methods presented in the thesis, additional information is defined as labels assigned to a part of the data to guide and monitor the clustering process. This is a new approach that earlier methods have not addressed.
2. Objectives of the research
1) Develop an advanced fuzzy semi-supervised clustering algorithm based on label spreading, where the additional information is a small percentage of labeled samples.
2) Propose a novel combined semi-supervised clustering model that automatically defines the additional information, i.e., labels a part of the samples for the fuzzy semi-supervised clustering algorithm.
3) Develop a fuzzy clustering algorithm that takes the distribution of the data into account.
4) Apply the fuzzy min-max neural network to the extraction of fuzzy if...then decision rules in the design of a liver disease diagnostic support system, using data from liver enzyme test results.
3. Object and scope of the research
The thesis focuses on the following issues:
- An overview of the fuzzy min-max neural network and its variations.
- Analysis of limitations and of the solutions used by researchers to overcome these limitations.
- Application of the fuzzy min-max neural network, with extraction of fuzzy if...then decision rules, in disease diagnosis.
4. Research methods
The thesis uses the theoretical research method; in particular, the thesis has studied the FMNN model for classification and clustering of data. On that basis, the thesis focuses on the proposed semi-supervised clustering algorithms. The thesis also uses the simulated empirical method in combination with analysis, statistics and evaluation of empirical data.
5. Contribution of the thesis
- Develop the advanced SSFMM algorithm for fuzzy semi-supervised clustering based on a label-spreading process.
- Propose a novel semi-supervised clustering model combining FMNN and SSFMM. This model automatically defines additional information for semi-supervised clustering algorithms.
- Develop a fuzzy clustering algorithm that takes the distribution of the data into account.
6. Structure of the thesis
Apart from the introduction and conclusion, the main content of the thesis consists of three chapters:
- Chapter 1 presents an overview of the thesis, including the basic concepts of FMNN and FMNN extensions. From the general characteristics and limitations of these extensions, it provides the direction of the subsequent research. Throughout this chapter, the thesis gives an overview of the research problem and of the concepts and basic algorithms used in the research.
- Chapter 2 presents proposals for improving the learning method in FMNN using semi-supervised algorithm models for data clustering. The additional information is a labeled part of the samples in the training data set; labels from this part of the data are then spread to unlabeled data samples. A fuzzy semi-supervised clustering model combined with FMNN automatically defines the additional information, which is also used as the input of the fuzzy semi-supervised algorithm. A data clustering model in the fuzzy min-max neural network takes the distribution of the data into account as well.
- Chapter 3 presents the application of the proposed models, with the generation of fuzzy if...then decision rules, in a support system for liver disease diagnosis on a real dataset.
Chapter 1: Overview of fuzzy min-max neural network
1.1. Fundamental knowledge of fuzzy min-max neural network
* Hyperbox membership function
The membership function $b_j(A_h, B_j)$ measures the degree to which sample $A_h$ belongs to hyperbox $B_j$. It is defined by Eq. (1.2) or Eq. (1.3) below.
$b_j(A_h, B_j) = \frac{1}{2n}\sum_{i=1}^{n}\left[\max\left(0,\, 1 - \max\left(0,\, \gamma\min(1,\, a_{hi} - w_{ji})\right)\right) + \max\left(0,\, 1 - \max\left(0,\, \gamma\min(1,\, v_{ji} - a_{hi})\right)\right)\right]$  (1.2)

$b_j(A_h, B_j) = \frac{1}{n}\sum_{i=1}^{n}\left[1 - f(a_{hi} - w_{ji}, \gamma) - f(v_{ji} - a_{hi}, \gamma)\right]$  (1.3)

where $\gamma$ is the sensitivity parameter and $f$ is the ramp threshold function.
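To make the membership function concrete, the following is a minimal sketch of Eq. (1.2) in Python. The function name and the default sensitivity value gamma = 4 are illustrative choices, not taken from the thesis; all coordinates are assumed to lie in the unit hypercube, as is usual for FMNN.

```python
import numpy as np

def membership(a, v, w, gamma=4.0):
    """Sketch of the hyperbox membership of Eq. (1.2).

    a: input pattern; v, w: hyperbox min and max points (all in [0, 1]^n).
    gamma (assumed default) controls how fast membership decays outside the box.
    """
    n = len(a)
    # penalty for exceeding the max point w in each dimension
    upper = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, a - w)))
    # penalty for falling below the min point v in each dimension
    lower = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, v - a)))
    return float(np.sum(upper + lower) / (2 * n))

# A point inside the box has full membership; points outside decay toward 0.
v, w = np.array([0.2, 0.2]), np.array([0.4, 0.4])
b_in = membership(np.array([0.3, 0.3]), v, w)    # 1.0
b_out = membership(np.array([0.9, 0.9]), v, w)   # < 1.0
```

The two `max(0, …)` terms mirror the min-side and max-side penalties of Eq. (1.2), averaged over the 2n half-constraints of the hyperbox.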
* Fuzzy min-max neural network structure
FMNN uses a straightforward neural network structure: a two-layer structure (Fig. 1.4) for unsupervised learning and a three-layer structure (Fig. 1.5) for supervised learning.
Fig. 1.4 Two-layer neural network model
Fig. 1.5 Three-layer neural network model
* Overlapping between hyperboxes
The FMNN algorithm creates and modifies hyperboxes in n-dimensional space. If an expansion creates overlap between hyperboxes, the contraction process is performed to eliminate it. Overlap happens between $B_j$ and $B_k$ if one of the four following cases occurs:
- Case 1: max of $B_j$ overlaps with min of $B_k$
- Case 2: min of $B_j$ overlaps with max of $B_k$
- Case 3: $B_k$ is contained within $B_j$
- Case 4: $B_j$ is contained within $B_k$
If $B_j$ and $B_k$ overlap, the contraction process is performed in the corresponding dimension to eliminate the overlap:
- Case 1. If $v_{ji} < v_{ki} < w_{ji} < w_{ki}$, then:
$v_{ki}^{new} = w_{ji}^{new} = \left(v_{ki}^{old} + w_{ji}^{old}\right)/2$
- Case 2. If $v_{ki} < v_{ji} < w_{ki} < w_{ji}$, then:
$v_{ji}^{new} = w_{ki}^{new} = \left(v_{ji}^{old} + w_{ki}^{old}\right)/2$
- Case 3. If $v_{ji} < v_{ki} \le w_{ki} < w_{ji}$, consider the following cases:
+ If $(w_{ki} - v_{ji}) < (w_{ji} - v_{ki})$, then $v_{ji}^{new} = w_{ki}^{old}$
+ If $(w_{ki} - v_{ji}) > (w_{ji} - v_{ki})$, then $w_{ji}^{new} = v_{ki}^{old}$
- Case 4. If $v_{ki} < v_{ji} \le w_{ji} < w_{ki}$, consider the following cases:
+ If $(w_{ki} - v_{ji}) < (w_{ji} - v_{ki})$, then $w_{ki}^{new} = v_{ji}^{old}$
+ If $(w_{ki} - v_{ji}) > (w_{ji} - v_{ki})$, then $v_{ki}^{new} = w_{ji}^{old}$
* The learning algorithm in fuzzy min-max neural network
The learning algorithm in the fuzzy min-max neural network includes only the creation and modification of hyperboxes in the sample space. It consists of three steps: creation and expansion of hyperboxes, overlap test, and hyperbox contraction. These steps are repeated for all samples in the dataset.
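The creation/expansion step of this loop can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it applies only the size-limit test in the spirit of (1.24) and omits the overlap test and contraction steps described above; the function name and the default theta are illustrative.

```python
import numpy as np

def fit_fmnn(X, theta=0.3):
    """Sketch of the FMNN hyperbox creation/expansion step.

    For each sample, expand the first hyperbox whose expanded size stays
    within theta; otherwise create a new point-sized hyperbox.
    Overlap test and contraction are omitted for brevity."""
    V, W = [], []                                  # per-hyperbox min/max points
    for a in X:
        for j in range(len(V)):
            v_new = np.minimum(V[j], a)
            w_new = np.maximum(W[j], a)
            # size-limit test: total edge length per dimension stays below theta
            if np.sum(w_new - v_new) <= len(a) * theta:
                V[j], W[j] = v_new, w_new
                break
        else:
            # no hyperbox could absorb the sample: create a point hyperbox
            V.append(a.astype(float).copy())
            W.append(a.astype(float).copy())
    return V, W

# Two well-separated groups end up in two hyperboxes for a small theta.
X = np.array([[0.1, 0.1], [0.15, 0.12], [0.8, 0.85], [0.82, 0.9]])
V, W = fit_fmnn(X, theta=0.3)
```

In the full algorithm, each expansion would be followed by the overlap test and, if needed, the contraction of the four cases listed above.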
1.2. Some research to improve the quality of FMNN
* Adjust size limit of hyperbox
In order to overcome the phenomenon of exceeding the size limit of hyperboxes during network training due to the averaging method, D. Ma proposed replacing the size-limit test averaged over all dimensions, calculated according to formula (1.24), with the per-dimension formula (1.29).
$\theta \ge \frac{1}{n}\sum_{i=1}^{n}\left(\max(w_{ji}, a_{hi}) - \min(v_{ji}, a_{hi})\right)$  (1.24)

$\theta \ge \max(w_{ji}, a_{hi}) - \min(v_{ji}, a_{hi}), \quad \forall i = 1, \dots, n$  (1.29)
* Modify FMNN structure to manage overlapping areas
The FMCN (Fuzzy Min-max neural network classifier with Compensatory Neurons) and DCFMN (Data-Core-Based Fuzzy Min-Max Neural Network) models overcome the problems caused by contraction of the hyperboxes, which creates additional hyperboxes. Rather than contracting the hyperboxes, FMCN and DCFMN handle overlapping areas by using dedicated hyperboxes to manage each overlapping area separately.
* Improve learning method in FMNN
The semi-supervised models GFMM (General Fuzzy Min-Max) and RFMN (Reflex Fuzzy Min-max Neural network) use additional information in the form of labels accompanying some input patterns. GFMM and RFMN use this prior knowledge to monitor and guide clustering.
1.5. Conclusion of Chapter 1
Chapter 1 presented overview research on FMNN and its development trends, and synthesized and compared the studies on structural improvement of the FMNN algorithm.
The following chapters present proposals on issues that remain in the development of FMNN and on the application of FMNN to support disease diagnosis.
Chapter 2: The development of semi-supervised clustering algorithms using fuzzy min-max neural network
This chapter presents three algorithms to improve the learning method, together with the experimental results used to evaluate the proposed algorithms. The novel models include:
- An improvement of the SSFMM semi-supervised learning method; results announced in [3].
- A novel semi-supervised clustering model combining FMNN and SSFMM; results announced in [5].
- A fuzzy clustering algorithm that takes the distribution of the data into account. In addition, the algorithm uses a set of additional rules in the training process; results announced in [2, 4].
2.1. SSFMM semi-supervised fuzzy clustering algorithm
The GFMM model and its modified model (RFMN) have the advantage of using prior information to monitor the clustering process, thereby improving the clustering quality. However, both GFMM and RFMN can produce hyperboxes that remain unlabeled: when GFMM or RFMN creates a new hyperbox for a first sample without a label, the new hyperbox is not labeled. This hyperbox then waits for labeled samples to set its label from the label of the sample. However, some unlabeled hyperboxes may never be edited due to the absence of labeled samples. Figure 2.1 is an illustrative example of GFMM and RFMN producing unlabeled hyperboxes.
Fig. 2.1 Failed (unlabeled) hyperboxes of GFMM and RFMN
Where: V is a hyperbox created from labeled samples or whose label was adjusted by labeled samples; U is a hyperbox created from unlabeled samples and without label adjustment.
The SSFMM algorithm proposes a method to overcome this disadvantage of GFMM and RFMN. SSFMM prevents the algorithm from creating unlabeled hyperboxes by using the β limit threshold. The initial threshold is defined by the user, but the algorithm is able to redefine the threshold to fit the data during the training process. The framework diagram is described in Figure 2.2.
When creating a new hyperbox from an unlabeled pattern, SSFMM only creates the new hyperbox if the β criterion defined in (2.2) is satisfied:

$\max\left\{E(A_h, B_j) : j = 1, \dots, q\right\} \le \beta$  (2.2)
The SSFMM operates under a label-spreading scheme to label hyperboxes created from unlabeled samples. The algorithm generates hyperboxes from labeled data samples and spreads the labels from labeled hyperboxes to the hyperboxes created from unlabeled samples. SSFMM then merges all hyperboxes with the same label to form a full cluster.
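One possible reading of this spreading step can be sketched as follows. The sketch is illustrative, not the thesis's exact procedure: it assigns each hyperbox created from unlabeled samples the label of the nearest labeled hyperbox, with nearness measured between hyperbox centers; the function name and data layout are assumptions.

```python
import numpy as np

def spread_labels(centers, labels):
    """Sketch of a label-spreading step in the spirit of SSFMM.

    centers: (K, n) array of hyperbox centers.
    labels: per-hyperbox labels, with 0 meaning "unlabeled" as in the thesis.
    Every unlabeled hyperbox receives the label of the nearest labeled one.
    """
    labels = np.asarray(labels).copy()
    labeled = np.where(labels != 0)[0]          # indices of labeled hyperboxes
    for k in np.where(labels == 0)[0]:
        dists = np.linalg.norm(centers[labeled] - centers[k], axis=1)
        labels[k] = labels[labeled[np.argmin(dists)]]
    return labels

centers = np.array([[0.1, 0.1], [0.9, 0.9], [0.15, 0.2], [0.85, 0.8]])
labels = spread_labels(centers, [1, 2, 0, 0])   # hyperboxes 3 and 4 unlabeled
# labels becomes [1, 2, 1, 2]
```

After spreading, all hyperboxes sharing a label are merged into one full cluster, as the text describes.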
[Flowchart: with inputs D, θ and β, each pattern {A_h, d_h} is matched against the hyperboxes B; a covering hyperbox B_j is expanded if possible, followed by the overlap test and hyperbox contraction; a new hyperbox H_new is created from an unlabeled pattern only when the criterion max{E(A_h, B_j) : j = 1, …, q} of (2.2) holds; labels are spread until no change occurs (S_new = S_old) and D is exhausted; the output is the set of hyperboxes B and the clusters C calculated according to (1.7).]
Fig. 2.2 General diagram of SSFMM algorithm.
* Complexity evaluation of the SSFMM algorithm
The SSFMM algorithm has time complexity O(M((M−1)/2 + NK)), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in the SSFMM network.
2.2. Combined fuzzy semi-supervised clustering algorithm (SCFMN)
The SSFMM algorithm generates hyperboxes, with each hyperbox treated as a cluster. SSFMM uses many small hyperboxes to classify samples on the boundary. However, when the value of the parameter θ_max decreases, the number of hyperboxes in the network increases and the complexity of the algorithm increases as well. SSFMM also requires a certain rate of labeled samples in the training set.
To overcome this limitation of SSFMM, SCFMN uses the θ_max parameter with different values in two stages to improve clustering results with fewer hyperboxes. The values θ¹_max and θ²_max are the maximum sizes of the large and small hyperboxes, respectively. In the first stage, SCFMN generates hyperboxes and labels for the samples fully attached to hyperboxes. In the second stage, SCFMN spreads labels from the hyperboxes created in the previous stage to hyperboxes created from unlabeled samples. Large and small hyperboxes with the same label form a full cluster.
Figure 2.3 shows the idea of using large hyperboxes at the centers of clusters in conjunction with smaller hyperboxes at the boundary. The hyperboxes are shown in 2-dimensional space for a data set consisting of two clusters. B denotes a large hyperbox, G a small hyperbox (dashed line) obtained from labeled samples, and R a small hyperbox (dotted line) obtained from unlabeled samples.
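The final merge of large and small hyperboxes into full clusters can be sketched as follows. The tuple layout and helper name are illustrative, not from the thesis; the point is only that hyperboxes of both sizes sharing a label end up in one cluster.

```python
def merge_clusters(hyperboxes):
    """Sketch of SCFMN's merging step: large and small hyperboxes sharing
    a label form one full cluster. Each hyperbox is a (label, v, w) tuple
    with v/w the min/max points; this representation is an assumption."""
    clusters = {}
    for label, v, w in hyperboxes:
        clusters.setdefault(label, []).append((v, w))
    return clusters

boxes = [
    (1, [0.10, 0.10], [0.40, 0.40]),  # large hyperbox B at a cluster center
    (1, [0.38, 0.40], [0.45, 0.50]),  # small boundary hyperbox, same label
    (2, [0.70, 0.70], [0.90, 0.90]),  # large hyperbox of the second cluster
]
clusters = merge_clusters(boxes)      # two clusters: labels 1 and 2
```

In SCFMN the large boxes would come from stage one (size limit θ¹_max) and the small boundary boxes from stage two (size limit θ²_max).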
[Figure: scatter plot of two clusters, showing a large hyperbox B at a cluster center, a small hyperbox G (from labeled samples) and a small hyperbox R (from unlabeled samples) at the boundary.]
Fig. 2.3 SCFMN uses the large and small hyperboxes.
2.2.2. Methodology of SCFMN algorithm
Figure 2.5 shows the general diagram of the SCFMN algorithm.
* The complexity of the SCFMN algorithm
SCFMN has a time complexity of O(KN(M(K+1)+1) + M(M−1)/2), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in the SCFMN network.
[Flowchart: Phase 1 (additional information) processes each pattern {A_h, d_h} ∈ D, creating or expanding hyperboxes with overlap test and contraction; unlabeled patterns are labeled via the criterion max{E(A_h, H_s) : s = 1, …, q} and the data are split into sets D1 and D2 of labeled patterns. Phase 2 applies SSFMM for data clustering on D = D1 ∪ D2.]
Fig. 2.5 General diagram of SCFMN algorithm.
2.3. CFMNN fuzzy min-max clustering algorithm based on data cluster center
The value of the FMNN membership function does not decrease as samples move far away from the hyperbox. To overcome this disadvantage, CFMNN relies on the distances between samples and the centroids of the corresponding hyperboxes. The centroid is used when a sample is far away from the hyperbox and its membership is less than 0.6, the region where the membership function value no longer decreases. Apart from the min and max points, each hyperbox has a center defined as in (2.8):

$c_{ji} = \left(v_{ji} + w_{ji}\right)/2$  (2.8)
The Euclidean distance between the input pattern $A_h$ and the center of hyperbox $B_j$, $E(A_h, B_j)$, is given by (2.9):

$E(A_h, B_j) = 1 - \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(c_{ji} - a_{hi}\right)^2}$  (2.9)
For each sample $A_h$ that satisfies the size limit condition (1.24) and whose membership function value is $b_j < 0.6$, the distance is calculated and compared with the others. Each sample belongs to the closest hyperbox.
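Equations (2.8) and (2.9) together can be sketched as follows; the function name is illustrative, and the example coordinates are assumed to lie in the unit square.

```python
import numpy as np

def centroid_similarity(a, v, w):
    """Sketch of CFMNN's center-based measure.

    Computes the hyperbox centroid of Eq. (2.8) and the distance-based
    value E of Eq. (2.9), which keeps decreasing as the sample moves
    away from the hyperbox (unlike the plain membership function)."""
    c = (v + w) / 2.0                                   # centroid, Eq. (2.8)
    n = len(a)
    return 1.0 - np.sqrt(np.sum((c - a) ** 2) / n)      # Eq. (2.9)

v, w = np.array([0.2, 0.2]), np.array([0.4, 0.4])
near = centroid_similarity(np.array([0.3, 0.3]), v, w)  # sample at the center
far = centroid_similarity(np.array([0.9, 0.9]), v, w)   # smaller value
```

Because E strictly decreases with distance from the centroid, it can break ties among hyperboxes whose memberships have all saturated below 0.6.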
* Complexity of the CFMNN algorithm
The CFMNN algorithm has a time complexity of O(MKN), where M is the total number of samples in the training data set, N is the number of attributes of a data sample, and K is the total number of hyperboxes generated in CFMNN.
2.4. Experiment and evaluation
* Experimental method
To evaluate the performance of the proposed algorithms, experiments were performed on benchmark data sets.
The objective of the experiments is to evaluate the improvement in performance, and the quantity and distribution of the hyperboxes, when changing the value of parameter θ_max in the SSFMM, CFMNN and SCFMN algorithms. This also evaluates the ability to reduce the number of hyperboxes.
Accuracy and CCC (Cophenetic Correlation Coefficient) measurements are used to evaluate the performance of the algorithms and compare them with other ones. Accuracy is calculated by (2.12), and CCC is calculated by (2.13).
Details of the experimental results are presented in Table 2.2 to
Table 2.14, from Figure 2.9 to Figure 2.20.
* Experimental results
(a) Spiral (b) Aggregation (c) Jain (d) Flame (e) Pathbased (f) R15
Fig. 2.9 Graphical distribution of hyperboxes on data sets
Fig. 2.10 (a)-(d): Accuracy obtained when changing the ratio of labeled samples of SSFMM.
(a) R15 data set (b) Jain data set (c) Iris data set (d) Flame data set
Fig. 2.11 Accuracy obtained when changing θ_max of SSFMM and SCFMN
(a) Jain dataset (b) Flame dataset (c) Iris dataset (d) R15 dataset
Fig. 2.17 NoH obtained when changing θ_max of SSFMM and SCFMN
The experimental results show that:
- Accuracy decreases when the ratio of labeled samples decreases, but not as much as the decrease in the ratio of labeled samples in the training set.
- Accuracy decreases when θ_max increases; when θ_max is too small, the Accuracy measurement also decreases. θ_max therefore affects the performance of the algorithm.
- The total number of hyperboxes decreases when θ_max increases.
* Comparisons of the proposed algorithms with some other algorithms
Table 2.7 compares the GFMM, RFMN and SSFMM Accuracy measurements on the Iris data set.
Table 2.7 Values of Accuracy (%) with changing ratio of labeled samples

Ratio of labeled samples   GFMM   RFMN   SSFMM
2%                           36     52      94
10%                          49     83      96
50%                          84     92      97
Table 2.8 compares the GFMM, RFMN and SSFMM Accuracy measurements on a set of experimental data sets. The ratio of labeled samples in the training set is 10%.
Table 2.8 Values of Accuracy (%) obtained by using SSFMM, GFMM and RFMN on different data sets

Data set      GFMM    RFMN    SSFMM
Aggregation   48.25   79.56   98.86
Flame         49.74   84.47   98.75
Jain          56.32   85.35   100
Spiral        55.19   82.61   100
Pathbased     52.47   82.52   98.72
R15           48.28   84.78   99.50
Iris          49.36   83.92   96.00
ThyroidNew    51.83   80.12   91.69
Wine          52.54   80.73   93.33
Table 2.9 Comparison of Accuracy (%) obtained by using SCFMN, CFMNN, FMNN and MFMM

Data set   FMNN    MFMM    CFMNN   SCFMN
Flame      85.13   91.78   91.25   99.17
Jain       86.07   91.18   91.20   100
R15        87.24   93.54   93.76   99.50
Iris       86.97   93.01   92.77   95.98
Wine       85.58   93.12   92.83   94.35
PID        68.35   70.08   70.49   74.58
Table 2.10 Comparison of CCC obtained by using MFMM, MFMN, CFMNN and SCFMN

Data set   MFMM   MFMN   CFMNN   SCFMN
Glass      0.94   0.94   0.93    0.94
Iris       –      0.97   0.97    0.98
Wine       0.83   –      0.84    0.89
Table 2.11 Comparison of Time (s) obtained by using FMNN, MFMM, CFMNN and SCFMN

Data set   FMNN      MFMM      CFMNN     SCFMN
Flame      0.483     0.532     0.487     0.876
Jain       0.635     0.724     0.648     0.923
R15        0.701     0.798     0.712     0.967
Iris       0.215     0.231     0.221     0.623
Wine       0.274     0.283     0.276     0.692
PID        525.132   732.945   543.675   913.657
Fig. 2.19 Comparison chart of Accuracy values of SCFMN and CFMNN with FMNN and MFMM
Fig. 2.20 NoH comparison chart of SCFMN with some other methods
2.5. Conclusion of Chapter 2
Chapter 2 presented the improvements of the FMNN algorithm, including:
- Propose improvements of semi-supervised learning using a labeled part of the training data and label-spreading methods (SSFMM). The learning algorithm in SSFMM uses the information contained in both labeled and unlabeled data for training. SSFMM performs well even with a low ratio of labeled samples. This proposal was published in [3].
- Propose a novel combined semi-supervised clustering model (SCFMN). SCFMN uses a semi-supervised learning method with automatically defined additional information. SCFMN uses large hyperboxes at the centers of clusters to minimize the number of hyperboxes, and small hyperboxes at the boundaries among clusters to increase clustering performance. This proposal was published in [5].
- Propose an improved algorithm, CFMNN, that takes the distribution of the data into account. In the forecasting and adjusting stages, the hyperbox does not depend completely on its membership degree, especially when the sample is far away from the hyperbox. In addition, CFMNN uses a new set of 10 rules to adjust hyperboxes during training. This proposal has been published in [2, 4].
Chapter 3: Application of fuzzy min-max neural network in supporting liver disease diagnosis
3.1. Liver disease diagnosis methods
* Diagnosis using APRI
APRI is calculated by the formula (3.1):

$\mathrm{APRI} = \frac{\mathrm{AST}/\mathrm{ULN}}{\mathrm{PLT}} \times 100$  (3.1)

* Diagnosis using FIB-4
FIB-4 is calculated by the formula (3.2):

$\mathrm{FIB\text{-}4} = \frac{\mathrm{Age} \times \mathrm{AST}}{\mathrm{PLT} \times \sqrt{\mathrm{ALT}}}$  (3.2)
3.2. Liver disease diagnosis support using fuzzy min-max neural network
* Problem modeling
CDS (Cirrhosis Diagnosis System) is a diagnostic model for liver disease based on a combination of fuzzy min-max theory, artificial neural networks and fuzzy inference methods, used to build a decision support system from liver enzyme test data. The model of CDS in the liver disease diagnostic support system is shown in Figure 3.1.
* Model analysis
- CDS creates a combined approach between a data clustering algorithm and decision-making methods for liver disease diagnosis.
- CDS offers a view on combining a clustering algorithm using FMNN with a decision-making system. This has great significance for the liver disease diagnosis problem in particular and for the field of Medical Informatics in general.
[Flowchart: liver enzyme test data → feature extraction and selection → fuzzy min-max neural network training (hyperbox expansion, overlap test, hyperbox contraction) → pruning hyperboxes → generating the rules → disease summary table from the test results → diagnosis.]
Fig. 3.1 Liver disease diagnostic support system by CDS
* Pruning hyperboxes using the HCF index
Each hyperbox is associated with an HCF (Hyperbox Confidence Factor) that measures its usage level. Hyperboxes with an HCF index lower than the threshold are pruned.
* Decision rule extraction
Each hyperbox generates a fuzzy decision rule. The min and max values are quantized into Q levels, equivalent to the number of fuzzy partitions in the quantized rule. Each value is assigned to a quantization point using (3.8):

$A_q = (q - 1)/(Q - 1), \quad q = 1, \dots, Q$  (3.8)

Fuzzy if…then rules are defined by (3.9):

Rule $R_j$: If $x_{p1}$ is $A_{q_1}$ and … and $x_{pn}$ is $A_{q_n}$, then $x_p$ is class $C_j$  (3.9)
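The quantization of Eq. (3.8) can be sketched as follows: the Q partition levels are laid out uniformly on [0, 1], and each min/max value of a hyperbox is mapped to the nearest level. The function name and the default Q = 5 are illustrative assumptions.

```python
def quantize(x, Q=5):
    """Sketch of the rule-extraction quantization of Eq. (3.8).

    The Q fuzzy partition levels are A_q = (q - 1) / (Q - 1), q = 1..Q;
    a hyperbox min or max value x in [0, 1] is mapped to the nearest q."""
    levels = [(q - 1) / (Q - 1) for q in range(1, Q + 1)]  # 0, 0.25, ..., 1
    return min(range(1, Q + 1), key=lambda q: abs(levels[q - 1] - x))

# A hyperbox with v = 0.1 and w = 0.6 on one attribute becomes the
# antecedent "x is A_1 .. A_3" with Q = 5.
q_min, q_max = quantize(0.1), quantize(0.6)   # (1, 3)
```

Applying this to every attribute of a hyperbox yields exactly the if…then form of Eq. (3.9), with the quantized min-max pair as the antecedent range.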
3.3. Experiment and evaluation
* Experimental data sets
Information on liver disease data is shown in Table 3.3. This
information is extracted from the medical records related to the test
results and disease diagnosis from doctors.
* Objectives of experiments
- To evaluate the ability to improve performance.
- To evaluate the number of hyperboxes before and after the pruning process.
- To evaluate the decision rules and the computation time.
* Measurements and evaluation criteria
Measurements include Accuracy, AccSe, AccSp, NPV, PPV, Jaccard, Rand, FM and NoH.
* Experimental results
Details of the experimental results are presented in Tables 3.4 to
Table 3.15, from Figure 3.2 to Figure 3.10.
(a) SSFMM (b) SCFMN
Fig. 3.5 Accuracy of SCFMN and SSFMM when changing θ_max on the liver-disease dataset
Fig. 3.6 NoH of SCFMN and SSFMM when changing θ_max on the real dataset
Table 3.9 Fuzzy rules on Cirrhosis dataset generated by SCFMN

Rule   A1    A2    A3    Then   CF
1      1     1     2-3   2      0.300
2      1-3   1     2-3   1      0.114
3      1-2   1     3-4   1      0.075
4      3-4   1-2   1     1      0.039
5      1-3   1-4   1-2   1      0.834
6      1     1     1-4   2      0.43
Table 3.13 An example of diagnostic results using SCFMN on the real dataset

A1   A2   A3      A4      A5    A6      A7     A8      A9      A10   Then (C)
81   0    97.1    104.1   3.1   154.4   36.7   27.3    10.1    37    1
53   0    94.1    100.9   3.1   266.4   25.2   37.6    10.7    28    1
53   0    87.9    94.3    3.1   249.0   23.5   35.1    10.0    28    1
81   0    86.1    92.3    3.1   136.9   32.5   24.2    9.0     37    1
24   1    592.3   200.6   3.0   195.6   38.3   359.5   139.3   39    1
37   0    568.6   208.7   2.7   82.6    27.5   65.3    15.3    23    1
46   1    60.4    57.0    1.1   87.8    37.4   19.0    3.5     18    0
57   0    60.5    45.4    1.3   196.2   39.2   12.1    3.5     29    0
57   0    60.5    45.4    1.3   196.4   39.2   12.1    3.5     29    0
3.4. Conclusion of Chapter 3
Chapter 3 presented the application of the proposed models in the design of a support system for diagnosing liver disease from data containing liver enzyme test information.
The proposed models were implemented on the liver disease data set. The obtained results show that the proposed models give better results than the compared methods and agree well with the predicted values. Especially notable is the ability to extract fuzzy if...then decision rules whose quantized values are the min-max points of the fuzzy hyperboxes. The results were evaluated through the measurements and, at the same time, these experimental results once again verify the correctness of the propositions constructed in the theoretical models.
CONCLUSION
From the research contents, the thesis has achieved the following results:
* Main results:
- Propose algorithmic improvements with semi-supervised learning (SSFMM), using additional information in the form of labels on part of the training data together with label-spreading methods. SSFMM gradually forms and corrects the hyperboxes (clusters) during training. Labeled samples are processed first to form hyperboxes, and the labels are then spread to hyperboxes formed from unlabeled training samples. Learning in SSFMM uses the information contained in both labeled and unlabeled data for training. SSFMM performs well even with low labeled sample rates. This proposal was published in [3].
- Propose a combined fuzzy semi-supervised clustering model between SSFMM and FMNN. The proposed model uses a semi-supervised learning method whose additional information is defined automatically by the algorithm. The algorithm uses large hyperboxes at the centers of clusters to minimize the number of hyperboxes, and small hyperboxes at the boundaries among clusters to increase clustering performance. This proposal was published in [5].
- Propose the improved algorithm CFMNN, which takes the distribution of the data into account. During the forecasting and adjustment phase, the hyperbox does not depend completely on its membership degree, especially when the sample is far away from the hyperbox. In addition, the CFMNN