A multi-tier knowledge discovery info-structure using ensemble techniques
Date
2007
Authors
Sakthiaseelan, Karthigasoo
Abstract
Our ultimate aim is to learn rule instances that have been discovered from
unannotated data and to generate highly accurate results. This is done through a hybrid
methodology that combines supervised and unsupervised techniques. Unannotated
data without prior classification information can thereby be put to use, and our
research offers new insight into knowledge discovery and learning.
Our Methodology for Knowledge Discovery and Learning (MKDL) consists of 6
phases, each of which uses different algorithms to produce its outcome. The phases and
algorithms are as follows: a) Data Preprocessing using Mean/Mode Fill and
Combinatorial Completion, b) Clustering Ensemble using a Boosting technique within the
Kohonen Self Organizing Map, c) Data Discretization using Boolean Reasoning and
Entropy/Minimum Description Length, d) Rule Generation using a Genetic Algorithm, the
Johnson Algorithm and Rough Sets Approximation, e) Rule Filtering using Michalski’s
formula and Torgo’s technique, and f) Learning using the ensemble technique of
Bagging within Neural Networks.
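To make phase (a) concrete, the following is a minimal sketch of Mean/Mode Fill only, not the thesis implementation. It assumes that missing values are represented as None, that numeric columns are filled with the column mean and that all other columns are filled with the column mode. The function name mean_mode_fill and the row-list data layout are hypothetical.

```python
from statistics import mean, mode

def mean_mode_fill(rows):
    """Fill None entries column by column (assumed convention for 'missing').

    Numeric columns receive the mean of the observed values; any other
    column receives the most frequent observed value (the mode).
    """
    filled = [list(r) for r in rows]  # copy so the input is left untouched
    for c in range(len(rows[0])):
        observed = [r[c] for r in rows if r[c] is not None]
        if not observed:
            continue  # nothing to infer from; leave the column as it is
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        fill = mean(observed) if numeric else mode(observed)
        for r in filled:
            if r[c] is None:
                r[c] = fill
    return filled
```

Combinatorial Completion, the other preprocessing option named above, is not covered by this sketch.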
The output of one phase is the input to the next. Together, the 6 phases, with
their functions and algorithms, integrate several different applications. This
complete architecture forms the Multi-tier Knowledge Discovery, Amalgamation and
Learning Info-structure (MESTAC).
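The chaining of phases can be pictured with a small sketch. It assumes each phase can be modelled as a function from one data representation to the next; the name run_pipeline and the toy phases below are hypothetical placeholders, not MESTAC components.

```python
from functools import reduce

def run_pipeline(phases, data):
    """Pass data through each phase in order.

    The output of one phase becomes the input of the next, so phases
    can be swapped independently as long as their interfaces match.
    """
    return reduce(lambda current, phase: phase(current), phases, data)

# Toy stand-ins for real phases (illustrative only).
def drop_missing(values):
    return [v for v in values if v is not None]

def double(values):
    return [v * 2 for v in values]

toy_phases = [drop_missing, double, sum]
result = run_pipeline(toy_phases, [1, None, 2])
```

Modelling each phase as an independent function is one way to reflect the modularity the info-structure aims for.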
We carried out a comparison and analysis with 2 knowledge discovery frameworks
and different algorithms to arrive at the best model (combination of algorithms), the one
that yields high prediction accuracy. We introduced a boosting ensemble technique into the
Kohonen Self Organizing Map to produce better clustering results. We also introduced a
bagging ensemble technique to a combination of neural network algorithms to improve
precision in prediction.
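The bagging idea can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis method: a simple nearest-centroid classifier stands in for the neural networks, predictions are combined by majority vote, and the names train_nearest_centroid, bagging_fit and bagging_predict are hypothetical.

```python
import random
from collections import Counter

def train_nearest_centroid(samples):
    """Stand-in base learner (the thesis uses neural networks instead)."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(features):
        # Choose the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(features, centroids[lab])),
        )
    return predict

def bagging_fit(samples, n_models=11, seed=0):
    """Train each model on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_nearest_centroid(bootstrap))
    return models

def bagging_predict(models, features):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]
```

Because each model sees a different resample, their individual errors tend to differ, and the vote smooths them out.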
MESTAC may seem to be a complex combination of phases, but its overall methodology
has 3 important advantages: it is simple, efficient and generic. Simplicity here
means that MESTAC is a highly modular info-structure, where each phase is an
independent, function-specific module. Efficiency here means that the final outcome
of the info-structure is more accurate. Genericity here means that the
info-structure can be used to discover knowledge in different types of data-sets, such as
continuous, mixed and discrete data-sets.
MESTAC has been demonstrated to be a feasible method using a well-known breast
cancer dataset. The positive results of the empirical study indicate that the methodology
is sound and is indeed applicable as a new knowledge discovery and learning
methodology.
Description
Master
Keywords
Science Physic, Ensemble Techniques