Improving Classification Accuracy by using Feature Selection and Ensemble Model
Pushpalata Pujari1, Jyoti Bala Gupta2
1Pushpalata Pujari, Computer Science & Information Technology Department, Guru Ghasid Das Central University, Bilaspur, India.
2Jyoti Bala Gupta, Information Technology Department, C.V.Raman University, Bilaspur, India.
Manuscript received on April 11, 2012. | Revised Manuscript received on April 14, 2012. | Manuscript published on May 05, 2012. | PP: 380-386 | Volume-2 Issue-2, May 2012. | Retrieval Number: B0647042212/2012©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Classification is an important data mining technique. This paper proposes a feature selection technique and an ensemble model to improve classification accuracy. Feature selection chooses a subset of relevant features from the dataset in order to build robust learning models; removing the most irrelevant and redundant features improves classification accuracy. The ensemble model improves accuracy further by combining the predictions of multiple classifiers. Three decision tree classifiers, CART, CHAID and QUEST, are considered in this paper. The ionosphere dataset investigated in this study is taken from the UCI Machine Learning Repository, and its records are classified into two categories, “Bad” and “Good”. The proposed ensemble model combines the CART, CHAID and QUEST classifiers using a confidence-weighted voting scheme. A comparative study is carried out on the performance of the classifiers before and after feature selection. The performance of each classifier and of the ensemble model is evaluated using the statistical measures accuracy, specificity and sensitivity. Gain charts and receiver operating characteristic (ROC) charts are also used to measure performance. With feature selection, the ensemble model achieves an accuracy of 93.84%, which is higher than that of any individual model. Experimental results show that the proposed ensemble model with feature selection is quite effective for classifying the ionosphere dataset.
Keywords: Classification, Ensemble Model, Ionosphere Dataset, Feature Selection.
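
The confidence-weighted voting and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example confidence values, and the choice of "Good" as the positive class are assumptions made only for illustration. Each classifier is assumed to return a predicted label together with a confidence score; the ensemble sums the confidences per label and picks the label with the largest total.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Combine (label, confidence) pairs, one per classifier, into one label.

    Hypothetical helper: confidences are summed per label and the label
    with the largest total wins.
    """
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    return max(totals, key=totals.get)


def binary_metrics(y_true, y_pred, positive="Good"):
    """Accuracy, sensitivity and specificity for a two-class problem.

    Assumes "Good" is the positive class and that both classes occur
    in y_true (otherwise a denominator would be zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Made-up outputs of three classifiers for a single record.
# Two of the three vote "Bad", but the single "Good" vote carries
# more total confidence (0.95 versus 0.51 + 0.40).
example_votes = [("Good", 0.95), ("Bad", 0.51), ("Bad", 0.40)]
example_label = confidence_weighted_vote(example_votes)
```

In this made-up example the ensemble returns "Good" even though a simple majority vote would return "Bad", which is the behaviour that distinguishes confidence-weighted voting from plain majority voting.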