Reducing the Size of Very Large Training Set for Support Vector Machine Classification
Mahmoudreza Ahmadi¹, Hamidreza Ghaffari²
¹Mahmoudreza Ahmadi, MA. Student, Department of Computer Engineering, Islamic Azad University of Ferdows, Ferdows, Khorasan, Iran.
²Dr. Hamidreza Ghaffari, Department of Computer Engineering, Islamic Azad University of Ferdows, Ferdows, Khorasan, Iran.
Manuscript received on November 02, 2014. | Revised Manuscript received on November 04, 2014. | Manuscript published on November 05, 2014. | PP: 55-61 | Volume-4 Issue-5, November 2014. | Retrieval Number: D2377094414/2014©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Standard support vector machine (SVM) algorithms are not suitable for classifying large data sets because of their high training complexity. In this paper, we introduce a method based on an edge recognition technique to identify low-value data; to preserve the distribution of the input data, we use a clustering algorithm such as k-means to compute cluster centers. The data selected by the edge recognition algorithm, together with the cluster centers, are used to build the training data set. The reconstructed, smaller data set speeds up the training process without decreasing classification precision. However, k-means requires the number of clusters to be specified in advance. We therefore seek a better procedure by improving the edge recognition algorithm used to reduce the data and by using a hierarchical clustering algorithm with a similarity percent, instead of k-means, to compute the number of clusters, and we compare the results of the two algorithms.
Keywords: Support vector machine, k-means, optimization, edge recognition, cluster, hierarchical, similarity percent.
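
The reduction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the edge recognition rule, so the sketch assumes a simple stand-in: a point counts as an edge point when at least one of its nearest neighbors carries a different class label. The function names (k_means, edge_points, reduce_training_set) and the parameters k_per_class and n_neighbors are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random


def k_means(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a cluster ends up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def edge_points(points, labels, n_neighbors=3):
    """Indices of points with at least one differently labeled near neighbor.

    Assumed stand-in for the paper's edge recognition step.
    """
    edges = []
    for i, p in enumerate(points):
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:n_neighbors]
        if any(labels[j] != labels[i] for j in neighbors):
            edges.append(i)
    return edges


def reduce_training_set(points, labels, k_per_class=2, n_neighbors=3):
    """Edge points plus per-class cluster centers, as (point, label) pairs."""
    reduced = [(points[i], labels[i])
               for i in edge_points(points, labels, n_neighbors)]
    for cls in sorted(set(labels)):
        members = [p for p, y in zip(points, labels) if y == cls]
        k = min(k_per_class, len(members))
        reduced += [(center, cls) for center in k_means(members, k)]
    return reduced


# Synthetic two-class data: two Gaussian blobs, 200 points each.
rng = random.Random(1)
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
points += [(rng.gauss(4.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
labels = [0] * 200 + [1] * 200

reduced = reduce_training_set(points, labels)
print(len(points), "original points,", len(reduced), "after reduction")
```

The reduced list of (point, label) pairs would then be passed to any SVM trainer in place of the full data set. The cluster centers are kept so that the interior of each class remains represented after the non-edge points are dropped.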
