Reducing the Size of Very Large Training Set for Support Vector Machine Classification
Mahmoudreza Ahmadi1, Hamidreza Ghaffari2
1Mahmoudreza Ahmadi, MA. Student, Department of Computer Engineering, Islamic Azad University of Ferdows, Ferdows, Khorasan, Iran.
2Dr. Hamidreza Ghaffari, Department of Computer Engineering, Islamic Azad University of Ferdows, Ferdows, Khorasan, Iran.
Manuscript received on November 02, 2014. | Revised Manuscript received on November 04, 2014. | Manuscript published on November 05, 2014. | PP: 55-61 | Volume-4 Issue-5, November 2014. | Retrieval Number: D2377094414/2014©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Standard support vector machine (SVM) algorithms are poorly suited to classifying large data sets because of their high training complexity. In this paper, we introduce a method based on an edge recognition technique to identify low-value data; to preserve the distribution of the input data, we use a clustering algorithm such as k-means to compute cluster centers. The data selected by the edge recognition algorithm, together with the cluster centers, are used to build the training data set. The reconstructed, smaller data set speeds up training without decreasing classification precision. However, k-means requires the number of clusters to be specified in advance. We therefore seek a better procedure by improving the edge recognition algorithm used for data reduction and by using a hierarchical clustering algorithm with similarity percent, instead of k-means, to compute the number of clusters, and we compare the results of the two approaches.
Keywords: Support vector machine, k-means, optimization, edge recognition, cluster, hierarchical, similarity percent.
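
The reduction pipeline outlined in the abstract (keep the points near the class boundary, then add cluster centers so the overall data distribution is still represented) can be illustrated with a minimal sketch. The abstract does not specify the edge recognition algorithm, so the neighbour-label test below is only a stand-in assumption, not the authors' method. All function names and parameters (`edge_points`, `reduce_training_set`, `k_per_class`, `n_neighbors`) are hypothetical, and the k-means routine is a plain Lloyd iteration written with the standard library only.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group;
        # keep the old center if a group ended up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def edge_points(X, y, n_neighbors=5):
    """ASSUMED edge test (not from the paper): a point counts as an edge
    point if any of its n_neighbors nearest neighbours has another label."""
    keep = []
    for i, p in enumerate(X):
        others = (j for j in range(len(X)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, X[j]))
        if any(y[j] != y[i] for j in order[:n_neighbors]):
            keep.append(i)
    return keep


def reduce_training_set(X, y, k_per_class=3, n_neighbors=5):
    """Edge points plus per-class k-means centers form the reduced set."""
    reduced = [(X[i], y[i]) for i in edge_points(X, y, n_neighbors)]
    for label in sorted(set(y)):
        members = [p for p, l in zip(X, y) if l == label]
        for c in kmeans(members, min(k_per_class, len(members))):
            reduced.append((c, label))
    return reduced


# Synthetic two-class data, for illustration only.
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(200)]
y = [0] * 200 + [1] * 200

reduced = reduce_training_set(X, y)
print(len(X), "->", len(reduced))
```

The reduced list of `(point, label)` pairs would then be passed to an SVM trainer in place of the full data set. Note that `k_per_class` still has to be chosen by hand here, which is the limitation the abstract attributes to k-means and the motivation for estimating the number of clusters with hierarchical clustering instead.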