Comparing and Selecting Appropriate Measuring Parameters for K-means Clustering Technique
Shreya Jain1, Samta Gajbhiye2
1Shreya Jain, M.E. Scholar, Computer Science & Engg Department, Shri Shankaracharya Technical Campus, Bhilai, India.
2Samta Gajbhiye, Sr. Associate Professor, Computer Science & Engg Department, Shri Shankaracharya Technical Campus, Bhilai, India.
Manuscript received on February 15, 2012. | Revised Manuscript received on February 20, 2012. | Manuscript published on March 05, 2012. | PP: 392-396 | Volume-2 Issue-1, March 2012. | Retrieval Number: A0459022112/2012©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases: first, feature extraction maps each document or record to a point in a high-dimensional space; then, clustering algorithms automatically group the points into a hierarchy of clusters. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be pre-processed with an efficient dimensionality reduction method. Cluster analysis has recently become a popular data analysis method in a number of areas. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters, each represented by its centroid. In this paper, a k-means algorithm is used to cluster the data sets; it outputs k disjoint clusters, each with a concept vector, which is the centroid of the cluster normalized to unit Euclidean norm. We also analyze different values of k to determine which gives better performance of the k-means clustering algorithm.
Keywords: Data Mining, Text Mining, Clustering, K-Means Clustering, Silhouette Plot.
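
The abstract's two ideas, k-means with unit-norm concept vectors and a comparison of several k-values, can be illustrated with a minimal sketch. It is not the authors' implementation. The toy data, the farthest-first seeding, the assignment by cosine similarity, and the use of the mean silhouette width (suggested only by the keyword "Silhouette Plot") as the comparison measure are all assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concept_kmeans(points, k, iters=50):
    """K-means on unit vectors; returns (labels, concept_vectors).

    Assumption: seeds are chosen farthest-first and points are assigned
    by cosine similarity. The abstract does not specify either choice.
    """
    data = [normalize(p) for p in points]
    concepts = [list(data[0])]
    while len(concepts) < k:
        # Next seed: the point least similar to every seed chosen so far.
        far = min(data, key=lambda p: max(dot(p, c) for c in concepts))
        concepts.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: dot(p, concepts[j]))
                      for p in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous concept vector
                centroid = [sum(col) / len(members) for col in zip(*members)]
                concepts[j] = normalize(centroid)  # unit-norm concept vector
    return labels, concepts


def mean_silhouette(points, labels):
    """Mean silhouette width with Euclidean distance (higher is better)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points)
                      if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three groups of 10 points, each near one coordinate axis.
rng = random.Random(1)
axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
points = [[a + rng.uniform(0, 0.1) for a in axis]
          for axis in axes for _ in range(10)]
unit_points = [normalize(p) for p in points]

scores = {}
for k in range(2, 6):
    labels, concepts = concept_kmeans(points, k)
    scores[k] = mean_silhouette(unit_points, labels)

best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

On real document vectors the same loop would run over the candidate k-values of interest. The silhouette width is only one possible measure for comparing them.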