Automatic Keyword Extraction From Any Text Document Using N-gram Rigid Collocation
Bidyut Das¹, Subhajit Pal², Suman Kr. Mondal³, Dipankar Dalui⁴, Saikat Kumar Shome⁵
¹Bidyut Das, Dept. of IT, Haldia Institute of Technology, Haldia, India.
²Subhajit Pal, Student, Dept. of IT, Haldia Institute of Technology, Haldia, India.
³Suman Kr. Mondal, Student, Dept. of IT, Haldia Institute of Technology, Haldia, India.
⁴Dipankar Dalui, Student, Dept. of IT, Haldia Institute of Technology, Haldia, India.
⁵Saikat Kumar Shome, Scientist, CSIR-Central Mechanical Engineering Research Institute, Durgapur, India.
Manuscript received on April 04, 2013. | Revised Manuscript received on April 28, 2013. | Manuscript published on May 05, 2013. | PP: 238-242 | Volume-3, Issue-2, May 2013. | Retrieval Number: B1512053213/2013©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This paper proposes an unsupervised method for extracting keywords from a single document. A fuzzy set theoretic approach, fuzzy n-gram indexing, is used to extract n-gram keywords. It is observed that n-gram keywords give better results than mono-gram keywords, but for some documents the most relevant keyword is a mono-gram. The proposed keyword extraction approach requires neither a dictionary nor a thesaurus, and it does not depend on the size of the text document. The algorithm is efficient and dynamically determines the mono-gram, bi-gram, and longer n-gram keywords appropriate to each document.
Keywords: Keyword extraction; n-gram collocation; fuzzy set; information retrieval; natural language processing.
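The abstract leaves the details of fuzzy n-gram indexing to the body of the paper. As a rough illustration of what dictionary-free, single-document ranking of rigid (contiguous, fixed-order) n-gram candidates can look like, here is a minimal Python sketch. The function names `ngrams` and `extract_keywords`, the parameters `max_n`, `min_count`, `min_len` and `top_k`, and the membership and score formulas are all hypothetical choices made for this sketch; they are not the authors' method. The membership degree is simply the n-gram's count divided by the count of its rarest word, a value in [0, 1] indicating how rigidly the words occur together.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous (rigid) n-gram of a token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_keywords(text, max_n=3, min_count=2, min_len=4, top_k=5):
    """Rank contiguous 1..max_n-grams of a single document (illustrative sketch).

    Hypothetical scoring, NOT the paper's fuzzy n-gram indexing:
      membership = count(gram) / count(rarest word in gram), a degree in [0, 1]
                   saying how rigidly the words occur together;
      score      = count(gram) * membership * n.
    No dictionary or thesaurus is used; tokens shorter than min_len characters
    are skipped as a crude, dictionary-free stand-in for a stop-word list.
    """
    # Split on punctuation so that n-grams never span a sentence or clause boundary.
    segments = re.split(r"[.!?;:,\n]+", text.lower())
    token_lists = [re.findall(r"[a-z]+", seg) for seg in segments]
    word_counts = Counter(w for tokens in token_lists for w in tokens)

    scores = {}
    for n in range(1, max_n + 1):
        gram_counts = Counter(g for tokens in token_lists for g in ngrams(tokens, n))
        for gram, count in gram_counts.items():
            # Ignore rare candidates and candidates containing very short tokens.
            if count < min_count or any(len(w) < min_len for w in gram):
                continue
            membership = count / min(word_counts[w] for w in gram)
            scores[" ".join(gram)] = count * membership * n

    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    doc1 = ("Keyword extraction finds keywords. Keyword extraction is useful. "
            "Fuzzy sets help keyword extraction.")
    print(extract_keywords(doc1))
    # [('keyword extraction', 6.0), ('extraction', 3.0), ('keyword', 3.0)]

    doc2 = "Fuzzy logic. Fuzzy sets. Fuzzy rules. Fuzzy logic. Fuzzy numbers."
    print(extract_keywords(doc2))
    # [('fuzzy', 5.0), ('fuzzy logic', 4.0), ('logic', 2.0)]
```

In the first toy document the bi-gram "keyword extraction" ranks highest, while in the second the mono-gram "fuzzy" outranks "fuzzy logic". In a very small way this mirrors the abstract's observation that the best keyword length varies from document to document; the multiplication by `n` is only one simple way to favour longer collocations when they are rigid.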