Automatic Keyword Extraction From Any Text Document Using N-gram Rigid Collocation
Bidyut Das¹, Subhajit Pal², Suman Kr. Mondal³, Dipankar Dalui⁴, Saikat Kumar Shome⁵
¹Bidyut Das, Dept. of IT, Haldia Institute of Technology, Haldia, India.
²Subhajit Pal, Student, Dept. of IT, Haldia Institute of Technology, Haldia, India.
³Suman Kr. Mondal, Student, Dept. of IT, Haldia Institute of Technology, Haldia, India.
⁴Dipankar Dalui, Student, Dept. of IT, Haldia Institute of Technology, Haldia, India.
⁵Saikat Kumar Shome, Scientist, CSIR-Central Mechanical Engineering Research Institute, Durgapur.
Manuscript received on April 04, 2013. | Revised Manuscript received on April 28, 2013. | Manuscript published on May 05, 2013. | PP: 238-242 | Volume-3, Issue-2, May 2013. | Retrieval Number: B1512053213/2013©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This paper proposes an unsupervised method for extracting keywords from a single document. A fuzzy set-theoretic approach, fuzzy n-gram indexing, is used to extract n-gram keywords. N-gram keywords generally give better results than mono-gram keywords, although for some documents the most relevant keyword is a mono-gram. The proposed approach requires neither a dictionary nor a thesaurus, and it does not depend on the size of the text document. The algorithm dynamically determines the mono-gram, bi-gram, and higher-order n-gram keywords for different documents.
Keywords: keyword extraction; n-gram collocation; fuzzy set; information retrieval; natural language processing.
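
As a rough illustration of dictionary-free n-gram candidate selection from a single document, the sketch below counts contiguous word n-grams and ranks the repeated ones. It is not the paper's fuzzy n-gram indexing: the function names `candidate_ngrams` and `top_keywords`, the parameters `max_n`, `k`, and `min_count`, and the frequency-times-length score are all assumptions made for this example.

```python
import re
from collections import Counter


def candidate_ngrams(text, max_n=3):
    """Count contiguous word n-grams (n = 1..max_n) in one document.

    No dictionary, thesaurus, or stop-word list is used. Sentence
    boundaries are ignored, so n-grams may span punctuation.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def top_keywords(text, max_n=3, k=5, min_count=2):
    """Rank repeated n-grams by a hypothetical score: count * length in words.

    This plain score is an illustrative stand-in, not a fuzzy membership
    value. It lets a repeated phrase outrank its component words, while a
    mono-gram can still win when no longer phrase repeats.
    """
    counts = candidate_ngrams(text, max_n)
    scored = {g: c * len(g.split()) for g, c in counts.items() if c >= min_count}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda g: (-scored[g], g))[:k]
```

Calling `top_keywords(document_text)` returns up to `k` candidate phrases of mixed length, so mono-grams, bi-grams, and longer n-grams compete in a single ranking rather than being fixed in advance.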
