Corpus based Automatic Text Summarization System with HMM Tagger
M. Suneetha¹, S. Sameen Fatima²
¹Manne Suneetha, Assistant Professor, Department of Information Technology, VR Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India.
²Dr. S. Sameen Fatima, Professor and HOD, Department of Computer Science Engineering, Osmania University, Hyderabad, Andhra Pradesh, India.
Manuscript received on June 26, 2011. | Revised Manuscript received on July 02, 2011. | Manuscript published on July 05, 2011. | PP: 118-123 | Volume-1 Issue-3, July 2011. | Retrieval Number: C073071311
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The rapid growth of data on the Internet has overloaded users with enormous amounts of information, making it difficult to access huge volumes of documents. Automatic text summarization is an important technique for analyzing high-volume text documents. Text summarization condenses a source text into a shorter version while preserving its information content and overall meaning. In this paper, a frequent-term-based text summarization technique with an HMM tagger is designed and implemented in Java. The proposed system generates a summary of a given input document by identifying and extracting the important sentences in the document. The model consists of four stages. In the first stage, the system decomposes the given text into its constituent sentences, assigns a part-of-speech (POS) tag to each word, and stores the result in a table. The second stage removes stop words and applies stemming and lemmatization. The third stage identifies feature terms. Finally, each sentence is ranked according to the feature terms it contains. This final stage reduces the number of sentences in the summary in order to produce a high-quality summary.
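The abstract describes stages two to four only at a high level. The Java sketch below is one plausible reading of a frequent-term summarizer, not the authors' implementation. The stop-word list, the crude suffix stripper (standing in for both stemming and lemmatization), taking the `topK` most frequent terms as feature terms, and scoring a sentence by its count of feature-term occurrences are all assumptions, and every class and method name is hypothetical. The POS tagging of stage one is omitted here.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical frequent-term extractive summarizer (sketch of stages 2-4). */
final class FrequentTermSummarizer {

    // Assumed, deliberately tiny stop-word list (not taken from the paper).
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "or",
        "to", "it", "its", "this", "that", "with", "for", "by", "as", "be");

    // Sentence decomposition: split after '.', '!' or '?' followed by whitespace.
    static List<String> splitSentences(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isBlank()) out.add(s.trim());
        }
        return out;
    }

    // Crude suffix stripping; a stand-in for a real stemmer/lemmatizer.
    static String stem(String w) {
        if (w.length() > 4 && w.endsWith("ies")) return w.substring(0, w.length() - 3) + "y";
        if (w.length() > 5 && w.endsWith("ing")) return w.substring(0, w.length() - 3);
        if (w.length() > 4 && w.endsWith("ed")) return w.substring(0, w.length() - 2);
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Stage 2: lowercase, tokenize on non-letters, drop stop words, stem.
    static List<String> terms(String sentence) {
        List<String> out = new ArrayList<>();
        for (String tok : sentence.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty() && !STOP_WORDS.contains(tok)) out.add(stem(tok));
        }
        return out;
    }

    // Stage 3: the topK most frequent terms (ties broken alphabetically) become feature terms.
    static Set<String> featureTerms(List<List<String>> sentenceTerms, int topK) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> ts : sentenceTerms)
            for (String t : ts) freq.merge(t, 1, Integer::sum);
        return freq.entrySet().stream()
            .sorted((a, b) -> {
                int c = Integer.compare(b.getValue(), a.getValue());
                return c != 0 ? c : a.getKey().compareTo(b.getKey());
            })
            .limit(topK)
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Stage 4: score sentences by feature-term occurrences, keep the best n in document order.
    static List<String> summarize(String text, int topK, int n) {
        List<String> sentences = splitSentences(text);
        List<List<String>> sentenceTerms = new ArrayList<>();
        for (String s : sentences) sentenceTerms.add(terms(s));
        Set<String> features = featureTerms(sentenceTerms, topK);

        int[] scores = new int[sentences.size()];
        for (int i = 0; i < sentences.size(); i++)
            for (String t : sentenceTerms.get(i))
                if (features.contains(t)) scores[i]++;

        // Rank by score (earlier sentence wins ties), keep n, then restore original order.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) order.add(i);
        order.sort((a, b) -> scores[b] != scores[a]
            ? Integer.compare(scores[b], scores[a]) : Integer.compare(a, b));
        List<Integer> keep = new ArrayList<>(order.subList(0, Math.min(n, order.size())));
        Collections.sort(keep);

        List<String> summary = new ArrayList<>();
        for (int i : keep) summary.add(sentences.get(i));
        return summary;
    }

    public static void main(String[] args) {
        String doc = "Text summarization condenses a document. "
            + "The weather was pleasant yesterday. "
            + "Extractive summarization selects important sentences from the document. "
            + "Summarization systems rank sentences by frequent terms.";
        summarize(doc, 3, 2).forEach(System.out::println);
    }
}
```

For the four-sentence sample in `main`, with `topK = 3` and `n = 2`, the feature terms are "summarization", "document" and "sentence", and the sketch keeps the first and third sentences. Returning the selected sentences in their original order, rather than in score order, keeps the extract readable.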
Keywords: Text Summarization, HMM Tagger, Brown Corpus, POS tagging.
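The first stage assigns POS tags with an HMM tagger, and the keywords name the Brown Corpus as the training resource. A minimal Java sketch of Viterbi decoding for a bigram HMM follows. It assumes a hypothetical three-tag set (DT, NN, VB, not the Brown tag set) and hand-picked toy probabilities in place of values estimated from the Brown Corpus; the `UNSEEN` probability floor for unknown words is also an assumption. Scores are kept in log space to avoid underflow on long sentences.

```java
import java.util.*;

/** Hypothetical bigram-HMM POS tagger using Viterbi decoding (toy parameters). */
final class ViterbiTagger {
    private static final double UNSEEN = 1e-6; // assumed floor for unseen word/tag pairs

    private final String[] tags;
    private final double[] start;             // P(first tag)
    private final double[][] trans;           // trans[i][j] = P(tag j | previous tag i)
    private final Map<String, double[]> emit; // word -> P(word | tag) for each tag

    ViterbiTagger(String[] tags, double[] start, double[][] trans, Map<String, double[]> emit) {
        this.tags = tags; this.start = start; this.trans = trans; this.emit = emit;
    }

    private double logEmit(String word, int tag) {
        double[] p = emit.get(word.toLowerCase());
        return Math.log(p == null ? UNSEEN : Math.max(p[tag], UNSEEN));
    }

    String[] tag(String[] words) {
        int n = words.length, k = tags.length;
        if (n == 0) return new String[0];
        // score[i][t]: best log-probability of a tag sequence for words[0..i] ending in tag t.
        double[][] score = new double[n][k];
        int[][] back = new int[n][k]; // back[i][t]: best previous tag for (i, t)
        for (int t = 0; t < k; t++) score[0][t] = Math.log(start[t]) + logEmit(words[0], t);
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + Math.log(trans[p][t]);
                    if (s > best) { best = s; arg = p; }
                }
                score[i][t] = best + logEmit(words[i], t);
                back[i][t] = arg;
            }
        }
        // Pick the best final tag, then follow back-pointers to recover the sequence.
        int last = 0;
        for (int t = 1; t < k; t++) if (score[n - 1][t] > score[n - 1][last]) last = t;
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) { out[i] = tags[last]; last = back[i][last]; }
        return out;
    }

    // Toy model; a corpus-based tagger would estimate these from tagged text.
    static ViterbiTagger toyModel() {
        String[] tags = {"DT", "NN", "VB"};
        double[] start = {0.6, 0.3, 0.1};
        double[][] trans = {
            {0.05, 0.90, 0.05}, // from DT
            {0.10, 0.30, 0.60}, // from NN
            {0.60, 0.30, 0.10}  // from VB
        };
        Map<String, double[]> emit = new HashMap<>();
        emit.put("the",       new double[]{0.7, 0.0, 0.0});
        emit.put("system",    new double[]{0.0, 0.3, 0.0});
        emit.put("ranks",     new double[]{0.0, 0.1, 0.3});
        emit.put("sentences", new double[]{0.0, 0.3, 0.05});
        return new ViterbiTagger(tags, start, trans, emit);
    }

    public static void main(String[] args) {
        String[] words = "The system ranks sentences".split(" ");
        System.out.println(String.join(" ", toyModel().tag(words)));
    }
}
```

With these toy parameters, "The system ranks sentences" is tagged DT NN VB NN: the ambiguous word "ranks" resolves to VB because a noun is far more likely to be followed by a verb than by another noun in the assumed transition table.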