Temporal Condensation of Tamil News
Since the dawn of the Internet, we have been inundated with information, and the volume of information available online is expected to keep growing exponentially. This creates a need for summarization, which has made it one of the most sought-after topics in natural language processing. Staying informed about important events is essential, and newspapers have served this purpose for a very long time. Sadly, there is a perception among the general public that no news agency today can be unequivocally trusted, so the credibility of any single news article is uncertain. One therefore has to read articles from several sources to get an unbiased view of a topic. Yet when a query about an event is entered into a search engine such as Google, it returns an overwhelming number of results, far more than anyone could read. To address these problems, this work condenses news articles covering the Tamil Nadu Legislative Assembly election. The articles were collected from various news sources over a period of two months and translated from Tamil to English. Because the collection also covered other events, k-means clustering was performed on the dataset to separate the Tamil Nadu news from the rest. The relevant articles were then pre-processed to remove ambiguities and translation errors. Each article was summarized individually using a linear regression model that gave importance to features such as named entities and the number of words similar to those in the title. The resulting individual summaries were then summarized again with a BERT extractive summarizer to reduce redundancy. When the generated summary was compared with the article's introduction (or with its title, in the absence of an introduction), a precision of 0.512, a recall of 0.25 and an F-measure of 0.31 were obtained.
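The abstract describes the per-article step only at a high level. The following is a minimal, stdlib-only Python sketch of feature-based sentence scoring with linear regression, plus the precision/recall/F-measure comparison against an introduction or title. It rests on stated assumptions: capitalised words stand in for named entities (no NER model is used), the relevance labels used to fit the regression are hypothetical, precision and recall are computed as unigram overlap (the abstract does not specify the matching unit), and the helper names (`features`, `fit_linear_regression`, `summarize`, `precision_recall_f`) are illustrative rather than the authors' code. The k-means filtering and the BERT stage are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def features(sentence, title):
    """[intercept, named-entity proxy, title-word overlap] for one sentence.

    Assumption: capitalised words after the first stand in for named
    entities, because this sketch uses no NER model.
    """
    words = sentence.split()
    entity_proxy = sum(1 for w in words[1:] if w[:1].isupper())
    title_overlap = len(set(tokenize(sentence)) & set(tokenize(title)))
    return [1.0, float(entity_proxy), float(title_overlap)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(rows, targets, ridge=1e-9):
    """Least-squares weights from the normal equations (X^T X) w = X^T y.

    The tiny ridge term only guards against a singular X^T X on toy data.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def summarize(sentences, title, weights, n=2):
    """Return the n highest-scoring sentences in their original order."""
    scores = [sum(w * f for w, f in zip(weights, features(s, title)))
              for s in sentences]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


def precision_recall_f(candidate, reference):
    """Unigram precision, recall and F-measure of candidate against reference."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped token matches
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return p, r, 2 * p * r / (p + r)


if __name__ == "__main__":
    title = "DMK wins Tamil Nadu Assembly election"
    article = [
        "The DMK won the Tamil Nadu Assembly election on Sunday.",
        "Voters queued early in the morning.",
        "M. K. Stalin thanked voters in Chennai.",
        "It rained in the afternoon.",
    ]
    # Hypothetical relevance labels; the abstract does not state the training targets.
    labels = [1.0, 0.0, 0.5, 0.0]
    weights = fit_linear_regression([features(s, title) for s in article], labels)
    summary = " ".join(summarize(article, title, weights, n=2))
    print(summary)
    # No introduction in this toy article, so the title serves as the reference.
    print(precision_recall_f(summary, title))
```

Running the script prints a two-sentence summary and its scores against the title, mirroring the fallback the abstract uses when an article has no introduction. Solving the normal equations directly keeps the sketch dependency-free; a production pipeline would more likely use a library regressor and a real named-entity recognizer.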