Improving Results of TF-IDF based Retrieval System using Co-reference Resolution and Pronoun Substitution
Information Retrieval systems retrieve relevant information in response to user queries. TF-IDF is one of the most popular Information Retrieval techniques. It is widely used and has been successful in retrieving relevant information, but it still has some disadvantages. In this paper we propose a method to improve the performance of TF-IDF based systems using Co-reference Resolution and Pronoun Substitution. The system is found to be effective: the rank order of the retrieved documents changes significantly because of the relative increase in the amount of content taken into consideration during the retrieval process. Graphical analysis of the observed improvement is given by visualizations of TF-IDF, Cosine Similarity, and effective improvement in rank for various documents before and after the change of algorithm.
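As a rough illustration of why pronoun substitution can change TF-IDF rankings, the sketch below scores one query against two small documents before and after substitution. It is not the paper's implementation. The substituted text is written by hand to stand in for the output of a co-reference resolver, the names `tokenize`, `cosine`, and `rank_scores` are hypothetical, and the smoothed IDF formula is an assumption, since the abstract does not state which weighting variant is used.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return [t for t in tokens if t]

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_scores(query, docs):
    # TF-IDF weights with a smoothed IDF (an assumed variant).
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))

    def vec(toks):
        tf = Counter(toks)
        return {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in tf.items()}

    q = vec(tokenize(query))
    return [cosine(q, vec(toks)) for toks in tokenized]

other = "The Nobel Prizes are awarded every year in Stockholm."
original = [
    "Marie Curie studied radioactivity. She won two Nobel Prizes. "
    "She also founded an institute.",
    other,
]
# Hand-written stand-in for resolver output: each "She" becomes "Curie".
substituted = [
    "Marie Curie studied radioactivity. Curie won two Nobel Prizes. "
    "Curie also founded an institute.",
    other,
]

before = rank_scores("Curie", original)
after = rank_scores("Curie", substituted)
print(before, after)
```

In this toy case the first document's cosine similarity to the query rises after substitution, because mentions that were hidden behind pronouns now count toward the term frequency of "Curie".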