State-Of-The-Art Named Entity Recognition and Related Extraction: A Review
Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP) and has evolved significantly. This review surveys progress and trends in NER by examining state-of-the-art techniques and methodologies across domains. It traces the shift from traditional rule-based models to machine learning approaches, including deep learning and transformers, which have markedly improved the performance of NER systems. Particular emphasis is given to adapting NER to specific needs such as biomedical information extraction, cybersecurity, and multilingual entity recognition, reflecting the growing complexity and diversity of application fields. Recent work integrates graph attention networks and multimodal frameworks, which combine contextual and syntactic features to address polysemy and entity disambiguation. The review also discusses the role of domain-specific adaptation, the importance of large annotated datasets, and ongoing efforts to mitigate data scarcity in low-resource languages. Together, these threads summarize the technological advances and point to future work on improving the accuracy and applicability of NER systems on more diverse and challenging datasets.