
Comparison of Text Mining Models for Food and Dietary Constituent Named-Entity Recognition

  • Nadeesha Perera
  • Thi Thuy Linh Nguyen
  • Matthias Dehmer
  • Frank Emmert-Streib*
  • *Corresponding author for this work
  • Tampere University
  • Swiss Distance University of Applied Sciences
  • Private University for Health Sciences, Medical Informatics and Technology
  • Nankai University

Research output: Contribution to journal › Article › peer-review

Abstract

Biomedical Named-Entity Recognition (BioNER) has become an essential part of text mining due to the continuously growing digital archives of biological and medical articles. While there are many well-performing BioNER tools for entities such as genes, proteins, diseases or species, there is very little research into food and dietary constituent named-entity recognition. For this reason, in this paper, we study seven BioNER models for food and dietary constituent recognition. Specifically, we study a dictionary-based model, a conditional random fields (CRF) model and a new hybrid model, called FooDCoNER (Food and Dietary Constituents Named-Entity Recognition), which we introduce by combining the former two models. In addition, we study four deep language models: BERT, BioBERT, RoBERTa and ELECTRA. We find that FooDCoNER not only yields the best overall results, comparable with those of the deep language models, but is also much more efficient with respect to run time and the sample size required of the training data. The latter was identified by studying learning curves. Overall, our results not only provide a new tool for food and dietary constituent NER but also shed light on the differences between classical machine learning models and recent deep language models.

Original language: English
Pages (from-to): 254-275
Number of pages: 22
Journal: Machine Learning and Knowledge Extraction
Volume: 4
Issue number: 1
Publication status: Published - Mar 2022
Externally published: Yes

Keywords

  • Biomedical named-entity recognition
  • Conditional random fields
  • Deep language models
  • Dictionary modeling
  • Food and dietary constituents extraction
  • Machine learning
  • Nutrition-entity extraction
  • Phytochemical extraction
