Improving B-cell Linear Epitope Prediction via Multiple Feature Fusion and an Integrated Machine Learning Algorithm

  • Bing Rao
  • Yuxuan Tang
  • Jun Hu*
  • Hanin Alahmadi
  • Yaseer Alqahtani
  • Muhammad Arif*
  • Tanvir Alam*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Identifying linear B-cell epitopes (BCEs) is highly important for drug discovery applications such as antibody production, peptide-based vaccines, and other therapeutics.

Materials and Methods: Compared with traditional laboratory-based methods, computational techniques can save cost and time when predicting BCEs at large scale. Numerous in-silico methods have therefore been designed to improve the overall efficacy of BCE prediction. However, there is still room for improvement through novel feature representations and learning models. In the present study, we designed a novel sequence-based predictor, named CoBCEs, to screen for and accurately identify BCEs. CoBCEs combines graph-based signature, texture-based, and protein language model (pLM)-based features to capture both local and global evolutionary information from protein sequences alone. The fused features, namely ProtVec sequence embeddings, Distance-Enhanced Graph (DE-Graph) features, and term frequency-inverse document frequency (TF-IDF) features, are then fed into an ensemble machine learning classifier.

Results: Cross-validation and independent tests on several datasets show that CoBCEs outperforms existing BCE predictors, achieving an accuracy of 77.3% and a Matthews correlation coefficient (MCC) of 61.8%.

Discussion: Detailed data analyses indicate that the main advantage of CoBCEs lies in the combined use of graph-based and pLM-based features, which extract more discriminative information from sequences. In the future, we aim to develop a publicly available web server that uses biological language models for large-scale BCE peptide prediction.

Conclusion: We believe the proposed approach will offer valuable insights for drug discovery and disease treatment.
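The abstract does not give the exact feature settings or ensemble design of CoBCEs, so the following is only a rough, hypothetical sketch of the general pipeline it describes. It assumes overlapping 3-mers as the TF-IDF "words", uses amino acid composition (AAC) as a stand-in for the ProtVec and DE-Graph encoders (which need pretrained embeddings and a graph construction not described here), fuses features by simple concatenation, combines classifier outputs by majority vote, and scores predictions with MCC. All function names (`kmers`, `tfidf_features`, `aac`, `fuse`, `majority_vote`, `mcc`) are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-in encoder alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmers(seq, k=3):
    """Split a peptide into overlapping k-mers, treated as TF-IDF 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_features(peptides, k=3):
    """Return (vocabulary, rows) of TF-IDF weights over overlapping k-mers."""
    docs = [Counter(kmers(p, k)) for p in peptides]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Document frequency: how many peptides contain each k-mer.
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    # Smoothed IDF, same formula as scikit-learn's smooth_idf=True default.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for d in docs:
        total = sum(d.values()) or 1  # avoid division by zero for short peptides
        # Term frequency is the k-mer count divided by the peptide's k-mer total.
        rows.append([d[w] / total * idf[w] for w in vocab])
    return vocab, rows


def aac(seq):
    """Amino acid composition: a simple stand-in for ProtVec/DE-Graph features."""
    n = len(seq) or 1
    return [seq.count(a) / n for a in AMINO_ACIDS]


def fuse(*blocks):
    """Early fusion: concatenate each peptide's vectors from every encoder."""
    return [sum(rows, []) for rows in zip(*blocks)]


def majority_vote(*predictions):
    """Combine 0/1 predictions from several classifiers by majority vote."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = epitope)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    peptides = ["ACDEFGH", "KLMNPQR", "ACDKLMN"]
    vocab, tfidf = tfidf_features(peptides, k=3)
    fused = fuse(tfidf, [aac(p) for p in peptides])
    # Vocabulary size, then fused length (vocabulary size + 20 AAC values).
    print(len(vocab), len(fused[0]))
    votes = majority_vote([1, 0, 1], [1, 1, 0], [0, 0, 1])
    print(votes, mcc([1, 0, 1], votes))
```

Swapping in real ProtVec embeddings or graph-based features would only change the blocks passed to `fuse`. Majority voting is just one ensemble strategy (soft voting and stacking are common alternatives), and the abstract does not state which one CoBCEs uses.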

Original language: English
Journal: Current Drug Targets
Early online date: Nov 2025
Publication status: Published - 11 Nov 2025

Keywords

  • B-cell epitopes
  • drug discovery
  • feature selection
  • machine learning
  • protein language model-based features
