TY - JOUR
T1 - Improving B-cell Linear Epitope Prediction via Multiple Feature Fusion and an Integrated Machine Learning Algorithm
AU - Rao, Bing
AU - Tang, Yuxuan
AU - Hu, Jun
AU - Alahmadi, Hanin
AU - Alqahtani, Yaseer
AU - Arif, Muhammad
AU - Alam, Tanvir
N1 - Publisher Copyright:
© 2025 Bentham Science Publishers
PY - 2025/11/11
Y1 - 2025/11/11
AB - Introduction: The identification of linear B-cell epitopes (BCEs) is important for drug discovery and related applications such as antibody production, peptide-based vaccine design, and other therapeutics. Materials and Methods: Unlike traditional laboratory-based methods, computational techniques can reduce the cost and time of large-scale BCE prediction. Numerous in silico methods have therefore been developed to improve BCE prediction. However, there remains room for improvement through novel feature representations and learning models. In the present study, we designed a novel sequence-based predictor, named CoBCEs, for accurately identifying BCEs. CoBCEs combines graph-based signature, texture-based, and protein language model (pLM)-based features to capture local and global evolutionary information from protein sequences alone. The fused features, namely ProtVec sequence embeddings, Distance-Enhanced Graph (DE-Graph) features, and term frequency-inverse document frequency (TF-IDF) features, are then fed to an ensemble machine learning classifier. Results: Cross-validation and independent tests on several datasets show that CoBCEs achieved an accuracy of 77.3% and a Matthews correlation coefficient (MCC) of 61.8%, outperforming existing BCE predictors. Discussion: Detailed analyses indicate that the main advantage of CoBCEs lies in the combined use of graph-based and pLM-based features, which extract more discriminative information from sequences. In future work, we aim to develop a publicly available web server based on biological language models for large-scale BCE peptide prediction. Conclusion: We expect the proposed approach to offer valuable insights for drug discovery and disease treatment.
KW - B-cell epitopes
KW - drug discovery
KW - feature selection
KW - machine learning
KW - protein language model based features
UR - https://www.scopus.com/pages/publications/105021441388
U2 - 10.2174/0113894501413502251009001940
DO - 10.2174/0113894501413502251009001940
M3 - Article
AN - SCOPUS:105021441388
SN - 1389-4501
JO - Current Drug Targets
JF - Current Drug Targets
ER -