Evaluation of machine learning versus Cox regression in identification of factors predicting recurrence following resection of non-small cell lung cancer

J. K. Blayney, C. M. Grills, P. V. Jithesh, I. I. Wistuba, M. Jacobson, K. J. O'Byrne, K. M. Kerr, G. Scagliotti, R. J. Holt, D. A. Fennell

Research output: Contribution to journal › Meeting Abstract › peer-review

Abstract

Background: Multiple clinicopathological factors influence NSCLC recurrence following surgical resection. Machine learning methods (MLMs) can capture their combined influence and provide clear stratification between subgroups of patients. This study considers a clinical signature associated with recurrence-free survival and compares the performance of MLMs against Cox regression analysis (CRA).

Methods: Recurrences < 3 or > 60 months from surgery were filtered out, yielding 846 patients with complete data on 14 clinicopathological factors. Neural Network classification was applied with correlation-based feature selection and discretisation, using recurrence and survival status as the target values. Models were evaluated by ten-fold cross-validation and ranked by AUC. Decision Trees were applied in the same way, with Kaplan-Meier analysis (KMA) based on time to recurrence (TTR) or last follow-up. Initial results led to the removal of non-recurring patients with < 60 months of follow-up, leaving 477 patients.
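As an illustration of the Kaplan-Meier step described above, the following is a minimal sketch of the product-limit estimator in plain Python. It is not the authors' implementation: the function name `kaplan_meier`, the input layout and the toy data are assumptions made only for this example, with `times` standing in for TTR or last follow-up in months and `events` marking whether a recurrence was observed.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times  : time to recurrence or last follow-up (e.g. months)
    events : 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Number of recurrences at each time point.
    event_counts = Counter(t for t, e in zip(times, events) if e)
    # Number of patients leaving the risk set at each time point
    # (recurrences and censored observations together).
    exit_counts = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the conditional probability of no recurrence at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data, not taken from the study.
times = [5, 8, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2} months, S(t) = {s:.3f}")
```

Censored patients (here, those without recurrence at last follow-up) lower the number at risk without lowering the survival estimate, which is why follow-up length matters for the non-recurring group.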

Results: CRA identified pathological stage (χ2 = 17.01, p < 0.0001) as the best single predictive factor, followed by pT, nodal status and pN. Two superior multiple-factor groups were identified: nodal status, pT and stage (χ2 = 22.81, p = 3.5E-06) and nodal status, pT and pN (χ2 = 22.62, p = 3.7E-06). When classifying for recurrence, feature selection identified stage and neoadjuvant therapy as independent factors. When predicting survival status, Neural Networks correctly classified 79.8% of patients (AUC 0.87). Decision Trees gave equal results and also found 14 subgroups based on combined ranges of pT, stage, age, recurrence site and histology. TTR KMA revealed 3 significant clusters of nodes (with discretisation) and 4 clusters (without) (see Table for subgroup examples).
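For readers unfamiliar with the AUC figure reported above, the sketch below shows one common way to compute it: as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `roc_auc`, the labels and the scores are hypothetical and are not drawn from the study; the abstract does not state how its AUC values were computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : 1 for the positive class, 0 for the negative class
    scores : classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six patients, not taken from the study.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. In a ten-fold cross-validation, a value like this would typically be computed on each held-out fold and then summarised across folds.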

Conclusions: Machine learning methods applied to the prediction of NSCLC recurrence after surgery identify optimal combinations of clinicopathological variables, enhancing traditional CRA.

Original language: English
Journal: Journal of Clinical Oncology
Publication status: Published - 20 May 2010
