RANDOM SUBSPACE LEARNING (RASSEL) WITH DATA DRIVEN WEIGHTING SCHEMES

Research output: Contribution to journal › Article › peer-review

Abstract

We present a novel adaptation of the random subspace learning approach to regression analysis and classification of high dimension, low sample size data, in which the individual strength of each explanatory variable is harnessed to achieve a consistent selection of a predictively optimal collection of base learners. Within random subspace learning, random forest (RF) occupies a prominent place, as evidenced by the vast number of extensions of the random forest idea and the multiplicity of its machine learning applications. The adaptation of random subspace learning presented in this paper differs from random forest in the following ways: (a) instead of using trees as RF does, we use multiple linear regression (MLR) as our regression base learner and the generalized linear model (GLM) as our classification base learner, and (b) rather than selecting the subset of variables uniformly as RF does, we introduce the new concept of sampling variables from a multinomial distribution whose weights (success probabilities) are driven by p independent one-way analysis of variance (ANOVA) tests on the predictor variables. The proposed framework achieves two substantial benefits, namely, (1) the avoidance of the extra computational burden brought by the permutations RF needs to de-correlate the predictor variables, and (2) a substantial reduction in the average test error with the base learners used.
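The core idea of the abstract, weighting each predictor by an ANOVA F-statistic, converting the weights into multinomial sampling probabilities, and fitting MLR base learners on the sampled subspaces, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the response is binned into groups for the one-way ANOVA, the subspace size `d` and the bin count are illustrative parameters, and all function names are hypothetical.

```python
# Sketch of RASSEL-style weighted random subspace regression (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def anova_f_weights(X, y, n_bins=3):
    """Per-feature one-way ANOVA F-statistics (response binned into
    quantile groups), normalized into multinomial sampling probabilities."""
    cuts = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    groups = np.digitize(y, cuts)
    f_stats = []
    for j in range(X.shape[1]):
        samples = [X[groups == g, j] for g in np.unique(groups)]
        grand_mean = X[:, j].mean()
        ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
        ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
        df_b, df_w = len(samples) - 1, len(y) - len(samples)
        f_stats.append((ss_between / df_b) / (ss_within / df_w + 1e-12))
    f_stats = np.asarray(f_stats)
    return f_stats / f_stats.sum()

def rassel_regression(X, y, n_learners=50, d=5):
    """Ensemble of OLS base learners, each fit on a subspace drawn
    without replacement according to the ANOVA-driven weights."""
    weights = anova_f_weights(X, y)
    models = []
    for _ in range(n_learners):
        idx = rng.choice(X.shape[1], size=d, replace=False, p=weights)
        A = np.column_stack([np.ones(len(y)), X[:, idx]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        models.append((idx, beta))
    return models

def rassel_predict(models, X):
    """Average the base-learner predictions."""
    preds = [np.column_stack([np.ones(len(X)), X[:, idx]]) @ beta
             for idx, beta in models]
    return np.mean(preds, axis=0)
```

Informative predictors receive large F-statistics and are therefore sampled into most subspaces, which is the mechanism the abstract credits for reducing average test error relative to uniform subspace sampling.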
Original language: English
Pages (from-to): 11-30
Number of pages: 20
Journal: Mathematics for Applications
Volume: 7
DOIs
Publication status: Published - 1 Jan 2018
