Team Members:
  • Vivek Gopalakrishnan
  • Jennifer Heiko
  • Suyeon Ju
  • Morgan Sanchez
  • Celina Shih
  • Joshua Vogelstein, PhD
  • Benjamin Pedigo
  • Jaewon Chung


Random Forest (RF) is an interpretable and robust machine learning algorithm for classification and regression. However, existing RF methods are not well suited to multivariate regression tasks (predicting multiple continuous outputs from multiple inputs) because variance estimates degrade in high-dimensional output spaces. To address this, we introduce two projection-based split criteria: axis projection and oblique projection. With axis projection, rather than computing mean squared error (MSE) over all outputs and all samples, the MSE at each split is computed on a single output chosen at random. The oblique projection criterion instead splits on the MSE of a linear combination of outputs, with weights drawn from {-1.0, 0.0, 1.0}. In several nonlinear simulation settings, these new criteria outperform all existing split criteria in the Scikit-Learn implementation of Random Forest: MSE, mean absolute error (MAE), and Friedman MSE.
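The two criteria can be illustrated with a minimal NumPy sketch. This is not the project's implementation: the function names are illustrative, and it assumes the outputs at a candidate node are stored as an `(n_samples, n_outputs)` array. Each function returns the impurity (MSE) of a projected view of the outputs, which a tree builder would compare across candidate splits.

```python
import numpy as np

def axis_projection_mse(y, rng):
    """Impurity via axis projection: MSE of one output column chosen
    uniformly at random (instead of averaging MSE over all outputs).
    y: (n_samples, n_outputs) array of node outputs."""
    j = rng.integers(y.shape[1])          # random output dimension
    col = y[:, j]
    return float(np.mean((col - col.mean()) ** 2))

def oblique_projection_mse(y, rng):
    """Impurity via oblique projection: MSE of a sparse linear
    combination of outputs, with weights drawn from {-1, 0, 1}
    (as described in the abstract)."""
    w = rng.choice([-1.0, 0.0, 1.0], size=y.shape[1])
    proj = y @ w                          # project outputs to 1-D
    return float(np.mean((proj - proj.mean()) ** 2))

# Toy usage on synthetic node data.
rng = np.random.default_rng(0)
y = rng.normal(size=(100, 5))
print(axis_projection_mse(y, rng))
print(oblique_projection_mse(y, rng))
```

In a tree builder, the split minimizing the weighted sum of child impurities under one of these criteria would be selected, so only a one-dimensional projection is scored per candidate split rather than the full multivariate MSE.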
