From a course by Davy Paindaveine and Nathalie Vialaneix
Last updated on February 10, 2025
Using \(m=p\) would simply recover bagging-of-trees. Using a small \(m\) is appropriate when there are many correlated predictors. Common practice: \(m\approx\sqrt p\) for classification and \(m\approx p/3\) for regression.
In both cases, results are actually not very sensitive to \(m\).
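In the randomForest package, the number of candidate predictors per split is the mtry argument. A minimal sketch on synthetic data (the data and sizes here are illustrative, not from the course):

```r
library(randomForest)

set.seed(1)
n <- 200; p <- 16
X <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- factor(X[, 1] + rnorm(n) > 0)

# mtry = p recovers plain bagging-of-trees: every predictor is a
# candidate at every split
bag <- randomForest(X, y, mtry = p)

# A small mtry decorrelates the trees; floor(sqrt(p)) is the
# default for classification, max(floor(p/3), 1) for regression
rf <- randomForest(X, y, mtry = floor(sqrt(p)))

c(bagging = bag$err.rate[bag$ntree, "OOB"],
  forest  = rf$err.rate[rf$ntree, "OOB"])  # out-of-bag error estimates
```

Comparing the two out-of-bag error estimates for several values of mtry is a quick way to see how insensitive the results are to \(m\).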
Random forests
Random forests are less interpretable than a single classification tree, but they typically predict much better. Let us look at efficiency…
We repeated \(M=1000\) times the following experiment: we randomly split the channing data set into a training set (of size 300) and a test set (of size 162); on the training set, we fitted a single classification tree, a bagging-of-trees predictor, and a random forest, the latter via the randomForest function in R with default parameters (\(B=500\) trees, \(m\approx \sqrt p\)). This provided \(M=1000\) test errors for the direct (single-tree) approach, \(M=1000\) test errors for the bagging approach, and \(M=1000\) test errors for the random forest approach.
Because bagging-of-trees and random forests are poorly interpretable compared to classification trees, the following variable-importance measure is useful.
The importance, \(v_j\) say, of the \(j\)th predictor is measured as follows.
For each tree (i.e., for any \(b=1,\ldots,B\)), one records the total decrease in the Gini index resulting from splits on the \(j\)th predictor; \(v_j\) is then obtained by averaging these decreases over the \(B\) trees.
(A similar measure is used for regression, based on MSEs).
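In R, these importances are returned by the importance function of the randomForest package. A minimal sketch on synthetic data (the data and names here are illustrative):

```r
library(randomForest)

set.seed(1)
n <- 300; p <- 6
X <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- factor(X[, 1] - X[, 2] + rnorm(n) > 0)  # only V1 and V2 matter

fit <- randomForest(X, y, importance = TRUE)

# importance(fit) reports the mean decrease in Gini index per predictor
# (and, since importance = TRUE, a permutation-based mean decrease in
# accuracy as well)
importance(fit)
varImpPlot(fit)  # plots the predictors ranked by importance
```

On this toy example, V1 and V2 should come out on top of the ranking, as they are the only predictors related to the response.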