Estimating the prediction accuracy (or lack thereof)
Several strategies to estimate the prediction accuracy of a classifier:
(1) Compute a test error (as above): Partition the data set \(\mathcal{S}\) into a training set \(\mathcal{S}_\text{train}\) (to train the classifier) and a test set \(\mathcal{S}_\text{test}\) (on which to evaluate the misclassification rate \(e_\text{test}\)).
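A minimal sketch of this split in Python, assuming scikit-learn is available; the iris data, the 30% test fraction, and the tree classifier are illustrative placeholders, not choices made in the text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the data set S

# Partition S into S_train and S_test (30% held out; an illustrative choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
e_test = (clf.predict(X_test) != y_test).mean()  # misclassification rate on S_test
```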
(2) Compute an \(L\)-fold cross-validation error:
Partition the data set \(\mathcal{S}\) into \(L\) folds \(\mathcal{S}_{\ell}\), \(\ell=1,\ldots,L\). For each \(\ell\), evaluate the test error \(e_{\text{test},\ell}\) associated with training set \(\mathcal{S}\setminus\mathcal{S}_{\ell}\) and test set \(\mathcal{S}_{\ell}\).
Then the (\(L\)-fold) ‘cross-validation error’ is: \[
e_\text{CV}=\frac{1}{L}\sum_{\ell=1}^L e_{\text{test},\ell}
\]
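A corresponding sketch for the \(L\)-fold case, under the same scikit-learn assumptions; \(L=5\) and the classifier are again illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the data set S
L = 5                              # number of folds (illustrative)

errors = []
for train_idx, test_idx in KFold(n_splits=L, shuffle=True, random_state=0).split(X):
    # Train on S \ S_ell, evaluate on the held-out fold S_ell
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append((clf.predict(X[test_idx]) != y[test_idx]).mean())

e_cv = np.mean(errors)  # average of the L per-fold test errors
```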
(3) Compute the Out-Of-Bag (OOB) error:
For each observation \(X_i\) from \(\mathcal{S}\), define the OOB prediction as \[
\phi_{\mathcal{S}}^\text{OOB}(X_i)
= \underset{k\in\{1,\ldots,K\}}{\operatorname{argmax}} \# \{ b:\phi_{\mathcal{S}^{*b}}(X_i)=k
\textrm{ and } (X_i,Y_i)\notin \mathcal{S}^{*b} \}
\] This is a majority vote that, quite naturally, discards the bootstrap samples in which \((X_i,Y_i)\) was used to train the classification tree. The OOB error is then the corresponding misclassification rate \[
e_\text{OOB}=\frac{1}{n}\sum_{i=1}^n
\mathbb{1}[ \phi_{\mathcal{S}}^\text{OOB}(X_i) \neq Y_i]
\]
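In practice one rarely computes the OOB vote by hand; scikit-learn's `RandomForestClassifier` exposes it directly. A sketch under the same assumptions as above (note that `oob_score_` reports the OOB accuracy, so the OOB error is its complement):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the data set S

# oob_score=True makes each observation's prediction a majority vote over
# the trees whose bootstrap sample left that observation out, as above
clf = RandomForestClassifier(n_estimators=500, oob_score=True,
                             random_state=0).fit(X, y)

e_oob = 1 - clf.oob_score_  # oob_score_ is the OOB accuracy
```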