Worksheet
1 CART on Insurance Data
Exercise 1. The dataset contains 2220 observations of young drivers with car insurance contracts. The variables are TYPE, VALUE, SEX, AGEV, and AGEI. Below is a step-by-step solution in R.
Load the dataset and describe it.
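A minimal sketch of the loading and description step. The worksheet does not name the data file, so the code first writes a simulated stand-in table to a temporary CSV; the path, the class labels "A"/"B" and the coding "F"/"M" are assumptions, not the real data.

```r
# Stand-in data (hypothetical): the real file is not named in the worksheet,
# so simulate a table with the same five columns and save it as a CSV.
set.seed(1)
n <- 2220
sim <- data.frame(
  TYPE  = sample(c("A", "B"), n, replace = TRUE),  # hypothetical labels
  VALUE = round(runif(n, 5, 40), 1),
  SEX   = sample(c("F", "M"), n, replace = TRUE),  # hypothetical coding
  AGEV  = sample(0:15, n, replace = TRUE),
  AGEI  = sample(18:25, n, replace = TRUE)
)
path <- tempfile(fileext = ".csv")  # replace with the path of the real file
write.csv(sim, path, row.names = FALSE)

# Load: read character columns as factors so TYPE and SEX are categorical.
dat <- read.csv(path, stringsAsFactors = TRUE)

# Describe: dimensions, column types, per-column summaries, class balance.
dim(dat)
str(dat)
summary(dat)
prop.table(table(dat$TYPE))
```

With the real file, only the `read.csv` call and the description lines are needed.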
Split the dataset into training and test samples (80%-20%).
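One way to draw the 80%-20% split is to sample row indices without replacement. The sketch below runs on simulated stand-in data (an assumption, as are the labels "A"/"B" and "F"/"M"); the seed values are arbitrary and only make the split reproducible.

```r
# Stand-in data (hypothetical): same columns as the worksheet's dataset.
set.seed(1)
n <- 2220
dat <- data.frame(
  VALUE = round(runif(n, 5, 40), 1),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  AGEV  = sample(0:15, n, replace = TRUE),
  AGEI  = sample(18:25, n, replace = TRUE)
)
dat$TYPE <- factor(ifelse(
  runif(n) < plogis(-1 + 0.15 * dat$VALUE - 0.35 * dat$AGEV), "A", "B"))

# Draw 80% of the row indices at random; the remaining 20% form the test set.
set.seed(42)
train_idx <- sample(seq_len(nrow(dat)), size = round(0.8 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Check the sizes and that the class proportions are similar in both parts.
c(train = nrow(train), test = nrow(test))
rbind(train = prop.table(table(train$TYPE)),
      test  = prop.table(table(test$TYPE)))
```

With 2220 rows this gives 1776 training and 444 test observations.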
Use the CART algorithm in order to explain the variable TYPE by the variables VALUE, SEX, AGEV and AGEI.
Remark 1. Ensure TYPE is a factor for classification trees.
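A possible way to fit the classification tree uses the `rpart` package, which implements CART. The sketch assumes simulated stand-in data with hypothetical labels; the small `cp = 0.001` is a choice made here so that a large tree is grown first and can be pruned later, not a value given by the worksheet.

```r
library(rpart)

# Stand-in data (hypothetical): same columns as the worksheet's dataset.
set.seed(1)
n <- 2220
dat <- data.frame(
  VALUE = round(runif(n, 5, 40), 1),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  AGEV  = sample(0:15, n, replace = TRUE),
  AGEI  = sample(18:25, n, replace = TRUE)
)
dat$TYPE <- factor(ifelse(
  runif(n) < plogis(-1 + 0.15 * dat$VALUE - 0.35 * dat$AGEV), "A", "B"))
idx   <- sample(n, round(0.8 * n))
train <- dat[idx, ]

# Remark 1: the response must be a factor for a classification tree.
if (!is.factor(train$TYPE)) train$TYPE <- factor(train$TYPE)

# Grow a deliberately large tree (small cp) with 10-fold cross-validation.
fit <- rpart(TYPE ~ VALUE + SEX + AGEV + AGEI,
             data    = train,
             method  = "class",
             control = rpart.control(cp = 0.001, xval = 10))

print(fit)                           # text listing of the splits
plot(fit, margin = 0.1)              # tree skeleton
text(fit, use.n = TRUE, cex = 0.7)   # split labels and class counts
```

If TYPE were left numeric, `rpart` would default to a regression tree, which is why the factor check comes before the fit.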
Prune the tree using cross-validation (x-error) and the 1-SE rule.
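The 1-SE rule can be sketched from the `cptable` stored in the fitted `rpart` object: find the row with the smallest cross-validated error (`xerror`), add its standard error (`xstd`), and keep the simplest tree whose `xerror` is below that threshold. The data are simulated stand-ins with hypothetical labels.

```r
library(rpart)

# Stand-in data (hypothetical): same columns as the worksheet's dataset.
set.seed(1)
n <- 2220
dat <- data.frame(
  VALUE = round(runif(n, 5, 40), 1),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  AGEV  = sample(0:15, n, replace = TRUE),
  AGEI  = sample(18:25, n, replace = TRUE)
)
dat$TYPE <- factor(ifelse(
  runif(n) < plogis(-1 + 0.15 * dat$VALUE - 0.35 * dat$AGEV), "A", "B"))
idx   <- sample(n, round(0.8 * n))
train <- dat[idx, ]

fit <- rpart(TYPE ~ VALUE + SEX + AGEV + AGEI, data = train,
             method = "class",
             control = rpart.control(cp = 0.001, xval = 10))

# Cross-validation table: columns CP, nsplit, rel error, xerror, xstd.
cp_tab <- fit$cptable
printcp(fit)

# 1-SE rule: threshold = minimal xerror + its standard error;
# rows are ordered from the smallest tree, so take the first one below it.
i_min  <- which.min(cp_tab[, "xerror"])
thresh <- cp_tab[i_min, "xerror"] + cp_tab[i_min, "xstd"]
i_1se  <- min(which(cp_tab[, "xerror"] <= thresh))

pruned <- prune(fit, cp = cp_tab[i_1se, "CP"])
print(pruned)
```

`plotcp(fit)` draws the same table, with a dotted line at the 1-SE threshold.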
Predict TYPE on the test set and assess prediction quality.
Compute and visualize ROC curves.
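A sketch of the evaluation step: a confusion matrix and accuracy from the predicted classes, then a ROC curve built by hand in base R from the predicted class probabilities. The data are simulated stand-ins, the positive class "A" is a hypothetical label, and `roc_points` is a helper defined here for illustration (packages such as pROC offer the same functionality).

```r
library(rpart)

# Stand-in data (hypothetical): same columns as the worksheet's dataset.
set.seed(1)
n <- 2220
dat <- data.frame(
  VALUE = round(runif(n, 5, 40), 1),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  AGEV  = sample(0:15, n, replace = TRUE),
  AGEI  = sample(18:25, n, replace = TRUE)
)
dat$TYPE <- factor(ifelse(
  runif(n) < plogis(-1 + 0.15 * dat$VALUE - 0.35 * dat$AGEV), "A", "B"))
idx   <- sample(n, round(0.8 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]
fit <- rpart(TYPE ~ VALUE + SEX + AGEV + AGEI, data = train, method = "class")

# Confusion matrix and accuracy on the test set.
pred_class <- predict(fit, newdata = test, type = "class")
conf <- table(observed = test$TYPE, predicted = pred_class)
acc  <- sum(diag(conf)) / sum(conf)

# Score = predicted probability of the (hypothetical) positive class "A".
score <- predict(fit, newdata = test, type = "prob")[, "A"]

# Hypothetical helper: one ROC point per distinct score threshold.
roc_points <- function(score, positive) {
  thr <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(score[positive] >= t))
  fpr <- sapply(thr, function(t) mean(score[!positive] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}
roc <- roc_points(score, test$TYPE == "A")

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # reference line of a random classifier
conf; acc; auc
```

A tree yields only as many distinct scores as it has leaves, so its ROC curve is piecewise linear with few corner points.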
Interpret the results and compare them with Linear Discriminant Analysis.
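For the comparison, Linear Discriminant Analysis can be fitted with `lda` from the `MASS` package on the same training set and scored on the same test set. The sketch uses simulated stand-in data with hypothetical labels, so the accuracies it prints say nothing about the real insurance data.

```r
library(rpart)
library(MASS)

# Stand-in data (hypothetical): same columns as the worksheet's dataset.
set.seed(1)
n <- 2220
dat <- data.frame(
  VALUE = round(runif(n, 5, 40), 1),
  SEX   = factor(sample(c("F", "M"), n, replace = TRUE)),
  AGEV  = sample(0:15, n, replace = TRUE),
  AGEI  = sample(18:25, n, replace = TRUE)
)
dat$TYPE <- factor(ifelse(
  runif(n) < plogis(-1 + 0.15 * dat$VALUE - 0.35 * dat$AGEV), "A", "B"))
idx   <- sample(n, round(0.8 * n))
train <- dat[idx, ]
test  <- dat[-idx, ]

# Fit both models on the same training sample.
tree_fit <- rpart(TYPE ~ VALUE + SEX + AGEV + AGEI, data = train,
                  method = "class")
lda_fit  <- lda(TYPE ~ VALUE + SEX + AGEV + AGEI, data = train)

# Test-set accuracy of each model.
tree_acc <- mean(predict(tree_fit, newdata = test, type = "class") == test$TYPE)
lda_pred <- predict(lda_fit, newdata = test)
lda_acc  <- mean(lda_pred$class == test$TYPE)

# Posterior probability of class "A": usable as a score for a ROC curve.
lda_score <- lda_pred$posterior[, "A"]

c(tree = tree_acc, lda = lda_acc)
```

LDA imposes a single linear boundary, while the tree partitions the predictor space into rectangles, so which model predicts better depends on the shape of the true class boundary.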