From a course by Davy Paindaveine and Nathalie Vialaneix
Last updated on February 10, 2025
Let \(X_1, \dots, X_n \sim P_\theta\) i.i.d. Let \(\hat\theta = \hat\theta (X_1, \dots, X_n)\) be an estimator for \(\theta\).
One often wants to evaluate the variance \(\operatorname{Var} [\hat\theta]\) to quantify the uncertainty of \(\hat\theta\).
The bootstrap is a powerful, broadly applicable method:
The method is nonparametric and can deal with small \(n\).
James et al. (2021)
Now, if historical data \(X_1=(Y_1,Z_1),\ldots,X_n=(Y_n,Z_n)\) are available, then we can estimate \(\lambda_\text{opt}\) by \[ \hat{\lambda}_\text{opt} = \frac{\widehat{\text{Var}[Y]}-\widehat{\text{Cov}[Y,Z]}}{\widehat{\text{Var}[Y]}+\widehat{\text{Var}[Z]}-2\widehat{\text{Cov}[Y,Z]}} \] where \(\widehat{\text{Var}[Y]}\), \(\widehat{\text{Var}[Z]}\) and \(\widehat{\text{Cov}[Y,Z]}\) denote the empirical variances and covariance computed from the sample.
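As an illustration, a minimal R sketch of this plug-in estimator (the name lambda_hat and the vectors y, z are ours):

```r
# Plug-in estimator of lambda_opt based on a sample of (Y, Z) pairs
lambda_hat <- function(y, z) {
  (var(y) - cov(y, z)) / (var(y) + var(z) - 2 * cov(y, z))
}
```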
We generated 1000 samples from the population and computed \(\hat{\lambda}_\text{opt}\) on each.
Here: \[ \widehat{\text{Std}[\hat{\lambda}_\text{opt}]} \approx 0.077, \qquad \bar{\lambda}_\text{opt} \approx 0.331 \ \ (\approx \lambda_\text{opt} = \frac13 = 0.333) \]
(This could also be used to estimate quantiles of \(\hat{\lambda}_\text{opt}\).)
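This "real world" computation can be sketched in R as follows, reusing lambda_hat from the sketch above and assuming a hypothetical generator draw_sample(n) that returns one sample of size \(n\) from the population as a data frame with columns y and z:

```r
# "Real world": draw many fresh samples from the population and
# recompute the estimator on each one
M <- 1000                            # number of simulated samples
lambda_sim <- replicate(M, {
  s <- draw_sample(n)                # hypothetical generator; n assumed defined
  lambda_hat(s$y, s$z)
})

sd(lambda_sim)                       # approximates Std[lambda_hat_opt]
mean(lambda_sim)                     # approximates E[lambda_hat_opt]
```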
In practice, however, only one sample is available; resampling it with replacement (the bootstrap) provides \[ \widehat{\text{Std}[\hat{\lambda}_\text{opt}]}^* \approx 0.079 \]
(This could again be used to estimate quantiles of \(\hat{\lambda}_\text{opt}\).)
Results are close: \(\widehat{\text{Std}[\hat{\lambda}_\text{opt}]}\approx 0.077\) and \(\widehat{\text{Std}[\hat{\lambda}_\text{opt}]}^*\approx 0.079\).
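A minimal sketch of the corresponding bootstrap computation, assuming the single observed sample is stored in a data frame d with columns y and z (names are ours):

```r
# Bootstrap world: resample rows of the observed sample with replacement
B <- 1000                            # number of bootstrap samples
n <- nrow(d)
lambda_boot <- replicate(B, {
  idx <- sample(n, replace = TRUE)   # bootstrap indices
  lambda_hat(d$y[idx], d$z[idx])     # estimator on the bootstrap sample
})

sd(lambda_boot)                      # bootstrap estimate of Std[lambda_hat_opt]
```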
\[ \begin{array}{ccc} (X_1^{*1},\ldots,X_n^{*1}) & \leadsto & T^{*1}=T(X_1^{*1},\ldots,X_n^{*1}) \\[2mm] & \vdots & \\[2mm] (X_1^{*b},\ldots,X_n^{*b}) & \leadsto & T^{*b}=T(X_1^{*b},\ldots,X_n^{*b}) \\[2mm] & \vdots & \\[2mm] (X_1^{*B},\ldots,X_n^{*B}) & \leadsto & T^{*B}=T(X_1^{*B},\ldots,X_n^{*B}) \\[2mm] \end{array} \]
Each bootstrap sample \((X_1^{*b},\ldots,X_n^{*b})\) is obtained by sampling (uniformly) with replacement among the original sample \((X_1,\ldots,X_n)\).
Under mild conditions, the empirical distribution of \((T^{*1},\ldots,T^{*B})\) provides a good approximation of the sampling distribution of \(T\) under \(P_\theta\).
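A generic sketch of this scheme for a statistic of a univariate sample (the helper name boot_stat is ours); it returns the \(B\) replicates \(T^{*1},\ldots,T^{*B}\):

```r
# Generic manual bootstrap: returns the B replicates T*1, ..., T*B
# x:    observed sample (numeric vector)
# stat: function computing the statistic T
boot_stat <- function(x, stat, B = 1000) {
  n <- length(x)
  replicate(B, stat(x[sample(n, replace = TRUE)]))
}
```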
Possible uses:
Possible uses when \(T\) is an estimator of \(\theta\):
Bootstrap estimates of \(\mathbb E [\bar{X}]\) and \(\text{Var}[\bar{X}]\) are then given by the empirical mean and variance of the bootstrap replicates \(\bar{X}^{*1},\ldots,\bar{X}^{*B}\), namely \[ \frac{1}{B}\sum_{b=1}^{B} \bar{X}^{*b} \qquad\text{and}\qquad \frac{1}{B-1}\sum_{b=1}^{B} \Big( \bar{X}^{*b} - \frac{1}{B}\sum_{c=1}^{B} \bar{X}^{*c} \Big)^{2}. \]
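For the sample mean, a self-contained sketch (assuming the observed sample is stored in a numeric vector x) is:

```r
# Manual bootstrap of the sample mean: resample x with replacement B times
B <- 1000
xbar_boot <- replicate(B, mean(sample(x, replace = TRUE)))

mean(xbar_boot)   # bootstrap estimate of E[X-bar]
var(xbar_boot)    # bootstrap estimate of Var[X-bar]
```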
The practical sessions will explore how well such estimates behave.
A better strategy is to use the boot function from the boot package. The boot function typically takes three arguments:
data: the original sample
statistic: a user-defined function computing the statistic to bootstrap
R: the number \(B\) of bootstrap samples to consider

If the statistic is the mean, then a suitable user-defined function is
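The statistic passed to boot must accept the data and a vector of resampled indices; a minimal version for the mean (the name mean_fun is ours) could be:

```r
# Statistic for boot(): takes the data and a vector of bootstrap indices
mean_fun <- function(data, indices) {
  mean(data[indices])
}
```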
The bootstrap estimate of \(\text{Var}[\bar{X}]\) is then the empirical variance of the \(B\) bootstrap replicates returned by boot.
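A sketch of the corresponding call, assuming the sample is stored in a numeric vector x and taking \(R = B = 1000\) bootstrap samples:

```r
library(boot)

# B = 1000 bootstrap samples of x, recomputing the mean on each
res <- boot(data = x, statistic = mean_fun, R = 1000)

var(res$t[, 1])   # empirical variance of the bootstrap replicates
```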