Let \(Y\) and \(Z\) be the values of two random assets and consider the portfolio: \[W_\lambda = \lambda Y + (1-\lambda) Z, \qquad \lambda \in [0,1] \] allocating a proportion \(\lambda\) of your wealth to \(Y\) and a proportion \(1-\lambda\) to \(Z\).
A common risk-averse strategy is to minimize the risk \(\operatorname{Var}[W_\lambda]\).
It can be shown that this risk is minimized at \[
\lambda_\text{opt} =
\frac{\operatorname{Var}[Z] - \operatorname{Cov}[Y,Z]}
{\operatorname{Var}[Y] + \operatorname{Var}[Z] - 2\operatorname{Cov}[Y,Z]}
\]
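As a quick numerical sanity check, one can verify in R that this closed form indeed minimizes the portfolio variance. The variance and covariance values below are illustrative assumptions, not taken from the text:

```r
# Illustrative (made-up) values for Var[Y], Var[Z], Cov[Y,Z]
vY <- 1.0; vZ <- 0.8; cYZ <- 0.3

# Closed-form minimizer from the formula above
lambda_opt <- (vZ - cYZ) / (vY + vZ - 2 * cYZ)

# Portfolio variance Var[W_lambda] as a function of lambda
port_var <- function(l) l^2 * vY + (1 - l)^2 * vZ + 2 * l * (1 - l) * cYZ

# Numerical minimization over [0, 1] agrees with the closed form
optimize(port_var, c(0, 1))$minimum
lambda_opt
```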
But in practice, \(\operatorname{Var}[Y]\), \(\operatorname{Var}[Z]\) and \(\operatorname{Cov}[Y,Z]\) are unknown.
2.2 Sample case
Now, if historical data \(X_1=(Y_1,Z_1),\ldots,X_n=(Y_n,Z_n)\) are available, then we can estimate \(\lambda_\text{opt}\) by \[
\hat{\lambda}_\text{opt} =
\frac{\widehat{\text{Var}[Z]}-\widehat{\text{Cov}[Y,Z]}}{\widehat{\text{Var}[Y]}+\widehat{\text{Var}[Z]}-2\widehat{\text{Cov}[Y,Z]}}
\] where
\(\widehat{\text{Var}[Y]}\) is the sample variance of the \(Y_i\)’s
\(\widehat{\text{Var}[Z]}\) is the sample variance of the \(Z_i\)’s
\(\widehat{\text{Cov}[Y,Z]}\) is the sample covariance of the \(Y_i\)’s and \(Z_i\)’s.
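This plug-in estimator can be sketched in R. The correlated data below are simulated for illustration; they are not the text's portfolio data:

```r
# Simulated illustrative sample (not the text's data)
set.seed(1)
n <- 100
Y <- rnorm(n)
Z <- 0.5 * Y + rnorm(n)  # induce some correlation between Y and Z

# Plug-in estimate of lambda_opt using sample variances/covariance
lambda_hat <- (var(Z) - cov(Y, Z)) / (var(Y) + var(Z) - 2 * cov(Y, Z))
lambda_hat
```

Note that the denominator is simply the sample variance of \(Y-Z\), which is a convenient consistency check.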
2.3 How to estimate the accuracy of \(\hat{\lambda}_\text{opt}\)?
\(\dots\) i.e., its standard deviation \(\text{Std}[\hat{\lambda}_\text{opt}]\)?
Using the available sample, we observe \(\hat{\lambda}_\text{opt}\) only once.
We need further samples leading to further observations of \(\hat{\lambda}_\text{opt}\).
Figure 1: Portfolio data. For this sample, \(\hat{\lambda}_\text{opt} = 0.283\) (James et al. 2021).
2.4 Sampling from the population: infeasible
We generated 1000 samples from the population. The first three are:
Figure 2: \(\color{red}{\hat{\lambda}^{(1)}_\text{opt}=0.283}\), \(\color{blue}{\hat{\lambda}^{(2)}_\text{opt}=0.357}\), \(\color{orange}{\hat{\lambda}^{(3)}_\text{opt}=0.299}\) (James et al. 2021).
This allows us to compute: \(\bar{\lambda}_\text{opt}
= \frac{1}{1000} \sum_{i=1}^{1000} \hat{\lambda}^{(i)}_\text{opt}\)
Figure 3: Histogram and boxplot of the empirical distribution of the \(\hat{\lambda}^{(i)}_\text{opt}\) (James et al. 2021).
(This could also be used to estimate quantiles of \(\hat{\lambda}_\text{opt}\).)
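In a simulation, where the population is known, this repeated-sampling scheme can be sketched as follows. The population below (correlated Gaussian returns) is an assumption for illustration, not the population used in the text:

```r
# Feasible only in simulation: draw many samples from a *known* population
set.seed(42)
n <- 100; nrep <- 1000
lambdas <- replicate(nrep, {
  Y <- rnorm(n)
  Z <- 0.5 * Y + rnorm(n)  # illustrative population
  (var(Z) - cov(Y, Z)) / (var(Y) + var(Z) - 2 * cov(Y, Z))
})

mean(lambdas)  # analogue of the average of the hat(lambda)^(i)_opt
sd(lambdas)    # sampling standard deviation of hat(lambda)_opt
```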
2.5 Sampling from the sample: the bootstrap
It is important to realize that this cannot be done in practice. One cannot sample from the population \(P_\theta\) since it is unknown.
However, one may sample instead from the empirical distribution \(P_n\) (i.e., the uniform distribution over \((X_1,\ldots,X_n)\)), which is close to \(P_\theta\) for large \(n\).
This means that we sample with replacement from \((X_1,\ldots,X_n)\), providing a first bootstrap sample \((X_1^{*1},\ldots,X_n^{*1})\) which allows us to evaluate \(\hat{\lambda}_\text{opt}^{*(1)}\).
Further generating bootstrap samples \((X_1^{*b},\ldots,X_n^{*b})\), \(b=2,\ldots,B=1000\), one can compute \[
\widehat{\text{Std}[\hat{\lambda}_\text{opt}]}^*
= \sqrt{ \frac{1}{B-1} \sum_{b=1}^B
( \hat{\lambda}^{*(b)}_\text{opt} - \bar{\lambda}_\text{opt}^* )^2 }
\] with \[
\bar{\lambda}_\text{opt}^*
= \frac{1}{B} \sum_{b=1}^B \hat{\lambda}^{*(b)}_\text{opt}
\]
This provides \[
\widehat{\text{Std}[\hat{\lambda}_\text{opt}]}^*
\approx 0.079
\]
Figure 4: Histogram and boxplot of the bootstrap distribution of \(\hat{\lambda}_\text{opt}\) (James et al. 2021).
(This could again be used to estimate quantiles of \(\hat{\lambda}_\text{opt}\).)
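The bootstrap scheme above can be sketched in R. The sample below is simulated for illustration, so the resulting standard-deviation estimate will not match the 0.079 reported for the text's data:

```r
# Illustrative original sample (simulated, not the text's data)
set.seed(7)
n <- 100
Y <- rnorm(n); Z <- 0.5 * Y + rnorm(n)

# Plug-in estimator of lambda_opt
lam_hat <- function(Y, Z)
  (var(Z) - cov(Y, Z)) / (var(Y) + var(Z) - 2 * cov(Y, Z))

# B bootstrap replicates: resample the pairs (Y_i, Z_i) with replacement
B <- 1000
lam_star <- numeric(B)
for (b in 1:B) {
  d <- sample(1:n, n, replace = TRUE)
  lam_star[b] <- lam_hat(Y[d], Z[d])
}

sd(lam_star)  # bootstrap estimate of Std[hat(lambda)_opt]
```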
2.6 A comparison between both samplings
Results are close: \(\widehat{\text{Std}[\hat{\lambda}_\text{opt}]}\approx 0.077\) and \(\widehat{\text{Std}[\hat{\lambda}_\text{opt}]}^*\approx 0.079\).
Figure 5: Bootstrap distributions from portfolio data (James et al. 2021).
Each bootstrap sample \((X_1^{*b},\ldots,X_n^{*b})\) is obtained by sampling (uniformly) with replacement among the original sample \((X_1,\ldots,X_n)\).
Under mild conditions, the empirical distribution of \((T^{*1},\ldots,T^{*B})\) provides a good approximation of the sampling distribution of \(T\) under \(P_\theta\).
Possible uses:
\(\frac{1}{B-1}\sum_{b=1}^B (T^{*b}-\bar{T}^*)^2\), with \(\bar{T}^*=\frac{1}{B}\sum_{b=1}^B T^{*b}\), estimates \(\text{Var}[T]\)
The sample \(\alpha\)-quantile \(q^*_{\alpha}\) of \((T^{*1},\ldots,T^{*B})\) estimates \(T\)’s \(\alpha\)-quantile
Possible uses when \(T\) is an estimator of \(\theta\):
\((\frac{1}{B}\sum_{b=1}^B T^{*b})-T\) estimates the bias \(\mathbb E [T] - \theta\) of \(T\)
\([q^*_{\alpha/2},q^*_{1-(\alpha/2)}]\) is an approximate \((1-\alpha)\)-confidence interval for \(\theta\).
\(\ldots\)
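These generic uses can be sketched for a simple statistic \(T\), here the sample mean. The exponential sample below is an illustrative assumption:

```r
# Illustrative sample; the statistic T is the sample mean
set.seed(3)
x <- rexp(50)
T_hat <- mean(x)

# B bootstrap replicates of T
B <- 2000
T_star <- replicate(B, mean(sample(x, length(x), replace = TRUE)))

mean(T_star) - T_hat               # bootstrap estimate of the bias of T
quantile(T_star, c(0.025, 0.975))  # approximate 95% percentile interval
```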
4 About the implementation in R
4.1 A toy illustration
Let \(X_1,\ldots,X_n\) (\(n=4\)) be i.i.d. \(t\)-distributed with \(6\) degrees of freedom.
Let \(\bar{X}=\frac{1}{n} \sum_{i=1}^n X_i\) be the sample mean.
How to estimate the variance of \(\bar{X}\) through the bootstrap?
Code
n <- 4
(X <- rt(n, df = 6))
[1] -0.08058779 0.28044078 1.19011050 -1.25212790
Code
Xbar <- mean(X)
Xbar
[1] 0.0344589
4.2 Obtaining a bootstrap sample
Code
X
[1] -0.08058779 0.28044078 1.19011050 -1.25212790
Code
d <- sample(1:n, n, replace = TRUE)
d
[1] 2 4 4 4
Code
Xstar <- X[d]
Xstar
[1] 0.2804408 -1.2521279 -1.2521279 -1.2521279
4.3 Generating \(B=1000\) bootstrap means
Code
B <- 1000
Bootmeans <- vector(length = B)
for (b in 1:B) {
  d <- sample(1:n, n, replace = TRUE)
  Bootmeans[b] <- mean(X[d])
}
Bootmeans[1:4]
[1] 0.2370868 -0.3486833 0.3521335 0.2370868
4.4 Bootstrap estimates
Bootstrap estimates of \(\mathbb E [\bar{X}]\) and \(\text{Var}[\bar{X}]\) are then given by
Code
mean(Bootmeans)
[1] 0.03679914
Code
var(Bootmeans)
[1] 0.1789107
The practical sessions will explore how well such estimates behave.
4.5 The boot function
A better strategy is to use the boot function from the boot package:
Code
library(boot)
The boot function typically takes 3 arguments:
data: the original sample
statistic: a user-defined function with the statistic to bootstrap
1st argument: a generic sample
2nd argument: a vector of indices defining the bootstrap resample on which the statistic is to be evaluated
R: the number \(B\) of bootstrap samples to consider
If the statistic is the mean, then a suitable user-defined function is
Code
boot.mean <- function(x, d) { mean(x[d]) }
The bootstrap estimate of \(\text{Var}[\bar{X}]\) is then
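A sketch of this final step is given below. The seed and the object name `res` are illustrative; `X` and `boot.mean` are as defined in the previous chunks:

```r
library(boot)

# Toy sample and statistic, as in the preceding sections
set.seed(1)
X <- rt(4, df = 6)
boot.mean <- function(x, d) { mean(x[d]) }

# R = 1000 bootstrap replicates of the sample mean
res <- boot(X, boot.mean, R = 1000)

# Bootstrap estimate of Var[Xbar]: variance of the replicates
var(res$t[, 1])
```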
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. New York, NY: Springer US. https://doi.org/10.1007/978-1-0716-1418-1.