This R session will illustrate the incorporation of prior information in the estimation of \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).

(Useful R links: Cookbook R, Quick-R, R documentation, CRAN, METACRAN.)

Shrinkage

Suppose we have an \(N\)-dimensional i.i.d. Gaussian time series \[ \mathbf{x}_t \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). \] The sample estimators for the mean and covariance matrix are, respectively, the sample mean \[ \hat{\boldsymbol{\mu}} = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t \] and the sample covariance matrix \[ \hat{\boldsymbol{\Sigma}} = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T. \] We will next explore the idea of shrinkage in order to improve those sample estimators when \(T\) is not large enough.

Shrinkage in \(\boldsymbol{\mu}\)

The James-Stein estimator is \[ \hat{\boldsymbol{\mu}}^\textsf{JS} = (1-\rho) \hat{\boldsymbol{\mu}} + \rho \mathbf{t} \] where \(\mathbf{t}\) is the shrinkage target and \(0\le\rho\le1\) is the amount of shrinkage. Common choices for the target \(\mathbf{t}\) are:

  • \(\mathbf{t} = \mathbf{0}\)
  • \(\mathbf{t} = 0.1 \times \mathbf{1}\)
  • \(\mathbf{t} = \frac{\mathbf{1}^T \hat{\boldsymbol{\mu}}}{N} \times \mathbf{1}\) (grand mean target)
  • \(\mathbf{t} = \frac{\mathbf{1}^T\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\mu}}}{\mathbf{1}^T\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}} \times \mathbf{1}\) (volatility-weighted mean)

A choice of \(\rho\) that improves the estimation error is \[ \rho = \frac{1}{T}\frac{N\bar{\lambda}-2\lambda_\textsf{max}}{\|\hat{\boldsymbol{\mu}} - \mathbf{t}\|^2} \] where \(\bar{\lambda}\) and \(\lambda_\textsf{max}\) are the average and maximum values, respectively, of the eigenvalues of \(\boldsymbol{\Sigma}\) (or \(\hat{\boldsymbol{\Sigma}}\)).
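Note that this formula is not guaranteed to land in the stated interval \([0,1]\): the numerator is negative whenever \(N\bar{\lambda} < 2\lambda_\textsf{max}\) (e.g., one dominant eigenvalue), and \(\rho\) can exceed 1 when \(\hat{\boldsymbol{\mu}}\) is very close to the target. A common safeguard is to clip the value; the helper below is our own sketch, not part of the original session:

```r
# clip the shrinkage factor to the valid interval [0, 1]
clip_rho <- function(rho) max(0, min(1, rho))
```

With this safeguard, \(\rho = 0\) recovers the plain sample mean and \(\rho = 1\) returns the target itself.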

We will first explore this estimator using synthetic data. Let’s start by generating the data:

# generate Gaussian synthetic return data
library(mvtnorm)
set.seed(357)
N <- 100
T <- 120
mu <- runif(N)
U <- t(rmvnorm(n = round(0.7*N), sigma = 0.1*diag(N)))
Sigma <- U %*% t(U) + diag(N)
X <- rmvnorm(n = T, mean = mu, sigma = Sigma)
str(X)
#>  num [1:120, 1:100] 3.57 0.19 -0.807 5.042 0.281 ...

# sample estimates (sample mean and sample covariance matrix)
mu_sm <- colMeans(X)
Sigma_scm <- cov(X)

Now let’s try the James-Stein estimator with the four targets:

# define the four targets
t_1 <- rep(0, N)
t_2 <- rep(0.1, N)
t_3 <- rep(mean(mu_sm), N)
t_4 <- rep(sum(solve(Sigma_scm, mu_sm))/sum(solve(Sigma_scm, rep(1, N))), N)

# compute the corresponding four rho's
lambdas <- eigen(Sigma_scm)$values
lmd_mean <- mean(lambdas)
lmd_max <- max(lambdas)
rho_1 <- (1/T)*(N*lmd_mean - 2*lmd_max)/norm(mu_sm - t_1, "2")^2
rho_2 <- (1/T)*(N*lmd_mean - 2*lmd_max)/norm(mu_sm - t_2, "2")^2
rho_3 <- (1/T)*(N*lmd_mean - 2*lmd_max)/norm(mu_sm - t_3, "2")^2
rho_4 <- (1/T)*(N*lmd_mean - 2*lmd_max)/norm(mu_sm - t_4, "2")^2

# finally the James-Stein estimators
mu_JS_1 <- (1-rho_1)*mu_sm + rho_1*t_1
mu_JS_2 <- (1-rho_2)*mu_sm + rho_2*t_2
mu_JS_3 <- (1-rho_3)*mu_sm + rho_3*t_3
mu_JS_4 <- (1-rho_4)*mu_sm + rho_4*t_4

Let’s now compare the estimation error measured in terms of squared error and PRIAL (percentage relative improvement in average loss): \[ \begin{align} \textsf{SE} &= \|\hat{\boldsymbol{\mu}}^\textsf{JS} - \boldsymbol{\mu}\|^2\\ \textsf{PRIAL} &= \frac{\|\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2 - \|\hat{\boldsymbol{\mu}}^\textsf{JS} - \boldsymbol{\mu}\|^2}{\|\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}\|^2} \end{align} \]

# compute squared error and PRIAL
SE_mu <- c(norm(mu_sm - mu, "2")^2,
           norm(mu_JS_1 - mu, "2")^2,
           norm(mu_JS_2 - mu, "2")^2,
           norm(mu_JS_3 - mu, "2")^2,
           norm(mu_JS_4 - mu, "2")^2)
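Since the PRIAL is computed for each of the four targets, it is convenient to wrap its definition in a small helper; this is our own sketch (the function name `prial` is not from the original session), computing the improvement of an estimator over the sample mean relative to the true \(\boldsymbol{\mu}\):

```r
# PRIAL of an estimator mu_hat relative to the sample mean mu_sm,
# given the true mean mu (all numeric vectors of the same length)
prial <- function(mu_hat, mu_sm, mu) {
  se_ref <- sum((mu_sm - mu)^2)   # squared error of the sample mean
  se_hat <- sum((mu_hat - mu)^2)  # squared error of the shrinkage estimator
  (se_ref - se_hat)/se_ref
}
```

A PRIAL of 0 means no improvement over the sample mean, while a value of 1 means the estimator recovers \(\boldsymbol{\mu}\) exactly; negative values indicate that the shrinkage made things worse.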