This R session illustrates different models for multivariate financial time series, in particular for the conditional mean and for the conditional covariance matrix (volatility).

(Useful R links: Cookbook R, Quick-R, R documentation, CRAN, METACRAN.)

R packages

The following R packages are widely used by the R community:

There are many other packages worth exploring, such as stochvol (stochastic volatility is an alternative to GARCH modeling), prophet, fpp2 (includes Holt-Winters modeling and much more), tsDyn, vars, rucm, etc. For a full list of packages for time series modeling, check the CRAN Task View: Time Series Analysis.

The book Forecasting: Principles and Practice is available online and also comes with the R package fpp2.

Conditional Mean Models

i.i.d. model

We will start with the simple i.i.d. model as a warm-up. The i.i.d. model assumes an \(N\)-dimensional Gaussian time series for the log-returns \(\mathbf{x}_t\): \[ \mathbf{x}_t \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). \] The sample estimators for the mean and covariance matrix are, respectively, the sample mean \[ \hat{\boldsymbol{\mu}} = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t \] and the sample covariance matrix \[ \hat{\boldsymbol{\Sigma}} = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T. \]
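
As a quick sanity check of these formulas, the estimators can be implemented directly from their definitions and compared against the built-in functions colMeans() and cov(). The following is only a minimal sketch on a small toy dataset (the names X_toy, mu_hat, etc. are illustrative):

# sanity check: implement the sample estimators directly from their definitions
library(mvtnorm)
set.seed(42)
X_toy <- rmvnorm(n = 50, mean = rep(0, 3), sigma = diag(3))
T_toy <- nrow(X_toy)

mu_hat <- colSums(X_toy) / T_toy               # (1/T) * sum_t x_t
X_ctr <- sweep(X_toy, 2, mu_hat)               # centered data: x_t - mu_hat
Sigma_hat <- t(X_ctr) %*% X_ctr / (T_toy - 1)  # (1/(T-1)) * sum_t (x_t - mu_hat)(x_t - mu_hat)^T

# both should match the built-in estimators up to numerical precision
max(abs(mu_hat - colMeans(X_toy)))
max(abs(Sigma_hat - cov(X_toy)))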

Synthetic data

We will always start by generating synthetic data to become familiar with the procedure and to make sure the estimation process gives the correct result (i.e., as a sanity check). Only once everything is under control will we move to real market data and fit the different models.

Let’s generate synthetic i.i.d. data and estimate the mean and covariance matrix:

# generate Gaussian synthetic return data
library(mvtnorm)
set.seed(357)
N <- 100
T <- 120
mu <- runif(N)
U <- t(rmvnorm(n = round(0.7*N), sigma = 0.1*diag(N)))  # N x (0.7*N) factor matrix
Sigma <- U %*% t(U) + diag(N)  # low-rank term plus identity, guarantees positive definiteness
X <- rmvnorm(n = T, mean = mu, sigma = Sigma)

# sample estimates (sample mean and sample covariance matrix)
mu_sm <- colMeans(X)
Sigma_scm <- cov(X)

# error
norm(mu_sm - mu, "2")
#> [1] 2.443633
norm(Sigma_scm - Sigma, "F")
#> [1] 70.79889

Now, let’s do it again for different numbers of observations \(T\):

# first generate all the data
set.seed(357)
N <- 100
T_max <- 1000
mu <- runif(N)
U <- t(rmvnorm(n = round(0.7*N), sigma = 0.1*diag(N)))
Sigma <- U %*% t(U) + diag(N)
X <- rmvnorm(n = T_max, mean = mu, sigma = Sigma)

# now loop over subsets of the samples
error_mu_vs_T <- NULL
error_Sigma_vs_T <- NULL
T_sweep <- ceiling(seq(1.01*N, T_max, length.out = 20))
for (T in T_sweep) {
  X_ <- X[1:T, ]
  # sample estimates
  mu_sm <- colMeans(X_)
  Sigma_scm <- cov(X_)
  # compute errors
  error_mu_vs_T <- c(error_mu_vs_T, norm(mu_sm - mu, "2"))
  error_Sigma_vs_T <- c(error_Sigma_vs_T, norm(Sigma_scm - Sigma, "F"))
}
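
With the errors collected over the sweep of sample sizes, a natural next step is to visualize how both estimation errors decay as \(T\) grows. The snippet below is a minimal plotting sketch using base R's matplot(); the labels and colors are illustrative choices:

# plot estimation errors versus the number of observations T
matplot(T_sweep, cbind(error_mu_vs_T, error_Sigma_vs_T),
        type = "b", pch = 20, col = c("blue", "red"),
        xlab = "T", ylab = "estimation error",
        main = "Error of sample estimators vs number of observations")
legend("topright", legend = c("sample mean", "sample covariance matrix"),
       col = c("blue", "red"), lty = 1:2, pch = 20)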