This R session introduces the basics of portfolio design and evaluation with R. (For the evaluation, the package portfolioBacktest is recommended.)

(Useful R links: Cookbook R, Quick-R, R documentation, CRAN, METACRAN.)

Static portfolios

In this section, we will divide the stock market data into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation). A more sophisticated approach (considered in the next section) is based on a rolling window where the portfolio is updated with some frequency rather than kept fixed.

We start by loading some stock market data and dividing it into a training set and test set:

library(xts)
library(quantmod)
library(PerformanceAnalytics)

# set begin-end date and stock namelist
begin_date <- "2013-01-01"
end_date <- "2017-08-31"
stock_namelist <- c("AAPL", "AMD", "ADI", "ABBV", "AET", "A", "APD", "AA", "CF")

# download adjusted closing prices from Yahoo Finance, one column per stock
prices <- xts()
for (stock_index in seq_along(stock_namelist))
  prices <- cbind(prices, Ad(getSymbols(stock_namelist[stock_index], 
                                        from = begin_date, to = end_date, auto.assign = FALSE)))
colnames(prices) <- stock_namelist
tclass(prices) <- "Date"  # replaces indexClass(), which is deprecated in recent xts versions
str(prices)
#> An 'xts' object on 2013-01-02/2017-08-30 containing:
#>   Data: num [1:1175, 1:9] 70.1 69.2 67.3 66.9 67.1 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
#>   Indexed by objects of class: [Date] TZ: UTC
#>   xts Attributes:  
#>  NULL
head(prices)
#>                AAPL  AMD      ADI     ABBV      AET        A      APD
#> 2013-01-02 70.12886 2.53 37.90372 28.62027 43.32114 28.16434 67.74826
#> 2013-01-03 69.24367 2.49 37.29209 28.38393 42.40319 28.26521 67.51154
#> 2013-01-04 67.31493 2.59 36.62878 28.02536 42.59989 28.82338 68.41895
#> 2013-01-07 66.91895 2.67 36.74078 28.08241 43.23684 28.61492 68.35583
#> 2013-01-08 67.09905 2.67 36.36173 27.47122 41.75045 28.38627 68.48207
#> 2013-01-09 66.05039 2.63 36.26697 27.62606 42.31490 29.15291 69.40527
#>                  AA       CF
#> 2013-01-02 20.62187 29.72212
#> 2013-01-03 20.80537 29.58158
#> 2013-01-04 21.24121 30.24417
#> 2013-01-07 20.87419 30.13087
#> 2013-01-08 20.87419 29.68914
#> 2013-01-09 20.82831 30.72749
tail(prices)
#>                AAPL   AMD      ADI     ABBV      AET        A      APD
#> 2017-08-23 157.5971 12.48 77.08375 69.09010 154.8036 62.21310 140.5500
#> 2017-08-24 156.8977 12.50 77.27874 69.66007 154.3686 62.12389 140.6178
#> 2017-08-25 157.4789 12.43 76.98628 70.01749 154.3192 62.34195 141.3730
#> 2017-08-28 159.0649 12.23 77.44447 70.82896 154.7443 62.89698 141.1504
#> 2017-08-29 160.4835 12.15 77.55172 71.37959 154.6356 62.92671 140.7534
#> 2017-08-30 160.9169 12.67 81.61696 71.40857 155.1101 63.33307 140.8889
#>               AA       CF
#> 2017-08-23 41.06 28.08939
#> 2017-08-24 41.34 28.18649
#> 2017-08-25 41.21 28.13794
#> 2017-08-28 42.17 28.18649
#> 2017-08-29 43.00 27.99230
#> 2017-08-30 43.09 28.09910

# compute log-returns and linear returns
X_log <- diff(log(prices))[-1]
X_lin <- (prices/lag(prices) - 1)[-1]

# or alternatively, with PerformanceAnalytics (the default method gives linear returns)
X_log <- CalculateReturns(prices, method = "log")[-1]
X_lin <- CalculateReturns(prices)[-1]

N <- ncol(X_log)  # number of stocks
T <- nrow(X_log)  # number of days (note: this masks T, the shorthand for TRUE)

# split data into training and test data
T_trn <- round(0.7*T)  # 70% of data
X_log_trn <- X_log[1:T_trn, ]
X_log_tst <- X_log[(T_trn+1):T, ]
X_lin_trn <- X_lin[1:T_trn, ]
X_lin_tst <- X_lin[(T_trn+1):T, ]
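
As a quick sanity check on the two return definitions used above, the following minimal sketch (base R only, with a made-up price vector `p` that is not part of the downloaded data) illustrates that linear and log returns are linked by r_lin = exp(r_log) - 1 and that log returns add up over time:

```r
# hypothetical toy prices, for illustration only
p <- c(100, 102, 101, 105, 110)

r_log <- diff(log(p))                 # log returns
r_lin <- p[-1] / p[-length(p)] - 1    # linear returns

# the two definitions are linked by r_lin = exp(r_log) - 1
all.equal(r_lin, exp(r_log) - 1)

# log returns are additive over time: their sum is the log of the total growth
all.equal(sum(r_log), log(p[length(p)] / p[1]))
```

For small daily moves the two kinds of return are numerically close, which is why the distinction only starts to matter once estimates are aggregated or transformed.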

Modeling linear vs log returns

Now we are ready to obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., sample means and sample covariance matrix) as \[ \begin{align} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{align} \]
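
The two estimators can be written out in a few lines of base R. The sketch below uses synthetic Gaussian returns (the names `X`, `T_obs`, and `N_assets` are placeholders introduced here, not objects from this session) and checks the explicit formula against the built-in `cov()`:

```r
# synthetic returns, for illustration only: T_obs observations of N_assets assets
set.seed(42)
T_obs <- 500
N_assets <- 3
X <- matrix(rnorm(T_obs * N_assets, mean = 0.001, sd = 0.02), T_obs, N_assets)

# sample mean: (1/T) * sum over t of x_t
mu_hat <- colMeans(X)

# sample covariance with the 1/(T-1) normalization, written out explicitly
X_centered <- sweep(X, 2, mu_hat)   # subtract mu_hat from every row
Sigma_hat <- crossprod(X_centered) / (T_obs - 1)

# agrees with the built-in estimator
all.equal(Sigma_hat, cov(X))
```

In the session itself, the same `colMeans()` and `cov()` calls would be applied to the training returns `X_lin_trn` or `X_log_trn`, depending on which of the philosophies discussed next is adopted.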

However, it is not entirely clear whether we should use linear returns or log returns to estimate \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\). For the portfolio design we need the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\) of the linear returns, because the return of a portfolio is the weighted sum of the linear returns of its assets (this does not hold for log returns). Three different philosophies for the estimation procedure come to mind:

  1. estimate them directly from the linear returns (even though linear returns are generally considered harder to model than log returns): \(\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{\mu}}^\textsf{lin}\) and \(\hat{\boldsymbol{\Sigma}} = \hat{\boldsymbol{\Sigma}}^\textsf{lin}\)
  2. estimate them from the log returns (and ignore the approximation error): \(\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{\mu}}^\textsf{log}\) and \(\hat{\boldsymbol{\Sigma}} = \hat{\boldsymbol{\Sigma}}^\textsf{log}\)
  3. estimate them from the log returns but properly transforming them to linear: \[ \begin{align} \hat{\boldsymbol{\mu}} & = \exp\left( \hat{\boldsymbo