Wednesday, August 8, 2012

Minimum Expected Shortfall Portfolio, Part 1

A few days ago, I wrote a piece on finding the minimum expected shortfall portfolio.  A few astute commenters quickly picked up on where I was going with this -- using it as an alternative to low/minimum-volatility portfolios.  What follows are the results of a small study I put together.  This post covers portfolio selection and the calculation of portfolio weights.  The next post will cover computing portfolio statistics and comparing the portfolio to alternatives.

Note: In my last post, I mentioned R's lack of speed during the NL optimization.  I was unable to rectify that situation.  I got tired of waiting on R while working on the code, so I switched to SAS/IML for the NL optimization.  I use the SAS-provided methods to export the needed data into R objects.  I will use R and the handy PerformanceAnalytics package in the next post.  If anyone has ways to speed this up in R, please post them below.

Portfolio Selection
I wanted a set of stocks that were representative of the larger market, but not so many that they bog down the analysis.  I chose the stocks of the Dow.

I don't have access to the historic components of the Dow, so I have to stick with the current composition.  I want a fairly long history, but I don't want to add too much survivorship bias.  For example, if I go back to the early 1990s, Cisco is basically a penny stock and will skew the results positive.  However, I wanted a period long enough to capture large market swings.  I decided to run the portfolio from 2000 through the end of 2011.

One problem is that Kraft is part of the Dow and didn't exist as a separately traded company until its IPO in mid-2001.  So I dropped Kraft, leaving me with 29 stocks.

We will use weekly return data.  This smooths some of the noise in the series and, hopefully, some of the time dependence.  Using two years of weekly returns (approximately 104 data points), we subtract the mean from each series.  We then fit a Student-t copula to the empirical distributions (previous example here).  Should the MLE fail to converge, we fail over to a Gaussian copula.
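For readers following along without SAS/ETS, here is a rough Python sketch of the mechanics described above.  Fitting a t copula by MLE needs an extra library, so only the Gaussian failover path is shown: rank-based pseudo-observations, a correlation matrix estimated from normal scores, and simulated uniforms mapped back through the empirical quantiles.  The return matrix and all names/sizes are made-up stand-ins, not the Dow data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# stand-in for 2 years (104 weeks) of weekly returns on 29 stocks
weekly = 0.02 * rng.standard_normal((104, 29))

# de-mean each stock's series (so the simulation has zero expected return)
weekly = weekly - weekly.mean(axis=0)

# pseudo-observations: empirical CDF values strictly inside (0, 1)
n = weekly.shape[0]
u = stats.rankdata(weekly, axis=0) / (n + 1)

# normal scores -> Gaussian copula correlation matrix
z = stats.norm.ppf(u)
corr = np.corrcoef(z, rowvar=False)

# simulate dependent uniforms from the Gaussian copula
draws = rng.multivariate_normal(np.zeros(29), corr, size=65000)
u_sim = stats.norm.cdf(draws)

# map each column back through its empirical quantile function
sim_returns = np.column_stack(
    [np.quantile(weekly[:, j], u_sim[:, j]) for j in range(29)]
)
print(sim_returns.shape)
```

The simulated draws keep each stock's empirical marginal while imposing the copula's dependence structure, which is the same idea PROC COPULA implements.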

We will rebalance the portfolio quarterly.  We assume there is no time dependence in the return series.  Using the fitted copula and empirical distributions, we simulate 13 weeks ahead for 5000 iterations (65,000 total draws in the simulation).  Each iteration's 13 weekly returns are aggregated into one quarterly return.  From the 5000 simulated quarterly returns, we run the minimum expected shortfall optimization.
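As a rough illustration of this step, the sketch below aggregates simulated weekly draws into quarterly returns and then minimizes the 5% expected shortfall under the same constraints used later in the SAS code (long-only, 10% position cap, fully invested).  SciPy's SLSQP stands in for the IML nlpqn call, and the random input is a stand-in for the copula simulation, not real data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_iter, n_weeks, n_stocks = 5000, 13, 29

# stand-in for the copula simulation: 65,000 weekly draws
weekly_sims = 0.02 * rng.standard_normal((n_iter, n_weeks, n_stocks))

# aggregate each iteration's 13 weekly returns into one quarterly return
quarterly = weekly_sims.sum(axis=1)          # shape (5000, 29)

def neg_tail_mean(w, sims=quarterly, alpha=0.05):
    """5% expected shortfall: negated mean of the worst 5% of outcomes."""
    port = sims @ w
    cutoff = int(np.floor(len(port) * alpha))
    return -np.sort(port)[:cutoff].mean()

w0 = np.full(n_stocks, 1.0 / n_stocks)       # start from equal weight
res = minimize(
    neg_tail_mean, w0, method="SLSQP",
    bounds=[(0.0, 0.10)] * n_stocks,          # long-only, 10% cap per stock
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
)
weights = res.x
print(round(float(weights.sum()), 4))
```

The sorted-tail objective is only piecewise smooth, so a gradient-based optimizer is doing finite differences over kinks here; it works well enough for a sketch, which mirrors the quasi-Newton approach used in the IML code.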

Because the mean was taken out of the empirical distribution, our simulation assumes a 0 expected return over the period.

The following SAS code makes use of the macros in the copula example with a few modifications.  You can download the new version of the macros here.

The SAS code can be downloaded here.  It is fairly well commented, so I will not subject you to a walkthrough.  If there are questions, feel free to post them.

ods html close;

%let stocks =mmm aa axp t bac ba cat cvx csco ko dd xom ge hpq hd intc ibm jnj jpm mcd mrk msft pfe pg trv utx vz wmt dis;

libname dow30 "C:\temp\9.3\dow30";

proc datasets lib=work nolist kill;

/*Download stocks and prices (get_stocks comes from the
  macro file linked above; dates match the DIA/SPY pull below)*/
%get_stocks(&stocks,03JAN2000,30JUL2012);

/*output prices and returns to permanent location*/
data dow30.returns;
set returns;

/*Put Week, Quarter, and Year on returns for aggregation*/
data returns;
format week qtr year 4.;
set returns;
week = week(date);
year = year(date);
qtr = qtr(date);

data dow30.prices;
format week qtr year 4.;
set prices;
week = week(date);
year = year(date);
qtr = qtr(date);

data dow30.prices_wk;
set dow30.prices;
by year week;
if last.week;

/*Sum returns over each week to create weekly returns*/
proc summary data=returns;
var &stocks;
by year week;
output out=weekly(drop=_type_ _freq_) sum=;

/*Copy weekly returns to perm library*/
proc datasets lib=work nolist;
copy out=dow30;
select weekly ;

/*Add a record index*/
data weekly(index=(indx));
set weekly;
indx = _N_;

/*Select the index values for each quarter end,
  starting with year end 1999, and ending with
  year end 2011 */
proc sql noprint;
select indx
into :starts separated by ' '
from weekly
where (year between 2000 and 2011
            and week in (13,26,39,52))
  or (year = 1999 and week = 52)
order by year, week;

%put &starts;

/*Macro will loop over the starting index values
  starts = period starting index values
  obs = number of observations for copula fitting
  fwd = number of periods forward to simulate
  draws = number of draws for simulation
  max = number of times to loop; max<1 means loop over
        all values in starts */
%macro loop(starts,obs=100,fwd=1,draws=100,max=0);
/*Total draws needed in simulation*/
%let nDraws = %eval(&fwd*&draws);
/*Number of items in starts*/
%let nSims = %sysfunc(countw(&starts));

/*update max with nSims if max < 1*/
%if %sysevalf(&max < 1) %then
      %let max=&nSims;

/*Delete optES if it exists*/
proc datasets lib=work nolist;
delete optES;

/*Delete cov if it exists*/
proc datasets lib=dow30 nolist;
delete cov;

/*Start main loop*/
%do i=1 %to &max;
      /*start= current starting index*/
      %let start = %scan(&starts,&i);
      /*e = earliest index in the lookback window*/
      %let e = %eval(&start - &obs);

      /*create subset of weekly returns;
        capture the first and as-of week and year*/
      data test;
      set weekly(where=(&e <= indx <= &start)) end=last;
      if _n_ = 1 then do;
            call symput("foy",put(year,4.));
            call symput("fow",put(week,2.));
      end;
      if last then do;
            call symput("aoy", put(year,4.));
            call symput("aow", put(week,2.));
      end;

      /*de-mean the weekly returns over the subset*/
      proc standard mean=0 data=test out=test;
      var &stocks;

      /*Create the covariance matrix for the subset,
        store in perm location */
      proc corr data=test
            out=cov(where=(_type_="COV")) cov noprint;
      var &stocks;

      data cov;
      format year week 4.;
      set cov;

      proc append base=dow30.cov data=cov force;

      %put AS OF = &start;
      %put From &foy, Week &fow: To &aoy, Week &aow;

      ods select NONE ;

      /*Fit a T copula to the empirical marginals and
        simulate &nDraws draws from it.  The option lists
        below are a best guess; the originals were
        truncated in the post. */
      proc copula data=test;
      var &stocks;
      fit t / marginals=empirical method=MLE;

      simulate / ndraws=&nDraws out=sim;

      /*Check the sim data.  If no observations then the
        fitting failed.  Fail over to a Normal/Gaussian
        Copula */
      proc sql noprint;
      select count(*) format=5. into :nsim from sim;

      %if &nsim = 0 %then %do;
            %put NSIM=&nsim, switching to NORMAL Model;
            proc copula data=test ;
            var &stocks;
            fit normal / marginals=empirical;

            simulate / ndraws=&nDraws out=sim;
      %end;
      ods select default;

      /*Create an iteration number in the simulations*/
      data sim_&aoy._%left(&aow);
      format iter 8.;
      set sim;
      iter = mod(_N_,&draws);

      proc sort data=sim_&aoy._%left(&aow);
      by iter;

      /*Aggregate the iterations into a final number for
        the forward period*/
      proc summary data=sim_&aoy._%left(&aow);
      var &stocks;
      by iter;
      output out=sim_&aoy._%left(&aow)(drop=_type_ _freq_ iter) sum=;

      /*Use IML for the NL Optimization */
      proc iml noprint;

      /*Read the simulations into a matrix*/
      use sim_&aoy._%left(&aow);
      read all var _num_ into sim[colname=cnames];
      close sim_&aoy._%left(&aow);

      s = ncol(sim);
      w0 = repeat(1/s,s);

      /*Function calculating expected shortfall*/
      start F_ES(x) global(sim);
            v = sim*x`;
            call sort(v);
            cutoff = floor(nrow(sim)*.05);
            es = v[1:cutoff][:];
         return (-es);
      finish F_ES;

      /*Setup constraints: 0 <= w_i <= 0.1 and sum(w) = 1*/
      lb = repeat(0,1,s);
      ub = repeat(.1,1,s);
      ones = repeat(1,1,s);
      /*equality (type 0) linear constraint with rhs 1*/
      addc = ones || {0 1};

      con = (lb || {. .}) //
            (ub || {. .}) //
            addc;

      optn = {0 0};
      x = w0`;

      /*Call the nl quasi-Newton optimizer*/
      call nlpqn(rc,w,"F_ES",x,optn,con);

      /*Put the returns into a data set*/
      create results_&aoy._%left(&aow) from w[colname=cnames];
      append from w;
      close results_&aoy._%left(&aow);


      /*Add the week and year to the results*/
      data results_&aoy._%left(&aow);
      year = &aoy;
      week = &aow;
      set results_&aoy._%left(&aow);

      /*Append the results into the final results*/
      proc append data=results_&aoy._%left(&aow) base=optES force;


/*Delete the simulation and result files*/
proc datasets lib=work nolist;
delete sim: results:;


/*Call the simulations and optimizations
  Use 104 weeks (2 years) in each subset
  simulate 13 weeks (1 qtr) forward
  draw 5000 iterations   */
%let st = %sysfunc(datetime());
options nonotes nomprint;
%loop(&starts,obs=104,fwd=13,draws=5000);
options notes nomprint;
%let el = %sysevalf(%sysfunc(datetime()) - &st);
%put Took: &el;

/*Copy the final results and covariance matrices to
  the output location*/
proc datasets lib=work nolist;
copy out=dow30;
select optES ;

/*Download DIA returns*/
%get_stocks(dia spy,03JAN2000,30JUL2012,keepPrice=0);

/*Finally, export the optimal portfolio weights
  into R and save the DataFrame*/
proc iml;
call ExportDataSetToR("dow30.optES", "optES" );
call ExportDataSetToR("dow30.cov","cov");
call ExportDataSetToR("dow30.prices","prices");
call ExportDataSetToR("dow30.prices_wk","prices_wk");
call ExportDataSetToR("returns","dia_spy");
submit / R;
      save(optES, file="C:\\temp\\dow30\\optES");
      save(cov, file="C:\\temp\\dow30\\cov");
      save(prices, file="C:\\temp\\dow30\\prices");
      save(prices_wk, file="C:\\temp\\dow30\\prices_wk");
      save(dia_spy, file="C:\\temp\\dow30\\dia");
endsubmit;
quit;
In the next post, I will share the R code to analyze the performance and compare this portfolio to a minimum variance portfolio, an equal-weight portfolio, a cap-weight portfolio, the Dow (DIA), and the S&P 500 (SPY).


  1. Hi, Can you post a link to the R version of the code you tried to get to work? That might help finding tweaks to the code to improve the speed ...

  2. Hi Dom,
    Since you mentioned that you want this analysis to run quickly, you might want to check out a recent tip I posted on how to speed up simulations in SAS. It might prove useful in the future.

    1. Hi Rick. Because I am using rolling time windows, I would have to massively duplicate the data to create BY groups. PROC COPULA does have a BY statement. I could make a BY loop in IML for the nlpqn() calls.

      In the end, the time consideration was the nlpqn call. R is about 10-15 times slower doing the optimization. That was the deciding factor for using SAS.