A few days ago, I wrote a piece on finding the minimum expected shortfall portfolio. A few astute commenters quickly picked up where I was going with this -- using this as an alternative to low/minimum volatility portfolios. What follows are the results of a small study I put together. This post will deal with the portfolio selection and calculation of portfolio weights. The next post will deal with computing portfolio statistics and comparing that portfolio to alternatives.

Note: In my last post, I mentioned the lack of speed in R during the NL optimization. I was unable to rectify that situation. I got tired of waiting on R while working on the code so I switched to SAS IML for the NL optimization. I use the SAS provided methods to export the needed data into R objects. I will use R and the handy PerformanceAnalytics package in the next post. If anyone has ways to speed this up in R, please post it below.

**Portfolio Selection**

I wanted a set of stocks that were representative of the larger market, but not too many to bog down the analysis. I choose the stocks of the Dow.

I don't have access to the historic components of the Dow, so I have to stick to the current composition. I want a fairly long history, but I don't want to add too much survivors bias. I.E. If I go back to the late 80s, Cisco is basically a penny stock and will skew the results positive. However, I wanted long enough to capture large market swings. I decided to run the portfolio from 2000 through the end of 2011.

One problem is that Google is part of the Dow and didn't exist before 2004. So I dropped Google, leaving me with 29 stocks.

**Methodology**

We will use weekly return data. This smooths some of the noise from the series and hopefully some of the time dependence. Using 2 years of weekly returns (approximately 104 data points) we subtract the mean. From that fit a Student-T copula to the empirical distributions (previous example here). Should the MLE fail to converge, fail over to a Gaussian copula.

We will rebalance the portfolio quarterly. We assume there is no time dependence in the return series. Using the fitted copula and empirical distributions, simulate 13 weeks ahead for 5000 iterations (65,000 total draws in the simulation). The 13 weekly returns will be aggregated into 1 quarterly return. From the 5000 simulated quarterly returns, we will run the minimum expected shortfall optimization.

Because the mean was taken out of the empirical distribution, our simulation assumes a 0 expected return over the period.

**Code**

The following SAS code makes use of the macros in the copula example with a few modifications. You can download the new version of the macros here.

The SAS code is can be downloaded here. It is fairly well commented, so I will not now subject you to a walk through. If there are questions, feel free to post them.

The next post, I will post the R code to analyse the performance and compare it to a minimum variance portfolio, equal weight portfolio, cap weight portfolio, the Dow (DIA), and the S&P500 (SPY).

ods html close;%let stocks =mmm aa axp t bac ba cat cvx csco ko dd xom ge hpq hd intc ibm jnj jpm mcd mrk msft pfe pg trv utx vz wmt dis;libname dow30 "C:\temp\9.3\dow30";procdatasetslib=work nolist kill;quit;/*Download stocks and prices*/%(&stocks,get_stocks01JAN1998,30JUL2012,keepPrice=1);/*output prices and returns to permanent location*/datadow30.returns;set returns;run;/*Put Week, Quarter, and Year on returns for aggregation*/datareturns;format week qtr year4.;set returns;week = week(date);year = year(date);qtr = qtr(date);run;datadow30.prices;format week qtr year4.;set prices;week = week(date);year = year(date);qtr = qtr(date);run;datadow30.prices_wk;set dow30.prices;by year week;if last.week;run;/*Sum returns over each week to create weekly returns*/procsummarydata=returns;var &stocks;by year week;output out=weekly(drop=_type_ _freq_) sum=;run;/*Copy weekly returns to perm library*/procdatasetslib=work nolist;copy out=dow30;select weekly ;run;quit;/*Add a record index*/dataweekly(index=(indx));set weekly;indx = _N_;run;/*Select the index values for each quarter end,starting with year end 1999, and ending withyear end 2011 */procsqlnoprint;select indx, year, weekinto :starts separated by ' 'from weeklywhere (2012> year >1999and week in (13,26,39,52))or (year =1999and week =52)order by year, week;quit;%put &starts;/*Macro will loop over the starting index valuesstarts = period starting index valuesobs = number of observations for copula fittingfwd = number of periods forward to simulatedraws = number of draws for simulationmax = number of times to loop, m<1 means loop over allvalues in starts */%macroloop(starts,obs=100,fwd=1,draws=100,max=0);/*Total draws needed in sumulation*/%let nDraws = %eval(&fwd*&draws);/*Number of items in starts*/%let nSims = %sysfunc(countw(&starts));/*update max with nSims if max < 1*/%if %sysevalf(&max <1) %then%let max=&nSims;;/*Delete optES if it exists*/proc datasets lib=work nolist;delete optES;run;quit;/*Delete cov if it exists*/proc datasets lib=dow30 nolist;delete cov;run;quit;/*Start main loop*/%do i=1%to &max;/*start= current starting index*/%let start = %scan(&starts,&i);/*e= ending index*/%let e = %eval(&start - &obs);/*create subset of weekly returnsget the as-of week and year*/data test;set weekly(where=(&e <= indx <=&start)) end=last;if _n_ =1then do;call symput("foy",put(year,4.));call symput("fow",put(week,2.));end;if last then do;call symput("aoy", put(year,4.));call symput("aow", put(week,2.));end;run;/*de-mean the weekly returns over the subset*/proc standard mean=0data=test out=test;var &stocks;run;/*Create the covariance matrix for the subset,store in perm location */proc corr data=testout=cov(where=(_type_="COV")) cov noprint;var &stocks;run;data cov;format year week4.;set cov;year=&aoy;week=&aow;run;proc append base=dow30.cov data=cov force;run;%put AS OF = &start;%put From &foy, Week &fow: To &aoy, Week &aow;ods select NONE ;/*Fit a T copula and simulate from it*/proc copula data=test;var &stocks;fit t /marginals=empiricalmethod=MLE;simulate /ndraws=&nDrawsseed=54321out=sim;run;/*Check the sim data. If no observations then thefitting failed. Fail over to a Normal/GaussianCopula */proc sql noprint;select count(*) format=5.into :nsim from sim;quit;%if &nsim =0%then %do;%put NSIM=&nsim, switching to NORMAL Model;proc copula data=test ;var &stocks;fit normal /marginals=empirical;simulate /ndraws=&nDrawsseed=54321out=sim;run;%end;ods select default;/*Create an iteration number in the simulations*/data sim_&aoy._%(&aow);leftformat iter8.;set sim;iter = mod(_N_,&draws);run;proc sort data=sim_&aoy._%(&aow);leftby iter;run;/*Aggregate the iterations into a final number forthe forward period*/proc summary data=sim_&aoy._%(&aow);leftvar &stocks;by iter;output out=sim_&aoy._%(&aow)(drop=_type_ _freq_ iter) sum=;leftrun;/*Use IML for the NL Optimization */proc iml noprint;/*Read the simulations into a matrix*/use sim_&aoy._%(&aow);leftread all var _num_ into sim[colname=cnames];close sim_&aoy._%(&aow);lefts = ncol(sim);w0 = repeat(1/s,s);/*Function calculating expected shortfall*/start F_ES(x) global(sim);v = sim*x`;call sort(v);cutoff = floor(nrow(sim)*.05);es = v[1:cutoff][:];return (-es);finish F_ES;/*Setup constraints*/lb = repeat(0,1,s);ub = repeat(.1,1,s);ones = repeat(1,1,s);addc = ones || {01};con = (lb || {..}) //(ub || {..}) //addc;optn = {00};x = w0`;/*Call the nl quasi-Newton optimizer*/call nlpqn(rc,w,"F_ES",x,optn,con);/*Put the returns into a data set*/create results_&aoy._%(&aow) from w[colname=cnames];leftappend from w;close results_&aoy._%(&aow);leftquit;/*Add the week and year to the results*/data results_&aoy._%(&aow);leftyear = &aoy;week = &aow;set results_&aoy._%(&aow);leftrun;/*Append the results into the final results*/proc append data=results_&aoy._%(&aow) base=optES force;leftrun;%end;/*Delete the simulation and result files*/proc datasets lib=work nolist;delete sim: results:;run;quit;%mend;/*Call the simulations and optimizationsUse 104 weeks (2 years) in each subsetsimulate 13 weeks (1 qtr) forwarddraw 5000 iterations */%let st = %sysfunc(datetime());options nonotes nomprint;*loop(starts,obs=100,fwd=1,draws=100,max=0);%(&starts,obs=loop104,fwd=13,draws=5000,max=0);options notes nomprint;%let el = %sysevalf(%sysfunc(datetime()) - &st);%put Took: ⪙/*Copy the final results and covariance matrices tooutput locationthe */procdatasetslib=work nolist;copy out=dow30;select optES ;run;quit;/*Download DIA returns*/%(dia spy,get_stocks03JAN2000,30JUL2012,keepPrice=0);/*Finally, export the optimal portfolio weightsinto R and save the DataFrame*/prociml;call ExportDataSetToR("dow30.optES", "optES" );call ExportDataSetToR("dow30.cov","cov");call ExportDataSetToR("dow30.prices","prices");call ExportDataSetToR("dow30.prices_wk","prices_wk");call ExportDataSetToR("returns","dia_spy");submit / R;save(optES, file="C:\\temp\\dow30\\optES");save(cov, file="C:\\temp\\dow30\\cov");save(prices, file="C:\\temp\\dow30\\prices");save(prices_wk, file="C:\\temp\\dow30\\prices_wk");save(dia_spy, file="C:\\temp\\dow30\\dia");endsubmit;quit;

Hi, Can you post a link to the R version of the code you tried to get to work? That might help finding tweaks to the code to improve the speed ...

ReplyDeleteSee the previous post.

ReplyDeleteHi Dom,

ReplyDeleteSince you mentioned that you want this analysis to run quickly, you might want to check out a recent tip I posted on how to speed up simulations in SAS. It might prove useful in the future:

http://blogs.sas.com/content/iml/2012/07/18/simulation-in-sas-the-slow-way-or-the-by-way/ or short URL http://bit.ly/Nhn9tW

Hi Rick. Because I am using rolling time windows, I would have to massively duplicate the data to create BY groups. PROC COPULA does have a BY statement. I could make a BY loop in IML for the nlpqn() calls.

DeleteIn the end time consideration was in the nlpqn call. R is about 10-15 times slower doing the optimization. That was the factor for using SAS.