Today, I'm looking at one of the new procedures in SAS 9.3. PROC COPULA is new to the SAS/ETS package and is still "experimental." The SAS Documentation describes it best:
A copula is a function that combines marginal distributions of variables into a specific multivariate distribution. All of the one-dimensional marginals in the multivariate distribution are the cumulative distribution functions of the factors. Copulas help perform large-scale multivariate simulation from separate models, each of which can be fitted using different, even nonnormal, distributional specifications.

Previously, simulation through copulas was only available in PROC MODEL (also in ETS), and Risk Dimensions (SAS' high end risk analysis framework). However, there was no way to fit non-normal copulas out of the box. The guys at SAS give it to us here.
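For reference, the definition quoted above can be written compactly. This is the standard statement known as Sklar's theorem; the symbols (F for the joint CDF, F_j for the marginal CDFs, C for the copula, d for the number of variables) are notation chosen here, not taken from the SAS documentation:

```latex
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr),
\qquad u_j = F_j(x_j) \in [0, 1]
```

In other words, each variable is first pushed through its own marginal CDF to get a uniform, and the copula describes only how those uniforms move together.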
Simulation using copulas has been pushed by the finance industry. So let's create a way to download a series of stock prices, fit a T copula to the log-returns, simulate from that, and look at the before and after distributional properties. Along the way, we'll discover how PROC COPULA operates.
First, two macros for downloading stock prices from Yahoo! Finance.
%macro download(symbol,from,to);
    /*Build URL for CSV from Yahoo! Finance*/
    data _null_;
        format s $128.;
        if "&from" ^= "" then
            from = "&from"d;
        else
            from = intnx('year',today(),-1,'same');
        if "&to" ^= "" then
            to = "&to"d;
        else
            to = today()-1;
        put FROM= date9. TO= date9.;
        to_d = day(to);
        to_m = month(to)-1;
        to_y = year(to);
        from_d = day(from);
        from_m = month(from)-1;
        from_y = year(from);
        s = catt("'http://ichart.finance.yahoo.com/table.csv?s=&symbol",'&d=',put(to_m,z2.),'&e=',to_d,'&f=',put(to_y,4.),'&g=d&a=',put(from_m,z2.),'&b=',from_d,'&c=',put(from_y,4.),'&ignore=.csv',"'");
        call symput("s",s);
    run;
    %put NOTE: &s;
    /*SAS Filename to point to the URL*/
    filename in url &s;
    /*Use PROC IMPORT to download and parse the CSV*/
    proc import file=in dbms=csv out=&symbol(rename=(adj_close=&symbol)) replace;
    run;
    /*Clear the filename to the url*/
    filename in clear;
    /*Ensure data are sorted*/
    proc sort data=&symbol(keep=date &symbol);
        by date;
    run;
%mend;

%macro get_stocks(stocks,from,to);
    %local i n;
    %let n= %sysfunc(countw(&stocks));
    options nosource nonotes nosource2;
    %do i=1 %to &n;
        %download(%scan(&stocks,&i),&from,&to);
    %end;
    options source notes source2;
    data stocks;
        merge &stocks;
        by date;
    run;
    proc datasets lib=work nolist;
        delete &stocks;
    quit;
%mend;

Let's download the top ten holdings from the SPY (S&P500 tracking) ETF and convert them into returns:
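ods html;
%let stocks= aapl xom ibm msft cvx ge t jnj pg wfc;
%get_stocks(&stocks,25MAY2010,25MAY2011);

data returns;
    set stocks;
    array s[*] &stocks;
    /*LAG keeps a single queue per call site, so calling it inside the loop
      would mix values across stocks. Keep each stock's previous price in a
      temporary (automatically retained) array instead.*/
    array prev[%sysfunc(countw(&stocks))] _temporary_;
    do i=1 to dim(s);
        price = s[i];
        s[i] = log(price/prev[i]);   /*log-return; missing on the first row*/
        prev[i] = price;
    end;
    drop i price;
run;

Now, let's fit a T copula to the returns and simulate from it: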
ods select FitSummary ConvergenceStatus ParameterEstimates MatrixPlotUnif MatrixPlotOrig;
title "T-Copula Fitting";
proc copula data=returns;
    var &stocks;
    fit t /
        marginals=empirical
        method = MLE
        plots = (data = both matrix);
    simulate /
        ndraws=10000
        seed=54321
        out=sim;
run;
ods select default;

First, we use the ods select statement to specify what output we want to see. There are additional outputs.
Second, the var statement specifies which variables to include. Obviously, we want all our stock returns.
The fit statement (why can the ETS guys not get the IDE to recognize this and turn it blue? If STAT can do it on all their procedures, then ETS should be able to. Rant from a former product manager over) says we want to fit a T copula. The 'marginals=empirical' option says to use the empirical distribution of the data to transform the returns to uniforms. The alternative is to fit a distribution to each series yourself and transform the values into uniforms before passing them to COPULA. We fit by maximum likelihood estimation (method=MLE) and request matrix plots of both the original and the transformed data.
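To make the "transform them to uniforms" step concrete, here is a minimal sketch of an empirical-CDF transform done by hand with PROC RANK. It is only an illustration of the idea: the output data set name unif is a made-up choice, and I am not claiming this is how PROC COPULA computes its uniforms internally.

```sas
/* Sketch only: empirical-CDF transform by hand.
   "unif" is a hypothetical output data set name. */
proc rank data=returns out=unif fraction ties=mean;
    /* FRACTION replaces each value with rank / (number of nonmissing values) */
    var &stocks;
run;

/* Quick check: every column should now lie in (0, 1] */
proc means data=unif min max;
    var &stocks;
run;
```

Swapping FRACTION for NPLUS1 divides by n+1 instead of n, which keeps the largest value strictly below 1.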
The simulate statement simulates from the fitted copula. Given that we used the empirical values, we will get back values in the return space. This has a drawback as we will soon see. We are simulating 10,000 draws and putting the output into a data set named sim.
Output is available here. Notice the df parameter for the copula is ~8. Definitely not normal.
So now, let's look at the moments of the observed distribution as well as the simulated distribution:
/*Original Distribution*/
title "Original Distribution";
proc means data=returns mean std skew kurt min max;
    var &stocks;
run;

/*Simulated Distribution*/
title "Simulated Distribution";
proc means data=sim mean std skew kurt min max;
    var &stocks;
run;
ods html close;

Go back to the output and look at the results. The simulated distributions are nearly identical to the observed through the first 4 moments. Notice that only AT&T (T) is approximately normal (skew=0 and kurtosis=0).
You will see that the max and min are identical between the observed and simulated data. This makes sense because we are using empirical distributions in the copula. There is no model for the tail of the distribution so there is no way for PROC COPULA to simulate beyond the largest and smallest observed values. So the distributions are truncated at the bounds.
I assume that fitting a copula using 'marginals=uniform' will allow simulation throughout the full range (0,1). I attempted to do this in a small test and got a weird error. There is nothing about the simulation specifics in the documentation.
Besides my inability to simulate with the uniforms, my open question is how values are selected from the empirical distribution when the simulated uniform falls between two observed values. For example, the empirical CDF of -0.01 is 0.4 and the next largest value, -0.0095, is 0.41. My simulated value is 0.405. Does COPULA split the difference? Does it bucket into empirical values? I will be looking into the simulation specifics as I go forward. If you know, please let me know below!
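To make the question concrete, here is a small DATA step that works the example through under the two conventions I can think of. Both are assumptions on my part; which one (if either) PROC COPULA actually uses is exactly what I don't know. The data set and variable names are made up for the illustration.

```sas
/* Hypothetical illustration of two ways to invert an empirical CDF
   between observed points. Neither is confirmed to be PROC COPULA's rule. */
data inverse_ecdf;
    x_lo = -0.01;    p_lo = 0.40;   /* observed value and its empirical CDF */
    x_hi = -0.0095;  p_hi = 0.41;   /* next largest observed value          */
    u    = 0.405;                   /* simulated uniform draw               */

    /* Option 1: bucket to the smallest observed value whose CDF is >= u */
    x_step = ifn(u <= p_lo, x_lo, x_hi);

    /* Option 2: linear interpolation between the two observed values */
    x_interp = x_lo + (u - p_lo) / (p_hi - p_lo) * (x_hi - x_lo);

    put x_step= x_interp=;
run;
```

Bucketing returns the observed value -0.0095, while interpolation lands halfway between the two observations, at about -0.00975. Either way, nothing below the observed minimum or above the observed maximum can be produced, which is consistent with the truncation seen in the PROC MEANS output.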