Sunday, October 30, 2011

Learning R: Project 1, Part 2

So it's been a week since I started down this path.  I worked most of this out over last weekend, went to a conference, had a hectic week at work, and then realized I had lost my work.  Gah.

I'll be posting my general thoughts on R later.  Mostly it seems like a neat language, with lots of ways to do things, but the ability to create rich output seems limited.  I played with a number of things trying to create rich HTML output like I did with SAS; R2HTML might be what I need, but I couldn't get it to work.

So here is what I have:

require(fImport)
require(PerformanceAnalytics)

These two packages seem to do a lot of what I need. PerformanceAnalytics has a wealth of charting tools for financial data.
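(If you don't have them yet, both install straight from CRAN; something like the following should do it.)

#One-time install from CRAN, if the packages aren't already on your machine
install.packages(c("fImport","PerformanceAnalytics"))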

#Function to load stock data into a timeSeries object
importSeries = function(symbol,from,to) {

    #Read data from Yahoo! Finance
    input = yahooSeries(symbol,from=from,to=to)

    #Character strings for the column names
    adjClose = paste(symbol,".Adj.Close",sep="")
    inputReturn = paste(symbol,".Return",sep="")
    CReturn = paste(symbol,".CReturn",sep="")

    #Calculate the daily returns and merge them onto the time series
    input.Return = returns(input[,adjClose])
    colnames(input.Return)[1] = inputReturn
    input = merge(input,input.Return)

    #Calculate the cumulative return and merge it onto the time series
    input.first = input[,adjClose][1]
    input.CReturn = fapply(input[,adjClose],FUN=function(x) log(x) - log(input.first))
    colnames(input.CReturn)[1] = CReturn
    input = merge(input,input.CReturn)

    #Clean up the intermediate objects (probably unnecessary, since they fall out
    # of scope when the function returns, but I can't not delete things if given
    # a way to)
    rm(input.first,input.Return,input.CReturn,adjClose,inputReturn,CReturn)

    #Return the time series
    return(input)
}
I learned a lot about data handling in R putting this function together.

#Load SPY data
spy = importSeries("spy",from="2010-01-01",to="2011-10-22")
#Load Google data
goog = importSeries("goog",from="2010-01-01",to="2011-10-22")

#merge the time series
merged = merge(spy,goog)
Nothing fancy here.  The merge() function is nice, but I have no idea how to do anything other than the "full" join it defaults to.  If anyone knows of a good tutorial on doing more advanced SQL-style joins, please let me know.
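(For plain data frames, at least, base merge() does take join-type arguments.  A quick sketch with made-up data frames, not the timeSeries merge used above:)

#Join behavior of data.frame merge() -- illustrative only
a = data.frame(date=as.Date("2011-10-17") + 0:4, spy=rnorm(5))
b = data.frame(date=as.Date("2011-10-19") + 0:4, goog=rnorm(5))

merge(a,b)               #inner join on the common column (date)
merge(a,b,all=TRUE)      #full outer join
merge(a,b,all.x=TRUE)    #left join
merge(a,b,all.y=TRUE)    #right join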

#Chart the Cumulative Returns
png("c:\\temp\\Returns_r.png")
chart.CumReturns(merged[,c("spy.Return","goog.Return"),drop=FALSE],
                 main="Total Returns SPY vs Google",
                 legend.loc="topleft")
dev.off()

#Create the Correlation plot
png("c:\\temp\\Corr.png")
chart.Correlation(merged[,c("spy.Return","goog.Return")],histogram=TRUE,pch="+")
dev.off()
First, chart.CumReturns() produces a nice graph, better than anything I was able to do with plot().

Second, chart.Correlation() also gives a neat output. I would really like to find a comparable way to produce the alpha-level ellipses that I made in SAS.

Third, I cannot find a good method that is comparable to PROC CORR. Can I get a single output with correlation, covariance, mean, standard deviation, etc.? Please let me know.
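(In the meantime, here is a rough, untested stand-in I sketched with base R; the column names assume the importSeries() naming above, and as.data.frame() is assumed to convert the timeSeries cleanly.)

#Rough PROC CORR substitute using base R
rets = as.data.frame(merged[,c("spy.Return","goog.Return")])
rets = rets[complete.cases(rets),]   #drop the leading NA from the returns

cor(rets)            #correlation matrix
cov(rets)            #covariance matrix
sapply(rets,mean)    #means
sapply(rets,sd)      #standard deviations
summary(rets)        #min, quartiles, median, max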
#Regress Google on SPY
reg = lm(merged[,"goog.Return"]~merged[,"spy.Return"])

#Create the confidence interval for the fitted values
#(I believe predict() falls back to the original data here, since the formula
# references merged[,] directly rather than columns of newx; that's fine,
# because newx is the same data anyway)
newx = merged[,"spy.Return"]
prd = predict(reg,newdata=newx,interval="confidence",level=.95, type="response")

#Print the Regression Summary
summary(reg)
Linear regression seems pretty easy. It took me a while to decipher the R help to figure out the confidence interval stuff. Again, if there is a way to produce a rich set of output from a regression, like SAS's PROC REG does, please show me.

Here is the R output:
Call:
lm(formula = merged[, "goog.Return"] ~ merged[, "spy.Return"])

Residuals:
      Min        1Q    Median        3Q       Max
-0.089348 -0.005702 -0.000083  0.005513  0.116929

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)            -0.0003841  0.0006424  -0.598     0.55
merged[, "spy.Return"]  0.9641218  0.0509346  18.929   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0137 on 453 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.4416,     Adjusted R-squared: 0.4404
F-statistic: 358.3 on 1 and 453 DF,  p-value: < 2.2e-16
This matches SAS. It's not exact, but it's very close, and that's good.
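(I haven't found a single PROC REG-style report yet, but a few standard calls on the lm object should cover most of the pieces; a sketch I haven't folded into the script above:)

#More detail from the regression object
confint(reg, level=0.95)   #confidence intervals for the coefficients
anova(reg)                 #ANOVA table for the fit
vcov(reg)                  #variance-covariance matrix of the estimates

#Standard diagnostic plots: residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow=c(2,2))
plot(reg)
par(mfrow=c(1,1))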
#Chart the regression
png("c:\\temp\\Regression.png")
chart.Regression(merged[,"goog.Return",drop=FALSE],
                 merged[,"spy.Return",drop=FALSE],
                 fit=c("linear"),
                 main="Google ~ SPY",
                 xlab="SPY Return",
                 ylab="Google Return")

#add the confidence interval
lines(newx$spy.Return,prd[,2],col="Red",lty=2)
lines(newx$spy.Return,prd[,3],col="Red",lty=2)
dev.off()
This uses chart.Regression() from PerformanceAnalytics. The confidence interval band looks suspect; maybe I did something wrong.
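(My guess, untested: lines() connects points in data order, so unless the x values are sorted the band gets drawn back and forth across the plot. A sketch of the fix, refitting with plain vectors so that predict()'s newdata argument behaves as expected:)

#Recompute the band at sorted x values so lines() draws it cleanly
x = as.vector(merged[,"spy.Return"])
y = as.vector(merged[,"goog.Return"])
fit = lm(y ~ x)

xs = sort(x[!is.na(x)])
band = predict(fit, newdata=data.frame(x=xs), interval="confidence", level=0.95)

lines(xs, band[,"lwr"], col="Red", lty=2)   #lower 95% bound
lines(xs, band[,"upr"], col="Red", lty=2)   #upper 95% bound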

Friday, October 21, 2011

Learning R: Project 1, Part 1

I've decided to take the leap and learn R.  This is going to be painful.

I decided that doing mini projects will be easier than reading a book cover to cover, forgetting most of it, and deciding to just watch Fringe.

So here we go:

Project 1:
Take data from Yahoo! Finance and calculate the Beta of a stock (let's choose Google since they provide the platform for this blog).
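(For reference, the beta here is just the slope from regressing Google's daily returns on SPY's, which is the same thing as the covariance of the two return series divided by the variance of SPY's. With hypothetical return vectors r_goog and r_spy:)

#Beta two ways -- equivalent for a simple linear regression
#(r_goog and r_spy are placeholder vectors of daily returns)
beta_cov = cov(r_goog, r_spy) / var(r_spy)
beta_reg = coef(lm(r_goog ~ r_spy))[2]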

Because I know SAS, I'm going to give the SAS version of the project first.  Then I'm going to go off, tinker, and hopefully get something decent in R with matching results.

Currently I only have access to SAS 9.1.3.  I would love to show the new Graph Template Language stuff, but until I upgrade, we're all stuck with GPlot.

Tasks:

  1. Download data directly from Yahoo! Finance for Google and SPY (a proxy for the S&P 500).
  2. Calculate the daily and cumulative log returns of both stocks.
  3. Plot the cumulative returns to get a feel for co-movement.
  4. Calculate the correlation between the daily returns.  Plot the daily returns and %-tile ellipses.
  5. Run a simple linear regression Google = A + B*SPY + e.  Plot summary statistics on things like the residuals.


SAS Code:

/*http://ichart.finance.yahoo.com/table.csv?s=GOOG&a=00&b=1&c=2010&d=09&e=22&f=2011&g=d&ignore=.csv;*/

%macro download(symbol,start,end);
/* Simple macro to read historic stock data from Yahoo! */

/*Data _null_ Step for string and date parsing.  Build the URL string*/
data _null_;
format url $200.;
url="http://ichart.finance.yahoo.com/table.csv?s=";

start = "&start"d;
end = "&end"d;

/*Starting date parameters*/
a = put(month(start)-1,z2.);
b = put(day(start),2.);
c = put(year(start),4.);

/*Ending date parameters*/
d = put(month(end)-1,z2.);
e = put(day(end),2.);
f = put(year(end),4.);

/*Build the url*/
url = catt(url, upcase("&symbol"),
           '&a=',a,
           '&b=',b,
           '&c=',c,
           '&d=',d,
           '&e=',e,
           '&f=',f,
           '&g=d&ignore=.csv');

/*stick the url into a macro variable*/
call symput('url',compress(url));
run;

/*Fileref to point to the URL*/
filename in url "&url";

/*Use Proc Import to read the CSV
  Could use a data step, but I'm lazy*/

proc import file=in out=&symbol dbms=csv replace;
run;

filename in clear;


/*Yahoo! gives us data in the opposite order we need*/
proc sort data=&symbol;
by date;
run;

/*Pesky holidays mess up our weekday time series
 -- This step is not really necessary for this example
    but it makes using the SAS/ETS procedures later easier
 -- Take the default spline interpolation  */
proc expand data=&symbol out=&symbol to=weekday;
id date;
run;

/*Calculate the log returns daily and cumulative*/
data &symbol(drop=first);
set &symbol;
retain first;

if _n_ = 1 then
    first = adj_close;

&symbol = log(adj_close) - log(lag(adj_close));
&symbol._c = log(adj_close) - log(first);
run;

%mend;

/*Download Google*/
%download(goog, 01JAN2010, 22OCT2011);

/*Download SPY */
%download(spy,  01JAN2010, 22OCT2011);

/*Merge the two data sets*/
proc sql noprint;
create table merged as
select a.date,
       a.spy_c label="Cumulative SPY Return",
       a.spy,
       b.goog_c label="Cumulative Google Return",
       b.goog
    from spy as a,
         goog as b
    where a.date = b.date;

quit;

/*Output to an HTML file for publishing*/
ods html body="proj1.html" 
         path="c:\temp"
         gpath="c:\temp\images" (url="images/")
         ;

/*Turn on output graphs for all our procedures*/
ods graphics on;

/*Plot the cumulative return of each stock
  -- Thanks Enterprise Guide, without you I would have to learn SAS/Graph*/

/* Set the SAS/Graph options */
SYMBOL1
    INTERPOL=JOIN
    HEIGHT=10pt
    VALUE=NONE
    LINE=1
    WIDTH=2
    CV = RED
;

SYMBOL2
    INTERPOL=JOIN
    HEIGHT=10pt
    VALUE=NONE
    LINE=1
    WIDTH=2
    CV = BLUE
;

Legend1
    FRAME
    ;

Axis1
    STYLE=1
    WIDTH=1
    MINOR=NONE
;

Axis2
    STYLE=1
    WIDTH=1
    MINOR=NONE
;

TITLE;
TITLE1 "Google and SPY Cumulative Returns";

PROC GPLOT DATA = merged;
PLOT spy_c * Date goog_c * Date /
    OVERLAY
    VAXIS=AXIS1
    HAXIS=AXIS2
    FRAME
    LEGEND=LEGEND1
;

RUN; QUIT;

goptions reset=all;

TITLE1 "Google and SPY Return Correlation";

/*Let's get the correlation and a scatter plot of the returns*/
proc corr data=merged plots=scatter(alpha=.1 .05 .01);
var spy goog;
run;

TITLE1 "Google and SPY Return Regression";

/*Finally let's do the regression and get the Beta*/
proc reg data=merged;
model goog = spy;
run;
quit;

ods html close;

Output is available here.

Off I go to figure out how to do this in R.  Pointers, questions and comments are welcome.

Welcome

I'm creating this blog as a place to store my thoughts as I work through my adventures in programming statistical applications. For years I have worked with the SAS language and it pays the bills. However, I feel it's time to venture out -- learn something new like R, program something cool with CUDA, and get a blog. All the cool kids are doing it.
 
 
So here it goes. Hopefully what follows others will find useful. If not, well I'm sure there are other blogs out there for you.