Sunday, October 30, 2011

Learning R: Project 1, Part 2

So it's been a week since I started down this path.  I worked most of this out over last weekend, went to a conference, had a hectic week at work, and then realized I had lost my work.  Gah.

I'll be posting my general thoughts on R later.  Mostly it seems to be a neat language, with lots of ways to do things.  The ability to create output seems limited, though.  I played with a number of things trying to create rich HTML output like I did with SAS.  R2HTML might be what I need, but I couldn't get it to work.

So here is what I have:

require(fImport)
require(PerformanceAnalytics)

These two packages seem to do a lot of what I need. PerformanceAnalytics has a wealth of charting tools for financial data.

#Function to load stock data into a Time Series object
importSeries = function (symbol,from,to) {



#Read data from Yahoo! Finance
input = yahooSeries(symbol,from=from,to=to)

#Character Strings for Column Names
adjClose = paste(symbol,".Adj.Close",sep="")
inputReturn = paste(symbol,".Return",sep="")
CReturn = paste(symbol,".CReturn",sep="")

#Calculate the Returns and put it on the time series
input.Return = returns(input[,adjClose])
colnames(input.Return)[1] = inputReturn
input = merge(input,input.Return)

#Calculate the cumulative return and put it on the time series
input.first = input[,adjClose][1]
input.CReturn = fapply(input[,adjClose],FUN=function(x) log(x) - log(input.first))
colnames(input.CReturn)[1] = CReturn
input = merge(input,input.CReturn)

#Deleting things (probably unnecessary, since local variables are discarded when
# the function returns, but I can't not delete things if given a way to...)
rm(input.first,input.Return,input.CReturn,adjClose,inputReturn,CReturn)

#Return the timeseries
return(input)

}
I learned a lot about data handling in R putting this function together.
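
For reference, the two calculations inside importSeries() boil down to the base R arithmetic below. This is only a minimal sketch on made-up prices, and it assumes returns() is left at its continuous (log) method; the vector name prices is hypothetical and not part of the function above.

```r
# Made-up adjusted closing prices, for illustration only
prices <- c(100, 102, 101, 105, 107)

# Continuous (log) return for each period: log(P_t) - log(P_{t-1})
log_returns <- diff(log(prices))

# Cumulative log return relative to the first price: log(P_t) - log(P_1)
cum_returns <- log(prices) - log(prices[1])

# The running sum of the period returns reproduces the cumulative series
# (skipping the first element, which is 0 by construction)
all.equal(cumsum(log_returns), cum_returns[-1])
```

Because log returns add up over time, the cumulative column can be built either from the prices directly (as in the function) or by summing the period returns.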

#Load SPY data
spy = importSeries("spy",from="2010-01-01",to="2011-10-22")
#Load Google data
goog = importSeries("goog",from="2010-01-01",to="2011-10-22")

#merge the time series
merged = merge(spy,goog)
Nothing fancy here.  The merge() function is nice, but I have no idea how to do anything but the "full" join that it defaults to.  If anyone knows of a good tutorial on doing more advanced SQL style joins, please let me know.
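
On the SQL-style join question, here is a rough sketch of how base R's merge() handles it for plain data frames, using the all, all.x, and all.y arguments. The data frames spy_df and goog_df and their values are made up for illustration. Whether the timeSeries method of merge() used above accepts the same arguments is not verified here, so one option would be to convert to data frames first.

```r
# Two small data frames keyed by date (made-up values, for illustration only)
spy_df <- data.frame(
  date = as.Date(c("2011-10-17", "2011-10-18", "2011-10-19")),
  spy.Return = c(0.010, -0.020, 0.005)
)
goog_df <- data.frame(
  date = as.Date(c("2011-10-18", "2011-10-19", "2011-10-20")),
  goog.Return = c(0.015, -0.010, 0.020)
)

# Inner join: only dates present in both (the data frame default)
inner <- merge(spy_df, goog_df, by = "date")

# Left join: every row of spy_df, NA where goog_df has no match
left <- merge(spy_df, goog_df, by = "date", all.x = TRUE)

# Right join: every row of goog_df, NA where spy_df has no match
right <- merge(spy_df, goog_df, by = "date", all.y = TRUE)

# Full outer join: every date from either side
full <- merge(spy_df, goog_df, by = "date", all = TRUE)
```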

#Chart the Cumulative Returns
png("c:\\temp\\Returns_r.png")
chart.CumReturns(merged[,c("spy.Return","goog.Return"),drop=FALSE],
                            main="Total Returns SPY vs Google",
                            legend.loc="topleft")
dev.off()

#Create the Correlation plot
png("c:\\temp\\Corr.png")
chart.Correlation(merged[,c("spy.Return","goog.Return")],histogram=TRUE,pch="+")
dev.off()
First, chart.CumReturns() produces a nice graph, better than I was able to do with plot().

Second, chart.Correlation() also gives a neat output. I would really like to find a comparable method to produce the alpha ellipses that I did in SAS.

Third, I cannot find a good method that is comparable to PROC CORR. Can I get one good output with correlation, covariance, mean, std, etc.? Please, let me know.
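
As a partial stand-in for PROC CORR, the pieces can be assembled from base R functions: cor(), cov(), cor.test(), and a small table of descriptive statistics. The sketch below runs on simulated returns (the rets data frame is made up, not the SPY/GOOG data), and it is not a single built-in procedure, just one way to print the same kinds of numbers together.

```r
set.seed(42)
# Simulated daily returns, for illustration only
rets <- data.frame(spy.Return = rnorm(100, mean = 0, sd = 0.01))
rets$goog.Return <- 0.9 * rets$spy.Return + rnorm(100, mean = 0, sd = 0.01)

# Descriptive statistics, one row per variable
desc <- data.frame(
  n    = sapply(rets, function(x) sum(!is.na(x))),
  mean = sapply(rets, mean, na.rm = TRUE),
  sd   = sapply(rets, sd, na.rm = TRUE),
  min  = sapply(rets, min, na.rm = TRUE),
  max  = sapply(rets, max, na.rm = TRUE)
)

# Correlation and covariance matrices, dropping rows with missing values
corr <- cor(rets, use = "complete.obs")
covm <- cov(rets, use = "complete.obs")

# Test of zero correlation for the pair (Pearson by default)
ct <- cor.test(rets$spy.Return, rets$goog.Return)

print(desc)
print(corr)
print(covm)
print(ct$p.value)
```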
#Regress Google on SPY
reg = lm(merged[,"goog.Return"]~merged[,"spy.Return"])

#Create the confidence interval
newx = merged[,"spy.Return"]
prd = predict(reg,newdata=newx,interval="confidence",level=.95, type="response")

#Print the Regression Summary
summary(reg)
Linear Regression seems pretty easy. It took me a while to decipher the R help to figure out the confidence interval stuff. Again, if there is a way to produce a rich set of output from a regression like SAS and PROC REG, please show me.

Here is the R output:
Call:
lm(formula = merged[, "goog.Return"] ~ merged[, "spy.Return"])


Residuals:
      Min        1Q    Median        3Q       Max
-0.089348 -0.005702 -0.000083  0.005513  0.116929

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)            -0.0003841  0.0006424  -0.598     0.55
merged[, "spy.Return"]  0.9641218  0.0509346  18.929   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0137 on 453 degrees of freedom
  (1 observation deleted due to missingness)

Multiple R-squared: 0.4416, Adjusted R-squared: 0.4404

F-statistic: 358.3 on 1 and 453 DF, p-value: < 2.2e-16
Matches SAS. It's not exact, but very close.  That's good.
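
On the confidence interval step and the wish for richer regression output: a sketch that may help is to fit the model from a data frame with named columns, so that predict() can match the variable in newdata by name, and then pull the individual pieces out of the fit. The data below is simulated, and the names rets, fit, and new_spy are hypothetical. This is not a replacement for PROC REG, only the base R pieces that cover similar ground.

```r
set.seed(1)
# Simulated daily returns, for illustration only (not the real SPY/GOOG data)
rets <- data.frame(spy.Return = rnorm(250, mean = 0, sd = 0.012))
rets$goog.Return <- 0.95 * rets$spy.Return + rnorm(250, mean = 0, sd = 0.014)

# Naming columns in the formula (rather than indexing an object inside it)
# lets predict() look up spy.Return in whatever newdata it is given
fit <- lm(goog.Return ~ spy.Return, data = rets)

coef_table <- coef(summary(fit))      # estimates, std. errors, t and p values
coef_ci    <- confint(fit, level = 0.95)  # intervals for the coefficients
aov_table  <- anova(fit)              # analysis-of-variance table

# Confidence interval for the fitted mean at three chosen SPY returns
new_spy <- data.frame(spy.Return = c(-0.02, 0, 0.02))
ci <- predict(fit, newdata = new_spy, interval = "confidence", level = 0.95)

print(coef_table)
print(coef_ci)
print(aov_table)
print(ci)
```

The matrix returned by predict() has columns fit, lwr, and upr, which are easier to read by name than by position.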
#Chart the regression
png("c:\\temp\\Regression.png")
chart.Regression(merged[,"goog.Return",drop=FALSE],
                          merged[,"spy.Return",drop=FALSE],
                          fit=c("linear"),
                          main="Google ~ SPY",
                          xlab="SPY Return",
                          ylab="Google Return")

#add the confidence interval
lines(newx$spy.Return,prd[,2],col="Red",lty=2)
lines(newx$spy.Return,prd[,3],col="Red",lty=2)
dev.off()
This uses chart.Regression() from PerformanceAnalytics. The fitted confidence interval looks suspect. Maybe I did something wrong.
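
Two things may explain a suspect-looking band, though neither is confirmed for the chart above. First, interval="confidence" is the interval for the fitted mean, which gets very narrow with a few hundred observations, so most points fall outside it; interval="prediction" gives the wider band for individual observations. Second, lines() connects points in the order given, so the x values should be sorted. The sketch below uses simulated data and base plot() rather than chart.Regression(), and it writes to a temporary PDF only because pdf() is available on every R build; png() can be swapped in the same way.

```r
set.seed(1)
# Simulated daily returns, for illustration only
rets <- data.frame(spy.Return = rnorm(250, mean = 0, sd = 0.012))
rets$goog.Return <- 0.95 * rets$spy.Return + rnorm(250, mean = 0, sd = 0.014)
fit <- lm(goog.Return ~ spy.Return, data = rets)

# Evaluate both bands on a sorted grid so lines() draws left to right
grid <- data.frame(
  spy.Return = seq(min(rets$spy.Return), max(rets$spy.Return), length.out = 100)
)
conf_band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
pred_band <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

out_file <- file.path(tempdir(), "Regression_bands.pdf")
pdf(out_file)
plot(rets$spy.Return, rets$goog.Return,
     main = "Google ~ SPY (simulated)",
     xlab = "SPY Return", ylab = "Google Return")
abline(fit)
# Narrow band: uncertainty about the fitted line itself
lines(grid$spy.Return, conf_band[, "lwr"], col = "red", lty = 2)
lines(grid$spy.Return, conf_band[, "upr"], col = "red", lty = 2)
# Wide band: where individual observations are expected to fall
lines(grid$spy.Return, pred_band[, "lwr"], col = "blue", lty = 3)
lines(grid$spy.Return, pred_band[, "upr"], col = "blue", lty = 3)
dev.off()
```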
