I am downloading with the quantmod package the S&P 500 time series and the Sotheby's stock:
library(zoo)
library(tseries)
library(quantmod)
library(ggplot2)
env1 = new.env()
getSymbols("^GSPC", env = env1, src ="yahoo", from = as.Date("1988-06-01"),to = as.Date("2013-05-29"))
GSPC = env1$GSPC
gspc.df = data.frame(date=time(GSPC), coredata(GSPC))
env2 = new.env()
getSymbols("BID", env = env2, src ="yahoo", from = as.Date("1988-06-01"),to = as.Date("2013-05-29"))
BID = env2$BID
sothebys.df = data.frame(date=time(BID), coredata(BID))
My objective is to merge or melt the Adjusted Prices together and plot them with ggplot. However, I have problems with the df frame:
t = as.Date(0:9128, origin="1988-06-01")
y1 = gspc.df$GSPC.Adjusted
y2 = sothebys.df$BID.Adjusted
df = data.frame(t=t, values=c(y2,y1), type=rep(c("Bytes", "Changes"), each=9129))
g = ggplot(data=df, aes(x=t, y=values)) +
geom_line() +
facet_grid(type ~ ., scales="free") +
scale_y_continuous(trans="log10") +
ylab("Log values")
g
When I try to execute the df = data... line I get an error concerning the number of rows. How can I melt or merge the data, so that I can use them for the combined ggplot?
The graph works fine. In the last step I included the recession bars into the graph. The following code produces the recession bars including the normalized times seres:
recessions.df = read.table(textConnection(
"Peak, Trough
1857-06-01, 1858-12-01
1860-10-01, 1861-06-01
1865-04-01, 1867-12-01
1869-06-01, 1870-12-01
1873-10-01, 1879-03-01
1882-03-01, 1885-05-01
1887-03-01, 1888-04-01
1890-07-01, 1891-05-01
1893-01-01, 1894-06-01
1895-12-01, 1897-06-01
1899-06-01, 1900-12-01
1902-09-01, 1904-08-01
1907-05-01, 1908-06-01
1910-01-01, 1912-01-01
1913-01-01, 1914-12-01
1918-08-01, 1919-03-01
1920-01-01, 1921-07-01
1923-05-01, 1924-07-01
1926-10-01, 1927-11-01
1929-08-01, 1933-03-01
1937-05-01, 1938-06-01
1945-02-01, 1945-10-01
1948-11-01, 1949-10-01
1953-07-01, 1954-05-01
1957-08-01, 1958-04-01
1960-04-01, 1961-02-01
1969-12-01, 1970-11-01
1973-11-01, 1975-03-01
1980-01-01, 1980-07-01
1981-07-01, 1982-11-01
1990-07-01, 1991-03-01
2001-03-01, 2001-11-01
2007-12-01, 2009-06-01"), sep=',',
colClasses=c('Date', 'Date'), header=TRUE)
recessions.trim = subset(recessions.df, Peak >= min(gspc.df$date))
g.gspc = ggplot(data = df2) + geom_line(aes(x = Date, y = GSPC, colour = "blue")) + geom_line(aes(x = Date, y = Sothebys, colour = "red")) + theme_bw()
g.gspc = g.gspc + geom_rect(data=recessions.trim, aes(xmin=Peak, xmax=Trough, ymin=-Inf, ymax=+Inf), fill='pink', alpha=0.4)
plot(g.gspc)
Thank you very much for your assistance / teaching. I am quite new to programming and R, thanks for helping me to improve :)
By the way, if somebody has an idea to further improve this solution, please comment! Thx
The data from Sotheby's has slightly fewer observations than the data from the S&P. If you remove the dates from S&P that do not appear in Sotheby's, then it works fine. You were doing some weird things in defining your dataframe as well, so I fixed that.
library(zoo)
library(tseries)
library(quantmod)
library(ggplot2)
# import
env1 = new.env()
getSymbols("^GSPC", env = env1, src ="yahoo", from = as.Date("1988-06-01"),to = as.Date("2013-05-29"))
GSPC = env1$GSPC
gspc.df = data.frame(date=time(GSPC), coredata(GSPC))
env2 = new.env()
getSymbols("BID", env = env2, src ="yahoo", from = as.Date("1988-06-01"),to = as.Date("2013-05-29"))
BID = env2$BID
sothebys.df = data.frame(date=time(BID), coredata(BID))
# find which dates are in GSPC but not in Sotheby's
bad.dates <- sothebys.df$date[-which(gspc.df$date %in% sothebys.df$date)]
# remove the 'bad dates' from the dataframe so that both stocks have representative observations
# from each date
gspc.df <- gspc.df[-which(gspc.df$date %in% bad.dates),]
# verify the lengths
length(gspc.df) == length(sothebys.df)
# build the dataframe with dates and stock prices to be used in graphing
df = data.frame(Date = gspc.df$date,
GSPC = gspc.df$GSPC.Adjusted,
Sothebys = sothebys.df$BID.Adjusted)
# plot prices over time
ggplot(data = df, aes(x = Date)) + geom_line(aes(y = GSPC), colour = "blue") +
geom_line(aes(y = Sothebys), colour = "red")
You should definitely think about looking at just the day-to-day price changes, which is a common practice when comparing stocks. The difference in trading volume between an index and a particular security is so great that you can't learn much by looking at the chart you requested. I use the normalization function below occasionally. It isn't perfect for this situation (it scales everything to a range of 0 to 1), but I'll leave it up to you to standardize your data properly. In the mean time, the following code will give you a good idea of how they compare:
NormalizeVector <- function(x) {
NormCalc <- function(x) {
(x - min(x, na.rm=TRUE))/(max(x,na.rm=TRUE) - min(x, na.rm=TRUE))
}
if (class(x) %in% c("integer", "numeric")) {
norm.val <- NormCalc(x)
}
else norm.val <- x
return(norm.val)
}
df2 = as.data.frame(lapply(df, NormalizeVector))
# plot normalized prices over time
ggplot(data = df2, aes(x = Date)) + geom_line(aes(y = GSPC), colour = "blue") +
geom_line(aes(y = Sothebys), colour = "red")