Search code examples
rxtsdata-importwrds

R: import and converting CRSP data into a xts-object


I have downloaded historical US stock prices from the CRSP database via the WRDS website.

I can import the .csv file but my approach to properly fit it into a xts-object is at least unhandy. With longer time periods and more data, after splitting the original data frame according to each ID, the list of data frames get up to dozens of Gigabyte. Therefore, I am looking for a more efficient approach to covert the data frame, which consists of a simple list, to a ready to use xts-object.

Structure of data frame after import: (stocks are listed one below the other)

> head(dataf)
  ï..Names.Date PERMNO Price.adjusted
1    31/01/2014  10104          36.90
2    28/02/2014  10104          39.11
3    31/03/2014  10104          40.91

Desired format in the xts-object:

> dat[1:3,1:19]
         X10104 X10107 X11308 X11587 X11628 X11850 X12060 X12072 X12400 
Jan 2014  36.90  37.84  37.82 267.18  18.35  92.16  25.13  17.74  53.53  
Feb 2014  39.11  38.31  38.20 289.43  19.73  96.27  25.47  18.43  53.68  
Mar 2014  40.91  40.99  38.66 306.14  20.20  97.68  25.89  18.25  52.54  

My approach:

#read CSV into a data frame
dataf <- read.csv(file = "us-data14-16.csv", header = TRUE, sep = ";", fill = TRUE)

#data preprocessing, deletes objects with price = 0
dataf <- dataf[dataf[, 3] != 0, ]

#split list according to ticker in a list of data frames
dataf <- split(dataf, f= dataf[,2])

#get identifier
id <- names(dataf)

#convert data frames into xts objects
datax <- lapply(dataf, function(x) xts(x$Price.adjusted, as.yearmon(x[,1], "%d/%m/%Y")))

#set column name according to ticker (loop through every element in the list)
sapply(seq_along(datax), function(x) colnames(datax[[x]]) <<- id[x])

#merge list of xts objects in one xts object
dat <- do.call(merge, datax)

Solution

  • You need to use the reshape function with direction as wide.

    df = data.frame(Names.Date= rep(seq(1:10), 10), PERMNO=sort(rep(seq(from= 101, to=110), 10)), Price.adjusted=rnorm(100))
    reshape(df, idvar = "Names.Date", timevar = "PERMNO", direction = "wide")