Search code examples
rtime-seriesxtsquantmodperformanceanalytics

R: from a number of .csv to a single time series in xts


I have 100+ csv files in the current directory, all with the same characteristics. Some examples:

ABC.csv

,close,high,low,open,time,volumefrom,volumeto,timestamp
0,0.05,0.05,0.05,0.05,1405555200,100.0,5.0,2014-07-17 02:00:00
1,0.032,0.05,0.032,0.05,1405641600,500.0,16.0,2014-07-18 02:00:00
2,0.042,0.05,0.026,0.032,1405728000,12600.0,599.6,2014-07-19 02:00:00
...
1265,0.6334,0.6627,0.6054,0.6266,1514851200,6101389.25,3862059.89,2018-01-02 01:00:00

XYZ.csv

,close,high,low,open,time,volumefrom,volumeto,timestamp
0,0.0003616,0.0003616,0.0003616,0.0003616,1412640000,11.21,0.004054,2014-10-07 02:00:00
...
1183,0.0003614,0.0003614,0.0003614,0.0003614,1514851200,0.0,0.0,2018-01-02 01:00:00

The idea is to build in R a time series dataset in xts so that I could use the PerformanceAnalyticsand quantmod libraries. Something like that:

##                 ABC     XYZ     ...     ...     JKL
## 2006-01-03  NaN      20.94342
## 2006-01-04  NaN      21.04486
## 2006-01-05  9.728111 21.06047
## 2006-01-06  9.979226 20.99804
## 2006-01-09  9.946529 20.95903
## 2006-01-10 10.575626 21.06827
## ...

Any idea? I can provide my trials if required.


Solution

  • A solution using base R

    If you know that your files are formatted the same way then you can merge them. Below is what I would have done.

    Get a list a files (this assumes that all the .csv files are the one you actually need and they are placed in the working directory)

    vcfl <- list.files(pattern = "*.csv")
    

    lapply() to open all files and store them as.data.frame:

    lsdf <- lapply(lsfl, read.csv)
    

    Merge them. Here I used the column high but you can apply the same code on any variable (there likely is a solution without a loop)

    out_high <- lsdf[[1]][,c("timestamp", "high")]
    for (i in 2:length(vcfl)) {
      out_high <- merge(out_high, lsdf[[i]][,c("timestamp", "high")], by = "timestamp")
     }
    

    Rename the column using the vector of files' names:

    names(lsdf)[2:length(vcfl)] <- gsub(vcfl, pattern = ".csv", replacement = "") 
    

    You can now use as.xts() fron the xts package https://cran.r-project.org/web/packages/xts/xts.pdf

    I guess there is an alternative solution using tidyverse, somebody else?

    Hope this helps.