I would like to use only the rows in my csv file that correspond to a specific date. I have seen many good ways to do this; however, they all require you to know that date in advance and specify it in your code. Since I will be running this program frequently, I am looking for a fully automated process where I do not have to keep changing the date in my code.
My data set looks something like this (fortunately, I always want to read from the bottom, so I can use tail if need be):
Date Ticker
... ....
2015-12-31 TIF
2016-01-31 DD
2016-01-31 ADP
Essentially, I am asking if there is a way to say read.csv("df.csv", *only rows with same date as last row*).
I know that subsetting by date is possible, or there may be some way to do it like this:
x <- tail(df, *only rows with same date as last row*)
However, my data set will get quite large over time, and I would rather not read in the entire file every time.
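To make the goal concrete, here is a minimal sketch of the subsetting approach once df has already been read in full (which is exactly the full read I am hoping to avoid):
# assumes df was read with something like df <- read.csv("df.csv", stringsAsFactors = FALSE)
last_date <- tail(df$Date, 1)        # date on the last row
x <- df[df$Date == last_date, ]      # keep only rows that share that date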
I'd put together a custom function that reads the data in as a data.frame starting from a stated date.
ReadFrom <- function(filename, date) {
  # line number of the first row that contains the given date
  sno <- grep(date, readLines(filename))[1]
  # read the file from that row onwards
  dat <- read.table(filename, skip = sno - 1, header = FALSE, sep = ",")
  # insert header from row 1 of the .csv file
  names(dat) <- unlist(read.table(filename, nrows = 1, stringsAsFactors = FALSE))
  return(dat)
}
ReadFrom("example.csv", "2016-01-31")
Date Ticker
1 2016-01-31 DD
2 2016-01-31 ADP
ReadFrom("example.csv", "2015-12-31")
Date Ticker
1 2015-12-31 TIF
2 2016-01-31 DD
3 2016-01-31 ADP
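To get the fully automated behaviour you describe (no hard-coded date), one option is to pull the date out of the file's last line and pass it to ReadFrom(). A sketch, assuming the date always appears in YYYY-MM-DD form (note it still scans the file with readLines, so the large-file caveat below applies):
last_line <- tail(readLines("example.csv"), 1)                            # final row of the file
last_date <- regmatches(last_line, regexpr("\\d{4}-\\d{2}-\\d{2}", last_line))  # extract the date
ReadFrom("example.csv", last_date)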
Data (written as "example.csv"):
structure(list(Date......Ticker = structure(c(1L, 3L, 2L), .Label = c("2015-12-31 TIF",
"2016-01-31 ADP", "2016-01-31 DD"), class = "factor")), .Names = "Date......Ticker", class = "data.frame", row.names = c(NA,
-3L))
Lots of assumptions associated with this solution, though:
(i) the format in which the date is written has to be known beforehand (i.e. YYYY-MM-DD)
(ii) the dates in the csv have to be arranged in ascending order
(iii) not advisable for very large csv files (readLines can get extremely slow on them); consider SQL-based solutions in that case (see the sketch below)
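As one possibility for the large-file case, here is a sketch using the sqldf package, which lets SQLite filter the rows before they ever become an R data.frame. This assumes the csv actually has separate Date and Ticker columns; the package choice and column names are illustrative, not part of the function above:
library(sqldf)
# read.csv.sql exposes the file to the query as a table named "file"
read.csv.sql("example.csv",
             sql = "select * from file where Date = '2016-01-31'")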