I am trying to read a csv into R. I would like to remove the lines before the row that starts with a "#" and also make that row my header. Because the position of the row that starts with "#" keeps changing, I do not want to hard-code a value for skip =.
Currently when I do read.csv("df.csv"):
abc x x.1 x.2
def
ghi
# vbn crt ykl
4 rte 77 drf
What I would like:
# vbn crt ykl
4 rte 77 drf
I have tried:
df <- df[min(grep("vbn", df$x)):nrow(df), ]
Since I know the second column's title will never change and will always be "vbn", I thought that would work. But this is my result:
abc x x.1 x.2
# vbn crt ykl
4 rte 77 drf
I also have tried:
library(data.table)
fread("df.csv")
But this did not work and yielded the same result as plain read.csv("df.csv").
Any help will be appreciated. Please let me know if I need to provide more information. Thank you.
Here is a two-step solution: we first read the file with readLines and find the row that starts with a #, and then read it again with fread, skipping all previous rows.
# x <- readLines("myfile.csv")
x <- readLines(textConnection(text))
h <- grep("^#",x) # find the header row
# df <- data.table::fread("myfile.csv",skip=h-1)
df <- data.table::fread(text,skip=h-1)
# or the base-R alternative (@A.Val's comment);
# comment.char = "" keeps read.table from treating the "#" header as a comment
# read.table(text = text, skip = h-1, comment.char = "", header = TRUE)
# read.table("myfile.csv", skip = h-1, comment.char = "", header = TRUE)
df
# vbn crt ykl
1: 4 rte 77 drf
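If, as in the question, one header name is known never to change, the two steps can probably be collapsed into a single call: fread's skip argument also accepts a string, in which case reading starts at the first line that contains it. A minimal sketch, assuming data.table is installed and that "vbn" does not appear in any line above the header; the temporary file and the name df2 are only stand-ins for illustration.

```r
library(data.table)

# Stand-in for the real file (assumption: same layout as the sample data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("abc x x.1 x.2", "def", "ghi", "# vbn crt ykl", "4 rte 77 drf"), tmp)

# skip = "vbn" makes fread start at the first line that contains "vbn"
df2 <- fread(tmp, skip = "vbn")
df2
```

This avoids reading the file twice, but it relies on the search string being unique to the header row, so the readLines approach is the safer choice when that cannot be guaranteed.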
data
text <- "abc x x.1 x.2
def
ghi
# vbn crt ykl
4 rte 77 drf"
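For repeated use, the two steps can be wrapped in a small base-R helper. This is only a sketch: the function name read_from_hash_header is hypothetical, it takes the first line starting with "#" as the header, and it assumes whitespace-separated fields as in the sample data (for a comma-separated file, pass sep = "," through the dots).

```r
# Hypothetical helper: read a file whose header is the first line starting with "#"
read_from_hash_header <- function(path, ...) {
  lines <- readLines(path)
  h <- grep("^#", lines)[1]  # index of the first line starting with "#"
  if (is.na(h)) stop("no line starting with '#' found in ", path)
  # comment.char = "" keeps read.table from discarding the "#" header;
  # check.names = FALSE keeps "#" as a column name instead of mangling it
  read.table(text = lines[h:length(lines)], header = TRUE,
             comment.char = "", check.names = FALSE, ...)
}

# Usage on a temporary copy of the sample data
tmp <- tempfile(fileext = ".csv")
writeLines(c("abc x x.1 x.2", "def", "ghi", "# vbn crt ykl", "4 rte 77 drf"), tmp)
df3 <- read_from_hash_header(tmp)
df3
```

Reading the lines once and passing the tail to read.table through its text argument avoids opening the file a second time.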