Tags: r, csv, special-characters, read.csv

How to make a specific row that begins with a "#" the header in R


I am trying to read a CSV into R. I would like to drop every line before the row that starts with "#" and use that row as my header. Because the position of the row that starts with "#" keeps changing, I do not want to hard-code skip =.

Currently when I do read.csv("df.csv"):

abc     x      x.1     x.2
def
ghi
#     vbn      crt     ykl
4     rte       77     drf

What I would like:

#     vbn      crt     ykl
4     rte       77     drf

I have tried:

df <- df[min(grep("vbn", df$x)):nrow(df), ]

I know the second column's title will never change and will always be "vbn", so I thought that would work. But this is my result:

abc     x      x.1    x.2
#     vbn      crt    ykl
4     rte       77    drf

I also have tried:

library(data.table)
fread("df.csv")

But this did not work and gave the same result as plain read.csv("df.csv").

Any help will be appreciated. Please let me know if I need to provide more information. Thank you.


Solution

  • Here is a two-step solution: we first read the file with readLines and find the row that starts with a #, then read it again with fread, skipping all the rows before it.

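    # For a real file: x <- readLines("myfile.csv")
    x <- readLines(textConnection(text))
    
    # index of the first line that starts with "#" (the header row)
    h <- grep("^#", x)[1]
    
    # For a real file: df <- data.table::fread("myfile.csv", skip = h - 1)
    df <- data.table::fread(text = text, skip = h - 1)
    
    # or the base-R alternative (@A.Val's comment);
    # comment.char = "" stops read.table from discarding the "#" line as a comment
    # read.table(text = text, skip = h - 1, comment.char = "", header = TRUE)
    # read.table("myfile.csv", skip = h - 1, comment.char = "", header = TRUE)
    df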
    
    # vbn crt ykl
    1: 4 rte  77 drf
    

    data

    text <- "abc     x      x.1     x.2
    def
    ghi
    #     vbn      crt     ykl
    4     rte       77     drf"
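
The same two steps can also be wrapped in a small base-R function that works on a file path. This is only a sketch: the name read_hash_header is made up for illustration, and it assumes whitespace-separated columns as in the sample data (for a comma-separated file, sep = "," would need to be passed to read.table).

```r
# Hypothetical helper (not part of the original answer): read a file whose
# header is the first line starting with "#", ignoring everything above it.
read_hash_header <- function(path) {
  lines <- readLines(path)
  h <- grep("^#", lines)                      # candidate header rows
  if (length(h) == 0) {
    stop("no line starting with '#' found in ", path)
  }
  read.table(
    text = lines[h[1]:length(lines)],         # keep header row and everything after
    header = TRUE,
    comment.char = "",                        # do not treat "#" as a comment
    check.names = FALSE,                      # keep "#" as a column name
    stringsAsFactors = FALSE
  )
}

# Demo on a temporary file holding the sample data from the question
tmp <- tempfile(fileext = ".csv")
writeLines(c("abc     x      x.1     x.2",
             "def",
             "ghi",
             "#     vbn      crt     ykl",
             "4     rte       77     drf"), tmp)
df <- read_hash_header(tmp)
print(df)
```

Since the question says the "vbn" column name never changes, it may also be worth trying fread's documented string form of skip, for example data.table::fread("df.csv", skip = "vbn"), which starts reading at the first line containing that text.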