
Retrieve the column separator used by fread


fread from the data.table package can usually detect the column separator (sep) automatically when reading a file.

For example, here fread automatically detects | as the column delimiter:

library(data.table)
fread(paste(c("A|1", "B|2", "C|3"), collapse = "\n"))
#    V1 V2
# 1:  A  1
# 2:  B  2
# 3:  C  3

But how can I retrieve the column separator that fread actually used (here, |)?


Solution

  • As Henrik mentions, fread prints this information to the console when verbose = TRUE is set. You can capture the line that reports the separator with:

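    library(data.table)
    library(magrittr)
    example <- paste(c("A|1", "B|2", "C|3"), collapse = "\n")
    # Capture fread's verbose log; piping the result into {NULL} keeps the
    # parsed table itself from being printed into the captured output.
    # The wording of the log differs between data.table versions, so the
    # grep pattern may need adjusting for your installed version.
    capture.output(fread(example, verbose = TRUE) %>% {NULL}) %>%
        .[grepl('Detecting sep', .)]   # keep only the separator line
    
    #[1] "Detecting sep ... '|'"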
    
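    The snippet above returns the whole log line rather than the separator itself. A minimal sketch for pulling out just the character, assuming the log line wraps the separator in single quotes as in the output shown. extract_sep is a hypothetical helper name, and because the log wording varies between data.table versions, check the pattern against your own output:

    ```r
    # Hypothetical helper: return the text between the first pair of single
    # quotes in a captured fread log line (character(0) if there is none).
    extract_sep <- function(log_line) {
      quoted <- regmatches(log_line, regexpr("'[^']+'", log_line))
      gsub("^'|'$", "", quoted)  # strip the surrounding quotes
    }

    extract_sep("Detecting sep ... '|'")
    #[1] "|"
    ```
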

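    You could also implement your own delimiter finder based on the rule the fread documentation gives for sep (this wording is from older versions of ?fread; newer data.table releases describe a more elaborate heuristic, so a reimplementation may not always agree with fread):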

    Defaults to the first character in the set [,\t |;:] that exists on line autostart outside quoted ("") regions
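
That rule can be sketched in base R as follows. This is a minimal sketch, not fread's actual implementation: find_sep is a hypothetical name, it assumes "first character in the set" means the first candidate in the listed order, it treats quoted regions as simple "..." pairs without escaped quotes, and it only inspects the single line you pass in rather than choosing a line the way fread does.

```r
# Hypothetical helper following the documented rule: return the first
# candidate in [,\t |;:] (in that order) found on the line outside
# double-quoted regions, or NA if none is present.
find_sep <- function(line, candidates = c(",", "\t", " ", "|", ";", ":")) {
  # Assumption: quoted regions are plain "..." pairs with no escaped quotes
  unquoted <- gsub('"[^"]*"', "", line)
  for (sep in candidates) {
    if (grepl(sep, unquoted, fixed = TRUE)) return(sep)
  }
  NA_character_
}

example <- paste(c("A|1", "B|2", "C|3"), collapse = "\n")
first_line <- strsplit(example, "\n", fixed = TRUE)[[1]][1]
find_sep(first_line)
#[1] "|"
```

Note that the candidate order matters: with this reading of the rule, a line such as "A | 1" would yield a space rather than |.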