Search code examples
rdataframeimportreadr

Import data in R from a text file containing multi header rows and whitespaces


I want to upload a data file that contains multiple header rows and whitespaces into R. I have copied the data here

# File name
#
#@   1  "Some text"                                                   "aa"
#@   2  "Some text"                                                   "bb"
#@   3  "Some text"                                                   "cc"
#@   4  "Some text"                                                   "dd"
#@   5  "Some text"                                                   "ee"
#@   6  "Some text"                                                   "ff"

#
#
#
#

 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  
 1.000000e-03  3.727051e-04  2.532203e-04  4.736003e-04  3.727051e-07  0.000000e+00   
 2.000000e-03  2.266785e-03  1.540081e-03  2.880429e-03  2.639490e-06  0.000000e+00  
 3.000000e-03  7.538553e-03  5.121786e-03  9.579321e-03  1.017804e-05  0.000000e+00   
 4.000000e-03  1.838835e-02  1.249329e-02  2.336627e-02  2.856639e-05  0.000000e+00  
 5.000000e-03  3.703296e-02  2.516073e-02  4.705817e-02  6.559935e-05  0.000000e+00 
 6.000000e-03  2.266785e-03  1.540081e-03  2.880429e-03  2.639490e-06  0.000000e+00  
 7.000000e-03  7.538553e-03  5.121786e-03  9.579321e-03  1.017804e-05  0.000000e+00   
 8.000000e-03  1.838835e-02  1.249329e-02  2.336627e-02  2.856639e-05  0.000000e+00  
 9.000000e-03  3.703296e-02  2.516073e-02  4.705817e-02  6.559935e-05  0.000000e+00

As you can see, it contains 9 rows and 6 columns. The issue is there is bunch of header data on top of the actual data set that contains the column names. I tried using the read_table command and whitespace delimiter, but the columns were limited to 3, instead of 6

dat <- read_table("example.txt", col_names = F)

I would like to import the data, convert to numeric and label each column with the header names listed in the file. So name of the first column would be "Some text aa", 2nd column would be "Some text bb" and so on. Is there a way to get this? I have 100 such files and would like to automate the process


Solution

  • If the file is quux.txt, then

    dat <- read.table("quux.txt", skip = 13, header = FALSE)
    dat
    #      V1           V2           V3           V4           V5 V6
    # 1 0.000 0.0000000000 0.0000000000 0.0000000000 0.000000e+00  0
    # 2 0.001 0.0003727051 0.0002532203 0.0004736003 3.727051e-07  0
    # 3 0.002 0.0022667850 0.0015400810 0.0028804290 2.639490e-06  0
    # 4 0.003 0.0075385530 0.0051217860 0.0095793210 1.017804e-05  0
    # 5 0.004 0.0183883500 0.0124932900 0.0233662700 2.856639e-05  0
    # 6 0.005 0.0370329600 0.0251607300 0.0470581700 6.559935e-05  0
    

    From here, you can extract the column names with

    nms <- readLines(pipe("grep '^#@' quux.txt")) |>
      sub(pattern = '.*"(\\S+)"\\s*$', replacement = '\\1')
    nms
    # [1] "aa" "bb" "cc" "dd" "ee" "ff"
    names(dat) <- nms
    dat
    #      aa           bb           cc           dd           ee ff
    # 1 0.000 0.0000000000 0.0000000000 0.0000000000 0.000000e+00  0
    # 2 0.001 0.0003727051 0.0002532203 0.0004736003 3.727051e-07  0
    # 3 0.002 0.0022667850 0.0015400810 0.0028804290 2.639490e-06  0
    # 4 0.003 0.0075385530 0.0051217860 0.0095793210 1.017804e-05  0
    # 5 0.004 0.0183883500 0.0124932900 0.0233662700 2.856639e-05  0
    # 6 0.005 0.0370329600 0.0251607300 0.0470581700 6.559935e-05  0