How to skip double space in a space delimited .txt in R?


I have raw data spread across multiple .txt files, and every file contains data like this:

 -7.400513E-02  1.424561E-04 
 -.0592041  1.426086E-04 
 -4.440308E-02  1.436768E-04 
 -2.960205E-02  1.452942E-04 
 -1.480103E-02  1.473999E-04 
  0  1.499939E-04 
  1.480103E-02  1.531982E-04 
  2.960205E-02  1.567383E-04 
  4.440308E-02  1.603394E-04 
  .0592041  1.636658E-04 

I import and combine all of this data, so I can work with it, using the following code:

library(purrr)  # map()
library(vroom)  # vroom()
Listadearchivos <- list.files(pattern = "txt")
Listofdata <- map(Listadearchivos, ~vroom(.x, delim = " ",
                                          col_names = FALSE, 
                                          col_types = c(.default = "n")))
Data1 <- do.call(rbind.data.frame, Listofdata)

The problem is that I get a data.frame full of NA values. The raw data reserves a character for the sign of each number (a space for + and a - for negatives) and uses double spaces as the delimiter, which gives me a result like this:

    
 X1 X2 X3 X4 X5
1   NA  -0.08880615 NA  0.0001429749    NA
2   NA  -0.07400513 NA  0.0001424561    NA
3   NA  -0.05920410 NA  0.0001426086    NA
4   NA  -0.04440308 NA  0.0001436768    NA
5   NA  -0.02960205 NA  0.0001452942    NA
6   NA  -0.01480103 NA  0.0001473999    NA
7   NA  NA  0.00000000  NA  0.0001499939
8   NA  NA  0.01480103  NA  0.0001531982
9   NA  NA  0.02960205  NA  0.0001567383
10  NA  NA  0.04440308  NA  0.0001603394
11  NA  NA  0.05920410  NA  1.6366580000

What can I do in order to get only the two columns with the values?


Solution

  • data.table's fread parses your data correctly: it detects the space separator and treats a run of consecutive spaces as a single delimiter.

    library(data.table)
    Data1 <- fread("test-file.txt")
    Data1
    #              V1           V2
    #  1: -0.07400513 0.0001424561
    #  2: -0.05920410 0.0001426086
    #  3: -0.04440308 0.0001436768
    #  4: -0.02960205 0.0001452942
    #  5: -0.01480103 0.0001473999
    #  6:  0.00000000 0.0001499939
    #  7:  0.01480103 0.0001531982
    #  8:  0.02960205 0.0001567383
    #  9:  0.04440308 0.0001603394
    # 10:  0.05920410 0.0001636658
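
    To see where the NA columns in the question come from, it can help to split one raw line both ways. This is only a minimal base R sketch: the line is copied from the sample data, and the variable names are illustrative.

    ```r
    # One raw line from the sample data (leading space, double-space separator).
    line <- " -7.400513E-02  1.424561E-04 "

    # Splitting on every single space leaves empty fields between the values;
    # a reader that uses a one-space delimiter turns those into NA columns.
    single <- strsplit(line, " ", fixed = TRUE)[[1]]
    single

    # Treating any run of whitespace as one separator yields only the two numbers.
    fields <- scan(text = line, quiet = TRUE)
    fields
    ```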
    

    You can then collapse the list of data.tables to a single table using data.table::rbindlist.

    library(purrr)  # provides map()
    Listadearchivos <- list.files(pattern = "txt")
    Listofdata <- map(Listadearchivos, ~fread(.x))
    Data1 <- as.data.frame(rbindlist(Listofdata))
    Data1
    #             V1           V2
    # 1  -0.07400513 0.0001424561
    # 2  -0.05920410 0.0001426086
    # 3  -0.04440308 0.0001436768
    # 4  -0.02960205 0.0001452942
    # 5  -0.01480103 0.0001473999
    # 6   0.00000000 0.0001499939
    # 7   0.01480103 0.0001531982
    # 8   0.02960205 0.0001567383
    # 9   0.04440308 0.0001603394
    # 10  0.05920410 0.0001636658
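
  • If you would rather not add a dependency, base R's read.table may also work, since its default sep = "" treats any run of whitespace as one delimiter. The sketch below is an alternative under that assumption, not something from the original question: it writes two hypothetical sample files to a temporary directory so that it can run on its own.

    ```r
    # Hypothetical sample files, written to a temp directory so the sketch is
    # self-contained; in practice these would be your existing .txt files.
    tmp <- tempfile("rawdata")
    dir.create(tmp)
    writeLines(c(" -7.400513E-02  1.424561E-04 ", " -.0592041  1.426086E-04 "),
               file.path(tmp, "a.txt"))
    writeLines(c("  0  1.499939E-04 ", "  .0592041  1.636658E-04 "),
               file.path(tmp, "b.txt"))

    files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)

    # sep = "" (the default) splits on any run of whitespace, so the sign
    # padding and the double spaces do not create extra columns.
    tables <- lapply(files, read.table, header = FALSE, colClasses = "numeric")
    combined <- do.call(rbind, tables)
    combined
    ```

    The pattern "\\.txt$" is used here instead of "txt" so that only names ending in .txt are matched.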