Search code examples
rdplyrreadr

Open a text file containing a table with no clear separator


I have to open a text file which contain data that I want to open as a tibble or dataframe. The data are stored in a tbale with white space as separator (I suppose). The probleme is that whitespace is also used in the column "PI". I have no idea how to open it without doing any complicated script to scan all of it and parse it with regex. Do you have any tips ? I think it was designed to have visualy aligned column when we open it as a text file.

here is a sample of the file.

  DAC                                                                PI    SPROF_DATE_UPDATE DATA_MODE      WMO CYCLE         PROFILE_DATE      LON     LAT        MPD_RAW      MPD_ADJ        Z_RAW       Z_ADJ
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2021-05-12 18:52:52         R  5903612   038  2012-07-04 01:15:50    8.651 -41.466     -89.313473          NaN     5.439052         NaN
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2020-12-03 02:03:33         R  5903717   138  2014-02-03 18:07:45 -164.533 -69.958     134.866767          NaN     7.884637         NaN
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2021-05-12 22:44:39         R  5903717   139  2014-02-10 20:05:49 -164.302 -70.061      99.848464          NaN     5.803396         NaN
 aoml  STEPHEN RISER , KENNETH JOHNSON                                   2021-05-12 22:44:53         R  5903717   140  2014-02-17 22:06:53 -164.146 -70.246     107.535752          NaN     6.260275         NaN

Solution

  • Fixing @danlooo
    First: header separator is single space.
    Second: sometimes before - there is just a single space

    data = readLines("testdata.txt") 
    data[1] = gsub(pattern ="\\s+" ,replacement = "\t" ,x = data[1])
    modified_data = gsub(pattern ="\\s-" ,replacement = "  -" ,x = data)
    modified_data = gsub(pattern ="\\s{2,}" ,replacement = "\t" ,x = modified_data)
    final_data = paste0(modified_data,collapse = "\n")
    write(final_data, "finaldata.txt")
    finaldatatable <- read_delim("finaldata.txt", 
                            delim = "\t", escape_double = FALSE, 
                            trim_ws = TRUE)