Search code examples
rdataframeimportheader

R: Reading tables from files and signing values from first row as names for each column for furhter inspection


Let's assume we have the following data set in a price.txt file:

   date   price 1   price 2   price 3   price 4
      1      34.5      40.3       200      15.2
      2        32        51     187.8       0.1
      .         .         .         .         .
      .         .         .         .         .

Now I would like to handle this table in R with the following comand and just return it, to check that the data is correct and that following calucaltions make sense once I formulate them. So the code to initialize a dataframe from a textfile is:

data <- read.table("price.txt"), header = TRUE)
data

Now this would just return me:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 9 elements)

And assign the headers in a not desired way. So the issue seems to be with the spaces inside each header for each column. I was able to manually delete the spaces between, but that just seems to take time and be error prone if there are tens of headers. Table after manual correction would look like this:

   date    price1    price2    price3    price4
      1      34.5      40.3       200      15.2
      2        32        51     187.8       0.1
      .         .         .         .         .
      .         .         .         .         .

If I now run the read.table command as previously, everything would work nicely. I know how to "clean" the headers from extra spaces with C or Python, but is there any quick fix for the described problem in R or in the read.table's syntax. Tried to look for one, but wasn't able to get any working since the data has also spaces between the column headers, not just inside them. Thank's for your advice.


Solution

  • Here is a solution using read_file and regex replacements:

    library(tidyverse)
    
    read_file("data.txt") %>%
      str_remove_all("(?<=[:alnum:]) (?=[:alnum:])") %>%
      read.table(text = ., header = TRUE)
    
    #   date price1 price2 price3 price4
    # 1    1   34.5   40.3    200   15.2
    # 2    2     32     51  187.8    0.1
    # 3    .      .      .      .      .
    # 4    .      .      .      .      .
    

    Or in one line without pipe:

    read.table(text = str_remove_all(read_file("/Users/davidcsuka/Desktop/data.txt"), "(?<=[:alnum:]) (?=[:alnum:])"), header = TRUE)