Let's assume we have the following data set in a price.txt
file:
date price 1 price 2 price 3 price 4
1 34.5 40.3 200 15.2
2 32 51 187.8 0.1
. . . . .
. . . . .
Now I would like to load this table in R with the following command and simply print it, to check that the data is correct and that the calculations I formulate later make sense. So the code to initialize a data frame from a text file is:
data <- read.table("price.txt", header = TRUE)
data
Now this just returns:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 9 elements
and assigns the headers in an undesired way. So the issue seems to be the spaces inside each column header. I was able to delete those spaces manually, but that takes time and is error-prone when there are tens of headers. After manual correction the table looks like this:
date price1 price2 price3 price4
1 34.5 40.3 200 15.2
2 32 51 187.8 0.1
. . . . .
. . . . .
If I now run the read.table command as before, everything works nicely. I know how to "clean" the headers of extra spaces with C or Python, but is there a quick fix for the described problem in R or in read.table's syntax? I tried to find one, but couldn't get anything working, since the data also has spaces between the column headers, not just inside them. Thanks for your advice.
Here is a solution using read_file (from readr, loaded as part of the tidyverse) and a regex replacement. Note that the pattern removes every single space flanked by alphanumeric characters, so it assumes the columns themselves are separated by tabs (or runs of multiple spaces); only the unwanted single spaces inside the headers then match:
library(tidyverse)
read_file("price.txt") %>%
  str_remove_all("(?<=[:alnum:]) (?=[:alnum:])") %>%
  read.table(text = ., header = TRUE)
# date price1 price2 price3 price4
# 1 1 34.5 40.3 200 15.2
# 2 2 32 51 187.8 0.1
# 3 . . . . .
# 4 . . . . .
Or in one line without the pipe:
read.table(text = str_remove_all(read_file("price.txt"), "(?<=[:alnum:]) (?=[:alnum:])"), header = TRUE)
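If you prefer to stay in base R, the same idea can be sketched with readLines and gsub (perl = TRUE enables the lookaround syntax; note that base R requires the double-bracket form [[:alnum:]]). This makes the same assumption as above, that the columns are tab-separated so only the spaces inside the headers match:

```r
# Read the raw lines, collapse single spaces that sit between
# alphanumeric characters ("price 1" -> "price1"), then parse.
lines <- readLines("price.txt")
lines <- gsub("(?<=[[:alnum:]]) (?=[[:alnum:]])", "", lines, perl = TRUE)
data  <- read.table(text = lines, header = TRUE)
names(data)
# e.g. "date" "price1" "price2" "price3" "price4"
```

Since the pattern is a literal space, tab separators in the file are left untouched.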