r import data-cleaning

Direct escape from "\t"[sic] separator during import of table possible?


I recently received a .txt file in a very unusual format like this to process:

"Pony ID"/t"colour"/t"location"/t"age"
"Pony A"/t"white;brown;black"/t"stable1"/t12
"Pony B"/t"pink"/t"stable2"/t13
"Pony C"/t"white"/t"stable3"/t9

So if I try to import it with the classic reading functions from utils or readr (e.g. read.delim, read_tsv), I end up with 1 column, probably because the "/t" in the file is not a real tab and sep="/t" is not interpreted as a literal separator. The following code resolves it:

library(tidyverse)

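# Split on "/" only, so every field after the first keeps a leading "t"
a <- read.delim("ponies.txt", sep = "/", header = FALSE)
# Drop that leading "t" from all columns except the first
a <- data.frame(cbind(a[, 1], sapply(a[, -1], function(x) str_sub(x, 2))))
# Promote the first row to column names, then remove it
colnames(a) <- a[1, ]
a <- a[-1, ]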

  Pony ID            colour location age
2  Pony A white;brown;black  stable1  12
3  Pony B              pink  stable2  13
4  Pony C             white  stable3   9

I hope this question is not too obscure, but I'm very curious: does anyone know if there is a way to directly escape the literal "/t" delimiter during the import?


Solution

  • This could be made a bit more compact by reading the file with readLines, using gsub to change the delimiter, and then parsing the result with read.csv/read.table

    read.csv(text = gsub("/t", ",", gsub('"', '', readLines("ponies.txt"))), 
           check.names = FALSE)
    

    -output

      Pony ID            colour location age
    1  Pony A white;brown;black  stable1  12
    2  Pony B              pink  stable2  13
    3  Pony C             white  stable3   9
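
A variant of the same idea, offered as a sketch rather than a definitive fix: base R's sep accepts only a single character, so the two-character "/t" cannot be passed directly, but it can be swapped for a real tab before parsing. Keeping the quotes and using a tab (instead of stripping quotes and switching to commas) means a field that happens to contain a comma would still be read as one value. The temporary file below only recreates the sample data from the question; with real data, the path would be "ponies.txt".

```r
# Recreate the sample file from the question in a temporary location
# (stand-in for "ponies.txt").
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"Pony ID"/t"colour"/t"location"/t"age"',
  '"Pony A"/t"white;brown;black"/t"stable1"/t12',
  '"Pony B"/t"pink"/t"stable2"/t13',
  '"Pony C"/t"white"/t"stable3"/t9'
), path)

# Replace the literal two-character "/t" with a real tab.
# fixed = TRUE treats the pattern as plain text rather than a regex.
lines <- gsub("/t", "\t", readLines(path), fixed = TRUE)

# read.delim defaults to sep = "\t" and strips the double quotes itself;
# check.names = FALSE keeps the column name "Pony ID" unchanged.
ponies <- read.delim(text = lines, check.names = FALSE)
print(ponies)
```

One caveat: any "/t" that occurs inside a value (for example in a path such as "stable/third") would also be turned into a tab, so this assumes the sequence only ever appears as a delimiter.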