r, text-mining

How to turn a txt file into a nice data frame


I have a txt file containing Track ID, Song ID, Artist Name and Song Name, with the fields separated by the literal string <SEP>. I'd like to convert it into a data frame in R to do some analysis. What would be a good function to use to separate the data? Below is the top row of the dataset. Thanks!

TRMMMKD128F425225D<SEP>SOVFVAK12A8C1350D9<SEP>Karkkiautomaatti<SEP>Tanssi vaan

Solution

  • read.table can read a file directly into a data frame, but its column separator (sep) must be a single character, so the multi-character '<SEP>' cannot be passed to it as is.

    Instead, we can first read the text file with readLines, use gsub to replace each '<SEP>' with a single character (a tab, '\t'), and then pass the result to read.table with sep = "\t" and the column names specified.

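    # quote = '' and comment.char = '' keep apostrophes and '#' in names as literal text
    data <- read.table(text = gsub('<SEP>', '\t',
                 readLines('filename.txt'), fixed = TRUE),
                 col.names = c('TrackID', 'SongID', 'ArtistName', 'SongName'),
                 sep = '\t', quote = '', comment.char = '',
                 stringsAsFactors = FALSE)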
    data
    
    #             TrackID             SongID       ArtistName    SongName
    #1 TRMMMKD128F425225D SOVFVAK12A8C1350D9 Karkkiautomaatti Tanssi vaan
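
As an alternative to substituting the delimiter, the lines can also be split on the literal '<SEP>' with strsplit and stacked into a data frame. This is only a minimal sketch, not a definitive implementation: it assumes every line has exactly four non-empty fields, and the temporary file, its second row, and the name data2 are made up here so that the example runs on its own.

```r
# Hypothetical sample file so the sketch is self-contained.
# The first row is from the question; the second row is invented
# to exercise an apostrophe and a '#' inside a song name.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "TRMMMKD128F425225D<SEP>SOVFVAK12A8C1350D9<SEP>Karkkiautomaatti<SEP>Tanssi vaan",
  "TRXXXXX00000000000<SEP>SOXXXXX00000000000<SEP>Example Artist<SEP>Don't Stop #1"
), tmp)

lines <- readLines(tmp)

# Split each line on the literal string '<SEP>' (fixed = TRUE disables regex).
parts <- strsplit(lines, "<SEP>", fixed = TRUE)

# Assumption: every line yields exactly four fields. strsplit drops a
# trailing empty field, so fail early if any line is short.
stopifnot(lengths(parts) == 4)

# Stack the pieces row by row into a character matrix, then convert it.
data2 <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(data2) <- c("TrackID", "SongID", "ArtistName", "SongName")
data2
```

Because nothing is re-parsed by read.table here, quotes and '#' in artist or song names need no special handling, but all columns come back as character and lines with a missing field have to be dealt with separately.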