
Converting text data to table/csv format


I need to convert text data (e.g., a paragraph) into a data frame using R, so that I can save it as a CSV file.

The following code converts the text into a table, but it puts all the words of each line into a single cell.

    merchant <- read.delim("merchant.txt")
    write.table(merchant,file="merchant.csv",sep=",",col.names=FALSE,row.names=FALSE)

How can I split the words in each paragraph (line), and thus create a single-column dataset with each word in a separate cell?


Solution

  • Here's an attempt based on the tidyverse. Instead of reading the file in as a table, read it in as a single string and then split it into a vector of individual words:

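    library(tidyverse)
    
    ## Read in the text file as a single string
    merchant <- read_file("merchant.txt") %>%
      ## Replace runs of punctuation and whitespace (including newlines) with one space
      gsub('[[:punct:][:space:]]+', ' ', .) %>%
      ## Drop leading/trailing spaces so no empty strings appear after splitting
      trimws() %>%
      ## Split into individual words; strsplit() returns a list
      strsplit(" ")
    
    ## Extract the character vector of words from the list
    para <- merchant[[1]]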
    

    To convert this vector into a single-column data frame:

    para <- as.data.frame(para)
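
For reference, the whole workflow (read, split into words, build a one-column data frame, write the CSV) might look roughly like the base R sketch below. It is only an illustration under some assumptions: the sample text, the temporary file paths, and the column name `word` are hypothetical stand-ins, not part of the original question, and base R functions are used here in place of `read_file()` so the example runs without extra packages.

```r
# Hypothetical sample input, written to a temporary file so the sketch
# is self-contained (stands in for "merchant.txt")
txt_path <- tempfile(fileext = ".txt")
writeLines(c("The quality of mercy is not strained;",
             "It droppeth as the gentle rain from heaven"), txt_path)

# Read all lines and collapse them into one string
text <- paste(readLines(txt_path), collapse = " ")

# Replace runs of punctuation and whitespace with a single space, then trim
clean <- trimws(gsub("[[:punct:][:space:]]+", " ", text))

# Split into words; strsplit() returns a list, so take its first element
words <- strsplit(clean, " ", fixed = TRUE)[[1]]

# One word per row, in a single column (the name `word` is an assumption)
para <- data.frame(word = words, stringsAsFactors = FALSE)

# Write the single-column CSV (stands in for "merchant.csv")
csv_path <- tempfile(fileext = ".csv")
write.csv(para, csv_path, row.names = FALSE)
```

Note that `[[:punct:]]` also matches apostrophes and hyphens, so a word such as "don't" would be split into "don" and "t"; if that matters for your text, the pattern would need adjusting.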