I need to convert text data (e.g., a paragraph) into a data frame in R so that I can save it as a CSV file.
The following code converts the text into a table, but it puts all the words of each line into a single cell.
merchant <- read.delim("merchant.txt")
write.table(merchant,file="merchant.csv",sep=",",col.names=FALSE,row.names=FALSE)
How can I split each paragraph (line) into words and create a single-column dataset with each word in a separate cell?
Here's my attempt using the tidyverse. Instead of reading the file in as a table, read it in as a single string and then split that string into a vector of individual words:
library(tidyverse)
## Read in text file as string
merchant <- read_file("merchant.txt") %>%
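## Replace every run of punctuation or whitespace (including newlines) with a single space
gsub('[[:punct:][:space:]]+', ' ', .) %>%
## Trim leading and trailing spaces so no empty word is produced
trimws() %>%
## Split into individual words; strsplit() returns a list holding one character vector
strsplit(" ")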
## Extract the character vector of individual words from the list
para <- merchant[[1]]
To convert this vector into a data frame with one word per row:
para <- as.data.frame(para)
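If you would rather avoid the tidyverse dependency, the same idea can be sketched in base R, including the final step of writing the CSV. This is only a minimal sketch under a few assumptions: the sample text, the temporary file paths, and the column name `word` are made up for illustration, and you would swap in "merchant.txt" and "merchant.csv" for your own files.

```r
# Hypothetical sample input, written to a temporary file so the sketch runs on its own
txt_path <- tempfile(fileext = ".txt")
writeLines(c("The quality of mercy is not strained;",
             "It droppeth as the gentle rain from heaven."), txt_path)

# Read every line, then split each one on any run of punctuation or whitespace
lines <- readLines(txt_path)
words <- unlist(strsplit(lines, "[[:punct:][:space:]]+"))

# Drop empty strings, which appear when a line starts with a separator
words <- words[nzchar(words)]

# Build a single-column data frame with one word per row
para <- data.frame(word = words, stringsAsFactors = FALSE)

# Write the result as a CSV file (a temporary path here, an assumption for the demo)
csv_path <- tempfile(fileext = ".csv")
write.csv(para, csv_path, row.names = FALSE)
```

Because `readLines()` already returns one string per line, newlines never end up inside a word, and `unlist()` flattens the per-line results into a single vector. Note that splitting on punctuation also breaks contractions such as "don't" into two pieces; narrow the pattern if you need to keep apostrophes.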