In R, I'm working with datasets of conversations. The data currently looks like this:
Mike, "Hello how are you"
Sally, "Good you"
I plan to eventually create a word cloud from this data, and need it to look like this, with one word per row:
Mike, Hello
Mike, how
Mike, are
Mike, you
Sally, good
Sally, you
Perhaps something like this, using reshape2::melt?
# Sample data
df <- read.csv(text =
'Mike, "Hello how are you"
Sally, "Good you"', header = FALSE)
# Split the trimmed second column on whitespace
lst <- strsplit(trimws(as.character(df[, 2])), "\\s")
# Name each list entry after the speaker in the first column
names(lst) <- trimws(df[, 1])
# Reshape into long dataframe
library(reshape2)
df.long <- (melt(lst))[2:1]
# L1 value
#1 Mike Hello
#2 Mike how
#3 Mike are
#4 Mike you
#5 Sally Good
#6 Sally you
Explanation: Trim leading/trailing whitespace (trimws) from the entries in the second column, split them on whitespace (\\s), and store the result in a list. Take the list entry names from the first column, and reshape into a long data.frame using reshape2::melt.
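If you would rather avoid the reshape2 dependency, the same split-and-reshape idea can be sketched in base R. This is only an illustration: the vectors speaker and text and the result name df_long are made up here, and it splits on runs of whitespace (\\s+) instead of a single \\s so that double spaces do not produce empty words.

```r
# Hypothetical sample data mirroring the question (names are illustrative)
speaker <- c("Mike", "Sally")
text <- c("Hello how are you", "Good you")

# Trim each utterance, then split it on runs of whitespace
words <- strsplit(trimws(text), "\\s+")

# Repeat each speaker once per word and flatten the list of words
df_long <- data.frame(
  speaker = rep(speaker, lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)
print(df_long)
```

rep() with lengths(words) does the job that the list names and melt do in the reshape2 version.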
I leave turning this into a comma-separated data.frame up to you...
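For that last step, one possible sketch (not the only way) is to paste the two columns together with a comma separator. The data frame is rebuilt here with assumed column names speaker and word so the snippet runs on its own; with the reshape2 result you would use its L1 and value columns instead.

```r
# Long-format data, rebuilt by hand so the snippet is self-contained
# (column names are illustrative assumptions)
df_long <- data.frame(
  speaker = c("Mike", "Mike", "Mike", "Mike", "Sally", "Sally"),
  word = c("Hello", "how", "are", "you", "Good", "you"),
  stringsAsFactors = FALSE
)

# Join speaker and word into "speaker, word" strings, one per row
csv_lines <- paste(df_long$speaker, df_long$word, sep = ", ")
writeLines(csv_lines)
```

If you need a file rather than printed lines, write.csv(df_long, row.names = FALSE) is the usual alternative.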