I have a data table with 20,000+ rows and one column. The string in each row has a different number of words. I want to split the words and put each of them in a new column. I know how to do it one word at a time:
Data[, Word1 := as.character(lapply(strsplit(as.character(complaint), split = " "), "[", 1))]
(Data is my data table and complaint is the name of the column.)
Obviously, this is not efficient: each row has a different number of words, so I would have to repeat this line for every word position.
Is there a more efficient way to do this?
Check out cSplit from my "splitstackshape" package. It works on either data.frames or data.tables (but always returns a data.table).
Assuming KFB's sample data (a data frame df with a single character column x) is at least somewhat representative of your actual data, you can try:
library(splitstackshape)
cSplit(df, "x", " ")
#     x_1      x_2         x_3 x_4
# 1: This       is interesting  NA
# 2: This actually          is not
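For readers who cannot install the package, here is a minimal base-R sketch of the same wide split (not cSplit's actual implementation): each string is split only once, shorter rows are padded with NA, and the pieces become columns named x_1, x_2, and so on. The df below is a hypothetical reconstruction of the two sample rows shown above; unlike this sketch, cSplit by default also strips white space and converts column types.

```r
# Hypothetical sample data mirroring the two rows in the output above
df <- data.frame(x = c("This is interesting", "This actually is not"),
                 stringsAsFactors = FALSE)

# Split every string once; fixed = TRUE treats " " literally, not as a regex
pieces <- strsplit(df$x, " ", fixed = TRUE)

# Pad each row with NA up to the longest row (indexing past the end yields NA),
# then stack the rows into a character matrix
n <- max(lengths(pieces))
wide <- do.call(rbind, lapply(pieces, function(p) p[seq_len(n)]))

# Name the columns in the x_1, x_2, ... style used by cSplit
colnames(wide) <- paste0("x_", seq_len(n))
wide_df <- as.data.frame(wide, stringsAsFactors = FALSE)
```

Splitting once and padding is what avoids calling strsplit again for every word position, as the approach in the question would.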
Another (blazing fast) option is to use stri_split_fixed from "stringi" with simplify = TRUE (which is slated to make its way into the "splitstackshape" code soon):
library(stringi)
stri_split_fixed(df$x, " ", simplify = TRUE)
#      [,1]   [,2]       [,3]          [,4]
# [1,] "This" "is"       "interesting" NA
# [2,] "This" "actually" "is"          "not"
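Since the question asks for the words as new columns of the existing table, here is a minimal sketch of attaching that matrix back to a data.table as Word1, Word2, ... (the naming from the question). It assumes the "data.table" and "stringi" packages are installed; the Data table below is a hypothetical stand-in for the asker's table. It also normalises empty-string padding to NA, on the assumption that some stringi versions pad short rows with "" rather than NA when simplify = TRUE.

```r
library(data.table)
library(stringi)

# Hypothetical stand-in for the asker's table: one character column 'complaint'
Data <- data.table(complaint = c("This is interesting", "This actually is not"))

# Split every row in one vectorised call; the result is a character matrix
m <- stri_split_fixed(as.character(Data$complaint), " ", simplify = TRUE)

# Assumption: padding may be "" depending on the stringi version; make it NA
m[!is.na(m) & m == ""] <- NA

# Add one new column per word position, by reference: Word1, Word2, ...
cols <- paste0("Word", seq_len(ncol(m)))
Data[, (cols) := lapply(seq_len(ncol(m)), function(k) m[, k])]
```

Because := modifies Data by reference, no copy of the 20,000+ row table is made when the columns are added.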