I've split a large data frame by the levels of a particular column into a list of data frames using split(), and am now trying to turn each data frame into its own corpus object using the Corpus() function, but I am unable to obtain the desired result.
I've tried creating a list of random normal values of the same length as my list of data frames, renaming each element of that list, converting each data frame to a corpus object, and assigning each corpus to the corresponding renamed element.
df <- data.frame("A" = 10:12, "B" = c(1, 1, 2)) # create example df
split_df <- split(df, f = df$B, drop = TRUE) # split df by B col
names(split_df) <- c("df1", "df2") # rename dfs
split_df
> split_df
$df1
   A B
1 10 1
2 11 1

$df2
   A B
3 12 2
y <- as.list(rnorm(length(split_df))) # create list of norms length of df list
names(y) <- paste("corpus", 1:length(y), sep="_") # rename elements of list
# iterate over list and assign same column of each df to individual corpus
for(i in 1:length(y)){
y[i] <- Corpus(VectorSource(split_df[[i]]$A))
}
list2env(y, envir = .GlobalEnv)
Basically, I expect to be able to create multiple corpus objects (as many as there are data frames in the list), each with its own unique name, without having to type out the variable name and the Corpus() call manually for each of the 104 data frames in the list.
# actual result:
y[1]
> y[1]
$corpus_1
[1] "10" "11"
# expected result:
works_1 <- Corpus(VectorSource(split_df[[1]]$A))
works_1
> works_1
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 2
How can I reproduce the expected result above for 104 separate data frames in a list, each with its own name, i.e. corpus_1, corpus_2, ..., corpus_104?
Many thanks.
lapply is the way to go.
library(tm)
# create a list of corpora
all_corps <- lapply(split_df, function(x) Corpus(VectorSource(x)))
summary(all_corps)
    Length Class        Mode
df1 2      SimpleCorpus list
df2 2      SimpleCorpus list
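Note that the call above passes each whole data frame to VectorSource(), which appears to yield one document per column (that would explain Length 2 for df2, which has only one row). The expected result in the question is built from column A only, so a minimal sketch closer to it, assuming the same example data and that the names corpus_1, corpus_2, ... are wanted, might look like this:

```r
library(tm)

# Example data from the question
df <- data.frame(A = 10:12, B = c(1, 1, 2))
split_df <- split(df, f = df$B, drop = TRUE)

# Build one corpus per data frame, using column A only
all_corps <- lapply(split_df, function(x) Corpus(VectorSource(x$A)))

# Rename the list elements corpus_1, corpus_2, ...
names(all_corps) <- paste("corpus", seq_along(all_corps), sep = "_")

# Access a corpus by name without creating separate variables
all_corps$corpus_1          # same as all_corps[["corpus_1"]]

# Optional: copy every element into the global environment
# list2env(all_corps, envir = .GlobalEnv)
```

Keeping the corpora in one named list is usually easier to work with than 104 separate variables, since further steps can again use lapply. As for the original loop, the likely culprit is y[i] <- ...: single-bracket assignment treats the corpus as a list and stores only its first component (the content vector), whereas y[[i]] <- ... would store the whole object.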