I am trying to do some basic text analysis. After installing the 'tidytext' package, I tried to unnest my data frame, but I keep getting an error. I assume there is some package I am missing, but I am not sure how to figure out which. Any suggestions appreciated.
library(dplyr)
library(tidytext)
# Import data
text <- read.csv("TextSample.csv", stringsAsFactors = FALSE)
n <- nrow(text)
text_df <- tibble(line = 1:n, text = text)
text_df %>%
  unnest_tokens(word, text)
> Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE
dput:
structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
Your column text is actually a data frame nested within the data frame text_df. read.csv() returns a data frame, so text = text stores that whole data frame as a single column. You are therefore trying to apply unnest_tokens() to a data frame, but it will only work if you apply it to an atomic vector (character, integer, double, logical, etc.).
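One way to confirm this is to inspect the class of the column. The snippet below is a minimal sketch that uses a small hand-built data frame as a stand-in for the result of read.csv() (the three strings are taken from the dput output above; the stand-in itself is an assumption, since the CSV file is not available):

```r
library(tibble)

# Stand-in for read.csv("TextSample.csv"): a one-column data frame (assumed shape)
text <- data.frame(
  text = c("furloughs", "Working MORE for less pay", "None"),
  stringsAsFactors = FALSE
)

# Same construction as in the question: the whole data frame becomes one column
text_df <- tibble(line = seq_len(nrow(text)), text = text)

class(text_df$text)         # "data.frame": the column holds a data frame
is.character(text_df$text)  # FALSE
class(text_df$text$text)    # "character": the strings sit one level down
```

If the first call reports "data.frame" rather than "character", the column is nested and needs to be flattened before tokenizing.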
To fix this, you can do:
library(dplyr)
library(tidytext)
text_df <- text_df %>%
  mutate_all(as.character) %>%
  unnest_tokens(word, text)
Edit: dplyr now has the across() function, so mutate_all() would be replaced with:
text_df <- text_df %>%
  mutate(across(everything(), ~as.character(.))) %>%
  unnest_tokens(word, text)
Which gives you:
# A tibble: 186 x 2
   line  word
   <chr> <chr>
 1 1     c
 2 1     furloughs
 3 1     students
 4 1     do
 5 1     not
 6 1     have
 7 1     their
 8 1     books
 9 1     or
10 1     needed
# ... with 176 more rows
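Note that the output above starts with a stray c token and assigns words from several responses to line 1. This appears to happen because as.character() on a nested data frame deparses the whole column into a single string of the form c("furloughs", ...), which is then repeated on every row. A possible alternative, sketched here on the data from the dput output, is to pull the character column out of the data frame when building the tibble. It assumes the CSV column is named text, as the dput output suggests:

```r
library(dplyr)
library(tidytext)

# Stand-in for read.csv("TextSample.csv"), rebuilt from the dput output
text <- data.frame(
  text = c(
    "furloughs",
    "Students do not have their books or needed materials ",
    "Working MORE for less pay",
    "None",
    "Caring for an immuno-compromised spouse",
    "being a mom, school teacher, researcher and professor"
  ),
  stringsAsFactors = FALSE
)

# Extract the character vector (text$text) instead of nesting the data frame
text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

# Each word now stays attached to the line it came from
tokens <- text_df %>%
  unnest_tokens(word, text)

head(tokens)
```

With this version, line stays an integer and line 1 contains only the word furloughs, so each response can still be analysed separately.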