tidytext error (Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE)


I am trying to do some basic text analysis. After installing the 'tidytext' package, I tried to unnest my data frame, but I keep getting an error. I assume there is some package I am missing, but I am not sure how to figure out which. Any suggestions appreciated.

library(dplyr)
library(tidytext)

# Import data
text <- read.csv("TextSample.csv", stringsAsFactors = FALSE)

n <- nrow(text)

text_df <- tibble(line = 1:n, text = text)

text_df %>%
  unnest_tokens(word, text)

> Error in is_corpus_df(corpus) : ncol(corpus) >= 2 is not TRUE

dput:

structure(list(line = 1:6, text = structure(list(text = c("furloughs", "Students do not have their books or needed materials ", "Working MORE for less pay", "None", "Caring for an immuno-compromised spouse", "being a mom, school teacher, researcher and professor" )), class = "data.frame", row.names = c(NA, -6L))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • The error is not caused by a missing package. Your column text is itself a data frame nested inside text_df: read.csv() returns a data frame, and that whole object was passed to tibble() as the text column. unnest_tokens() cannot tokenize a data-frame column; it only works on an atomic vector (normally a character vector).
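
    A quick way to confirm this kind of nesting is to look at the class of each column. The sketch below rebuilds a smaller version of the structure from the dput output (only three of the six responses, as an illustration) and checks which columns are data frames:

    ```r
    library(tibble)

    # Illustrative subset of the responses shown in the dput output
    responses <- data.frame(
      text = c("furloughs", "Working MORE for less pay", "None"),
      stringsAsFactors = FALSE
    )

    # Passing the whole data frame reproduces the nested column
    text_df <- tibble(line = 1:3, text = responses)

    # First class of each column; a nested column reports "data.frame"
    col_classes <- sapply(text_df, function(col) class(col)[1])
    print(col_classes)

    # TRUE for every column that is itself a data frame
    is_nested <- sapply(text_df, is.data.frame)
    print(is_nested)
    ```

    Any column flagged here has to be turned into a plain vector before it is passed to unnest_tokens().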

    To fix this, you can do:

    library(dplyr)
    library(tidytext)
    
    text_df <- text_df %>% 
      mutate_all(as.character) %>% 
      unnest_tokens(word, text)
    

    Edit:

    mutate_all() has since been superseded by across() in dplyr, so the equivalent call is now:

    text_df <- text_df %>% 
      mutate(across(everything(), ~as.character(.))) %>% 
      unnest_tokens(word, text)
    

    Which gives you:

    # A tibble: 186 x 2
       line  word     
       <chr> <chr>    
     1 1     c        
     2 1     furloughs
     3 1     students 
     4 1     do       
     5 1     not      
     6 1     have     
     7 1     their    
     8 1     books    
     9 1     or       
    10 1     needed   
    # ... with 176 more rows
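
    One thing to watch in the output above: it has 186 rows and the first token is c. That suggests as.character() deparsed the whole nested data frame into a single string of the form c("furloughs", ...) and repeated it on every row, so every line carries the words of all six responses. If the aim is one response per line, a minimal sketch of an alternative is to pass the character column itself to tibble() instead of the whole data frame. The inline data frame below is a stand-in for the CSV (an assumption, using three of the responses from the dput output); the column name text is taken from that output.

    ```r
    library(dplyr)
    library(tidytext)

    # Stand-in for read.csv("TextSample.csv", stringsAsFactors = FALSE)
    text <- data.frame(
      text = c("furloughs", "Working MORE for less pay", "None"),
      stringsAsFactors = FALSE
    )

    # Use the character vector text$text, not the data frame itself
    text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

    # One row per word, each keeping the line of the response it came from
    tokens <- text_df %>%
      unnest_tokens(word, text)

    print(tokens)
    ```

    With this approach line stays an integer and no column needs to be coerced with as.character().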