R Tidytext unnest_tokens error when using a txt file as source


I am very new to this topic and am having trouble with the unnest_tokens function in the tidytext package. I have some texts stored in .txt format that I want to analyze.

For example, put the following lines in a .txt file and then read the file into R:

Emily Dickinson wrote some lovely text in her time.

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

Below is my code:

library(dplyr)
library(tidytext)
library(readr)  # read_file() comes from readr, not readtext
my_data <- read_file("exp.txt")
my_data_tibble <- tibble(text = my_data)
my_data_tibble %>% 
  unnest_tokens(word, my_data)

I then get the error message below:

Error in check_input(x) : 
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

Does anyone have a solution to my problem? Thank you in advance!


Solution

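  • The first argument to unnest_tokens() after the data is the name of the output column you want to create, and the second is the name of the input column that holds the text. Your tibble was built with tibble(text = my_data), so its only column is called text; it has no column named my_data. Pass text as the input: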

    library(dplyr)     # provides %>% and tibble()
    library(tidytext)
    
    my_data_tibble %>% unnest_tokens(word, text)
    
    # A tibble: 20 x 1
    #   word       
    #   <chr>      
    # 1 because    
    # 2 i          
    # 3 could      
    # 4 not        
    # 5 stop       
    # 6 for        
    # 7 death      
    # 8 he         
    #...
    #....
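
For a version that runs end to end, here is a minimal sketch. It assumes the file contains only the four lines of the poem, uses a temporary file as a stand-in for exp.txt, and loads readr for read_file(); those choices are assumptions for illustration, not part of the original question.

```r
library(dplyr)
library(readr)
library(tidytext)

# Hypothetical stand-in for exp.txt: write the four lines to a temp file
path <- tempfile(fileext = ".txt")
writeLines(c("Because I could not stop for Death -",
             "He kindly stopped for me -",
             "The Carriage held but just Ourselves -",
             "and Immortality"),
           path)

# read_file() returns the whole file as a single string
my_data <- read_file(path)

# The tibble has exactly one column, named `text`
my_data_tibble <- tibble(text = my_data)

# Name the arguments so the order cannot be confused:
# `output` is the new column to create, `input` is the existing column
words <- my_data_tibble %>%
  unnest_tokens(output = word, input = text)

print(words)
```

Spelling out output = and input = is optional, but it makes it obvious that the second name must be a column that already exists in the tibble.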