Tags: r, nlp, text-mining, tidytext

R: unnest_tokens() not working with a particular file


I am trying to run unnest_tokens() on the essay4 column of this dataset:

https://github.com/rudeboybert/JSE_OkCupid/blob/master/profiles.csv.zip

I have tried both unnest_tokens() and unnest_tokens_(). Based on an answer to a similar question that worked for somebody else, I also ran dput(as_tibble()) on profiles.csv to try to get it working, but I always get one of two errors.

When I run this:

tidy_essays <- dput_tbl_profiles %>%
   unnest_tokens(word, dput_tbl_profiles$essay4)

I get this error:

Error in check_input(x) : 
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

When I run this:

tidy_essays <- dput_tbl_profiles %>%
   unnest_tokens_(word, dput_tbl_profiles$essay4)

I get this error:

Error: Can't convert a closure to a quosure

I have also tried running the same operations on a version of profiles.csv that hasn't had dput(as_tibble()) run on it.

I can't figure out what to do here. It seems that other people have had trouble with this function because they weren't passing a character vector to it (passing a list instead, for example), or because they forgot to set stringsAsFactors = FALSE when reading in the data. I've made sure to do the latter.
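Both of those suspected causes come down to the type of the essay4 column, so it can be worth confirming the type directly before tokenizing. A minimal sketch, using a small hypothetical data frame in place of profiles.csv and deliberately creating a factor column to show the check and the fix (since R 4.0.0, read.csv() and data.frame() already default to stringsAsFactors = FALSE):

```r
# Hypothetical two-row stand-in for profiles.csv (not the real data)
profiles <- data.frame(
  essay4 = c("Books and movies", "Music, mostly jazz"),
  stringsAsFactors = TRUE  # deliberately reproduce the factor case
)

class(profiles$essay4)
# [1] "factor"

# Convert the column to character before passing it to a tokenizer
profiles$essay4 <- as.character(profiles$essay4)
class(profiles$essay4)
# [1] "character"
```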

Any advice on how to proceed? I wish I could link the data directly instead of a zip file, but the zipped file is a third of the size. Also, it's not my GitHub account, so I don't get to decide how the data is stored.

Anyway, thank you in advance for any insight.


Solution

  • We only need to specify the unquoted column name, i.e. essay4 rather than dput_tbl_profiles$essay4. Neither dput() nor unnest_tokens_() is needed:

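    library(dplyr)
    library(tidytext)
    # Read the unzipped file, keeping the essays as character rather than factor
    df1 <- read.csv("profiles.csv", stringsAsFactors = FALSE)
    # Inside the pipe, refer to essay4 by its bare column name
    df1 %>%
      unnest_tokens(word, essay4)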
    # age      body_type              diet     drinks     drugs                         education
    #1       22 a little extra strictly anything   socially     never     working on college/university
    #1.1     22 a little extra strictly anything   socially     never     working on college/university
    #1.2     22 a little extra strictly anything   socially     never     working on college/university
    #1.3     22 a little extra strictly anything   socially     never     working on college/university
    #1.4     22 a little extra strictly anything   socially     never     working on college/university
    # ...
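For a self-contained check that does not need the full OkCupid file, the same call can be tried on a small hypothetical data frame. The sketch assumes a recent tidytext, where unnest_tokens() captures its column arguments with tidy evaluation. It also shows one way to pass the column name as a string instead of reaching for unnest_tokens_(): convert it with rlang::sym() and unquote it with !!. The variable col_name is hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for profiles.csv
profiles <- data.frame(
  id = 1:2,
  essay4 = c("I like hiking.", "Books and movies"),
  stringsAsFactors = FALSE
)

# Refer to the column by its bare name; by default the tokens are
# lowercased, punctuation is stripped, and the essay4 column is dropped
tidy_essays <- profiles %>%
  unnest_tokens(word, essay4)

tidy_essays$word
# [1] "i"      "like"   "hiking" "books"  "and"    "movies"

# If the column name is only available as a string, turn it into a
# symbol and unquote it with !! (col_name is a hypothetical variable)
col_name <- "essay4"
tidy_essays2 <- profiles %>%
  unnest_tokens(word, !!rlang::sym(col_name))

identical(tidy_essays, tidy_essays2)
# [1] TRUE
```

Each token row keeps the other columns of its source row (here id), which is why the output in the answer repeats age, body_type, and so on for every word.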