r, function, tidyverse, tidytext, unnest

Tidyverse unnest_tokens does not work inside function


I have an unnest_tokens call that works at the top level of my script, but as soon as I wrap it in a function it fails. I don't understand why moving it into a function changes the behavior.

data:

id          words

1           why is this function not working
2           more text
3           help me
4           thank you
5           in advance
6           xx xx

I have checked that the data was created with stringsAsFactors = FALSE and that both columns are vectors:

is.vector(data$words)
[1] TRUE
is.vector(data$id)
[1] TRUE
typeof(data$words)
[1] "character"
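
For reproducibility, the data above can be rebuilt roughly as follows. This is only a sketch: the integer type of id is an assumption taken from the printed output further down, and data.frame() is one of several ways to construct it.

```r
# Rebuild the sample data shown above (id type assumed to be integer).
data <- data.frame(
  id = 1:6,
  words = c("why is this function not working", "more text", "help me",
            "thank you", "in advance", "xx xx"),
  stringsAsFactors = FALSE  # keep words as character, not factor
)

# Same checks as in the question.
is.vector(data$words)
typeof(data$words)
```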

Here is the code outside of the function which gives the correct output:

df <- data %>%
  unnest_tokens(word, words) %>%
  group_by(id)

1 why
1 is
1 this
1 function
1 not
1 working
2 more
2 text
3 help
3 me
4 thank
4 you
5 in
5 advance
6 xx
6 xx

Once I put the code in a function I get an error.

tidy_x <- unnestDF(data, "words", "id")

unnestDF <- function(df, col, groupbyCol) {
  x <- df %>%
    unnest_tokens(word, df[col])%>%
    group_by(df[groupbyCol])
  return(x)
}

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

Thank you in advance.


Solution

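  • The call fails because df[col] is a one-column data frame, whereas unnest_tokens expects a column name. Since the column names are passed as strings, one option is to convert the string to a symbol with rlang::sym() and unquote it with !! inside unnest_tokens. For grouping, use group_by_at instead of group_by, because it accepts column names as strings.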

    unnestDF <- function(df, col, groupbyCol) {
      df %>%
        unnest_tokens(word, !! rlang::sym(col)) %>%
        group_by_at(groupbyCol)
    }
    
    unnestDF(data, "words", "id")
    # A tibble: 16 x 2
    # Groups:   id [6]
    #      id word    
    # * <int> <chr>   
    # 1     1 why     
    # 2     1 is      
    # 3     1 this    
    # 4     1 function
    # 5     1 not     
    # 6     1 working 
    # 7     2 more    
    # 8     2 text    
    # 9     3 help    
    #10     3 me      
    #11     4 thank   
    #12     4 you     
    #13     5 in      
    #14     5 advance 
    #15     6 xx      
    #16     6 xx
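
In dplyr 1.0.0 and later, group_by_at is superseded in favor of across(). A minimal sketch of the same idea under that assumption is below. The name unnestDF2 is hypothetical and used only to avoid clashing with the function above, and the sample data is rebuilt from the question so the block runs on its own.

```r
library(dplyr)     # assumes dplyr >= 1.0.0 for across()
library(tidytext)

# Sample data mirroring the question (id type assumed to be integer).
data <- data.frame(
  id = 1:6,
  words = c("why is this function not working", "more text", "help me",
            "thank you", "in advance", "xx xx"),
  stringsAsFactors = FALSE
)

# Hypothetical variant of unnestDF: same tokenizing step, but the
# grouping column is selected with across(all_of(...)), which takes
# column names as a character vector.
unnestDF2 <- function(df, col, groupbyCol) {
  df %>%
    unnest_tokens(word, !!rlang::sym(col)) %>%
    group_by(across(all_of(groupbyCol)))
}

tidy_x <- unnestDF2(data, "words", "id")
```

Because all_of() accepts a character vector, this variant should also work when groupbyCol names more than one column.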