Tags: r, tidytext

Error in row_number() after group_by() and unnest_tokens()


I am trying to number the sentences within each book by calling mutate() with row_number() after tokenizing inside a group_by() pipeline, but I get this error:

Error: Can't recycle input of size 73422 to size 37055.
Run rlang::last_error() to see where the error occurred.

library(tidyverse)
library(tidytext)
library(janeaustenr)

all_sentences <- austen_books() %>%
  group_by(book) %>%
  unnest_tokens(sentence, text, token = "sentences") %>%
  mutate(s_number = row_number()) %>%   # the "Can't recycle" error is raised here
  ungroup()

After ungrouping and regrouping, it works:

all_sentences <- austen_books() %>%
  group_by(book) %>%
  unnest_tokens(sentence, text, token = "sentences") %>%
  ungroup() %>%
  group_by(book) %>%
  mutate(s_number = row_number()) %>%
  ungroup()

But this seems awkward. Is there a cleaner way to do it?


Solution

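  • Move group_by() so that it comes after the unnest_tokens() call. The grouping you set before unnest_tokens() describes the original line-level rows; once the text is split into sentences the row count changes, so that grouping no longer lines up with the rows mutate() sees. Grouping after tokenizing avoids the mismatch: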

    all_sentences <- austen_books() %>%
      unnest_tokens(sentence, text, token = "sentences") %>%
      group_by(book) %>%
      mutate(s_number = row_number()) %>%
      ungroup()
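If you are on dplyr 1.1.0 or later, the per-operation `.by` argument is another option: it groups only for that one mutate() call and returns an ungrouped result, so there is no group_by()/ungroup() pair to keep in order. A minimal sketch, assuming dplyr >= 1.1.0 plus tidytext and janeaustenr are installed:

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

all_sentences <- austen_books() %>%
  # tokenize the ungrouped data, so no grouping metadata is carried through
  unnest_tokens(sentence, text, token = "sentences") %>%
  # .by (dplyr >= 1.1.0) groups by book for this mutate() only;
  # row_number() restarts at 1 within each book and the result is ungrouped
  mutate(s_number = row_number(), .by = book)
```

This gives the same per-book numbering as the group_by()/ungroup() version above; which one you use is mostly a matter of taste and dplyr version.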