R - delete length-one strings and stopwords (using tidytext) in a character column

If I have a df:

   Class sentence
1   Yes  there is p beaker on the table
2   Yes  they t the frown
3   Yes  so Z it was asleep

How do I remove the length-one strings in the "sentence" column (things like "t", "p" and "Z"), and then do a final clean using the stop_words list in tidytext, to get the result below?

   Class sentence
1   Yes  beaker table
2   Yes  frown
3   Yes  asleep

Solution

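If we want to use tidytext, first create a row-index column (row_number()) so that each sentence can be reassembled later, then apply unnest_tokens() to the 'sentence' column (this also lowercases the tokens, so "Z" becomes "z"). Next, do an anti_join() with the default stop word list from get_stopwords(), filter() out the tokens that are only one character long, and finally group by the row index and paste the 'word' column back together to recreate 'sentence'.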

library(dplyr)
library(tidytext)
library(stringr)
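df %>%
   mutate(rn = row_number()) %>%                 # row id, so sentences can be rebuilt
   unnest_tokens(word, sentence) %>%             # one lowercased token per row
   anti_join(get_stopwords(), by = "word") %>%   # drop stop words
   filter(nchar(word) > 1) %>%                   # drop one-character tokens
   group_by(rn, Class) %>%
   summarise(sentence = str_c(word, collapse = ' '), .groups = 'drop') %>%   # paste tokens back
   select(-rn)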

Output

# A tibble: 3 x 2
  Class sentence    
  <chr> <chr>       
1 Yes   beaker table
2 Yes   frown       
3 Yes   asleep
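
The question asks for the stop_words dataset specifically, whereas the pipeline above uses get_stopwords(). Below is a minimal sketch of the same steps with stop_words swapped in. It assumes the data frame from the Data section (repeated here so the block runs on its own), and the name `cleaned` is only illustrative, not part of the original answer.

```r
library(dplyr)
library(tidytext)

# Same data as in the Data section, repeated so this block is self-contained
df <- data.frame(
  Class = c("Yes", "Yes", "Yes"),
  sentence = c("there is p beaker on the table",
               "they t the frown",
               "so Z it was asleep"),
  stringsAsFactors = FALSE
)

# 'cleaned' is an illustrative name (an assumption, not from the original answer)
cleaned <- df %>%
  mutate(rn = row_number()) %>%              # row id, so sentences can be rebuilt
  unnest_tokens(word, sentence) %>%          # one lowercased token per row
  anti_join(stop_words, by = "word") %>%     # stop_words: data frame shipped with tidytext
  filter(nchar(word) > 1) %>%                # drop one-character tokens
  group_by(rn, Class) %>%
  summarise(sentence = paste(word, collapse = " "), .groups = "drop") %>%
  select(-rn)

cleaned
```

stop_words combines several lexicons (SMART, snowball and onix), so it is broader than the default get_stopwords() list and may remove more words on other data. Note also that, with either list, a row whose tokens are all removed leaves nothing to group on and so drops out of the result entirely; if such rows need to be kept, they would have to be joined back in by the row id.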

Data

df <- structure(list(Class = c("Yes", "Yes", "Yes"), sentence = c("there is p beaker on the table", 
"they t the frown", "so Z it was asleep")), 
class = "data.frame", row.names = c("1", 
"2", "3"))