I have a data frame with some tweets, and I want to extract the hashtags from them using the unnest_tokens() function from the tidytext package, producing a tokenized data frame with one row per hashtag.
My data has only 3 columns:
otros_numerales_numeral_petro <- Numeral_Petro_sin_emojis %>%
  unnest_tokens(output = "hashtag", input = "Texto", token = "tweets") %>%
  filter(str_starts(hashtag, "#"))
But when I run the code, I get this error:
Error:
! Support for token = "tweets" was deprecated in tidytext 0.4.0 and is now defunct.
Can someone help me fix this, please?
Yep, the token = "tweets" option was deprecated at the end of last year because of changes in upstream dependencies. It sounds like you don't really want to tokenize the text, but rather extract all the hashtags. I would do this:
library(tidyverse)
library(rtweet)

# Fetch a few example tweets to work with
bunny_tweets <-
  search_tweets("#rabbits", n = 20, include_rts = FALSE) %>%
  filter(!possibly_sensitive, lang == "en")

# Pull every "#..." run out of the text, then expand to one row per hashtag
bunny_tweets %>%
  mutate(hashtags = str_extract_all(full_text, "#\\S+")) %>%
  unnest(hashtags) %>%
  select(id, hashtags, full_text)
#> # A tibble: 142 × 3
#>         id hashtags          full_text
#>      <dbl> <chr>             <chr>
#>  1 1.64e18 #Animate          "This awesome comic deserves more attention!\n \n#…
#>  2 1.64e18 #Doujinshi        "This awesome comic deserves more attention!\n \n#…
#>  3 1.64e18 #rabbits          "This awesome comic deserves more attention!\n \n#…
#>  4 1.64e18 #april            "New baby bunny spotted! #april #rabbits\nBlack ba…
#>  5 1.64e18 #rabbits          "New baby bunny spotted! #april #rabbits\nBlack ba…
#>  6 1.64e18 #LFDIE            "Trust me! You'll get addicted to this story!\n \n…
#>  7 1.64e18 #rabbits          "Trust me! You'll get addicted to this story!\n \n…
#>  8 1.64e18 #huacheng         "Trust me! You'll get addicted to this story!\n \n…
#>  9 1.64e18 #digitalanimation "I've been completely addicted to ONEPIECE and Mar…
#> 10 1.64e18 #rabbits          "I've been completely addicted to ONEPIECE and Mar…
#> # … with 132 more rows
Created on 2023-04-01 with reprex v2.0.2
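To apply the same idea to a data frame shaped like the one in the question, something along these lines should work. This is only a sketch: the `tweets` tibble, its `id` column, and its rows are invented stand-ins for Numeral_Petro_sin_emojis, and the only name taken from the question is the Texto column.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical stand-in for Numeral_Petro_sin_emojis; only the Texto column
# name comes from the question, the rows are made up for illustration.
tweets <- tibble(
  id = 1:3,
  Texto = c(
    "Hoy en #Bogota con #Petro",
    "Sin etiquetas aqui",
    "#Colombia decide"
  )
)

hashtags <- tweets %>%
  # str_extract_all() returns a list column: one character vector per tweet
  mutate(hashtag = str_extract_all(Texto, "#\\S+")) %>%
  # unnest() expands that list column to one row per hashtag
  unnest(hashtag) %>%
  select(id, hashtag)

hashtags
```

Tweets without any hashtag drop out of the result, because unnest() discards empty list elements unless keep_empty = TRUE is set. Also note that "#\\S+" grabs everything up to the next whitespace, so trailing punctuation such as "#Petro," is kept; a stricter pattern like "#\\w+" may suit your data better.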