I have a dataframe that looks like so:
ID | Tweet_ID | Tweet
1 | 12345 | @sprintcare I did.
2 | SPRINT | @12345 Please send us a Private Message.
3 | 45678 | @apple My information is incorrect.
4 | APPLE | @45678 What information is incorrect.
What I would like to do is use a case_when (or similar) statement to create a new field that holds the company handle from each tweet, ignoring the purely numeric handles.
This is the code I've been trying, without success:
tweet_pattern <- " @[^0-9.-]\\w+"
Customer <- Customer %>%
  mutate(Response_To_Comp = ifelse(str_detect(Tweet, tweet_pattern),
                                   str_extract(Tweet, tweet_pattern),
                                   NA_character_))
Desired output:
ID | Tweet_ID | Tweet | Response_To_Comp
1 | 12345 | @sprintcare I did. | sprintcare
2 | SPRINT | @12345 Please send us a Private Message. | NA
3 | 45678 | @apple My information is incorrect. | apple
4 | APPLE | @45678 What information is incorrect. | NA
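You can use a lookbehind regex to extract the text that comes immediately after '@' and consists of one or more A-Za-z characters. Because str_extract returns NA when nothing matches, the ifelse/str_detect wrapper is not needed.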
library(dplyr)
library(stringr)
tweet_pattern <- "(?<=@)[A-Za-z]+"
df %>% mutate(Response_To_Comp = str_extract(Tweet, tweet_pattern))
# ID Tweet_ID Tweet Response_To_Comp
#1 1 12345 @sprintcare I did. sprintcare
#2 2 SPRINT @12345 Please send us a Private Message. <NA>
#3 3 45678 @apple My information is incorrect. apple
#4 4 APPLE @45678 What information is incorrect. <NA>
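For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example data and runs both patterns. The data frame is retyped from the question. The second pattern, handle_pattern, is not part of the original answer. It is an assumed variant for the case where company handles may contain digits or underscores (the handle @sprint_care2 below is made up for illustration). It skips a handle only when it is made up entirely of digits.

```r
library(dplyr)
library(stringr)

# Example data retyped from the question
df <- data.frame(
  ID = 1:4,
  Tweet_ID = c("12345", "SPRINT", "45678", "APPLE"),
  Tweet = c(
    "@sprintcare I did.",
    "@12345 Please send us a Private Message.",
    "@apple My information is incorrect.",
    "@45678 What information is incorrect."
  ),
  stringsAsFactors = FALSE
)

# The question's pattern starts with a literal space, so it cannot match
# a handle at the very start of a tweet (which is where all of these are).
old_pattern <- " @[^0-9.-]\\w+"
old_matches <- str_detect(df$Tweet, old_pattern)

# The answer's pattern: letters directly preceded by '@'.
tweet_pattern <- "(?<=@)[A-Za-z]+"
result <- df %>%
  mutate(Response_To_Comp = str_extract(Tweet, tweet_pattern))

# Assumed variant (not from the answer): allow digits and underscores in
# the handle, but reject handles that consist only of digits.
handle_pattern <- "(?<=@)(?!\\d+\\b)\\w+"
result_variant <- df %>%
  mutate(Response_To_Comp = str_extract(Tweet, handle_pattern))

# Hypothetical handle with an underscore and a digit
letters_only <- str_extract("@sprint_care2 thanks", tweet_pattern)
full_handle  <- str_extract("@sprint_care2 thanks", handle_pattern)
```

On the four example tweets both patterns give the same column. They differ only for handles like the hypothetical @sprint_care2, where the letters-only pattern stops at the underscore and the variant keeps the whole handle.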