I pulled tweets from Twitter using the academictwitteR package. I would now like to remove all retweets, i.e. tweets whose text starts with "RT", based on the first column "text" (e.g. the third row). You can download a similar data frame from GitHub containing tweets from Trump: https://github.com/cbail/cbail.github.io/blob/master/Trump_Tweets.Rdata
Unlike that example, my data frame has no column called "is_retweet", which makes it more difficult.
The output from my data frame looks like this (I have removed some redundant columns to make it clearer):
Thank you in advance for any suggestions.
You can use a regular expression to keep only the rows whose text does not start with 'RT'. If your data is in a data frame called tweets, maybe something like this?
tweets[grepl("^(?!RT)", tweets$text, perl = TRUE),]
Or if you're using tidyverse:
tweets %>%
  filter(grepl("^(?!RT)", text, perl = TRUE))
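To check the idea on something small, here is a minimal base R sketch on a made-up data frame. The toy rows, the stricter pattern "^RT @", and the helper column is_retweet are my own assumptions, not part of your data. A plain "^RT" would also drop tweets that merely begin with those two letters (e.g. "RTFM ..."), so matching "RT @" may be safer if your retweets follow the usual "RT @user:" form.

```r
# Toy data frame standing in for the real tweets (contents are made up)
tweets <- data.frame(
  text = c("Original tweet about R",
           "RT @someone: a retweeted message",
           "RTFM is good advice",
           NA),
  stringsAsFactors = FALSE
)

# Flag retweets: text that begins with "RT @".
# grepl() returns FALSE for NA, so rows with missing text are not flagged.
tweets$is_retweet <- grepl("^RT @", tweets$text)

# Keep only the rows that are not flagged as retweets
originals <- tweets[!tweets$is_retweet, , drop = FALSE]
```

Note that this version keeps rows whose text is NA, whereas the negative-lookahead version above drops them, because grepl() returns FALSE for NA in both cases.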