I want to search df$text for several keywords; if any of them appears in a tweet, I want to put the whole row into a new data frame. The problem is that I searched for the keywords "pat", "ppp", "jui", and "jip", but the dataset I got back includes tweets where the keyword appears only in the user's screen name, not in the tweet text. I want to remove the tweets that do not contain a keyword. The data frame looks like this:
screen_name | text
1| pat_bing | RT @timkaine: 22 school shootings in 2018. 3 in the last week. How many times must our hearts break hearing news like this - this time in…
2| artguroo | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
3| ppp_007 | RT @atDavidHoffman: Before today’s shooting in Santa Fe, Texas, no one was talking about the NRA & gun control anymore. Except the Parkland…
4| jip_1 | RT @TravisAllen02: What do Republicans care more about?
5| esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
The desired data frame should look like this:
screen_name | text
2| artguroo | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5| esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti
I've finished extracting the tweets; I just want to clean up this mess. Help!
You can get this with grep and a regular expression. Since you include row 2, I assume that you want to ignore case.
grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE)
[1] 2 5
dat[grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE), ]
  screen_name
2    artguroo
5    esha_jip
  text
2 RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…
5 I want jip to become the best party ever #jip #ppp #anp #pmln #pti
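One caveat with a bare pattern like "pat" is that it also matches inside longer words such as "pattern" or "participate". If that matters for your data, a possible variation is to build the pattern from a keyword vector and wrap it in word boundaries. The sketch below is only an illustration: the names keywords, pattern, keep, and clean are my own, the tweet texts are shortened copies of your example, and perl = TRUE is my choice for the regex engine.

```r
# Example data mirroring the question (tweet texts shortened for brevity).
dat <- data.frame(
  screen_name = c("pat_bing", "artguroo", "ppp_007", "jip_1", "esha_jip"),
  text = c(
    "RT @timkaine: 22 school shootings in 2018.",
    "RT @RabiaBaluch: PAT/Minhaj-ul-Quran leader abusing and threatening",
    "RT @atDavidHoffman: no one was talking about the NRA & gun control",
    "RT @TravisAllen02: What do Republicans care more about?",
    "I want jip to become the best party ever #jip #ppp #anp #pmln #pti"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical keyword vector; edit it to suit your search.
keywords <- c("pat", "ppp", "jui", "jip")

# \\b is a word boundary, so "pat" matches "PAT/" but not "pattern".
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# grepl() returns one TRUE/FALSE per row; only the text column is searched,
# so a keyword that appears only in screen_name does not count.
keep <- grepl(pattern, dat$text, ignore.case = TRUE, perl = TRUE)
clean <- dat[keep, ]
```

Here clean holds the rows for artguroo and esha_jip. If you do want substring matches (for example "jip" inside a longer hashtag), drop the \\b parts and you are back to the plain alternation above.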