r dataframe search text words

Search dataframe$text for several keywords and keep only the rows where they are found


I want to search df$text for several words and, if any of them is present in a tweet, place the whole row in a new dataframe. The problem is that I searched for the keywords "pat", "ppp", "jui" and "jip", but the dataset I got back also contains rows where the keyword appears only in the user's screen name and not in the tweet. I want to remove the tweets that do not contain a keyword. The dataframe looks like this:

     screen_name  |   text
1|   pat_bing     | RT @timkaine: 22 school shootings in 2018. 3 in the last week. How many times must our hearts break hearing news like this - this time in…

2|   artguroo     | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…

3|   ppp_007      | RT @atDavidHoffman: Before today’s shooting in Santa Fe, Texas, no one was talking about the NRA & gun control anymore. Except the Parkland…

4|   jip_1        | RT @TravisAllen02: What do Republicans care more about?

5|   esha_jip     | I want jip to become the best party ever #jip #ppp #anp #pmln #pti

The desired df should look like this:

  screen_name  |   text

2|   artguroo | RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir’s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl…

5|   esha_jip | I want jip to become the best party ever #jip #ppp #anp #pmln #pti

I'm done extracting the tweets; I just want to clean up this mess. Help!


Solution

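  • You can get this with grep and a regular expression applied to dat$text only, so that matches in screen_name are not counted. Since you include row 2, where the keyword appears as upper-case "PAT", I assume that you want to ignore case.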

    grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE)
    [1] 2 5
    dat[grep("pat|ppp|jui|jip", dat$text, ignore.case = TRUE), ]
      screen_name
    2    artguroo
    5    esha_jip
                                                                                                                                                            text
    2 RT @RabiaBaluch: Khurram Nawaz Gandapur (Dr Tahir Qadir<U+0092>s right-hand man and PAT/Minhaj-ul-Quran leader) abusing and threatening young girl<U+0085>
    5                                                                                         I want jip to become the best party ever #jip #ppp #anp #pmln #pti
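
A plain "pat|ppp|jui|jip" pattern also matches inside longer words (for example "pattern"). If whole-word matches are what is wanted, one possible variant is sketched below. It is not part of the original answer: the sample data is a shortened, hypothetical copy of the question's table, and the names keywords, pattern, keep and cleaned are made up for illustration. It builds the pattern from a vector, wraps it in \b word boundaries, and uses grepl to get a logical index.

```r
# Hypothetical sample data mirroring the question (tweet text shortened).
dat <- data.frame(
  screen_name = c("pat_bing", "artguroo", "ppp_007", "jip_1", "esha_jip"),
  text = c(
    "RT @timkaine: 22 school shootings in 2018. 3 in the last week.",
    "RT @RabiaBaluch: Khurram Nawaz Gandapur (PAT/Minhaj-ul-Quran leader)",
    "RT @atDavidHoffman: Before today's shooting in Santa Fe, Texas",
    "RT @TravisAllen02: What do Republicans care more about?",
    "I want jip to become the best party ever #jip #ppp #anp #pmln #pti"
  ),
  stringsAsFactors = FALSE
)

# Keywords to look for (taken from the question).
keywords <- c("pat", "ppp", "jui", "jip")

# \\b is a word boundary: "pat" matches "PAT/Minhaj" but not "pattern".
pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# grepl() returns one TRUE/FALSE per row; only the text column is searched,
# so keywords that occur only in screen_name do not count.
keep <- grepl(pattern, dat$text, ignore.case = TRUE, perl = TRUE)

# Rows whose tweet text contains at least one keyword.
cleaned <- dat[keep, ]
print(cleaned$screen_name)
```

Keeping the keywords in a vector makes the list easy to extend, and the logical vector from grepl can be negated with !keep to inspect the rows that were dropped.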