
Remove specific rows in R


I have a data frame from which I would like to remove specific rows: every row that contains the word "Référence", plus the 3 rows below it. See my example here.

I think I have to use the grepl function.

Thank you for your help.

Max.


Solution

  • Use grep rather than grepl. grep returns the indices of the rows that match the pattern, whereas grepl returns a logical vector, and here you need indices so that you can offset them. You could do:

    rowIndexes = grep(x = df$col1, pattern = "refer")
    
    df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]
    

    Example:

    > df
              a   b  c   d  e
    1     00100  44  5  69 fr
    2     refer  34 35   7 df
    3  thisalso  46 15 167 as
    4   thistoo  46 15 167 as
    5     00100  11  5  67 uu
    6     00100 563 25  23 tt
    7     00100  44  5  69 fr
    8     refer  34 35   7 df
    9  thisalso  46 15 167 as
    10  thistoo  11  5  67 uu
    11    00100 563 25  23 tt
    12    00100  44  5  69 fr
    13    refer  34 35   7 df
    14 thisalso  46 15 167 as
    15  thistoo  11  5  67 uu
    16    00100 563 25  23 tt
    17    00100 563 25  23 tt
    18    00100 563 25  23 tt
    
    > rowIndexes = grep(x = df$a, pattern = "refer")
    > df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]
    
    > df
    
           a   b  c  d  e
    1  00100  44  5 69 fr
    5  00100  11  5 67 uu
    6  00100 563 25 23 tt
    7  00100  44  5 69 fr
    11 00100 563 25 23 tt
    12 00100  44  5 69 fr
    16 00100 563 25 23 tt
    17 00100 563 25 23 tt
    18 00100 563 25 23 tt
    

    Generalization

    If you want to remove N rows after or before each matching row, do:

    rowIndexes = grep(x = df$col1, pattern = "refer")
    N = 2
    indexesToRemove = sapply(rowIndexes, function(x){ x + (0:N) })
    df = df[-indexesToRemove, ]
    

    where N is an integer. If N is positive, this removes each "refer" row and the N rows after it. If N is negative, it removes each "refer" row and the |N| rows before it.
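
The generalization above can be wrapped in a small function that also covers two edge cases the one-liners do not: a pattern with no match (in R, `df[-integer(0), ]` returns zero rows rather than the whole table) and offsets that fall outside the table. This is only a sketch under assumptions: `remove_with_following` and the toy data frame are hypothetical names and data invented for illustration, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): drop every row whose
# `col` value matches `pattern`, plus the n rows after it (or before it,
# when n is negative). Extra arguments in `...` are passed on to grep().
remove_with_following <- function(df, col, pattern, n = 2, ...) {
  hits <- grep(pattern, df[[col]], ...)      # integer positions of matching rows
  if (length(hits) == 0) return(df)          # no match: leave the table unchanged
  drop <- unique(unlist(lapply(hits, function(i) i + 0:n)))
  drop <- drop[drop >= 1 & drop <= nrow(df)] # discard positions outside the table
  df[-drop, , drop = FALSE]
}

# Small made-up data frame in the spirit of the example above
df <- data.frame(
  a = c("00100", "refer", "thisalso", "thistoo", "00100", "refer", "thisalso"),
  b = c(44, 34, 46, 46, 11, 34, 46),
  stringsAsFactors = FALSE
)

cleaned <- remove_with_following(df, "a", "refer", n = 2)
print(cleaned)

# grepl gives a logical vector; which() turns it into the same positions as grep
same_positions <- identical(grep("refer", df$a), which(grepl("refer", df$a)))
print(same_positions)
```

For the case in the question (the matching row plus the 3 rows below it), the call would presumably use `n = 3` and the real column name, with `ignore.case = TRUE` passed through to `grep` if the capitalization of "Référence" varies.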