Remove specific rows in R

I have a dataframe where I would like to remove specific rows. I would like to remove the row where there is the "Référence" word and the 3 rows under the "référence" row. See my example here.

I think I have to use grepl function.

Thank you for your help.

Max.

Solution

You should use grep, not grepl. When you use grep you get the row indexes that match the pattern, while with grepl you get a boolean vector. You could do:

rowIndexes = grep(x = df$col1, pattern = "refer")

df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]

Example:

> df
          a   b  c   d  e
1     00100  44  5  69 fr
2     refer  34 35   7 df
3  thisalso  46 15 167 as
4   thistoo  46 15 167 as
5     00100  11  5  67 uu
6     00100 563 25  23 tt
7     00100  44  5  69 fr
8     refer  34 35   7 df
9  thisalso  46 15 167 as
10  thistoo  11  5  67 uu
11    00100 563 25  23 tt
12    00100  44  5  69 fr
13    refer  34 35   7 df
14 thisalso  46 15 167 as
15  thistoo  11  5  67 uu
16    00100 563 25  23 tt
17    00100 563 25  23 tt
18    00100 563 25  23 tt

> rowIndexes = grep(x = df$col1, pattern = "refer")
> df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]

> df

       a   b  c  d  e
1  00100  44  5 69 fr
5  00100  11  5 67 uu
6  00100 563 25 23 tt
7  00100  44  5 69 fr
11 00100 563 25 23 tt
12 00100  44  5 69 fr
16 00100 563 25 23 tt
17 00100 563 25 23 tt
18 00100 563 25 23 tt

Generalization

If you want to remove N lines after o before a set of specific lines, do:

rowIndexes = grep(x = df$col1, pattern = "refer")
N = 2
indexesToRemove = sapply(rowIndexes, function(x){ x + (0:N) })
df = df[-indexesToRemove, ]

where N is an integer. If N is positive it will remove N rows after the lines with "refer". If N is negative, this will remove N previous rows.