I have a data frame from which I would like to remove specific rows: every row that contains the word "Référence", plus the 3 rows below each of those rows. See my example here.
I think I have to use the grepl function.
Thank you for your help.
Max.
You should use grep, not grepl. When you use grep you get the row indexes that match the pattern, while with grepl you get a boolean vector. You could do:
rowIndexes = grep(x = df$col1, pattern = "refer")
df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]
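To make the difference between the two functions concrete, here is a small sketch on a made-up character vector (the vector x is only an assumption for illustration, not the asker's data):

```r
# Hypothetical vector, for illustration only
x <- c("00100", "refer", "thisalso", "thistoo", "00100")

idx   <- grep("refer", x)    # integer positions of the matching elements
flags <- grepl("refer", x)   # logical vector, one element per input element

print(idx)
print(flags)
print(which(flags))          # which() turns the logical vector back into positions
```

Because row arithmetic such as rowIndexes+1 needs positions, grep is the more direct choice here; grepl would work too if its result is wrapped in which().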
Example:
> df
a b c d e
1 00100 44 5 69 fr
2 refer 34 35 7 df
3 thisalso 46 15 167 as
4 thistoo 46 15 167 as
5 00100 11 5 67 uu
6 00100 563 25 23 tt
7 00100 44 5 69 fr
8 refer 34 35 7 df
9 thisalso 46 15 167 as
10 thistoo 11 5 67 uu
11 00100 563 25 23 tt
12 00100 44 5 69 fr
13 refer 34 35 7 df
14 thisalso 46 15 167 as
15 thistoo 11 5 67 uu
16 00100 563 25 23 tt
17 00100 563 25 23 tt
18 00100 563 25 23 tt
> rowIndexes = grep(x = df$a, pattern = "refer")
> df = df[-c(rowIndexes, rowIndexes+1, rowIndexes+2),]
> df
a b c d e
1 00100 44 5 69 fr
5 00100 11 5 67 uu
6 00100 563 25 23 tt
7 00100 44 5 69 fr
11 00100 563 25 23 tt
12 00100 44 5 69 fr
16 00100 563 25 23 tt
17 00100 563 25 23 tt
18 00100 563 25 23 tt
If you want to remove N lines after or before a set of specific lines, do:
rowIndexes = grep(x = df$col1, pattern = "refer")
N = 2
indexesToRemove = sapply(rowIndexes, function(x){ x + (0:N) })
df = df[-indexesToRemove, ]
where N is an integer. If N is positive, this removes each row containing "refer" and the N rows after it. If N is negative, it removes each matching row and the |N| rows before it.
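The snippet above has a few edge cases worth guarding against: if grep finds no match, df[-integer(0), ] returns a data frame with zero rows, and offsets that fall before the first row would mix positive and negative indexes. A minimal sketch of a more defensive version follows; the function name remove_with_neighbours and the data frame example_df are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: drop every row whose `column` matches `pattern`,
# together with N neighbouring rows (after if N > 0, before if N < 0).
remove_with_neighbours <- function(df, column, pattern, N = 2) {
  hits <- grep(pattern, df[[column]])
  # No match: return df unchanged, since df[-integer(0), ] would drop every row
  if (length(hits) == 0) return(df)
  # All positions to drop: each hit plus offsets 0..N (0:N counts down when N < 0)
  to_remove <- unique(as.vector(outer(hits, 0:N, "+")))
  # Discard positions that fall outside the data frame
  to_remove <- to_remove[to_remove >= 1 & to_remove <= nrow(df)]
  df[-to_remove, , drop = FALSE]
}

# Made-up data for illustration
example_df <- data.frame(
  a = c("00100", "refer", "thisalso", "thistoo", "00100", "refer", "thisalso"),
  b = 1:7,
  stringsAsFactors = FALSE
)

res_after  <- remove_with_neighbours(example_df, "a", "refer", N = 2)
res_before <- remove_with_neighbours(example_df, "a", "refer", N = -1)
res_none   <- remove_with_neighbours(example_df, "a", "no such text")
```

With N = 3 the same call would drop the three rows below each match, which is what the original question describes.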