I have a dataset with a column of names. I would like to drop rows with the lesser "P" value, if there exists one with a higher value. For example, in the dataset below, I would like to drop the row ID's 3 and 5 since there exists a 'Texas P5' and a 'North Dakota P9.' What is the best way to do this? Thanks in advance!
ID | Name | Score |
---|---|---|
1 | Minnesota P2 | 342 |
2 | Vermont P7 | 342 |
3 | Texas P4 | 65 |
4 | New Mexico | 643 |
5 | North Dakota P8 | 78 |
6 | North Dakota P9 | 245 |
7 | Texas P5 | 856 |
8 | Minnesota LP | 342 |
Here is a base R way. Use ave
to split the data by Name
excluding the numbers, and check which group element is equal to its greatest element. ave
returns a vector of the same class as its input, in this case character. So coerce to logical and subset the original data frame.
x<-"
ID Name Score
1 'Minnesota P2' 342
2 'Vermont P7' 342
3 'Texas P4' 65
4 'New Mexico' 643
5 'North Dakota P8' 78
6 'North Dakota P9' 245
7 'Texas P5' 856
8 'Minnesota LP' 342"
df1 <- read.table(textConnection(x), header = TRUE)
i <- with(df1, ave(Name, sub("\\d+", "", Name), FUN = \(x){
x == tail(sort(x), 1)
}))
df1[as.logical(i),]
#> ID Name Score
#> 1 1 Minnesota P2 342
#> 2 2 Vermont P7 342
#> 4 4 New Mexico 643
#> 6 6 North Dakota P9 245
#> 7 7 Texas P5 856
#> 8 8 Minnesota LP 342
Created on 2022-07-06 by the reprex package (v2.0.1)