Select which rows to keep when using the distinct function in R

I have a dataframe that looks like this:

  company eh
1       A  1
2       A  3
3       B  2
4       C  2
5       C  1
6       D  3
7       E  1
8       F  3
9       F  1

As you can see, I have duplicate rows of company A, C and F. This is due to the fact that some companies can both take on the value 1, 2 and 3 in the 'eh' column. I want to end up with only one row per company so I run this code:

df <- distinct(df, company, .keep_all = TRUE)

Which results in:

  company eh
1       A  1
2       B  2
3       C  2
4       D  3
5       E  1
6       F  3

However, this removes random rows from the 'eh' column. But what I want with the 'eh' column is to keep the value 1 over 2 and 3. In other words, if a company's 'eh' value takes on both 1 and 3, I'd rather keep the row with value 1. So I want to end up with a result like this (removing row 2, 4 and 8):

  company eh
1       A  1
2       B  2
3       C  1
4       D  3
5       E  1
6       F  1

How can I do this?

Solution

You could arrange you data by company and eh first. distinct will keep the first row:

dat <- read.table(text = "company eh
1       A  1
2       A  3
3       B  2
4       C  2
5       C  1
6       D  3
7       E  1
8       F  3
9       F  1", header = TRUE)

library(dplyr)

dat %>% 
  arrange(company, eh) %>% 
  distinct(company, .keep_all = TRUE)
#>   company eh
#> 1       A  1
#> 3       B  2
#> 5       C  1
#> 6       D  3
#> 7       E  1
#> 9       F  1

^{Created on 2021-02-11 by the reprex package (v1.0.0)}