Group by a column and select the rows of another column with the most characters in R (dplyr)


I have a dataset in which, for each ID, I want to select the row whose text has the most characters. Here is a sample of the data:

library(dplyr)   # for %>%, glimpse(), group_by(), filter()
library(stringr) # for str_length()

df <- data.frame(id=c(1,2,2,1,3),
                 text=c("new town","car is sold","address is changed","to be confirmed","call later"))
df %>% glimpse()
Rows: 5
Columns: 2
$ id   <dbl> 1, 2, 2, 1, 3
$ text <chr> "new town", "car is sold", "address is changed",~

I tried the following code to select the rows with the maximum number of characters, but it returns no rows:

df %>% 
  group_by(id) %>% 
  filter(text==max(str_length(text)))
# A tibble: 0 x 2
# Groups:   id [0]
# ... with 2 variables: id <dbl>, text <chr>

I hope it is clear and someone can help.

Many thanks!


Solution

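  • Try this (your code, slightly adapted so that it compares the string lengths rather than the strings themselves):

    library(dplyr)   # >= 1.1.0, needed for the .by argument
    library(stringr) # for str_length()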
    
    df %>% 
      filter(str_length(text)==max(str_length(text)), .by = id)
    
      id               text
    1  2 address is changed
    2  1    to be confirmed
    3  3         call later
    
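Why the original attempt returned zero rows can be seen in a minimal base R sketch. It uses only the two texts with id 1, and `nchar()` stands in for `str_length()` so that it runs without any packages (for these plain ASCII strings both give the same counts). The variable names are illustrative only.

```r
text <- c("new town", "to be confirmed")  # the two rows with id == 1

longest <- max(nchar(text))               # a number: the length of the longest string

# Comparing strings with a number coerces the number to a string,
# so this asks whether any text equals "15": never true here.
strings_vs_number <- text == longest

# Comparing lengths with the maximum length flags the longest row.
lengths_vs_number <- nchar(text) == longest

print(strings_vs_number)
print(lengths_vs_number)
```

The first comparison is what `filter(text==max(str_length(text)))` evaluates within each group, which is why every row was dropped.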

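    As an alternative, we could use which.max with nchar. Note that which.max returns only the first maximum, so if several rows within an id tie for the longest text, this version keeps just the first of them, whereas the version above keeps all of them: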

    library(dplyr) # >= 1.1.0
    
    df %>% 
      filter(row_number()==which.max(nchar(text)), .by = id)
    
      id               text
    1  2 address is changed
    2  1    to be confirmed
    3  3         call later
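
The two approaches give the same result on this data, but they can differ when a group contains ties. The following base R sketch illustrates this on a single hypothetical group (the vector `text` below is made up for illustration and is not part of the question's data):

```r
text <- c("abc", "xyz", "de")   # hypothetical group: two strings tie at 3 characters

# Equality with the maximum keeps every row that reaches it.
all_longest <- text[nchar(text) == max(nchar(text))]

# which.max() returns the index of the first maximum only.
first_longest <- text[which.max(nchar(text))]

print(all_longest)
print(first_longest)
```

Which behaviour is preferable depends on whether exactly one row per id is required.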