Tags: r, data.table, greatest-n-per-group

Select the string with max length while grouping by in data.table in R


My initial sample data was ambiguous, so I have updated my data set:

library(data.table)
a <- data.table(name = c("?", "", "One", "?", "", "Two"), value = c(1, 3, 2, 6, 5, 2), job = c(1, 1, 1, 2, 2, 2))

   name value job
1:    ?     1   1
2:          3   1
3:  One     2   1
4:    ?     6   2
5:          5   2
6:  Two     2   2

I want to group by the column "job" while finding the maximum in column "value" and selecting the "name" which has the maximum length.

My sample output would be

   name job value
1: One    1     3
2: Two    2     6

I think I want the data.table equivalent of the question "How do I select the longest 'string' from a table when grouping in R".


Solution

  • We can group by 'job', get the index of the maximum number of characters (nchar) in 'name', and subset each group (.SD) with it. Note that 'value' here comes from the selected row, not from the group maximum.

    a[, .SD[which.max(nchar(name))], by = job]
    #   job name value
    #1:   1  One     2
    #2:   2  Two     2
    

    Or get the row index (.I) from which.max, extract the index column (auto-named "V1"), and use it to subset the dataset.

    a[a[, .I[which.max(nchar(name))], by = job]$V1]
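
To see why `$V1` is needed, it may help to look at the intermediate result. The following is a minimal sketch using the data from the question; the names `idx` and `res` are introduced here only for illustration.

```r
library(data.table)

a <- data.table(name  = c("?", "", "One", "?", "", "Two"),
                value = c(1, 3, 2, 6, 5, 2),
                job   = c(1, 1, 1, 2, 2, 2))

# Step 1: for each job, .I gives the row numbers in the full table;
# which.max(nchar(name)) picks the position of the longest name in the group.
# The unnamed result column is auto-named V1.
idx <- a[, .I[which.max(nchar(name))], by = job]
print(idx)   # V1 holds rows 3 and 6

# Step 2: subset the original table with those row numbers.
# Whole rows are returned, so 'value' is the row's own value (2 and 2),
# not the group maximum.
res <- a[idx$V1]
print(res)
```

Because whole rows are selected, this approach (like the .SD one) only gives the group maximum of 'value' when it happens to sit on the same row as the longest 'name'.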
    

    Update

    Based on the new example, where the maximum 'value' is not on the same row as the longest 'name', we need to compute each one separately.

    a[, .(value= max(value), name = name[which.max(nchar(name))]),
                          by = job]
    #   job value name
    #1:   1     3  One
    #2:   2     6  Two