My initial sample data was ambiguous, so I have updated my data set:
library(data.table)
a <- data.table(name=c("?","","One","?","","Two"), value=c(1,3,2,6,5,2), job=c(1,1,1,2,2,2))
   name value job
1:    ?     1   1
2:          3   1
3:  One     2   1
4:    ?     6   2
5:          5   2
6:  Two     2   2
I want to group by the column "job", find the maximum of column "value", and select the "name" that has the most characters.
My expected output is:
   name job value
1:  One   1     3
2:  Two   2     6
I think I want the equivalent of "How do I select the longest 'string' from a table when grouping in R?".
We can group by 'job', get the index of the maximum number of characters (nchar) in 'name', and subset the dataset.
a[, .SD[which.max(nchar(name)) ], by = job]
#    job name value
#1:   1  One     2
#2:   2  Two     2
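A quick check of what the .SD approach actually picks on the updated data may help. This is a small sketch, assuming the data.table package is installed: it prints the character counts per name and shows that the whole row with the longest name is returned, so 'value' is that row's value rather than the group maximum.

```r
library(data.table)

a <- data.table(name  = c("?", "", "One", "?", "", "Two"),
                value = c(1, 3, 2, 6, 5, 2),
                job   = c(1, 1, 1, 2, 2, 2))

# Character counts per row: "?" -> 1, "" -> 0, "One"/"Two" -> 3
print(nchar(a$name))

# .SD approach: keep the entire row whose 'name' is longest within each job;
# the grouping column 'job' comes first in the result
row_pick <- a[, .SD[which.max(nchar(name))], by = job]
print(row_pick)
```

Because the row is taken as a unit, 'value' is 2 for both jobs here, which is why the maximum has to be computed on its own.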
Or get the row index (.I) of the longest name with which.max, extract the resulting index column ("V1"), and subset the dataset.
a[a[, .I[which.max(nchar(name))], by = job]$V1]
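Splitting the one-liner into its two steps makes the intermediate table visible. This is a sketch assuming data.table is installed; `idx` is just an illustrative variable name. The inner call returns one row per job with the matching row number of 'a' in an unnamed column, which data.table names V1.

```r
library(data.table)

a <- data.table(name  = c("?", "", "One", "?", "", "Two"),
                value = c(1, 3, 2, 6, 5, 2),
                job   = c(1, 1, 1, 2, 2, 2))

# Step 1: row numbers (in 'a') of the longest name per job, stored in column V1
idx <- a[, .I[which.max(nchar(name))], by = job]
print(idx)

# Step 2: use those row numbers to subset the original table
picked <- a[idx$V1]
print(picked)
```

Unlike the .SD version, this keeps the original column order of 'a', since it subsets 'a' directly instead of building a grouped result.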
With the updated example, the maximum 'value' in a group does not necessarily come from the row with the longest 'name', so the two need to be selected separately.
a[, .(value= max(value), name = name[which.max(nchar(name))]),
by = job]
# job value name
#1: 1 3 One
#2: 2 6 Two
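If the columns should appear in the order of the expected output in the question (name, job, value), one option is to reorder the result afterwards. This is a sketch assuming data.table is installed; `res` is an illustrative name, and setcolorder reorders the columns by reference.

```r
library(data.table)

a <- data.table(name  = c("?", "", "One", "?", "", "Two"),
                value = c(1, 3, 2, 6, 5, 2),
                job   = c(1, 1, 1, 2, 2, 2))

# Aggregate the maximum value and the longest name independently per job
res <- a[, .(value = max(value), name = name[which.max(nchar(name))]),
         by = job]

# Reorder columns in place to match the expected output: name, job, value
setcolorder(res, c("name", "job", "value"))
print(res)
```

Note that which.max returns the first maximum, so if two names in a group have the same (longest) length, the one appearing earlier in the data is chosen.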