Search code examples
rdataframenestedpivotthreshold

Is there a way to organize a group of rows by size and then throw a flag if any of them are within a specific threshold of each other?


I have a dataset that looks like this when pivoted long

data<-data.frame(
  ID=c("type1","type1","type1","type1","type1","type1","type2","type2","type2","type2","type2","type2"),
  tag=c(1,2,3,4,5,6,1,2,3,4,5,6),
  value=c(0.2,0.7,0.45,0.4,1,0,0.056,0.12,0.3,0.67,0.09,0.004)
)

Is there a way after grouping by the id to organize the values by size taking the top 3 most and then throw a flag if specific tags are within a threshold of each other and saved.

For instance: if the threshold was set to 0.05 type1 would be [1,0.7,0.45,0.4,0.2,0] and then a flag would be raised that tags 3 and 4 were within the threshold boundary of each other.

The goal is to be able to add a code to each type made up of the top most tags with a note if there are any within the threshold.

So the type one label would be 523 (with a 4 within the threshold) and type2 would be 631 but no flag is thrown as there is not a value within 0.05 of the top 3 values

id label flag
type1 523 4 within threshold
type2 631 NA

I am currently able to order it but the internal threshold is what is giving me issues with how to approach the problem.


Solution

  • threshold = 0.05
    data |>
      arrange(ID, -value) |>
      summarize(label = paste0(tag[row_number() <= 3], collapse = ""),
                within_threshold = paste0(
                  tag[row_number() > 3 & 
                        value[row_number() == 3] - value <= threshold], 
                  collapse = ""),
                .by = ID)
    

    In type1, the #4 value 0.40 [label 4] is within 0.05 of 0.40, the #3 value. In type2, the #4 value 0.09 [label 5] is within 0.05 of 0.12, the #3 value.

         ID label within_threshold
    1 type1   523                4
    2 type2   432                5
    

    Or