Search code examples
rstringrangecomparisonmultiple-columns

Finding if the values in one column are within the range of several other columns


I am looking for an easy way to find whether the value in a column lies within the range of values in other columns.

My input looks like this:

ID  "Q1 Comm - 01 Scope Thesis" "Q1 Comm - 02 Scope Project" "Q1 Comm - 03 Learn Intern"    "Q1 Comm - 04 Biography"    "Q1 Comm - Overall Plan"
10   NA                          NA                           4                              NA      4
31   2                           NA                           NA                             NA      2
225  0                           NA                           NA                             NA      1
243  NA                          2                            NA                             1       0
310  NA                          2                            NA                             1       NA

For each unique ID, I am interested in identifying when the column Q1 Comm - Overall Plan is:

1 - Below the min() of all the other columns, or

2 - Above the max() of all the other columns, or

3 - Within the range of all the other columns

The complete list of columns (along with the overall column) is below:

"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Plan"

My required output is something like this:

ID  "Q1 Comm - 01 Scope Thesis" "Q1 Comm - 02 Scope Project" "Q1 Comm - 03 Learn Intern"    "Q1 Comm - 04 Biography"    "Q1 Comm - Overall Plan" "Q1_check"
10   NA                          NA                           4                              NA      4 "within"
31   2                           NA                           NA                             NA      2 "within"
225  0                           NA                           NA                             NA      1 "above"
243  NA                          2                            NA                             1       0 "below"
310  NA                          2                            NA                             1       NA NA

The dput() for my data frame df is below.

dput(df)

structure(list(ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA, 
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L), 
    Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA, 
    NA, NA, 1L), Q1.Comm...Overall.Plan = c(4L, 1L, 2L, 
    NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA, 
    NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))

Note:

I had asked this question here Finding if a value is within the range of other columns but the example was too simplified and none of the solutions worked for me.

The question was getting too lengthy, therefore, for the sake of clarity, I am posting this as a new question.

I thank you for your time and help on this post.


Solution

  • library(purrr)
    library(data.table)
    
    needed_cols <- setdiff(names(df), c("ID", "Q1.Comm...Overall.Plan"))
    
    setDT(df)[, c("min", "max") := transpose(pmap(.SD, range, na.rm = TRUE)), .SDcols = needed_cols]
    df[, Q1_check := fcase(
        is.na(`Q1.Comm...Overall.Plan`), NA_character_,
        `Q1.Comm...Overall.Plan` < min, "below",
        `Q1.Comm...Overall.Plan` > max, "above",
        default = "within"
      )
    ]
    df[, c("max", "min") := NULL]