Search code examples
rdplyrgroup-byrangerescale

R: Get values from separate dataframes based on the running values of a group_by function


I have the following dataframe.

Col1 = c("A1", "A1", "A2", "A2")
Col2 = c("B1", "B1", "B2", "B2")
Value = c(10, 20, 30, 40)
df = data.frame(Col1, Col2, Value)

This is a dataframe with various observations. Two factor columns and a value column. There can be multiple rows of the same group of observations with different values. There are multiple such dataframes with similar observations.

MinCol1 = c("A1", "A2")
MinCol2 = c("B1", "B2")
MinValue = c(1, 1)
mins = data.frame(MinCol1, MinCol2, MinValue)

MaxCol1 = c("A1", "A2")
MaxCol2 = c("B1", "B2")
MaxValue = c(100, 100)
maxes = data.frame(MaxCol1, MaxCol2, MaxValue)

The above two dataframes are the minimum and maximum values for all groups (Col1 and Col2) across all dataframes (like the 1st one, df).

I want to normalize the values of dataframes like the 1st one per group. I want the new values to be between 0 to 1 but I want the range to be normalized against to be taken from the mins and maxes dataframes.

normalizeDataForAllBenchmarks = function(df, mins, maxes) {
    
    ### Normalize metrics [0,1]
    df_normal = df %>%
      group_by(Process, Category, Metric) %>%
      mutate(Value = rescale(Value, to = c(0,1), from = range(...)))
    
    return(df_normal)
}

I have the above function bun I'm not sure what goes in the range function in order to do a per group lookup into the mins and maxes dataframes.


Solution

  • All you need to do is join the data by the ids and then calculate the norm:

    library(tidyverse)
    
    normalizeDataForAllBenchmarks = function(df, mins, maxes) {
        left_join(df, mins, by = c("Col1" = "MinCol1", "Col2" = "MinCol2"))|>
        left_join(maxes, by = c("Col1" = "MaxCol1", "Col2" = "MaxCol2")) |>
        mutate(across(Value:MaxValue, as.numeric),
               Value = (Value - MinValue)/(MaxValue-MinValue))|>
        select(-c(MinValue, MaxValue))
    }
    
    normalizeDataForAllBenchmarks(df, mins, maxes)
    #>   Col1 Col2      Value
    #> 1   A1   B1 0.09090909
    #> 2   A1   B1 0.19191919
    #> 3   A2   B2 0.29292929
    #> 4   A2   B2 0.39393939