Search code examples
rrow

Select rows in each group based on their values


I have a data frame that looks like this:

Site=c("Site1","Site1","Site1", "Site2","Site2","Site2")
Gender=c("Male","Male","Male","Female","Male","Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron")
Result=c(12,22,61,14,52,11)

df=data.frame(Site,Gender,Tissue,Element,Result)

> df

   Site Gender Tissue Element Result
1 Site1   Male Muscle    Iron     12
2 Site1   Male Muscle    Iron     22
3 Site1   Male Muscle   Humid     61
4 Site2 Female Muscle    Iron     14
5 Site2   Male     WB   Humid     52
6 Site2 Female Muscle    Iron     11

First Step

I want to group my data based on Site, Gender and Tissue. Then, within each group, I want to find those rows whose Elements of the same name have a higher Result.

For example,

Group1: Site1 Male Muscle

Group2: Site2 Female Muscle

Group3: Site2 Male WB

Now, in group 1:

Site1 Male Muscle Iron 12

Site1 Male Muscle Iron 22

Iron is the same Element in two rows. I want to choose the row whose Iron's Result is greater which is

Site1 Male Muscle Iron 22

Then I want to add another column to the data frame, let's say "Col6", and put the greatest Result amount there. So my data would look like this:

   Site Gender Tissue Element Result Col6
1 Site1   Male Muscle    Iron     12   NA 
2 Site1   Male Muscle    Iron     22   22
3 Site1   Male Muscle   Humid     61   NA
4 Site2 Female Muscle    Iron     14   14
5 Site2   Male     WB   Humid     52   NA
6 Site2 Female Muscle    Iron     11   NA

Second Step

After this, I want to multiply the row with the lowest Result for each Element by the Results of "Humid".

For example, in group 1:

Site1 Male Muscle Iron 12

Site1 Male Muscle Iron 22

Iron is the same Element in two rows with the first row having lower Result 12:

Site1 Male Muscle Iron 12

I want to multiply 12 by the Result of Humid in that group which is 61:

Site1 Male Muscle Humid 61.

And add this amount (12*61=732) to Col6 in front of Iron (not Humid), so that my final table looks like this:

   Site Gender Tissue Element Result Col6
1 Site1   Male Muscle    Iron     12  732
2 Site1   Male Muscle    Iron     22   22
3 Site1   Male Muscle   Humid     61   NA
4 Site2 Female Muscle    Iron     14   14
5 Site2   Male     WB   Humid     52   NA
6 Site2 Female Muscle    Iron     11   NA

Note: Please note that I have tens of Sites and Element types and each group always has two rows of the same Element to pick between lower and higher Result value.


Solution

  • Assuming there is maximum of one Humid row for each group (or else it will take max of that humid group), follow this code. A separate column dummy has been added for ease of explanation. Moreover I have one extra row (site2, Female, Muscle) for a better demonstration.

    #revised sample
    
    Site=c("Site1","Site1","Site1", "Site2","Site2","Site2", "Site2")
    Gender=c("Male","Male","Male","Female","Male","Female", "Female")
    Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle", "Muscle")
    Element=c("Iron","Iron","Humid","Iron","Humid","Iron", "Humid")
    Result=c(12,22,61,14,52,11, 50)
    
    df=data.frame(Site,Gender,Tissue,Element,Result)
    
    > df
       Site Gender Tissue Element Result
    1 Site1   Male Muscle    Iron     12
    2 Site1   Male Muscle    Iron     22
    3 Site1   Male Muscle   Humid     61
    4 Site2 Female Muscle    Iron     14
    5 Site2   Male     WB   Humid     52
    6 Site2 Female Muscle    Iron     11
    7 Site2 Female Muscle   Humid     50
    

    Code

    library(dplyr)
    
    df %>% mutate(rowid = row_number()) %>%
      group_by(Site, Gender, Tissue, Element) %>%
      mutate(dummy = case_when(Element != "Humid" & Result == max(Result) ~ "Max_E",
                               Element != "Humid" & Result != max(Result) ~ "Other_E",
                               Element == "Humid" & Result == max(Result) ~ "AA_Max_H",
                               TRUE ~ "Other_H")) %>%
      ungroup(Element) %>% arrange(Site, Gender, Tissue, dummy) %>%
      mutate(col6 = case_when(dummy == "Max_E" ~ Result,
                              dummy == "Other_E" ~ Result * first(Result[dummy == "AA_Max_H"]),
                              TRUE ~ NA_real_)) %>%
      ungroup() %>% arrange(rowid) %>%
      select(-rowid, -dummy)
    
    # A tibble: 7 x 6
      Site  Gender Tissue Element Result  col6
      <chr> <chr>  <chr>  <chr>    <dbl> <dbl>
    1 Site1 Male   Muscle Iron        12   732
    2 Site1 Male   Muscle Iron        22    22
    3 Site1 Male   Muscle Humid       61    NA
    4 Site2 Female Muscle Iron        14    14
    5 Site2 Male   WB     Humid       52    NA
    6 Site2 Female Muscle Iron        11   550
    7 Site2 Female Muscle Humid       50    NA
    

    For the sample data posted by OP, it gives the desired results exactly

    # A tibble: 6 x 6
      Site  Gender Tissue Element Result  col6
      <chr> <chr>  <chr>  <chr>    <dbl> <dbl>
    1 Site1 Male   Muscle Iron        12   732
    2 Site1 Male   Muscle Iron        22    22
    3 Site1 Male   Muscle Humid       61    NA
    4 Site2 Female Muscle Iron        14    14
    5 Site2 Male   WB     Humid       52    NA
    6 Site2 Female Muscle Iron        11    NA