I have a data frame that looks like this:
Site=c("Site1","Site1","Site1", "Site2","Site2","Site2")
Gender=c("Male","Male","Male","Female","Male","Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron")
Result=c(12,22,61,14,52,11)
df=data.frame(Site,Gender,Tissue,Element,Result)
> df
Site Gender Tissue Element Result
1 Site1 Male Muscle Iron 12
2 Site1 Male Muscle Iron 22
3 Site1 Male Muscle Humid 61
4 Site2 Female Muscle Iron 14
5 Site2 Male WB Humid 52
6 Site2 Female Muscle Iron 11
First Step
I want to group my data based on Site, Gender and Tissue. Then, within each group, I want to find those rows whose Elements of the same name have a higher Result.
For example,
Group1: Site1 Male Muscle
Group2: Site2 Female Muscle
Group3: Site2 Male WB
Now, in group 1:
Site1 Male Muscle Iron 12
Site1 Male Muscle Iron 22
Iron is the same Element in two rows. I want to choose the row whose Iron's Result is greater which is
Site1 Male Muscle Iron 22
Then I want to add another column to the data frame, let's say "Col6", and put the greatest Result amount there. So my data would look like this:
Site Gender Tissue Element Result Col6
1 Site1 Male Muscle Iron 12 NA
2 Site1 Male Muscle Iron 22 22
3 Site1 Male Muscle Humid 61 NA
4 Site2 Female Muscle Iron 14 14
5 Site2 Male WB Humid 52 NA
6 Site2 Female Muscle Iron 11 NA
Second Step
After this, I want to multiply the row with the lowest Result for each Element by the Results of "Humid".
For example, in group 1:
Site1 Male Muscle Iron 12
Site1 Male Muscle Iron 22
Iron is the same Element in two rows with the first row having lower Result 12:
Site1 Male Muscle Iron 12
I want to multiply 12 by the Result of Humid in that group which is 61:
Site1 Male Muscle Humid 61.
And add this amount (12*61=732) to Col6 in front of Iron (not Humid), so that my final table looks like this:
Site Gender Tissue Element Result Col6
1 Site1 Male Muscle Iron 12 732
2 Site1 Male Muscle Iron 22 22
3 Site1 Male Muscle Humid 61 NA
4 Site2 Female Muscle Iron 14 14
5 Site2 Male WB Humid 52 NA
6 Site2 Female Muscle Iron 11 NA
Note: Please note that I have tens of Sites and Element types and each group always has two rows of the same Element to pick between lower and higher Result value.
Assuming there is maximum of one Humid row for each group (or else it will take max of that humid group), follow this code. A separate column dummy
has been added for ease of explanation. Moreover I have one extra row (site2, Female, Muscle) for a better demonstration.
#revised sample
Site=c("Site1","Site1","Site1", "Site2","Site2","Site2", "Site2")
Gender=c("Male","Male","Male","Female","Male","Female", "Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle", "Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron", "Humid")
Result=c(12,22,61,14,52,11, 50)
df=data.frame(Site,Gender,Tissue,Element,Result)
> df
Site Gender Tissue Element Result
1 Site1 Male Muscle Iron 12
2 Site1 Male Muscle Iron 22
3 Site1 Male Muscle Humid 61
4 Site2 Female Muscle Iron 14
5 Site2 Male WB Humid 52
6 Site2 Female Muscle Iron 11
7 Site2 Female Muscle Humid 50
library(dplyr)
df %>% mutate(rowid = row_number()) %>%
group_by(Site, Gender, Tissue, Element) %>%
mutate(dummy = case_when(Element != "Humid" & Result == max(Result) ~ "Max_E",
Element != "Humid" & Result != max(Result) ~ "Other_E",
Element == "Humid" & Result == max(Result) ~ "AA_Max_H",
TRUE ~ "Other_H")) %>%
ungroup(Element) %>% arrange(Site, Gender, Tissue, dummy) %>%
mutate(col6 = case_when(dummy == "Max_E" ~ Result,
dummy == "Other_E" ~ Result * first(Result[dummy == "AA_Max_H"]),
TRUE ~ NA_real_)) %>%
ungroup() %>% arrange(rowid) %>%
select(-rowid, -dummy)
# A tibble: 7 x 6
Site Gender Tissue Element Result col6
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Site1 Male Muscle Iron 12 732
2 Site1 Male Muscle Iron 22 22
3 Site1 Male Muscle Humid 61 NA
4 Site2 Female Muscle Iron 14 14
5 Site2 Male WB Humid 52 NA
6 Site2 Female Muscle Iron 11 550
7 Site2 Female Muscle Humid 50 NA
For the sample data posted by OP, it gives the desired results exactly
# A tibble: 6 x 6
Site Gender Tissue Element Result col6
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Site1 Male Muscle Iron 12 732
2 Site1 Male Muscle Iron 22 22
3 Site1 Male Muscle Humid 61 NA
4 Site2 Female Muscle Iron 14 14
5 Site2 Male WB Humid 52 NA
6 Site2 Female Muscle Iron 11 NA