
New column in a data frame with calculations across rows, based on conditions in another column, in R


I am trying to add a new column ("Resilience") to my data frame (df) whose values are calculated as (Diversity in Flood - Diversity in Control) / Diversity in Control, separately for each Plot.
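Written out for a single plot p, the requested quantity is (a restatement of the formula above; the numbers in the example are only illustrative):

$$
\text{Resilience}_p = \frac{D_{p,\text{Flood}} - D_{p,\text{Control}}}{D_{p,\text{Control}}}
$$

For example, a plot with Control Diversity 6 and Flood Diversity 4 gives (4 - 6) / 6 = -1/3 ≈ -0.333. The Dry rows do not enter the formula, so only the Flood row of each plot gets a value.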

This is my data frame:

df <- data.frame(
  Plot = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Rain = c("Control", "Flood", "Dry", "Control", "Flood", "Dry", "Control", "Flood", "Dry"),
  Diversity = sample(1:10, 9)  # random values, so they differ between runs
)

I have tried the dplyr package, using arrange(), group_by() and mutate() together with ifelse(), but I am stuck. I don't know how to express the condition that subtracts the Diversity of the Control treatment from the Diversity of the Flood treatment and then divides the result by the Diversity of the Control.

library(dplyr)

df %>%
  arrange(Plot, Rain) %>%
  group_by(Plot) %>%
  mutate(Resilience = NA,
         # ??? How do I use the Control Diversity of the same Plot here?
         Resilience = ifelse(Rain == "Flood", Diversity, NA))

Maybe dplyr is not the right tool for this? Any help would be much appreciated.


Solution

  • You can use match() to get the corresponding "Control" Diversity value for each Plot.

    library(dplyr)
    
    df %>%
      mutate(dc = Diversity[match("Control", Rain)],  # Control value of the same Plot
             Resilience = ifelse(Rain == "Flood", (Diversity - dc) / dc, NA),  # NA on non-Flood rows
             .by = Plot)
    
    #  Plot    Rain Diversity dc Resilience
    #1    A Control         6  6         NA
    #2    A   Flood         4  6 -0.3333333
    #3    A     Dry         8  6         NA
    #4    B Control         7  7         NA
    #5    B   Flood         5  7 -0.2857143
    #6    B     Dry         1  7         NA
    #7    C Control         9  9         NA
    #8    C   Flood         3  9 -0.6666667
    #9    C     Dry         2  9         NA
    

    Note that the .by argument is available in dplyr 1.1.0 and above. I created a temporary dc column so that the Control value is computed once and then used twice in the formula. You can drop it afterwards (e.g. with select(-dc)) if you don't need it.
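On dplyr versions older than 1.1.0, where .by is not available, the same match() lookup can be written with group_by() and ungroup(), which is closer to the attempt in the question. This is a minimal sketch that assumes fixed Diversity values (the ones from the output above) in place of sample(), so the result is reproducible:

```r
library(dplyr)

# Fixed Diversity values (taken from the output above) instead of sample(),
# so the result is reproducible
df <- data.frame(
  Plot = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Rain = c("Control", "Flood", "Dry", "Control", "Flood", "Dry",
           "Control", "Flood", "Dry"),
  Diversity = c(6, 4, 8, 7, 5, 1, 9, 3, 2)
)

res <- df %>%
  group_by(Plot) %>%
  # Within each Plot, look up the Diversity of the "Control" row
  mutate(dc = Diversity[match("Control", Rain)],
         # (Flood - Control) / Control on Flood rows, NA elsewhere
         Resilience = ifelse(Rain == "Flood", (Diversity - dc) / dc, NA)) %>%
  ungroup()

res
```

The result is a tibble rather than a plain data.frame, but the Resilience values match the .by version.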