
Inserting a column of categories based on other columns in R


My dataframe currently looks like this:


Tree  Cookie  Age
C1T1  A        10
C1T1  A        20
C1T1  A        30
C1T1  B        15
C1T1  B        20
C1T1  B        25
C1T2  A        12
C1T2  A        20
C1T2  B         5
C1T2  B        13

So for each "Tree" I have several "Cookies", and for each cookie I have several ages (basically representing different parts of the tree's life). I would like to add another column that bins each tree by its maximum age, i.e. the oldest age of its oldest cookie. In this example that is the last age of cookie A in both trees (30 for C1T1 and 20 for C1T2). A tree should be classed as "young" if its max age is < 40, "mid-age" if it is between 40 and 120, and "old" if it is > 120. Any advice on this is greatly appreciated!
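To make the example reproducible, the table above can be rebuilt as an R data frame. This is a sketch: the name trees matches the one used in the answer, and Age is deliberately stored as a number rather than text, since max() and comparisons such as Age < 40 only behave as expected on numeric values.

```r
# Rebuild the example table; Age is stored as numeric on purpose
trees <- data.frame(
  Tree   = rep(c("C1T1", "C1T2"), times = c(6, 4)),
  Cookie = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
  Age    = c(10, 20, 30, 15, 20, 25, 12, 20, 5, 13),
  stringsAsFactors = FALSE  # keep Tree and Cookie as plain character columns
)

# Check the column types: Age should show up as num
str(trees)
```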


Solution

  • OK, here goes: I used the dplyr package, which provides the %>% pipe as well as the group_by() and summarise() functions. I named your data frame trees. Then:

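    library(dplyr)
    
    # Age has to be numeric: stored as text, max() and the < / > tests
    # compare strings, so "5" would count as older than "20" or "40"
    trees$Age <- as.numeric(trees$Age)
    
    # One row per tree, holding that tree's oldest age
    trees2 <- trees %>%
      group_by(Tree) %>%
      summarise(Age = max(Age))
    
    # Bin each tree's maximum age into a category
    trees2$Cat <- ifelse(trees2$Age < 40, "young", ifelse(trees2$Age > 120, "old", "mid-age"))
    # Copy each tree's category onto all of its rows in the original table
    trees$Category <- trees2$Cat[match(trees$Tree, trees2$Tree)]
    

    Step by step: after the summarise() call, trees2 looks like this:

    > trees2
    # A tibble: 2 x 2
      Tree    Age
      <chr> <dbl>
    1 C1T1     30
    2 C1T2     20
      
    > trees2$Cat <- ifelse(trees2$Age < 40, "young", ifelse(trees2$Age > 120, "old", "mid-age"))
    
    > trees2
    # A tibble: 2 x 3
      Tree    Age Cat  
      <chr> <dbl> <chr>
    1 C1T1     30 young
    2 C1T2     20 young
    

    Finally, following the recommendations in this post by cory, I copied the categories from this tibble back into the original table with this line:

    trees$Category <- trees2$Cat[match(trees$Tree, trees2$Tree)]
    

    match() returns, for each row of trees, the position of that row's Tree in trees2$Tree, so indexing trees2$Cat with it gives every row its tree's category. This gave me:

    > trees
       Tree Cookie Age Category
    1  C1T1      A  10    young
    2  C1T1      A  20    young
    3  C1T1      A  30    young
    4  C1T1      B  15    young
    5  C1T1      B  20    young
    6  C1T1      B  25    young
    7  C1T2      A  12    young
    8  C1T2      A  20    young
    9  C1T2      B   5    young
    10 C1T2      B  13    young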
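As an alternative sketch (not the method used in the answer above), the summary table can be skipped entirely: group by Tree and use mutate() instead of summarise(), so max(Age) is computed per tree and written onto every row of that tree. case_when() stands in for the nested ifelse(); its conditions are checked in order, and the final TRUE branch catches everything from 40 to 120 inclusive. The helper column name MaxAge is hypothetical.

```r
library(dplyr)

# Example data from the question, with Age as numeric
trees <- data.frame(
  Tree   = rep(c("C1T1", "C1T2"), times = c(6, 4)),
  Cookie = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
  Age    = c(10, 20, 30, 15, 20, 25, 12, 20, 5, 13)
)

trees <- trees %>%
  group_by(Tree) %>%
  mutate(
    MaxAge   = max(Age),  # oldest age of any cookie in this tree
    Category = case_when(
      MaxAge < 40  ~ "young",
      MaxAge > 120 ~ "old",
      TRUE         ~ "mid-age"  # 40 to 120 inclusive
    )
  ) %>%
  ungroup()  # drop the grouping so later steps act on the whole table

trees
```

If you only want the Category column, add select(-MaxAge) to the end of the pipeline. Note that the result is a tibble rather than a plain data frame.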