My dataframe currently looks like this:
Tree Cookie Age
C1T1 A 10
C1T1 A 20
C1T1 A 30
C1T1 B 15
C1T1 B 20
C1T1 B 25
C1T2 A 12
C1T2 A 20
C1T2 B 5
C1T2 B 13
So for each "Tree" I have several "Cookies", and for each cookie I have several ages (basically representing different parts of the tree's life). I would like to add another column that bins each tree by its max age, i.e. the oldest age of its oldest cookie; in this case that is the last age of cookie A for both trees. A tree should be classified as "young" if its max age is < 40, "mid-age" if its max age is > 40 and < 120, and "old" if its max age is > 120. Any advice on this is greatly appreciated!
Ok, here it goes:
I used the dplyr library to do this, which gives me the %>% operator and the summarise() function. I also named your data frame trees. Then:
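library(dplyr)

# Maximum age per tree; as.numeric() guards against Age having been read in as text
trees2 <- trees %>%
  group_by(Tree) %>%
  summarise(Age = max(as.numeric(Age)))

# Bin each tree by its maximum age
trees2$Cat <- ifelse(trees2$Age < 40, "young", ifelse(trees2$Age > 120, "old", "mid-age"))

# Copy each tree's category back onto every row of the original data frame
trees$Category <- trees2$Cat[match(trees$Tree, trees2$Tree)]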
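One caveat worth checking before relying on the snippet above: it only bins correctly when Age is numeric. The sketch below is a hypothetical illustration (the vectors ages_chr and ages_num and the helper bin_age are made up for this example) of what happens when the ages of tree C1T2 are stored as text: max() and the comparisons then work on strings, so "5" counts as the largest value and ends up classified as "old".

```r
# Hypothetical illustration: the ages of tree C1T2 stored as text versus as numbers
ages_chr <- c("12", "20", "5", "13")
ages_num <- as.numeric(ages_chr)

max_chr <- max(ages_chr)  # string comparison: "5" sorts after "20"
max_num <- max(ages_num)  # numeric comparison: 20

# Helper mirroring the nested ifelse() used for the binning
bin_age <- function(a) ifelse(a < 40, "young", ifelse(a > 120, "old", "mid-age"))

print(max_chr)           # "5"
print(bin_age(max_chr))  # "old", because "5" > "120" as strings
print(max_num)           # 20
print(bin_age(max_num))  # "young"
```

Running str(trees) shows whether Age is stored as chr or as a numeric type; if it is chr, converting it once with as.numeric() avoids the problem.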
Before the Cat column is added, trees2 looks like this:
> trees2
# A tibble: 2 x 2
  Tree    Age
  <chr> <dbl>
1 C1T1     30
2 C1T2     20
> trees2$Cat <- ifelse(trees2$Age < 40, "young", ifelse(trees2$Age > 120, "old", "mid-age"))
> trees2
# A tibble: 2 x 3
  Tree    Age Cat
  <chr> <dbl> <chr>
1 C1T1     30 young
2 C1T2     20 young
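As a possible alternative to the nested ifelse(), the binning step could be sketched with base R's cut(). This is not what the answer uses; it is shown only because it makes the interval boundaries explicit. The question leaves ages of exactly 40 and 120 undefined, so assigning them to "mid-age" and "old" respectively (via right = FALSE) is an assumption of this sketch, and the max_age values are hypothetical. Note that the nested ifelse() treats exactly 120 as "mid-age" instead.

```r
# Hypothetical maximum ages, including the boundary values 40 and 120
max_age <- c(30, 40, 85, 120, 200)

# right = FALSE gives the intervals [0, 40), [40, 120), [120, Inf)
category <- cut(max_age,
                breaks = c(0, 40, 120, Inf),
                labels = c("young", "mid-age", "old"),
                right = FALSE)

print(data.frame(max_age, category))
```

cut() returns a factor; wrapping the call in as.character() gives plain strings like the ifelse() version.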
Then, following the recommendation in this post by cory, I finished by mapping the categories from this tibble back onto the original table with this final line:
trees$Category <- trees2$Cat[match(trees$Tree, trees2$Tree)]
And this gave me:
> trees
   Tree Cookie Age Category
1  C1T1      A  10    young
2  C1T1      A  20    young
3  C1T1      A  30    young
4  C1T1      B  15    young
5  C1T1      B  20    young
6  C1T1      B  25    young
7  C1T2      A  12    young
8  C1T2      A  20    young
9  C1T2      B   5    young
10 C1T2      B  13    young
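For comparison, the same result can probably be reached without the intermediate trees2 table. The sketch below swaps in base R's ave() to compute the per-tree maximum directly on every row; it is an alternative technique, not the method used above. The data frame is recreated from the question, and MaxAge is a helper column name chosen for this example.

```r
# Example data recreated from the question
trees <- data.frame(
  Tree   = c(rep("C1T1", 6), rep("C1T2", 4)),
  Cookie = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
  Age    = c(10, 20, 30, 15, 20, 25, 12, 20, 5, 13)
)

# ave() applies max() within each Tree and repeats the result on every row
trees$MaxAge <- ave(trees$Age, trees$Tree, FUN = max)

# Same thresholds as the nested ifelse() above
trees$Category <- ifelse(trees$MaxAge < 40, "young",
                         ifelse(trees$MaxAge > 120, "old", "mid-age"))

print(trees)
```

Keeping MaxAge as a column makes it easy to check which value drove each tree's category; it can be dropped afterwards with trees$MaxAge <- NULL.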