I want to add a column to my dataframe based on information in two columns of the dataframe.
In my example dataframe, I have two lines of entries for one sample, which are rows 3 & 4. I want to code that writes a new column "main" and fills in "1" each row that has a unique tag number. For rows with duplicate tags numbers, I need the row with the highest weight to be "1" in main, and all other rows to be filled with "0".
df
sp weight tag
1 green 70 1
2 yellow 63 2
3 red 41 3
4 red 25 3
5 red 9 3
df with "main" column added
sp weight tag main
1 green 70 1 1
2 yellow 63 2 1
3 red 41 3 1
4 red 25 3 0
5 red 9 3 0
Here's what I have so far:
df$is.uniq <- duplicated(df$tag) | duplicated(df$tag), fromLast = TRUE)
df$main <- ifelse(is.uniq==TRUE, "1", ifelse(is.uniq==FALSE, "0", NA ))
I know i need to change the second ifelse statement to reference the weight column and fill 1 for the greatest weight and 0 for all else, but I haven't figured out how to do that yet.
We can create a group by operation and create the binary on a logical condition with the max
of 'weight'
library(dplyr)
df %>%
group_by(sp) %>%
mutate(main = +(weight == max(weight)))
-output
# A tibble: 5 x 4
# Groups: sp [3]
# sp weight tag main
# <chr> <int> <int> <int>
#1 green 70 1 1
#2 yellow 63 2 1
#3 red 41 3 1
#4 red 25 3 0
#5 red 9 3 0
Or in base R
an option is to first order
the data by 'weight' in descending order and then apply the duplicated
dfnew <- df[order(df$sp, -df$weight),]
dfnew$main <- +(!duplicated(dfnew$sp))
df <- structure(list(sp = c("green", "yellow", "red", "red", "red"),
weight = c(70L, 63L, 41L, 25L, 9L), tag = c(1L, 2L, 3L, 3L,
3L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5"))