Say that I have these data:
data <- tibble(person=c("Jack", "Jill", "Bill"), pet=c("dog", "cat", "zebra"), pet_cat=c(0,1,0), pet_dog=c(0,1,1), pet_llama=c(1,1,1))
person pet pet_cat pet_dog pet_llama
<chr> <chr> <dbl> <dbl> <dbl>
1 Jack dog 0 0 1
2 Jill cat 1 1 1
3 Bill zebra 0 1 1
What I want to do is, for each person, to first find out which pet they have (Jack has a dog) and then go to the column that includes this pet (for Jack, this is the pet_dog
column). And then make a new column, match
, that copies the value from the pet_
column (for Jack, this is 0
because pet_dog
's value for Jack
is 0). Not sure if this is necessary, but one way to think about this is that pet
refers to the person
's preference for a pet and the pet_
columns refer to the availability of the pet in the pet store, and match
tells if the person is able to buy the pet of their preference.
Furthermore, in some cases, there will not be a pet_
column matching the person
's pet
. In these cases, the match
should be 0
.
Note that for one person, say Jill
, the other people's values are totally irrelevant, as are the pet_
columns that do not correspond to Jill
's pet
value.
This is the desired result:
data_want <- tibble(person=c("Jack", "Jill", "Bill"), pet=c("dog", "cat", "zebra"), pet_cat=c(0,1,0), pet_dog=c(0,1,1), pet_llama=c(1,1,1), match=c(0, 1, 0))
person pet pet_cat pet_dog pet_llama match
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Jack dog 0 0 1 0
2 Jill cat 1 1 1 1
3 Bill zebra 0 1 1 0
How can I do this?
Note that I think it will involve putting the person's pet
value into a variable using something like get
or assign
combined with paste0
, then going to this column and pulling the relevant value and putting it in match
.
An option with tidyverse
pivot_longer
value
is 1str_detect
any
TRUE in 'match', coerce it to binary (+
)right_join
row_number()
) and order the rows (arrange
)library(dplyr)
library(tidyr)
library(stringr)
data %>%
pivot_longer(cols = contains('_')) %>%
filter(value == 1) %>%
mutate(match = str_detect(name, pet)) %>%
group_by(pet) %>%
summarise(match = +(any(match))) %>%
right_join(data %>%
mutate(rn = row_number())) %>%
arrange(rn) %>%
select(names(data), match)
-output
# A tibble: 3 x 6
person pet pet_cat pet_dog pet_llama match
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 Jack dog 0 0 1 0
2 Jill cat 1 1 1 1
3 Bill zebra 0 1 1 0
Or may use rowwise
rowwise
attribute - grouped by rowc_across
to create a logical vector i.e. where value is 1starts_with
'pet_'`,str_remove
str_c
ungroup
) and use str_detect
to detect if the 'pet' column values have a match with regex pattern createddata %>%
rowwise %>%
mutate(match = str_c(str_remove(names(select(cur_data(),
contains('_')))[c_across(contains("_")) == 1], ".*_"),
collapse="|")) %>%
ungroup %>%
mutate(match = +(str_detect(pet, match)))
# A tibble: 3 x 6
person pet pet_cat pet_dog pet_llama match
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 Jack dog 0 0 1 0
2 Jill cat 1 1 1 1
3 Bill zebra 0 1 1 0
Or using base R
match
ing the substring of column name with 'pet' columnmatrix
nm1 <- names(data)[startsWith(names(data), "pet_")]
data$match <- as.data.frame(data[nm1])[cbind(seq_len(nrow(data)),
match(data$pet, sub("pet_", "", nm1)))]
data$match[is.na(data$match)] <- 0
-output
data
# A tibble: 3 x 6
person pet pet_cat pet_dog pet_llama match
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Jack dog 0 0 1 0
2 Jill cat 1 1 1 1
3 Bill zebra 0 1 1 0