For demonstrative purposes, I use a tidytuesday
dataset called animal_outcomes
.
My problem: I have several numeric columns in a tibble
. I want to mutate
a new column that sums all the columns (except the last one) and if the sum is equal to the last one the new column is 1 else 0. I'll explain further:
# Adding the example dataset
data <- tidytuesdayR::tt_load(x = "2020-07-21")
data <- data$animal_outcomes
Now the data is like this:
> data$animal_outcomes
# A tibble: 664 × 12
year animal_type outcome ACT NSW NT QLD SA TAS VIC WA Total
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1999 Dogs Reclaimed 610 3140 205 1392 2329 516 7130 1 15323
2 1999 Dogs Rehomed 1245 7525 526 5489 1105 480 4908 137 21415
3 1999 Dogs Other 12 745 955 860 380 168 1001 6 4127
4 1999 Dogs Euthanized 360 9221 9 9214 1701 599 5217 18 26339
5 1999 Cats Reclaimed 111 201 22 206 157 31 884 0 1612
6 1999 Cats Rehomed 1442 3913 269 3901 1055 752 3768 62 15162
7 1999 Cats Other 0 447 0 386 46 124 1501 5 2509
8 1999 Cats Euthanized 1007 8205 847 10554 3415 1056 6113 5 31202
9 1999 Horses Reclaimed 0 0 1 0 2 1 87 0 91
10 1999 Horses Rehomed 1 12 3 3 10 0 19 0 48
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows
I want to add a column that checks if the Total
column is indeed a sum of all columns. Here is the outcome in my mind:
# A tibble: 664 × 13
year animal_type outcome ACT NSW NT QLD SA TAS VIC WA Total condition # notice this last column
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1999 Dogs Reclaimed 610 3140 205 1392 2329 516 7130 1 15323 1
2 1999 Dogs Rehomed 1245 7525 526 5489 1105 480 4908 137 21415 1
3 1999 Dogs Other 12 745 955 860 380 168 1001 6 4127 1
4 1999 Dogs Euthanized 360 9221 9 9214 1701 599 5217 18 26339 1
5 1999 Cats Reclaimed 111 201 22 206 157 31 884 0 1612 1
6 1999 Cats Rehomed 1442 3913 269 3901 1055 752 3768 62 15162 1
7 1999 Cats Other 0 447 0 386 46 124 1501 5 2509 1
8 1999 Cats Euthanized 1007 8205 847 10554 3415 1056 6113 5 31202 1
9 1999 Horses Reclaimed 0 0 1 0 2 1 87 0 91 1
10 1999 Horses Rehomed 1 12 3 3 10 0 19 0 48 1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows
I tried the following code. It works but requires a lot of keystrokes, therefore, it will not work great if you have many columns:
> data$animal_outcomes %>%
mutate(condition = if_else((ACT + NSW + NT + QLD + SA + TAS + VIC + WA) == Total, 1, 0))
# A tibble: 664 × 13
year animal_type outcome ACT NSW NT QLD SA TAS VIC WA Total condition
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1999 Dogs Reclaimed 610 3140 205 1392 2329 516 7130 1 15323 1
2 1999 Dogs Rehomed 1245 7525 526 5489 1105 480 4908 137 21415 1
3 1999 Dogs Other 12 745 955 860 380 168 1001 6 4127 1
4 1999 Dogs Euthanized 360 9221 9 9214 1701 599 5217 18 26339 1
5 1999 Cats Reclaimed 111 201 22 206 157 31 884 0 1612 1
6 1999 Cats Rehomed 1442 3913 269 3901 1055 752 3768 62 15162 1
7 1999 Cats Other 0 447 0 386 46 124 1501 5 2509 1
8 1999 Cats Euthanized 1007 8205 847 10554 3415 1056 6113 5 31202 1
9 1999 Horses Reclaimed 0 0 1 0 2 1 87 0 91 1
10 1999 Horses Rehomed 1 12 3 3 10 0 19 0 48 1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows
I also used the following but it returned an error:
data$animal_outcomes %>%
mutate(condition = if_else((ACT + NSW + NT + QLD + SA + TAS + VIC + WA) == Total, 1, 0))
Also, this one (which obviously is wrong because it sums the actual numbers 4:11
):
data$animal_outcomes %>%
mutate(condition = if_else(sum(4:11) == Total, 1,0))
AND THIS ONE:
I am not sure why sum(ACT:WA)
does not return an error! And if not returning an error, what it is actually summing!!
data$animal_outcomes %>%
mutate(condition = if_else(sum(ACT:WA) == Total, 1,0))
# A tibble: 664 × 13
year animal_type outcome ACT NSW NT QLD SA TAS VIC WA Total condition
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1999 Dogs Reclaimed 610 3140 205 1392 2329 516 7130 1 15323 0
2 1999 Dogs Rehomed 1245 7525 526 5489 1105 480 4908 137 21415 0
3 1999 Dogs Other 12 745 955 860 380 168 1001 6 4127 0
4 1999 Dogs Euthanized 360 9221 9 9214 1701 599 5217 18 26339 0
5 1999 Cats Reclaimed 111 201 22 206 157 31 884 0 1612 0
6 1999 Cats Rehomed 1442 3913 269 3901 1055 752 3768 62 15162 0
7 1999 Cats Other 0 447 0 386 46 124 1501 5 2509 0
8 1999 Cats Euthanized 1007 8205 847 10554 3415 1056 6113 5 31202 0
9 1999 Horses Reclaimed 0 0 1 0 2 1 87 0 91 0
10 1999 Horses Rehomed 1 12 3 3 10 0 19 0 48 0
You can try this:
data %>% rowwise() %>% mutate(check = 1*(Total == sum(c_across(ACT:WA))))
Output:
year animal_type outcome ACT NSW NT QLD SA TAS VIC WA Total check
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1999 Dogs Reclaimed 610 3140 205 1392 2329 516 7130 1 15323 1
2 1999 Dogs Rehomed 1245 7525 526 5489 1105 480 4908 137 21415 1
3 1999 Dogs Other 12 745 955 860 380 168 1001 6 4127 1
4 1999 Dogs Euthanized 360 9221 9 9214 1701 599 5217 18 26339 1
5 1999 Cats Reclaimed 111 201 22 206 157 31 884 0 1612 1
6 1999 Cats Rehomed 1442 3913 269 3901 1055 752 3768 62 15162 1
7 1999 Cats Other 0 447 0 386 46 124 1501 5 2509 1
8 1999 Cats Euthanized 1007 8205 847 10554 3415 1056 6113 5 31202 1
9 1999 Horses Reclaimed 0 0 1 0 2 1 87 0 91 1
10 1999 Horses Rehomed 1 12 3 3 10 0 19 0 48 1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows