My dataframe:
structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
Observation Topic Gamma
1 Apple 1 0.1
2 Blueberry 2 0.1
3 Cirtus 3 0.2
4 Dates 4 0.2
5 Eggplant 5 0.1
How can I tell R to add the values of 1, 3, and 5, and 2 and 4, and then save it in a new column? For example:
Observation | Topic | Gamma | new variable |
---|---|---|---|
Apple | 1 | .10 | .40 |
Blueberry | 2 | .10 | .30 |
Cirtus | 3 | .20 | .40 |
Dates | 4 | .20 | .30 |
Eggplant | 5 | .10 | .40 |
Essentially, I'd like each observation to have a new value that sums up the gamma scores of topics 1, 3, and 5, as well as topics 2 and 4.
Update: Clarification: I am not trying to add even topic numbers or odd topic numbers. Sometimes it will be a mixture of both. See this new table as an example:
Observation | Topic | Gamma | new variable |
---|---|---|---|
Apple | 1 | .10 | .10 |
Blueberry | 2 | .10 | .70 |
Cirtus | 3 | .20 | .40 |
Dates | 4 | .20 | .40 |
Eggplant | 5 | .10 | .70 |
Fruits | 6 | .50 | .70 |
In this example, I left topic 1 alone, added topics 2, 5, and 6, and added topics 3 and 4.
Update: Clarification:
Observation | Topic | Gamma | new variable |
---|---|---|---|
Apple | 1 | .10 | .10 |
Apple | 2 | .10 | .70 |
Apple | 3 | .20 | .40 |
Apple | 4 | .20 | .40 |
Apple | 5 | .10 | .70 |
Apple | 6 | .50 | .70 |
Blueberry | 1 | .20 | .20 |
Blueberry | 2 | .10 | .60 |
Blueberry | 3 | .30 | .80 |
Blueberry | 4 | .50 | .80 |
Blueberry | 5 | .40 | .60 |
Blueberry | 6 | .10 | .60 |
In this example, Each fruit (observation) has their own set of values for each topic and I summed the same topics as listed above (2, 5, and 6, 3 and 4) per fruit.
Update II on new request:
library(dplyr)
df %>%
group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Apple 2 0.1 0.7
3 Apple 3 0.2 0.4
4 Apple 4 0.2 0.4
5 Apple 5 0.1 0.7
6 Apple 6 0.5 0.7
7 Blueberry 1 0.2 0.2
8 Blueberry 2 0.1 0.6
9 Blueberry 3 0.3 0.8
10 Blueberry 4 0.5 0.8
11 Blueberry 5 0.4 0.6
12 Blueberry 6 0.1 0.6
Update: on new request of OP. This solution is inspired fully by PaulS solution (credits to him):
library(dplyr)
df %>%
group_by(grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Blueberry 2 0.1 0.7
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.4
5 Eggplant 5 0.1 0.7
6 Fruits 6 0.5 0.7
First answer:
We could sum Gamma
after identifying odd and even rows in an ifelse statement:
In this case row_number
could be replaced by Topic
library(dplyr)
df %>%
mutate(new_variable = ifelse(row_number() %% 2 == 1,
sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
sum(Gamma[row_number() %% 2 == 0])) # even 2,4
)
Observation Topic Gamma new_variable
1 Apple 1 0.1 0.4
2 Blueberry 2 0.1 0.3
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.3
5 Eggplant 5 0.1 0.4
data:
structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
Microbenchmark: AndrewGB's base R is fastest