I feel like my problem is an easy one to solve, however, I can't seem to figure it out.
I want to combine multiple rows that belong to the same group so that per group there is one row. This row has the sum over the rows for some variables and the mean for other variables. In the example, I only included variable treatment
of which I need the sum across the rows of each group episode
.
Df <- data.frame(country = c("A", "A", "A", "A", "A", "B","B", "B", "B"),
year = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"),
time1 = c("1950", "1951", "1951", "1953", "1954", "1950", "1951", "1952", "1952"),
time2 = c("1951", "1953", "1953", "1954", "1955", "1951", "1952", "1954", "1954"),
episode = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
status = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10))
Df2 <- data.frame(country = c("A", "A", "A", "A", "B", "B", "B"),
time1 = c("1950", "1951", "1953", "1954", "1950", "1951", "1952"),
time2 = c("1951", "1953", "1954", "1955", "1951", "1952", "1954"),
episode = c("1", "2", "3", "4", "1", "2", "3"),
status = c(0, 1, 0, 1, 1, 0, 1),
treatment = c(10, 20, 5, 0, 0, 30, 110))
Any ideas on how to solve this?
Update: After clarification see comments:
treatment
is character
. With type.convert(as.is=TRUE)
we transform it to integer
.summarise
with sum
library(dplyr)
Df %>%
type.convert(as.is=TRUE) %>%
group_by(country, time1, time2, episode, status) %>%
summarise(treatment = sum(treatment, na.rm = TRUE))
country time1 time2 episode status treatment
<chr> <int> <int> <int> <int> <int>
1 A 1950 1951 1 0 10
2 A 1951 1953 2 1 20
3 A 1953 1954 3 0 5
4 A 1954 1955 4 1 0
5 B 1950 1951 1 1 0
6 B 1951 1952 2 0 30
7 B 1952 1954 3 1 110
First answer:
library(dplyr)
Df %>%
type.convert(as.is=TRUE) %>%
group_by(episode) %>%
summarise(sumTreatment=sum(treatment, na.rm = TRUE))
episode sumTreatment
<int> <int>
1 1 10
2 2 50
3 3 115
4 4 0