My question is very similar to "Applying group_by and summarise on data while keeping all the columns' info", but I would like to keep the columns that get excluded because their values conflict after grouping.
Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a")
Type <- c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip",
"formula","formuladelip","wholefish", "wholefishdelip")
Proportion <- c(1,1,0.67714,0.67714,0.32285,0.32285,0.32285,
0.32285, 0.67714,0.67714)
N <- (1:10)
C <- (1:10)
Code <- c("c","a","a","b","a","b","c","d","c","d")
df <- data.frame(Label,Type, Proportion, N, C, Code)
df
Label Type Proportion N C Code
1 203c wholefish 1.0000 1 1 c
2 203c flesh 1.0000 2 2 a
3 204a flesh 0.6771 3 3 a
4 204a fleshdelip 0.6771 4 4 b
5 204a formula 0.3228 5 5 a
6 204a formuladelip 0.3228 6 6 b
7 204a formula 0.3228 7 7 c
8 204a formuladelip 0.3228 8 8 d
9 204a wholefish 0.6771 9 9 c
10 204a wholefishdelip 0.6771 10 10 d
library(dplyr)
total <- df %>%
#where the Label and Code are the same the Proportion, N and C
#should be added together respectively
group_by(Label, Code) %>%
#total proportion should add up to 1
#my way of checking that the correct task has been completed
summarise_if(is.numeric, sum)
# A tibble: 6 x 5
# Groups: Label [?]
Label Code Proportion N C
<fctr> <fctr> <dbl> <int> <int>
1 203c a 1.00000 2 2
2 203c c 1.00000 1 1
3 204a a 0.99999 8 8
4 204a b 0.99999 10 10
5 204a c 0.99999 16 16
6 204a d 0.99999 18 18
Up to this point I get what I want. Now I would like to include the column Type, which is dropped because its values conflict within each group. This is the result I would like to obtain:
# A tibble: 6 x 6
# Groups: Label [?]
Label Code Proportion N C Type
<fctr> <fctr> <dbl> <int> <int> <fctr>
1 203c a 1.00000 2 2 flesh
2 203c c 1.00000 1 1 wholefish
3 204a a 0.99999 8 8 flesh_formula
4 204a b 0.99999 10 10 fleshdelip_formuladelip
5 204a c 0.99999 16 16 wholefish_formula
6 204a d 0.99999 18 18 wholefishdelip_formuladelip
I have tried ungroup() and some variations of mutate and unite, but to no avail. Any suggestions would be greatly appreciated.
Here's the data.table solution. I'm assuming you want the mean() of Proportion, since these grouped proportions are likely not additive.
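library(data.table)
setDT(df)  # convert df to a data.table by reference
# collapse Type per group, average Proportion, sum N and C
df[, .(Type = paste(Type, collapse = "_"),
       Proportion = mean(Proportion), N = sum(N), C = sum(C)),
   by = .(Label, Code)][order(Label)]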
Label Code Type Proportion N C
1: 203c c wholefish 1.000000 1 1
2: 203c a flesh 1.000000 2 2
3: 204a a flesh_formula 0.499995 8 8
4: 204a b fleshdelip_formuladelip 0.499995 10 10
5: 204a c formula_wholefish 0.499995 16 16
6: 204a d formuladelip_wholefishdelip 0.499995 18 18
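To make the difference between the two aggregations concrete, here is a small base R check on the two Proportion values of the (204a, a) group from the question's data. The variable names p, total_p and mean_p are only illustrative.

```r
# Proportion values for Label "204a", Code "a" (rows 3 and 5 of df)
p <- c(0.67714, 0.32285)

total_p <- sum(p)   # what the question asks for: should be close to 1
mean_p  <- mean(p)  # what the data.table call above computes

print(total_p)
print(mean_p)
```

If the proportions are meant to add up to 1 within each group, as the question states, swapping mean(Proportion) for sum(Proportion) in the call above should give that.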
I'm not sure this is the cleanest dplyr solution, but it works:
df %>% group_by(Label, Code) %>%
mutate(Type = paste(Type,collapse="_")) %>%
group_by(Label,Type,Code) %>%
summarise(N=sum(N),C=sum(C),Proportion=mean(Proportion))
Note that the key here is to re-group once you have created the combined Type column.
Label Type Code N C Proportion
<fctr> <chr> <fctr> <int> <int> <dbl>
1 203c flesh a 2 2 1.000000
2 203c wholefish c 1 1 1.000000
3 204a flesh_formula a 8 8 0.499995
4 204a fleshdelip_formuladelip b 10 10 0.499995
5 204a formula_wholefish c 16 16 0.499995
6 204a formuladelip_wholefishdelip d 18 18 0.499995
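A possibly shorter variant, offered as a sketch rather than the definitive approach: since paste(..., collapse = "_") already returns one value per group, Type can be built inside a single summarise() call, with no mutate and no re-grouping. The version below uses sum(Proportion), as in the question, and assumes dplyr 1.0.0 or later for the .groups argument. The name result is only illustrative, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Rebuild the question's data so the example is self-contained
df <- data.frame(
  Label = c("203c", "203c", rep("204a", 8)),
  Type = c("wholefish", "flesh", "flesh", "fleshdelip", "formula",
           "formuladelip", "formula", "formuladelip", "wholefish",
           "wholefishdelip"),
  Proportion = c(1, 1, 0.67714, 0.67714, 0.32285, 0.32285, 0.32285,
                 0.32285, 0.67714, 0.67714),
  N = 1:10,
  C = 1:10,
  Code = c("c", "a", "a", "b", "a", "b", "c", "d", "c", "d")
)

result <- df %>%
  group_by(Label, Code) %>%
  summarise(
    Type = paste(Type, collapse = "_"),  # one combined string per group
    Proportion = sum(Proportion),        # sum, as the question asks
    N = sum(N),
    C = sum(C),
    .groups = "drop"                     # return an ungrouped tibble
  )

print(result)
```

Swap sum(Proportion) for mean(Proportion) to reproduce the averaged values shown above.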