Let's say I have this dataframe:
> df <- data.frame(a1=c(1, 2, 3, 4, 5), b1=c(5, 4, 3, 2, 1), b2=c(10, 11, 12, 13, 14), a2=c(5, 6, 7, 8, 9), c1=c(100, 200, 300, 400, 500))
> df
a1 b1 b2 a2 c1
1 1 5 10 5 100
2 2 4 11 6 200
3 3 3 12 7 300
4 4 2 13 8 400
5 5 1 14 9 500
>
As you can see, I have columns a1
a2
that both start with a
, I want to sum them row-wise and generate just one column of a
, same for column b
. Also for c
column, just the c
column should stay the same as it is since there are only one column that starts with c
.
Desired output:
a b c
1 6 15 100
2 8 15 200
3 10 15 300
4 12 15 400
5 14 15 500
Coming from a Python, in pandas (Python Library), I could just simply use:
df.groupby(df.columns.str[:1], axis=1).sum()
But I am not sure how to do this in R.
I tried dplyr
:
df %>% group_by(sub(".$", "", colnames(df))) %>%
mutate(across(colnames(.), sum))
But an error popped up. New to R.
as.data.frame(lapply(unique(sapply(names(df), function(x) grep(substr(x,1,1), names(df)))), function(y) if(is.na(y[2])) df[[y[1]]] else df[[y[1]]] + df[[y[2]]]))
c.6..8..10..12..14. c.15..15..15..15..15. c.100..200..300..400..500.
1 6 15 100
2 8 15 200
3 10 15 300
4 12 15 400
5 14 15 500
You can then set names using:
unique(substr(names(df),1,1))
[1] "a" "b" "c"
As commented:
sapply(unique(sub(".$", "", colnames(df))), function(x) rowSums(df[startsWith(colnames(df), x)]))