I have an issue that looks easy to solve, but I'm stuck. I have a data frame whose columns are significant pathways retrieved from GSEA and whose rows are Entrez gene IDs. Each cell is 1 if the gene is present in the pathway and 0 if it is not. This is my data frame:
Path_A Path_B Path_C
Gene_1 0 1 0
Gene_2 1 1 0
Gene_3 0 0 1
Gene_4 1 1 1
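To make the example reproducible, here is a minimal sketch that rebuilds the table above. It assumes the gene IDs are stored as row names (as the printed output suggests) and that the pathway columns are plain numeric 0/1 values; the name df1 is chosen to match the answer below.

```r
# Rebuild the example: genes as row names, one 0/1 column per pathway
df1 <- data.frame(
  Path_A = c(0, 1, 0, 1),
  Path_B = c(1, 1, 0, 1),
  Path_C = c(0, 0, 1, 1),
  row.names = c("Gene_1", "Gene_2", "Gene_3", "Gene_4")
)
df1
```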
I want to sum each row (gene) to count how many distinct pathways the gene appears in, and then replace each 1 in that row with that count, to get something like this:
Path_A Path_B Path_C
Gene_1 0 1 0
Gene_2 2 2 0
Gene_3 0 0 1
Gene_4 3 3 3
So far I have tried
my_df <- mutate(my_df, sum = rowSums(my_df))
to create a new column, sum, intending to then replace the 1s in each pathway column with that value; however, I couldn't get it to work.
Thanks in advance
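A minimal base R sketch of the idea described in the question (not necessarily the exact code akrun posted): rowSums() already gives the per-gene count, so instead of recoding, multiply each 0/1 column by that count. Because every cell is 0 or 1, 1 × count gives the count and 0 × count stays 0. This assumes all columns are pathway columns and the gene IDs are row names; if a sum column has already been added to my_df, it would have to be excluded first.

```r
# Hypothetical reconstruction of the example data (genes as row names)
my_df <- data.frame(
  Path_A = c(0, 1, 0, 1),
  Path_B = c(1, 1, 0, 1),
  Path_C = c(0, 0, 1, 1),
  row.names = paste0("Gene_", 1:4)
)

# Number of pathways each gene belongs to: one value per row
gene_counts <- rowSums(my_df)

# A data frame times a vector of length nrow() recycles the vector down each
# column, so every 1 becomes its row's count and every 0 stays 0
result <- my_df * gene_counts
result
```

Computing the row sums once and reusing them avoids recalculating the total for every pathway column.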
You could use dplyr, although the base R solution akrun posted is more reasonable:
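library(dplyr)

# For each pathway column, multiply its 0/1 values by the row total,
# so a 1 becomes the number of pathways that contain that gene
df1 %>%
  mutate(across(Path_A:Path_C, ~ .x * rowSums(across(Path_A:Path_C))))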
This returns:
Path_A Path_B Path_C
Gene_1 0 1 0
Gene_2 2 2 0
Gene_3 0 0 1
Gene_4 3 3 3
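The dplyr version only touches Path_A:Path_C, which matters if the gene IDs live in an ordinary column rather than in the row names: rowSums() on the whole data frame would then fail because that column is not numeric. A base R sketch of the same column-restricted idea, assuming a hypothetical gene column and that every pathway column name starts with "Path_":

```r
# Hypothetical example: gene IDs kept in an ordinary column instead of row names
df1 <- data.frame(
  gene   = paste0("Gene_", 1:4),
  Path_A = c(0, 1, 0, 1),
  Path_B = c(1, 1, 0, 1),
  Path_C = c(0, 0, 1, 1)
)

# Select only the pathway columns (assumes they share the "Path_" prefix)
path_cols <- startsWith(names(df1), "Path_")

# Scale each 0/1 pathway column by its row's total; the gene column is untouched
df1[path_cols] <- df1[path_cols] * rowSums(df1[path_cols])
df1
```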