R - adding columns together, depending on column name

I have a dataframe (df1) that includes abundances of different species in each sample:

> SampleID   Sp1   Sp2   Sp3   Sp4   ... Spn
> asb-001      3     0     0    23         9
> asb-002      4    15    10    56        98
> asb-003      8    45     8   453         0
> asb-004      0     5     0     3         6
> asb-005    120    56     0     0         0
...

Each column represents a different species.

I have another dataframe (df2)

Sp     Fam
Sp1   Fam1
Sp2   Fam2
Sp3   Fam1
Sp4   Fam3
Sp5   Fam2
Sp6   Fam1
...

There are fewer Family names than there are Species names.

I would like to add columns together depending on the species name to get a total for the family (e.g. Sp1 + Sp3 + Sp6 = Fam1). I don't want to keep the original column with the species name.

If all goes according to plan, my new dataframe (df3) will look something like this:

> SampleID  Fam1  Fam2  Fam3 
> asb-001     12     0     9 
> asb-002     14    18   112 
> asb-003     28    58    18
> asb-004     10    12    10
> asb-005    142    65     0
...

I could manually go through and add individual species together, but this seems tedious for a large dataset, and likely to produce errors. I have the feeling that I should melt df1, but I'm not confident in the details. Any advice would be appreciated!

Solution

Change your original data to be in the long format and join against the species-family mapping. You can then compute summation for each family using group_by. Finally, spread the data back out to be in wide format.

library( tidyverse )

df1 %>% gather( Sp, Value, -SampleID ) %>%     # Convert to long format
  inner_join( df2 ) %>%                        # Combine with family mapping
  group_by( SampleID, Fam ) %>%                # Work on each sample/family pair
  summarize( ValSum = sum(Value) ) %>%         # Compute the sum across species
  ungroup %>% spread( Fam, ValSum, fill=0 )    # Convert back to wide format

Depending on whether each sample is represented in each family, you may get NA after converting back to wide format. The optional parameter fill=0 takes care of converting these NA to 0.