r sum dataframe aggregate subtotal

Aggregating (subtotals) in data frame with multiple factor (character) variables


I have a table (data.frame) with numerical data and several character (factor) variables (e.g. 'species', 'Fam_name', 'gear'). I want to calculate the subtotals (sums) of the 'weight' and 'number' variables for each 'ss'.

I have tried using the 'aggregate' function, but I have failed to get it to return the character value for the 'gear' variable.

Below is the head of my table:

   survey station         ss species weight number bdep      lon      lat                       Sci_name       Fam_name gear
1 2012901       1 2012901001 CARSC04  11.20     20   23 37.61650 19.14900        Scomberoides lysan     CARANGIDAE   TB
2 2012901       1 2012901001 SCMGR02   0.98      2   23 37.61650 19.14900 Grammatorcynus bilineatus     SCOMBRIDAE   TB
3 2012901       2 2012901002 NOCATCH   0.00      0    6 38.48333 18.71667                  NO CATCH       NO CATCH   TB
4 2012901       3 2012901003 LUTLU06   5.65      1    6 38.48333 18.71667            Lutjanus bohar     LUTJANIDAE   TB
5 2012901       3 2012901003 SHACAB1   4.00      1    6 38.48333 18.71667         Triaenodon obesus CARCHARHINIDAE   TB
6 2012901       4 2012901004 NOCATCH   0.00      0    9 38.48333 18.71667                  NO CATCH       NO CATCH   TB

I tried the following code, with the intent of combining the two results using bind:

catch1<-aggregate(cbind(weight, number) ~ ss, data = catch, FUN = sum) 

catch2<-aggregate(cbind(survey, station, bdep, lon, lat, gear) ~ ss, data = catch, FUN=median) 

The first line does what I want (it sums 'weight' and 'number' for each 'ss'), but the second returns a numerical median for 'gear', whereas I want it to return the 'gear' code for that particular 'ss'.
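The behaviour described above can be reproduced on a small toy data frame. This is only a sketch: the values, and the second gear code "HL", are made up for illustration and are not taken from the real table. It assumes 'gear' is stored as a factor.

```r
# Toy data (hypothetical values; "HL" is an invented second gear code)
catch <- data.frame(
  ss     = c(1, 1, 2, 3, 3),
  number = c(20, 2, 0, 1, 1),
  gear   = factor(c("TB", "TB", "HL", "TB", "TB"))
)

# cbind() drops the factor class and keeps only the internal integer codes
# (levels are sorted alphabetically, so "HL" is 1 and "TB" is 2)
codes <- cbind(catch$number, catch$gear)
is.numeric(codes)

# median() is therefore applied to the codes, not to the labels
catch2 <- aggregate(cbind(number, gear) ~ ss, data = catch, FUN = median)
catch2$gear
```

The 'gear' column of catch2 holds numbers here, which matches the symptom in the question.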

Reconstruction of the 'gear' factor (thanks to BrodieG):

catch2$gear <- factor(levels(catch$gear)[catch2$gear], levels=levels(catch$gear))

Problem solved :-)


Solution

  • Your problem is that gear is a factor, so median is returning the median of the factor's underlying integer codes rather than its labels. Try:

    catch2$gear <- factor(levels(catch$gear)[catch2$gear], levels=levels(catch$gear))
    

    or something like it to reconstruct the factor for catch2.
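A minimal end-to-end sketch of this fix, using made-up toy data (the values and the "HL" gear code are hypothetical). It assumes every 'ss' has a single gear; if gears were mixed within an 'ss', the median of the codes could be fractional and the lookup would silently truncate it. The last two lines show one possible alternative that avoids the coercion by taking the station-level columns with unique() instead of median; that alternative is my assumption, not part of the answer above.

```r
# Toy data (hypothetical values, not the asker's table)
catch <- data.frame(
  ss     = c(1, 1, 2, 3, 3),
  weight = c(11.20, 0.98, 0, 5.65, 4.00),
  number = c(20, 2, 0, 1, 1),
  bdep   = c(23, 23, 6, 6, 6),
  gear   = factor(c("TB", "TB", "HL", "TB", "TB"))
)

# Subtotals per ss, and per-ss medians (gear comes back as integer codes)
catch1 <- aggregate(cbind(weight, number) ~ ss, data = catch, FUN = sum)
catch2 <- aggregate(cbind(bdep, gear) ~ ss, data = catch, FUN = median)

# Map the codes back to their labels, keeping the original level set
catch2$gear <- factor(levels(catch$gear)[catch2$gear],
                      levels = levels(catch$gear))

# Combine the two summaries on the shared key
result <- merge(catch1, catch2, by = "ss")

# Alternative: take the per-ss constant columns directly, no median needed
station_info <- unique(catch[, c("ss", "bdep", "gear")])
result2 <- merge(catch1, station_info, by = "ss")
```

Both result and result2 should then contain one row per 'ss' with the summed 'weight' and 'number' and the 'gear' label as a factor.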