Search code examples
rvariablessumextractr-factor

How to sum a variable and extract it related variable to appear in the answer


I am new to R and to Stackoverflow and I need an assistant in sorting and extracting information from a data frame I created. I need to extract which IATA and NAME has received the most commission. The result should print: 3301, you pay, 12. I can subset each and every IATA but it is a long process. What will be the best function in R to sort all this information and print out this information.

  IATA          NAME TICKET_NUM            PAX FARE TAX COMM NET
3300      pay more        700     john cohen   10 1.1    2   8
3300      pay more        701     james levy   11 1.2    2   9
3300      pay more        702 jonathan arbel   12 1.2    3   9
3300      pay more        703      gil matan    9 1.0    2   7
3301      you pay         704      ron natan   19 2.0    6   9
3301      you pay         705    don horvitz   18 2.0    6   9
3302 pay by ticket        706  lutter kaplan    9 1.2    0   9
3303         enjoy        707   lutter omega   12 1.2    0  12
3303         enjoy        708   graig daniel   14 1.3    1  13
3303         enjoy        730 orly rotenberg   15 1.0    1  14
3303         enjoy        731     yohan bach   12 1.0    1  11

Solution

  • This seems to return what you requested (using Jeremy's code for the second part):

    comm <- read.table(text = '
    
    IATA          NAME TICKET_NUM            PAX FARE TAX COMM NET
    3300      pay.more        700     john.cohen   10 1.1    2   8
    3300      pay.more        701     james.levy   11 1.2    2   9
    3300      pay.more        702 jonathan.arbel   12 1.2    3   9
    3300      pay.more        703      gil.matan    9 1.0    2   7
    3301       you.pay        704      ron.natan   19 2.0    6   9
    3301       you.pay        705    don.horvitz   18 2.0    6   9
    3302 pay.by.ticket        706  lutter.kaplan    9 1.2    0   9
    3303         enjoy        707   lutter.omega   12 1.2    0  12
    3303         enjoy        708   graig.daniel   14 1.3    1  13
    3303         enjoy        730 orly.rotenberg   15 1.0    1  14
    3303         enjoy        731     yohan.bach   12 1.0    1  11
    
    ', header=TRUE, stringsAsFactors = FALSE)
    
    comm2 <- with(comm, aggregate(COMM ~ IATA + NAME, FUN = function(x) sum(x, na.rm = TRUE)))
    comm2
    
    max_comm <- comm2[comm2$COMM == max(comm2$COMM),]
    max_comm
    
      IATA    NAME COMM
    4 3301 you.pay   12
    

    Here is an explanation of the first statement:

    The with function identifies the data set to use (here comm). The function aggregate is a general function for performing operations on groups. You want to operate on COMM by IATA and NAME. You write that: COMM ~ IATA + NAME. Next you specify the desired function to perform on COMM (here sum). You do that with FUN = function(x) sum(x). In case there are any missing observations in COMM I added na.rm = TRUE within the sum(x) function.