Search code examples
stata

How do I get a sum of Var_c across Var_a and Var_b?


I am trying to find a sum across two variables.

If I have the following data:

Name   Commodity        Amount_cmdt

Alex       apple           5
Ben        orange          10
Chris      apple           25
Alex       orange          10
Alex       apple           10
Chris      orange          10
Ben        apple            5  

I want a final dataset looking like this:

Name   Commodity      Amount_cmdt       total_apple    total_orange

Alex       apple           5                   15              10
Ben        orange          10                  5               10
Chris      apple           25                  25              20
Alex       orange          10                  15              10
Alex       apple           10                  15              10
Chris      orange          10                  25              20
Ben        apple            5                   5              10 
Chris      orange          10                  25              20   

Ultimately, when I have the number of apples and oranges each person has, I can drop the duplicates. But how do I formulate the statement:

if name = Chris and Commodity = orange, then total_orange = sum(Amount_cmdt)?

I wrote the following, but it is summing all apples or all oranges irrespective of name:

foreach var of varlist Name {
    foreach var of varlist Commodity {
        replace total_apple = sum( Amount_cmdt) if Commodity == "apple"
        replace total_orange = sum( Amount_cmdt) if Commodity == "orange"
    }
}

list

Solution

  • Using your toy example:

    clear
    
    input strL(name commodity) amount total_apple total_orange
    Alex       apple           5                   15              10
    Ben        orange          10                  5               10
    Chris      apple           25                  25              20
    Alex       orange          10                  15              10
    Alex       apple           10                  15              10
    Chris      orange          10                  25              20
    Ben        apple            5                   5              10 
    Chris      orange          10                  25              20 
    end
    

    The following works for me:

    bysort name commodity: egen totals = total(amount)
    bysort name (commodity): generate totalapple = totals[1]
    bysort name (commodity): generate totalorange = totals[_N]
    
    list name commodity amount total_apple totalapple total_orange totalorange, abbreviate(15)
    
         +------------------------------------------------------------------------------------+
         |  name   commodity   amount   total_apple   totalapple   total_orange   totalorange |
         |------------------------------------------------------------------------------------|
      1. |  Alex       apple        5            15           15             10            10 |
      2. |  Alex       apple       10            15           15             10            10 |
      3. |  Alex      orange       10            15           15             10            10 |
      4. |   Ben       apple        5             5            5             10            10 |
      5. |   Ben      orange       10             5            5             10            10 |
         |------------------------------------------------------------------------------------|
      6. | Chris       apple       25            25           25             20            20 |
      7. | Chris      orange       10            25           25             20            20 |
      8. | Chris      orange       10            25           25             20            20 |
         +------------------------------------------------------------------------------------+
    

    EDIT:

    You can generalize this for more than two commodities as follows:

    clear
    
    input strL(name commodity) amount 
    Alex       apple           5     
    Ben        orange          10                 
    Chris      apricot         3
    Alex       apricot         4
    Ben        apricot         2
    Chris      apple           25         
    Alex       orange          10              
    Alex       apple           10         
    Chris      orange          10          
    Ben        apple            5             
    Chris      apricot         15
    Alex       apricot         6
    Chris      orange          10                
    end
    
    bysort name commodity: egen totals = total(amount)
    egen commodities = group(commodity)
    
    levelsof commodity, local(allcommodities) clean
    local i 0
    
    foreach var of local allcommodities {
        local ++i
        generate `var' = .
        bysort name (commodity): replace `var' = totals if commodities == `i'
        bysort name (commodity): egen total`var' = min(`var')
        drop `var'
    }
    
    drop commodities
    

    The modified code snippet will produce the desired output:

    list name commodity amount total*, abbreviate(15)
    
         +-------------------------------------------------------------------------------+
         |  name   commodity   amount   totals   totalapple   totalapricot   totalorange |
         |-------------------------------------------------------------------------------|
      1. |  Alex       apple        5       15           15             10            10 |
      2. |  Alex       apple       10       15           15             10            10 |
      3. |  Alex     apricot        6       10           15             10            10 |
      4. |  Alex     apricot        4       10           15             10            10 |
      5. |  Alex      orange       10       10           15             10            10 |
         |-------------------------------------------------------------------------------|
      6. |   Ben       apple        5        5            5              2            10 |
      7. |   Ben     apricot        2        2            5              2            10 |
      8. |   Ben      orange       10       10            5              2            10 |
      9. | Chris       apple       25       25           25             18            20 |
     10. | Chris     apricot        3       18           25             18            20 |
         |-------------------------------------------------------------------------------|
     11. | Chris     apricot       15       18           25             18            20 |
     12. | Chris      orange       10       20           25             18            20 |
     13. | Chris      orange       10       20           25             18            20 |
         +-------------------------------------------------------------------------------+