I am trying to find a sum across two variables.
If I have the following data:
Name Commodity Amount_cmdt
Alex apple 5
Ben orange 10
Chris apple 25
Alex orange 10
Alex apple 10
Chris orange 10
Ben apple 5
I want a final dataset looking like this:
Name Commodity Amount_cmdt total_apple total_orange
Alex apple 5 15 10
Ben orange 10 5 10
Chris apple 25 25 20
Alex orange 10 15 10
Alex apple 10 15 10
Chris orange 10 25 20
Ben apple 5 5 10
Chris orange 10 25 20
Ultimately, when I have the number of apples and oranges each person has, I can drop the duplicates. But how do I formulate the statement:
if name = Chris and Commodity = orange, then total_orange = sum(Amount_cmdt)?
I wrote the following, but it is summing all apples or all oranges irrespective of name:
foreach var of varlist Name {
foreach var of varlist Commodity {
replace total_apple = sum( Amount_cmdt) if Commodity == "apple"
replace total_orange = sum( Amount_cmdt) if Commodity == "orange"
}
}
list
Using your toy example:
clear
input strL(name commodity) amount total_apple total_orange
Alex apple 5 15 10
Ben orange 10 5 10
Chris apple 25 25 20
Alex orange 10 15 10
Alex apple 10 15 10
Chris orange 10 25 20
Ben apple 5 5 10
Chris orange 10 25 20
end
The following works for me:
bysort name commodity: egen totals = total(amount)
bysort name (commodity): generate totalapple = totals[1]
bysort name (commodity): generate totalorange = totals[_N]
list name commodity amount total_apple totalapple total_orange totalorange, abbreviate(15)
+------------------------------------------------------------------------------------+
| name commodity amount total_apple totalapple total_orange totalorange |
|------------------------------------------------------------------------------------|
1. | Alex apple 5 15 15 10 10 |
2. | Alex apple 10 15 15 10 10 |
3. | Alex orange 10 15 15 10 10 |
4. | Ben apple 5 5 5 10 10 |
5. | Ben orange 10 5 5 10 10 |
|------------------------------------------------------------------------------------|
6. | Chris apple 25 25 25 20 20 |
7. | Chris orange 10 25 25 20 20 |
8. | Chris orange 10 25 25 20 20 |
+------------------------------------------------------------------------------------+
EDIT:
You can generalize this for more than two commodities as follows:
clear
input strL(name commodity) amount
Alex apple 5
Ben orange 10
Chris apricot 3
Alex apricot 4
Ben apricot 2
Chris apple 25
Alex orange 10
Alex apple 10
Chris orange 10
Ben apple 5
Chris apricot 15
Alex apricot 6
Chris orange 10
end
bysort name commodity: egen totals = total(amount)
egen commodities = group(commodity)
levelsof commodity, local(allcommodities) clean
local i 0
foreach var of local allcommodities {
local ++i
generate `var' = .
bysort name (commodity): replace `var' = totals if commodities == `i'
bysort name (commodity): egen total`var' = min(`var')
drop `var'
}
drop commodities
The modified code snippet will produce the desired output:
list name commodity amount total*, abbreviate(15)
+-------------------------------------------------------------------------------+
| name commodity amount totals totalapple totalapricot totalorange |
|-------------------------------------------------------------------------------|
1. | Alex apple 5 15 15 10 10 |
2. | Alex apple 10 15 15 10 10 |
3. | Alex apricot 6 10 15 10 10 |
4. | Alex apricot 4 10 15 10 10 |
5. | Alex orange 10 10 15 10 10 |
|-------------------------------------------------------------------------------|
6. | Ben apple 5 5 5 2 10 |
7. | Ben apricot 2 2 5 2 10 |
8. | Ben orange 10 10 5 2 10 |
9. | Chris apple 25 25 25 18 20 |
10. | Chris apricot 3 18 25 18 20 |
|-------------------------------------------------------------------------------|
11. | Chris apricot 15 18 25 18 20 |
12. | Chris orange 10 20 25 18 20 |
13. | Chris orange 10 20 25 18 20 |
+-------------------------------------------------------------------------------+