I have data in Stata in this format:
Value | Treatment | Area |
---|---|---|
1.1 | 1 | 1 |
2.5 | 1 | 2 |
3.4 | 0 | 3 |
6.7 | 0 | 1 |
3.6 | 0 | 2 |
2.9 | 1 | 3 |
2.5 | 0 | 1 |
7.8 | 1 | 3 |
I want to get a sum of the values in "Value" by area and by treatment status. This part I was able to do using:
by Area Treatment, sort: egen desired_variable = total(Value)
What I want is for each row of the dataset, a sum of the values in "Value" for all the treated AND untreated observations in the area. The issue with this is that it gives me the total for Treatment = 1 and Treatment = 0 for each Area, depending on its own treatment status.
I also tried:
by Area, sort: egen desired_variable = total(Value) if treatment == 1
and
by Area, sort: egen desired_variable = total(Value) if treatment == 0
But this is a different version of the same problem, since in a row where the treatment is = 0, the first line of code will just give an NA, and vice versa with the second one.
This kind of calculation is sometimes needed:
clear
input float value byte(treatment area)
1.1 1 1
2.5 1 2
3.4 0 3
6.7 0 1
3.6 0 2
2.9 1 3
2.5 0 1
7.8 1 3
end
bysort area: egen total1 = total(cond(treatment == 1, value, .))
by area: egen total0 = total(cond(treatment == 0, value, .))
See also Section 9 in https://journals.sagepub.com/doi/pdf/10.1177/1536867X1101100210