I'm trying to generate different 'total count' variables by companyid & year.
One 'total count' for subs, and one total count for loans.
Basically I'm trying to extend this question: Stata: Calculate sum of any x in y?
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 companyid int year float sub_num double sub_amt float(sub_year_total loan_num) double loan_amt float loan_year_total
"001004" 1999 . 0 425000 . 0 0
"001004" 1999 2 425000 425000 . 0 0
"001004" 2004 . 0 0 . 0 0
"001004" 2005 1 4232000 4232000 . 0 0
"001004" 2006 1 16000000 1.60e+07 . 0 0
"001004" 2007 3 58354 182444 . 0 0
"001078" 2006 . 0 471529 . 0 0
"001078" 2006 . 0 471529 . 0 0
"001078" 2006 . 0 471529 . 0 0
"001078" 2006 6 29872 471529 . 0 0
"001078" 2006 6 59748 471529 . 0 0
"001078" 2006 6 381909 471529 . 0 0
"001078" 2007 . 0 768825 7 270000 2580000
"001078" 2007 . 0 768825 7 360000 2580000
"001078" 2007 . 0 768825 7 1500000 2580000
"001078" 2007 . 0 768825 7 450000 2580000
"001078" 2007 . 0 768825 . 0 2580000
"001078" 2007 7 359454 768825 . 0 2580000
"001078" 2007 7 409371 768825 . 0 2580000
"001078" 2008 . 0 1751832 5 450000 2450000
"001078" 2008 . 0 1751832 5 2000000 2450000
"001078" 2008 5 47957 1751832 . 0 2450000
"001078" 2008 5 485631 1751832 . 0 2450000
"001078" 2008 5 1218244 1751832 . 0 2450000
end
To note: If sub_num = 0 then loan_num != 0, and vice versa.
I've tried bysort cik year: gen sub_num = _N if loan_amt != 0
and bysort cik year loan_amt: gen sub_num = _N
but neither really does it. I've left my failed count variables in the examples for reference.
i.e. company #001078 in 2007 would have loan_num = 4 and sub_num = 2
I just noticed this example has one observation that has 0 for both, I can just eliminate entries that have 0 for both so no need to comment on that.
How can I make company total annual counts for my 'sub' and 'loan' variables?
This is a little hard to follow.
There is reference to cik
in your code but it is not in your data example.
It is hard to know what is original data and what is the result of calculations you have tried.
The example seems more complicated than necessary.
Although the title refers to sums, it is also clear that you are interested in counting loans of certain kinds.
A count is a sum of indicators, so this shows some technique rather than necessarily being an answer. Feed to egen, total()
a true-or-false expression and the result will be the count of observations for which the expression is true (1); arguments that are false (0) are ignored in the sense that they make no difference to the sum.
bysort companyid year : egen wanted1 = total(loan_amt > 0)
bysort companyid year : egen wanted2 = total(loan_amt > 0 & sub_num < .)
_N
is just the number of observations, sometimes conditional on other variables. You naturally can assign that number to a variable, but also specifying an if
qualifier doesn't make the calculation ignore the excluded values; it just affects which observations are ignored in receiving non-missing values. Consider this experiment:
. clear
. set obs 1000
number of observations (_N) was 0, now 1,000
. gen count = _N if _n == 1
(999 missing values generated)
. l count in 1
+-------+
| count |
|-------|
1. | 1000 |
+-------+
Otherwise put, _N
is not as general a counting method as you need here.