I have a data set with information about firms:
clear
input firm_id str6 industry int fyear int
1084 7372 2010
1084 7375 2010
1084 7372 2011
1084 7375 2011
1084 7372 2012
1084 7375 2012
1084 7372 2013
1084 7375 2013
1084 7372 2014
1084 7375 2014
1094 2865 2002
1094 2879 2002
1094 5122 2002
1094 5169 2002
1094 2865 2003
1094 2879 2003
1094 5122 2003
1094 5169 2003
1094 2865 2004
1094 2879 2004
1094 5122 2004
1094 5169 2004
1094 2865 2005
1094 2879 2005
1094 5122 2005
1094 5169 2005
1094 2865 2006
1094 2879 2006
1094 5122 2006
1094 5169 2006
1094 2865 2007
1094 2879 2007
1094 5169 2007
1094 2865 2008
1094 2879 2008
end
Aside from a firm_id
it includes information about which industries the firm is active in a given year.
How can I find how many industries a firm left and entered, in a given year?
I know that I could do this by writing a "loop in a loop" that looks at every individual observation and checks if the same firm_id
and industry
combination exists for year
+1. But my data set is large, so that would be extremely inefficient.
I also contemplated solutions using reshape wide
, but could not find a solution to my problem either (and of course this creates an extremely large number of variables and is not efficient either).
If you are trying to generate a single observation for each firm with the number of industries a firm enters and exits in each year, I believe the following code should work. The variables enter
and leave
indicate (respectively) if a firm enters or exists the industry in a given observation. Using a foreach
loop over the years in the data you can then generate each of the variables indicating whether the firm enters or exits each year.
bys firm_id industry (fyear): gen prevyear = fyear[_n-1]
gen yrdifpast = fyear - prevyear
gen enter = yrdifpast > 1
bys firm_id industry (fyear): gen nextyear = fyear[_n+1]
gen yrdiffuture = nextyear - fyear
gen leave = yrdiffuture > 1
levelsof fyear, local(years)
foreach yr of local years {
gen in_`yr' = fyear==`yr'&enter==1
gen out_`yr' = fyear==`yr'&leave==1
}
collapse (sum) in_* out_*, by(firm_id)
list
+----------------------------------------------------------------------------------------------+
1. | firm_id | in_2002 | in_2003 | in_2004 | in_2005 | in_2006 | in_2007 | in_2008 | in_2010 |
| 1084 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
|---------+---------+---------+---------+----------+----------+----------+----------+----------|
| in_2011 | in_2012 | in_2013 | in_2014 | out_2002 | out_2003 | out_2004 | out_2005 | out_2006 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|----------------------------------------------------------------------------------------------|
| out_2007 | out_2008 | out_2010 | out_2011 | out_2012 | out_2013 | out_2014 |
| 0 | 0 | 0 | 0 | 0 | 0 | 2 |
+----------------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------------+
2. | firm_id | in_2002 | in_2003 | in_2004 | in_2005 | in_2006 | in_2007 | in_2008 | in_2010 |
| 1094 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|---------+---------+---------+---------+----------+----------+----------+----------+----------|
| in_2011 | in_2012 | in_2013 | in_2014 | out_2002 | out_2003 | out_2004 | out_2005 | out_2006 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
|----------------------------------------------------------------------------------------------|
| out_2007 | out_2008 | out_2010 | out_2011 | out_2012 | out_2013 | out_2014 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 |
+----------------------------------------------------------------------------------------------+