I have three data frames with the same variables for the same firms but corresponding to different years (2016, 2017, 2018). The ID variable corresponds to a firm ID.
df2016 = pd.DataFrame({"ID": [99,101,102,103,104], "A": [1,2,3,4,5], "B": [2,4,6,8,10], "year": [2016,2016,2016,2016,2016]})
ID A B year
0 99 1 2 2016
1 101 2 4 2016
2 102 3 6 2016
3 103 4 8 2016
4 104 5 10 2016
df2017 = pd.DataFrame({"ID": [100,101,102,104], "A": [5,6,7,8], "B": [9,11,13,15], "year": [2017,2017,2017,2017]})
ID A B year
0 100 5 9 2017
1 101 6 11 2017
2 102 7 13 2017
3 104 8 15 2017
df2018 = pd.DataFrame({"ID": [100,106], "A": [6,8], "B": [13,15], "year": [2018,2018]})
ID A B year
0 100 6 13 2018
1 106 8 15 2018
I want to determine the number of entering firms and exiting firms in each year.
Specifically, I would like:
entry
that equals 1 if the firm ID was not in the data last year and is the data this year; and zero otherwise.exit
that equals 1 if the firm ID is in the data this year but is not in the data next year; and zero otherwise. A B enter exit
ID year
99 2016 1.0 2.0 1 1
101 2016 2.0 4.0 1 0
102 2016 3.0 6.0 1 0
103 2016 4.0 8.0 1 1
104 2016 5.0 10.0 1 0
100 2017 5.0 9.0 1 0
101 2017 6.0 11.0 0 1
102 2017 7.0 13.0 0 1
104 2017 8.0 15.0 0 1
100 2018 6.0 13.0 0 0
106 2018 8.0 15.0 1 0
I start by concatenating the three years of data and setting the multi-index to (ID, year):
df = pd.concat([df2016, df2017, df2018])
df.set_index(["ID", "year"], inplace=True)
A B
ID year
99 2016 1 2
101 2016 2 4
102 2016 3 6
103 2016 4 8
104 2016 5 10
100 2017 5 9
101 2017 6 11
102 2017 7 13
104 2017 8 15
100 2018 6 13
106 2018 8 15
Not sure where to go next.
You can use:
df = pd.concat([df2016, df2017, df2018], ignore_index=True)
g = df.groupby('ID')['year']
df['entry'] = g.diff(1).ne(1).astype(int)
df['exit'] = g.diff(-1).ne(-1).astype(int)
print(df)
# Output
ID A B year entry exit
0 99 1 2 2016 1 1
1 101 2 4 2016 1 0
2 102 3 6 2016 1 0
3 103 4 8 2016 1 1
4 104 5 10 2016 1 0
5 100 5 9 2017 1 0
6 101 6 11 2017 0 1
7 102 7 13 2017 0 1
8 104 8 15 2017 0 1
9 100 6 13 2018 0 1
10 106 8 15 2018 1 1