python python-3.x pandas dataframe data-analysis

Python - generate columns with values based on a condition

I have some data like shown in the left side of the image with columns like group, date, open, close, exit.

What I want to do is generate 2 more columns (like shown on the right) with date_open and date_close.

The logic is that for each group, the date_open should be the first entry in the data (for that group) and, for date_close it should be the last entry only when exit is 1 else there is no exit date (0 or NaN).

I tried it using for loop and index and enumerate but could not acheive the output. Is there any way to get the desired output?

group   date    open    close   exit
A      Jan-22    2       0       0 
A      Feb-22    2       0       0
A      Mar-22    2       0       0
A      Apr-22    1       1       0
A      May-22    1       1       0
A      Jun-22    0       2       1
B      Mar-23    3       0       0
B      Apr-23    3       0       0
B      May-23    3       0       0
B      Jun-23    3       1       0
C      Sep-21    1       0       0
C      Oct-21    1       0       0
C      Nov-21    1       0       0
C      Dec-21    1       0       0
C      Jan-22    0       1       1

Solution

In [29]: df = pd.read_csv('data.tsv', delim_whitespace=' ')

In [30]: df
Out[30]:
   group    date  open  close  exit
0      A  Jan-22     2      0     0
1      A  Feb-22     2      0     0
2      A  Mar-22     2      0     0
3      A  Apr-22     1      1     0
4      A  May-22     1      1     0
5      A  Jun-22     0      2     1
6      B  Mar-23     3      0     0
7      B  Apr-23     3      0     0
8      B  May-23     3      0     0
9      B  Jun-23     3      1     0
10     C  Sep-21     1      0     0
11     C  Oct-21     1      0     0
12     C  Nov-21     1      0     0
13     C  Dec-21     1      0     0
14     C  Jan-22     0      1     1

In [31]: df['date'] = df.date.apply(lambda s: dt.datetime.strptime(s, "%b-%y"))

In [32]: df
Out[32]:
   group       date  open  close  exit
0      A 2022-01-01     2      0     0
1      A 2022-02-01     2      0     0
2      A 2022-03-01     2      0     0
3      A 2022-04-01     1      1     0
4      A 2022-05-01     1      1     0
5      A 2022-06-01     0      2     1
6      B 2023-03-01     3      0     0
7      B 2023-04-01     3      0     0
8      B 2023-05-01     3      0     0
9      B 2023-06-01     3      1     0
10     C 2021-09-01     1      0     0
11     C 2021-10-01     1      0     0
12     C 2021-11-01     1      0     0
13     C 2021-12-01     1      0     0
14     C 2022-01-01     0      1     1

In [33]: opens = df[['group', 'date']].groupby(by='group').min()

In [34]: opens
Out[34]:
            date
group
A     2022-01-01
B     2023-03-01
C     2021-09-01

In [35]: opens.rename(columns={'date': 'date_open'})
Out[35]:
       date_open
group
A     2022-01-01
B     2023-03-01
C     2021-09-01

In [36]: opens = opens.rename(columns={'date': 'date_open'})

In [37]: opens
Out[37]:
       date_open
group
A     2022-01-01
B     2023-03-01
C     2021-09-01

In [38]: closes = df[df.exit==1][['group', 'date']].groupby(by='group').max()

In [39]: closes = closes.rename(columns={'date': 'date_close'})

In [40]: closes
Out[40]:
      date_close
group
A     2022-06-01
C     2022-01-01

In [41]: opens.join(df.set_index('group')).join(closes)
Out[41]:
       date_open       date  open  close  exit date_close
group
A     2022-01-01 2022-01-01     2      0     0 2022-06-01
A     2022-01-01 2022-02-01     2      0     0 2022-06-01
A     2022-01-01 2022-03-01     2      0     0 2022-06-01
A     2022-01-01 2022-04-01     1      1     0 2022-06-01
A     2022-01-01 2022-05-01     1      1     0 2022-06-01
A     2022-01-01 2022-06-01     0      2     1 2022-06-01
B     2023-03-01 2023-03-01     3      0     0        NaT
B     2023-03-01 2023-04-01     3      0     0        NaT
B     2023-03-01 2023-05-01     3      0     0        NaT
B     2023-03-01 2023-06-01     3      1     0        NaT
C     2021-09-01 2021-09-01     1      0     0 2022-01-01
C     2021-09-01 2021-10-01     1      0     0 2022-01-01
C     2021-09-01 2021-11-01     1      0     0 2022-01-01
C     2021-09-01 2021-12-01     1      0     0 2022-01-01
C     2021-09-01 2022-01-01     0      1     1 2022-01-01