Search code examples
pythonpython-3.xpandasdataframegroup-by

Python - Forward fill with non gap date values with a loop condition


I have a data frame as below.

Index Date Group Name
0   Oct-10  A   Sam
1   Oct-10  A   Sam
2   Dec-12  A   Sam
3   Dec-12  A   Sam
4   May-13  A   Peter
5   May-14  A   Peter
6   May-15  A   Chris
7   Apr-17  A   Chris
8   Apr-17  A   Sam
9   Jul-23  A   Sam
10  Dec-12  B   Peter
11  Dec-12  B   Peter
12  May-13  B   Peter
13  May-14  B   Peter
14  Apr-18  B   Sam
15  Jul-19  B   Sam
16  Jul-20  B   Sam
17  Jul-22  B   Sam
18  Jul-23  B   Sam

I want to forward fill the date values for every "Group" in such a way that every person start date should be forward filled.

Example : Group A, Sam started from Oct 2010, hence till from index 0 to 3, it should be Oct 2010 for him. Peter started on May 2013, so same May 2013 for index 4 & 5. However, again Sam comes in April 2017 so it should be April 2017 for index 8 & 9.

I am finding this is very difficult because Sam is appearing twice in group A. Your help will be highly appriciated to solve this. I am trying someting like below but not giving any results.

for every group in df["Group"]:
      df["date2"] = (df["Date"].groupby(["Name"]).transform(lambda x: x.ffill()))

the result I want is below

Index Date Group Name date2
0   Oct-10  A   Sam   Oct-10
1   Oct-10  A   Sam   Oct-10
2   Dec-12  A   Sam   Oct-10
3   Dec-12  A   Sam   Oct-10
4   May-13  A   Peter May-13
5   May-14  A   Peter May-13
6   May-15  A   Chris May-15
7   Apr-17  A   Chris May-15
8   Apr-17  A   Sam   Apr-17
9   Jul-23  A   Sam   Apr-17
10  Dec-12  B   Peter Dec-12
11  Dec-12  B   Peter Dec-12
12  May-13  B   Peter Dec-12
13  May-14  B   Peter Dec-12
14  Apr-18  B   Sam   Apr-18
15  Jul-19  B   Sam   Apr-18
16  Jul-20  B   Sam   Apr-18
17  Jul-22  B   Sam   Apr-18
18  Jul-23  B   Sam   Apr-18

Solution

  • You can compute a mask with shift and any, then use it to select the first values and ffill:

    cols = ['Group', 'Name']
    m = df[cols].ne(df[cols].shift()).any(axis=1)
    
    df['date2'] = df['Date'].where(m).ffill()
    

    Output:

        Index    Date Group   Name   date2
    0       0  Oct-10     A    Sam  Oct-10
    1       1  Oct-10     A    Sam  Oct-10
    2       2  Dec-12     A    Sam  Oct-10
    3       3  Dec-12     A    Sam  Oct-10
    4       4  May-13     A  Peter  May-13
    5       5  May-14     A  Peter  May-13
    6       6  May-15     A  Chris  May-15
    7       7  Apr-17     A  Chris  May-15
    8       8  Apr-17     A    Sam  Apr-17
    9       9  Jul-23     A    Sam  Apr-17
    10     10  Dec-12     B  Peter  Dec-12
    11     11  Dec-12     B  Peter  Dec-12
    12     12  May-13     B  Peter  Dec-12
    13     13  May-14     B  Peter  Dec-12
    14     14  Apr-18     B    Sam  Apr-18
    15     15  Jul-19     B    Sam  Apr-18
    16     16  Jul-20     B    Sam  Apr-18
    17     17  Jul-22     B    Sam  Apr-18
    18     18  Jul-23     B    Sam  Apr-18
    

    Intermediates:

        Index    Date Group   Name   date2 Group_shift Name_shift  Group_eq_shift  Name_eq_shift      m
    0       0  Oct-10     A    Sam  Oct-10        None       None            True           True   True
    1       1  Oct-10     A    Sam  Oct-10           A        Sam           False          False  False
    2       2  Dec-12     A    Sam  Oct-10           A        Sam           False          False  False
    3       3  Dec-12     A    Sam  Oct-10           A        Sam           False          False  False
    4       4  May-13     A  Peter  May-13           A        Sam           False           True   True
    5       5  May-14     A  Peter  May-13           A      Peter           False          False  False
    6       6  May-15     A  Chris  May-15           A      Peter           False           True   True
    7       7  Apr-17     A  Chris  May-15           A      Chris           False          False  False
    8       8  Apr-17     A    Sam  Apr-17           A      Chris           False           True   True
    9       9  Jul-23     A    Sam  Apr-17           A        Sam           False          False  False
    10     10  Dec-12     B  Peter  Dec-12           A        Sam            True           True   True
    11     11  Dec-12     B  Peter  Dec-12           B      Peter           False          False  False
    12     12  May-13     B  Peter  Dec-12           B      Peter           False          False  False
    13     13  May-14     B  Peter  Dec-12           B      Peter           False          False  False
    14     14  Apr-18     B    Sam  Apr-18           B      Peter           False           True   True
    15     15  Jul-19     B    Sam  Apr-18           B        Sam           False          False  False
    16     16  Jul-20     B    Sam  Apr-18           B        Sam           False          False  False
    17     17  Jul-22     B    Sam  Apr-18           B        Sam           False          False  False
    18     18  Jul-23     B    Sam  Apr-18           B        Sam           False          False  False