Search code examples
pythonpandasdataframefillna

Stopping a forward fill based on a different string column changing in the DF


i'm new to Python so go easy!

I have a Dataframe like the following. I would like to forward fill the NaN's in the shares_owned column but stop when the string in df['ticker'] changes. And only start when another number appears in shares_owned again.

date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY NaN £2.12
04/01/2020 EZY NaN £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE NaN £12.5

For example, the output table would look like this:

date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY 10 £2.12
04/01/2020 EZY 10 £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE 2 £12.5

So far I have been trying to use the .fillna(method='ffill') to no avail.


Solution

    • you note groups, hence groupby() does the grouping
    • within the group fillna(method="fill") within a transform()
    df = pd.read_csv(io.StringIO("""date    ticker  shares_owned    price
    01/01/2020  EZY NaN £2
    02/01/2020  EZY 10  £2.1
    03/01/2020  EZY NaN £2.12
    04/01/2020  EZY NaN £12.5
    01/01/2020  FTSE    NaN £11
    02/01/2020  FTSE    NaN £12
    03/01/2020  FTSE    2   £12.5
    04/01/2020  FTSE    NaN £12.5"""), sep="\t")
    
    df["shares_owned"] = df.groupby("ticker")["shares_owned"].transform(lambda s: s.fillna(method="ffill"))
    
    

    output

    date ticker shares_owned price
    0 01/01/2020 EZY nan £2
    1 02/01/2020 EZY 10 £2.1
    2 03/01/2020 EZY 10 £2.12
    3 04/01/2020 EZY 10 £12.5
    4 01/01/2020 FTSE nan £11
    5 02/01/2020 FTSE nan £12
    6 03/01/2020 FTSE 2 £12.5
    7 04/01/2020 FTSE 2 £12.5