i'm new to Python so go easy!
I have a Dataframe like the following. I would like to forward fill the NaN's in the shares_owned column but stop when the string in df['ticker'] changes. And only start when another number appears in shares_owned again.
date | ticker | shares_owned | price |
---|---|---|---|
01/01/2020 | EZY | NaN | £2 |
02/01/2020 | EZY | 10 | £2.1 |
03/01/2020 | EZY | NaN | £2.12 |
04/01/2020 | EZY | NaN | £12.5 |
01/01/2020 | FTSE | NaN | £11 |
02/01/2020 | FTSE | NaN | £12 |
03/01/2020 | FTSE | 2 | £12.5 |
04/01/2020 | FTSE | NaN | £12.5 |
For example, the output table would look like this:
date | ticker | shares_owned | price |
---|---|---|---|
01/01/2020 | EZY | NaN | £2 |
02/01/2020 | EZY | 10 | £2.1 |
03/01/2020 | EZY | 10 | £2.12 |
04/01/2020 | EZY | 10 | £12.5 |
01/01/2020 | FTSE | NaN | £11 |
02/01/2020 | FTSE | NaN | £12 |
03/01/2020 | FTSE | 2 | £12.5 |
04/01/2020 | FTSE | 2 | £12.5 |
So far I have been trying to use the .fillna(method='ffill') to no avail.
groupby()
does the groupingfillna(method="fill")
within a transform()
df = pd.read_csv(io.StringIO("""date ticker shares_owned price
01/01/2020 EZY NaN £2
02/01/2020 EZY 10 £2.1
03/01/2020 EZY NaN £2.12
04/01/2020 EZY NaN £12.5
01/01/2020 FTSE NaN £11
02/01/2020 FTSE NaN £12
03/01/2020 FTSE 2 £12.5
04/01/2020 FTSE NaN £12.5"""), sep="\t")
df["shares_owned"] = df.groupby("ticker")["shares_owned"].transform(lambda s: s.fillna(method="ffill"))
date | ticker | shares_owned | price | |
---|---|---|---|---|
0 | 01/01/2020 | EZY | nan | £2 |
1 | 02/01/2020 | EZY | 10 | £2.1 |
2 | 03/01/2020 | EZY | 10 | £2.12 |
3 | 04/01/2020 | EZY | 10 | £12.5 |
4 | 01/01/2020 | FTSE | nan | £11 |
5 | 02/01/2020 | FTSE | nan | £12 |
6 | 03/01/2020 | FTSE | 2 | £12.5 |
7 | 04/01/2020 | FTSE | 2 | £12.5 |