I have the following dataframe:
N1 FF1; FF2 PP1; PP2
N2 FF3 PP3
N3 FF4; FF5; FF6 PP4; PP5; PP6
Columns FF & PP have the following relation, every FF is linked to PP with the same number, i.e. FF1 is linked to PP1 and so forth.
The output that I need is as follows:
N1 FF1 PP1
N1 FF2 PP2
N2 FF3 PP3
N3 FF4 PP4
N3 FF5 PP5
N3 FF6 PP6
So far I have tried the following:
df_copy = df["F","P"].str.split(";").apply(Series,1).stack()
However I receive a KeyError and am puzzled on how to proceed...
Very new to programming, I'd highly appreciate some guidance.
Thanks!
With Pandas / NumPy, you should use vectorised methods where possible. apply
+ pd.Series
is an inefficient, Python-level loop.
For example, using itertools.chain
and np.repeat
:
from itertools import chain
df = pd.DataFrame([['N1', 'FF1; FF2', 'PP1; PP2'],
['N2', 'FF3', 'PP3'],
['N3', 'FF4; FF5; FF6', 'PP4; PP5; PP6']])
split1, split2 = df[1].str.split('; '), df[2].str.split('; ')
n = split1.str.len()
res = pd.DataFrame({0: df[0].values.repeat(n.values),
1: list(chain.from_iterable(split1)),
2: list(chain.from_iterable(split2))})
print(res)
0 1 2
0 N1 FF1 PP1
1 N1 FF2 PP2
2 N2 FF3 PP3
3 N3 FF4 PP4
4 N3 FF5 PP5
5 N3 FF6 PP6