I have been working with a dataframe that has one column containing data from calls to different APIs. I need to split it by "," and add new columns depending on the method's name (each API has different input names).
For example:
Column A | Column B |
---|---|
callToAPI1(), Peter, 1979-07-01, male | callToAPI1() |
callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto | callToAPI2() |
callToAPI1(), Adrian, 1998-03-15, male | callToAPI1() |
I used the following statement:
```python
df.loc[df['Column B'] == 'callToAPI1()', ['Name','Date','Sex']] = df['Column A'].str.split(', ',expand=False)[1]
```
It works, but it copies the values from the first matching row into every other match, like this:
Column A | Column B | Name | Date | Sex | Occupation | University |
---|---|---|---|---|---|---|
callToAPI1(), Peter, 1979-07-01, male | callToAPI1() | Peter | 1979-07-01 | male | null | null |
callToAPI1(), Adrian, 1998-03-15, male | callToAPI1() | Peter | 1979-07-01 | male | null | null |
How can I resolve this issue? Thanks!
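The duplication happens because `.str.split(', ', expand=False)[1]` picks out a single element (one list) of the split Series, and a single list on the right-hand side of `df.loc[mask, cols] = ...` is broadcast to every matching row. The sketch below reproduces that with the sample data and contrasts it with splitting only the masked rows using `expand=True`. It assumes a recent pandas version, where `.loc` creates the missing columns on assignment; `broadcast`, `per_row` and `parts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": [
        "callToAPI1(), Peter, 1979-07-01, male",
        "callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto",
        "callToAPI1(), Adrian, 1998-03-15, male",
    ],
    "Column B": ["callToAPI1()", "callToAPI2()", "callToAPI1()"],
})
mask = df["Column B"] == "callToAPI1()"
cols = ["Name", "Date", "Sex"]

# A single list on the right-hand side is broadcast to every masked row,
# reproducing the repeated "Peter" values from the question.
broadcast = df.copy()
broadcast.loc[mask, cols] = ["Peter", "1979-07-01", "male"]

# Splitting only the masked rows with expand=True yields one row of fields per
# match; iloc[:, 1:] drops the method name, and .to_numpy() makes the
# assignment positional instead of aligning on the split's integer column labels.
per_row = df.copy()
parts = df.loc[mask, "Column A"].str.split(", ", expand=True).iloc[:, 1:]
per_row.loc[mask, cols] = parts.to_numpy()

print(broadcast.loc[mask, "Name"].tolist())  # ['Peter', 'Peter']
print(per_row.loc[mask, "Name"].tolist())    # ['Peter', 'Adrian']
```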
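You can split `Column A` with `expand=True`, so every row gets its own values instead of one shared list, then create a mask (`df['Column B'] == 'callToAPI1()'`) and use `df.loc`: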
```python
import pandas as pd

def _run(df):
    cols = ['Name', 'Date', 'Sex', 'Occupation', 'University']
    # Split every row on a comma plus any following whitespace; expand=True
    # returns one column per field, padding shorter rows with missing values
    df[['Method'] + cols] = df['Column A'].str.split(r',\s*', expand=True)
    mask = df['Column B'] == 'callToAPI1()'
    # Forward-fill only within the callToAPI1() rows (on this sample it changes
    # nothing, since those rows have no Occupation/University to carry over)
    df.loc[mask, cols] = df.loc[mask, cols].ffill()
    df.drop(columns=['Column A', 'Column B'], inplace=True)
    return df

df = pd.DataFrame({
    'Column A': ['callToAPI1(), Peter, 1979-07-01, male',
                 'callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto',
                 'callToAPI1(), Adrian, 1998-03-15, male'],
    'Column B': ['callToAPI1()', 'callToAPI2()', 'callToAPI1()']
})
print(_run(df))
```
```
         Method    Name        Date     Sex Occupation             University
0  callToAPI1()   Peter  1979-07-01    male       None                   None
1  callToAPI2()  Roxana  1980-01-01  female     Doctor  University of Toronto
2  callToAPI1()  Adrian  1998-03-15    male       None                   None
```
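The question asks for a different set of columns per method. One way to generalise the mask idea is sketched below, under two assumptions: a hypothetical `FIELDS` dictionary that maps each method name to its field names in order, and every row of a given method having the same number of fields. Each method's rows are split separately, labelled, and joined back on the original index.

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": [
        "callToAPI1(), Peter, 1979-07-01, male",
        "callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto",
        "callToAPI1(), Adrian, 1998-03-15, male",
    ],
    "Column B": ["callToAPI1()", "callToAPI2()", "callToAPI1()"],
})

# Hypothetical mapping: which field names each method carries, in order.
FIELDS = {
    "callToAPI1()": ["Name", "Date", "Sex"],
    "callToAPI2()": ["Name", "Date", "Sex", "Occupation", "University"],
}

frames = []
for method, names in FIELDS.items():
    rows = df.loc[df["Column B"] == method, "Column A"]
    if rows.empty:
        continue  # method not present in this dataframe
    # expand=True gives one row of fields per matching record;
    # iloc[:, 1:] drops the leading method-name field.
    parts = rows.str.split(", ", expand=True).iloc[:, 1:]
    parts.columns = names
    frames.append(parts)

# concat keeps each row's original index label, so sort_index restores the
# original order and join lines the fields up with their source rows.
result = df.join(pd.concat(frames).sort_index())
print(result[["Column B", "Name", "Occupation", "University"]])
```

Rows whose method is missing from `FIELDS` simply get NaN in the new columns. If one method's rows can have different field counts, `parts.columns = names` raises a length mismatch, so that case would need padding or a per-row approach instead.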