Python Split and assign into new Columns


I have been working with a dataframe in which one column holds data for calls to different APIs. I need to split that column by "," and add new columns that depend on the method's name (each API takes different input names).

For example:

Column A                                                                 Column B
callToAPI1(), Peter, 1979-07-01, male                                    callToAPI1()
callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto  callToAPI2()
callToAPI1(), Adrian, 1998-03-15, male                                   callToAPI1()

I used the following instruction:

    df.loc[df['Column B'] == 'callToAPI1()', ['Name','Date','Sex']] = df['Column A'].str.split(', ',expand=False)[1]

It runs, but it copies the new-column values of the first matching row into every later match, like this:

Column A                                Column B      Name   Date        Sex   Occupation  University
callToAPI1(), Peter, 1979-07-01, male   callToAPI1()  Peter  1979-07-01  male  null        null
callToAPI1(), Adrian, 1998-03-15, male  callToAPI1()  Peter  1979-07-01  male  null        null

How can I resolve this issue? Thanks!


Solution
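
A likely reason the original instruction repeats one row, assuming the dataframe has a default integer index: str.split(..., expand=False) returns a Series of lists, and [1] on that Series is a label lookup that returns the single list stored at index 1. That one list is then broadcast to every matching row. The sketch below uses a two-row frame built only for illustration and contrasts that lookup with .str[1], which indexes into each row's list instead:

```python
import pandas as pd

# Illustrative two-row frame; not the asker's real data.
df = pd.DataFrame({
    'Column A': [
        'callToAPI1(), Peter, 1979-07-01, male',
        'callToAPI1(), Adrian, 1998-03-15, male',
    ],
    'Column B': ['callToAPI1()', 'callToAPI1()'],
})

# expand=False yields a Series whose values are Python lists, one per row.
parts = df['Column A'].str.split(', ', expand=False)

# Label lookup: the whole list stored at index label 1 (a single row).
one_row = parts[1]

# Element-wise lookup: item 1 of every row's list (one value per row).
names = parts.str[1]

print(one_row)
print(names.tolist())
```

Which row ends up repeated depends on the index labels, so the exact values seen in the question may differ from this toy frame.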

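  • Split 'Column A' once with expand=True, so that every row gets its own values and rows with fewer fields are padded with None. A per-method mask such as df['Column B'] == 'callToAPI1()' and a forward fill are not needed here; a forward fill would copy values from the previous matching row, which is the behaviour you are trying to avoid:

    import pandas as pd


    def _run(df):
        # Split on a comma followed by optional whitespace. expand=True returns
        # one column per field and pads shorter rows with None.
        columns = ['Method', 'Name', 'Date', 'Sex', 'Occupation', 'University']
        df[columns] = df['Column A'].str.split(',\\s*', expand=True)

        # The raw input columns are no longer needed once the fields are split out.
        df.drop(columns=['Column A', 'Column B'], inplace=True)
        return df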
    
    
    df = pd.DataFrame({
        'Column A': ['callToAPI1(), Peter, 1979-07-01, male', 'callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto', 'callToAPI1(), Adrian, 1998-03-15, male'],
        'Column B': ['callToAPI1()', 'callToAPI2()', 'callToAPI1()']
    })
    
    print(_run(df))

    Prints:

             Method    Name        Date     Sex Occupation             University
    0  callToAPI1()   Peter  1979-07-01    male       None                   None
    1  callToAPI2()  Roxana  1980-01-01  female     Doctor  University of Toronto
    2  callToAPI1()  Adrian  1998-03-15    male       None                   None
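
The question mentions that each API has different input names. The answer above gives every row the same column layout, which only works when the positions line up across methods. If they do not, one option is to keep a lookup from method name to field names and label each group of rows separately. This is a minimal sketch under that assumption: the FIELDS mapping and the split_by_method helper are hypothetical names introduced for illustration, and the field lists are guessed from the sample rows.

```python
import pandas as pd

# Hypothetical mapping from method name to the argument names that follow it.
FIELDS = {
    'callToAPI1()': ['Name', 'Date', 'Sex'],
    'callToAPI2()': ['Name', 'Date', 'Sex', 'Occupation', 'University'],
}


def split_by_method(df, fields=FIELDS):
    # Split once; column 0 holds the method name, columns 1..n its arguments.
    parts = df['Column A'].str.split(r',\s*', expand=True)
    pieces = []
    for method, names in fields.items():
        mask = df['Column B'] == method
        if not mask.any():
            continue
        # Keep exactly as many argument columns as this method declares.
        wanted = list(range(1, len(names) + 1))
        values = parts.loc[mask].reindex(columns=wanted)
        values.columns = names
        pieces.append(values)
    # Stack the per-method pieces and restore the original row order.
    # Assumes at least one method in `fields` matches a row.
    new_cols = pd.concat(pieces).reindex(df.index)
    return df.join(new_cols)


df = pd.DataFrame({
    'Column A': [
        'callToAPI1(), Peter, 1979-07-01, male',
        'callToAPI2(), Roxana, 1980-01-01, female, Doctor, University of Toronto',
        'callToAPI1(), Adrian, 1998-03-15, male',
    ],
    'Column B': ['callToAPI1()', 'callToAPI2()', 'callToAPI1()'],
})

result = split_by_method(df)
print(result[['Name', 'Date', 'Sex', 'Occupation', 'University']])
```

Because each group is labelled from its own rows, a value is never carried over from another row; fields that a method does not define are left missing.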