Tags: python, pandas, merge, multiple-columns, return-type

Apply pandas function to column to create multiple new columns?


How can I do this in pandas?

I have a function, extract_text_features, that operates on a single text column and returns multiple output columns. Specifically, the function returns 6 values.

The function works, but there doesn't seem to be any return type (pandas DataFrame, NumPy array, or Python list) that lets the output be assigned correctly with df.ix[:, 10:16] = df.textcol.map(extract_text_features)

So I think I need to drop back to iterating with df.iterrows(), as per this?

UPDATE: Iterating with df.iterrows() is at least 20x slower, so I surrendered and split out the function into six distinct .map(lambda ...) calls.

UPDATE 2: This question was asked back around v0.11.0, before the usability of df.apply was improved and before df.assign() was added in v0.16. Hence much of the question and many of the answers are no longer very relevant.
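
For readers on a recent release, here is a minimal sketch of what the improved df.apply makes possible. It assumes pandas 0.23 or later, where DataFrame.apply accepts result_type="expand". The body of extract_text_features and the column names n_chars and n_words are hypothetical stand-ins, not the original function, and only two of the six features are shown.

```python
import pandas as pd

def extract_text_features(text):
    # Hypothetical stand-in for the real function: returns a fixed-length tuple.
    return len(text), text.count(" ") + 1

df = pd.DataFrame({"textcol": ["hello world", "a b c"]})

# result_type="expand" spreads each returned tuple across columns 0, 1, ...
expanded = df.apply(
    lambda row: extract_text_features(row["textcol"]),
    axis=1,
    result_type="expand",
)
expanded.columns = ["n_chars", "n_words"]  # illustrative names

# Attach the new feature columns to the original frame by index.
df = df.join(expanded)
print(df)
```

This still calls the function once per row, so it is a convenience rather than a speedup.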


Solution

  • Building off of user1827356's answer, you can do the assignment in one pass using df.merge:

    df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
        left_index=True, right_index=True)
    
        textcol  feature1  feature2
    0  0.772692  1.772692 -0.227308
    1  0.857210  1.857210 -0.142790
    2  0.065639  1.065639 -0.934361
    3  0.819160  1.819160 -0.180840
    4  0.088212  1.088212 -0.911788
    

    EDIT: Please be aware that this approach uses a lot of memory and is slow: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/
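
Given the memory and speed warning above, one possible lower-overhead variant is to avoid building a pd.Series per row: collect plain tuples and construct the feature columns in a single DataFrame call. This is a sketch, not the answer's method. The extract_text_features body and the feature names are hypothetical placeholders, and any speed gain should be measured on your own data.

```python
import pandas as pd

def extract_text_features(text):
    # Hypothetical stand-in for the asker's function: returns a fixed-length tuple.
    words = text.split()
    return len(text), len(words), text.count("!")

df = pd.DataFrame({"textcol": ["hello world", "pandas is great!", "a b c"]})

feature_names = ["n_chars", "n_words", "n_exclaims"]  # illustrative names

# Call the function once per value, collecting plain tuples (no per-row pd.Series).
rows = [extract_text_features(t) for t in df["textcol"]]

# Build all feature columns at once, aligned to the original index.
features = pd.DataFrame(rows, index=df.index, columns=feature_names)

# Equivalent in effect to df.merge(..., left_index=True, right_index=True).
result = df.join(features)
print(result)
```

Passing index=df.index keeps the new columns aligned even when the original frame has a non-default index.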