
How to apply a function to the values of one column, then take the output and apply it to multiple other columns in pandas


Alright y'all, maybe my strategy here isn't ideal, but I've got a very awkward dataset to work with and I need help.

I have a pandas dataframe that's structured such that only the first column has values:

df = 
|Ind| Column A | Column B | Column C |
| - | -------- | -------- | -------- |
| 0 | String1  | Null     | Null     |
| 1 | String2  | Null     | Null     |

What I'd like to do is iteratively take the value from Column A and put it through a function whose output is a list. From there I need to fill the remaining columns with the output of the function, such that:

df = 
|Ind| Column A | Column B         | Column C         |
| - | -------- | ---------------- | ---------------- |
| 0 | String1  | func(String1)[0] | func(String1)[1] |
| 1 | String2  | func(String2)[0] | func(String2)[1] |

Thus far I've been trying to do this using anonymous functions, as such:

df.iloc[:,1:].apply(lambda y: df["Column A"].apply(lambda x: list(map(func, x))))

This almost does what I want, but it does not spread the list across the respective columns; the result is instead:

df = 
|Ind| Column A | Column B      | Column C      |
| - | -------- | ------------- | ------------- |
| 0 | String1  | func(String1) | func(String1) |
| 1 | String2  | func(String2) | func(String2) |

If there's a better approach I'm totally open.


Solution

  • Functional programming is not as fun as they say it is. Here's a procedural version that applies a function to each value in the column and extends the data frame with the results. Note that the function can return a variable number of results; shorter rows are padded with missing values.

    import pandas as pd

    # Make a three-row, one-column data frame
    df = pd.DataFrame(["foo", "bar", "fubar"], columns=["Column A"])

    # Function to apply to each value; here it splits a string into its characters
    def fun(s):
        return list(s)

    # Build a new data frame with one row of results per value (variable length is fine)
    df2 = pd.DataFrame(fun(s) for s in df["Column A"])
    # Map the new integer column labels to names (the first column keeps its name)
    column_names = {i: f"Column {'ABCDEFGHIJK'[i + 1]}" for i in range(len(df2.columns))}
    # Append the new columns and apply the new names
    df = pd.concat([df, df2], axis=1).rename(columns=column_names)
    df
    

    Resulting data frame:
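
    As a separate sketch for the fixed-width case in the question, where Column B and Column C already exist and the function always returns exactly two items, the results can probably be assigned straight into the existing columns. The `func` below is a hypothetical stand-in (the question never shows the real one), and the sample strings are taken from the question's table.

```python
import pandas as pd

# Hypothetical stand-in for the question's `func`; assumed to return a two-item list.
def func(s):
    return [s.lower(), s[::-1]]

# Data frame shaped like the one in the question: only Column A is populated.
df = pd.DataFrame({
    "Column A": ["String1", "String2"],
    "Column B": [None, None],
    "Column C": [None, None],
})

# Call func once per value of Column A, collecting one list per row.
results = df["Column A"].map(func).tolist()

# Turn the lists into a two-column frame and assign it to the target columns.
# Reusing df.index keeps the rows aligned even if the index is not 0..n-1.
df[["Column B", "Column C"]] = pd.DataFrame(results, index=df.index)

print(df)
```

    This calls the function once per row rather than once per target column, which is where the attempt in the question went wrong. It does assume every returned list has the same length as the list of target columns.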