Search code examples
pythonpandasdataframevectorization

How to vectorize rowwise join


I've been given a dataframe df and a function foo(int x) that returns a Series. I'd like to join df with the result of foo() for each row using a vectorized operation.

For example, given the following dataframe, I call foo() on the values of column col_1

col_1 col_2 col_3
1 1 'a'
12 2 'b'
13 3 'd'
4 4 'c'

If we assume

  • foo(1) = Series('col4': 0, 'col5': 2)
  • foo(12) = Series('col4': 1, 'col5': 3)
  • foo(13) = Series('col4': 1, 'col5': 4)
  • foo(4) = Series('col4': 0, 'col5': 5)

then the output should be

col_1 col_2 col_3 col4 col5
1 1 'a' 0 2
12 2 'b' 1 3
13 3 'd' 1 4
4 4 'c' 0 5

Solution

  • EDIT: Looks like .from_records will handle the map cleanly. You could try that with the pd.concat instead:

    In [118]: pd.DataFrame.from_records(df['col_1'].map(foo))
    Out[118]:
       col4  col5
    0     0     2
    1     1     3
    2     1     4
    3     0     5
    

    Normally, I'd use .map() for something like this since it's generally faster than .apply(), but the output comes out a little funky, so unless you've got a giant dataframe, I'd just use the straightforward .apply() option with pd.concat:

    In [18]: def foo(n):
        ...:     return {1: pd.Series({'col4': 0, 'col5': 2}), 12: pd.Series({'col4': 1, 'col5': 3}), 13: pd.Series({'col4': 1, 'col5': 4}), 4: pd.Series
        ...: ({'col4': 0, 'col5': 5})}[n]
        ...:
    
    In [19]: df
    Out[19]:
       col_1  col_2 col_3
    0      1      1   'a'
    1     12      2   'b'
    2     13      3   'd'
    3      4      4   'c'
    
    In [20]: pd.concat([df, df['col_1'].apply(foo)], axis=1)
    Out[20]:
       col_1  col_2 col_3  col4  col5
    0      1      1   'a'     0     2
    1     12      2   'b'     1     3
    2     13      3   'd'     1     4
    3      4      4   'c'     0     5
    

    Another option you might try is to have the function return a dictionary rather than a Series