Tags: python, pandas, dataframe, dictionary, series

Create a column containing a dict built from two pandas DataFrame columns of lists


I have a DataFrame that looks like this:

df
        a                      b
0   [1, 2]    ['first', 'second']
1       []                     []
2      [5]                    [1]
3       []                     []
4    ['a']                  ['b']
5       []                     []

I would like to create a column (c) that contains, for each row, a dictionary built by zipping the values of columns (a) and (b).

If the values in columns (a) and (b) were not lists, I could use df.c = dict(zip(df.a, df.b)). However, since they are lists, this raises an error: a list is unhashable, so it cannot be used as a dictionary key. I can turn the pairs into tuples with list(zip(df.a, df.b)), but unfortunately I need a dictionary.
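A minimal sketch of the behaviour described above, assuming a two-row subset of the example frame. Zipping the whole columns tries to use each list in `a` as a key and fails. Zipping the two lists of a single row uses the list elements as keys, which is the per-row mapping wanted.

```python
import pandas as pd

# Two-row subset of the example data
df = pd.DataFrame({"a": [[1, 2], [5]], "b": [["first", "second"], [1]]})

# Whole-column zip: each key would be a list such as [1, 2], which is unhashable
whole_error = None
try:
    dict(zip(df.a, df.b))
except TypeError as exc:
    whole_error = str(exc)

# A list of tuples works, but each element is still a pair of lists, not a dict
pairs = list(zip(df.a, df.b))

# Per-row zip: the keys are the elements of row 0's list, which are hashable here
first_row = dict(zip(df.a[0], df.b[0]))

print(whole_error)
print(pairs)
print(first_row)
```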

Ultimately, the output I am looking for is the following:

df
        a                      b                           c
0   [1, 2]    ['first', 'second']    {1: 'first', 2:'second'}
1       []                     []                          {}
2      [5]                    [1]                       {5:1}
3       []                     []                          {}
4    ['a']                  ['b']                   {'a':'b'}
5       []                     []                          {}

Any ideas that avoid looping over the DataFrame rows one by one?

Both answers give the same output; thank you for them. After benchmarking, I accepted the faster one, the list comprehension, which ran about 18 times faster than apply with axis=1:

%timeit [dict(zip(ai, bi)) for ai, bi in zip(df['parameter_ids'], df['parameter_values'])]
7.76 ms ± 77 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df[['parameter_ids', 'parameter_values']].apply(lambda row: dict(zip(*row)), axis=1)
140 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
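A sketch that checks the two benchmarked approaches agree. It uses the column names `a` and `b` from the example frame in place of `parameter_ids` and `parameter_values`. A plausible reason for the timing gap is that `apply(..., axis=1)` hands each row to the lambda as a separate Series, which adds overhead per row. The comprehension only iterates over the raw column values.

```python
import pandas as pd

data = [[[1, 2], ['first', 'second']], [[], []], [[5], [1]],
        [[], []], [['a'], ['b']], [[], []]]
df = pd.DataFrame(data=data, columns=["a", "b"])

# Approach 1: list comprehension over the two columns zipped together
via_comprehension = [dict(zip(ai, bi)) for ai, bi in zip(df["a"], df["b"])]

# Approach 2: row-wise apply; each row arrives as a Series whose two values
# (the lists from a and b) are unpacked into zip by *row
via_apply = df[["a", "b"]].apply(lambda row: dict(zip(*row)), axis=1)

print(via_comprehension == via_apply.tolist())
```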

Solution

  • Use a list comprehension that zips each row's pair of lists into a dict:

    import pandas as pd
    
    # setup
    data = [[[1, 2], ['first', 'second']],
            [[], []],
            [[5], [1]],
            [[], []],
            [['a'], ['b']],
            [[], []]]
    df = pd.DataFrame(data=data, columns=["a", "b"])
    
    # pair up the two columns row by row, then zip each row's lists into a dict
    df["c"] = [dict(zip(ai, bi)) for ai, bi in zip(df.a, df.b)]
    print(df)
    

    Output

            a                b                          c
    0  [1, 2]  [first, second]  {1: 'first', 2: 'second'}
    1      []               []                         {}
    2     [5]              [1]                     {5: 1}
    3      []               []                         {}
    4     [a]              [b]                 {'a': 'b'}
    5      []               []                         {}
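The same per-row construction can also be written with `map`. This is an equivalent variant of the comprehension above, not one of the benchmarked answers, so its speed is not established here. One caveat applies to both forms: `zip` stops at the shorter list, so a row whose lists differ in length is truncated without any error. The length check below assumes you want to detect such rows.

```python
import pandas as pd

data = [[[1, 2], ['first', 'second']], [[], []], [[5], [1]],
        [[], []], [['a'], ['b']], [[], []]]
df = pd.DataFrame(data=data, columns=["a", "b"])

# map(zip, a, b) yields one zip object per row; map(dict, ...) turns each into a dict
df["c"] = list(map(dict, map(zip, df["a"], df["b"])))

# zip silently drops unmatched items, so confirm every row's lists have equal length
lengths_match = all(len(ai) == len(bi) for ai, bi in zip(df["a"], df["b"]))

print(df["c"].tolist())
print(lengths_match)
```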