python pandas list multiple-columns

Merge columns with lists into one


I have this dataframe

import pandas as pd

df = pd.DataFrame({
    'c1': ['a', 'f,g,e', 'a,f,e,h', 'g,h', 'b,c,g,h'],
    'c2': ['1', '1,1,0.5', '1,2,2.5,1', '3,1', '2,-1,0.5,-1'],
    'c3': ['0.05', '0.01,0.001,>0.5', '>0.9,>0.9,0.01,0.002', '>0.9,>0.9', '0.05,0.1,<0.01,0.1'],
})

yielding

c1        c2            c3
a         1             0.05
f,g,e     1,1,0.5       0.01,0.001,>0.5
a,f,e,h   1,2,2.5,1     >0.9,>0.9,0.01,0.002
g,h       3,1           >0.9,>0.9
b,c,g,h   2,-1,0.5,-1   0.05,0.1,<0.01,0.1

I would like to combine c1, c2 and c3 element-wise into a new column c4 (desired result below):

c1        c2            c3                     c4
a         1             0.05                   a(1|0.05)
f,g,e     1,1,0.5       0.01,0.001,>0.5        f(1|0.01),g(1|0.001),e(0.5|>0.5)
a,f,e,h   1,2,2.5,1     >0.9,>0.9,0.01,0.002   a(1|>0.9),f(2|>0.9),e(2.5|0.01),h(1|0.002)
g,h       3,1           >0.9,>0.9              g(3|>0.9),h(1|>0.9)
b,c,g,h   2,-1,0.5,-1   0.05,0.1,<0.01,0.1     b(2|0.05),c(-1|0.1),g(0.5|<0.01),h(-1|0.1)

I tried adapting the answers to two related questions, but they did not work.


Solution

  • You can use a list comprehension with zip, str.split and str.join:

    df['c4'] = [','.join([f'{a}({b}|{c})' for a,b,c in
                          zip(*(y.split(',') for y in x))])
                for x in zip(df['c1'], df['c2'], df['c3'])]
    

    NB. The same can be done with apply (axis=1), but a list comprehension is generally faster because it avoids building a Series for every row.

    Output:

            c1           c2                    c3                                          c4
    0        a            1                  0.05                                   a(1|0.05)
    1    f,g,e      1,1,0.5       0.01,0.001,>0.5            f(1|0.01),g(1|0.001),e(0.5|>0.5)
    2  a,f,e,h    1,2,2.5,1  >0.9,>0.9,0.01,0.002  a(1|>0.9),f(2|>0.9),e(2.5|0.01),h(1|0.002)
    3      g,h          3,1             >0.9,>0.9                         g(3|>0.9),h(1|>0.9)
    4  b,c,g,h  2,-1,0.5,-1    0.05,0.1,<0.01,0.1  b(2|0.05),c(-1|0.1),g(0.5|<0.01),h(-1|0.1)
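
For comparison, here is a minimal sketch of the apply-based variant mentioned in the note, together with a walk-through of what zip produces for a single row. The helper name combine_row and the column name c4_apply are hypothetical and chosen only for illustration; the data is the question's dataframe.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['a', 'f,g,e', 'a,f,e,h', 'g,h', 'b,c,g,h'],
    'c2': ['1', '1,1,0.5', '1,2,2.5,1', '3,1', '2,-1,0.5,-1'],
    'c3': ['0.05', '0.01,0.001,>0.5', '>0.9,>0.9,0.01,0.002',
           '>0.9,>0.9', '0.05,0.1,<0.01,0.1'],
})


def combine_row(row):
    """Hypothetical helper: build 'name(value|p)' items for one row."""
    # Split each cell on commas, then pair the pieces up position by position.
    parts = zip(row['c1'].split(','), row['c2'].split(','), row['c3'].split(','))
    return ','.join(f'{a}({b}|{c})' for a, b, c in parts)


# Walk-through for the second row: zip turns three lists into one list of triples.
row = df.iloc[1]
triples = list(zip(*(row[col].split(',') for col in ['c1', 'c2', 'c3'])))

# Row-wise apply; same result as the list comprehension, but usually slower.
df['c4_apply'] = df.apply(combine_row, axis=1)
```

Note that zip stops at the shortest input, so a row whose cells contain different numbers of comma-separated items would be truncated silently rather than raising an error.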