
How to combine string from one column to another column at same index in pandas DataFrame?


I am working on an NLP project. My input is:

index  name  lst 
0      a     c    
0            d    
0            e    
1            f    
1      b     g   

I need output like this:

index  name  lst combine  
0      a     c    a c 
0            d    a d  
0            e    a e  
1            f    b f  
1      b     g    b g 

How can I achieve this?


Solution

  • You can use groupby + transform('max') to fill the empty cells with each group's letter: an empty string (or a space) sorts before any letter, so the group maximum is the letter. The rest is a simple column-wise string concatenation:

    df['combine'] = df.groupby('index')['name'].transform('max') + ' ' + df['lst']
    

    Input used:

    import pandas as pd

    df = pd.DataFrame({'index': [0,0,0,1,1],
                       'name': ['a','','','','b'],
                       'lst': list('cdefg'),
                      })
    

    NB. I treated "index" as a column here. If it is actually the DataFrame index, group by df.index instead.

    Output:

       index name lst combine
    0      0    a   c     a c
    1      0        d     a d
    2      0        e     a e
    3      1        f     b f
    4      1    b   g     b g
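For the case mentioned in the NB, here is a minimal sketch that assumes the group labels live in the real DataFrame index rather than in a column called "index". Grouping by df.index (equivalently level=0) applies the same max trick. Because transform returns a Series with exactly the same (duplicate) index as df, the concatenation lines up row by row without any re-alignment.

```python
import pandas as pd

# Same data, but the group labels are the DataFrame index itself
# (an assumption about the asker's layout, not a "index" column).
df = pd.DataFrame(
    {"name": ["a", "", "", "", "b"], "lst": list("cdefg")},
    index=[0, 0, 0, 1, 1],
)

# Group on the index; transform broadcasts each group's max ('a' or 'b')
# back onto every row, keeping the original index.
filled = df.groupby(df.index)["name"].transform("max")

# Row-wise string concatenation, as in the answer above.
df["combine"] = filled + " " + df["lst"]
print(df)
```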
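The max trick relies on blanks sorting before the real names. If you would rather not depend on string ordering (for example, if the placeholder for "no name" might not be an empty string), one alternative, swapped in here rather than taken from the answer, is to mark blanks as missing and take the first non-missing name per group. This sketch assumes blanks are exactly empty strings and that each group has at most one real name.

```python
import pandas as pd

df = pd.DataFrame({"index": [0, 0, 0, 1, 1],
                   "name": ["a", "", "", "", "b"],
                   "lst": list("cdefg")})

# Turn empty strings into NaN so they are treated as missing values.
name = df["name"].mask(df["name"].eq(""))

# GroupBy "first" skips missing values, so every row gets its group's name
# regardless of where that name appears inside the group.
filled = name.groupby(df["index"]).transform("first")

df["combine"] = filled + " " + df["lst"]
print(df)
```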