Search code examples
pythonpandasstring-concatenation

How get string between expression with Pandas?


I would like to get only codes between # and concat it to a new column.

What I have

id code
0 (#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0)
1 (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0))
2 (#M05Q01600R00100# = 1)
3 (#M05Q01125R00200# = 1)
4 (#M05Q01129R00100# = 1) AND (#M05Q01130R00300# = 0)
5 (#M05Q01130R00400# = 1)
6 (#M05Q01132R00100# = 1)
7 (#M05Q01132R00400# = 1)
8 (#M05Q01132R00400# = 1)
9 (#M05Q01140R00200# = 1)

What I would like to get

id code concat
0 (#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0) M05Q01900R00100, M05Q01950R00200
1 (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0)) M05Q01900R00100, M05Q01950R00100, M05Q01950R00200
2 (#M05Q01600R00100# = 1) M05Q01600R00100
3 (#M05Q01125R00200# = 1) M05Q01125R00200
4 (#M05Q01129R00100# = 1) AND (#M05Q01130R00300# = 0) M05Q01129R00100, M05Q01130R00300
5 (#M05Q01130R00400# = 1) M05Q01130R00400
6 (#M05Q01132R00100# = 1) M05Q01132R00100
7 (#M05Q01132R00400# = 1) M05Q01132R00400
8 (#M05Q01132R00400# = 1) M05Q01132R00400
9 (#M05Q01140R00200# = 1) M05Q01140R00200

Solution

  • Use Series.str.findall with regex for values between # and then Series.str.join:

    df['concat'] = df['code'].str.findall(r'#(.*?)#').str.join(', ')
    print (df)
       id                                                code  \
    0    0  (#M05Q01900R00100# = 1) AND (#M05Q01950R00200#...   
    1    1  (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100...   
    2    2                            (#M05Q01600R00100# = 1)   
    3    3                            (#M05Q01125R00200# = 1)   
    4    4  (#M05Q01129R00100# = 1) AND (#M05Q01130R00300#...   
    5    5                            (#M05Q01130R00400# = 1)   
    6    6                            (#M05Q01132R00100# = 1)   
    7    7                            (#M05Q01132R00400# = 1)   
    8    8                            (#M05Q01132R00400# = 1)   
    9    9                            (#M05Q01140R00200# = 1)   
    
                                                  concat  
    0                   M05Q01900R00100, M05Q01950R00200  
    1  M05Q01900R00100, M05Q01950R00100, M05Q01950R00200  
    2                                    M05Q01600R00100  
    3                                    M05Q01125R00200  
    4                   M05Q01129R00100, M05Q01130R00300  
    5                                    M05Q01130R00400  
    6                                    M05Q01132R00100  
    7                                    M05Q01132R00400  
    8                                    M05Q01132R00400  
    9                                    M05Q01140R00200