Tags: python, pandas, string, merge, apply

How to apply a merge function to column A


How can I apply a merge function, or any other method, to column A? In layman's terms, I want to convert the string "(A|B|C,D)|(A,B|C|D)|(B|C|D)" into "(D A|D B|D C)|(A B|A C|A D)|(B|C|D)".

The group (B|C|D) stays the same because it has no comma-separated value to merge. Basically, within each group I want to merge the value on one side of the comma with each of the values on the other side.

I have the data frame below.

import pandas as pd

data = {'A': [ '(A|B|C,D)|(A,B|C|D)|(B|C|D)'],
        'B(Expected)': [ '(D A|D B|D C)|(A B|A C|A D)|(B|C|D)']
        }

df = pd.DataFrame(data)

print(df)

My expected result is shown in column B(Expected).

Methods I tried:

(1)

df['B(Expected)'] = df['A'].apply(lambda x: x.replace("|", " ").replace(",", "|") if "|" in x and "," in x else x)

(2)

# Split the string by the pipe character
df['string'] = df['string'].str.split('|')
df['string'] = df['string'].apply(lambda x: '|'.join([' '.join(i.split(' ')) for i in x]))

Solution

  • You can use a regex to extract the values in parentheses, then a custom function with itertools.product to reorganize the values:

    from itertools import product
    
    def split(s):
        return '|'.join([' '.join(x) for x in product(*[x.split('|') for x in s.split(',')])])
    
    df['B'] = df['A'].str.replace(r'([^()]+)', lambda m: split(m.group()), regex=True)
    
    print(df)
    

    Note that this requires non-nested parentheses.

    Output:

                                 A                                    B
    0  (A|B|C,D)|(A,B|C|D)|(B|C|D)  (A D|B D|C D)|(A B|A C|A D)|(B|C|D)
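
The output above gives (A D|B D|C D) for the first group, while the question's expected column shows (D A|D B|D C). A minimal sketch of one way to reproduce the expected order follows. It assumes the intended rule is that the comma-separated part with fewer alternatives comes first; the question does not state this rule, it is only inferred from the single example. The names expand_group and expand_all are hypothetical, and the sketch uses only the standard library so it can be tried without a DataFrame.

```python
import re
from itertools import product


def expand_group(group: str) -> str:
    """Expand one parenthesis-free group such as 'A|B|C,D'."""
    # Split on commas, then split each part into its '|' alternatives.
    parts = [part.split('|') for part in group.split(',')]
    # Assumption: the part with fewer alternatives goes first, which matches
    # the order in the question's expected column. sorted() is stable, so
    # parts of equal length keep their original order.
    parts = sorted(parts, key=len)
    # Combine every alternative of each part with every alternative of the others.
    return '|'.join(' '.join(combo) for combo in product(*parts))


def expand_all(text: str) -> str:
    """Apply expand_group to the contents of every non-nested '(...)' group."""
    return re.sub(
        r'\(([^()]*)\)',
        lambda m: '(' + expand_group(m.group(1)) + ')',
        text,
    )


print(expand_all('(A|B|C,D)|(A,B|C|D)|(B|C|D)'))
# (D A|D B|D C)|(A B|A C|A D)|(B|C|D)
```

With pandas this could be applied as df['B'] = df['A'].map(expand_all). Like the answer above, it only handles non-nested parentheses.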