The code for input dataframe is
import pandas as pd
df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))','Column2': 'XYZ + XXX/YYY'}])
Input Dataframe:-
+---------------------------------+---------------------------------+
| Column1 | Column2 +
+---------------------------------+---------------------------------+
| ((CC ) + (A11/ABC/ZZ) + (!AAA)) | XYZ + XXX/YYY |
+---------------------------------+---------------------------------+
Input list:-
list = [AAA,BBB,CCC]
Conditions:-
'+' should remain as such (similar to AND condition)
'/' means split the data into multiple cells (similar to OR condition)
'!' means replace with other elements in the corresponding list (similar to NOT condition)
Because of the ! sign, the row becomes
+------------------------------------+---------------------------------+
| Column1 | Column2 +
+------------------------------------+---------------------------------+
| ((CC ) + (A11/ABC/ZZ) + (BBB/CCC)) | XYZ + XXX/YYY |
+------------------------------------+---------------------------------+
Please help me to split the singe row into multiple rows as shown below using Pandas
+---------------------------------+---------------------------------+
| Column1 | Column2 +
+---------------------------------+---------------------------------+
| CC + A11 + BBB | XYZ + XXX |
+---------------------------------+---------------------------------+
| CC + ABC + BBB | XYZ + XXX |
+---------------------------------+---------------------------------+
| CC + ZZ + BBB | XYZ + XXX |
+---------------------------------+---------------------------------+
| CC + A11 + CCC | XYZ + XXX |
+---------------------------------+---------------------------------+
| CC + ABC + CCC | XYZ + XXX |
+---------------------------------+---------------------------------+
| CC + ZZ + CCC | XYZ + XXX |
+---------------------------------+---------------------------------+
| CC + A11 + BBB | XYZ + YYY |
+---------------------------------+---------------------------------+
| CC + ABC + BBB | XYZ + YYY |
+---------------------------------+---------------------------------+
| CC + ZZ + BBB | XYZ + YYY |
+---------------------------------+---------------------------------+
| CC + A11 + CCC | XYZ + YYY |
+---------------------------------+---------------------------------+
| CC + ABC + CCC | XYZ + YYY |
+---------------------------------+---------------------------------+
| CC + ZZ + CCC | XYZ + YYY |
+---------------------------------+---------------------------------+
See whether this meets your requirements. The comments explain how it works.
#!/usr/bin/env python
import pandas as pd # tested with pd.__version__ 0.19.2
df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))',
'Column2': 'XYZ + XXX/YYY'}]) # your input dataframe
list = ['AAA', 'BBB', 'CCC'] # your input list
to_replace = dict()
for item in list: # prepare the dictionary for the '!' replacements
to_replace["!"+item+'\\b'] = '/'.join([i for i in list if i != item])
df = df.replace(to_replace, regex=True) # do all the '!' replacements
import re
def expanded(s): # expand series s to multiple string list around '/'
l = s.str.replace('[()]', '').tolist()
while True: # in each loop cycle, handle one A/B/C... expression
xl = [] # expanded list for this cycle
for s in l: # for each string in the list so far
m = re.search(r'\w+(/\w+)+', s) # look for a A/B/C... expression
if m: # if there is, add the individual expansions to the list
xl.extend([m.string[:m.start()]+i+m.string[m.end():]
for i in m.group().split('/')])
else: # if not, we're done
return l
l = xl # expanded list for this cycle is now the current list
def expand(c): # expands the column named c to multiple rows
new = expanded(df[c]) # get the new contents
xdf = pd.concat(len(new)/len(df[c])*[df]) # create required rows
xdf[c] = sorted(new) # set the new contents
return xdf # return new dataframe
df = expand('Column1')
df = expand('Column2')
print df
Output:
Column1 Column2 0 CC + A11 + BBB XYZ + XXX 0 CC + A11 + CCC XYZ + XXX 0 CC + ABC + BBB XYZ + XXX 0 CC + ABC + CCC XYZ + XXX 0 CC + ZZ + BBB XYZ + XXX 0 CC + ZZ + CCC XYZ + XXX 0 CC + A11 + BBB XYZ + YYY 0 CC + A11 + CCC XYZ + YYY 0 CC + ABC + BBB XYZ + YYY 0 CC + ABC + CCC XYZ + YYY 0 CC + ZZ + BBB XYZ + YYY 0 CC + ZZ + CCC XYZ + YYY