I'm trying to extract data between these two delimiters [
,]
for two cases: Beginning with #CB
or CR:
Input data:
Scenario,Order_ID,Review
1,1234,<Order No. 10><Remark>Food was good</Remark><UserID>7890</UserID><Filter>[#CB:Customer happy with service]</Filter><Rating>5</Rating>
2,1235,<Rating>Five</Rating><Order No. 17><UserID>7880</UserID><Filter>[#CB:Customer had a good time]</Filter><Remark>Food was good</Remark><Additional>Service up to par</Additional><Remark>[#CB:Customer will return again]</Remark><End>Thats all</End>
3,1236,<Start>We shall begin</Start><Intro>[CR:Fine Dining Experience]</Intro><Rating>Three</Rating><Order No. 12><Filter>[#CB:Food was well prepared]</Filter><Remark>Food was good</Remark><Additional>Ambiance was lively</Additional><Remark>[CR:Hygine was maintained]</Remark><End>Could be better</End><UserID>7880</UserID>
Data wanted:
Scenario,Order_ID,Review1,Review2,Review3,Review4
1,1234,#CB:Customer happy with service,,,
2,1235,#CB:Customer had a good time,#CB:Customer will return again,,
3,1236,CR:Fine Dining Experience,#CB:Food was well prepared,CR:Hygine was maintained,
Building on the other answer, this is how you could get it in the format you asked for.
import re
import pandas as pd
p = re.compile(r"\[(.*?)\]")
s1 = "<Order No. 10>Food was good7890[#CB:Customer happy with service]Five"
s2 = " Five<Order No. 17>7880[#CB:Customer had a good time]Food was goodService up to par[#CB:Customer will return again]Thats all"
print(p.findall(s1))
print(p.findall(s2))
d = pd.DataFrame({'scenario':[1,2], 'order_id':['1234', '1235'], 'reviews':[s1,s2] })
def padList(l, length=4):
l = l[:length]
l += [''] * (length - len(l))
return l
d[['review1','review2','review3','review4']] = d.apply(lambda row: padList(p.findall(row['reviews'])), axis=1, result_type='expand')
d