I have a dataframe with strings in one column. I would like to append ' section 22' to a string when it contains the phrase 'personal information', but only if the string does not also contain any of the following: s. 22, s. 26, s. 29, s. 32, or s. 33. Here is my dataframe:
import pandas as pd

df = pd.DataFrame({
    'Order': ['Order90-098', 'OrderF14-47', 'OrderF13-43', 'Order56-090', 'Order90-098', 'Order78-897'],
    'Ruling': ['foo', 'personal information', 's. 26 personal information', 'personal information s. 33', 'personal information s. 67', 'personal s. 32 information']})
Hoped-for result:
df = pd.DataFrame({
    'Order': ['Order90-098', 'OrderF14-47', 'OrderF13-43', 'Order56-090', 'Order90-098', 'Order78-897'],
    'Ruling': ['foo', 'personal information section 22', 's. 26 personal information', 'personal information s. 33', 'personal information s. 67 section 22', 'personal s. 32 information']})
What I've figured out: I can add section 22 to a string if the string contains 'personal information', and I can also abort the operation if it contains the number 26.
import re
df['Ruling'] = df['Ruling'].apply(lambda x: re.sub(r'^(?!.*26).*(personal information.*$)',r"\1 section 22", x, flags=re.I))
When I try to expand on the above solution by adding multiple negative lookaheads, I get a catastrophic backtracking error:
df['Ruling'] = df['Ruling'].apply(lambda x: re.sub(r'^(?!.*29).*(?!.*32).*(?!.*33).*(?!.*22).*(.*(?!.*26).*personal information.*$)',r"\1 section 22", x, flags=re.I))
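A likely reason the stacked version misbehaves: every lookahead after the first sits behind a `.*`, so the engine can satisfy it at a later position (even the end of the string), past the number it was meant to catch, and the chain of `.*` gives it many ways to try. A minimal sketch, assuming the excluded references always look like `s. NN`: put every lookahead directly after `^` so each one scans the whole string, and capture the whole line in `\1` so text before 'personal information' is kept.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Order': ['Order90-098', 'OrderF14-47', 'OrderF13-43', 'Order56-090', 'Order90-098', 'Order78-897'],
    'Ruling': ['foo', 'personal information', 's. 26 personal information',
               'personal information s. 33', 'personal information s. 67', 'personal s. 32 information']})

# Each lookahead is anchored at ^, so each one checks the entire string for its section.
pattern = re.compile(
    r'^(?!.*s\. 22\b)(?!.*s\. 26\b)(?!.*s\. 29\b)(?!.*s\. 32\b)(?!.*s\. 33\b)'
    r'(.*personal information.*)$',
    flags=re.I)

# The group captures the whole line, so the replacement keeps it and appends the suffix.
df['Ruling'] = df['Ruling'].apply(lambda x: pattern.sub(r'\1 section 22', x))
print(df['Ruling'].tolist())
```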
When I try to use alternation instead, 'personal information' still matches in strings that contain one of the numbers:
df['Ruling'] = df['Ruling'].apply(lambda x: re.sub(r'^(?!29|32|33|22|26.*)(.*personal information.*$)',r"\1 section 22", x, flags=re.I))
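In the alternation attempt, `(?!29|32|33|22|26.*)` is only tested at position 0, so it asks whether the string *starts* with one of those numbers; none of the sample strings do, so the lookahead always passes (and the `.*` only attaches to the last alternative, `26`). A sketch of the same idea with `.*` moved inside the lookahead and the alternatives grouped with a non-capturing `(?:...)`, again assuming the excluded references look like `s. NN`:

```python
import re

rulings = ['foo', 'personal information', 's. 26 personal information',
           'personal information s. 33', 'personal information s. 67', 'personal s. 32 information']

# .* inside the lookahead lets it scan the whole string; (?:...) groups the alternatives.
pattern = re.compile(r'^(?!.*s\. (?:22|26|29|32|33)\b)(.*personal information.*)$', flags=re.I)

result = [pattern.sub(r'\1 section 22', r) for r in rulings]
print(result)
```

The same compiled pattern can be used with `df['Ruling'].apply(lambda x: pattern.sub(r'\1 section 22', x))`.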
I've thought about using any(), but I don't know how it would work with re.sub.
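`any()` fits more naturally in a plain Python function than inside `re.sub`: check for the excluded references first and only append the text when none is present. A minimal sketch; `EXCLUDED` and `add_section_22` are hypothetical names, the checks are case-sensitive, and the exclusions are plain substrings (so `'s. 22'` would also match inside `'s. 220'`):

```python
import pandas as pd

# Hypothetical list of references that block the add-on.
EXCLUDED = ['s. 22', 's. 26', 's. 29', 's. 32', 's. 33']


def add_section_22(ruling: str) -> str:
    """Append ' section 22' if the ruling mentions personal information and no excluded section."""
    if 'personal information' in ruling and not any(s in ruling for s in EXCLUDED):
        return ruling + ' section 22'
    return ruling


df = pd.DataFrame({'Ruling': ['foo', 'personal information', 's. 26 personal information',
                              'personal information s. 33', 'personal information s. 67',
                              'personal s. 32 information']})
df['Ruling'] = df['Ruling'].apply(add_section_22)
print(df['Ruling'].tolist())
```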
Thanks in advance for your help.
You could use:
# (?:...) groups the alternatives; [22|26|...] would be a character class matching single characters
df['Ruling'] = (df['Ruling']
    .mask(~df['Ruling'].str.contains(r"s\. (?:22|26|29|32|33)\b", regex=True) &
          df['Ruling'].str.contains('personal information', regex=False),
          df['Ruling'] + ' section 22'))
which gives the following (row 4 gets the suffix because s. 67 is not in the exclusion list):
         Order                                 Ruling
0  Order90-098                                    foo
1  OrderF14-47        personal information section 22
2  OrderF13-43             s. 26 personal information
3  Order56-090             personal information s. 33
4  Order90-098  personal information s. 67 section 22
5  Order78-897             personal s. 32 information
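If the list of excluded sections changes over time, building the pattern from a list with `re.escape` may be less error-prone than typing the alternation by hand. A sketch using boolean indexing with `.loc` instead of `mask`; `excluded_sections` is a hypothetical name:

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Order': ['Order90-098', 'OrderF14-47', 'OrderF13-43', 'Order56-090', 'Order90-098', 'Order78-897'],
    'Ruling': ['foo', 'personal information', 's. 26 personal information',
               'personal information s. 33', 'personal information s. 67', 'personal s. 32 information']})

excluded_sections = ['22', '26', '29', '32', '33']
# re.escape takes care of the literal dot in "s. NN"; \b stops "s. 22" from matching "s. 220".
alternatives = '|'.join(re.escape(f's. {n}') for n in excluded_sections)
has_excluded = df['Ruling'].str.contains(rf'(?:{alternatives})\b')
has_pi = df['Ruling'].str.contains('personal information', regex=False)

# Only the rows that mention personal information and no excluded section are updated.
to_update = has_pi & ~has_excluded
df.loc[to_update, 'Ruling'] = df.loc[to_update, 'Ruling'] + ' section 22'
print(df)
```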