In Python, I am comparing dataframes containing strings to decide if it should pass or fail. How can I stop data from passing when it should fail?

I have over 20 test cases that check a CSV for data anomalies due to data entry. This test case (#15) compares the salutation and addressee to marital status.

# Test case  15
# Compares MrtlStat to  PrimAddText and PrimSalText
df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single")]
df = df[df['PrimAddText'].str.contains("AND|&", na=False)]
data_15 = df[df['PrimSalText'].str.contains("AND|&", na=False)]

# Adds row to list of failed data
ids = data_15.index.tolist()

# Keep track of data that failed test case 15 
for i in ids:
  data.at[i,'Test Case Failed']+=', 15'

If MrtlStat contains Widow, Divorced, or Single while PrimAddText or PrimSalTexts contains AND or &, it should fail the test. This test works only if BOTH PrimSalTexts and PrimAddText contain AND or &.

Table showing data that passes but should fail:

PrimAddText	PrimSalText	MrtlStat
Mrs. Judith Elfrank	Mr. & Mrs. Elfrank & Michael	Widowed
Mr. & Mrs.Karl Magnusen	Mr. Magnusen	Widowed

Table showing data that fails as expected:

PrimAddText	PrimSalText	MrtlStat
Mr. & Mrs. Elfrank	Mr. & Mrs. Elfrank & Michael	Widowed

How can I adjust the test to work if only one of the columns (PrimSalTexts or PrimAddText) contains AND or &?

Solution

You have an AND condition b/w the second and third condition, you can separate these out and capturing the result from each condition. finally combine the two lists together

# Test case  15
# Compares MrtlStat to  PrimAddText and PrimSalText
df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single")]
data_15_A = df[df['PrimAddText'].str.contains("AND|&", na=False)]
data_15_B = df[df['PrimSalText'].str.contains("AND|&", na=False)]

# Adds row to list of failed data
ids = data_15_A.index.tolist() + data_15_B.index.tolist()

# Keep track of data that failed test case 15 
for i in ids:
  data.at[i,'Test Case Failed']+=', 15'