I have over 20 test cases that check a CSV for anomalies caused by data entry errors. This test case (#15) compares the salutation and addressee to marital status.
# Test case 15
# Compares MrtlStat to PrimAddText and PrimSalText
df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single")]
df = df[df['PrimAddText'].str.contains("AND|&", na=False)]
data_15 = df[df['PrimSalText'].str.contains("AND|&", na=False)]
# Adds row to list of failed data
ids = data_15.index.tolist()
# Keep track of data that failed test case 15
for i in ids:
    data.at[i, 'Test Case Failed'] += ', 15'
If MrtlStat contains Widow, Divorced, or Single while PrimAddText or PrimSalText contains AND or &, the row should fail the test. As written, the test flags a row only if BOTH PrimSalText and PrimAddText contain AND or &.
Table showing data that passes but should fail:
PrimAddText | PrimSalText | MrtlStat |
---|---|---|
Mrs. Judith Elfrank | Mr. & Mrs. Elfrank & Michael | Widowed |
Mr. & Mrs.Karl Magnusen | Mr. Magnusen | Widowed |
Table showing data that fails as expected:
PrimAddText | PrimSalText | MrtlStat |
---|---|---|
Mr. & Mrs. Elfrank | Mr. & Mrs. Elfrank & Michael | Widowed |
How can I adjust the test so that it also fails a row when only one of the columns (PrimSalText or PrimAddText) contains AND or &?
Your second and third filters are chained, so they act as an AND condition: a row must match both to survive. Apply each filter to the marital-status subset separately, capture the result of each, and then combine the two sets of row indices.
# Test case 15
# Compares MrtlStat to PrimAddText and PrimSalText
df = data[data['MrtlStat'].str.contains("Widow|Divorced|Single")]
data_15_A = df[df['PrimAddText'].str.contains("AND|&", na=False)]
data_15_B = df[df['PrimSalText'].str.contains("AND|&", na=False)]
# Collect the row labels that failed; union() avoids listing a row twice
# when both columns match
ids = data_15_A.index.union(data_15_B.index).tolist()
# Keep track of data that failed test case 15
for i in ids:
    data.at[i, 'Test Case Failed'] += ', 15'
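As an alternative, the same check can probably be written as a single boolean mask, combining the two column conditions with `|` (OR) and the marital-status condition with `&` (AND). This is only a minimal sketch: the sample rows are taken from the tables in the question, the fourth row and the function name `flag_test_case_15` are made up for illustration, and it assumes the 'Test Case Failed' column already holds strings.

```python
import pandas as pd

# Sample data: the first three rows come from the question's tables;
# the fourth row is a hypothetical record that should pass the test.
data = pd.DataFrame({
    "PrimAddText": ["Mrs. Judith Elfrank", "Mr. & Mrs.Karl Magnusen",
                    "Mr. & Mrs. Elfrank", "Mr. John Smith"],
    "PrimSalText": ["Mr. & Mrs. Elfrank & Michael", "Mr. Magnusen",
                    "Mr. & Mrs. Elfrank & Michael", "Mr. Smith"],
    "MrtlStat": ["Widowed", "Widowed", "Widowed", "Married"],
    "Test Case Failed": ["", "", "", ""],
})


def flag_test_case_15(data):
    """Append ', 15' to rows that fail test case 15 and return the mask."""
    # True where the marital status implies a single addressee
    single = data["MrtlStat"].str.contains("Widow|Divorced|Single", na=False)
    # True where either text column names more than one person
    add_joint = data["PrimAddText"].str.contains("AND|&", na=False)
    sal_joint = data["PrimSalText"].str.contains("AND|&", na=False)
    # A row fails if it is "single" AND at least one column is joint
    failed = single & (add_joint | sal_joint)
    # Each failing row is updated exactly once, with no explicit loop
    data.loc[failed, "Test Case Failed"] += ", 15"
    return failed


failed = flag_test_case_15(data)
print(data[["MrtlStat", "Test Case Failed"]])
```

Because the mask is computed once per row, a record where both columns contain AND or & is flagged only once.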