I am using pydeequ to run some checks on my data, but it is not behaving as expected. One of my columns should only contain values between 0 and 1. The data looks like this:
| col1 |
| 0.5635412 |
| 0.123 |
| 1.0 |
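from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

# `spark` is an existing SparkSession and `df` is the DataFrame under test
check = Check(spark, CheckLevel.Warning, "DQ Check")
result = VerificationSuite(spark)\
    .onData(df)\
    .addCheck(check
        .satisfies("col1 BETWEEN 0 AND 1", "range check", lambda x: x == 1))\
    .run()
result_df = VerificationResult.checkResultsAsDataFrame(spark, result)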
The result is a failure with the message:
Value: 0.5635412 does not meet the constraint requirement!
Can anyone advise on where I have gone wrong?
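I realised there were a couple of null values in the data that I hadn't expected. For those rows the condition col1 BETWEEN 0 AND 1 is not true, so not every row satisfied the check and the assertion x == 1 failed.
I updated the code to treat nulls as acceptable: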
check = Check(spark, CheckLevel.Warning, "DQ Check")
result = VerificationSuite(spark)\
    .onData(df)\
    .addCheck(check
        # "OR col1 IS NULL" lets rows with a null col1 count as passing
        .satisfies("col1 BETWEEN 0 AND 1 OR col1 IS NULL", "range check", lambda x: x == 1))\
    .run()
result_df = VerificationResult.checkResultsAsDataFrame(spark, result)
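
To illustrate why the nulls caused the failure, here is a minimal pure-Python sketch. It is not pydeequ's implementation: the function names and the sample rows (including the single null) are hypothetical. It assumes the check computes the fraction of all rows for which the SQL condition is true and passes that fraction to the assertion lambda, and that a null input makes BETWEEN evaluate to NULL rather than true. Under that assumption, the number reported after "Value:" in the failure message would be this fraction rather than a value taken from the column.

```python
from typing import Callable, Optional, Sequence


def between_0_and_1(value: Optional[float]) -> Optional[bool]:
    """Mimic SQL `col1 BETWEEN 0 AND 1`: a NULL input yields NULL, not False."""
    if value is None:
        return None
    return 0 <= value <= 1


def between_or_null(value: Optional[float]) -> Optional[bool]:
    """Mimic SQL `col1 BETWEEN 0 AND 1 OR col1 IS NULL`."""
    if value is None:
        return True
    return between_0_and_1(value)


def compliance(
    values: Sequence[Optional[float]],
    predicate: Callable[[Optional[float]], Optional[bool]],
) -> float:
    """Fraction of all rows where the predicate is true (NULL counts as not true)."""
    if not values:
        raise ValueError("compliance is undefined for an empty column")
    satisfied = sum(1 for v in values if predicate(v) is True)
    return satisfied / len(values)


# Hypothetical sample: the three rows from the question plus one unexpected null
rows = [0.5635412, 0.123, 1.0, None]

strict = compliance(rows, between_0_and_1)
lenient = compliance(rows, between_or_null)

print(strict)   # 0.75, so an assertion like `lambda x: x == 1` fails
print(lenient)  # 1.0, so the same assertion passes
```

If nulls should be reported rather than silently accepted, one option is to keep the strict range condition and add a separate completeness check on the column, so that each problem shows up under its own constraint.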