Excuse my ignorance about this... I'm taking a data analysis course and I stumbled upon this query in an exercise:
SELECT
CASE
WHEN MIN(REGEXP_CONTAINS(STRING(ActivityDate), DATE_REGEX)) = TRUE THEN
"Valid"
ELSE
"Not Valid"
END
AS valid_test
FROM
`tracker_data_clean.daily-activity-clean`;
ActivityDate
is a field that contains date
type data and DATE_REGEX
is a regular expression variable for a date format string.
What I don't know, is what does taking the MIN()
of this boolean expression REGEX_CONTAINS
do or mean.
I would appreciate if any of you could help me understand the concept of doing this.
Thanks !
The query selects rows from the table and applies the REGEXP_CONTAINS()
function to every (string-converted) value in the ActivityDate
column. REGEXP_CONTAINS()
will either return true
or false
based on whether the value matches the regex pattern in DATE_REGEX
.
How MIN()
behaves here can vary by implementation:
MIN()
is evaluating 0
's and 1
's. If all the values are 1
(true
), MIN()
will be 1
(true
), otherwise it will be 0
(false
).MIN()
returns true
if all of the values are true
, because the minimum value is true
(true
being "greater" than false
), otherwise it returns false
.The result, based on the implementation, is that MIN()
returns 0
/1
, or false
/true
. Either way, that result is compared to true
in the CASE
statement. If all values matched the regex, the comparison will be true
.
Basically, the query is "does every row have a valid date in the ActivityDate
column?" The result will be a table with a single column valid_test
and one row, containing "Valid" if they all match, "Not Valid" otherwise.
Another way to look at it that would be relatable to some programming languages is that MIN(bool_function())
is analogous to all()
, meaning return true
if all values are true
. Similarly, MAX(bool_function())
would be analogous to any()
, meaning return true
if any value is true
.