Search code examples
google-bigqueryinternal

bigquery internal error case condtion


I'm getting an internal error when running a query that as peculiarities includes a 'CASE' clause and two different fields in the 'WHEN' clause. The query is as follows:

SELECT
  CASE WHEN site != 'a' OR geo LIKE 'NO%' THEN "test" END
FROM  elc.CpEvents_agderposten
WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24')

Similar queries are working fine:

SELECT
  CASE WHEN site != 'agderposten' OR geo LIKE 'NO%' THEN "test" END
FROM  elc.CpEvents_agderposten
WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24')

-

SELECT
  CASE WHEN geo LIKE 'NO%' THEN "test" END
FROM  elc.CpEvents_agderposten
WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24')

-

SELECT
 CASE WHEN site != 'a' AND geo LIKE 'NO%' THEN "test" END
FROM  elc.CpEvents_agderposten
WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24')

-

SELECT
  CASE WHEN site != 'a' THEN "test" WHEN geo LIKE 'NO%' THEN "test" END
FROM
  elc.CpEvents_agderposten
WHERE
  _partitiontime BETWEEN TIMESTAMP('2014-09-24')
  AND TIMESTAMP('2014-09-24')

I know that the last example can be used as a work around for the first query, but it cannot be applied always. For example the following query failed too:

SELECT
      CASE WHEN site == 'a' and geo >= 'NO' THEN "test" END
    FROM  elc.CpEvents_agderposten
    WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24')

The job id for the first query is: cpcd-1357:bquijob_24edf89_157ae353e1e

And for the last query: cpcd-1357:bquijob_1b26172b_157ae348938

Anyone know how to avoid this error?


Solution

  • It looks like you're hitting an edge case in legacy SQL related to using a repeated column inside a condition. If you enable standard SQL by unchecking "Use Legacy SQL" under "Show Options" in the UI, the following query should work:

    SELECT
      CASE WHEN site != 'a' OR geo LIKE 'NO%' THEN "test" END
    FROM  `elc.CpEvents_agderposten` t, t.geo geo
    WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24');
    

    This query "flattens" the repetition in order to compare against geo.

    Alternatively, you can use the FLATTEN operator with legacy SQL to work around the internal error:

    SELECT
      CASE WHEN site != 'a' OR geo LIKE 'NO%' THEN "test" END
    FROM FLATTEN([elc.CpEvents_agderposten], geo)
    WHERE _partitiontime BETWEEN TIMESTAMP('2014-09-24') AND TIMESTAMP('2014-09-24')
    

    You can read more about the differences between legacy and standard SQL in the migration guide.