I'm trying to exclude values from a larger public table in Googles BigQuery using the following SQL lines. The last line has the purpose to exclude entries, that start with a certain letter, e.g. 'C'. For some reason, when I add the line, the count increases. Logically the selected rows should decrease and I can’t figure out why.
How can I make the exclusion work?
SELECT Count(*)
FROM `patents-public-data.patents.publications`,
unnest(description_localized) as description_unnest,
unnest(claims_localized) as claims_unnest,
unnest(cpc) as cpc_unnest
where description_unnest.language in ('en','EN')
and claims_unnest.language in ('en','EN')
and publication_date >19900101
and (SUBSTRING(cpc_unnest.code,1,1) <> 'C');
OK. I think I found the mistakes I made
#7 unnest(cpc) as cpc_unnest
SELECT Count(DISTINCT publication_number)
The whole solution is this
SELECT Count(DISTINCT publication_number)
FROM `patents-public-data.patents.publications`,
unnest(description_localized) as description_unnest,
unnest(claims_localized) as claims_unnest,
unnest(cpc) as cpc_unnest
where description_unnest.language in ('en','EN')
and claims_unnest.language in ('en','EN')
and publication_date >19900101
and (SUBSTRING(cpc_unnest.code,1,1) <> 'C')