Search code examples
sqlgoogle-bigquerygoogle-patent-search

Querying certain values inside a RECORD type


I'm trying to extract certain values from google patents: https://console.cloud.google.com/marketplace/product/google_patents_public_datasets/google-patents-public-data?project=pivotal-life-sciences

and the datatype of the cpc (The Cooperative Patent Classification codes) classification inside table patents-public-data.patents.publications is RECORD

If I only wanted to pull records where at least 1 of the codes is like "C01%" how would I do that?

what I'm trying right now is:

    SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
    FROM patents-public-data.patents.publications
    WHERE filing_date >= 20200101 and cpc.code like 'C01%'
    LIMIT 2

but this returns the error:

google.api_core.exceptions.BadRequest: 400 Cannot access field code on a value with type ARRAY<STRUCT<code STRING, inventive BOOL, first BOOL, ...>> at [4:43]

I understand that this means that I'm having bad syntax on my attempt to sort the cpc statement, but I don't know what the right records are.

if it helps, here is the result from the query:

    SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
    FROM patents-public-data.patents.publications
    WHERE filing_date >= 20200101 
    LIMIT 2

output:

Row((['MEIJER, JAN', 'STAACKE, Robert', 'BURCHARD, BERND', 'MEIJER, Nils'], ['Jan Meijer', 'Staacke Robert', 'Bernd Burchard'], [{'code': 'C01B32/26', 'inventive': True, 'first': False, 'tree': []}, {'code': 'G01R33/032', 'inventive': True, 'first': True, 'tree': []}, {'code': 'G01N24/006', 'inventive': False, 'first': False, 'tree': []}, {'code': 'G01R33/26', 'inventive': False, 'first': False, 'tree': []}, {'code': 'C01B32/28', 'inventive': True, 'first': False, 'tree': []}], [{'text': 'Nv-zentrum basierender mikrowellenfreier quantensensor und dessen anwendungen und ausprägungen', 'language': 'de', 'truncated': False}, {'text': 'Nv-centre-based microwave-free quantum sensor and uses and characteristics thereof', 'language': 'en', 'truncated': False}, {'text': 'Capteur quantique sans micro-ondes fondé sur un centre nv et applications et formes dudit capteur', 'language': 'fr', 'truncated': False}], [{'text': 'Die Erfindung betrifft ein Sensorsystem auf Basis von Diamanten mit einer hohen Dichte an NV-Zentren. Die Beschreibung umfasst a) Methoden zur Herstellung der notwendigen Diamanten hoher NV-Zentrendichte, b) Merkmale solcher Diamanten, c) Sensorelemente für die Nutzung der Fluoreszenzstrahlung der solcher Diamanten, d) Sensorelemente für die Nutzung des Fotostromes solcher Diamanten, e) Systeme zur Auswertung dieser Größen, f) Systeme mit verringertem Rauschen zur Auswertung dieser Systeme, g) Gehäuse zur Verwendung solcher Systeme in automatischen Bestückungsanlagen, g) Verfahren zum Test diese Systeme und h) ein Musikinstrument als Beispiel einer letztendlichen Anwendung all dieser Vorrichtungen und Verfahren.', 'language': 'de', 'truncated': False}, {'text': 'Die Erfindung betrifft ein Sensorsystem auf Basis von Diamanten mit einer hohen Dichte an NV-Zentren. Die Beschreibung umfasst a) Methoden zur Herstellung der notwendigen Diamanten hoher NV-Zentrendichte, b) Merkmale solcher Diamanten, c) Sensorelemente für die Nutzung der Fluoreszenzstrahlung der solcher Diamanten, d) Sensorelemente für die Nutzung des Fotostromes solcher Diamanten, e) Systeme zur Auswertung dieser Größen, f) Systeme mit verringertem Rauschen zur Auswertung dieser Systeme, g) Gehäuse zur Verwendung solcher Systeme in automatischen Bestückungsanlagen, g) Verfahren zum Test diese Systeme und h) ein Musikinstrument als Beispiel einer letztendlichen Anwendung all dieser Vorrichtungen und Verfahren.', 'language': 'de', 'truncated': False}, {'text': 'The invention relates to a sensor system on the basis of diamonds with a high density of NV centres. The description comprises a) methods for producing the necessary diamonds with high NV centre density, b) features of such diamonds, c) sensor elements for the use of the fluorescence radiation of such diamonds, d) sensor elements for the use of the photocurrent of such diamonds, e) systems for evaluating these variables, f) systems with reduced noise for evaluating these systems, g) housing for the use of such systems in automatic placement systems, g) method for testing these systems, and h) a musical instrument as an example of a final use of all of these devices and methods.', 'language': 'en', 'truncated': False}, {'text': 'The invention relates to a sensor system on the basis of diamonds with a high density of NV centres. The description comprises a) methods for producing the necessary diamonds with high NV centre density, b) features of such diamonds, c) sensor elements for the use of the fluorescence radiation of such diamonds, d) sensor elements for the use of the photocurrent of such diamonds, e) systems for evaluating these variables, f) systems with reduced noise for evaluating these systems, g) housing for the use of such systems in automatic placement systems, g) method for testing these systems, and h) a musical instrument as an example of a final use of all of these devices and methods.', 'language': 'en', 'truncated': False}, {'text': 'L&#39;invention concerne un système capteur à base de diamants dont la densité en centres NV est élevée. La description comprend a)\u202fdes méthodes de production des diamants nécessaires dont la densité des centres NV est élevée, b)\u202fles caractéristiques de tels diamants, c)\u202fdes éléments capteurs pour l&#39;exploitation du rayonnement fluorescent de tels diamants, d)\u202fdes éléments capteurs pour l&#39;exploitation du photocourant de tels diamants, e)\u202fdes systèmes d&#39;évaluation de ces grandeurs, f)\u202fdes systèmes à bruit réduit pour l&#39;évaluation de ces systèmes, g)\u202fun carter permettant d&#39;utiliser de tels systèmes dans des installations de montage automatique, g) un procédé pour tester ces systèmes, et h) un instrument de musique comme exemple d&#39;une application finale de tous ces dispositifs et du procédé.', 'language': 'fr', 'truncated': False}, {'text': 'L&#39;invention concerne un système capteur à base de diamants dont la densité en centres NV est élevée. La description comprend a)\u202fdes méthodes de production des diamants nécessaires dont la densité des centres NV est élevée, b)\u202fles caractéristiques de tels diamants, c)\u202fdes éléments capteurs pour l&#39;exploitation du rayonnement fluorescent de tels diamants, d)\u202fdes éléments capteurs pour l&#39;exploitation du photocourant de tels diamants, e)\u202fdes systèmes d&#39;évaluation de ces grandeurs, f)\u202fdes systèmes à bruit réduit pour l&#39;évaluation de ces systèmes, g)\u202fun carter permettant d&#39;utiliser de tels systèmes dans des installations de montage automatique, g) un procédé pour tester ces systèmes, et h) un instrument de musique comme exemple d&#39;une application finale de tous ces dispositifs et du procédé.', 'language': 'fr', 'truncated': False}], [], [], 20200722), {'inventor': 0, 'assignee': 1, 'cpc': 2, 'title_localized': 3, 'abstract_localized': 4, 'claims_localized': 5, 'description_localized': 6, 'filing_date': 7})

so TL;DR how do I sort through RECORD objects in sql with the LIKE operator?


Solution

  • OpenAI's GPT-3 gave me the answer. Damn language models taking over StackOverflow jobs...

    Looks like its:

        SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
        FROM `patents-public-data.patents.publications`
        WHERE filing_date >= 20200101 and
        (SELECT count(*)
            FROM UNNEST(cpc) c
            WHERE c.code LIKE 'C01%'
            ) > 0    
        LIMIT 2
    

    If anyone is curious I just copied that stack qustion into the playground and this was the raw output:

    A:
    
    I think you want to unnest the cpc column:
    <code>SELECT inventor, assignee, cpc, title_localized, abstract_localized, claims_localized, description_localized, filing_date
    FROM patents-public-data.patents.publications
    WHERE filing_date &gt;= 20200101 and
          (SELECT count(*)
           FROM UNNEST(cpc) c
           WHERE c.code LIKE 'C01%'
          ) &gt; 0
    LIMIT 2;
    </code>