google-cloud-platform · google-bigquery · stackdriver

Is it possible to query log data stored in Cloud Storage using BigQuery, without cleaning it first?


I have a huge amount of log data exported from Stackdriver to Google Cloud Storage, and I am trying to run queries on it using BigQuery.

However, while creating the table in a BigQuery dataset, I get the following error:

    Invalid field name "k8s-app".
    Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long.
    Table: bq_table

The log data exported from the Stackdriver sinks contains a large number of unique field names, and some of those names are not valid column names for BigQuery tables.

What is the solution for this? Is there a way to query the log data without cleaning it first, using temporary tables or something else?

Note: I do not want to load (put) my data into BigQuery storage; I just want to query the data that is already in Google Cloud Storage.

* EDIT *

Please refer to this documentation for a clearer understanding.


Solution

  • To expand on @khan's answer:

    • If the files are JSON, then you won't be able to use the first method (skip headers).

    • But you can load each JSON row raw into BigQuery - as if it were a single-column CSV - and then parse it in BigQuery (see the first sketch after this list).

    Find a full example for loading rows raw at:

    Then you can use JSON_EXTRACT_SCALAR to parse the JSON inside BigQuery and map the existing field names to BigQuery-compatible ones (second sketch below).
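
A minimal sketch of the raw load, assuming a dataset named logs already exists and the exported files live under gs://my-bucket/stackdriver/ (both names are placeholders). The trick is declaring the files as CSV with a field delimiter that never occurs in the data, so each JSON line lands intact in a single STRING column:

    # Load every JSON line as one raw STRING column by treating the files as CSV.
    # $'\x01' is a delimiter byte that should never appear in the log lines, and
    # --quote='' disables CSV quoting so the JSON's own double quotes survive.
    bq load \
      --source_format=CSV \
      --field_delimiter=$'\x01' \
      --quote='' \
      logs.raw_logs \
      'gs://my-bucket/stackdriver/*.json' \
      raw:STRING

If you'd rather not copy anything into BigQuery storage (per the note in the question), the same single-column CSV definition should also work as an external table over the GCS files, e.g. via bq mk --external_table_definition with a table definition file.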
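Once the raw rows are queryable, something along these lines extracts individual fields; the JSON paths below are assumptions about the Stackdriver export layout, so adjust them to your actual records. Keys containing characters that are invalid in plain JSONPath, such as k8s-app, can be escaped with brackets and single quotes:

    #standardSQL
    -- Sketch only: $.timestamp, $.textPayload and $.resource.labels are
    -- assumptions about the exported log structure.
    SELECT
      JSON_EXTRACT_SCALAR(raw, '$.timestamp') AS log_time,
      JSON_EXTRACT_SCALAR(raw, '$.textPayload') AS message,
      -- bracket-and-single-quote escaping for the invalid key "k8s-app"
      JSON_EXTRACT_SCALAR(raw, "$.resource.labels['k8s-app']") AS k8s_app
    FROM logs.raw_logs
    LIMIT 100

Note that the extracted values get BigQuery-safe column aliases (k8s_app instead of k8s-app), which is what sidesteps the invalid-field-name error.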