Search code examples
google-bigquerybigtablegoogle-cloud-bigtable

How to get a clean BigQuery schema when using a BigTable source?


BigTable: a simple table with a string key and one column family, publication_number, also a string.

This table can be mounted as a dataset in BigQuery by following Google's instructions.

When querying this table I'd expect to see a simple two-column output: rowkey and publication_number, both STRING. Instead I see this:

BigQuery Schema

BigQuery adds a .column, a .column.name, .cell, .cell.timestamp, and finally the value is in .column.cell.value.

Is there a way to simplify this to only have:

rowkey              STRING
publication_number  STRING

And nothing else? I can change both how the table is created and how the export is done.


Solution

  • One option would be to create a logical view over the table that exposes the schema that you want:

    CREATE VIEW dataset.publication_view AS
    SELECT
      rowkey,
      cell.value AS publication_number
    FROM dataset.publication_table,
    UNNEST(publication_number.column)
    

    The following does the same as above, but it allows adding additional columns to the view:

    CREATE VIEW dataset.publication_view AS
    SELECT
        rowkey,
        ARRAY_TO_STRING(ARRAY(SELECT cell.value FROM UNNEST(publication_number.column)), "") AS publication_number
    FROM `dataset.publication_table`