Search code examples
google-cloud-platformgoogle-bigquerygoogle-cloud-bigtablebigtable

Querying BigTable data with BigQuery: how to make a correct table definition file


I try to set up external tables for BigQuery, the source is BigTable. I can create a table but while executing a sql statment i get this error:

Error while reading data, error message: Error detected while parsing row starting at position: 6. Error: Data between close double quote (") and field separator.

Most likely the table definition file is wrong. I spend the last day tinkering, but i don't find the problem. Can somebody point out the mistake i'm making?

thanks

my workflow:

  • create definition file
  • use command: bq mk --external_table_definition=gs://realtimecrypto-ostabprj/def.json test.test1
  • in bigquery: SELECT * FROM ostabprj.test.test1 LIMIT 1000

my table definition file:

I also tried different variations of it.

{
"sourceFormat": "BIGTABLE",
"sourceUris": [
    "https://googleapis.com/bigtable/projects/ostabprj/instances/cryptorealtime/tables/cryptorealtime"
],
"bigtableOptions": {
    "columnFamilies" : [
        {
            "familyId": "market",
            "type": "STRING",
            "encoding": "TEXT"
        }
    ]
}

}

The data in Bigtable looks like this:

(from this tutorial: https://www.cloudskillsboost.google/focuses/5570?locale=en&parent=catalog)

----------------------------------------
XRP/USD#bitfinex#1641740466459#3848397969954
  market:delta                             @ 2022/01/09-15:01:06.459000
    "459"
  market:exchangeTime                      @ 2022/01/09-15:01:06.459000
    "1641740466000"
  market:market                            @ 2022/01/09-15:01:06.459000
    "bitfinex"
  market:orderType                         @ 2022/01/09-15:01:06.459000
    "BID"
  market:price                             @ 2022/01/09-15:01:06.459000
    "0.74557"
  market:volume                            @ 2022/01/09-15:01:06.459000
    "50"

the guides i was using:

https://cloud.google.com/bigquery/external-data-bigtable#permanent-tables

https://cloud.google.com/bigquery/external-table-definition#tabledef-bigtable

(links have been changed)


Solution

  • a colleague has meanwhile found a way to solve the problem. He also found a youtube video which covers most of my questions.

    https://www.youtube.com/watch?v=tW4h6-cQz9s

    I made 2 fundamental mistakes:

    • BigTable and BigQuery weren't on the same server (US and EU)
    • The table definition file has to be uploaded with the console and not into a bucket

    his solution for the table definition file looked like this:

    • 'readRowKeyAsString' should be true
            "sourceFormat": "BIGTABLE",
            "sourceUris": [
                "https://googleapis.com/bigtable/projects/..."
            ],
            "bigtableOptions": {
                "readRowkeyAsString": "true",
                "columnFamilies": [
                    {
                        "familyId": "market",
                        "columns":[
                            {
                                "qualifierString": "market",
                                "type":"STRING"
                            },
                            {
                                "qualifierString": "exchangeTime",
                                "type":"STRING"
                            },
                            {
                                "qualifierString": "delta",
                                "type":"STRING"
                            },
                            {
                                "qualifierString": "orderType",
                                "type":"STRING"
                            },
                            {
                                "qualifierString": "volume",
                                "type":"STRING"
                            },
                            {
                                "qualifierString": "price",
                                "type":"STRING"
                            }
                        ]
                    } 
                ]
            }
        }
    

    To use the whole thing in BigQuery a Dataview is necessary

    CREATE VIEW Table.Dataview AS
    SELECT
        rowkey,
        market.delta as ts,
        ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.delta.cell)), "") AS delta,
        ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.market.cell)), "") AS market,
        ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.exchangeTime.cell)), "") AS exchangeTime,
        ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.orderType.cell)), "") AS orderType,
        ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.price.cell)), "") AS price,
        ARRAY_TO_STRING(ARRAY(SELECT value FROM UNNEST(market.volume.cell)), "") AS volume
    FROM Table.Data