Search code examples
sqlgoogle-cloud-platformgoogle-bigquery

Querying the same row order as preview in google cloud bigquery


I am currently doing some project using google cloud bigquery service. When I querying the whole table, the order arn't as what I saw on preview. Since the data are words in sentences separated, therefore the order are important.

Row extract_id text ner_underthesea self_label
1 0 Hi o other
2 0 , o other
3 0 John per n

above are the table previewed

SELECT extract_id, text, self_label
FROM `project_id.dataset.table`

When I run above query

Row extract_id text ner_underthesea self_label
1 21943 hey o other
2 21456 . o other
3 12234 Bob per n

this is the query result, just ignore the value inside the table, the thing is the column order is random...

I've also try orderby extract_id, but the order inside are not retain.

So is there any method to completely copy, or query the exact same row order from bigquery table?

*total row of table is 10000000

*the "Row" column shown above is default in query, not in schema


Solution

  • GBQ is a columnar database. Data is saved in the backend in the most optimized way for storage & lookups of column data and row order is not exactly followed there. (GBQ backend stores enough info to point to correct column data for a given row.)

    So what order you see in 'preview' is not necessarily what you may get when you do a "select ..." query.

    Even if you do something like "create table ... as select * from tablex order by some_column", even if you order-by while saving the new table, when you retrieve it back row order is not guaranteed.

    IF you need some kind of guaranteed order of rows, then it's better if you create your data with some kind of row_number or ranking function based on your data requirement, that way you can always request select in that order.

    Or, if you have a deterministic way to create "order by" in a select statement from a combination of known columns then you can use it that way.

    Here is the ref: https://cloud.google.com/bigquery/docs/reference/standard-sql/numbering_functions From above check ROW_NUMBER and RANK functions. Those may help you.