I am currently doing some project using google cloud bigquery service. When I querying the whole table, the order arn't as what I saw on preview. Since the data are words in sentences separated, therefore the order are important.
Row | extract_id | text | ner_underthesea | self_label |
---|---|---|---|---|
1 | 0 | Hi | o | other |
2 | 0 | , | o | other |
3 | 0 | John | per | n |
above are the table previewed
SELECT extract_id, text, self_label
FROM `project_id.dataset.table`
When I run above query
Row | extract_id | text | ner_underthesea | self_label |
---|---|---|---|---|
1 | 21943 | hey | o | other |
2 | 21456 | . | o | other |
3 | 12234 | Bob | per | n |
this is the query result, just ignore the value inside the table, the thing is the column order is random...
I've also try orderby extract_id, but the order inside are not retain.
So is there any method to completely copy, or query the exact same row order from bigquery table?
GBQ is a columnar database. Data is saved in the backend in the most optimized way for storage & lookups of column data and row order is not exactly followed there. (GBQ backend stores enough info to point to correct column data for a given row.)
So what order you see in 'preview' is not necessarily what you may get when you do a "select ..." query.
Even if you do something like "create table ... as select * from tablex order by some_column", even if you order-by while saving the new table, when you retrieve it back row order is not guaranteed.
IF you need some kind of guaranteed order of rows, then it's better if you create your data with some kind of row_number or ranking function based on your data requirement, that way you can always request select in that order.
Or, if you have a deterministic way to create "order by" in a select statement from a combination of known columns then you can use it that way.
Here is the ref: https://cloud.google.com/bigquery/docs/reference/standard-sql/numbering_functions From above check ROW_NUMBER and RANK functions. Those may help you.