
BigQuery eventual consistency guarantees using streaming insert API


I would like to use the BigQuery streaming insert API to insert rows into a BigQuery table. In my use case, this is the only class of "write" operation that will ever be performed; I will never insert rows by any other means, and I will never update or delete rows.

Having read Life of a BigQuery streaming insert, my understanding is that even after an insertAll request has been acked, it is possible that subsequent read operations may not "see" all the rows that were successfully inserted by it until they have been transferred from the streaming buffer into managed storage.

Assuming my understanding is correct (please let me know if not!):

  1. Is there any way to be able to tell when all the rows inserted by a given insertAll request have now been committed to managed storage?
  2. Even after a row has been committed to managed storage, are subsequent read operations guaranteed to either see it or fail?

Why am I asking these questions? My table schema will include a "key" column, which is a strictly monotonically increasing identifier. All read operations will be constrained to rows whose keys do not exceed some specified value. I need a guarantee that a read operation of this form will always return the same results, assuming the key specified corresponds to a row that has already been inserted. Clearly, no rows inserted subsequently could possibly be returned by such a query (because their keys would be greater than the key specified). However, if there are any rows that have already been inserted but are still in the streaming buffer, the query might not return them, but the exact same query performed later (after the rows have been committed) would return them. In my use case, this would be a disaster.


Solution

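  • A streamed row is in one of two states: in the streaming buffer, or committed to managed storage.

    Rows that are still in the streaming buffer can already be read. How you select them differs between non-partitioned and partitioned tables (in an ingestion-time partitioned table, for example, buffered rows have a NULL _PARTITIONTIME); the BigQuery documentation has examples for both.

    If the same insertId is used by more than one row, only one of them is persisted.

    Once a row is in the committed state, it is guaranteed to be readable.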
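
As a rough illustration of the points above, here is a minimal Python sketch that derives each row's insertId from the question's "key" column and checks whether the table metadata still reports a streaming buffer. It assumes the google-cloud-bigquery client library, whose `Client.insert_rows_json`, `Client.get_table`, and `Table.streaming_buffer` are used here; the helper names and the `key_field` parameter are made up for this example. Treat the buffer check as a heuristic only: the streaming buffer statistics are estimates, and as far as I know insertId de-duplication is best effort, so neither is a hard consistency guarantee.

```python
from typing import Any, Dict, List, Sequence


def insert_ids_from_key(rows: Sequence[Dict[str, Any]], key_field: str = "key") -> List[str]:
    """Derive one insertId per row from its key (hypothetical helper).

    Reusing the key as insertId means a retried request carries the same ids,
    so BigQuery can drop the duplicates (best effort).
    """
    return [str(row[key_field]) for row in rows]


def stream_rows(client: Any, table_id: str, rows: Sequence[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Stream rows via insertAll; returns the per-row errors (empty list on success).

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    return client.insert_rows_json(table_id, list(rows), row_ids=insert_ids_from_key(rows))


def buffer_looks_empty(client: Any, table_id: str) -> bool:
    """True if the table metadata currently reports no streaming buffer.

    Assumption: an absent streamingBuffer section is used as a signal that
    earlier rows have been flushed; this is a heuristic, not a commit receipt.
    """
    table = client.get_table(table_id)
    return table.streaming_buffer is None
```

With a real client this would be called roughly as `stream_rows(bigquery.Client(), "project.dataset.table", rows)`, followed by polling `buffer_looks_empty` before running a key-bounded read that has to be repeatable.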