Tags: google-cloud-platform, google-data, google-datastream

Google Datastream (beta) issues with removing backfilling table from a stream


I am currently testing Google Datastream to stream data from Cloud SQL into GCS and then on to BigQuery. All is well, however a 200m-row table is currently backfilling with data and I wanted to stop this, as the table is no longer in use.

Here is what I have tried so far:

  1. Removing the table from the stream. This has worked so far for all other tables; however, this is the first time I've tried it whilst the table is backfilling.

  2. Adding the table to the No-Backfill option inside the stream.

  3. Pausing the stream, letting it drain, and then restarting it.

None of these seems to work. Has anybody come across this issue before when backfilling a table?

Many thanks, Mark.


Solution

  • Just thought I'd update this ticket with the solution from Google Support.

    Once a table has started the backfill process, you currently cannot stop it until the backfill has finished.

    @Prabir, thanks for your reply. I think the 100m-row limit also only applies to tables without a numeric primary key: "Tables that have more than 100 million rows and that don't have a numeric primary key can't be backfilled."

    I've asked Google Support to add the ability to remove a table during backfill in a later release, as the product is still in early testing and features can be added.

    Let's see how this goes in future releases...
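    As an aside, my understanding is that the GA v1 Datastream API (released after the beta period this question covers) added a `stopBackfillJob` method on stream objects, which is exactly the capability missing here. This is an assumption on my part, not something from Google Support's reply; the project/stream/object names below are placeholders. A minimal sketch of how the REST endpoint for that call would be addressed:

    ```python
    # Sketch only: builds the REST URL for Datastream v1's (assumed)
    # projects.locations.streams.objects:stopBackfillJob method.
    # All identifiers here are hypothetical placeholders -- substitute your own.

    def stop_backfill_url(project: str, location: str, stream: str, obj: str) -> str:
        """Return the v1 stopBackfillJob URL for a given stream object."""
        name = (
            f"projects/{project}/locations/{location}"
            f"/streams/{stream}/objects/{obj}"
        )
        return f"https://datastream.googleapis.com/v1/{name}:stopBackfillJob"

    if __name__ == "__main__":
        # You would POST to this URL with an authenticated client
        # (e.g. an OAuth bearer token) to request the backfill stop.
        print(stop_backfill_url("my-project", "us-central1", "my-stream", "my-table"))
    ```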