Tags: ruby-on-rails, activerecord, google-bigquery, soft-delete

Soft deletes with an append-only database in Ruby on Rails


Using Ruby on Rails with an append-only database (Google BigQuery), what is the best practice for soft deletes? The pattern I'm considering is to append a new row for each update or delete and read only the most recent version of each record. But I'm not sure how to do that cleanly with Active Record. Are there other patterns or best practices I should consider?
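A minimal sketch of the write side of that pattern, in plain Ruby rather than Active Record. The `VersionedRow` struct, the in-memory `AppendOnlyLog` class and the `deleted` tombstone flag are all hypothetical illustrations; a real app would send each new row to BigQuery (for example with a streaming insert) instead of pushing it onto an array.

```ruby
# Hypothetical row shape for an append-only table: every change is a new row.
# `id` identifies the logical record, `ts` orders its versions, and
# `deleted` marks a soft-delete tombstone (an assumed column).
VersionedRow = Struct.new(:id, :ts, :attrs, :deleted, keyword_init: true)

# In-memory stand-in for the append-only table (assumption for illustration).
class AppendOnlyLog
  attr_reader :rows

  def initialize
    @rows = []
  end

  # An update never modifies an existing row; it appends a new version.
  def append_update(id, attrs, ts: Time.now.utc)
    @rows << VersionedRow.new(id: id, ts: ts, attrs: attrs, deleted: false)
  end

  # A soft delete appends a tombstone version instead of removing anything.
  def append_delete(id, ts: Time.now.utc)
    @rows << VersionedRow.new(id: id, ts: ts, attrs: {}, deleted: true)
  end
end

log = AppendOnlyLog.new
log.append_update(1, { name: "draft" }, ts: Time.utc(2016, 1, 1))
log.append_update(1, { name: "final" }, ts: Time.utc(2016, 1, 2))
log.append_delete(1, ts: Time.utc(2016, 1, 3))
```

After these three calls the log holds three rows for record 1; nothing was overwritten, and the newest row is the tombstone.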


Solution

  • BigQuery is designed for analytics over massive datasets.
    If that matches your use case, you can accept the potential slowdown from appending new update/delete rows and keeping the historical rows.

    In BigQuery it is simple to get the most recent version of each row with a window function.

    For example, assuming "id" is the primary key that identifies a record and "ts" is its timestamp:

    SELECT <fields list> FROM (
      SELECT <fields list>, ROW_NUMBER() OVER(PARTITION BY id ORDER BY ts DESC) AS num
      FROM YourTable
    )
    WHERE num = 1
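To make the window-function logic concrete, here is a plain-Ruby sketch of the same selection: partition rows by `id` and keep the one with the greatest `ts`. It also drops records whose newest version is a tombstone, which is the soft-delete step; the `Row` struct and its `deleted` flag are hypothetical. In SQL the equivalent would be an extra filter on that flag in the outer query.

```ruby
# Hypothetical version rows: `id` is the logical key, `ts` the version
# timestamp, `deleted` an assumed soft-delete flag on tombstone rows.
Row = Struct.new(:id, :ts, :name, :deleted, keyword_init: true)

# Same idea as ROW_NUMBER() OVER(PARTITION BY id ORDER BY ts DESC) ... WHERE num = 1:
# group by id, keep the row with the greatest ts in each group.
def latest_versions(rows)
  rows.group_by(&:id).values.map { |versions| versions.max_by(&:ts) }
end

# Soft deletes: a record whose latest version is a tombstone is treated as gone.
def live_records(rows)
  latest_versions(rows).reject(&:deleted)
end

rows = [
  Row.new(id: 1, ts: 1, name: "a-v1", deleted: false),
  Row.new(id: 1, ts: 2, name: "a-v2", deleted: false),
  Row.new(id: 2, ts: 1, name: "b-v1", deleted: false),
  Row.new(id: 2, ts: 3, name: nil, deleted: true)
]

live_records(rows).map(&:name) # => ["a-v2"]
```

One difference to keep in mind: if two versions share the same `ts`, `max_by` returns the first one it saw, while `ROW_NUMBER()` picks one arbitrarily, so a unique, monotonic timestamp (or a tie-breaking column) matters in both cases.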
    

    If you need historical analysis, it is just as easy to select the version of each row that was current at a given point in time, for example by restricting the inner query to rows with ts at or before that point.
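A sketch of such an "as of" selection in plain Ruby, under the same hypothetical `Row` struct with an assumed `deleted` tombstone flag: discard versions written after the cutoff, then take the newest remaining version per `id`. In the SQL above this corresponds to adding a `ts <= <cutoff>` condition inside the subquery, before `ROW_NUMBER()` is computed.

```ruby
# Hypothetical version rows with an assumed soft-delete flag.
Row = Struct.new(:id, :ts, :name, :deleted, keyword_init: true)

# Version of each record as it stood at `as_of`: ignore later versions,
# keep the newest remaining one per id, and hide records that were
# already soft-deleted at that time.
def snapshot_as_of(rows, as_of)
  rows.select { |r| r.ts <= as_of }
      .group_by(&:id)
      .values
      .map { |versions| versions.max_by(&:ts) }
      .reject(&:deleted)
end

rows = [
  Row.new(id: 1, ts: 10, name: "v1", deleted: false),
  Row.new(id: 1, ts: 20, name: "v2", deleted: false),
  Row.new(id: 1, ts: 30, name: nil, deleted: true)
]

snapshot_as_of(rows, 15).map(&:name) # => ["v1"]
snapshot_as_of(rows, 25).map(&:name) # => ["v2"]
snapshot_as_of(rows, 35)             # => []
```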

    If you do not need historical versions, you can clean up periodically. For this it is better to keep your data partitioned by day (or by month, or by whatever other dimension fits your case better).
    BigQuery has excellent support for querying partitioned tables through its table wildcard functions.
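A sketch of what one cleanup (compaction) pass could compute, again in plain Ruby with a hypothetical `Row` struct and an assumed `deleted` flag. The day-partitioned hash stands in for daily tables or partitions; the pass keeps only the latest version of each live record, which would then be written to a fresh table or partition so the old ones can be dropped. How that rewrite is done in BigQuery is left out here.

```ruby
require "date"

# Hypothetical version rows with an assumed soft-delete flag.
Row = Struct.new(:id, :ts, :name, :deleted, keyword_init: true)

# `partitions` maps a Date to the rows written that day (illustrative layout).
# Returns one row per live record: its latest version. Older versions and
# records whose latest version is a tombstone are dropped.
def compact(partitions)
  partitions.values.flatten(1)
            .group_by(&:id)
            .map { |_id, versions| versions.max_by(&:ts) }
            .reject(&:deleted)
            .sort_by(&:id)
end

partitions = {
  Date.new(2016, 1, 1) => [
    Row.new(id: 1, ts: Time.utc(2016, 1, 1, 9), name: "a-v1", deleted: false),
    Row.new(id: 2, ts: Time.utc(2016, 1, 1, 9), name: "b-v1", deleted: false)
  ],
  Date.new(2016, 1, 2) => [
    Row.new(id: 1, ts: Time.utc(2016, 1, 2, 9), name: "a-v2", deleted: false),
    Row.new(id: 2, ts: Time.utc(2016, 1, 2, 10), name: nil, deleted: true)
  ]
}

compact(partitions).map(&:name) # => ["a-v2"]
```

Because the compacted output contains exactly what the "latest version" query would return, readers see the same results before and after cleanup; only the history is lost.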

    If you want to stick with BigQuery, this is a good approach, and I recommend exploring it further.