Search code examples
sqlamazon-web-servicesamazon-redshiftamazon-redshift-spectrum

Lock table redshift


how can I do a real LOCK of a table when inserting in redshift, I think it's like that but I'm not sure and aws documentation as always zero input

begin;lock table sku_stocks;insert into sku_stocks select facility_alias_id, facility_name, CAST( item_name AS bigint), description, CAST( prod_type AS smallint ), total_available, total_allocated from tp_sku_stocks;

Solution

  • LOCK has a documented behavior: it obtains an exclusive lock on the table, so that no other session or transaction can do anything to the table.

    If you wish to verify this behavior, here's how you can do it:

    1. Open a connection to the database and invoke the begin and lock commands against your test table.
    2. Open a second connection to the database, and attempt to do a select against that table.
    3. Wait until either the select returns or your are convinced that LOCK behaves as documented.
    4. Execute a rollback in the first session, so that you're not permanently locking the table.

    Update

    Based on your comments, I think you have a misunderstanding of how transactions work:

    • When you start a transaction, Redshift assigns a transaction ID, and marks every row that was changed by that transaction.
    • SELECTs that read a table while it is being updated in a transaction will see a "snapshot of the data that has already been committed" (quote from above link), not the rows that are being updated inside the transaction.
    • INSERT/UPDATE/DELETE that try to update a table that is being updated by a transaction will block until the transaction completes (see doc, and note that this behavior is somewhat different than you would see from, say, MySQL).
    • When you commit/rollback a transaction, any new SELECTs will use the updated data. Any SELECTs that were started during the transaction will continue to use the old data.

    Given these rules, there's almost no reason for an explicit LOCK statement. Your example update, without the LOCK, will put a write-lock on the table being updated (so guaranteeing that no other query can update it at the same time), and will use a snapshot of the table that it's reading.

    If you do use a LOCK, you will block any query that tries to SELECT from the table during the update. You may think this is what you want, to ensure that your users only see the latest data, but consider this any SELECTs that were started prior to the update will still see the old data.

    The only reason that I would use a LOCK statement is if you need to maintain consistency between a group of tables (well, also if you're getting into deadlocks, but you don't indicate that):

    begin;
    lock TABLE1;
    lock TABLE2;
    lock TABLE3;
    copy TABLE1 from ...
    update TABLE2 select ... from TABLE1
    update TABLE3 select ... from TABLE2
    commit;
    

    In this case, you ensure that TABLE1, TABLE2, and TABLE3 will always remain consistent: queries against any of them will show the same information. Beware, however, that SELECTS that started before the lock will succeed, and show data prior to any of the updates. And SELECTs that start during the transaction won't actually execute until the transaction completes. Your users may not like that if it happens in the middle of their workday.