Tags: database, time-series, questdb

QuestDB Dropping rows when inserting from a temp table


I want to delete some individual rows from QuestDB. I know I can only delete whole partitions, and the recommendation for individual rows is either to use logical deletion with an extra column, or to create a temporary table, insert into it from the original table selecting only the rows I want to keep, drop the original, and rename the temp table. That sounds straightforward, so I did that. However, I am seeing a weird issue with the final INSERT:

INSERT INTO {source_table} (
    col1,
    col2,
    ...
)
SELECT
    source.col1,
    source.col2,
    ...
FROM ...
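
For context, the delete-by-rewrite workflow described above looks roughly like this (a sketch only; `source_table`, `temp_table`, `ts`, and the WHERE condition are placeholders for my actual schema):

    -- copy only the rows to keep into a temp table with the same layout
    CREATE TABLE temp_table AS (
        SELECT * FROM source_table WHERE /* rows to keep */
    ) TIMESTAMP(ts) PARTITION BY HOUR WAL;

    -- swap the tables
    DROP TABLE source_table;
    RENAME TABLE temp_table TO source_table;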

It all works with no errors. But when I check the row count, a LOT of rows are missing. So I ran:

SHOW PARTITIONS FROM {source_table}

I was surprised to see that many partitions had a maxTimestamp ending well before the end of the hour (I have hourly partitions and data arriving almost constantly), and that numRows appears to be rounded to the nearest million, when it should be an uneven number.

Any hints?


Solution

  • This one was fun to learn! I asked about this on Slack, and a core engineer pointed out that INSERT INTO ... SELECT copies rows in batches, and the writes are applied asynchronously. So if we have a large table, run a first INSERT into the temp table, and shortly afterwards run an INSERT selecting from that same temp table, the temp table might still be catching up. That explains both the missing rows and the rounded row counts, since rows are applied in batches.

    There is an ATOMIC keyword we can add to INSERT INTO, but on a large table that might consume too much memory. It is easier to just run

    wal_tables()
    

    and look at the writerTxn and sequencerTxn columns for the temp table. If sequencerTxn is stable, no new rows are being sent; then we just wait for writerTxn to catch up to sequencerTxn, at which point the write is complete.

    At that point it was safe to run my initial INSERT INTO, which this time completed successfully.
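
The check above can be run as a single query against the built-in wal_tables() function; something like this (a sketch; 'temp_table' is a placeholder for the actual table name):

    -- poll this until writerTxn catches up with sequencerTxn
    SELECT name, suspended, writerTxn, sequencerTxn
    FROM wal_tables()
    WHERE name = 'temp_table';

Running it a few times and seeing sequencerTxn stay constant while writerTxn climbs toward it is exactly the "catching up" behavior described above.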