Search code examples
sqldatabase-performanceclustered-index

impact of writing data out of order relative to clustered index


I have a database table with a clustered index, basically on a timestamp. If I insert a large amount of data out of order relative to that clustered index (ie. I write data with a timestamp from any point in time), what sort of performance impact will occur? Will it require the data to do a lot of reordering of the data on disk, or rewrite indexes, etc?


Solution

  • You didn't reveal the DB vendor, hardware configuration, the schema, the data volume, nor any observed timing figures.

    what sort of performance impact will occur?

    Zero impact. (Or go ahead, change my mind, publish timing figures.)

    Will it require the data to do a lot of reordering of the data on disk, or rewrite indexes

    No.

    B-tree node splits are a thing.

    Different technologies describe this in different ways. For example, Oracle refers to the fill factor of an index. Typically more than half of each DB block will be full of user data. The fraction of each block that is cleared to zeros will tend to be smallest when ordered rows were used to populate an empty relation. So the size of an index can vary by 2x based on the initial INSERT pattern. Sometimes query speed is dominated by total time to read all index blocks.

    You didn't show us EXPLAIN or ANALYZE details from an example query, which prevents us from predicting what happens in your use case.


    It's easy to rebuild an index, or indeed an entire relation, after all rows have become available. Just tacking an ORDER BY onto the end of a giant INSERT query can help. Measure performance before and after such a refresh to identify whether there's a user-visible difference in the elapsed time.