Search code examples
sql-serverperformanceclustered-index

SQL Server Performance and clustered index values


I have a table myTable with a unique clustered index myId with fill factor 100% Its an integer, starting at zero (but its not an identity column for the table) I need to add a new type of row to the table. It might be nice if I could distinguish these rows by using negative values of myId.

Would having negative values incur extra page splitting and slow down inserts?

Extra Background: This table exists as part of the etl for a data warehouse that gathers data from disparate systems. I now want to accomodate a new type of data. A way for me to do this is to reserve negative ids for this new data, which will thus be automatically clustered. This will also avoid major key changes or extra columns in the schema.

Answer Summary: Fill factors of 100% will noramlly slow down the inserts. But not inserts that happen sequentially, and that includes the sequntial negative inserts.


Solution

  • Besides the practical administration points you already got and the suspect dubious use of negative ids to represent data model attributes, there is also a valid question here: give a table with int ids from 0 to N, inserting new negative values where would those value go and would they cause additional splits?

    The initial rows will be placed on the clustered index leaf pages, row with id 0 on first page and row with id N on the last page, filling the pages in between. When the first row with value of -1 is inserted, this will sort ahead of row with id 0 and as such will add a new page to the tree (will allocate an extent of 8 pages actually, but that is a different point) and will link the page in front of the leaf level linked list of pages. This will NOT cause a page split of the former first page. On further inserts of values -2, -3 etc they will go to the same new page and they will be inserted in the proper position (-2 ahead of -1, -3 ahead of -2 etc) until the page fills. Further inserts will add a new page ahead of this one, that will accommodate further new values. Inserts of positive values N+1, N+2 will go at the last page and be placed in it until it fills, then they'll cause a new page to be added and will start filling that page.

    So basically the answer is this: inserts at either end of a clustered index should not cause page splits. Page splits can be caused only by inserts between two existing keys. This actually extends to the non-leaf pages as well, an index at either end of the cluster may not split a non-leaf page either. I do not discuss here the impact of updates of course (they can can cause splits if the increase the length of a variable length column).

    Lately has been a lot of talk in the SQL Server blogosphere about the potential performance problems of page splits, but I must warn against going to unnecessary extremes to avoid them. Page splits are a normal index operation. If you find yourself in an environment where the page split performance hit is visible during inserts, then you'll be probably worse hit by the 'mitigation' measures because you'll create artificial page latch hot spots that are far worse as they'll affect every insert. What is true is that prolonged operation with frequent splits will result in high fragmentation which impacts the data access time. I say that is best mitigated with off-peak periodical index maintenance operation (reorganize). Avoid premature optimizations, always measure first.