If I create a heap table and drop a bunch of rows into it, each row has a physical location in the data file, and a pointer to that row, but there's no structure to it, no way to look anything up beyond an exact pointer to a given row.
If I then create a nonclustered index sorting those rows by some criterion, then that index contains, for each row, the key it was sorting on, any columns I explicitly included, and the row pointer, so that if you look up a record in a way compatible with the index but require a column not in the index, it knows exactly where to go to get that data, correct?
Assuming this is correct, what I'd like to understand is this: If I then create a clustered index on the table, then the heap gets reorganized so that the records are sorted by the clustered index's key. That means the physical position of potentially every single record is going to change. How is that tracked by the row pointers in the pre-existing nonclustered index?
If it makes a difference, I am using SQL Server as the underlying engine.
In such case all nonclustered indexes will be rebuild, so all pointers will be replaced.
To be precise, if you have clustered index on the table (don't have to be unique), SQL Server will store clustered key value (plus optional unquier) instead of row pointer.
From Microsoft documentation: Create Clustered Indexes:
If a clustered index is created on a heap with several existing nonclustered indexes, all the nonclustered indexes must be rebuilt so that they contain the clustering key value instead of the row identifier (RID). Similarly, if a clustered index is dropped on a table that has several nonclustered indexes, the nonclustered indexes are all rebuilt as part of the DROP operation. This may take significant time on large tables.