What happens when I split partitioned table by using ALTER PARTITION FUNCTION?

I'm trying to understand whole process of partitioning here.

CREATE PARTITION FUNCTION LargePF(DATETIME)
AS RANGE LEFT FOR VALUES('2024-06-01', '2024-07-01','2024-08-01', '2024-09-01', '2024-10-01')

I've made 6 partitions. These partitions have clustered index, which is same as the partitioning column. The last partitioned table covers the range of after '2024-10-01' and it's empty.

If I split the last partition and make range of (2024-10-01~2024-10-31) and (2024-11-11~), Do I need to rebuild clustered index of every partitions? Or newly made partition have clustered index by itself? Does the newly made clustered index aligns with existing partitions?

Solution

A couple of things.

Assuming that you've designed your table such that it can be split (which is to say that all secondary indexes are also built on the same partition scheme as the clustered index), then splitting the partition function is all that you need to do. Which is to say that in your example where you introduce a new partition boundary at 2024-11-01, both sides of that new boundary will have the correct indexing.

Which leads to my next thing - to save yourself a headache, make sure that you're doing these split operations such that both sides of the new boundary are empty. With perhaps a slight lack of rigor in terminology, doing that partition split will do a size-of-data operation on the partition that you're putting the new boundary in the middle of to create two new partitions. In your example, if there was already data in the (2024-10-01, ∞) partition, you'd be re-writing any of that data on the partition split. However, if you did the split operation while that partition was empty, you wouldn't incur that penalty. All of which is to say, plan on splitting your partitions well ahead of time to avoid the write penalty.

Lastly, I'd recommend a range right partition function given your use case. Using the set notation of a square bracket indicating that that boundary is included in the set and a parenthesis indicating that that boundary is not in the set, partition left functions create boundaries that look like (start, end]. Particularly for time-based partitioning regimes, you'll often want all of a specified time range in one partition. To use your example, it's convenient to have all of September 2024 in one partition. However, the way you have it now with a partition left function, midnight on September 1 is in the previous month's partition and midnight on October 1 is in this month's partition. Probably not what you want!