SQL Server CE what is really happening with @threshold and identity ranges?

Folks, as this question relates to IDENTITY columns and merge replication, if I may ask you to refrain from answering with "use GUIDs instead". I'm acutely aware of the benefits and limitations of both and have been using SQL replication with CE since SQL Server 2000. Very occasionally I get surprised. This is such a case.

This is a complex description of the problem so please bear with me.

Below is an extract from here https://msdn.microsoft.com/en-us/library/ms152543.aspx and is what I've always understood with regard to identity ranges and thresholds.

"Subscribers running SQL Server Compact or previous versions of SQL Server are assigned only the primary range; assignment of new ranges is controlled by the @threshold parameter. Additionally, a republishing Subscriber has only the range specified in the @identity_range parameter; it must use this range for local changes and for changes at Subscribers that synchronize with the republishing Subscriber. For example, you could specify 10000 for @pub_identity_range, 500000 for @identity_range and 80 percent for @threshold. After 8000 inserts at a Subscriber (80 percent of 10000), the Publisher is assigned a new range. When a new range is assigned, there will be a gap in the identity range values in the table. Specifying a higher threshold results in smaller gaps, but the system is less fault-tolerant: if the Merge Agent cannot run for some reason, a Subscriber could more easily run out of identities."

If was assume this is true for the moment we'll get to the start of my problem.

To help users using our application we have been using a variation of the following query to let clients know they might run out of identities if they keep going and to initiate a sync to get a new range.

SELECT 
    AUTOINC_MAX, AUTOINC_NEXT, AUTOINC_MAX-AUTOINC_NEXT 
FROM 
    INFORMATION_SCHEMA.COLUMNS 
WHERE 
    TABLE_NAME = N'Asset'

AUTOINC_MAX | AUTOINC_NEXT | AUTOINC_MAX-AUTOINC_NEXT
------------+--------------+-------------------------
3081898     |   3080899    |        999

By evaluating AUTOINC_MAX - AUTOINC_NEXT (= 999) we can see when we're getting low on IDs

In code we are looking at AUTOINC_MAX - AUTOINC_MIN which gives the allocated range. Using the default threshold of 80% and the remaining range we can advise clients to sync if they look like running out.

However this is where what I've thought to be true fails in practice. Referring to the Microsoft details above this sentence stands out "When a new range is assigned, there will be a gap in the identity range values in the table."

I take this to mean the following

If our user has an identity range of 0-1000 IDs and uses IDs up to 801. On next sync the user will be allocated the next range of 1001-2000(we are assuming one subscriber for illustration). As a result of the sync the next ID used will be 1001 leaving a gap from 802-1000.

Firstly please let me know if my understanding is wrong.

Secondly though, this is not what we're seeing in practice.

In practice what we are seeing, based on the example above, post sync and subsequent inserts, is the balance of the IDs being used until we fully expend the original range. THEN upon expending the range AUTOINC-MIN, -MAX and -NEXT are all updated to the new range. No additional syncs have occurred.

Below is an example.

In the target table the last ID used is 3080899

To simulate usage the following query was used

INSERT INTO Asset (lInstID, lTypeID, sUsrName, lUsrID, dCreated, dAudit,  sStatus)
    SELECT 
        lInstID, lTypeID, sUsrName, lusrID, dCreated, dAudit, sStatus
    FROM 
        Asset 
    WHERE 
        lAssetID = 3080899

After insert the next ID value used is 3080900 (as expected e.g. AUTOINC_NEXT = 3080899, + 1 = 3080900)

We repeat this insert until we reach 80% of the allocated identity range.

SELECT 
    AUTOINC_MAX, AUTOINC_NEXT, AUTOINC_MAX-AUTOINC_NEXT 
FROM
    INFORMATION_SCHEMA.COLUMNS 
WHERE 
    TABLE_NAME = N'Asset'

AUTOINC_MAX | AUTOINC_NEXT | AUTOINC_MAX-AUTOINC_NEXT
------------+--------------+-------------------------
3081898     |   3081699    |           199

We sync. We note 800 Subscriber changes. We query and this is what we see. No change from pre-sync.

AUTOINC_MAX | AUTOINC_NEXT | AUTOINC_MAX-AUTOINC_NEXT
------------+--------------+-------------------------
3081898     |    3081699   |         199

We continue to insert until there are zero IDs remaining

AUTOINC_MAX | AUTOINC_NEXT | AUTOINC_MAX-AUTOINC_NEXT
------------+--------------+-------------------------
3081898     |   3081898    |            0

One more insert and the results are this

AUTOINC_MAX | AUTOINC_NEXT | AUTOINC_MAX-AUTOINC_NEXT
------------+--------------+-------------------------
3082898     |    3081899   |           999

This is completely unexpected and contrary to when a new range is assigned, there will be a gap in the identity range values in the table. In fact the IDENTITY RANGE is contiguous. This is somewhat desirable as we don't waste IDs.

I cannot find in the .SDF where the next allocated range is stored.

I'm presuming it's next_range_start and next_range_end from the sysmergearticles server table but no documentation can be found that exposes these values in the .SDF.

If someone knows what's going here I'd greatly appreciate it.

As a point to note if you fully expend this "next range" without a sync the database returns an error as expected.

A sync post error shows 1200 new records uploaded by the subscriber (200 from the previous range plus the 1000 from the "next range")

AUTOINC_MAX | AUTOINC_NEXT | AUTOINC_MAX-AUTOINC_NEXT
------------+--------------+-------------------------
3083898     |    3082898   |          999

Kind regards

Andrew

Solution

For those interested I received this answer from a chap at Microsoft.

Hi Andrew,

Based on testing, I believe this should be a document bug. The CE synchronization should use same method subscriber running SQL Server 2005 or a later version. That is,

subscriber will receive two identity ranges. The secondary range is equal in size to the primary range; when the primary range is exhausted, the secondary range is used, and the Merge Agent assigns a new range to the Subscriber. The new range becomes the secondary range, and the process continues as the Subscriber uses identity values.

So, let us say, you set subscriber identity range to 10. Then the publisher starts from 1 to 11 for primary range and 11 to 21 for the secondary range. And subscriber starts from 21 to 31 for the primary range and 31 to 41 for the secondary range. If there is no synchronization before you consume all subscriber range, you will hit the error, it means you could insert double number record of subscriber range you set to subscriber database before you hit the error message.

For publisher end, if you consume all range but not in a single batch, insert trigger will help you re-allocate a new range. For example, before synchronization, if publisher end reaches 21 and next insert will start from 42 since 22 to 41 belongs to subscriber.

The mechanism is the same for multiple subscribers, if you are still in primary range for subscriber, new range will not be allocated. If you are already in the secondary range, new range will be allocated and makes your secondary range to new primary and the new range to the new secondary range.

You could skip re-publisher scenario since CE subscriber could not act as publisher.

Best regards,

Peter https://social.msdn.microsoft.com/profile/sql%20team%20-%20msft/?ws=usercard-mini

This confirms precisely what we are seeing in our testing. So to be clear SQL CE 3.1+ does receive a primary and secondary range on first sync. After expending the first range the secondary becomes the primary and next sync a new secondary range is applied.

Just need the queries to identify what the secondary range is in SQL CE...