For the project I'm working on, we need the customer data in a database nearby the customer. For this reason we have adopted the new Elastic Scale solution from Microsoft. This takes away the complexity of sharding and still gives you the power to scale worldwide.
At the moment I'm facing a rather important problem. I need to migrate data from 1 shard to another. There is an example application (Merge/Split) which does something, but it works with Ranges (1..100, 101..400, etc.). The database I'm working on works with Guids, so we can't use the example code.
I've created a Move/Merge management tool myself, but facing an issue here.
At first I wanted to insert all the objects and dependencies with the ORM. Due to some circular keys I'm not able to do this easily. Therefore I'm creating a SQL script now.
The SQL script is about 130MB and only contains INSERT
commands.
All of this has to be done within 1 transaction, because you don't want the migration to be done half. If there's an error, everything should be rolled back.
Running this 130MB script gives me some errors. My local development machine and SQL Azure I'm running out of memory. SQL Azure:
There is insufficient memory available in the buffer pool
and locally:
There is insufficient system memory in resource pool 'default' to run this query.
I've tried disabling the indices, so this won't get rebuild on every INSERT
. This didn't fix anything.
Any suggestions on how to proceed? I can't really split the script, because all of the data has to be INSERT
ed at once. A SSIS package isn't an option either I think.
Creating my own database transaction system appears to be a lot of overkill and error prone.
Besides the INSERT
script, I also need to execute a DELETE
script on the 'old' shard/database, so I guess the solution has to work for this script also. I would love to do the INSERT
and DELETE
scripts in 1 transaction, but that's not (yet) possible on SQL Azure (distributed transactions).
The version of Split/Merge in the current preview for Azure SQL DB Elastic Scale has the known limitation that it only works with range shard maps. I am assuming that you are currently using a list shard map for your guids. While we are currently working on making support for list shard maps and Split/Merge available in an update to the Elastic Scale preview, there is a work-around that I would encourage you to try out. This workaround may turn out to be simpler than writing your own infrastructure for data movement between shards and save you a ton of work (hopefully).
Here is what I'd suggest:
Let me know how this works. If you run into trouble with this - in particular with incrementing of the guids - give me a shout at torsteng(at)microsoft(dot)com.
Best, Torsten
Here is a piece of code that might help you with the incrementing of the guid values.
static void CreateMappings()
{
ShardKey guid1 = new ShardKey(new Guid("<yourgui1d>"));
ShardKey guid2 = new ShardKey(new Guid("<yourguid2>"));
ShardKey guid1_next = NextShardKeyForGuid(guid1);
ShardKey guid2_next = NextShardKeyForGuid(guid2);
_map.CreateRangeMapping(new Range<Guid>(guid1.GetValue<Guid>(), guid1_next.GetValue<Guid>()), _shard1);
_map.CreateRangeMapping(new Range<Guid>(guid2.GetValue<Guid>(), guid2_next.GetValue<Guid>()), _shard2);
}
static ShardKey NextShardKeyForGuid(ShardKey shardkey)
{
int len = 16;
byte[] b = new byte[len];
shardkey.RawValue.CopyTo(b, 0);
while (--len >= 0 && ++b[len] == 0) ;
// Treat overflow if the current key's value is the maximum in the domain
if (len < 0)
{
return new ShardKey(ShardKeyType.Guid, null);
}
else
{
return ShardKey.FromRawValue(ShardKeyType.Guid, b);
}
}
}