I have two postgres databases, same tables have to be synchronized among those databases (not all of them, just a few of them).
This creates a problem with synchronization, if I have table A and table B (same structure, just in different databases), if someone inserts into table A, the pk id may be 5, but then, someone may insert on table B, getting a pk id of 5 too.
At the time of synchronizing both tables, I may found two registers with the same pk, but with different contents.
I am pretty sure there must be solutions for this, there must be a smart way to keep ids synchronized.
So far I was thinking on creating a simple webservice which will provide a new valid pk id, however this is problematic because
1-It will slow down the process 2-It merely moves the problem to the webservice, how will the webservice find which one is the next valid id?
Has anyone faced a similar problem in the past?
The environment is postgres 9.1
Yes, people have faced this a lot. The question you need to ask isn't "how do I do this" but "which of the existing well-understood and established methods best fits my needs".
Look into solutions to used to generate keys in sharded and distributed databases.
Options include:
Allocate non-overlapping ranges of IDs to each machine. One common approach is to set your sequences up to increment in blocks of (say) 100 and give each machine a unique offset, so machine A generates 101, 201, 301, 401, etc and machine B generates 102, 202, 302, 402, ... . Works great until you need to add machine #101. You can use a bigint key and leave space for say 10,000 machines; that way by the time you reach this limitation you will've redesigned the whole system three times anyway.
(Before you say "I'll run out of keys" ... max bigint is 9223372036854775807
so with a 10000 machine range you're looking at about 1015 or 250 keys per table per machine, which is about a 21 petabyte table at 25 bytes per record including overhead, ie really really tiny records).
Use a composite primary key like (machine_id, locally_generated_sequence_id)
. Ideal unless you're dealing with braindead ORMs that insist on single-valued primary keys.
Use huge random primary keys (UUIDs) with the uuid
data type, giving a 128-bit space and absolutely tiny probability of a collision. Works OK in most cases and despite the birthday paradox collisions are pretty unusual. The paranoid can have plans in place to deal with collisions.
Use a common service to generate big blocks of keys. Call it infrequently and store the available keys locally. This tends to hamper concurrent write performance on each node due to lock contention over the key table so I don't recommend it.
... probably more I haven't remembered or have never heard of.
The simplest option is generally to allocate non-overlapping IDs. It's easy, it's transparent and it just works.