Search code examples
phpmysqlsqlforeign-keysmerging-data

SQL issue when merging two databases with foreign keys


I have a database at a workspace A and workspace B. Online there is a copy of this database which is always updated from both workspaces. Also both workspaces can need to update their databases whenever the other workspace makes any changes.

Everything is working well but my problem is this: for example there are 2 tables Stock and Orders, in Orders there is a column which is the stock_id.

If workspace A creates a new "Stock X" with an automatically incremented "stock_id"=23 and workspace B creates a new "Stock Y" with an automatically incremented "stock_id" = 23, workspace B will add workspace A's "Stock X" and workspace B will add workspace B's "Stock Y" but each will have a different id in each database.

The problem occurs when workspace A makes an order on stock_id=23 which is "Stock X", when this query is sent to the central database, and then sent to workspace B, it will insert the order but stock_id = 23 will refer to "Stock Y".

I would really appreciate some help with this thanks :)


Solution

  • EDITED:

    Your original design used AUTO INCREMENT columns for the primary key. The problem with this is, that as you're seeing when you have data simultaneously going into separate databases that will eventually be merged, you end up creating duplicate keys.

    One alternate is to use a sequence in each distinct database. Unfortunately, sequences are not natively available in mysql (many other db's such as Oracle, db2 do have them -- they allow this type of distributed database inserts w/o collisions). AUTO INCREMENT columns don't allow the kind of replication you want.

    So, you're left with two options.

    1) Add an extra part of the primary key that sets the location_id (as per the first answer).

    2) Or generate your id's for your inserts by hand using your own sequence, rather than using AUTO INCREMENT columns.

    It's probably best to implement a sequence as a stored proc/function that commits itself when it gets the next value in the sequence -- this may end up with values not being used. That's fine -- it's better than the contention if you were to wait to commit a sequence number until the entire insert committed.

    The main thing is that when you're doing the first insert, you use the sequence # from a stored proc. When you effectively replicate the data into the 2nd db, you use the generated sequence # in the row from the originating db. And the sequence would be maintained in each separate db w/ different starting points to prevent collisions.

    For example, in each database, you would want two pieces:

    1) a table that holds the next available sequence number for each named sequence. (each table that gets a primary key drawn from a sequence gets an entry). 2) a function that accesses and updates that table w/ the next sequence number.

    A sample implementation would be:

    Sequence table:

    CREATE TABLE sequences (
      name varchar(30) NOT NULL,
      value int(10) unsigned,
      PRIMARY KEY (name)
    ) ENGINE=InnoDB
    

    Sequence Function(s):

    delimiter //
    create function get_next_value(p_name varchar(30)) returns int
      deterministic
      sql security invoker
    begin
      declare current_val integer;
        UPDATE sequences
        SET value = (@current_val:=value) + 1
        WHERE name = p_name;
    
      return @current_val;
    end //
    delimiter ;
    

    The main issue is that the stored function needs to be a single statement, so that it completes and therefore commits immediately (otherwise, you'll have a lock on it that will cause your transactions to stack up behind each other as orders come in. If you don't have very high throughput, this isn't as much of an issue.

    I didn't write this function -- I'm liberally copying it from here: http://www.bigdbahead.com/?p=185 where I would refer you to, for more details. (And if that user ever finds me here, I'll be happy to let him write an answer and give him the appropriate credit here also).

    Now, for each database, you initialize the value with a different number to avoid collisions. So, for the orders table, in location A, you'd initialize this with:

    insert into sequences ('orders', 1);
    

    and in location B, you'd initialize this with:

    insert into sequences ('orders', 1000000);
    

    And then in both databases, on an insert into orders, you'd do:

    insert into orders (order_id, . . .)
    select mysql.get_next_value('user_id'), . . . <hardcoded-values>
    

    --

    I haven't road-tested this solution -- take it as an outline of the what I was suggesting in my answer regarding sequences. You should follow-up on the blog entry link above, which provides some more details, specifically on how this work under transaction control, see the comments (where I've taken a form of the function from the comments, not the original function), and of course, test it under load.