Search code examples
postgresqldata-warehousemultiple-schema

DWH from multiple schemas


I have a project in postgresql to create a DWH from multiple schemas. The schemas have the same structure(tables, constrains). I am stuck on this problem : I have a table "employees" in all the schemas that i will use in DWH. The first entry in schema1 table employees is 1 John Dow, the first entry in schema2 table employees is 1 Mary Jane (the id column is auto increment). Is there a way to create a DWH with this structure ? Is there a way to partition the schemas data ? Thank you.


Solution

  • The easiest way will probably be to assign a number of the most significant bits of the key to become a source ID, thereby making your keys unique across all of your source databases. E.g. Let's assume you will never have more than 100 databases that you're warehousing, so a 7 bit database identifier will be sufficient.

    1. Ensure all tables in your database that have sequential integer primary keys, or foreign keys referencing those, use the same data type. This data type must be large enough to hold values for all rows on any table without touching the most significant 8 bits. So INTEGER (32 bit) keys would leave 24 bits remaining allowing up to 16777215 insertions, and BIGINT (64 bit) keys will leave 56 bits remaining, allowing up to 72057594037927935 insertions. (Beware that ALTER TABLE ... ALTER COLUMN SET DATA TYPE requires a full table rewrite and can be very slow for high volume tables.)
    2. Assign each DB a unique, 7 bit database identifier that will occupy the most significant 8 bits of each key. You only get 7 bits, not 8, because the top bit is the signing bit, and you can't touch that.
    3. For every integer primary key and foreign key referencing it in the database, update it, adding DB_ID << 24 (for 32 bit keys) or DB_ID << 56 (for 64 bit keys) to it. E.g. For a DB with ID 13 and 32 bit keys, you would add 218103808 to every key value.
    4. Optionally, add CHECK CONSTRAINTs to each table, ensuring their key is in the range [(DB_ID << 24) + 1, ((DB_ID+1) << 24) - 1].
    5. Restart each table's PK sequence using: ALTER SEQUENCE :seq_name MINVALUE :min_val MAXVALUE :max_val START WITH :start_val RESTART WITH :start_val, where min_val = (DB_ID << 24) + 1, max_val = ((DB_ID+1) << 24) - 1, and start_val = currval(SEQUENCE_NAME).