Search code examples
sqldatabasesql-server-2008data-structuresbloom-filter

Fast way to find the row associated with a given Guid across many SQL databases and tables


I have a number of SQL databases, each with plenty of tables. These predominantly have randomly generated Guids as row primary keys. I would like a way to locate the table and row data associated with a given Guid (with no type information).

So I was considering setting up a Bloom filter for each table and cache it somewhere. I would first test the Guid against the filter then subsequently query the database or cache for the data (or for a false positive). However I have never worked with them before and so do not know what their performance characteristics are like, and if they would be an effective solution to my problem.

Database characteristics:

  • ~100 tables to choose between. Some are far more likely to be chosen than others.

  • No rows are ever deleted (unless a manual cleanup after a bug) so I am not concerned about not being able to remove information from a bloom filter.

  • More structure than data! Everything fits on one server.

Is this a solution worth investigating? Am I better of sticking to caching more conventional lookup structures? If I go for Bloom, are there any shortcuts to the hashing functions given that Guids are a very independent input?


Solution

  • Search your platform's help for "INFORMATION_SCHEMA" or "system tables". As far as I know, every SQL dbms has at least one way for you to query metadata. The "standard" way is to use INFORMATION_SCHEMA views, but their content varies among dbms vendors.

    In PostgreSQL's information schema, this query will show you all the table and column names that have foreign key constraints, and also their target table and column names.

    select kc2.table_name as fk_table_name, kc2.column_name as fk_column_name, 
           kc1.table_name as ref_table_name, kc1.column_name as ref_column_name
    from INFORMATION_SCHEMA.referential_constraints rc
    inner join INFORMATION_SCHEMA.key_column_usage kc1 
            on rc.constraint_catalog = kc1.constraint_catalog
           and rc.constraint_schema = kc1.constraint_schema
           and rc.unique_constraint_name = kc1.constraint_name
    inner join INFORMATION_SCHEMA.key_column_usage kc2 
            on rc.constraint_catalog = kc1.constraint_catalog
           and rc.constraint_schema = kc1.constraint_schema
           and rc.constraint_name = kc2.constraint_name
    order by kc2.table_name, kc2.column_name
    

    SQL Server Information Schema Views