I have a number of SQL databases, each with plenty of tables. These predominantly have randomly generated Guids as row primary keys. I would like a way to locate the table and row data associated with a given Guid (with no type information).
So I was considering setting up a Bloom filter for each table and cache it somewhere. I would first test the Guid against the filter then subsequently query the database or cache for the data (or for a false positive). However I have never worked with them before and so do not know what their performance characteristics are like, and if they would be an effective solution to my problem.
Database characteristics:
~100 tables to choose between. Some are far more likely to be chosen than others.
No rows are ever deleted (unless a manual cleanup after a bug) so I am not concerned about not being able to remove information from a bloom filter.
More structure than data! Everything fits on one server.
Is this a solution worth investigating? Am I better of sticking to caching more conventional lookup structures? If I go for Bloom, are there any shortcuts to the hashing functions given that Guids are a very independent input?
Search your platform's help for "INFORMATION_SCHEMA" or "system tables". As far as I know, every SQL dbms has at least one way for you to query metadata. The "standard" way is to use INFORMATION_SCHEMA views, but their content varies among dbms vendors.
In PostgreSQL's information schema, this query will show you all the table and column names that have foreign key constraints, and also their target table and column names.
select kc2.table_name as fk_table_name, kc2.column_name as fk_column_name,
kc1.table_name as ref_table_name, kc1.column_name as ref_column_name
from INFORMATION_SCHEMA.referential_constraints rc
inner join INFORMATION_SCHEMA.key_column_usage kc1
on rc.constraint_catalog = kc1.constraint_catalog
and rc.constraint_schema = kc1.constraint_schema
and rc.unique_constraint_name = kc1.constraint_name
inner join INFORMATION_SCHEMA.key_column_usage kc2
on rc.constraint_catalog = kc1.constraint_catalog
and rc.constraint_schema = kc1.constraint_schema
and rc.constraint_name = kc2.constraint_name
order by kc2.table_name, kc2.column_name