Search code examples
databasegoogle-cloud-platformbigtablegoogle-cloud-bigtable

Modeling Forward and Reverse Query Questions in Bigtable


Let's say that we have the following three entities:

Organization
 - id

Role
 - id

Member
 - id

A Role can be granted to a Member within an Organization, thus giving that Member certain access control rights to that Organization. I'd like to be able to answer the following two queries:

  1. List the IDs of all the members who have a given Role within a given Organization (e.g. given a Role ID and Org ID give me the list of Members).
  2. List all of the IDs of the Roles that a member has been granted within a given Organization (e.g. given a Member ID and Org ID give me the list of Roles).

I'm trying to find recommendations on how to model this in Bigtable (ideally with a single row for atomic mutations)... I'm also open to other technology recommendations (I'm trying to design within the constrains my company has given me).


If we model the relationship described above using the Bigtable row key org#{orgID}#role#{roleID}#member#{memberID}, I can easily answer the first question. However, it doesn't allow me to easily answer the second question. If I duplicate data and store another row key org#{orgID}#member#{memberID}#role#{roleID} then I can easily answer the second question, but now I have two rows to manage and atomic updates cannot be guaranteed between the two, so that may lead to consistency issues.

Has anyone in the community ran into a similar problem, and if so, how did you solve it?


Solution

  • Cloud Bigtable doesn't natively support secondary indexes, which is what you would need to only need a single row and be able to efficiently run both of those queries without requiring a full table scan. The alternative to that that you've already identified would be to write two rows via a process that would ensure eventual consistency. This might be sufficient for your needs depending on the underlying requirements of your system.

    Depending on your constraints (cloud provider, data scale, atomicity, multi-region replication, etc.), you might be better served with a standard relational database (e.g. Postgres, MySQL), or Google Cloud Spanner.

    Possible approaches with Spanner to accomplish this:

    • Have a single table that represents a a Member <-> Role relationship. Have RoleID being the primary index for the row, and then add a Secondary Index for MemberID and you'd be able to run queries against either.

    • Go the traditional relational database route of having Member, Role and MemberRole joining table. With Spanner you should have atomic updates via a Transaction. When querying you could potentially have issues with reads going across multiple splits, but you'd have to do some real world testing to see what your performance would be like.