Search code examples
google-cloud-platformgoogle-cloud-spanner

Is it a good idea to have an index on a boolean?


I have a table with a boolean field, IsNew, that indicates whether or not the corresponding entity is new. I want to periodically query for all entities in a particular state. What are the implications of having index on boolean (or enum)? Will it create a hotspot? Any limitations on QPS?


Solution

  • A secondary index is implemented internally as a table that has a primary key based on the declared secondary index key, plus whatever indexed table keys weren't mentioned in the secondary index explicitly. So, say you have a table like this:

    CREATE TABLE UserThings (
      UserId INT64 NOT NULL,
      ThingId INT64 NOT NULL,
      ...
      IsNew BOOL NOT NULL,
      ...
    ) PRIMARY KEY(UserId, ThingId), ...
    

    And you create an index like this:

    CREATE INDEX UserThingsByIsNew ON UserThings(IsNew, ThingId)
    

    That'll create an internal table that looks something like this:

    CREATE TABLE UserThingsByStatus_Index (
      IsNew BOOL,
      ThingId INT64 NOT NULL,
      UserId INT64 NOT NULL,
    ) PRIMARY KEY(new, ThingId, UserId), ...
    

    So, when you update rows of UserThings to change the value of the IsNew column, it will delete the old row in UserThingsByIsNew_Index, and insert an additional row. This will tend to create a lot of churn in the index if the IsNew value of rows is changing at a high frequency. This might not be a problem at all, but you will only really know by testing your scenario under a real-world workload for a sustained time.

    If you don't update the IsNew field of entities too frequently, then you probably won't have any hot-spotting problems. That's why I mentioned earlier that Cloud Spanner also appends the original table keys to the keys of the index: assuming that your original table rows are well-distributed by the table's keys, then the portion of the index for IsNew=true and IsNew=false, respectively, will have a similar distribution, and shouldn't cause a hotspot.