I have a table with a boolean field, IsNew
, that indicates whether or not the corresponding entity is new. I want to periodically query for all entities in a particular state. What are the implications of having index on boolean (or enum)? Will it create a hotspot? Any limitations on QPS?
A secondary index is implemented internally as a table that has a primary key based on the declared secondary index key, plus whatever indexed table keys weren't mentioned in the secondary index explicitly. So, say you have a table like this:
CREATE TABLE UserThings (
UserId INT64 NOT NULL,
ThingId INT64 NOT NULL,
...
IsNew BOOL NOT NULL,
...
) PRIMARY KEY(UserId, ThingId), ...
And you create an index like this:
CREATE INDEX UserThingsByIsNew ON UserThings(IsNew, ThingId)
That'll create an internal table that looks something like this:
CREATE TABLE UserThingsByStatus_Index (
IsNew BOOL,
ThingId INT64 NOT NULL,
UserId INT64 NOT NULL,
) PRIMARY KEY(new, ThingId, UserId), ...
So, when you update rows of UserThings to change the value of the IsNew column, it will delete the old row in UserThingsByIsNew_Index, and insert an additional row. This will tend to create a lot of churn in the index if the IsNew value of rows is changing at a high frequency. This might not be a problem at all, but you will only really know by testing your scenario under a real-world workload for a sustained time.
If you don't update the IsNew
field of entities too frequently, then you probably won't have any hot-spotting problems. That's why I mentioned earlier that Cloud Spanner also appends the original table keys to the keys of the index: assuming that your original table rows are well-distributed by the table's keys, then the portion of the index for IsNew=true and IsNew=false, respectively, will have a similar distribution, and shouldn't cause a hotspot.