indexing relational-database google-cloud-spanner unique-index

Enforce uniqueness with soft deletions

Let's say I have a "Dashboard" object that contains various "Pages". The "Pages" table looks like this:

Page
- ID (PK)
- DashboardID (FK)
- Name (VARCHAR)
- DeletedAt (TIMESTAMP)

I want to ensure that each dashboard has uniquely-named (non-deleted) pages. For example, a dashboard with the following pages is fine:

Pages: [(1, 1, "Sales", NULL), (1, 2, "Product", NULL)
Pages: [(1, 1, "Sales", NULL), (1, 2, "Sales", "2014-01-01T00:00:00")

But this is not:

Pages: [(1, 1, "Sales", NULL), (1, 2, "Sales", NULL)

What would be the proper way to create an index on this? Essentially I'd want something like:

CREATE UNIQUE INDEX ON Page(DashboardID, Name) # WHERE DeletedAt IS NOT NULL

What would be the proper way to do this in Spanner? I believe this is the concept of a "Partial Index" in Postgres: https://www.postgresql.org/docs/8.0/indexes-partial.html.

Perhaps this is the correct approach? https://cloud.google.com/spanner/docs/generated-column/how-to#create_a_partial_index_using_a_generated_column.

Solution

Yes, the approach that you are suggesting yourself is the right approach. So in your specific case that would mean using the following schema (Dashboard column left out for simplicity):

CREATE TABLE page (
  id INT64 NOT NULL,
  name STRING(MAX) NOT NULL,
  deleted_at TIMESTAMP,
  unique_name STRING(MAX) AS (if(deleted_at is null, name, null)) STORED,
) PRIMARY KEY(id);

CREATE UNIQUE NULL_FILTERED INDEX idx_page_unique_name ON page(unique_name);

The unique null-filtered index will only contain the entries where the column that is indexed is not null, and the uniqueness will therefore also only be applied to the actual values in that index.

Source: https://cloud.google.com/spanner/docs/generated-column/how-to#create_a_partial_index_using_a_generated_column.