google-cloud-platform google-cloud-spanner

Using INTERLEAVE tables for 1-to-1 relationships

I'm designing the schema for a set of document editors (spreadsheet editor, text document editor, powerpoint editor, etc.). The editors will share a database, although they may use separate databases someday. Each editor shares a lot of common information for each doc, but then -- depending on the kind of document -- there is editor-specific information also.

My question comes from trying to design the parts of the schema that will be different for each editor. Assume that there will be a Docs table, that holds common information about the documents in general (say, ID). On top of this, I want to associate information specific to a particular editor that has a 1:1 relationship with the Doc record. My proposed schema is:

CREATE TABLE Docs (
  DocId STRING(MAX) NOT NULL,
  CreationTime TIMESTAMP NOT NULL,
  ....
) PRIMARY KEY (DocId);

CREATE TABLE SpreadsheetStuff (
  DocId STRING(MAX) NOT NULL,
  ... spreadsheet-specific information here ...
) PRIMARY KEY (DocId),
  INTERLEAVE IN PARENT Docs
  ON DELETE CASCADE;

CREATE TABLE TextDocumentStuff (
  DocId STRING(MAX) NOT NULL,
  ... text-document-specific information here ...
) PRIMARY KEY (DocId),
  INTERLEAVE IN PARENT Docs
  ON DELETE CASCADE;

My reasoning for having the separate table is to isolate the common parts from any editor-specific stuff.

I wonder if this this is unnecessary, as editors can alter the Docs table as necessary for their own needs, even though this structure works technically. In other words, I could just have a ton of extra columns in the Docs table with editor-specific information. One concern is that my proposed structure may have performance or other implications that aren't obvious.

Is this a reasonable structure for a 1:1 relationship? Is there definitive guidance one way or the other about best practices?

Solution

Cloud Spanner can handle either option efficiently, assuming you don't risk getting close to the column limit. If you plan to do a lot of SQL queries, they may be more complicated with the two-table approach, since you'll officially need to join them (although the joins should generally be efficient since the data is interleaved). Despite the extra SQL complication of the JOIN, this is probably the cleaner approach. YMMV.