database database-design primary-key read-write

Primary Key Theory & I/O Efficiency

According to RDBMS theory, when choosing a primary key, we are supposed to choose among the minimal superkeys, effectively optimizing our key choice with respect to the number of columns.

Why are we interested in optimizing for the number of columns instead of the number of bytes? Wouldn't a PK with a smaller byte size result in smaller indexes and more efficient reads and writes overall? For example, choosing a PK composed of two varchar(16) columns rather than one varchar(64) column.

Solution

I think I agree with you.

I don't think theory accounts for physical storage.
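To make that concrete, here is a rough Python sketch of what "minimal superkey" means. It is only an illustration: the helper names and sample rows are made up, and it infers keys from sample data, whereas real keys are a property of the schema. Note that nothing in it ever looks at how many bytes a column occupies.

```python
from itertools import combinations

def is_superkey(rows, cols):
    """True if the given columns uniquely identify every row in this sample."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

def minimal_superkeys(rows, columns):
    """Return superkeys that have no proper subset which is also a superkey."""
    found = []
    # Visit smaller column sets first so supersets of a known key can be skipped.
    for size in range(1, len(columns) + 1):
        for cols in combinations(columns, size):
            if any(set(k) <= set(cols) for k in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, cols):
                found.append(cols)
    return found

# Hypothetical sample data, for illustration only.
rows = [
    {"first": "Ada", "last": "Lovelace", "email": "ada@example.com"},
    {"first": "Ada", "last": "Byron", "email": "ab@example.com"},
    {"first": "Alan", "last": "Lovelace", "email": "al@example.com"},
]
keys = minimal_superkeys(rows, ["first", "last", "email"])
print(keys)
```

Both ("email",) and ("first", "last") come out as minimal here, even though they differ in column count and in size. Minimality only says that no column can be dropped from the key; it says nothing about storage.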

For instance, if you created a column holding the SHA-256 hash (64 characters when stored as hex) of two small columns, say VARCHAR(16) each, then yes, the B-tree nodes in the index would take up more space, and the index would not be faster than one on the two 16-byte columns.
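A quick back-of-the-envelope check of those sizes, as a sketch only. It assumes single-byte characters and ignores the per-value length prefixes and page overhead a real engine adds; the function names are made up for the example.

```python
import hashlib

def composite_key_bytes(a: str, b: str) -> int:
    """Payload bytes of the two-column key, ignoring any engine overhead."""
    return len(a.encode("utf-8")) + len(b.encode("utf-8"))

def sha256_hex_key(a: str, b: str) -> str:
    """Hex SHA-256 of both values; the NUL separator keeps ('ab', 'c')
    from hashing the same as ('a', 'bc')."""
    return hashlib.sha256(f"{a}\x00{b}".encode("utf-8")).hexdigest()

# Worst case for two VARCHAR(16) columns with single-byte characters.
a, b = "x" * 16, "y" * 16

composite_size = composite_key_bytes(a, b)        # payload of the two columns
hash_hex_size = len(sha256_hex_key(a, b))         # hash stored as hex text
hash_raw_size = hashlib.sha256(b"").digest_size   # hash stored as raw bytes

print(composite_size, hash_hex_size, hash_raw_size)
```

So the hex-encoded hash is twice the size of the fully populated two-column key, and even stored as raw binary it is no smaller, while typical values shorter than 16 characters make the composite key smaller still.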

There is some efficiency lost when a composite index matches on the first column and then has to switch to comparing the second column. B-tree nodes are more efficient when every comparison within a node is on the same column.
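The "switch to the second column" part can be sketched like this. It is a toy model of a composite key comparison, not how any particular engine implements it, and `compare_composite` is a made-up name.

```python
def compare_composite(left, right):
    """Compare two composite keys column by column.

    Returns (ordering, columns_examined), where ordering is -1, 0 or 1.
    """
    examined = 0
    for l, r in zip(left, right):
        examined += 1
        if l != r:
            # The first differing column decides the order.
            return (-1 if l < r else 1), examined
    return 0, examined

# First column differs: the second column is never looked at.
differs_early = compare_composite(("alice", "x"), ("bob", "a"))

# First column ties: the comparison has to fall through to the second column.
ties_first = compare_composite(("alice", "x"), ("alice", "z"))

print(differs_early, ties_first)
```

The extra work only shows up when the first column ties, which is why I would expect the cost to be small in practice.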

Honestly, though, I don't think either effect makes much difference in efficiency. I think the statement simply reflects RDBMS theory not accounting for storage size.