PostgreSQL & SQL Server btree storage fundamentals question

I know that SQL Server can store a row's data at leaf level in a clustered index. I believe that PostgreSQL doesn't do this. If so, what is its storage paradigm?

My main question is as follows. Consider the following design & data (shown in T-SQL):

CREATE TABLE dbo.Tree
    (
    [Key] int NOT NULL,
    ID int NOT NULL
    ) ON [PRIMARY]
GO
ALTER TABLE dbo.Tree ADD CONSTRAINT
    PK_Tree PRIMARY KEY CLUSTERED 
    (
    [Key],
    ID
    ) WITH (...) ON [PRIMARY]

INSERT INTO TREE ([Key], ID) VALUES (1, 1), (1, 2), (1, 3), (1, 4).

Since this is a btree with both columns as the PK, am I correct in saying that "[Key] = 1" would only be stored once, and "ID = [1, 2, 3, 4]" would be individual values in the btree, while there would be no leaf values per sé since there are no row columns that aren't part of the PK?

How would this work in PostgreSQL?

Solution

TL;DR version - your key values are always stored on disk, regardless of DBMS implementation.

PostgreSQL would store 4 rows in page on disk, one for each row that you've inserted. SQL Server will also store 4 rows on disk. The B-tree is the lookup structure, not the page level storage structure.

At the underlying disk level, PostgreSQL uses unordered disk structures to store data. This happens because PostgreSQL may be maintaining multiple copies of a row at any given time due to MVCC transaction semantics. Each row has an xmin and xmax detailing the creation and destruction transaction ID of the current row. The autovacuum process performs ghost record clean up operations. The indexes in PostgreSQL point back to the rows in the heap table structure. This set of slides details the process. In particular you'll want to look at slide 29 for how the b-tree lookup occurs and 48-52 for a theoretical discussion of how the data is stored on disk.

In SQL Server, you'll have records on a leaf page, but with only four rows the clustered index will have just 1 index level - the leaf level. You can verify this by running SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Tree'), NULL, NULL, NULL). You can also verify the physical page level in SQL Server doing something like this:

-- Locate the first page of the index
DBCC IND('test', 'Tree', 1);
GO
-- tell SQL Server to show DBCC output in the message page, not the SQL Server log
DBCC TRACEON (3604);
GO
-- look at nasty, dirty, on page data.
DBCC PAGE(test, 1,155,3);

Once you look at the DBCC PAGE output, you'll be ready to hate me. Towards the end you should see four rows that look something like this:

Slot 0 Offset 0x60 Length 15

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 15

Memory Dump @0x000000006D6FA060

0000000000000000:   10000c00 01000000 01000000 020000††††...............  

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

Key = 1                              

Slot 0 Column 2 Offset 0x8 Length 4 Length (physical) 4

ID = 1                               

Slot 0 Offset 0x0 Length 0 Length (physical) 0

KeyHashValue = (e2338e2f4a9f)

This is the actual row data as SQL Server is storing it. You'll see multiple copies of Key = 1 throughout the output, followed by the ID information. Supporting information for these commands can be found here.

The reasoning behind the difference between PostgreSQL and SQL Server comes from PostgreSQL's MVCC implementation. Since we may have multiple copies of a row in PostgreSQL, it's more optimal to keep several copies of the data on disk instead of modifying the supporting index structures. Whenever possible, PostgreSQL does heap-only updates and only issues updates on the underlying table. SQL Server does the same thing and will only update the clustered index (or heap) when it can avoid updating the supporting indexes.