Search code examples
sqlpostgresqltransactional-replicationchange-data-capturewal

Are transactions logged to WAL if there is no change in PostgreSQL?


I'm trying to find out whether changes are reflected on the WAL (write ahead log) files if there is no change to a row. To test it, I create a replication slot in PostgreSQL to capture changes. Here are the steps I've taken.

ALTER SYSTEM SET wal_level TO logical;
$ pg_ctl restart
SELECT pg_create_logical_replication_slot('slotname', 'test_decoding');
CREATE TABLE foo(col1 INTEGER, col2 INTEGER);
ALTER TABLE foo REPLICA IDENTITY FULL;
INSERT INTO foo VALUES(1,2);

Then I execute SELECT * FROM pg_logical_slot_get_changes('slotname', NULL, NULL); in psql (previous changes omitted)

The output is:

    lsn    | xid |                           data                            
-----------+-----+-----------------------------------------------------------
 0/165B208 | 488 | BEGIN 488
 0/165B208 | 488 | table public.foo: INSERT: col1[integer]:1 col2[integer]:2
 0/165B278 | 488 | COMMIT 488
(3 rows)

Then I execute UPDATE foo SET col2=2 WHERE col1=1;. Then the output of select * from pg_logical_slot_get_changes('slotname', null, null); is:

    lsn    | xid |                                                     data                                                      
-----------+-----+---------------------------------------------------------------------------------------------------------------
 0/165B2B0 | 489 | BEGIN 489
 0/165B2B0 | 489 | table public.foo: UPDATE: old-key: col1[integer]:1 col2[integer]:2 new-tuple: col1[integer]:1 col2[integer]:2
 0/165B338 | 489 | COMMIT 489
(3 rows)

It looks like the UPDATE statement has updated the WAL files even though it has no effect on the table. But where I'm confused is, if we look at the PostgreSQL docs for version 12 which is the version I'm using, it says in the "REPLICA IDENTITY" section that,

REPLICA IDENTITY This form changes the information which is written to the write-ahead log to identify rows which are updated or deleted. This option has no effect except when logical replication is in use. DEFAULT (the default for non-system tables) records the old values of the columns of the primary key, if any. USING INDEX records the old values of the columns covered by the named index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL. FULL records the old values of all columns in the row. NOTHING records no information about the old row. (This is the default for system tables.) In all cases, no old values are logged unless at least one of the columns that would be logged differs between the old and new versions of the row.

The last sentence states that the old and the new versions of a row must differ to get logged. But I'm seeing the opposite. What am I missing here?


Solution

  • Replica identity is just one part of the logical replication message/protocol, see Message format:

    Update ... Byte1('K')

    Identifies the following TupleData submessage as a key. This field is optional and is only present if the update changed data in any of the column(s) that are part of the REPLICA IDENTITY index. Byte1('O')

    Identifies the following TupleData submessage as an old tuple. This field is optional and is only present if table in which the update happened has REPLICA IDENTITY set to FULL.

    The portion of docs you quote refer to the above. The slot information you show is looking at the replication process as a whole.

    The purpose of Replica identity is spelled out here Logical replication:

    A published table must have a “replica identity” configured in order to be able to replicate UPDATE and DELETE operations, so that appropriate rows to update or delete can be identified on the subscriber side. By default, this is the primary key, if there is one. Another unique index (with certain additional requirements) can also be set to be the replica identity. If the table does not have any suitable key, then it can be set to replica identity “full”, which means the entire row becomes the key. This, however, is very inefficient and should only be used as a fallback if no other solution is possible. If a replica identity other than “full” is set on the publisher side, a replica identity comprising the same or fewer columns must also be set on the subscriber side. See REPLICA IDENTITY for details on how to set the replica identity. If a table without a replica identity is added to a publication that replicates UPDATE or DELETE operations then subsequent UPDATE or DELETE operations will cause an error on the publisher. INSERT operations can proceed regardless of any replica identity.