I'm trying to apply Debezium's New Record State Extraction SMT
using the following configuration:
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": true,
"transforms.unwrap.delete.handling.mode": "rewrite",
"transforms.unwrap.add.fields": "db,schema,table,txId,ts_ms"
For INSERT
and UPDATE
operations I get the messages as expected, but in case of DELETE
I get the following as a payload:
"payload": {
"id": 2,
"first_name": "",
"last_name": "",
"__db": "postgres",
"__schema": "schema1",
"__table": "user_details",
"__txId": 5145,
"__ts_ms": 1638760801510,
"__deleted": "true"
}
As you can see above, both first_name
and last_name
fields have empty values, though the record I deleted has non-empty values for both of those fields. What I expect to see as a value for those 2 fields is their value at the moment of deletion as it is shown in debezium's before
payload chunk in case when New Record State Extraction SMT
is not applied.
The reason of empty values for all columns except the PK is not related to New Record State Extraction SMT
at all. For postgres
, there is a REPLICA IDENTITY table-level parameter that can be used to control the information written to WAL to identify tuple data that is being deleted or updated.
This parameter has 4 modes:
In the case of DEFAULT
, old tuple data is only identified with the primary key of the table. Columns that are not part of the primary key do not have their old value written.
In the case of FULL
, all the column values of old tuple are properly written to WAL all the time. Hence, executing the following command for the target table will make the old record values to be properly populated in debezium message:
ALTER TABLE some_table REPLICA IDENTITY FULL;
NOTE!! FULL
is the most verbose, and as well the most resource-consuming mode. Be careful with it particularly for heavily-updated tables.