Search code examples
postgresqlmaster-slavewal

Postgres slave server log replay details


Postgres supports master slave replication through WAL(Write Ahead Log) files.

On master server, a backend process called WALSender will send log files(records) to the slave server, and on the slave server, a backend process called WALReceiver will receive log files(records), and block data instead of SQL queries are stored in WAL files. (am I right??)

Then how does slave server apply the log updates in both cases of file-based replication and streaming replication? I mean in detail which process handles this and how it does this?

Thanks in advance.


Solution

  • Your description is a reasonable summary, yes. There are roughly three levels:

    • High level SQL queries like "UPDATE mytable SET a=nextval('some_sequence'), b=current_timestamp WHERE id=$1"

    • logical row changes, like "in table abc with primary key value id=42 the tuple was updated to new value (a=11, b=12314234321)". These aren't recorded on disk in this form, but are an intermediate stage produced inside the executor, which are then turned into:

    • block-level changes written to the xlog then the heap, like "in relation with relfilenode 12312.1 in dboid 9191 stored in tablespace 1, block 41231 byte offset 0x0012 changed to 0xde 0xad 0xbe 0xef 0x01"

    (all numbers completely made up)

    WAL-based replication happens at the lowest level, recording block changes to the files in the database. OK, so it's not quite that simple, but that'll do for the purposes of understanding replication.

    On a standalone master, SQL gets executed, producing row changes that are recorded to WAL and to shared_buffers. Then the WAL is replayed to apply the changes to the database heap. (Again, it's not that simple because of dirty writeback by the bgwriter, etc, but that'll serve for now).

    With WAL based replication the master keeps the WAL around - sends it to replicas or archive it. Replicas, instead of producing their own WAL then replaying it, just replay WAL from a master elsewhere, using that to update the database heap and their shared_buffers.

    Then how does slave server apply the log updates in both cases of file-based replication

    The replica runs the restore_command to ask for the next WAL archive when it hits the end of the previous archive. The recovery process then starts reading the WAL archive, processing it record-by-record.

    Look at how the restore_command is used in the source to see how it's invoked.

    and streaming replication?

    The replica's walreceiver connects to the upstream's walsender and receives wal records. These are written to a file, which the recovery process then reads.

    The recovery part is the same in both cases, the difference is just how the WAL is received from the upstream. Recovery is also much the same when doing crash recovery on a master; that just replays WAL too.

    I mean in detail which process handles this and how it does this?

    The best reference for this is the source code - specifically the comments and README files.

    I strongly suggest that you use pg_xlogdump to see what WAL actually contains. Then read the documentation for the relevant WAL records to understand what each record type does.

    Also start by reading src/backend/access/transam/README and src/backend/access/transam/xlog.c.

    I'm not going to repeat what's already in the relevant READMEs, which are much more likely than my description to be correct.