Search code examples
postgresqldatabase-sequence

About the implementation details of PostgreSQL sequence


I am interested in the implementation details of sequences in PostgreSQL, the reason being that they're used in the background for the SERIAL type. I have reviewed the source file here (found in Google search: "postgresql source code sequence"), although it is not clear to which version of PG this file relates to. I understand the general code flow in the function nextval_internal, which is the heart of the issue, but PG's code base is large and I just can't afford the time to research this issue to its fullest.

I'm interested to know:

  • What is the persistence policy with regard to sequences? How are they loaded when the PG server starts, and how are they persisted during the server's lifetime to prevent mishaps after a server crash/power failure?
  • What are the runtime costs associated with using sequences? Do they always incur some kind of disk I/O, or never at all?
  • What are the deficiencies in using sequences which I should attend to when defining tables and writing application code?

Solution

  • Sequences are like normal tables in many respects: they are found in the pg_class catalog and they have a data file, which is where the current value is stored.

    In recent versions of PostgreSQL (as a side effect of making ALTER SEQUENCE fully transactional) you also have to SELECT from the sequence to see its current value.

    Persistence is handled transactionally like in a normal table (even though nextval is not transactional), so changes go through the transaction log, but different from normal tables there is no multiversioning. Rather, the single row of a sequence is changed “in place”. That's ok because we don't need different versions for the value, and VACUUM would never be able to keep up with cleaning up in a busy sequence.

    To answer your questions individually:

    • During recovery, changes to the sequence are replayed from WAL just like any other data change.

    • Using a sequence has very little overhead. Modifications will not cause I/O all the time, because sequences are cached in shared buffers like all other tables. Only during checkpoints they have to be persisted to storage, and there is some small amount of WAL to be accounted for.

      If you want to further reduce the overhead of using sequences, define them with a CACHE greater than one.

    • There are no problems with using sequences. They are the best way to generate unique numbers.