Search code examples
postgresqlconcurrencydistributed-computingoptimistic-locking

Generate sequential and distinct order numbers for vendors in PostgreSQL


I'm working on an ecommerce app in a distributed system where multiple servers are interacting with a PostgreSQL database. I need to implement an order number counter for each vendor to ensure each vendor has its own unique and incremental order numbers.

Each vendor has a distinct sequence of order numbers, such as Vendor 1 having orders numbered 1, 2, 3... and Vendor 2 also having orders numbered 1, 2, 3... These order numbers are independent of the database IDs and are generated sequentially for each vendor.

I've considered using sequences and triggers, but I really dont know if it is the right approach. Also I have heard about optimistic locks and could help but I am not familiare with the implementation and I fear that locking the DB would cause some performance issues or would require creating optimistic locks failure handlers. this is how to SQL data looks like:

CREATE TABLE vendors (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL);

CREATE TABLE orders (
id SERIAL PRIMARY KEY,
order_number INT NOT NULL,
vendor_id INT REFERENCES vendors(id));

So is there a way to do this ?


Solution

  • Effective ordinal number per vendor

    You'll need a view for that: if you try to only guard the consistency of that numbering through default values that are established on insert (either automatically db-side or by hand client-side), then a single delete, update, or non-default insert can break that consistency:

    • A delete introduces a gap in the sequence of a vendor's orders - you said it's not a problem for you, but still, it's avoidable.
    • Someone can update an order, reassigning it to another vendor or explicitly changing its number within the same vendor. In both cases that order_number will collide with another, future or existing, automatically-numbered order.
    • It's possible to explicitly insert an invalidly numbered, out-of-sequence order. Depending on how you set up the default, OVERRIDING SYSTEM VALUE clause might not even be necessary.

    A view can compute the current effective order of things, as of query time, using row_number() window function, partitioned by vendor_id: demo1

    CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    vendor_id INT REFERENCES vendors(id),
    some_other_column TEXT);
    
    CREATE VIEW v_orders AS 
    SELECT *,row_number()over(partition by vendor_id order by id) AS order_number
    FROM orders;
    

    Per-vendor sequential identifier

    If those are just identifiers that you want to be sequential and unique for each vendor, then that's a fairly similar problem to keeping up-to-date, cached group counts (example). None of the standard methods help here:

    All of them use a single, common sequence tied to the id column. A workaround is to use a function that accepts vendor_id as a "seed" and maintains separate sequences for each distinct seed, returning adequate nextval() based on that. Or, maintain a separate table holding the most recent id per vendor: demo2

    create table order_number_per_vendor(
      vendor_id INT PRIMARY KEY REFERENCES vendors(id),
      max_order_number INT NOT NULL DEFAULT 1);
    
    create function f_order_numbers()returns trigger language plpgsql as $f$
    begin
        insert into order_number_per_vendor as f
        values (new.vendor_id,1)
        on conflict (vendor_id) do update
        set max_order_number=f.max_order_number+1
        returning max_order_number
        into new.order_number;
        
        return new;
    end $f$;
    
    create trigger t_order_numbers before insert on orders 
    for each row execute function f_order_numbers();
    

    It makes no attempt at removing gaps or correcting manually broken sequences. Also, all concurrent insert into orders attempts for the same vendor, lock each other and have to wait for predecessors to finish. You can trade that for only locking if they all insert the first orders of a vendor: demo3

    create function seeded_sequence_nextval(seed text) returns int as $f$
    declare sequence_name text:='seeded_sequence_'||seed;
    begin 
      begin execute format('create sequence if not exists %I;',sequence_name);
      exception when /*23505*/unique_violation then null;--concurrent create if not exists
      end;
      return nextval(format('%I',sequence_name));
    end $f$ language plpgsql;
    
    create function f_order_numbers()returns trigger language plpgsql as $f$
    begin
        select seeded_sequence_nextval('orders_vendor_'||new.vendor_id) 
        into new.order_number;
        return new;
    end $f$;
    
    create trigger t_order_numbers before insert on orders 
    for each row execute function f_order_numbers();
    

    If a vendor already has a sequence, they'll all just call their adequate, dynamically constructed nextval(). If not, they'll all get locked until the first one finishes setting it up, then call nextval() on it.

    It's tempting to try and use a function like that in a default or generated always as expression, but the former can't reference any other fields to establish the shared seed, and the latter can't use non-immutable functions - which this one is not. It's possible but not necessarily a good idea to try to falsely declare it as immutable: that tells the planner it can potentially re-use a single evaluation of that nextval() throughout all rows of a statement. Even if it doesn't, it's technically allowed to:

    Labeling a function IMMUTABLE when it really isn't might allow it to be prematurely folded to a constant during planning, resulting in a stale value being re-used during subsequent uses of the plan.