Search code examples
postgresqlperformancequery-optimizationcommon-table-expression

Optimize PostgreSQL query to insert users into the database


  • The use case goes like this.
  • There is a signup endpoint on which the user will submit their email and password
  • 4 rows have to be created at once with a couple of ids generated
  • The tables look like this

authentication_types

id name
uuid varchar
primary key
0aa4d9a9-e024-4792-bc41-36a4f3528d36 password

accounts

id email password ...few other columns authentication_type_id
uuid varchar varchar uuid
primary key unique foreign key to authentication_types
7a9d912a-69ab-4615-9058-e1bb1c4e36c5 password ... ... 0aa4d9a9-e024-4792-bc41-36a4f3528d36

users

id enabled
uuid boolean
primary key
fc9ca826-63dc-43b8-97b6-2e949ffd8a30 true

user_accounts

id account_id user_id
uuid uuid uuid
primary key foreign key to accounts foreign key to users
bd4b338f-1b5a-4b24-9908-e5cfb4080dd4 7a9d912a-69ab-4615-9058-e1bb1c4e36c5 fc9ca826-63dc-43b8-97b6-2e949ffd8a30

verification_tokens

id expires token account_id
uuid timestamptz varchar uuid
primary key unique foreign key to accounts
865a6389-67ea-4e38-a48f-9f4b60ffe816 ... ... 7a9d912a-69ab-4615-9058-e1bb1c4e36c5

When a new signup happens I want to do the following things

  • Generate a uuid and insert a row into accounts
  • Generate a uuid and insert a row into users
  • Generate a uuid + Take the id of the inserted row from accounts and users and insert into user_accounts
  • Generate a uuid + token + account id generated above and insert into verification_tokens

This is the query I came up with after a lot of thought. Get the authentication type id for say password and run a CTE with some uuid generation in every step. Is there a way to optimize this any further?

WITH account_data(id, email, password, authentication_type_id) AS (
      VALUES( gen_random_uuid()
             ,:email
             ,:password
             ,(SELECT id 
               FROM authentication_types 
               WHERE name = :authenticationTypeName) ) )
,ins1(user_id) AS (
      INSERT INTO users(id, enabled) 
      VALUES( gen_random_uuid()
             ,true)
      RETURNING id AS user_id )
,ins2(account_id) AS (
      INSERT INTO accounts (id, email, password, authentication_type_id) 
      SELECT id
            ,email
            ,password
            ,authentication_type_id 
      FROM account_data 
      RETURNING id AS account_id )
,ins3 AS (
      INSERT INTO user_accounts (id, account_id, user_id) 
      VALUES( gen_random_uuid()
             ,(SELECT account_id 
               FROM ins2)
             ,(SELECT user_id 
               FROM ins1)       ) )
INSERT INTO verification_tokens (id, token, account_id) 
VALUES( gen_random_uuid()
       ,:token
       ,(SELECT account_id 
         FROM   ins2)     ) 
RETURNING (SELECT account_id FROM ins2) AS id

If there is a faster method to execute the above query, I would be glad to hear. Thank you for your help in advance

EDIT 1

Here is a sample query with actual data

WITH account_data(id, email, password, authentication_type_id) AS (
      VALUES( gen_random_uuid()
             ,'[email protected]'
             ,'$2a$12$71qwp8jK0z6EVbpe9FZSIeHQkYh4Wtn2sOxH/Y3xuEhSLP1IjPAEK'
             ,(SELECT id 
               FROM authentication_types 
               WHERE name = 'password') ) )
,ins1(user_id) AS (
      INSERT INTO users(id, enabled) 
      VALUES( gen_random_uuid()
             ,true)
      RETURNING id AS user_id )
,ins2(account_id) AS (
      INSERT INTO accounts (id, email, password, authentication_type_id) 
      SELECT id
            ,email
            ,password
            ,authentication_type_id 
      FROM account_data 
      RETURNING id AS account_id )
,ins3 AS (
      INSERT INTO user_accounts (id, account_id, user_id) 
      VALUES( gen_random_uuid()
             ,(SELECT account_id 
               FROM ins2)
             ,(SELECT user_id 
               FROM ins1)       ) )
INSERT INTO verification_tokens (id, token, account_id) 
VALUES( gen_random_uuid()
       ,'$2a$12$VNP/ya5ILSP.yYjSa3anyuZhm8Jxc97Y3p95grDjS/Xe1XjBmX/VK'
       ,(SELECT account_id 
         FROM   ins2)     ) 
RETURNING (SELECT account_id FROM ins2) AS id

Here is the visual query analyzer

enter image description here

Here is the explain analyze result

enter image description here


Solution

  • I'll take it as a given that:

    • your design is constrained to DML-based CRUD.
    • the uniqueness check for whether an email already exists in accounts is being done on the back-end prior to this operation or that unique constraint violations raised by ins2 are appropriately trapped and handled by the API service based on the SQLSTATE of the response - the below optimizations rely on this advance knowledge that the constraints will all pass

    A couple of small tweaks to the DML:

    WITH account_data(id, email, password, authentication_type_id) AS (
          VALUES( gen_random_uuid()
                 ,:email
                 ,:password
                 ,:authentication_type_id  --cache these at the level of the CRUD service and submit parameterized 
                 ) 
    ), ins1(user_id) AS (
          INSERT INTO users(id, enabled) 
          VALUES( gen_random_uuid(), true )
          RETURNING id AS user_id 
    ), ins2(account_id) AS (
          INSERT INTO accounts (id, email, password, authentication_type_id) 
          SELECT id
                ,email
                ,password
                ,:authentication_type_id  --parameterized
          FROM account_data
          RETURNING id AS account_id
    ), ins3 AS (
          INSERT INTO user_accounts (id, account_id, user_id) 
          SELECT gen_random_uuid()
                 , account_id
                 , user_id )
          FROM ins1, ins2  --ditch the subqueries, just use the cartesian product of the two preceding always-single records
    )
    INSERT INTO verification_tokens (id, token, account_id) 
    SELECT gen_random_uuid()
           , :token
           , account_id 
    FROM ins2
    RETURNING account_id AS id;
    

    You can additionally try altering the DDL to make the PRIMARY KEY and FOREIGN KEY constraints DEFERRABLE/INITIALLY DEFERRED, which will cause the constraint checks and index updates to only run together upon the successful completion of all 4 INSERTs rather than after each one. This should reduce your overhead and deadlock risk in a high-concurrency live environment.

    Note that you might not see any material improvement in benchmark performance until you get a prod-like concurrency load on the database.