Search code examples
sqlpostgresqlsupabasepg-cron

Why is my sql statements giving wrong results?


I am using Supabase for database and have written some sql to update certain columns in the table. I have also set up pg_cron jobs to run at intervals automating the process. This is basically a job application engine, the relevant fields (columns) are

  • total_jobs (the total number of jobs available to a user based on subscription plan),
  • jobs_applied (the number of jobs that has been applied for a user)
  • limit (the number of jobs left).

The basic math behind this is if a user pays for a plan of 500 jobs, and 50 has been applied for, the limit should be 450 i.e. limit = total_jobs - jobs_applied.

MRE: db<>fiddle

create table user_tbl(
   user_id int generated by default as identity primary key
  ,name text);
create table metrics_tbl(
   user_id int references user_tbl(user_id)
  ,total_jobs numeric
  ,jobs_applied numeric
  ,job_limit numeric);

create table applied_tbl(
  user_id int references user_tbl(user_id));
create table subscriptions_tbl(
   price_id text default 'id_AAAAAAAAAAAAAAAAAAAAAAAA'
  ,user_id int references user_tbl(user_id))

insert into user_tbl(name)values('user1');
insert into metrics_tbl(user_id)values(1);
insert into applied_tbl(user_id)values(1),(1),(1);
insert into subscriptions_tbl(user_id)values(1);

Run every hour counting the number of applications and updating the relevant column.

SELECT
  cron.schedule (
    '0 * * * *',
    $$UPDATE metrics_tbl
SET jobs_applied =
(SELECT COUNT(user_id) FROM applied_tbl
WHERE applied_tbl.user_id = metrics_tbl.user_id);
$$
  );

Run every hour updating the total_jobs based on user subscription. I am using stripe for payment and the price ids determine the total jobs for a plan.

SELECT
  cron.schedule (
    '0 * * * *',
    $$
    UPDATE metrics_tbl
SET
  total_jobs = CASE price_id
    WHEN 'id_AAAAAAAAAAAAAAAAAAAAAAAA' THEN 100
    WHEN 'id_BBBBBBBBBBBBBBBBBBBBBBBB' THEN 150
    WHEN 'id_CCCCCCCCCCCCCCCCCCCCCCCC' THEN 200
    WHEN 'id_DDDDDDDDDDDDDDDDDDDDDDDD' THEN 250
    WHEN 'id_EEEEEEEEEEEEEEEEEEEEEEEE' THEN 300
    WHEN 'id_FFFFFFFFFFFFFFFFFFFFFFFF' THEN 350
    ELSE 0
  END
FROM
  subscriptions_tbl
WHERE
  metrics_tbl.user_id = subscriptions_tbl.user_id; 
$$
  );

Implement the math logic total limit = total_jobs - jobs_applied

SELECT
  cron.schedule (
    '0 * * * *',
    $$
    UPDATE metrics_tbl
    SET limit = total_jobs - jobs_applied;
$$
  );

Whilst this worked in development, I started noticing some bugs in production. It sometimes shows limit as a negative number and other times shows limit as the jobs_applied. This means that there's something that keeps changing the total_jobs to 0 therefore making limit to be negative.

For example if 27 jobs have been applied for a user with 50 total jobs, sometimes the limit shows -27 and other times shows 27 but never shows the correct result (13).

I feel my implementation is inefficient and I'm no expert in sql. I've tried refactoring the sql and also checking if Supabase has an unsigned int data type but couldn't find any. The math logic worked in dev, not sure what's happening now.


Solution

  • Make the limit a generated column and you won't have to ever update it, it'll get recalculated automatically every time there's any change in the other two fields it depends on. It might be a good idea to also change its name because limit is a reserved keyword.
    demo at db<>fiddle

    alter table metrics_tbl
        drop column if exists job_limit
       ,add column job_limit numeric 
          generated always as(total_jobs-jobs_applied)stored;
    

    This means you no longer need that last pg_cron job at all. The other two can be merged into one:

    SELECT cron.schedule('0 * * * *',
        $q$ UPDATE metrics_tbl
            SET jobs_applied =(SELECT COUNT(user_id) 
                               FROM applied_tbl
                               WHERE applied_tbl.user_id = metrics_tbl.user_id)
               ,total_jobs =CASE price_id
                            WHEN 'id_AAAAAAAAAAAAAAAAAAAAAAAA' THEN 100
                            WHEN 'id_BBBBBBBBBBBBBBBBBBBBBBBB' THEN 150
                            WHEN 'id_CCCCCCCCCCCCCCCCCCCCCCCC' THEN 200
                            WHEN 'id_DDDDDDDDDDDDDDDDDDDDDDDD' THEN 250
                            WHEN 'id_EEEEEEEEEEEEEEEEEEEEEEEE' THEN 300
                            WHEN 'id_FFFFFFFFFFFFFFFFFFFFFFFF' THEN 350
                            ELSE 0
                            END
            FROM subscriptions_tbl
            WHERE metrics_tbl.user_id = subscriptions_tbl.user_id; 
        $q$
    );
    

    As already stated by @Richard Huxton, the 3 jobs you set up weren't guaranteed to execute in any particular order. This means the limit=total_jobs-jobs_applied was sometimes calculated

    • before either of its input fields were updated,
    • after total_jobs got updated but still used the old jobs_applied that didn't update yet
    • after a job wrote the new jobs_applied value but before new total_jobs got saved

    Switching to a single job and a generated column solves the problem:

    user_id total_jobs jobs_applied job_limit
    1 100 3 97