node.js postgresql next.js vercel supabase

Run slow background process when certain conditions in Postgres data are met?


I'm using Postgres (via Supabase) and Node (via NextJS on Vercel).

I have a table report with a region column (an enum) and a generated_text column, which holds content generated by AI.

create type region as enum (
  'America',
  'Europe',
  'Asia',
  'Africa'
);

create table report (
  id serial primary key,
  region region,
  generated_text text
);

A user selects their region and then purchases a report. Purchases are recorded in a join-table report_user.

create table report_user (
  id serial primary key,
  report int NOT NULL references report(id),
  "user" uuid NOT NULL
);

When the user makes a purchase:

  • If there is a row in the report table for that region which the user hasn't purchased before, that row is returned to them (roughly the anti-join sketched below).
  • If there isn't a report for that region, or if the user has already purchased all of the reports for that region, then we need to create a new row. This requires a request to an AI API to fill the generated_text column, which is very slow and makes for a bad user experience.
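
To make the first case concrete, the lookup is roughly this anti-join (the parameter placeholders stand in for the purchasing user's id and their chosen region):

select r.id, r.generated_text
from report r
left join report_user ru
       on ru.report = r.id
      and ru."user" = $1   -- the purchasing user's id
where r.region = $2        -- the selected region
  and ru.id is null        -- not yet purchased by this user
limit 1;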

I'd like to improve the speed for users, and this can only be done by creating report rows before they are needed. I could bulk-create some, but I don't know which regions will be more popular (I've simplified my example; my actual application has many more filters and thus a lot of potential combinations).

Is there a way to run a background task to create a new report when the user has purchased all report rows for a certain region filter?


Solution

  • Is there a way to run a background task to create...when...

    Yes, there is - and you don't even need to take one step outside Supabase.

    It offers the http extension, which lets you run the API request that tops up your inventory when you run low, straight from a trigger, in SQL. There's even pg_net, which does the same asynchronously. Demo at db<>fiddle:

    Set up a table you insert a row into whenever someone orders a report:

    create table orders(id int generated always as identity primary key,
                        "user" uuid references users(uid), 
                        region region);
    

    Another one with parameters dictating when you want to restock reports for each region:

    create table report_stock_parameters as
    select unnest(enum_range(null::region)) as region, 
           3 as min_stock_size,
           4 as restock_batch_size;
    

    You can place the criteria telling you when to restock in a function:

    create function is_restock_required(p_user uuid, p_region region)
    returns boolean as $f$
        select (select min_stock_size
                from report_stock_parameters
                where region=p_region)
              >(select count(*) --counting reports in the region unused by the user
                from report
                left join report_user
                       on report.id=report_user.report
                      and report_user."user"=p_user
                where report.region=p_region
                  and report_user.id is null) --anti-join
    $f$ language sql;
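
    You can test the criterion by hand; the uuid here is just a placeholder for an existing user id:

    select is_restock_required('00000000-0000-0000-0000-000000000001'::uuid, 'Europe'::region);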
    

    Finally, the procedure that calls the API to restock reports:

    create procedure fetch_some_more_reports(p_region region)as $f$
        insert into report (region,generated_text)
        select p_region, (response."content")::jsonb #>> '{path,to,report}'
        from generate_series(1,(select restock_batch_size 
                                from report_stock_parameters
                                where region=p_region))as a(g1),
             http_get('https://generative_ai_api.com/reports/please',
                       jsonb_build_object('access_token', 'eyJ0eXAiOiJKV',
                                          'throwaway', g1))as response;
    $f$ language sql;
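
    You can also run the procedure by hand to seed some initial stock before going live, e.g.:

    call fetch_some_more_reports('Europe'::region);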
    

    The trigger fires in response to inserts of new, incoming orders and, when a refill is needed, runs the restocking procedure:

    create function trgf_restock_reports()returns trigger as $f$
    begin 
        call fetch_some_more_reports(new.region);
        return null;
    end $f$ language plpgsql;
    
    create trigger t_report_stock_monitor after insert on orders
    for each row when (is_restock_required(new.user,new.region))
    execute function trgf_restock_reports();
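
    With all of that in place, a purchase is just an insert into orders; whenever the buyer's unused-report count for that region has dropped below min_stock_size, the trigger restocks within the same transaction. The uuid below is a placeholder for an existing users.uid:

    insert into orders ("user", region)
    values ('00000000-0000-0000-0000-000000000001', 'Europe');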
    

    Depending on how your API wants you to authenticate, you'd switch from sending the access_token in the payload to passing a key in a header, a URL parameter, etc. See examples on the extension's GitHub.
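
    For example, header-based auth with the same extension would use the generic http() call with an http_header array - something like this, with the URL, token and JSON path being the same placeholders as above:

    select (response."content")::jsonb #>> '{path,to,report}'
    from http((
              'GET',
              'https://generative_ai_api.com/reports/please',
              array[http_header('Authorization', 'Bearer eyJ0eXAiOiJKV')],
              null,
              null
          )::http_request) as response;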

    I'm stuffing the g1 value from generate_series() into the http_get() call to prevent the db from evaluating the call only once and re-using the result for each of the reports you want to generate in a refill batch. The dependency communicates that you want to fetch a new report that many times.

    Your API might offer a way to request multiple reports in one call, in which case you'd skip generate_series(), pass restock_batch_size directly into the request, then split the response into separate rows with jsonb_array_elements_text().
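
    For instance, if the API accepted a count parameter and returned a JSON array under a reports key (both hypothetical), the body of fetch_some_more_reports() could shrink to something like:

    insert into report (region, generated_text)
    select p_region, report_text
    from http_get('https://generative_ai_api.com/reports/please',
                  jsonb_build_object('access_token', 'eyJ0eXAiOiJKV',
                                     'count', (select restock_batch_size
                                               from report_stock_parameters
                                               where region=p_region))) as response,
         jsonb_array_elements_text((response."content")::jsonb -> 'reports') as report_text;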


    1. Supabase also supports the pg_cron extension, which you can use to set up recurring jobs, also right in the db. A trigger sounds better because it fires any time someone orders a report and immediately fetches reports from the API to re-stock. A recurring, time-based pg_cron job would run the risk that sometimes you'd run out and have to wait until it checks in to re-stock according to the schedule.
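
      For reference, scheduling such a job is a one-liner; this hypothetical example tops up one region every night at 02:00, ignoring the per-user criterion:

      select cron.schedule('restock-europe-nightly', '0 2 * * *',
                           $$call fetch_some_more_reports('Europe')$$);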

    2. If it were your own, plain PostgreSQL, you could use a Steampipe plugin for the API you're getting the reports from, which hides away all the details of the API-calling logic, exposes the API as a table, and translates your simple selects from it into API calls, then translates the responses into rows as if they came from a table.

      There are existing, off-the-shelf Steampipe plugins for generative AI (CohereAI, OpenAI) you can use:

      select completion
      from openai_completion
      where prompt = 'Write a tagline for an ice cream shop.';
      
      completion
      1. Cool down with a scoop of our delicious ice cream!
      2. Indulge your cravings with creamy goodness!
      3. Sweeten your day with a scoop of our tasty treats!

      If you're using some other service, you'd have to build your own Steampipe wrapper for it.

      In Supabase you can't add full-sized Steampipe, but you can have Steampipe running elsewhere and link its API-caller-disguised-as-a-table through postgres_fdw as a foreign table. This would give you a completely declarative, pure-SQL, in-db solution, offloading API handling to Steampipe, with the caveat of requiring two databases: a pg_cron schedule or trigger-based logic on Supabase dictates when to run a refill, which it does by simply running a select against the other, linked db, where Steampipe translates that into an API call.
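
      A minimal postgres_fdw sketch of that link, assuming Steampipe runs on another host and exposes the OpenAI plugin in a schema named openai (host, port, dbname and credentials below are Steampipe's defaults or placeholders):

      create extension if not exists postgres_fdw;

      create server steampipe
          foreign data wrapper postgres_fdw
          options (host 'steampipe.example.internal', port '9193', dbname 'steampipe');

      create user mapping for current_user
          server steampipe
          options (user 'steampipe', password '...');

      import foreign schema openai
          from server steampipe into public;

      -- from here on, this select is an API call made by the remote Steampipe:
      -- select completion from openai_completion where prompt = '...';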

    3. Supabase also has its own wrappers framework, which offers an approach similar to Steampipe. It has only 12 readily available options and none for generative AI (compared to Steampipe's 140, two of which are), but it's native, so it might be a better choice if you were to build your own API-wrapping extension. There might also be a way to build a Steampipe Postgres FDW and add that to Supabase, but it seems Supabase only welcomes plain SQL or PL/pgSQL extensions from outside their pre-configured list, and the Steampipe-generated FDW is more complex than that.

    4. It could be a good idea to collect stats on who needs a refill and how often, and increase the parameter dictating how far ahead of demand you want to stay. Depending on how the API works, the traffic you're taking and your storage/db config, you might want to let the trigger categorise users into green, yellow and red groups, and not refill anyone's reports when they only drop from green to yellow, but as soon as at least one goes red, run a single bulk refill for everyone in either yellow or red. One large request and one large insert every now and then, replacing a continuous stream of tiny ones.
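
      A sketch of such a bucketing query, assuming the users table from above and example thresholds (green: at or above min_stock_size unused reports, yellow: some left, red: none):

      select u.uid as "user",
             p.region,
             count(r.id) filter (where ru.id is null) as unused_reports,
             case
               when count(r.id) filter (where ru.id is null) >= p.min_stock_size then 'green'
               when count(r.id) filter (where ru.id is null) > 0                 then 'yellow'
               else 'red'
             end as status
      from users u
      cross join report_stock_parameters p
      left join report r on r.region = p.region
      left join report_user ru on ru.report = r.id
                              and ru."user" = u.uid
      group by u.uid, p.region, p.min_stock_size;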