Search code examples
sqlpostgresqlsupabase-database

Query is slow with SECURITY INVOKER


We have a big table (events) which contains events for a lot of devices. Each row is protected by an RLS check that verifies that the authenticated user is in an organization that owns the specific device.

We now implemented a get_events function that returns the last _limit events for an array of given devices (_device_ids). Unfortunately, that function is really slow, hitting the default timeout of 8s.

-- return the _limit most recent events for each given entry in the `_device_ids` array.
        RETURN QUERY
            SELECT e.device_id, e.type, e.timestamp, e.data
            FROM unnest(_device_ids) AS p(device_id)
            CROSS JOIN LATERAL (
                SELECT e.*
                FROM private_schema.events e
                WHERE e.device_id = p.device_id
                        AND e.timestamp >= _since
                ORDER BY e.timestamp DESC
                LIMIT _limit
            ) AS e
            ORDER BY e.device_id, e.timestamp DESC;

We did some experiments, and found that running this function with SECURITY DEFINER dramatically improves its performance, bringing it back to ~40ms.

At the moment I therefore implemented a work-around where a SECURITY INVOKER function makes sure that the user has the right access, before calling the get_events function which is now marked as SECURITY DEFINER.

Ideally, I would prefer keeping the get_events function as SECURITY INVOKER (just so I can't make any mistakes). Is there a way to make the function faster?

Edit: As suggested by Florian Klein I added 'LEAKPROOF' to my functions, but that didn't change much.

Here is an attempt to get the profile of running the query that is run in the get_events function:

ALTER FUNCTION is_auth_in_org_of_alias LEAKPROOF;
ALTER FUNCTION is_auth_member_of_org LEAKPROOF;
ALTER FUNCTION private_schema.get_events LEAKPROOF;

SET auto_explain.log_nested_statements = true;
SET auto_explain.log_min_duration=0;
EXPLAIN ANALYZE SELECT e.device_id, e.type, e.timestamp, e.data
            FROM unnest(ARRAY['5a52f07f-c2fb-54b9-b611-21e8fcec678b'::uuid]) AS p(device_id)
            CROSS JOIN LATERAL (
                SELECT e.*
                FROM private_schema.events e
                WHERE e.device_id = p.device_id
                ORDER BY e.timestamp DESC
                LIMIT 20
            ) AS e
            ORDER BY e.device_id, e.timestamp DESC;

with output:

Sort  (cost=3841.50..3841.55 rows=20 width=55) (actual time=7475.117..7475.119 rows=20 loops=1)

  Sort Key: e.device_id, e."timestamp" DESC
  Sort Method: quicksort  Memory: 26kB
  ->  Nested Loop  (cost=3840.61..3841.07 rows=20 width=55) (actual time=7474.995..7475.000 rows=20 loops=1)
        ->  Function Scan on unnest p  (cost=0.00..0.01 rows=1 width=16) (actual time=0.022..0.023 rows=1 loops=1)
        ->  Limit  (cost=3840.60..3840.65 rows=20 width=59) (actual time=7474.968..7474.971 rows=20 loops=1)
              ->  Sort  (cost=3840.60..3845.55 rows=1978 width=59) (actual time=7474.967..7474.968 rows=20 loops=1)
                    Sort Key: e."timestamp" DESC
                    Sort Method: top-N heapsort  Memory: 47kB
                    ->  Bitmap Heap Scan on events e  (cost=89.29..3787.97 rows=1978 width=59) (actual time=9.051..7438.728 rows=44280 loops=1)
                          Recheck Cond: (device_id = p.device_id)
                          Filter: is_auth_in_org_of_alias(device_id)
                          Heap Blocks: exact=2036
                          ->  Bitmap Index Scan on events_device_id  (cost=0.00..88.80 rows=5934 width=0) (actual time=5.329..5.329 rows=44280 loops=1)
                                Index Cond: (device_id = p.device_id)
Planning Time: 1.101 ms
Execution Time: 7475.282 ms

Since this might be related, here is the RLS policy:

CREATE FUNCTION is_auth_member_of_org(_organization_id uuid)
  RETURNS boolean
  LANGUAGE sql
  SECURITY DEFINER
  AS $function$
    SELECT EXISTS (
      SELECT 1
      FROM public.roles
      WHERE user_id = auth.uid()
      AND organization_id = _organization_id
    )
  $function$;

CREATE OR REPLACE FUNCTION is_auth_in_org_of_alias(_device_id UUID)
  RETURNS BOOLEAN
  SECURITY DEFINER
  LANGUAGE plpgsql
  AS $$
BEGIN
  RETURN is_auth_member_of_org(
    (SELECT organization_id FROM public.devices WHERE alias = _device_id)
  );
END;
$$;

CREATE POLICY "Authenticated have full access to events table of devices in the orgs they are members in"
  ON toit_artemis.events
  FOR ALL
  TO authenticated
  USING (public.is_auth_in_org_of_alias(device_id))
  WITH CHECK (public.is_auth_in_org_of_alias(device_id));


Solution

  • Looks like you could simplify your policy by inlining the sql directly in the RLS:

    CREATE POLICY "Authenticated have full access to events table of devices in the orgs they are members in"
    ON toit_artemis.events
    FOR ALL
    TO authenticated
    USING (exists(
      select from public.roles r
      join public.device d using (organization_id)
      where r.user_id = auth.uid()
      and d.alias = toit_artemis.events.device_id
      -- maybe more is needed, IDK
    ))
    

    didn't test it as I don't have access to the full schema, but I'd be interested in seeing the plan of this one, where you shouldn't see

       Filter: is_auth_in_org_of_alias(device_id)
    

    in the plan anymore, which might be the reason why it's slow, as it might make 44000 calls to your UDF, but I'm not sure exactly (see https://explain.dalibo.com/plan/8c8d3b99438egccg#plan/node/6)