Search code examples
sqlpostgresqlrepeatdrymaintainability

Avoid repetition of strings in SQL query


I have a SQL query where the list of strings is repeated twice: ('foo', 'bar', 'baz', 'bletch'). What is the most maintainable method to avoid such repetition and make this code more DRY?

I am not optimizing query performance, only maintainability.

select
    *
from
    gizmo_details
where
    _gizmo_name in ('foo', 'bar', 'baz', 'bletch')
    or gizmo_id in
    (select id from gizmos where
        gizmo_name in ('foo', 'bar', 'baz', 'bletch'));

Solution

  • DRY - Don't Repeat Yourself.

    Repeated rows?

    Multiple rows in gizmo_details can match the same input string. Such duplicates are preserved.
    But the same row in gizmo_details can match multiple input strings (if there can be duplicate input), or the same gizmo_details.gizmo_id can qualify multiple times. Such duplicates are removed.

    This may or may not be an issue. It's just something that tends to get overlooked. Several of the solutions posted so far are not true to your original in this respect.

    If there can be dupes in any of the three sets (tables, input), you might want to spell out how to deal with dupes. There are fitting query styles for any possible specification.

    You did not ask for optimizing query performance (which may be possible on top of anything else here), but correctness always comes first.

    Repetition within a single query

    A CTE is one way to keep a single query DRY:

    WITH input(gizmo_name) AS (
       SELECT unnest('{foo, bar, baz, bletch}'::text[])  -- your input as array
       )
    SELECT *
    FROM   gizmo_details
    WHERE  _gizmo_name = ANY (TABLE input)
    OR     gizmo_id IN (SELECT g.id FROM gizmos g JOIN input USING (gizmo_name));
    

    Much of this comes down to the question of defining constants in SQL. See:

    Repetition within the same session

    To repeat this query over and over, prepared statements are one way to keep it DRY:

    PREPARE qr1 (text[]) AS
    SELECT *
    FROM   gizmo_details gd
    WHERE  gd._gizmo_name = ANY ($1)
    OR     gd.gizmo_id IN (SELECT g.id FROM gizmos g WHERE g.gizmo_name = ANY ($1));
    

    Call:

    EXECUTE qr1('{foo, bar, baz, bletch}'::text[]);  -- array input
    

    This (also) prevents repeating the input inside the query.

    Prepared statements are built into many client libraries as standard feature.

    Repetition across sessions

    A server-side function keeps that "DRY". A VARIADIC input parameter is just an optional added feature. See:

    CREATE OR REPLACE FUNCTION f_my_func(VARIADIC _gizmo_names text[])
      RETURNS SETOF gizmo_details
      LANGUAGE sql STABLE PARALLEL SAFE AS
    $func$
    SELECT *
    FROM   gizmo_details gd
    WHERE  gd._gizmo_name = ANY (_gizmo_names)
    OR     gd.gizmo_id IN (SELECT g.id FROM gizmos g WHERE g.gizmo_name = ANY (_gizmo_names));
    $func$
    

    Call:

    SELECT * FROM f_my_func('foo', 'bar', 'baz', 'bletch');  -- list input
    

    Or:

    SELECT * FROM f_my_func(VARIADIC '{foo, bar, baz, bletch}');  -- back to array input
    

    A lot more versatile than prepared statements, plus it can be nested in outer queries (as opposed to a prepared statement).

    Aside: using _ as prefix for variables and/or parameters but never for column names is a common naming convention. Optional, but I stick to it.