Search code examples
sqlpostgresqlfunctionsql-update

Create function with loop to be used in UPDATE


In my Postgres 13 DB I want to update a column mother_tongues_id in my table based on name columns.

Names are like "Le Minh Thien" and I want to check if those contain words like 'le' or 'lee' then set nationality to 'Vietnam'. The words check table is some thing like this:

language noun
vietnam +lee+le+long+la+
chinese +lu+zhu+

Here is my UPDATE code:

UPDATE resume
SET mother_tongues_id = (
    SELECT id FROM mother_tongues
    WHERE check_language(resume.id) = id
    limit 1);
WHERE mother_tongues_id IS NULL;

Here is my check_language() function:

CREATE OR REPLACE FUNCTION check_language(rid integer) RETURNS integer AS
$BODY$
    DECLARE i int = (SELECT (LENGTH(CONCAT(first_name, ' ',last_name)) - LENGTH(replace(CONCAT(first_name, ' ',last_name), ' ', ''))) FROM resume where id = rid);
    DECLARE sname varchar = '';
    DECLARE sid int = 0
    LOOP
        SET i = i - 1;
        SET sname = SELECT split_part(SELECT CONCAT(first_name, ' ',last_name) FROM resume  where id = rid, ' ', i)
        if select id from mother_tongues WHERE noun ILIKE '%+sname+%' then SET sid = id;
        end if;
        IF i = 1 THEN
            EXIT;
        END IF;
    END LOOP;
    if sid > 0 then RETURN sid;
    else RETURN NULL;
    end if;
END;

$BODY$ 
LANGUAGE plpgsql;

I try to create a function in Postgres with loop to split the name and check the words, however always error.


Solution

  • Your function is extremely inefficient, errors aside.
    You certainly don't need a function for this to begin with. A plain UPDATE with a correlated subquery matching with the array overlap operator && does the job:

    UPDATE resume
    SET    mother_tongues_id = (
        SELECT m.id
        FROM   mother_tongues m
        WHERE  string_to_array(lower(concat_ws(' ', r.first_name, r.last_name)), ' ')
           &&  string_to_array(trim(m.noun, '+'), '+')
        LIMIT  1
        )
    WHERE  mother_tongues_id IS NULL;
    

    Of course, this is just putting lipstick on a pig. You should really start by fixing your DB schema. As a minimum, convert mother_tongues.noun to text[], so we don't have to do it on the fly. And add a GIN index on that column. See:

    Or properly normalize the design.

    More fundamentally, deciding nationality based on typical names is a dubious approach. Besides being hardly valid, there can be any number of conflicts, like multiple matches. Currently, LIMIT 1 simply takes an arbitrary match in this case.

    If you really need to loop in a PL/pgSQL function (which you don't in this case), read here: