Search code examples
regexpostgresqlposix

How to create a generalized regex (POSIX) in PostgreSQL?


In postgreSQL (9.5), PgAdmin III, I would like to generalize this POSIX statement for two words:

This works for the words 'new' and 'intermediate' with word boundaries:

select * from cpt where cdesc ~* '^(?=.*\mnew\M)(?=.*\mintermediate\M)'

This fails ( the "where" argument is seen as a text string):

select * from cpt where cdesc ~* '^(?=.*\m'||'new'||'\M)(?=.*\mintermediate\M)'

How can this be written for a generlized function, e.g.:

CREATE OR REPLACE FUNCTION getDesc(string1 text, string2 text)
  RETURNS SETOF cpt AS
$BODY$ 

select * from cpt where cdesc ~* '^(?=.*\m$1\M)(?=.*\m$2\M)'

$BODY$
  LANGUAGE sql VOLATILE;

(where $1 is string1 and $2 is string2)

TIA

Edit. Match stings in cdesc would be:

  • "This is a new and intermediate art work"
  • "This is an intermediate and new piece of art"

Non-match would be:

  • "This is new art"
  • "This is intermediate art"

Please note the order of the words is not important as long as both are present. Also, either word may have a punctuation mark -- (comma or period)--immediately following the word (no space).


Solution

  • My first suggestion would be to split the expensive regex into two SQL WHERE clauses and:

    • matching with LIKE, as it is much faster, you can filter in code for more specific matches,
    • or matching with a simple regex, something like '\m$1[\M,.]'

    As for the regex you are using:

    • I have not used it in a while, but I think you need parenthesis for string concatination

    ~* ( '^(?=.*\m' || 'new' || '\M)(?=.*\mintermediate\M)' )