Search code examples
sqlregexpostgresqlpattern-matchingpostgresql-8.4

String matching in PostgreSQL


I need to implement a regular expression (as I understand) matching in PostgreSQL 8.4. It seems regular expression matching are only available in 9.0+.

My need is:

When I give an input 14.1 I need to get these results:

14.1.1
14.1.2
14.1.Z
...

But exclude:

14.1.1.1
14.1.1.K
14.1.Z.3.A
...

The pattern is not limited to a single character. There is always a possibility that a pattern like this will be presented: 14.1.1.2K, 14.1.Z.13.A2 etc., because the pattern is provided the user. The application has no control over the pattern (it's not a version number).

Any idea how to implement this in Postgres 8.4?

After one more question my issue was solved:

Escaping a LIKE pattern or regexp string in Postgres 8.4 inside a stored procedure


Solution

  • Regular expression matching has been in Postgres practically for ever, at least since version 7.1. Use the these operators:

    ~ !~ ~* !~*
    

    For an overview, see:

    The point in your case seems to be to disallow more dots:

    SELECT *
    FROM   tbl
    WHERE  version LIKE '14.1.%'        -- for performance
    AND    version ~ '^14\.1\.[^.]+$';  -- for correct result
    

    db<>fiddle here
    Old sqlfiddle

    The LIKE expression is redundant, but it is going to improve performance dramatically, even without index. You should have an index, of course.

    The LIKE expression can use a basic text_pattern_ops index, while the regular expression cannot, at least in Postgres 8.4.
    Or with COLLATE "C" since Postgres 9.1. See:

    [^.] in the regex pattern is a character class that excludes the dot (.). So more characters are allowed, just no more dots.

    Performance

    To squeeze out top performance for this particular query you could add a specialized index:

    CREATE INDEX tbl_special_idx ON tbl
    ((length(version) - length(replace(version, '.', ''))), version text_pattern_ops);
    

    And use a matching query, the same as above, just replace the last line with:

    AND   length(version) - length(replace(version, '.', '')) = 2
    

    db<>fiddle here
    Old sqlfiddle