Search code examples
sqlpostgresqlexecution

Postgres Select execution order incorrect


The following query does not work in Postgres 9.4.5.

SELECT * FROM (
   SELECT M.NAME, M.VALUE AS V
     FROM METRICS AS M, METRICATTRIBUTES AS A
    WHERE M.NAME=A.NAME AND A.ISSTRING='FALSE'
    ) AS S1
 WHERE CAST(S1.V AS NUMERIC)<0

I get an error like:

invalid input syntax for type numeric: "astringvalue"

Read on to see why I made query this overly complicated.

METRICS is a table of metric, value pairs. The values are stored as strings and some of the values of the VALUE field are, in fact strings. The METRICATTRIBUTES table identifies those metric names which may have string values. I populated the METRICATTRIBUTES table from an analysis of the METRICS table.

To check, I ran...

SELECT * FROM (
    SELECT M.NAME, M.VALUE AS V
      FROM METRICS AS M, METRICATTRIBUTES AS A
     WHERE M.NAME=A.NAME AND A.ISSTRING='FALSE'
    ) AS S1
 WHERE S1.V LIKE 'a%'

This returns no values (like I would expect). The error seems to be in the execution plan. Which looks something like this (sorry, I had to fat finger this)

1 -> HAS JOIN 
2    HASH COND: ((M.NAME::TEXT=(A.NAME)::TEXT))
3       SEQ SCAN ON METRICS M
4       FILTER: ((VALUE)::NUMERIC<0::NUMERIC)
5   -> HASH 
6     -> Seq Scan on METRICATTRIBUTES A
7        Filter: (NOT ISSTRING)

I am not an expert on this (only 1 week of Postgres experience) but it looks like Postgres is trying to apply the cast (line 4) before it applies the join condition (line 2). By doing this, it will try to apply the cast to invalid string values which is precisely what I am trying to avoid!

Writing this with an explicit join did not make any difference. Writing it as a single select statement was my first attempt, never expecting this type of problem. That also did not work.

Any ideas?


Solution

  • As you can see from your plan, table METRICS is being scanned in full (Seq Scan) and filtered with your condition: CAST(S1.V AS NUMERIC)<0—join does not limits the scope at all.

    Obviously, you have some rows, that contain non-numeric data in the METRICS.VALUE. Check your table for such rows like this:

    SELECT * FROM METRICS
     WHERE NOT VALUE ~ '^([0-9].,e)*$'
    

    Note, that it is difficult to catch all possible combinations with regular expression, therefore check out this related question: isnumeric() with PostgreSQL

    Name VALUE for the column is not good, as this word is a reserved one.

    Edit: If you're absolutely sure, that joined tables will produce wanted VALUE-s, than you can use CTEs, which have optimization fence feature in PostgreSQL:

    WITH S1 AS (
        SELECT M.NAME, M.VALUE AS V
          FROM METRICS AS M
          JOIN METRICATTRIBUTES AS A USING (NAME)
         WHERE A.ISSTRING='FALSE'
    )
    SELECT *
      FROM S1
     WHERE CAST(S1.V AS NUMERIC)<0;