Search code examples
postgresqlinner-jointimescaledb

postgresql - Inner JOIN is not working correctly


I have tables that look like this:

enter image description here

I'm trying to INNER JOIN these two tables, so that I will get something like this:

        time         | block_height | differential_pressure |
---------------------+--------------+-----------------------+
 2018-09-08 11:14:10 |         83.7 |                286.84 |
 2018-09-08 11:14:10 |         83.6 |                282.14 |
 2018-09-08 11:14:11 |         83.4 |                298.35 |
 2018-09-08 11:14:12 |         83.1 |                298.23 |
 2018-09-08 11:14:12 |         82.9 |                294.76 |
 2018-09-08 11:14:13 |         82.7 |                288.37 |

But when I run the following query:

SELECT * FROM rt_block_height 
INNER JOIN rt_differential_pressure 
ON rt_block_height.time = rt_differential_pressure.time;

This is what I get:

enter image description here

I do not understand what's going on here. It seems that some random additional rows were added, but I have no clue why its happening. There are only 6 rows in the original tables, but the queried table returns 10.

I don't know if this info would help, but this is a TimescaleDB Hypertable. And here is the source code for table creation:

CREATE TABLE IF NOT EXISTS public.rt_BLOCK_HEIGHT
(
"time" timestamp without time zone,
BLOCK_HEIGHT double precision
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public.rt_BLOCK_HEIGHT
OWNER to postgres;
SELECT create_hypertable('rt_BLOCK_HEIGHT', 'time');

Solution

  • Your time column is not unique.

    For 2018-09-08 11:14:10 timestamp you have:

    block_heightA = 83.7
    block_heightB = 83.6
    differential_pressureA = 286.84
    differential_pressureB = 282.14
    

    So when you do a join you'll get a Cartesian product of two 2-element sets:

    2018-09-08 11:14:10 block_heightA differential_pressureA
    2018-09-08 11:14:10 block_heightA differential_pressureB
    2018-09-08 11:14:10 block_heightB differential_pressureA
    2018-09-08 11:14:10 block_heightB differential_pressureB
    

    To get a result you want you have to decide what to do with duplicate values for each timestamp. For example you can calculate average value:

    SELECT
      grouped_block_height.time,
      avg_block_height,
      avg_differential_pressure
    FROM (
      SELECT time, avg(block_height) as avg_block_height
      FROM rt_block_height
      GROUP BY time
    ) as grouped_block_height 
    INNER JOIN (
      SELECT time, avg(differential_pressure) as avg_differential_pressure
      FROM rt_differential_pressure
      GROUP BY time
    ) as grouped_differential_pressure
    ON grouped_block_height.time = grouped_differential_pressure.time;