Search code examples
sqlpostgresqlinner-join

Returning row with max value on INNER JOIN


I'm not the strongest in SQL but I've been managing..... I have a situation where one of my INNER JOINS is returning multiple rows and I'm trying to find a way to grab the row with the highest value using the column called "certainty".

The statement involves two tables, dim_asset and dim_asset_operating_system. These tables are linked by a common field called asset_id and a field called os_description on dim_asset and description on dim_asset_operating_system.

Code:

SELECT
    A.ip_address,
    A.os_type,
    A.os_vendor,
    A.os_family,
    A.os_name,
    A.os_version,
    A.os_description,
    A.os_system,
    B.certainty
FROM 
    dim_asset A
INNER JOIN 
    dim_asset_operating_system B ON A.asset_id = B.asset_id 
                                 AND A.os_description = B.description
WHERE 
    ip_address = '192.168.65.100';

Output:

   ip_address   | os_type | os_vendor | os_family | os_name | os_version |   os_description   |  os_system   | certainty
----------------+---------+-----------+-----------+---------+------------+--------------------+--------------+-----------
 192.168.65.100 |         | Ubuntu    | Linux     | Linux   | 16.04      | Ubuntu Linux 16.04 | Ubuntu Linux |         1
 192.168.65.100 |         | Ubuntu    | Linux     | Linux   | 16.04      | Ubuntu Linux 16.04 | Ubuntu Linux |         1
 192.168.65.100 |         | Ubuntu    | Linux     | Linux   | 16.04      | Ubuntu Linux 16.04 | Ubuntu Linux |      0.85
(3 rows)

Ideally... I'd like the following returned:

   ip_address   | os_type | os_vendor | os_family | os_name | os_version |   os_description   |  os_system   | certainty
----------------+---------+-----------+-----------+---------+------------+--------------------+--------------+-----------
 192.168.65.100 |         | Ubuntu    | Linux     | Linux   | 16.04      | Ubuntu Linux 16.04 | Ubuntu Linux |         1

In my sample above, this is just for one record... ideally I need to scale this to the 10's of millions of returned rows.

Thank you for any help you can provide.


Solution

  • If you wish to get the highest certainty you just need to ORDER BY it and constraint the result set to FETCH FIST ROW ONLY:

    SELECT * FROM dim_asset A
    INNER JOIN dim_asset_operating_system B ON A.asset_id = B.asset_id AND
                                               A.os_description = B.description
    WHERE ip_address = '192.168.65.100'
    ORDER BY certainty DESC
    FETCH FIRST ROW ONLY;
    

    However, if multiple records share the same highest certainty (a tie), this query you will return just one of them, which in many use cases isn't a desirable behaviour. If you're interested in all records that share the highest value use FETCH FIRST ROW WITH TIES (Introduced in PostgreSQL 13):

    SELECT * FROM dim_asset A
    INNER JOIN dim_asset_operating_system B ON A.asset_id = B.asset_id AND
                                               A.os_description = B.description
    WHERE ip_address = '192.168.65.100'
    ORDER BY certainty DESC
    FETCH FIRST ROW WITH TIES;
    

    Demo: db<>fiddle