I'm not the strongest in SQL but I've been managing..... I have a situation where one of my INNER JOINS is returning multiple rows and I'm trying to find a way to grab the row with the highest value using the column called "certainty".
The statement involves two tables, dim_asset and dim_asset_operating_system. These tables are linked by a common field called asset_id and a field called os_description on dim_asset and description on dim_asset_operating_system.
Code:
SELECT
A.ip_address,
A.os_type,
A.os_vendor,
A.os_family,
A.os_name,
A.os_version,
A.os_description,
A.os_system,
B.certainty
FROM
dim_asset A
INNER JOIN
dim_asset_operating_system B ON A.asset_id = B.asset_id
AND A.os_description = B.description
WHERE
ip_address = '192.168.65.100';
Output:
ip_address | os_type | os_vendor | os_family | os_name | os_version | os_description | os_system | certainty
----------------+---------+-----------+-----------+---------+------------+--------------------+--------------+-----------
192.168.65.100 | | Ubuntu | Linux | Linux | 16.04 | Ubuntu Linux 16.04 | Ubuntu Linux | 1
192.168.65.100 | | Ubuntu | Linux | Linux | 16.04 | Ubuntu Linux 16.04 | Ubuntu Linux | 1
192.168.65.100 | | Ubuntu | Linux | Linux | 16.04 | Ubuntu Linux 16.04 | Ubuntu Linux | 0.85
(3 rows)
Ideally... I'd like the following returned:
ip_address | os_type | os_vendor | os_family | os_name | os_version | os_description | os_system | certainty
----------------+---------+-----------+-----------+---------+------------+--------------------+--------------+-----------
192.168.65.100 | | Ubuntu | Linux | Linux | 16.04 | Ubuntu Linux 16.04 | Ubuntu Linux | 1
In my sample above, this is just for one record... ideally I need to scale this to the 10's of millions of returned rows.
Thank you for any help you can provide.
If you wish to get the highest certainty you just need to ORDER BY
it and constraint the result set to FETCH FIST ROW ONLY
:
SELECT * FROM dim_asset A
INNER JOIN dim_asset_operating_system B ON A.asset_id = B.asset_id AND
A.os_description = B.description
WHERE ip_address = '192.168.65.100'
ORDER BY certainty DESC
FETCH FIRST ROW ONLY;
However, if multiple records share the same highest certainty (a tie), this query you will return just one of them, which in many use cases isn't a desirable behaviour. If you're interested in all records that share the highest value use FETCH FIRST ROW WITH TIES
(Introduced in PostgreSQL 13):
SELECT * FROM dim_asset A
INNER JOIN dim_asset_operating_system B ON A.asset_id = B.asset_id AND
A.os_description = B.description
WHERE ip_address = '192.168.65.100'
ORDER BY certainty DESC
FETCH FIRST ROW WITH TIES;
Demo: db<>fiddle