I am trying to develop a query to identify clients with multiple client IDs. Client ID is column 1, columns 19 and 20 hold unique person identifiers, you can think of them as some kind of Social Security number(lets call them SSN.19 and SSN.20)
My first thought was to look for every row with matching SSNs but different client ID, like so:
SELECT
a."5", a."3"||' '||a."4" as "3+4", a."19", a."20", a."21", a."1",
b."1", a."8"
FROM
"clients_1" AS a,
"clients_1" AS b
WHERE a."19"=b."19" and a."20"=b."20" and a."1"<b."1" and a."1"='Value';
However, it returned 0 rows. To check if the table had indeed no duplicates I performed the following queries:
select distinct "19" as hk, count("19") as dl from "clients_1" group by "19" order by dl desc;
select distinct "20" as hk, count("20") as dl from "clients_1" group by "20" order by dl desc;
It turned out that on this specific table no client had an SSN19 associated with them, but there were several repeated SSN20 in the table. So I performed the following query to find clients with multiple IDs:
SELECT
a."5", a."3"||' '||a."4" as "3+4", a."20", a."21", a."1",
b."1", a."8"
FROM
"clients_1" AS a,
"clients_1" AS b
WHERE a."20"=b."20" and a."1"<b."1" and a."7"='Value';
This one returned a table with several clients that had different IDs but the same SSN20. Afterwards I began to think of a way I could generalize this query for cases where clients had both SSN19 and SSN20 or just one of them, so I thought of the following:
SELECT
a."5", a."3"||' '||a."4" as "3+4", a."19", a."20", a."21", a."1",
b."1", a."8"
FROM
"clients_1" AS a,
"clients_1" AS b
WHERE ((a."19"=b."19" and a."19" is not null) or (a."20"=b."20" and a."20" is not null)) and a."1"<b."1" and a."7"='Value';
However, this query takes forever, I had the query run for around 20min and nothing came back, whereas the previous attempt took around 2min at the most. What am I doing wrong?
I believe something like this will be better performing and give you more flexibility:
SELECT
*
FROM
(
SELECT
COUNT(*) OVER (PARTITION BY "19") as 19_matches,
COUNT(*) OVER (PARTITION BY "20") as 20_matches,
COUNT(*) OVER (PARTITION BY "19","20") as both_matches,
clients_1.*
FROM
clients_1
WHERE "7" = 'value'
)
WHERE 19_matches > 1 OR 20_matches > 1 or both_matches > 1
ORDER BY "19","20"