Search code examples
sqlpostgresqlgroup-bysql-updatelimit

Postgresql query: update status of limit number of records based on group size


I have a postgresql table contains a list of email addresses. The table has three columns, Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, and yahoo.com.ca etc.), and Valid (boolean).

Now, I want to group those emails by EmailServer and then update the first 3 records of each large group (count >=6) as Valid = true while leaving the rest of each group as Valid = false.

I failed to get the wanted output by below query:

UPDATE public."EmailContacts"   
SET "Valid"=true
WHERE "EmailServer" IN (
    SELECT "EmailServer"
    FROM public."EmailContacts"
    GROUP by "EmailServer"
    HAVING count(*) >=6
    LIMIT 5)

Please help to modify so as to get the expected results. Would be greatly appreciated for any kind of your help!


Solution

  • WITH major_servers AS (
        SELECT email_server
        FROM email_address
        GROUP by email_server
        HAVING count(*) >=6
    ),
    enumerated_emails AS (
        SELECT email,
               email_server,
               row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
        FROM email_address
        WHERE email_server IN (SELECT email_server FROM major_servers)
    )
    UPDATE email_address
    SET valid = true
    WHERE email IN (SELECT email
                    FROM enumerated_emails ee
                    WHERE ee.row_number <= 3);
    
    • The first query major_servers finds major groups where more than 5 email servers exist.
    • The second query enumerated_emails enumerates emails by their natural order (see a TODO comment, I think you should choose another ORDER BY criteria) which belong to major groups using window function row_number().
    • The last query updates the first 3 rows in each major server group.

    Find the sql-fiddle here.