Search code examples
sql-serversql-server-2005

How do I prevent duplicates from a SQL join?


I have the following tables:

customer:

id customer_number company firstname lastname account_manager_email email comments terms tax_id_number lead_source default_catalog credit_limit
99453 C00123456 Serenity Inc. Malcom Reynolds [email protected] [email protected] The cap'n 1 NULL NULL 12345 NULL
99468 C00123456 Serenity Inc. Zoe Washburne [email protected] [email protected] NULL 1 NULL NULL NULL NULL
99960 C00123456 Serenity Inc. Hoban Washburne [email protected] [email protected] NULL 1 NULL NULL NULL NULL
100088 C00123456 Serenity Inc. Inara Serra [email protected] [email protected] NULL 1 NULL NULL 12345 NULL

customer_address:

id company street city state_abbreviation postcode telephone firstname lastname created_at
133996 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Malcom Reynolds 2017-05-08 12:45:53.000
134452 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-09 10:19:59.000
134961 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF 987654321 2017-05-09 10:19:59.000
134962 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF 192837465 2017-05-09 10:19:59.000
133995 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-09 10:19:59.000
133669 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF 123456789 2017-05-18 10:29:42.000
133667 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-18 07:56:45.000
133666 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-05-31 07:56:46.000
133626 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 Accounts Payable 2017-06-16 12:45:08.000
133668 Serenity, Inc 123 Any St. Anytown AX 12345 123-456-7890 REF PO 2017-06-16 12:45:08.000

I'm running this query to generate a CSV that I can use to import the data into another system:

     SELECT '"' + CAST(c.customer_number AS VARCHAR) + '"' AS 'Customer Number',
            '"' + CAST(c.company AS VARCHAR) + '"' AS 'Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Company Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Company City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Company State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Company Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Company Phone',
            '"' + CAST(c.firstname AS VARCHAR) + '"' AS 'Contact First Name',
            '"' + CAST(c.lastname AS VARCHAR) + '"' AS 'Contact Last Name',
            '"' + CAST(c.account_manager_email AS VARCHAR) + '"' AS 'Account Manager Email',
            '"' + CAST(a.company AS VARCHAR) + '"' AS 'Contact Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Contact Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Contact City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Contact State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Contact Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Contact Phone',
            '"' + CAST(c.email AS VARCHAR) + '"' AS 'Contact Email',
            '"' + CAST(c.comments AS VARCHAR) + '"' AS 'Internal Notes',
            '"' + CAST(c.terms AS VARCHAR) + '"' AS 'Terms',
            '"' + CAST(c.tax_id_number AS VARCHAR) + '"' AS 'Tax ID (US)',
            '"' + CAST(c.lead_source AS VARCHAR) + '"' AS 'Lead Source',
            '"' + CAST(c.default_catalog AS VARCHAR) + '"' AS 'Catalog',
            '"' + CAST(c.credit_limit AS VARCHAR) + '"' AS 'Credit Limit'
       FROM customer c,
            customer_address a
      WHERE c.customer_number = 'C00123456'
        AND a.company = c.company
   ORDER BY c.customer_number,
            c.created_at;

However, when I run that query I get 40 rows back, ten for each entry in customer. I've tried different join types but the results are the same.

A lot of this is legacy data and so the only thing that I appear to reliably be able to join on is the company name ("Serenity, Inc.")

I actually need two versions of this output. The first is a single row for each company containing the entry with the oldest created_at value from the customer table. The second are all the other records.

Note: This is on SQL Server 2005 (I know... an upgrade is planned but I have to get through this first)


Solution

  • To remove your duplicates you need to enumerate each row and assign a value based on an ordering criteria.

    You could do this easily using a cte - I believe they are available in SQL Server 2005, I am of course unable to check.

    with c as(
      select *, Row_Number() over(partition by customer_number order by id) rn
      from customer
    ),
    ca as (
      select *, Row_Number() over(partition by company order by created_at) rn
      from customer_address
    )
    select <columns>
    from c join ca on c.company=ca.company
    where c.rn=1 and ca.rn=1 and c.customer_number='C00123456'