Search code examples
sqlsql-servercursor

Cursor(?) in SQL: Iterate through table in WHERE-Statement


Currently I have a problem with SQL. I am working with SQL Server 2014.

Some background: I uploaded lots of weather data from .csv files into a table named dbo.import. Because the data is not adjusted/corrected so I chose mostly "text" as data type.

I have many duplicate rows identified and I wrote them in a new table named dbo.duplikate

INSERT INTO dbo.duplikate 
    SELECT 
       airportCode, CAST(DateUTC AS VARCHAR(25)), Count(*) 
   FROM 
       dbo.import
   GROUP BY 
       airportCode, CAST(DateUTC AS VARCHAR(25))
   HAVING 
       Count(*) > 1 

(Maybe there is a way available how I can write all rows in the new table through a similar statement so I don't need a cursor(?)?)

Now I try to write all duplicates, with all rows (in total 16), in a new table.

My idea was iterate through my table dbo.duplikate row for row and compare this in my WHERE clause.

Something like this (pseudo-code):

INSERT INTO dbo.newTable
   SELECT * 
   FROM dbo.import
   WHERE dbo.import.DateUTC LIKE dbo.duplikate.DateUTC[i] 
     AND dbo.import.airportCode LIKE dbo.duplikate.airportCode[i]

Through google I find cursors. But I am not sure if this is the right way. Every try failed because I don't know how to assign the rows to my WHERE clause ...

Regards, Julian


Solution

  • There's nothing in your question that makes me believe you need a cursor. Cursors can be very useful, but performance is almost always better when you perform actions on sets of data rather than looping.

    In SQL Server you can use a ROW_NUMBER() or a windowed COUNT() to identify duplicates without losing the detail as you do in your query:

    ;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25)) ORDER BY airportCode) RN
                          ,COUNT(*) OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25))) Dup_CT
                  FROM dbo.import)
    SELECT *
    FROM cte
    

    From there you can add in WHERE criteria to suit your needs, for example if you just want to put all records that have a duplicate into a new table you'd use WHERE Dup_CT > 1. You can also DELETE from the cte if you want to remove duplicate records:

    ;with cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25)) ORDER BY airportCode) RN
                          ,COUNT(*) OVER(PARTITION BY airportCode, CAST(DateUTC AS VARCHAR(25))) Dup_CT
                  FROM dbo.import)
    DELETE FROM cte
    WHERE RN > 1
    

    You can adjust the ORDER BY clause in the ROW_NUMBER() function to specify which record you'll keep if performing the above DELETE.