Search code examples
sqlsql-serversql-server-2008-r2greatest-n-per-group

How do I get a record from the most recent archive table that an ID occurs in


I have linked databases from MAS (accounting system) in SQL Server 2008 R2. The way payroll records work in this database is that each quarter, an archive of the payroll tables are made, and a new current quarter table is made with any terminated employees removed. It does have unique employee IDs. The structure is identical in the archive and current versions.

So, in the 2nd quarter of the year, an employee who was terminated in the 1st quarter, is only in the 1st quarter copy of the database. An employee who worked in both the 1st and 2nd quarters is in the 1st quarter copy and the 2nd quarter copy. An employee who was hired in the 2nd quarter only occurs in the 2nd quarter.

I need to be able to build a set (temp table, view, whatever) to work with that includes the most recent record for each employee. This set will be the basis for queries like turnover, average wage by position, etc.

I can build a list of IDs quickly enough with a UNION of the EmployeeID column, but other columns change over time (pay rate, address, etc), so I cannot simply UNION all of the columns, as I will have duplicates for some employees.

Is there a reasonably efficient way to merge the historical databases together, keeping only the most recent row for each employee? Having phrased the question that way, should I just make a series of MERGEs, stepping back through the archive copies from newest to oldest, inserting when not matched? Is there a better way?


Solution

  • There are several ways you can approach this problem. Here is one:

    • Create a view to merge the tables, adding an appropriate archive date column to each
    • Create a sub query against the view to Group on EmployeeID and get the MAX archive date
    • Use the sub query to join with the view and get the latest record for each employee

    Create a sample table schema and test data

        SET NOCOUNT ON
    
        IF OBJECT_ID('Employee', 'U') IS NOT NULL DROP TABLE Employee;
        GO
        IF OBJECT_ID('Employee2012Q1', 'U') IS NOT NULL DROP TABLE Employee2012Q1;
        GO
        IF OBJECT_ID('Employee2012Q2', 'U') IS NOT NULL DROP TABLE Employee2012Q2;
        GO
    
        CREATE TABLE dbo.Employee (
            EmployeeID      int IDENTITY(1,1)   NOT NULL    PRIMARY KEY
           ,EmployeeName    varchar(50)         NOT NULL
           ,StartDate       smalldatetime       NOT NULL
           ,EndDate         smalldatetime       NULL
           ,Salary          decimal(18,2)       NOT NULL
           ,Position        varchar(50)         NOT NULL
           ,HomeAddress     varchar(200)        NOT NULL
        )
    
        -- Employees hired
        INSERT dbo.Employee SELECT 'John Doe', '01/01/2012', NULL, 50000, 'Developer', '12345 Main St, New York, NY'
        INSERT dbo.Employee SELECT 'Jane Doe', '01/01/2012', NULL, 52000, 'Developer', '837 1st St, New York, NY'
    
        -- Employee fired
        UPDATE A
           SET A.EndDate = '02/01/2012'
          FROM dbo.Employee A
         WHERE A.EmployeeName = 'John Doe'
    
        -- Table archived
        SELECT * INTO dbo.Employee2012Q1 FROM dbo.Employee
    
        --  Remove fired employees
        DELETE dbo.Employee WHERE EndDate IS NOT NULL
    
        -- Employee hired
        INSERT dbo.Employee SELECT 'Jack Flash', '04/01/2012', NULL, 73000, 'Manager', '9580 21st St, New York, NY'
    
        -- Employee fired
        UPDATE A
           SET A.EndDate = '05/01/2012'
          FROM dbo.Employee A
         WHERE A.EmployeeName = 'Jack Flash'
    
        -- Table archived
        SELECT * INTO dbo.Employee2012Q2 FROM dbo.Employee
    
        --  Remove fired employees
        DELETE dbo.Employee WHERE EndDate IS NOT NULL
    
        SET NOCOUNT OFF
        GO
    

    Create a view that merges all the employee tables

        IF OBJECT_ID('EmployeeArchive', 'V') IS NOT NULL DROP VIEW dbo.EmployeeArchive
        GO
        CREATE VIEW dbo.EmployeeArchive
        AS
        SELECT CONVERT(smalldatetime, CONVERT(char(10), GETDATE(), 101)) as [ArchiveDate]
              ,EmployeeID
              ,EmployeeName
              ,StartDate
              ,EndDate
              ,Salary
              ,Position
              ,HomeAddress
         FROM dbo.Employee 
        UNION ALL
        SELECT CONVERT(smalldatetime, '03/31/2012') as [ArchiveDate]
              ,EmployeeID
              ,EmployeeName
              ,StartDate
              ,EndDate
              ,Salary
              ,Position
              ,HomeAddress
          FROM dbo.Employee2012Q1 
        UNION ALL
        SELECT CONVERT(smalldatetime, '06/30/2012') as [ArchiveDate]
              ,EmployeeID
              ,EmployeeName
              ,StartDate
              ,EndDate
              ,Salary
              ,Position
              ,HomeAddress
          FROM dbo.Employee2012Q2 
        GO
    

    Get the latest info for each employee

        SELECT A.*
          FROM dbo.EmployeeArchive A
          JOIN (
                SELECT EmployeeID, MAX(ArchiveDate) as MaxArchiveDate
                  FROM dbo.EmployeeArchive
                 GROUP BY EmployeeID
               ) B
            ON A.EmployeeID  = B.EmployeeID
           AND A.ArchiveDate = B.MaxArchiveDate