I have linked databases from MAS (accounting system) in SQL Server 2008 R2. The way payroll records work in this database is that each quarter, an archive of the payroll tables are made, and a new current quarter table is made with any terminated employees removed. It does have unique employee IDs. The structure is identical in the archive and current versions.
So, in the 2nd quarter of the year, an employee who was terminated in the 1st quarter, is only in the 1st quarter copy of the database. An employee who worked in both the 1st and 2nd quarters is in the 1st quarter copy and the 2nd quarter copy. An employee who was hired in the 2nd quarter only occurs in the 2nd quarter.
I need to be able to build a set (temp table, view, whatever) to work with that includes the most recent record for each employee. This set will be the basis for queries like turnover, average wage by position, etc.
I can build a list of IDs quickly enough with a UNION
of the EmployeeID
column, but other columns change over time (pay rate, address, etc), so I cannot simply UNION
all of the columns, as I will have duplicates for some employees.
Is there a reasonably efficient way to merge the historical databases together, keeping only the most recent row for each employee? Having phrased the question that way, should I just make a series of MERGE
s, stepping back through the archive copies from newest to oldest, inserting when not matched? Is there a better way?
There are several ways you can approach this problem. Here is one:
Create a sample table schema and test data
SET NOCOUNT ON
IF OBJECT_ID('Employee', 'U') IS NOT NULL DROP TABLE Employee;
GO
IF OBJECT_ID('Employee2012Q1', 'U') IS NOT NULL DROP TABLE Employee2012Q1;
GO
IF OBJECT_ID('Employee2012Q2', 'U') IS NOT NULL DROP TABLE Employee2012Q2;
GO
CREATE TABLE dbo.Employee (
EmployeeID int IDENTITY(1,1) NOT NULL PRIMARY KEY
,EmployeeName varchar(50) NOT NULL
,StartDate smalldatetime NOT NULL
,EndDate smalldatetime NULL
,Salary decimal(18,2) NOT NULL
,Position varchar(50) NOT NULL
,HomeAddress varchar(200) NOT NULL
)
-- Employees hired
INSERT dbo.Employee SELECT 'John Doe', '01/01/2012', NULL, 50000, 'Developer', '12345 Main St, New York, NY'
INSERT dbo.Employee SELECT 'Jane Doe', '01/01/2012', NULL, 52000, 'Developer', '837 1st St, New York, NY'
-- Employee fired
UPDATE A
SET A.EndDate = '02/01/2012'
FROM dbo.Employee A
WHERE A.EmployeeName = 'John Doe'
-- Table archived
SELECT * INTO dbo.Employee2012Q1 FROM dbo.Employee
-- Remove fired employees
DELETE dbo.Employee WHERE EndDate IS NOT NULL
-- Employee hired
INSERT dbo.Employee SELECT 'Jack Flash', '04/01/2012', NULL, 73000, 'Manager', '9580 21st St, New York, NY'
-- Employee fired
UPDATE A
SET A.EndDate = '05/01/2012'
FROM dbo.Employee A
WHERE A.EmployeeName = 'Jack Flash'
-- Table archived
SELECT * INTO dbo.Employee2012Q2 FROM dbo.Employee
-- Remove fired employees
DELETE dbo.Employee WHERE EndDate IS NOT NULL
SET NOCOUNT OFF
GO
Create a view that merges all the employee tables
IF OBJECT_ID('EmployeeArchive', 'V') IS NOT NULL DROP VIEW dbo.EmployeeArchive
GO
CREATE VIEW dbo.EmployeeArchive
AS
SELECT CONVERT(smalldatetime, CONVERT(char(10), GETDATE(), 101)) as [ArchiveDate]
,EmployeeID
,EmployeeName
,StartDate
,EndDate
,Salary
,Position
,HomeAddress
FROM dbo.Employee
UNION ALL
SELECT CONVERT(smalldatetime, '03/31/2012') as [ArchiveDate]
,EmployeeID
,EmployeeName
,StartDate
,EndDate
,Salary
,Position
,HomeAddress
FROM dbo.Employee2012Q1
UNION ALL
SELECT CONVERT(smalldatetime, '06/30/2012') as [ArchiveDate]
,EmployeeID
,EmployeeName
,StartDate
,EndDate
,Salary
,Position
,HomeAddress
FROM dbo.Employee2012Q2
GO
Get the latest info for each employee
SELECT A.*
FROM dbo.EmployeeArchive A
JOIN (
SELECT EmployeeID, MAX(ArchiveDate) as MaxArchiveDate
FROM dbo.EmployeeArchive
GROUP BY EmployeeID
) B
ON A.EmployeeID = B.EmployeeID
AND A.ArchiveDate = B.MaxArchiveDate