I am working on a script to analyze some data contained in thousands of tables on a SQL Server 2008 database.
For simplicity sakes, the tables can be broken down into groups of 4-8 semi-related tables. By semi-related I mean that they are data collections for the same item but they do not have any actual SQL relationship. Each table consists of a date-time stamp (datetime2
data type), value (can be a bit
, int
, or float
depending on the particular item), and some other columns that are currently not of interest. The date-time stamp is set for every 15 minutes (on the quarter hour) within a few seconds; however, not all of the data is recorded precisely at the same time...
For example:
TABLE1:
TIMESTAMP VALUE
2014-11-27 07:15:00.390 1
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 0
2014-11-27 08:00:00.327 0
TABLE2:
TIMESTAMP VALUE
2014-11-19 08:00:07.880 0
2014-11-19 08:15:06.867 0.0979999974370003
2014-11-19 08:30:08.593 0.0979999974370003
2014-11-19 08:45:07.397 0.0979999974370003
TABLE3
TIMESTAMP VALUE
2014-11-27 07:15:00.390 0
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 1
2014-11-27 08:00:00.327 1
As you can see, not all of the tables will start with the same quarterly TIMESTAMP
. Basically, what I am after is a query that will return the VALUE for each of the 3 tables for every 15 minute interval starting with the earliest TIMESTAMP
out of the 3 tables. For the example given, I'd want to start at 2014-11-27 07:15 (don't care about seconds... thus, would need to allow for the timestamp to be +- 1 minute or so). Returning NULL for the value when there is no record for the particular TIMESTAMP is ok. So, the query for my listed example would return something like:
TIMESTAMP VALUE1 VALUE2 VALUE3
2014-11-27 07:15 1 NULL 0
2014-11-27 07:30 0 NULL 0
2014-11-27 07:45 0 NULL 1
2014-11-27 08:00 0 NULL 1
...
2014-11-19 08:00 0 0 1
2014-11-19 08:15 0 0.0979999974370003 0
2014-11-19 08:30 0 0.0979999974370003 0
2014-11-19 08:45 0 0.0979999974370003 0
I hope this makes sense. Any help/pointers/guidance will be appreciated.
The first thing I would do is normalize the timestamps to the minute. You can do this with an update to the existing column
UPDATE TABLENAME
SET TIMESTAMP = dateadd(minute,datediff(minute,0,TIMESTAMP),0)
or in a new column
ALTER TABLE TABLENAME ADD COLUMN NORMTIME DATETIME;
UPDATE TABLENAME
SET NORMTIME = dateadd(minute,datediff(minute,0,TIMESTAMP),0)
For details on flooring dates this see this post: Floor a date in SQL server
The next step is to make a table that has all of the timestamps (normalized) that you expect to see -- that is every 15 -- one per row. Lets call this table TIME_PERIOD and the column EVENT_TIME for my examples (call it whatever you want).
There are many ways to make such a table recursive CTE, ROW_NUMBER(), even brute force. I leave that part up to you.
Now the problem is simple select with left joins and a filter for valid values like this:
SELECT TP.EVENT_TIME, a.VALUE as VALUE1, b.VALUE as VALUE2, c.VALUE as VALUE3
FROM TIME_PERIOD TP
LEFT JOIN TABLE1 a ON a.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE2 b ON b.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE3 c ON c.[TIMESTAMP] = TP.EVENT_TIME
WHERE COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) is not null
ORDER BY TP.EVENT_TIME DESC
The where might get a little more complex if they are different types so you can always use this (which is not as good as coalesce but will always work):
WHERE a.[TIMESTAMP] IS NOT NULL OR
b.[TIMESTAMP] IS NOT NULL OR
c.[TIMESTAMP] IS NOT NULL