Search code examples
sqlsql-serversql-server-2014

sql server "bridging" data


I have the following mssql table:

+-------------------------+---+---+---+---+---+
|          date           | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+

I need to write a query that changes the 0s to 1s on column D if the state of D goes to zero for less than 5 minutes. In other words I need to "bridge" the two consecutive 1s at the extremities of the 0s if the state 0 is smaller than ten mins.

Is it possible to perform this operation using T-SQL (SQL SERVER 2014)? Thank you.

Example1:

+-------------------------+---+---+---+---+---+
|          date           | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+

The query should return

+-------------------------+---+---+---+---+---+
|          date           | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+

example 2:

+-------------------------+---+---+---+---+---+
|          date           | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+

the query should return

+-------------------------+---+---+---+---+---+
|          date           | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+

Solution

  • UPDATE - You probably got the idea from the original, but I used the wrong aggregate function some of the time; I think I have it untangled now.

    So... If a row's value is 0, but the time between the most recent preceding row with a 1 and the earliest subsequent row with a 1 is less than 10 minutes, you want to change that row's value to a 1. And in all other cases you leave the value as is. Right?

    The time of the most recent row with a 1 can be expressed as max(case when D = 1 then date end) over (order by date rows unbounded preceding).

    Likewise the time of the earliest subsequent row with a 1 can be expressed as min(case when D = 1 then date end) over (order by date rows unbounded following).

    Find the interval between them; if the dates are all aligned to an even minute, then you can simply use datediff:

    datediff(minute, max(case when D=1 then date end) over (order by date rows unbounded preceding),
                     min(case when D=1 then date end) over (order by date rows unbounded following))
    

    Then apply case logic.

    case when -- the above expression
              < 10 then 1 else D end