I am looking for a way to identify event names names that co-occur: i.e., correlate event names with the same start (startts) and end (endts) times: the events are exactly concurrent (partial overlap is not a feature of this data base).
toy dataframe
+------------------+
|name startts endts|
| A 02:20 02:23 |
| A 02:23 02:25 |
| A 02:27 02:28 |
| B 02:20 02:23 |
| B 02:23 02:25 |
| B 02:25 02:27 |
| C 02:27 02:28 |
| D 02:27 02:28 |
| D 02:28 02:31 |
| E 02:27 02:28 |
| E 02:29 02:31 |
+------------------+
Ideal output:
+---------------------------+
|combination| count |
+---------------------------+
| AB | 2 |
| AC | 1 |
| AE | 1 |
| AD | 1 |
| BC | 0 |
| BD | 0 |
| BE | 0 |
| CE | 0 |
+-----------+---------------+
Naturally, I would have tried a loop but I recognize mysql server is not optimal for this.
What I've tried is generating a temporary table by selecting for distinct name and startts and endts combinations and then doing a left join on the table itself (selecting name).
Thank you.
I understand this as a self-join, aggregation, and a conditional count of matching intervals:
select t1.name name1, t2.name name2,
sum(t1.startts = t2.startts and t1.endts = t2.endts) cnt
from mytable t1
inner join mytable t2 on t2.name > t1.name
group by t1.name, t2.name
order by t1.name, t2.name
name1 | name2 | cnt :---- | :---- | --: A | B | 2 A | C | 1 A | D | 1 A | E | 1 B | C | 0 B | D | 0 B | E | 0 C | D | 1 C | E | 1 D | E | 1
Note that, if you are looking for a count of overlapping intervals, all you have to do is change the sum()
to:
sum(t1.startts <= t2.endts and t1.endts >= t2.startts) cnt