Search code examples
algorithmrecursionprofilingpseudocode

Pseudocode: Recursively process start/stop times in list


Here's rather nebulous question.

I have a list of start/stop times from script executions, which may include nested script calls.

| script | start | stop |    duration    |            time executing           |
| ------ | ----- | ---- | -------------- | ----------------------------------- |
| A      |     1 |    8 | 7 i.e. (8-1)   | 3 i.e. ((8-1) - (6-2) - (5-4))      |
| ->B    |     2 |    6 | 4 i.e. (6-2)   | 3 i.e. ((6-2) - (5-4))              |
| ->->C  |     4 |    5 | 1 i.e. (5-4)   | 1                                   |
| D      |     9 |   10 | 1 i.e. (10-9)  | 1                                   |
| E      |    11 |   12 | 1 i.e. (12-11) | 1                                   |
| F      |     9 |   16 | 7 i.e. (16-9)  | 5 i.e. ((16-9) - (14-13) - (16-15)) |
| ->G    |    13 |   14 | 4 i.e. (14-13) | 1 i.e. (14-13)                      |
| ->H    |    15 |   16 | 1 i.e. (15-14) | 1 i.e. (16-15)                      |

Duration is the total time spent in a script. Time executing is the time spent in the script, but not in subscript.

So A calls B and B calls C. C takes 1 tick, B takes 4 but time executing is just 3, and A takes 7 ticks, but time executing is 3. F calls G and then H, so takes 7 ticks but time executing is only 5.

What I'm trying to wrap my ('flu-ridden) head around is a pseudo-code algorithm for step-wise or recursing through the list of times in order to generate the time executing value for each row.

Any help for this problem (or cure for common cold) gratefully received. :-)


Solution

  • If all time points are distinct, then script execution timespans are related to each other by an ordered tree: Given any pair of script execution timespans, either one strictly contains the other, or they don't overlap at all. This enables an easy recovery of parent-child relationships, if you wanted to do that.

    But if you just care about execution times, we don't even need that! :) There's a pretty simple algorithm that just sorts the starting and ending times and walks through the resulting array of "events", maintaining a stack of open "frames":

    1. Create an array of (time, scriptID) pairs, and insert the start time and end time of each script into it (i.e., insert two pairs per script into the same array).
    2. Sort the array by time.
    3. Create a stack of integer triples, and push a single (0, 0, 0) entry on it. (This is just a dummy entry to simplify later code.) Also create an array seen[] with a boolean flag per script ID, all initially set to false.
    4. Iterate through the sorted array of (time, scriptID) pairs:
      • Whenever you see a (time, scriptID) pair for a script ID that you have not seen before, that script is starting.
        • Set seen[scriptID] = true.
        • Push the triple (time, scriptID, 0) onto the stack. The final component, initially 0, will be used to accumulate the total duration spent in this script's "descendant" scripts.
      • Whenever you see a time for a script ID that you have seen before (because seen[scriptID] == true), that script is ending.
        • Pop the top (time, scriptID, descendantDuration) triple from the stack (note that the scriptID in this triple should match the scriptID in the pair at the current index of the array; if not, then somehow you have "intersecting" script timespans that could not correspond to any sequence of nested script runs).
        • The duration for this script ID is (as you already knew) time - startTime[scriptID].
        • Its execution time is duration - descendantDuration.
        • Record the time spent in this script and its descendants by adding its duration to the new top-of-stack's descendantDuration (i.e., third) field.

    That's all! For n script executions this will take O(n log n) time, because the sorting step takes that long (iterating over the array and performing the stack operations take just O(n)). Space usage is O(n).