Given multiple static data streams, how to design an optimal scheduling policy?

I am trying to find an optimal scheduling policy when multiple static data streams are given. For example,

stream 0: 000---00--000

stream 1: 1111--11--11-11

stream 2: 22-222---2-2

...

Here, "1" means valid data processed in one cycle, "-" means one cycle stall. A dispatcher can only process valid data from one stream in each cycle. When one stream stalls, the dispatcher can always switch to another stream with valid data waiting. Or the dispatcher can make stream switching decisions with other scheduling policies.

For example, a strict round robin policy(follow order stream 0 1 2, even though the current selected stream is stalled or null) is shown below:

DispatchTimeLine: 012012012-1201201-01201201xx1

stream 0 time line:   0xx0xx0---xx0xx0--0xx0xx0xxxx  

stream 1 time line:   x1xx1xx1xx1--1xx1--1xx1-x1xx1 

stream 2 time line:   xx2xx2-x2xx2xx2---xx2-x2xxxxx

In this case, the dispatcher takes 29 cycles to process all the data. If using a greedy policy, a total of 26 cycles can be achieved. ("x" means waiting or idle.)

If the goal is the best performance (least number of total cycles), how to derive an optimal policy for the dispatcher? Is there any theoretical proof that is available for a more general case?

Below is a general description of this problem:

Assume there are N data point streams (0 to N-1), and each data point stream (Di) has its own static data point pattern (such as "valid, valid, valid...stall, stall,...", i.e., an interleaved sequence of "valid" and "stall" points. The number of "valid" is Vi, and the number of "stall" is Si. The temporal ordering within each stream is fixed.) There is no specific constraint for the value of N, Vi, and Si, and the data pattern for each data stream are also fixed during scheduling. In other words, there can be multiple, or a single data stream, and the composition and length of the data stream are not constrained.

Regarding the dispatcher, it can only process one data point from one stream in a cycle. When stream Di is stalled, the dispatcher can select other streams to go, and the stalling time of stream Di can be hidden when the dispatcher is processing other streams. Once the stalling is done for stream Di, it becomes eligible to be selected again. When all streams are in their stalls in a cycle, no data can be processed in this cycle, and a NOP will occupy this cycle in the dispatcher's time line.

The only goal here is the least total processing time (or say highest performance) of the dispatcher. There is no other requirements here, such as fairness among streams.

In intuition, I imagine greedy policy can be optimal in some cases, such as in the numerical example above. But I am not sure whether this policy is the best in all situations. I wonder if it can be proved theoretically? Or is there a systematic method that can aid the process of finding an optimal scheduling policy?

Solution

I'm going to conjecture that finding an optimal schedule, in general, is NP-hard. It seems as though there ought to be a reduction from some partition problem that exploits the trickiness of scheduling when the available tasks are arriving very close to the throughput of the dispatcher. Certainly the various greedy algorithms that I've tried haven't worked out (what I didn't realize earlier is that each stream has a "buffer" of only one element).

There's a fixed-parameter algorithm for k streams each of length n that runs in time O(n^k) via dynamic programming. This is not useful to you when k = 48, so I won't describe it. I would suggest constraint programming instead, which has a chance to exploit the features of your particular input.

Here's one way to formulate your problem as a constraint program. For the j^th item in the i^th stream, let t_ij be the integer time at which processing for that item is finished. To force a stall of length k between the j^th and (j + 1)^th items in the i^th stream, constrain t_i(j+1) - t_ij > k. For all i, constrain t_i1 > 0. Put an all-different constraint on every single one of the variables and minimize their maximum.

There are various software packages that solve constraint programs. They use intelligent constraint propagation and branching strategies and should be much, much, MUCH faster than brute force. I don't feel as though I have enough experience to make specific recommendations, so find one that looks easy to use and reasonably mature.