Search code examples
algorithmdata-structuresrun-length-encoding

Run length encoding only for sequence longer than K


Let's say that we are given an input stream/iterator of numbers (e.g. [2,3,1,1,1,1,1,5,6]) and some number K. We need to encode this into a list of objects. When there are K or more repeats of the same number, we encode with object of TypeA. In all other cases, we encode with TypeB. The classes are as follows:

class TypeA {
  int num;
  int count;
}

class TypeB {
  int[] nums;
}

For example, for input [2,3,1,1,1,1,1,5,6] and K=5. We return

[TypeB([2,3]), TypeA(1,5), TypeB([5,6])]

For input [2,3,1,1,1,1,5,6] and K = 5, we return

[TypeB([2,3,1,1,1,1,5,6])]

Decoding from the list of objects to a stream is really simple, but I am having trouble writing the encoder. Since we only encode when the repetition is K or greater, I don't think we can just keep a single list of numbers found so far then flush it into an object when there is a state change.


Solution

  • Seems that keeping a single list and flushing it when repetition cound reaches K (without repeated end) is right idea. Why do you think it is wrong?

    Consider extreme case like you second example - all stream contents should be packed in a single list, so we have to store this list all the time, if output elements are lists.

    If output behaves like stream too, we can dump elements immediately when value changes, and put ending ']' when needed.