Search code examples
c#compressionjffs2

rtime compression used in jffs2


In a C# project i have to read a image of a jffs2 filesystem. One of the compression algorithms used in jffs2 is "rtime".

I did not found any information about this "rtime" compression method, except some line of C code on a linux cross reference homepage.

Is there somewhere a description how the decompression works or even better a .Net library or project for compression / decompression ?

Thank you

Peter


Solution

  • I wasn't able to find any real documetation either. So I had to resort to studying the source at http://www.codeforge.com/read/7074/compr_rtime.c__html

    Here's what I figured out. The Rtime compressed data is encoded in byte pairs. Where the first item is a data byte and the second item is a repeat counter.

    The Rtime algorithm iterates over the input data one byte at a time. It remembers the position of the last occurence of the current byte plus one. It calls this 'backpos'. So for example, if 'A' was last seen at position 0, it's backpos would be 1. The distance between the positions of the current byte and the backpos creates sort of a LZ77-ish sliding window. Rtime then goes on to compare the next input byte with the one at backpos. If they're identical, it increments the repeat counter by one. It keeps doing that until either a difference is detected or the end of the input is reached.

    Nuff said! Here's an example:

    Compression

    Given the following input:

      A  B  B  B  A  B  B  B  A
    

    1) The first input byte is an A, so Rtime stores an A as the first item of the first pair of the compressed result. Since it has never seen an A before backpos of A is zero.

     [A]  B  B  B  A  B  B  B  A
      ^  ^
      |__|  
       =? 
    

    Next it compares the byte at backpos and the next byte of the input (namely A and B). Since they're not the same, Rtime uses a repeat count of zero. So the first pair is (A, 0).

    2) The second input byte is a B. Rtime stores a B as the first item of the second pair of the result. Since it has never seen a B before, the backpos of B is also zero. It goes on to compare the byte at backpos with the next input byte:

      A [B] B  B  A  B  B  B  A
      ^     ^
      |_____|
        =?
    

    They're clearly not the equal. So the repeat count is again zero. Hence the second result pair is (B, 0).

    3) The third byte is again B. Rtime stores a B as the first item of the third pair of the result. Since it has encountered a B at position 1 in the previous iteration, backpos is now 2.

    It goes on to compare the byte at backpos (index 2) with the next input byte (index 3):

      A  B [B] B  A  B  B  B  A
            ^  ^
            |__|
             =?
    

    They're equal so the repeat count is set to 1.

      A  B [B] B  A  B  B  B  A
               ^  ^
               |__|
                =?
    

    Rtime compares the next two bytes, but they're different. So comparison stops an the third pair of the result is (B, 1)

    4) Since we had a successful repeat on the last iteration, the fourth input byte has already been covered and we can now continue with the fifth byte. Which is A. So the first item of the fourth result pair is A. Since it has encountered a A at position 0 in the first iteration, backpos is 1.

    At this point things become interesting. The next four comparisons will succeed and Rtime has to stop because it has reached the end of the input.

      A  B  B  B [A] B  B  B  A
         ^           ^
         |___________|
              =?
    
      A  B  B  B [A] B  B  B  A
            ^           ^
            |___________|
                 =?
    
      A  B  B  B [A] B  B  B  A
               ^           ^
               |___________|
                    =?
    
      A  B  B  B [A] B  B  B  A
                  ^           ^
                  |___________|
                       =?
    

    So the fourth and last result pair is (A, 4). All in all the compressed result is: (A, 0)(B, 0)(B, 1)(A, 4).

    This requires "only" 8 bytes whereas the original input was 9 Bytes.

    Decompression

    The decompression works exactly the same, but in reverse order. Given the above result:

    (A, 0)(B, 0)(B, 1)(A, 4).

    1) The first value of the first pair is A. So right away, we can put an A at the first position of the result. And since the repeat count is zero, this iteration is done. Backpos of A is 0.

     compressed:    [A, 0](B, 0)(B, 1)(A, 4)
    
     decompressed:  [A]
    

    2) The second pair contains B. We put a B at the second position of the result. And since the repeat count is zero, this iteration is also done. Backpos of B is 1.

     compressed:    (A, 0)[B, 0](B, 1)(A, 4)
    
     decompressed:   A [B]
    

    3) The third pair contains B. We put a B at the third position of the result. And the repeat count is 1 in this case. So we append the byte at backpos (which is still 1 at this point) to the end of the result. Backpos of B then set to 2.

     compressed:    (A, 0)(B, 0)[B, 1](A, 4)
    
     decompressed:   A  B [B]+B+
    

    4) The fourth pair contains A. We put a A at fifth position the result. The repeat count is 4. So we append four bytes starting at backpos (which is still 0 at this point) to the end of the decompressed stream.

     compressed:    (A, 0)(B, 0)(B, 1)[A, 4]
    
     decompressed:   A  B  B  B [A]+A++B++B++B+
    

    And we're done :)

    Hope this helps a little.

    Unless there is a lot of redundancy in the input, Rtime will not produce results that are smaller than the original. In the comments of the C implementation I read somewhere that Rtime is only used to further compress an already gzipped image. Presumably gzip contains very little redundancies. I wonder how often Rtime will actually yield improved results.