Search code examples
c++giflzw

LZW encoding and the GIF file format


I'm trying to understand how to create a .gif file in C++. So far I think I understand everything except how the LZW encoding works. This is the file that I have generated with labels:

47 49 46 38 39 61 -header
0A 00 01 00 91 00 -logical screen descriptor
00 00 FF 00 FF 00 -color table [green,red,yellow,black]
00 FF FF 00 00 00
00 21 F9 04 00 00 -graphics control extension
00 00 00 2C 00 00 -image descriptor
00 00 0A 00 01 00 -(10 pixels wide x 1 pixel tall)
00 02 04 8A 05 00 -encoded image
3B                -terminator

And here it is again without labels for copy/paste purposes: 47 49 46 38 39 61 05 00 04 00 91 00 00 00 FF 00 FF 00 00 FF FF 00 00 00 00 21 F9 04 00 00 00 00 00 2C 00 00 00 00 0A 00 01 00 00 02 04 8A 05 00 3B

I am having a lot of trouble understanding how 02 04 8A 05 translates to the image yryryggyry. I know the 02 is the minimum code size, and the 04 is the length of the image block, and I think I have identified the clear and EOI codes, but I don't understand the code inbetween.

8A       05
10001010 00000101
100|01010 00000|101
 ^      ????     ^
 clear code      EOI code

So far I have gotten the most information from the .gif specification: http://www.w3.org/Graphics/GIF/spec-gif89a.txt

And this website was also helpful: http://www.matthewflickinger.com/lab/whatsinagif/lzw_image_data.asp

Thanks

EDIT*

I watched the Youtube video linked in the comments and encoded the image manually for the color stream "yryryggyry":

Color table-012=gry

2   1   2   1   2   0   0   2   1   2
010 001 010 001 010 000 000 010 001 010

current next output dict
010     001  010    21 6
001     010  001    12 7
010     001  -      -
001     010  110    121 8
010     000  010    212 9
000     000  000    00  10
000     010  1010   002 11
010     001  -      -
001     010  110    -
010     -    010    -

outputs-100 010 001 110 010 000 1010 110 010 101

01010101 4th 55
10101000 3rd A8
00101100 2nd 2C
01010100 1st 54

Code-54 2C A8 55

I must have made a mistake because this code produces the image "yr" instead of "yryryggyry"

I am going to try to redo the work to see if I get a different answer


Solution

  • Perhaps you've made a mistake at line 4: 001 010 110 121 8

    At line 3, "010" is ignored, so you have to add it to line 4 first. And at line 4, it comes to:

    current  next  output    dict
    010 001  010   010 001   212   8
    

    Here is my solution(also manually created):

    LZW for yryryggyry

    Update:

    Finally figured out why:

    When you are encoding the data, you increase your code size as soon as your write out the code equal to 2^(current code size)-1. If you are decoding from codes to indexes, you need to increase your code size as soon as you add the code value that is equal to 2^(current code size)-1 to your code table. That is, the next time you grab the next section of bits, you grab one more.

    The writer means that you should increase your word size when you are about to output 2^(current code size) - 1, but there might be a different explanation which also seems to be reasonable:

    When you add #(2 ^ current code size) item to the code table, next output should increase its word size.

    Also correct in the writer's example, and this is the explanation I prefer.

    Here's your example ("yryryggyry"):

    output sequence:
        #4 #2 #1 #6 #2 #0 #0 #8 #5
    

    When you are about to output #6, you add "yry" into your code table, which has an index of #8.

    Since 8 = 2 ^ current word size

    (current word size = 2(original) + 1(reserved) = 3)
    

    Next output should increase word size,so #2 becomes a word of 4 bits.

    And the final output sequence is:

    4   100
    2   010
    1   001
    6   110
    2   0010
    0   0000
    0   0000
    8   1000
    5   0101
    

    After encoding they become

    54 2C 00 58
    

    So the data block is

    02            -minimum word size     
    04            -data length
    54 2c 00 58   -data
    00            -data block terminator