I am currently reading about the DEFLATE method for encoding/decoding data. I understand that the process is composed of two parts:
i. Replace duplicate information (within a specified window) with a reference back to the previous identical piece.
ii. Use Huffman coding to reduce the size of the most commonly occurring symbols.
I have a question with regards to (i). DEFLATE uses LZ77 which, based on a size window, searches through the information and, if it finds any duplicate information, replaces it with a "pointer". That makes perfect sense.
However, when decoding using LZ77 how does DEFLATE recognize a pointer? (Pointers are length-distance pairs; how can you discern if it's a pointer or just a number that was present in the initial data?)
Reference: http://en.wikipedia.org/wiki/DEFLATE#Duplicate_string_elimination
It's recommended to read the Deflate RFC 1951 specification, which is much more precise, and answer such questions.
What you'll see in => 3.2.5. Compressed blocks (length and distance codes)
"the literal and length alphabets are merged into a single alphabet"
which means that, by simply retrieving the next symbol, you immediately know if it is a literal (0..255), or a match length (257..285), or even an end of block (256). In case of a match length, a reference (offset) must be decoded too. Offset are encoded using a separate tree.