Search code examples
matrixbinarydecodeprefix

Confused about prefix free and uniquely decodable with respect to binary code


A code is the assignment of a unique string of characters (a codeword) to each character in an alphabet.

A code in which the codewords contain only zeroes and ones is called a binary code.

All ASCII codewords have the same length. This ensures that an important property called the prefix property holds true for the ASCII code.

The encoding of a string of characters from an alphabet (the cleartext) is the concatenation of the codewords corresponding to the characters of the cleartext, in order, from left to right. A code is uniquely decodable if the encoding of every possible cleartext using that code is unique.

Based on the above information I was trying to do some exercises:

Considering the following matrix:

  Code1   Code2  Code3  Code4

A  0       0        1      1

B  100     1       01     01

C  10      00     001    001

D  11      11    0001    000

The confusions:

  1. Are all the above assignment considered as codes since they have a unique string of characters???
  2. I understand that code 1 and code 2 are prefix free since they do not have equal length. Having said that, if you have a look at code 4 for alphabets D and C it cosists of 3 digits. Would code 4 be considered prefix free too?
  3. Is code 3 the only uniquely decodable code?

Solution

  • I think you have misunderstood the prefix property - it isn't mainly about length (but enforcing the same length n on each code point will make the code prefix-free - you cannot have unique codes otherwise).

    Rather, it is about uniquely being able to identify each code point so that a decoder greedily can take the first translation that matches. In the case of fixed length, the decoder knows that it has to read n digits.

    In the case of variable length code like Code1, you don't know upon reading 10 if that can be translated to C or if it is the first two digits of the three-digit B - 10 is a prefix of 100. The same holds true for Code2: 0 is a prefix for 00 and 1 is a prefix of 11.

    Consider reading the sequence 100 one digit at a time:

    Code1:
    Read 1
    ; "1" does not match any code -  Remember the 1 and continue.
    Read 0
    ; "10" matches reduction "C" - or is this the beginning of a "B"? Darn!
    Read 0
    ; Ok, this was either "CA" or "B" - but there is no way of knowing which one.
    

    Hope this helps you forward!