A code is the assignment of a unique string of characters (a codeword) to each character in an alphabet.
A code in which the codewords contain only zeroes and ones is called a binary code.
All ASCII codewords have the same length. This ensures that an important property called the prefix property holds true for the ASCII code.
The encoding of a string of characters from an alphabet (the cleartext) is the concatenation of the codewords corresponding to the characters of the cleartext, in order, from left to right. A code is uniquely decodable if the encoding of every possible cleartext using that code is unique.
Based on the above information I was trying to do some exercises:
Considering the following matrix:
Code1 Code2 Code3 Code4
A 0 0 1 1
B 100 1 01 01
C 10 00 001 001
D 11 11 0001 000
The confusions:
codes
since they have a unique string of characters??? code 1 and code 2
are prefix free since they do not have equal length. Having said that, if you have a look at code 4
for alphabets D and C
it cosists of 3 digits. Would code 4
be considered prefix free too?code 3
the only uniquely decodable code?I think you have misunderstood the prefix property - it isn't mainly about length (but enforcing the same length n
on each code point will make the code prefix-free - you cannot have unique codes otherwise).
Rather, it is about uniquely being able to identify each code point so that a decoder greedily can take the first translation that matches. In the case of fixed length, the decoder knows that it has to read n
digits.
In the case of variable length code like Code1
, you don't know upon reading 10
if that can be translated to C
or if it is the first two digits of the three-digit B
- 10
is a prefix of 100
. The same holds true for Code2
: 0
is a prefix for 00
and 1
is a prefix of 11
.
Consider reading the sequence 100
one digit at a time:
Code1:
Read 1
; "1" does not match any code - Remember the 1 and continue.
Read 0
; "10" matches reduction "C" - or is this the beginning of a "B"? Darn!
Read 0
; Ok, this was either "CA" or "B" - but there is no way of knowing which one.
Hope this helps you forward!