Tags: qr-code, numeric, decoding, bitstring

How do I decode a bit string representing an n-digit number, where the n digits were grouped and encoded in bit words of different lengths?


Background

I am trying to decode a bit string from a QR-Code whose data was encoded in numeric mode. According to this QR-Code tutorial: https://www.thonky.com/qr-code-tutorial/numeric-mode-encoding (which references the standard), numeric encoding works as follows:

Split the n-digit number into 3-digit groups and encode each group into a

  • 10-bit word if the group has no leading zero,
  • 7-bit word if the group has 1 leading zero,
  • 4-bit word if the group has 2 leading zeros.

If n is not a multiple of three, the last group will be 1 or 2 digits long. The rules above also apply to this last group.

Encoding Example:

Take the 14-digit number: 12300101234567

Split it into 3-digit groups and convert them to binary numbers:

  • group 1: 123 → 0001111011 (10 bits)
  • group 2: 001 → 0001 (4 bits)
  • group 3: 012 → 0001100 (7 bits)
  • group 4: 345 → 0101011001 (10 bits)
  • group 5: 67 → 1000011 (7 bits)

Therefore the 14-digit number is encoded into the following 38 bits: 00011110110001000110001010110011000011
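The encoding rule above can be checked against the worked example with a short script. This is a minimal sketch of the tutorial's leading-zero rule (which the Solution below shows is not what the standard says). The function names are hypothetical, and "leading zeros" is read as "how many significant digits the group's value has", so a group such as 000 gets 4 bits.

```python
def tutorial_width(value: int) -> int:
    """Word width under the tutorial's leading-zero rule (not ISO/IEC 18004).

    Assumption: the width follows from how many digits remain once the
    group's leading zeros are dropped, i.e. from the size of its value.
    """
    if value >= 100:   # no leading zero: 3 significant digits
        return 10
    if value >= 10:    # one leading zero: 2 significant digits
        return 7
    return 4           # two leading zeros: at most 1 significant digit


def encode_tutorial(digits: str) -> str:
    """Split into 3-digit groups and write each one with tutorial_width bits."""
    words = []
    for i in range(0, len(digits), 3):
        value = int(digits[i:i + 3])
        words.append(format(value, f"0{tutorial_width(value)}b"))
    return "".join(words)


encoded = encode_tutorial("12300101234567")
print(encoded, len(encoded))
# prints: 00011110110001000110001010110011000011 38
```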

Decoding (what I have so far):

The QR-Code gives me the number of digits that were encoded, so for the example above I know n = 14.

Following the encoding rules, I calculate:

  • int(14/3) = 4
  • 14 % 3 = 2

Thus there are

  • 4x 3-digit numbers
  • 1x 2-digit number

Therefore the last bit word is 7 bits long: 1000011, which encodes the number 67.

The remaining 31 bits, 0001111011000100011000101011001, encode the other 12 digits.
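Splitting off the final word can be written as a small helper. This is a sketch that assumes, as the reasoning above does, that the trailing group has no leading zero, so one leftover digit means a 4-bit word and two leftover digits a 7-bit word. `split_last_word` is a hypothetical name.

```python
def split_last_word(bits: str, n: int) -> tuple[str, str]:
    """Return (bits before the last word, last word) for an n-digit payload.

    Assumption: the trailing group has no leading zero, so its width
    depends only on how many digits it holds (1 -> 4, 2 -> 7, 3 -> 10).
    """
    leftover = n % 3                          # 14 % 3 == 2
    last_width = {0: 10, 1: 4, 2: 7}[leftover]
    return bits[:-last_width], bits[-last_width:]


bits = "00011110110001000110001010110011000011"
head, last = split_last_word(bits, 14)
print(len(head), head)      # 31 0001111011000100011000101011001
print(last, int(last, 2))   # 1000011 67
```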

How do 4-, 7- and 10-bit words fit into 31 bits?

Try combinations:

  • First try: 10+10+10+1? No, the encoding does not allow a 1-bit word
  • Next try: 10+10+4+7? Yes.

Therefore the bitstring is built from the following bit words.

  • 2x 10-bit words
  • 1x 7-bit word
  • 1x 4-bit word
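The trial-and-error search over word lengths can be made exhaustive with `itertools.product`. This minimal sketch (the helper name is hypothetical) lists every order of four 4-, 7- or 10-bit words that adds up to exactly 31 bits. Besides the 12 orders of 10+10+7+4, it also finds 10+7+7+7, so the remaining bits admit 16 length orders in total.

```python
from itertools import product


def length_orders(total_bits: int, word_count: int, widths=(4, 7, 10)):
    """Every ordered sequence of word widths that fills total_bits exactly."""
    return [combo for combo in product(widths, repeat=word_count)
            if sum(combo) == total_bits]


orders = length_orders(31, 4)
print(len(orders))                                 # 16
print(sorted({tuple(sorted(o)) for o in orders}))  # [(4, 7, 10, 10), (7, 7, 7, 10)]
```

Four words are assumed because the 12 remaining digits form four 3-digit groups.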

Problem:

I don't know the order of the bit words.

There are two more restrictions that result from the encoding rules:

  • The highest 1-digit number that can be encoded into a 4-bit word is 9 → 1001.
  • The highest 2-digit number that can be encoded into a 7-bit word is 99 → 1100011.

This can help exclude several orders when iterating over all possible orders. But it does not exclude all possibilities. I am not able to get the correct order of the bit words.
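The pruning described above can be turned into a backtracking search. This is a sketch under the tutorial's rule, with a hypothetical `decodings` generator and `BOUNDS` table. Besides the upper limits 9 and 99 from the question, it applies lower limits that the same rule implies: a 7-bit word must hold at least 10 and a 10-bit word at least 100, otherwise the group would have been written in fewer bits.

```python
# Hypothetical table: allowed (min, max) value of a word of each width under
# the tutorial's rule. The lower bounds are inferred, not quoted: a group
# worth less than 10 would have been written as a 4-bit word, and so on.
BOUNDS = {4: (0, 9), 7: (10, 99), 10: (100, 999)}


def decodings(bits: str, groups: int):
    """Yield every list of 3-digit groups that could have produced bits."""
    if groups == 0:
        if not bits:            # all bits consumed: a complete decoding
            yield []
        return
    for width, (low, high) in BOUNDS.items():
        if width > len(bits):
            continue
        value = int(bits[:width], 2)
        if low <= value <= high:    # prune words the rule could not produce
            for rest in decodings(bits[width:], groups - 1):
                yield [f"{value:03d}"] + rest


head = "0001111011000100011000101011001"   # the 31 bits before the final 67
candidates = ["".join(groups) for groups in decodings(head, 4)]
for candidate in candidates:
    print(candidate)
```

On the 31 bits from the question this yields six candidates, one of which is the intended 123 001 012 345. That suggests the bit string is genuinely ambiguous under this rule, not merely hard to decode.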

I appreciate your help, thanks.


Solution

  • After locating a copy of the ISO/IEC 18004:2015 QR code standard itself, I found that the rule about leading zeros is not part of it.

    A 3-digit group is encoded in 10 bits, regardless of how many leading zeros it has. Sources claiming otherwise are wrong.

    The standard even uses an example with a leading zero: 01234567 is broken up into 012 345 67, and the 012 is encoded as 0000001100, not as 0001100.
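With fixed widths, decoding needs no search. This is a minimal sketch following the rule as stated in the Solution (10 bits per full 3-digit group, 7 bits for a trailing 2-digit group, 4 bits for a trailing single digit). The function names are hypothetical, and the check uses the 01234567 example from the standard quoted above.

```python
# Width of a numeric-mode word by how many digits it carries.
WIDTH = {3: 10, 2: 7, 1: 4}


def encode_numeric(digits: str) -> str:
    """Encode every group in a fixed width, leading zeros or not."""
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "".join(format(int(g), f"0{WIDTH[len(g)]}b") for g in groups)


def decode_numeric(bits: str, n: int) -> str:
    """Read n digits back, 3 at a time, using the fixed widths above."""
    digits, pos = [], 0
    while n > 0:
        count = min(3, n)
        width = WIDTH[count]
        if pos + width > len(bits):
            raise ValueError("bit string too short for the digit count")
        value = int(bits[pos:pos + width], 2)
        if value >= 10 ** count:
            raise ValueError(f"{value} does not fit in {count} digits")
        digits.append(f"{value:0{count}d}")  # restore leading zeros
        pos += width
        n -= count
    return "".join(digits)


print(decode_numeric("0000001100" "0101011001" "1000011", 8))  # 01234567
print(len(encode_numeric("12300101234567")))                    # 47
```

Because every width is fixed by the digit count alone, the order of the words never has to be guessed.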