Search code examples
bashbinarydecodeqr-codeencode

encode / decode binary data in a qr-code using qrencode and zbarimg in bash


I have some binary data that I want to encode in a qr-code and then be able to decode, all of that in bash. After a search, it looks like I should use qrencode for encoding, and zbarimg for decoding. After a bit of troubleshooting, I still do not manage to decode what I had encoded

Any idea why? Currently the closest I am to a solution is:

$ dd if=/dev/urandom bs=10 count=1 status=none > data.bin
$ xxd data.bin
00000000: b255 f625 1cf7 a051 3d07                 .U.%...Q=.
$ cat data.bin | qrencode -l H -8 -o data.png
$ zbarimg --raw --quiet data.png | xxd
00000000: c2b2 55c3 b625 1cc3 b7c2 a051 3d07 0a    ..U..%.....Q=..

It looks like I am not very far, but something is still off.

Edit 1: a possible fix is to use base64 wrapping, as explained in the answer by @leagris.

Edit 2: using base64 encoding doubles the size of the message. The reason why I use binary in the first place is to be size-efficient so I would like to avoid that. De-accepting the answer by @leagris as I would like to have it 'full binary', sorry.

Edit 3: as of 2020-03-03 it looks like this is a well-known issue of zbarimg and that a pull request to fix this is on its way:

https://github.com/mchehab/zbar/pull/64

Edit 4: if you know of another command-line tool on linux that is able to decrypt qr-codes with binary content, please feel free to let me know.


Solution

  • My pull request has been applied. ZBar version 0.23.1 and newer will be able to decode binary QR codes:

    zbarimg --raw --oneshot -Sbinary qr.png
    zbarcam --raw --oneshot -Sbinary
    

    QR codes have several encoding modes. The simplest, most commonly used and widely supported is the alphanumeric encoding which is suitable for simple text. The byte encoding allows storing arbitrary 8 bit data in the QR code. The ECI mode is like 8 bit mode but with additional metadata that tells the decoder which character set to use in order to decode the binary data back to text. Here's a list of known ECI values and the character encodings they represent. For example, when a decoder encounters an ECI 26 mode QR code it knows to decode the binary data as UTF-8.

    The qrencode tool is doing its job correctly: it is creating a byte mode QR code with the data you gave it as its contents. The problem is most decoders were explicitly designed to handle textual data first and foremost. The retrieval of the raw binary data is a detail at best.

    Current versions of the zbar library will treat byte mode QR codes as if they were unknown ECI mode QR codes. If a character set isn't specified, it will attempt to guess the encoding and convert the data to it. This will most likely mangle the binary data. As you noted, I brought this up in issue #55 and after some time managed to submit a pull request to improve this. Should it be merged, the library will have binary decoder option that will instruct decoders to return the raw binary data without converting it. Another source of data mangling is the tendency of the command line tools to append line feeds to the output. I submitted a pull request to allow users to prevent this and it has already been merged.

    The zxing-cpp library will also try to guess the encoding of binary data in QR codes. The comments suggest that the QR code specification requires that decoders pick an encoding without specifying a default or allowing them to return the raw binary data. In order to make that possible, the binary data is copied to a byte array which can be accessed through the DecoderResult. When I have some free time, I intend to write zximg and zxcam tools with binary decoding support for this library.

    It's always possible to encode binary data as base 64 and encode the result as an alphanumeric QR code. However, base 64 encoding will increase the size of the data and the alphanumeric mode doesn't allow use of the QR code's maximum capacity. In a comment, you mentioned what you intend to use binary QR codes for:

    I want to have a package to effectively dump some gpg stuff in a format that makes recovery easy.

    That is the exact use case I'm attempting to enable with my pull request: an easier-to-restore paperkey. 4096 bit RSA secret keys can be directly QR encoded in 8 bit mode but not in alphanumeric mode as base 64-encoded data.