I have been reading and researching error correction in binary data, but I can't seem to get a solid grasp on the steps involved. I have read https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction and its related articles, and I have an understanding of the mathematics involved, but I want a thorough understanding of the whole process.
Can anyone explain to me, or link me to an explanation that will tell me, step by step, how I go from a binary representation of, say, the string "Hello, how are you?" (01001000011001010110110001101100011011110010110000100000011010000110111101110111001000000110000101110010011001010010000001111001011011110111010100111111) to a blob of binary with enough error correction information to recover up to 1 in 10 garbled bits, and then how to interpret the result and determine which bits are wrong? I can follow either code or math, so any help would be welcome. Thank you!
From the wiki article, a systematic encoding process treats the message as a polynomial with finite field coefficients, appends the required number of zeroes (i.e. multiplies by x^t, where t is the number of parity terms), divides by the generator polynomial, then subtracts the remainder from those appended zeroed bytes. This means the encoded message polynomial is an exact multiple of the generator polynomial.
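Those steps can be sketched in Python. This is a minimal illustration, not a production encoder: the primitive polynomial 0x11d and the choice of 6 parity bytes (enough for 3 byte errors) are my own illustrative parameters, and the generator roots start at a^0 here rather than the a^1 convention the wiki article uses.

```python
# GF(2^8) log/antilog tables; 0x11d is a common primitive polynomial choice.
GF_EXP, GF_LOG = [0] * 512, [0] * 256   # exp table doubled to skip a mod 255
x = 1
for i in range(255):
    GF_EXP[i], GF_LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def rs_generator(nparity):
    """g(x) = (x - a^0)(x - a^1)...(x - a^(nparity-1)), high-order term first."""
    g = [1]
    for i in range(nparity):
        ng = g + [0]                           # multiply current g by x ...
        for j, c in enumerate(g):
            ng[j + 1] ^= gf_mul(c, GF_EXP[i])  # ... then add a^i * g (XOR = add)
        g = ng
    return g

def rs_encode(msg, nparity):
    """Systematic encode: append the remainder of msg(x)*x^nparity / g(x)."""
    gen = rs_generator(nparity)
    rem = list(msg) + [0] * nparity            # msg(x) * x^nparity
    for i in range(len(msg)):                  # polynomial long division
        coef = rem[i]
        if coef:
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + rem[len(msg):]          # message followed by parity

msg = list(b"Hello, how are you?")
codeword = rs_encode(msg, 6)                   # 19 message bytes + 6 parity bytes
```

Because the remainder was subtracted (XORed) into the appended zero bytes, the resulting codeword polynomial divides evenly by g(x).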
If the original encoded message polynomial is evaluated at the roots of the generator polynomial (a^1, a^2, ...), the result is a set of zeroes. If a received encoded message with errors is evaluated at the roots of the generator polynomial, the original terms drop out and the result is a set of "syndromes" that are a function of the error locations and error values. The wiki article then explains a key equation between an error locator polynomial (lambda) and the "syndromes".
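Concretely, the syndrome computation looks like this. The sketch repeats the same illustrative GF(2^8) setup (primitive polynomial 0x11d, 6 parity bytes, generator roots starting at a^0 rather than the article's a^1) so that it runs standalone:

```python
GF_EXP, GF_LOG = [0] * 512, [0] * 256   # GF(2^8) antilog/log tables
x = 1
for i in range(255):
    GF_EXP[i], GF_LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D                       # reduce by the primitive polynomial
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def rs_encode(msg, nparity):
    g = [1]                              # generator polynomial, prod (x - a^i)
    for i in range(nparity):
        ng = g + [0]
        for j, c in enumerate(g):
            ng[j + 1] ^= gf_mul(c, GF_EXP[i])
        g = ng
    rem = list(msg) + [0] * nparity      # divide msg(x) * x^nparity by g(x)
    for i in range(len(msg)):
        if rem[i]:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], rem[i])
    return list(msg) + rem[len(msg):]

def rs_syndromes(cw, nparity):
    """Evaluate the received polynomial at each generator root a^k."""
    out = []
    for k in range(nparity):
        acc = 0
        for c in cw:                     # Horner's rule, high-order terms first
            acc = gf_mul(acc, GF_EXP[k]) ^ c
        out.append(acc)
    return out

clean = rs_encode(list(b"Hello, how are you?"), 6)
synd_clean = rs_syndromes(clean, 6)      # all zero: exact multiple of g(x)

garbled = list(clean)
garbled[3] ^= 0x55                       # corrupt one byte
synd_bad = rs_syndromes(garbled, 6)      # nonzero: a fingerprint of the error
```

The syndromes depend only on the error pattern, not on the message, which is what makes the later decoding steps possible.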
The wiki article then explains four methods that can be used to convert the syndromes into error locations and error values, but Berlekamp-Massey and extended Euclid are the two main methods used.
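As a hedged sketch of the Berlekamp-Massey route (same illustrative parameters as before: primitive polynomial 0x11d, 6 parity bytes, roots from a^0), the code below goes from syndromes to the error locator polynomial, then finds its roots by brute force (a Chien search) to recover the error positions. Computing the error values (Forney's algorithm) is omitted; only locations are found:

```python
GF_EXP, GF_LOG = [0] * 512, [0] * 256    # GF(2^8) tables as in the encoder
x = 1
for i in range(255):
    GF_EXP[i], GF_LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def rs_encode(msg, nparity):
    g = [1]                               # generator: prod (x - a^i)
    for i in range(nparity):
        ng = g + [0]
        for j, c in enumerate(g):
            ng[j + 1] ^= gf_mul(c, GF_EXP[i])
        g = ng
    rem = list(msg) + [0] * nparity       # divide msg(x)*x^nparity by g(x)
    for i in range(len(msg)):
        if rem[i]:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], rem[i])
    return list(msg) + rem[len(msg):]

def rs_syndromes(cw, nparity):
    out = []
    for k in range(nparity):              # S_k = R(a^k), Horner evaluation
        acc = 0
        for c in cw:
            acc = gf_mul(acc, GF_EXP[k]) ^ c
        out.append(acc)
    return out

def berlekamp_massey(S):
    """Shortest LFSR generating S == error locator (low-order coeffs first)."""
    C, B, L, m, b = [1], [1], 0, 1, 1
    for n in range(len(S)):
        d = S[n]                          # discrepancy at step n
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        newC = C + [0] * max(0, len(B) + m - len(C))
        scale = gf_mul(d, GF_EXP[255 - GF_LOG[b]])   # d / b
        for i, bi in enumerate(B):
            newC[i + m] ^= gf_mul(scale, bi)
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, C, d, 1
        else:
            m += 1
        C = newC
    return C

def chien_search(locator, n):
    """Error at degree j  <=>  Lambda(a^-j) == 0."""
    hits = []
    for j in range(n):
        acc = 0
        for coef in reversed(locator):    # Horner over low-order-first coeffs
            acc = gf_mul(acc, GF_EXP[(255 - j) % 255]) ^ coef
        if acc == 0:
            hits.append(j)
    return hits

cw = rs_encode(list(b"Hello, how are you?"), 6)
bad = list(cw)
bad[2] ^= 0x21                            # two byte errors (capacity here is 3)
bad[9] ^= 0x7F
lam = berlekamp_massey(rs_syndromes(bad, 6))
degrees = chien_search(lam, len(bad))     # degrees of the error terms
indices = sorted(len(bad) - 1 - j for j in degrees)  # back to byte offsets
```

Once the locations are known, the error values follow from solving the remaining linear system (or Forney's algorithm), and correction is a final XOR at those positions.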
As noted in the wiki article, correcting an error requires two parity terms, since there are two unknowns per error: the location of the error and the value of the error. So to correct errors in up to 10% of the encoded message, 20% of the encoded message must be parity. For example, if a message has 30 bytes and 20 bytes of parity are added, the result is a 50-byte encoded message of which up to 10 bytes in error can be corrected (that is 20% of the codeword, so this example over-provisions for the 10% case). Keep in mind that Reed-Solomon corrects symbol (byte) errors, so how many garbled bits are correctable depends on how many bytes those bits fall into.
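The overhead arithmetic can be captured in a small helper. This is a hypothetical sizing function of my own, not something from the article; it assumes the target is a fraction of the final codeword and that the fraction is below 1/2:

```python
from fractions import Fraction
import math

def parity_for_fraction(msg_len, err_frac):
    """Smallest even parity count so that err_frac of the final codeword is
    correctable, at 2 parity symbols per error (requires err_frac < 1/2)."""
    # t correctable errors need 2t parity; require t >= err_frac * (msg_len + 2t),
    # which rearranges to t >= err_frac * msg_len / (1 - 2 * err_frac).
    t = math.ceil(err_frac * msg_len / (1 - 2 * err_frac))
    return 2 * t

nparity = parity_for_fraction(30, Fraction(1, 10))   # 8 parity bytes
```

So a 30-byte message needs 8 parity bytes to keep 10% of the resulting 38-byte codeword (4 bytes) correctable; using Fraction keeps the arithmetic exact at the boundary cases.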
The wiki article includes a link to this NASA tutorial:
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19900019023_1990019023.pdf
You may find this much briefer tutorial I wrote a bit simpler to follow.
http://rcgldr.net/misc/ecc.pdf
There's some stuff in my tutorial that you won't need, like solving quadratic or cubic equations (legacy material I didn't remove), and it's missing things like extended Euclid or Berlekamp-Massey decoding, but it's enough to get started.