java matlab error-correction reed-solomon galois-field

Porting MATLAB's Reed Solomon function to Java

I have implemented a simple RS error correction scheme in MATLAB with RS(160,80). The basic process is as follows:

I generate a message with length 80 and 8 bits per symbol, and I generate an RS code of length 160.
After generating the RS code, I add/XOR another Galois field of code length 160. (this field contains only 00000000 and 00000001). This is to simulate adding errors in the scheme. This generates my noisy code.
Now, I take another GF field (of a similar type as above [00000000 00000001]) which has < 40 symbols different from the one I used to create the noise. I add/XOR this to the generated noisy code in the previous step.
Finally, I run it through the RS decoder which retrieves my original message.

My MATLAB function:

function RSKeyExchange(dev1, dev2)
    dev1_fp = zeros(1,160);
    dev2_fp = zeros(1,160);

    for i=1:160
        dev1_fp(i) = str2double(dev1.key(i));
        dev2_fp(i) = str2double(dev2.key(i));
    end

    n = 160;        % total code length
    k = 80;         % message length - actual message for syncronisation
    m = 8;          % order (2^m)

    % therefore, RS decoder can detect t = n-k = 80 errors
    % and correct t/2 = 40 errors

    msg = gf(randi([1 255],1 ,80), m);
    code = rsenc(msg, n, k);

    noise_add = gf(dev1_fp, 8);
    noise_remove = gf(dev2_fp, 8);

    noisy_code = code + noise_add;

    % noisy_code is now transmitted from dev1 to dev2 (sender to receiver)

    decommited_code = noisy_code + noise_remove;
    [rxcode, cnumerr] = rsdec(decommited_code, n, k);

    fprintf('Number of errors corrected: %d\n', cnumerr);
end

Now I have been searching for ways to port this to Java. I have looked up libraries, but I'm not sure how to exactly port my particular use case.

Zxing - Takes only QR and Aztec codes as input
Backblaze - JavaReedSolomon - Fixes code erasures, which isn't the kind of error I'm producing, and input is in the form of files(seriously confused as to what is happening here)
Simple RS error correction example - Feels a bit more legible, but takes only strings as inputs. I feel I can modify it to suit my use case, but I'm not sure how to go about adding noise. I'm not sure how to go about generating an RS(160, 80) code via this implementation, nor can I tell how to generate custom GF fields to add noise.

Any help would appreciated (especially if you could point me to an implementation that would suit my use-case, or help modify one of the resources above which would work)

Thanks!

Solution

I looked at the "simple RS" example "GF28" Java code. The decoder appears to handle erasures only (one of the inputs is an array of bad indices). It's using GF(256) based on hex 11B = x^8 + x^4 + x^3 + x + 1, the same as AES encryption. It is somewhat unusual choice since the lowest "primitive" is 3 (all numbers other than zero can be considered to be a power of 3), rather than the fields where the "primitive" is 2. The field polynomial is defined via PX, so it can be easily changed. I'm not sure why it generates tables dynamically instead of generating them during initialization, using a second set of true/false tables to indicate if specific table values have been generated.

I have a C RS demo program for 8 bit GF(256) fields. It's interactive, you select a field (there are 30 of them), whether to use a self reciprocal polynomial (it's usually not used), if the first consecutive root of the generator polynomial is 1 (if no is specified, then the first consecutive root is the "primitive"), number of parity bytes, and number of data bytes. It handles both erasures and errors, and since it's a demo program, it includes code for the 3 main types of decoder algorithms Peterson matrix algorithm, Sugiyama's adaptation of extended Euclid algorithm, and Berlekamp Massey algorithm, and also Forney algorithm to generate error values. The interactive part could be replaced with code that selects all these parameters. For your situation, change the define for MAXPAR from 20 to 80 (maximum number of parities). The user input is via stdin, so a text input file can be used to run the demo.

http://rcgldr.net/misc/eccdemo8.zip

In my demo, to generate a code word, the user enters values (user option "E" or "C"), then enters "N" to encode. To generate errors, the user enters values at specific locations. To generate erasures, the user uses the "P" option to enter erasure values at specific locations. "F" is used to fix (correct) the code word.

The wiki article includes the logic for the 3 main types of decoders. It also explains the Fourier transform, but the Fourier transform requires using one of the other decoders in order to generate the error polynomial, and isn't practical.

https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Error_correction_algorithms

Based on your comment I looked at the Zxing library. The method descriptions are a bit terse and use the wrong terminology. Here's a redo of the description:

GenericGF(primitive, size, b)
    primitive - wrong term, this is the field polynomial

    size - The description is "size of the field", but that is
    determined by the polynomial, a n+1 bit polynomial is used
    for an n bit field. My guess is this is the size of the
    generator polynomial, perhaps the number of roots.

    b - alpha raised to the b power is the first consecutive root.
    alpha is the primitive for the particular field. Unlike the
    comment, b == 0 is about as common as b == 1.

https://en.wikipedia.org/wiki/Primitive_element_(finite_field)

encode(int[] toEncode, int ecBytes)
    toEncode - toEncode includes space for the "parity" data used
    to encode the message.

    ecByte - the number of "parity" elements at the end of to Encode.
    Why is it named ecBytes when it is an array of ints?

decode(int[] received, int twoS )
    received is an array of concatenated encoded messages?
    Since encode generates a single encoded message, then it would
    make more sense for decode to decode a single encoded message.

    twoS - the number of encoded messages?

    Based on the references, it is using the Sugiyama adaptation
    of extended Euclid algorithm. A side benefit is that the
    error evaluator polynomial is generated during the process.

The wrapper comment also has an error. The maximum size for a GF(256) codeword is 255 bytes, since error locations are limited to the range 0 to 254. This is because locations are related to powers, and any number raised to the power 255 in GF(256) is that same number.

Note that a mis-correction is possible without triggering the library's exception. However with 80 parity bytes, it would take 41 errors for this to be possible. The mis-correction would involve generating a set of 40 locations and values that produce a valid codeword, but one that differs from the original codeword by 81 or more bytes (81 or more errors). However with a shortened codeword of 160 bytes, all 40 of the generated locations would have to be in the range 0 to 159. Assuming random, uniform distribution of the calculated locations, the odds of them mis-correcting would be ((160!)/((160-40)!))/(255^40) != 3.86 × 10^-11 . Mis-correction is an issue if using full length (255) codewords, or smaller number of parities.

I made a java RS ecc GF(256) example. It doesn't handle erasures (which requires modifying syndromes for known erasure locations, then merging the error locators with the erasure locators). It uses Euclid algorithm to calculate error locator and error evaluator polynomials. It's commented, but you may need to look at Wiki RS article to better understand it. The code deals with Java signed bytes by converting them to unsigned integers using byte&0xff as needed. I set the parameters to 80 message bytes, 80 parity bytes to match the question's example. Although both add and subtract are xor for GF(256), the code uses separate functions for add and subtract so that the code corresponds to the algorithm as it would apply to GF(p) where p is a prime number, such as GF(257). The calculation for the "derivative" of Lambda would also be different for GF(p), this is explained here:

https://en.wikipedia.org/wiki/Forney_algorithm#Formal_derivative

Link to java rsecc GF(256) example:

http://rcgldr.net/misc/rsecc8.zip

The interactive C version is easier to follow since it's interactive, displays some of the internal calculations (defines can be changed to enable / disable what is displayed), and the user can try different variations.