Search code examples
c#barcodebarcode-scannerchecksum

What code do I need to calculate a checksum for any barcode, and to verify/validate existing checksums?


There are many barcode types, and sizes (lengths). Is there a common set of algorithms I can use to calculate a checksum for any barcode?


Solution

  • Yes, there is a very common checksum calculator algorithm. Various barcodes (and other digit entry schemes) use them to validate the scanner (or the human) entered all the digits properly. One of the first examples, and the most commonly understood checksum algorithm is the Luhn algorithm, which is known for its use on credit cards; but many variations on it exist. At their core, though, most use the same algorithm.

    The common algorithm uses an array of weights corresponding to the positions of the digits, a modulo divisor, and a flag indicating either a "product add" or "product digit add" scheme.

    Pseudocode

    int:computeChecksum(string:inputData, int[len(inputData)]: weightArray, int:divisor, Boolean:productDigitAdd)
    {
    
        If the number of digits in inputData is not equal to the number of elements in weightArray
            raise an invalidWeightArray exception
        endif
    
        create an int:checksum and set it to zero
    
        For int:position = each digit in inputData
            int:digitProduct = value of inputData digit at [position] times the weightArray at [position] 
            If productDigitAdd then
                for int:prodPosition = each digit in digitProduct
                    checksum = checksum + digitProduct[prodPosition]
                end for
            else
                checksum = checksum + digitProduct
            end if
        end for
    
        int:remainder = checksum modulo divided by divisor
        return remainder
    }
    

    Common Schemes

    Check digit schemes are in common use throughout various industries, and barcode production and verification is just one. Here is a list of some of the common check digit schemes that can be used with this algorithm:

    • UPC-A barcode: the weights are twelve digits of {3,1,3,1,3,1,3,1,3,1,3,1}, the divisor is 10, and the productDigitAdd value is false.
    • EAN-13 barcode: the weights are {1,3,1,3,1,3,1,3,1,3,1,3,1}, the divisor is 10, and the productDigitAdd value is false.
    • Code 128 barcode: the digits are the value of the code of the bars, not the values the barcode contains (e.g. the barcode symbol value of 65 represents the ASCII letter 'A' in modes A and B, but is the pair of digits '33' in mode C); the weights are {1, 2, 3, 4, 5 ... through the total number of barcode symbols, 1}, where 1 is the check digit weight at the right end; the divisor is 103; and the productDigitAdd value is false.
    • POSTNET barcode: the weights are {1,1,1,1,1,1,1,1,1,1}, the divisor is 10, and the productDigitAdd value is false.
    • 16-digit credit cards: the weights are an array of sixteen digits, {2,1,2,1 ... 2,1}, the divisor is 10, and the productDigitAdd value is true. This is the Luhn algorithm.
    • ISBN (book numbers): the weights are {10,9,8,7,6,5,4,3,2,1}, the divisor is 11, and the productDigitAdd field is false.
    • Routing Transit Number (on bank checks): the weights are {3,7,1,3,7,1,3,7,1}, the divisor is 10, and the productDigitAdd field is false.

    Validation

    To validate a barcode with this algorithm, simply compare its output to zero. Non-zero values indicate failure.

    A common mistake is to try to isolate the check digit, then compare the output of the routine to the check digit that was isolated. It's much simpler and safer to simply include the check digit in the overall loop and then compare the output to zero. This algorithm then continues to work for check digits that are embedded in positions other than the right end of the barcode, or where the check digit has a non-one weight.

    Generation

    You can also use the same algorithm to compute a check digit, as you would when generating a new barcode. In your input data, one of the digits will be reserved for the position of the check digit. This is commonly, but not always, the rightmost digit. Set that position of the input digit array to zero. Call the algorithm as you would to verify the inputData, then subtract the output of the algorithm from the modulo divisor. Replace the zero placeholder in the input digit array with the subtracted output.

    Special cases

    The Luhn algorithm is often implemented very informally. A common way to describe it is "double every other digit, then add the digits of the sums, and the last digit must be the check digit." This simplistically works for 16 digit cards, but can lead to inflexible code which is often implemented using case statements to handle different length card numbers, etc.

    ISBN numbers (and others) use a divisor of 11, but this can yield a "10" as an output check digit. A two digit value doesn't fit in the barcode position reserved for one digit. The ISBN specification says that a "10 shall be replaced by the letter 'X'". Other schemes I've encountered simply discard as impossible any numbers that yield two-digit check digit results.

    For schemes like this or others that use non-numeric values, such as Vehicle Identification Numbers (VINs) on cars sold in the USA and Canada, a solid approach is to change this routine from accepting a "string" as the input data and instead converting it into accepting an array of input values. Then perform a translation step on the array before and after this routine to map the given symbols to the necessary values. Normally I keep one version of this routine specialized to accept a string type for input, as most programs deal with barcodes as strings.

    I have encountered older check digit schemes that include negative values in the weight array. They get subtracted from the sum instead of added, but otherwise everything works the same way.

    A very common optimization, especially in embedded devices such as barcode scanners, is to accept a weight array shorter than the number of digits, and extend it to the left by as many places as there are digits in the number. That way all UPC and EAN schemes, including UPC-A, UPC-E, EAN-8, and EAN-13, match to a common routine. The weights are then {3,1}, the divisor is 10, and PDA is false. Extending the weights must be done by anchoring the rightmost weight digit to the rightmost inputData digit, so the weights of 31 become

             <-31
    1313131313131
    9780321146533
    

    The same technique handles any repeating set of weights. With the Luhn algorithm for credit cards using a 21 weight, the same routine works for 13 digit Visa cards, 15 digit AmEx cards, and 16 digit Visa cards. But it then requires external validation of the length of the number of digits, so it doesn't save much.