There are many barcode types, and sizes (lengths). Is there a common set of algorithms I can use to calculate a checksum for any barcode?
Yes, there is a very common checksum calculator algorithm. Various barcodes (and other digit entry schemes) use them to validate the scanner (or the human) entered all the digits properly. One of the first examples, and the most commonly understood checksum algorithm is the Luhn algorithm, which is known for its use on credit cards; but many variations on it exist. At their core, though, most use the same algorithm.
The common algorithm uses an array of weights corresponding to the positions of the digits, a modulo divisor, and a flag indicating either a "product add" or "product digit add" scheme.
int:computeChecksum(string:inputData, int[len(inputData)]: weightArray, int:divisor, Boolean:productDigitAdd)
{
If the number of digits in inputData is not equal to the number of elements in weightArray
raise an invalidWeightArray exception
endif
create an int:checksum and set it to zero
For int:position = each digit in inputData
int:digitProduct = value of inputData digit at [position] times the weightArray at [position]
If productDigitAdd then
for int:prodPosition = each digit in digitProduct
checksum = checksum + digitProduct[prodPosition]
end for
else
checksum = checksum + digitProduct
end if
end for
int:remainder = checksum modulo divided by divisor
return remainder
}
Check digit schemes are in common use throughout various industries, and barcode production and verification is just one. Here is a list of some of the common check digit schemes that can be used with this algorithm:
To validate a barcode with this algorithm, simply compare its output to zero. Non-zero values indicate failure.
A common mistake is to try to isolate the check digit, then compare the output of the routine to the check digit that was isolated. It's much simpler and safer to simply include the check digit in the overall loop and then compare the output to zero. This algorithm then continues to work for check digits that are embedded in positions other than the right end of the barcode, or where the check digit has a non-one weight.
You can also use the same algorithm to compute a check digit, as you would when generating a new barcode. In your input data, one of the digits will be reserved for the position of the check digit. This is commonly, but not always, the rightmost digit. Set that position of the input digit array to zero. Call the algorithm as you would to verify the inputData, then subtract the output of the algorithm from the modulo divisor. Replace the zero placeholder in the input digit array with the subtracted output.
The Luhn algorithm is often implemented very informally. A common way to describe it is "double every other digit, then add the digits of the sums, and the last digit must be the check digit." This simplistically works for 16 digit cards, but can lead to inflexible code which is often implemented using case statements to handle different length card numbers, etc.
ISBN numbers (and others) use a divisor of 11, but this can yield a "10" as an output check digit. A two digit value doesn't fit in the barcode position reserved for one digit. The ISBN specification says that a "10 shall be replaced by the letter 'X'". Other schemes I've encountered simply discard as impossible any numbers that yield two-digit check digit results.
For schemes like this or others that use non-numeric values, such as Vehicle Identification Numbers (VINs) on cars sold in the USA and Canada, a solid approach is to change this routine from accepting a "string" as the input data and instead converting it into accepting an array of input values. Then perform a translation step on the array before and after this routine to map the given symbols to the necessary values. Normally I keep one version of this routine specialized to accept a string type for input, as most programs deal with barcodes as strings.
I have encountered older check digit schemes that include negative values in the weight array. They get subtracted from the sum instead of added, but otherwise everything works the same way.
A very common optimization, especially in embedded devices such as barcode scanners, is to accept a weight array shorter than the number of digits, and extend it to the left by as many places as there are digits in the number. That way all UPC and EAN schemes, including UPC-A, UPC-E, EAN-8, and EAN-13, match to a common routine. The weights are then {3,1}, the divisor is 10, and PDA is false. Extending the weights must be done by anchoring the rightmost weight digit to the rightmost inputData digit, so the weights of 31 become
<-31
1313131313131
9780321146533
The same technique handles any repeating set of weights. With the Luhn algorithm for credit cards using a 21 weight, the same routine works for 13 digit Visa cards, 15 digit AmEx cards, and 16 digit Visa cards. But it then requires external validation of the length of the number of digits, so it doesn't save much.