I am looking for help understanding the overhead (the number of additional symbols that need to be transmitted) associated with error-correcting codes (like Reed-Solomon) as the error rate they are designed to handle increases. For instance, if a process needs to be able to correct 1 wrong symbol per 500, what is the overhead compared to, say, 1 in 100?
I realize that in practice complex schemes are often used (CDs use overlapping sets of encoding, etc.), but I am trying to get a sense of the simplest case first. Is the relationship between the overhead and the error rate approximately linear? Quadratic? Exponential? I realize Big O notation isn't the right tool here, so please forgive me if this is not how the math community usually formulates the problem.
I would be thrilled with an answer that calculated the overhead associated with the following error rates for Reed-Solomon encoding:
- 1 symbol error per 10000
- 1 symbol error per 2000
- 1 symbol error per 1000
- 1 symbol error per 500
- 1 symbol error per 50
Look up the Hamming bound. There you will read that a code with Hamming distance d, i.e. one which can detect d−1 single-digit errors and correct t=⌊(d−1)/2⌋ single-digit errors, has a size of at most
$$\frac{q^b}{\sum_{k=0}^{t}\binom{b}{k}(q-1)^k}$$
different code words; here q is the alphabet size and b is the length of a code word in symbols. Now for the simplest case, assume binary codes: q=2 is the alphabet size, and n=2^b is the number of possible words of b bits. Then the base-two logarithm of the above bound gives the maximal number of bits you can actually communicate per code word, and the code rate is the ratio between these two bit counts.
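If it helps to see numbers, here is a minimal Python sketch of that computation; the function names are mine, and the block lengths are taken from the error rates in the question, with one correctable error per block.

```python
from math import comb, log2

def max_info_bits(b, t, q=2):
    """Hamming bound: log2 of the largest possible number of code words of
    length b over a q-ary alphabet when t symbol errors must be correctable."""
    # Volume of a Hamming ball of radius t around a code word.
    ball = sum(comb(b, k) * (q - 1) ** k for k in range(t + 1))
    return b * log2(q) - log2(ball)

def max_rate(b, t, q=2):
    """Upper bound on the code rate: information bits per transmitted bit."""
    return max_info_bits(b, t, q) / (b * log2(q))

# One correctable error per block, block lengths taken from the question.
for b in (10000, 2000, 1000, 500, 50):
    r = max_rate(b, 1)
    print(f"b = {b:5d}, t = 1:  rate <= {r:.5f},  overhead >= {1 - r:.3%}")
```

Since the Hamming bound only limits how good a code can possibly be, the printed overhead is a lower bound on what any code with those parameters must spend; a concrete code may need somewhat more check bits.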
When you express the code rate in terms of the error tolerance, you might want to investigate the limit for large code words. For this to make sense, you would turn the absolute number of errors into an error rate, so that the number of errors is proportional to the code length. Then it should be possible to obtain useful limits.
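Here is a small sketch along those lines, again assuming binary codes (the helper names are mine). For a fixed error rate p = t/b the finite-length bound approaches 1 − h(p), where h is the binary entropy function; this is the usual asymptotic form of the Hamming bound.

```python
from math import comb, log2

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def rate_bound(b, t):
    """Finite-length Hamming bound on the rate of a binary code of length b
    that corrects t errors."""
    ball = sum(comb(b, k) for k in range(t + 1))
    return (b - log2(ball)) / b

p = 1 / 500  # error rate from the question: one symbol error per 500
print(f"asymptotic bound: rate <= 1 - h(p) = {1 - h2(p):.5f}")
for b in (500, 5000, 50000, 500000):
    t = round(p * b)  # keep the number of errors proportional to the block length
    print(f"b = {b:6d}, t = {t:4d}:  rate <= {rate_bound(b, t):.5f}")
```

Evaluating 1 − h(p) at the rates in the question gives overheads of very roughly 0.15%, 0.6%, 1.1%, 2.1% and 14%, i.e. close to proportional to the error rate up to a logarithmic factor (h(p) ≈ p·log2(1/p) for small p), so nothing like quadratic or exponential growth.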