c floating-point floating-accuracy floating-point-precision

Upper bound for number of digits of big integer in different base

I want to create a big integer from string representation and to do that efficiently I need an upper bound on the number of digits in the target base to avoid reallocating memory.

Example:

A 640 bit number has 640 digits in base 2, but only ten digits in base 2^64, so I will have to allocate ten 64 bit integers to hold the result.

The function I am currently using is:

int get_num_digits_in_different_base(int n_digits, double src_base, double dst_base){
    return ceil(n_digits*log(src_base)/log(dst_base));
}

Where src_base is in {2, ..., 10 + 26} and dst_base is in {2^8, 2^16, 2^32, 2^64}.

I am not sure if the result will always be correctly rounded though. log2 would be easier to reason about, but I read that older versions of Microsoft Visual C++ do not support that function. It could be emulated like log2(x) = log(x)/log(2) but now I am back where I started.

GMP probably implements a function to do base conversion, but I may not read the source or else I might get GPL cancer so I can not do that.

Solution

I imagine speed is of some concern, or else you could just try the floating point-based estimate and adjust if it turned out to be too small. In that case, one can sacrifice tightness of the estimate for speed.

In the following, let dst_base be 2^w, src_base be b, and n_digits be n.

Let k(b,w)=max {j | b^j < 2^w}. This represents the largest power of b that is guaranteed to fit within a w-wide binary (non-negative) integer. Because of the relatively small number of source and destination bases, these values can be precomputed and looked-up in a table, but mathematically k(b,w)=[w log 2/log b] (where [.] denotes the integer part.)

For a given n let m=ceil( n / k(b,w) ). Then the maximum number of dst_base digits required to hold a number less than b^n is:

ceil(log (b^n-1)/log (2^w)) ≤ ceil(log (b^n) / log (2^w) ) ≤ ceil( m . log (b^k(b,w)) / log (2^w) ) ≤ m.

In short, if you precalculate the k(b,w) values, you can quickly get an upper bound (which is not tight!) by dividing n by k, rounding up.