Search code examples
integerdivisionbigintegerinteger-divisionprime-factoring

How do I test whether VERY large integers have prime factors that are only 2s, 3s, and 7s?


I am dealing with VERY large integers, on the order of 30000 digits in base 10. Most of the digits are 1s, but a few (less than 20 digits) are not ones.

I only care about these numbers if their prime factorizations SOLELY consist of 2s, 3s, and 7s. I know for sure that the number cannot be divisible by 5, since it cannot end in a 0 or 5, so I don't need to check that.

I want to be able to take this VERY large integer, if it's divisible by 2, I divide it by 2 repeatedly until the result is no longer divisible by 2. Then, if the result is divisible by 3, I divide it by 3 repeatedly until that result is no longer divisible by 3. Finally, if the result is divisible by 7, I divide it by 7 repeatedly until that result is no longer divisible by 7. If the final result is 1, then I know that the original number's prime factors are only 2s, 3s, and 7s, which is the property that I am looking for.


My questions:

  1. Is this a good way to test whether a large number has prime factors that are only 2s, 3s, and 7s? If not, is there a more efficient way to test this?
  2. What is the least costly way to store these VERY large (~30000 digits) integers? I only care about testing whether these large integers have prime factors other than 2, 3, 7. I'm open to using Java, Python, C++, C, whichever of these common languages will give me the most efficient possible search.

Things I found from searching:

  • Many languages have a "BigInteger" type that is may only be limited by the available memory. This sounds good, but I think ~30000 digits in base 10 corresponds to a lot of memory for that integer. I'm not sure whether there are more clever ways to deal with the problem.

  • Mathematicians call the property I am testing for "7-smoothness". A positive integer is 7-smooth if it has no prime factors greater than 7. I already know the numbers I am going to test are not divisible by 5, so I'm really only looking for 2s, 3s, and 7s.


I'm more of a mathematician than a computer scientist, so I'm quite new to this kind of computational optimization.

Any help is appreciated :)


Solution

  • Interesting question. Lets call the number in question X.

    Is this a good way to test whether a large number has prime factors that are only 2s, 3s, and 7s?

    The method you outlined seems to be best, both very fast and easy to understand. Another method is to compute the least common multiple of X and 2*3*7. If the result is X, then X is composed entirely of powers of 2, 3, 7. However, I believe the computational complexity of the LCM method is inferior to your method.

    What is the least costly way to store these VERY large (~30000 digits) integers.

    30000 digits is bigger than numbers typically used in cryptography, but is otherwise not exceptional on any system I'm aware of. It would only require about 13k bytes of storage in base-2, hardly even worth thinking about.

    The "best" way to store the integers is unfortunately somewhat more complicated because it depends on where the data comes from and where its going. Big integers are typically stored in a binary format, and also depends on how you define cost. This format can be called base 2, but more accurately it is usually stored as a sequence of base 2w "words" or "limbs", where w is a convenient size for doing the basic arithmetic operations. For example, w is often 32 or 64. Sometime it is a few bits less than 32 or 64, so that a w by w multiply plus a large number of carries can fit in one word of a 64 or 128-bit type.

    This kind of base-2 representation is the only one I've seen used in general big integer packages*. It makes computing the largest power of 2 that divides X particularly efficient. If you write your own code you can choose other bases for storage, for example in your case you might choose base 5 or base 7 or base 2*3*7=42.

    The choice of base should also consider how the data is coming in, because converting the data from the input source to the big integer source can be expensive if the input is provided in something other than the base of the underlying package.

    * There are also big decimal packages which may store the data in base 10 instead of base 2, a convenience for determining the powers of 5 or 10 in X. You'd have to examine the details of the big decimal implementation to determine if it's actually faster, and besides you are not interested in powers of 5 or 10.