The question is as simple as this: would combining two small values with a common basic operation like addition, division, modulo, or bit shifting be computed faster than the same operation applied to larger values?
This would, as far as I can tell, require the CPU to keep track of the most significant bit (which I assume to be unlikely), but maybe there is something else at play.
I am specifically asking because I often see rather small primes (e.g. 31) used in the hashCode() methods of some of Java's basic classes (e.g. String and List), which is surprising since larger values would most likely cause more diffusion (which is generally a good thing for hash functions).
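For reference, String.hashCode() is documented to compute the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]; a straightforward rendering of that scheme (a sketch, not the actual library source) looks like this:

```java
// Sketch of the 31-based polynomial hash that String.hashCode()
// is specified to compute (not the actual library source):
static int hash(char[] value) {
    int h = 0;
    for (char c : value) {
        h = 31 * h + c;   // the small prime in question
    }
    return h;
}
```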
Arithmetic
I do not think there are many pipelined processors (i.e. almost all except the very smallest) where a simple arithmetic instruction's cost would change with the value of a register or memory operand. This would make the design of the pipeline more complex and may be counterproductive in practice.
I could imagine that a very complex instruction (at least a division) that may take many cycles compared to the pipeline length could show such behaviour, since it likely introduces wait states anyway. Agner Fog writes that this is true "on AMD processors, but not on Intel processors."
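If one wanted to check this on a given machine, a rough (and admittedly naive) Java timing sketch might look like the following; the method name and setup are mine, and a serious measurement should use a harness such as JMH:

```java
// Naive timing sketch: compares division cost for small vs. large
// dividends. JIT warm-up, hoisting and timer resolution all add noise;
// treat any result with suspicion and prefer a harness like JMH.
static long timeDivisions(long dividend, long divisor, int iterations) {
    long sink = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
        sink += (dividend ^ i) / divisor;  // vary the dividend so the
                                           // division is not hoisted
    }
    long elapsed = System.nanoTime() - start;
    if (sink == 42) System.out.println(sink);  // keep 'sink' observable
    return elapsed;
}

// Usage idea: compare timeDivisions(10L, 3L, 100_000_000)
//        with timeDivisions(Long.MAX_VALUE, 3L, 100_000_000).
```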
If an operation cannot be computed in one instruction, like a multiplication of numbers that are wider than the native integer width, the implementation may well contain a "fast path" for cases where, for example, the upper halves of both operands are zero. A common example is 64-bit multiplication on 32-bit x86, as compiled by MSVC. Some smaller processors have no instruction for division, sometimes not even for multiplication; the assembly for computing these operations may well terminate earlier for small operands. This effect is felt more acutely on smaller architectures.
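To illustrate such a fast path, here is a Java sketch of how a 32-bit runtime helper could assemble a 64-bit product from 32x32-bit partial products; the function name is mine, and MSVC's actual helper differs in detail:

```java
// Sketch of a 64-bit multiply built from 32x32->64-bit multiplies,
// as a 32-bit machine would have to do it. The product modulo 2^64
// needs three partial products in general, but only one when both
// upper halves are zero -- the "fast path" for small operands.
static long mul64(long a, long b) {
    long aLo = a & 0xFFFFFFFFL, aHi = a >>> 32;
    long bLo = b & 0xFFFFFFFFL, bHi = b >>> 32;
    if (aHi == 0 && bHi == 0) {
        return aLo * bLo;                 // fast path: one multiply
    }
    long low   = aLo * bLo;
    long cross = aHi * bLo + aLo * bHi;   // contributes the upper half
    return low + (cross << 32);           // result modulo 2^64
}
```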
Immediate Value Encodings
For immediate values (constants) this may be different. For example, there are RISC processors that can encode at most 16-bit immediates in a load- or add-immediate instruction; a full 32-bit constant then requires either two instructions (load-upper-immediate plus add-immediate) or a load of the constant from program memory.
In CISC processors a large immediate value will likely take up more memory, which may reduce the number of instructions that can be fetched per cycle, or increase the number of cache misses.
In such cases a small constant may be cheaper than a large one.
I am not sure whether the encoding differences matter as much for Java, since most code will at least initially be distributed as Java bytecode, although a JIT-enabled JVM will translate the code to machine code and some library classes may have precompiled implementations. I do not know enough about Java bytecode to determine the consequences of constant size on it. From what I read, it seems that larger constants are usually loaded via an index into the constant pool rather than encoded directly in the bytecode stream, so I would not expect a large difference here, if any.
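For completeness, the bytecode instruction set does distinguish constant sizes: tiny values fit in iconst_&lt;n&gt; or bipush, 16-bit values in sipush, and anything larger goes through the constant pool via ldc. A sketch of what javac typically emits for each case (the exact opcode choice may vary by compiler):

```java
class ConstantSizes {
    int tiny()   { return 5; }       // typically iconst_5: no operand bytes
    int small()  { return 100; }     // typically bipush 100: 1 operand byte
    int medium() { return 1000; }    // typically sipush 1000: 2 operand bytes
    int large()  { return 100000; }  // typically ldc: constant-pool reference
}
```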
Strength reduction optimizations
For operations that are very expensive (relative to the processor), compilers and programmers often replace a hard computation with a simpler one that is valid for a specific constant, as in the multiplication example mentioned above, where a multiplication is replaced by a shift plus a subtraction or addition.
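Concretely, both constants from the question admit such a rewrite; a small Java sketch (a compiler or JIT would usually perform this transformation itself):

```java
static int times31(int x) {
    return (x << 5) - x;    // 31 * x == 32*x - x
}

static int times65537(int x) {
    return (x << 16) + x;   // 65537 * x == 65536*x + x
}
```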
In the example given (multiply by 31 vs. multiply by 65,537), I would not expect a difference. For other numbers there will be a difference, but it will not correlate perfectly with the magnitude of the number. Divisions by constants are also commonly replaced by an arcane sequence of multiplications and shifts.
See for example how gcc translates a division by 13.
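To give a flavour of what such a sequence looks like, here is a Java sketch of the multiply-and-shift technique for unsigned division by 13; the magic constant 0x4EC4EC4F = ceil(2^34 / 13) is one valid choice, though gcc's exact output may differ:

```java
// Unsigned n / 13 without a division instruction:
// floor(n / 13) == (n * 0x4EC4EC4F) >> 34 for all 32-bit unsigned n,
// because 0x4EC4EC4F * 13 == 2^34 + 3 and the error 3 <= 2^2.
static int divideBy13(int n) {
    long u = n & 0xFFFFFFFFL;               // treat n as unsigned
    return (int) ((u * 0x4EC4EC4FL) >>> 34);
}
```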
On x86 processors, some multiplications by small constants can be replaced by load-effective-address (lea) instructions, but only for certain constants (e.g. 3, 5 and 9, using the scaled-index addressing modes).
All in all I would expect this effect to depend very much on the processor architecture and the operations to be performed. Since Java is supposed to run almost everywhere, I think the library authors wanted their code to be efficient over a large range of processors, including small embedded processors, where operand size will play a larger role.