
Is there any architecture that uses the same register space for scalar integer and floating point operations?

Most architectures I've seen with native hardware support for scalar FP put the FP registers in a completely separate register space from the main set of registers.

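x86's legacy x87 FPU uses a separate floating-point register stack (basically a fixed-size, 8-entry ring buffer), with st(0) through st(7) naming entries relative to the current top of the stack. This is probably the most unusual of the popular designs. It can only exchange data with the integer registers through loads and stores to memory, or by transferring compare results: fnstsw ax (available since the 80287, the 286-era coprocessor) copies the FPU status word into AX, from where sahf can move the condition codes into EFLAGS, and fcomi (introduced with the P6, i.e. i686) writes EFLAGS directly.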
FPU-enabled ARM has a separate FP register space that works similarly to its integer space. The primary difference is a separate instruction set specialized for floating-point, but even the idioms mostly align.
MIPS is somewhere in between: floating point is visibly done through a coprocessor (coprocessor 1), and it has slightly different usage rules (for example, on 32-bit MIPS a double occupies an even/odd pair of 32-bit floating-point registers rather than a single wider register), but otherwise it works fairly similarly to ARM.
x86's newer SSE scalar instructions operate much like their vector counterparts, with similar mnemonics and idioms. They can load from and store to memory freely, and many scalar operations accept a 64-bit memory operand directly, as in addsd xmm1, m64 or subsd xmm1, m64. However, moving values between XMM registers and the general-purpose registers requires dedicated instructions such as movq xmm1, r/m64, movq r/m64, xmm1, and friends. This is similar to ARM64 NEON, although it's slightly different from ARM's standard scalar instruction set.
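To make the x87 "ring buffer" description concrete: st(i) is relative to a 3-bit TOP field in the x87 status word, so st(i) names physical register (TOP + i) mod 8; a load decrements TOP before writing the new st(0), and a pop increments TOP. The following is a minimal Python sketch of only that indexing. The class and method names are hypothetical, and it deliberately ignores the tag word, the 80-bit extended format, and stack faults (real hardware reports a stack overflow rather than silently overwriting an occupied slot, as this model does).

```python
class X87Stack:
    """Toy model of the x87 register stack: 8 physical slots plus a 3-bit TOP pointer.

    Hypothetical sketch: no tag word, no stack faults, no 80-bit extended format.
    """

    def __init__(self) -> None:
        self.regs = [0.0] * 8  # physical registers R0..R7
        self.top = 0           # TOP field from the status word

    def st(self, i: int) -> float:
        # st(i) is relative to TOP and wraps around the 8 physical slots
        return self.regs[(self.top + i) % 8]

    def fld(self, value: float) -> None:
        # A load "pushes": decrement TOP first, then write the new st(0)
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fstp(self) -> float:
        # Store-and-pop: read st(0), then increment TOP
        value = self.regs[self.top]
        self.top = (self.top + 1) % 8
        return value

    def faddp(self) -> None:
        # faddp st(1), st(0): st(1) += st(0), then pop, so the sum becomes st(0)
        self.regs[(self.top + 1) % 8] += self.regs[self.top]
        self.top = (self.top + 1) % 8


fpu = X87Stack()
fpu.fld(1.5)       # st(0) = 1.5
fpu.fld(2.25)      # st(0) = 2.25, st(1) = 1.5
fpu.faddp()        # st(0) = 3.75
print(fpu.fstp())  # 3.75
```

Because operands are named relative to TOP, the same physical register is st(0) at one moment and st(1) after the next push, which is a large part of why x87 code feels so different from register-file FP.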

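A point worth stressing about movq between an XMM register and a general-purpose register: it copies the raw 64-bit pattern and does not convert between integer and floating-point values (numeric conversion is what instructions like cvtsi2sd are for). Below is a minimal Python sketch of that difference, using the standard struct module to stand in for the bit copy; the function names are hypothetical and only mirror the instruction names.

```python
import struct


def movq_bits(x: float) -> int:
    # Like movq r64, xmm: reinterpret the 64 bits of a double as an unsigned integer
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def movq_float(bits: int) -> float:
    # Like movq xmm, r64: reinterpret a 64-bit integer pattern as a double
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def cvtsi2sd(n: int) -> float:
    # Like cvtsi2sd: a numeric conversion, which yields a different bit pattern
    return float(n)


print(hex(movq_bits(1.0)))  # 0x3ff0000000000000
print(movq_float(1))        # 5e-324, the smallest subnormal double
print(cvtsi2sd(1))          # 1.0
```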
Conversely, many vector instruction sets don't bother with the integer/FP distinction at all; they distinguish only between scalar and vector. In the case of x86, ARM, and MIPS, all three:

They separate the scalar and vector register spaces.
They reuse the same register space for vectorized integer and floating-point operations.
They can still exchange data with the integer registers as applicable.
Scalar operations simply pull their scalars from the relevant register space (or memory in the case of x86 FP constants).
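Sharing one vector register space between integer and FP operations works because a vector register just holds bits; the instruction decides how the lanes are interpreted. The sketch below models a hypothetical 128-bit register as 16 raw bytes and applies an integer lane-wise add (in the style of SSE2 paddd) and a float lane-wise add (in the style of SSE addps) to the same contents. It is an illustration under stated assumptions, not an emulator: it assumes float sums stay within float32 range, since struct raises OverflowError where hardware would produce infinity.

```python
import struct


def pack_f32x4(*vals: float) -> bytes:
    # A hypothetical 128-bit register: four little-endian float32 lanes as raw bytes
    return struct.pack("<4f", *vals)


def paddd(a: bytes, b: bytes) -> bytes:
    # Integer lane-wise add, wrapping at 32 bits, like SSE2 paddd
    xs, ys = struct.unpack("<4I", a), struct.unpack("<4I", b)
    return struct.pack("<4I", *((x + y) & 0xFFFFFFFF for x, y in zip(xs, ys)))


def addps(a: bytes, b: bytes) -> bytes:
    # Float lane-wise add, like SSE addps; packing with "f" rounds each sum to float32
    xs, ys = struct.unpack("<4f", a), struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(xs, ys)))


reg = pack_f32x4(1.0, 2.0, 3.0, 4.0)
print(struct.unpack("<4f", addps(reg, reg)))  # (2.0, 4.0, 6.0, 8.0)
# Same bytes, integer add: the result is a reinterpretation, not a doubling
print(struct.unpack("<4f", paddd(reg, reg)))
```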

But I was wondering: are there any CPU architectures that reuse the same register space for integer and floating point operations?

And if not, what (beyond compatibility concerns) would prevent hardware designers from going that route?

Solution

The Motorola 88100 had a single register file (thirty-one 32-bit registers plus a hardwired zero register) used for both floating-point and integer values. Because the registers were only 32 bits wide, double-precision operands had to be supplied in register pairs, which significantly limited the number of double-precision values that could be kept in registers.
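To illustrate the register-pair cost, here is a minimal Python model of a unified file of 32 32-bit registers with r0 hardwired to zero, holding integers in single registers and doubles in pairs. Everything here is an assumption for illustration rather than a description of the actual 88100: the class name is hypothetical, the high word of a double is assumed to live in the lower-numbered register, and any pair-alignment rules the real hardware may have are not modelled.

```python
import struct


class UnifiedRegFile:
    """Hypothetical unified file: 32 x 32-bit registers, r0 reads as zero.

    Assumption: a double occupies the pair (rN, rN+1) with its high word in rN.
    """

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, n: int) -> int:
        return 0 if n == 0 else self.regs[n]

    def write(self, n: int, value: int) -> None:
        if n != 0:  # writes to r0 are discarded
            self.regs[n] = value & 0xFFFFFFFF

    def write_double(self, n: int, x: float) -> None:
        if not 1 <= n <= 30:
            raise ValueError("a pair must start in r1..r30")
        hi, lo = struct.unpack(">II", struct.pack(">d", x))
        self.write(n, hi)
        self.write(n + 1, lo)

    def read_double(self, n: int) -> float:
        return struct.unpack(">d", struct.pack(">II", self.read(n), self.read(n + 1)))[0]


def max_doubles(usable_regs: int) -> int:
    # Each double consumes two 32-bit registers
    return usable_regs // 2


rf = UnifiedRegFile()
rf.write(2, 42)          # an integer in r2
rf.write_double(4, 3.5)  # a double in the pair r4:r5
print(rf.read(2), rf.read_double(4))  # 42 3.5
print(max_doubles(31))   # 15, even before reserving a stack pointer or other registers
```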

The follow-on 88110 added thirty-two 80-bit extended registers for additional (and larger) floating point values.

Mitch Alsup, who was involved in Motorola's 88k development, has developed his own load-store ISA (at least partially for didactic reasons) which, if I recall correctly, uses a unified register file.

It should also be noted that the Power ISA (the descendant of PowerPC) defines an "Embedded Floating Point Facility" that uses the GPRs for floating-point values. This reduces core implementation cost and context-switch overhead.

One benefit of separate register files is that they provide explicit banking, which reduces the register port count in a straightforward, modestly superscalar design. For example, giving each file three read ports would let any single FP operation (even a three-source FMADD) start in parallel with one GPR-based operation, and would also allow many common pairs of GPR-based operations; a single unified register file would need five read ports to support an FMADD alongside one other two-source operation. Another factor is that the FP capacity is additional and its width independent: the FP registers add to the GPR count, and their width need not match the integer registers. This has both advantages and disadvantages. In addition, coupling storage with the operations that use it makes it more straightforward to implement a highly distinct coprocessor. This was more significant for early microprocessors given chip-size limits, but later designs used it too: the UltraSPARC T1 shared one floating-point unit among eight cores, and AMD's Bulldozer shared an FP/SIMD unit between two integer "cores".
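The read-port arithmetic in the banking argument can be checked with a small model. This hypothetical Python sketch counts only register read ports for a pair of operations issued together; it ignores operand bypassing, immediates beyond their effect on source counts, and any cross-file operands (such as an FP store that needs a GPR address), so it is a simplification of the reasoning above rather than a real issue model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    file: str     # "gpr" or "fpr": which register file supplies the sources
    sources: int  # number of register source operands


def fits_split(ops: list[Op], ports_per_file: int = 3) -> bool:
    # Separate files: each file's read-port budget is checked independently
    for f in ("gpr", "fpr"):
        if sum(op.sources for op in ops if op.file == f) > ports_per_file:
            return False
    return True


def fits_unified(ops: list[Op], ports: int) -> bool:
    # One file: every source operand competes for the same read ports
    return sum(op.sources for op in ops) <= ports


fmadd = Op("fmadd", "fpr", 3)
add = Op("add", "gpr", 2)
addi = Op("addi", "gpr", 1)  # register + immediate, so one register source

print(fits_split([fmadd, add]))       # True: 3 FPR reads + 2 GPR reads
print(fits_split([add, addi]))        # True: 3 GPR reads
print(fits_split([add, add]))         # False: 4 GPR reads exceed 3 ports
print(fits_unified([fmadd, add], 5))  # True, but one file needs 5 read ports
print(fits_unified([fmadd, add], 4))  # False
```

The split design never needs more than three ports on any one file, which is the point of the banking argument: per-file port count, not the total, drives register-file cost.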

A unified register file has some calling-convention advantages: values can be passed in the same registers regardless of their type. A unified register file also reduces idle resources by allowing every register to be used for every kind of operation; integer-heavy code, for example, is not left with an FP register file it barely touches.
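The calling-convention point can be sketched by assigning arguments to registers under two hypothetical conventions with the same total budget of eight argument registers: one with separate integer and FP argument registers, one with a single pool. The register names and counts are made up for illustration and do not correspond to any real ABI; the sketch also assumes 64-bit registers, so every argument, including a double, fits in one register.

```python
def assign_split(arg_types: list[str], n_int: int = 4, n_fp: int = 4) -> list[str]:
    # Separate integer and FP argument registers, each pool consumed in order
    next_int = next_fp = 0
    out = []
    for t in arg_types:
        if t == "double" and next_fp < n_fp:
            out.append(f"f{next_fp}")
            next_fp += 1
        elif t != "double" and next_int < n_int:
            out.append(f"r{next_int}")
            next_int += 1
        else:
            out.append("stack")  # this pool is exhausted
    return out


def assign_unified(arg_types: list[str], n_regs: int = 8) -> list[str]:
    # One pool: the i-th argument goes in the i-th register, whatever its type
    return [f"r{i}" if i < n_regs else "stack" for i in range(len(arg_types))]


sig = ["double"] * 6
print(assign_split(sig))    # ['f0', 'f1', 'f2', 'f3', 'stack', 'stack']
print(assign_unified(sig))  # ['r0', 'r1', 'r2', 'r3', 'r4', 'r5']
```

With the split convention, six doubles spill to the stack while four integer argument registers sit unused; the unified pool has no such mismatch, and a variadic callee can save its argument registers without knowing which types they hold.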