Tags: assembly, arm, reverse-engineering, porting, fpu

How to fix VFP asm code made for armeabi (ARMv5 or ARMv6) to run on ARMv7?


Here's a short introduction to what this is about. I come from the PSVita homebrew community, where Andy Nguyen made so-loader, basically a library that allows us to execute Android .so files by resolving their imports and applying some patches. This practice has made it possible to port many great games to the platform without source code, some of them ARMv6 but most ARMv7 (Battlefield Bad Company 2, Fahrenheit, GTA:SA are some examples).

An interesting thing about this process is that Android has no support for hard-float ABIs, so we have a version of the SDK compiled entirely for softfp, which gives us compatible calling conventions end to end.

That works great until this one peculiar .so, which apparently was made for ARMv6. It seems to have its own understanding of calling conventions that is not compatible with what we can provide, but I can't understand why or how to fix it. In a nutshell, there are a number of VFP-related instructions that cause an Undefined instruction exception when run on our ARMv7-softfp device, the PSVita. Some examples of such instructions, taken from the Ghidra disassembly of this .so:

        0026f17c 00 4a 24 7e     vmulvc.f32 s8,s8,s0
        00264bc0 00 4a 24 ee     vmul.f32   s8,s8,s0
        00264bc4 4c 8a 38 ee     vsub.f32   s16,s24
        00264bc8 08 ca 04 ee     vmla.f32   s24,s8,s16
        0025e330 48 6a b0 ee     vmov.f32   s12,s16
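To double-check listings like these against the raw bytes, here is a minimal Python sketch that pulls the condition code and the S-register numbers out of a word. The helper name `decode_vfp_sp_regs` is made up for illustration, and it assumes the word is a three-register single-precision VFP data-processing encoding (as VMUL, VSUB and VMLA are); it does not validate the opcode bits.

```python
import struct

def decode_vfp_sp_regs(raw: bytes):
    """Return (cond, sd, sn, sm) for a little-endian ARM instruction word.

    Hypothetical helper: assumes a three-register single-precision VFP
    data-processing encoding and does not check the opcode bits.
    """
    (word,) = struct.unpack("<I", raw)
    cond = word >> 28                                        # 0xE = always, 0x7 = VC
    sd = (((word >> 12) & 0xF) << 1) | ((word >> 22) & 1)    # Vd:D
    sn = (((word >> 16) & 0xF) << 1) | ((word >> 7) & 1)     # Vn:N
    sm = ((word & 0xF) << 1) | ((word >> 5) & 1)             # Vm:M
    return cond, sd, sn, sm

# Bytes exactly as they appear in the listing (memory order).
print(decode_vfp_sp_regs(bytes.fromhex("004a24ee")))  # vmul.f32 s8,s8,s0
print(decode_vfp_sp_regs(bytes.fromhex("4c8a38ee")))  # vsub.f32 s16,s16,s24
```

Decoded this way, the two-operand `vsub.f32 s16,s24` shown by Ghidra is the three-operand form with the destination repeated as the first source.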

There are a few things I'm sure about with this problem:

  1. Only these instructions are a problem: if I force-patch them with NOPs (0xe1a00000), the game boots and works, but has some glitches apparently related to the missing math.
  2. The .so itself works fine on Android ARMv7 (Xperia Play). That leads me to think that I must be missing some compiler configuration.
  3. Apart from that apparently needed compiler configuration, my setup is perfectly fine, as I'm able to compile and run loaders for all our other ports done the same way.

Any thoughts about what can be done here?

  1. Patch these with working equivalents?
  2. Configure compiler with some missing flags?
  3. ???

P.S. For reference, compiler is GCC 10.3.0, complete toolchain can be found here https://github.com/vitasdk. Example of how so_loader works can be found on the creator's github here: https://github.com/TheOfficialFloW/gtasa_vita/

EDIT: As requested in the comments, I'm adding some examples that do work. These are taken from the same game, but from a newer version. This version doesn't have any Undefined instruction crashes at all, but unfortunately we can't use it because the developers stripped out physical controls support in the newer version :)

        003cafac 27 7a 67 ee     vmul.f32   s15,s14,s15
        001a253c 87 7a 46 ee     vmla.f32   s15,s13,s14
        001a2540 e8 7a f4 ee     vcmpe.f32  s15,s17
        001a2528 27 7a 67 ee     vmul.f32   s15,s14,s15
        000bf9d4 a7 5a 75 ee     vadd.f32   s11,s11,s15
        000bf9d8 64 7a f0 ee     vmov.f32   s15,s9

EDIT2: Further analysis showed that the difference between the two versions is that the broken one uses the VFP short vector feature. Specifically, you can see short vector mode being enabled with something like this in the crashing functions:

        00264a58 10 0a f1 ee     vmrs       param_1,fpscr
        00264a5c 37 08 c0 e3     bic        param_1,param_1,#0x370000=>LAB_00360000
        00264a60 07 08 80 e3     orr        param_1,param_1,#0x70000
        00264a64 10 0a e1 ee     vmsr       fpscr,param_1
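For reference, the masks in that sequence line up with two FPSCR fields: LEN in bits [18:16] and STRIDE in bits [21:20]. The sketch below (Python, with made-up helper names `apply_prologue` and `vector_config`) replays the `bic`/`orr` pair on a value and decodes the result, assuming the usual encoding where the vector length is LEN + 1 and STRIDE 0b00 means a stride of 1.

```python
def apply_prologue(fpscr: int) -> int:
    """Mirror the bic #0x370000 / orr #0x70000 pair from the listing."""
    return (fpscr & ~0x370000 & 0xFFFFFFFF) | 0x70000

def vector_config(fpscr: int):
    """Return (vector_length, stride) encoded in an FPSCR value.

    Hypothetical helper: stride is None for the reserved STRIDE encodings.
    """
    length = ((fpscr >> 16) & 0x7) + 1        # LEN field, bits [18:16]
    stride_field = (fpscr >> 20) & 0x3        # STRIDE field, bits [21:20]
    stride = {0b00: 1, 0b11: 2}.get(stride_field)
    return length, stride

print(vector_config(apply_prologue(0)))  # LEN = 0b111, STRIDE = 0b00
```

So the prologue clears both fields and then sets LEN to 0b111, i.e. it asks for vectors of 8 single-precision registers with a stride of 1.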

The PSVita's CPU is a Cortex-A9, and according to ARM's developer documentation:

The Cortex-A9 FPU hardware does not support the deprecated VFP short vector feature. Attempts to execute VFP data-processing instructions when the FPSCR.LEN field is non-zero result in the FPSCR.DEX bit being set and a synchronous Undefined instruction exception being taken. You can use software to emulate the short vector feature, if required.

So it seems we need to use software to emulate the short vector feature. Now the question is: how?
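To make concrete what such an emulation has to reproduce, here is a rough Python model of how one three-register single-precision instruction expands into scalar operations when LEN is non-zero. It reflects my understanding of the architectural rules (s0-s7 is the scalar bank, a destination in that bank makes the whole operation scalar, a last operand in that bank stays fixed, and register numbers wrap around inside their bank of 8). The function name `expand_short_vector` is hypothetical, and this is only a sketch of the semantics, not how any real emulator is implemented.

```python
def expand_short_vector(sd, sn, sm, length, stride=1):
    """List the scalar (sd, sn, sm) operations one VFP instruction stands for.

    Assumptions: single-precision registers, `length` is the vector length
    (FPSCR.LEN + 1) and `stride` is given in registers.
    """
    def bank(reg):
        return reg >> 3                       # s0-s7 -> 0, s8-s15 -> 1, ...

    def advance(reg):
        # Step within the bank, wrapping around at the bank boundary.
        return (reg & ~7) | ((reg + stride) & 7)

    if length == 1 or bank(sd) == 0:
        return [(sd, sn, sm)]                 # plain scalar operation

    m_is_scalar = bank(sm) == 0               # mixed scalar/vector form
    ops = []
    for _ in range(length):
        ops.append((sd, sn, sm))
        sd, sn = advance(sd), advance(sn)
        if not m_is_scalar:
            sm = advance(sm)
    return ops

# vmul.f32 s8,s8,s0 with a vector length of 8
for op in expand_short_vector(8, 8, 0, length=8):
    print("vmul.f32 s%d,s%d,s%d" % op)
```

Under this model, `vmul.f32 s8,s8,s0` with a vector length of 8 multiplies each of s8 to s15 by s0, which would explain why NOP-ing these instructions breaks the math rather than just one value.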


Solution

  • We ended up writing a library that catches the Undefined Instruction exception and generates code for VFP short vector emulation on the fly: https://github.com/bythos14/VFPVector/