Tags: compiler-construction, constraints, virtualization, native, aot

Make a compiled binary run at native speed flawlessly on another system without recompiling from source?


I know that many people, at first glance, may immediately yell out "Java", but no, I know Java's qualities. Allow me to elaborate on my question first.

Normally, when we want our program to run at native speed on a system, whether it be Windows, Mac OS X, or Linux, we need to compile it from source. If you want to run a program built for another system on your system, you need to use a virtual machine or an emulator. While these tools let you use the program you need on a non-native OS, they sometimes suffer from performance problems and glitches.

We also have JIT compilers, which translate a bytecode program into native machine language just before execution. Performance can improve to a very good extent with a JIT compiler, but it is still not the same as running code compiled natively for the system.
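
For concreteness, here is my rough sketch of the core JIT idea, assuming x86-64, a POSIX host with mmap, and the System V calling convention: machine code is written into executable memory at runtime and then called like an ordinary function.

    /* Minimal sketch of the core JIT idea: emit native machine code into
     * executable memory at runtime, then call it like an ordinary function.
     * Assumes x86-64 and a POSIX host with mmap(); a real JIT adds a bytecode
     * front end, optimization, code caching, and so on. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* x86-64 machine code for: mov eax, edi ; add eax, esi ; ret
         * i.e. int add(int a, int b) under the System V ABI. */
        unsigned char code[] = { 0x89, 0xF8, 0x01, 0xF0, 0xC3 };

        /* Ask the OS for memory we are allowed to write and then execute. */
        void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return 1;

        memcpy(mem, code, sizeof code);

        /* Treat the buffer as a function pointer and run the generated code. */
        int (*add)(int, int) = (int (*)(int, int))mem;
        printf("%d\n", add(2, 3));   /* prints 5 */

        munmap(mem, sizeof code);
        return 0;
    }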

Another program on Linux, WINE, is also a good tool for running Windows programs on a Linux system. I tried running Team Fortress 2 on it and experimented with some settings. I got ~40 fps on Windows at mid-high settings at 1280 x 1024. On Linux, I need to turn everything down to low at 1280 x 1024 to get ~40 fps. There are two notable things, though:

  1. Polygon model settings do not seem to affect the framerate, whether I set them low or high.
  2. When there are post-processing effects, or special effects that require manipulating the drawn pixels of the current frame, the framerate drops to 10-20 fps.

From this, I can see that normal polygon rendering is just fine, but when it comes to newer rendering methods that require the graphics card to do the job, it slows down to a crawl.

Anyway, this question is rather theoretical. Is there anything we can do at all? I see that WINE can run Steam and Team Fortress 2; although there are flaws, they can run at lower settings. Or perhaps I should also ask, "Is it possible to translate a whole program from one system to another without recompiling from source and still get native speed?" I see that we also have AOT compilers; is it possible to use one for something like this? Or are there so many constraints (such as DirectX calls or differences in software architecture) that it is impossible for a program that is not native to the system to run flawlessly at native speed?


Solution

  • The first step to running the same compiled body of code on multiple systems at native speed without recompiling is to choose one processor instruction set and throw out all other systems. If you pick Intel, then you must throw out ARM, MIPS, PowerPC, and so forth because the native machine code instructions for one architecture are completely unintelligible to other processors.

    Ok. So now the task is to run the same body of compiled native code on multiple systems (all using the same processor architecture) at native speed without recompiling. So basically, you want to run the same code under different operating systems on the same hardware.

    If the hardware is the same and the only difference is the operating system, then the trivial answer is yes, you can do it if you can write your code without making any calls to the operating system. No memory allocation. No console output. No file I/O. No network I/O. No fun.
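
    As a tiny illustration, such "OS-free" code looks like this: pure computation over registers and whatever memory it is handed, never a system call (the function names here are made up for the example):

        /* Pure computation, no OS calls at all: no allocation, no console,
         * no file or network I/O. Nothing here depends on which operating
         * system runs the bytes, only on the CPU architecture. */
        static unsigned fib(unsigned n)
        {
            unsigned a = 0, b = 1;
            while (n--) {
                unsigned t = a + b;
                a = b;
                b = t;
            }
            return a;
        }

        static unsigned checksum(const unsigned char *buf, unsigned len)
        {
            unsigned sum = 0;
            while (len--)
                sum = sum * 31 + *buf++;
            return sum;
        }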

    Furthermore, your code will have to be written in such a way that it does not require address relocation fixups, since each operating system has a different way of representing relocatable code. One way to do that is to arrange your code on disk exactly as it would appear in memory, including reserving space for writable data (global variables, stack, and heap). Then all you have to do to run the code is copy the file bytes into memory at a predefined base address and jump to the starting address.
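
    A rough sketch of such a load-and-jump scheme, written here for a POSIX host (a Windows loader would use CreateFile/VirtualAlloc instead), assuming the image file is laid out exactly as it should appear in memory, reserves its own writable data space, and starts execution at its first byte:

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        #define IMAGE_BASE ((void *)0x10000000)   /* predefined base address */

        int main(int argc, char **argv)
        {
            if (argc != 2) {
                fprintf(stderr, "usage: %s image.bin\n", argv[0]);
                return 1;
            }

            int fd = open(argv[1], O_RDONLY);
            if (fd < 0)
                return 1;

            struct stat st;
            fstat(fd, &st);

            /* Reserve memory at the base address the image expects. */
            unsigned char *mem = mmap(IMAGE_BASE, st.st_size,
                                      PROT_READ | PROT_WRITE | PROT_EXEC,
                                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                                      -1, 0);
            if (mem == MAP_FAILED)
                return 1;

            /* Copy the file bytes in as-is: no relocation, no fixups. */
            read(fd, mem, st.st_size);
            close(fd);

            /* Jump to the start of the image. */
            ((void (*)(void))mem)();
            return 0;
        }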

    The MSDOS .com executable file format has been doing this since at least 1981, and CP/M did it long before that.

    However, MSDOS didn't have today's virus scanners to contend with back then. Virus scanners get very excited when anyone other than the host OS loads file data into memory and attempts to execute that memory. Because, ya know, that's exactly what viruses do.

    Since each OS has its own executable file format, you'll also need to figure out how to get your block of "flawless" native code into memory on all these different operating systems. You will need at a minimum one program loader compiled for each operating system you want to run your block of native code in. While you're writing a program loader for each OS you want to target, you could also define your own file I/O functions that map to the OS native equivalents so that your block of native code can do file I/O on any system. Ditto for console I/O or graphics output.
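
    A sketch of the kind of API table such a loader could hand to the block of native code, so that the code itself never calls the host OS directly (the structure and names are invented for this example; WINE's real mechanism, implementing the Win32 DLL interfaces, is vastly larger):

        #include <stdio.h>

        /* One function pointer per service the native code is allowed to use. */
        struct host_api {
            void *(*open)(const char *path, int writable);
            long  (*read)(void *file, void *buf, long len);
            long  (*write)(void *file, const void *buf, long len);
            void  (*close)(void *file);
            void  (*print)(const char *text);        /* console output */
        };

        /* Implementations supplied by the stdio-based (POSIX) build of the
         * loader; the Windows build would wrap CreateFile/ReadFile/etc. */
        static void *stdio_open(const char *path, int writable)
        { return fopen(path, writable ? "r+b" : "rb"); }
        static long stdio_read(void *f, void *buf, long len)
        { return (long)fread(buf, 1, (size_t)len, f); }
        static long stdio_write(void *f, const void *buf, long len)
        { return (long)fwrite(buf, 1, (size_t)len, f); }
        static void stdio_close(void *f) { fclose(f); }
        static void stdio_print(const char *text) { fputs(text, stdout); }

        static const struct host_api stdio_api = {
            stdio_open, stdio_read, stdio_write, stdio_close, stdio_print
        };

        /* The loader would pass &stdio_api to the entry point of the loaded
         * image, e.g.:
         *     void (*entry)(const struct host_api *) = image_start;
         *     entry(&stdio_api);
         */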

    Oh wait - that's exactly what WINE does.

    That's also why the frame rates you see in WINE are so much lower than the same operations in the host OS: WINE is translating Win32 GDI graphics calls into something provided by the native host OS (Linux -> X Windows). Where there isn't an exact function match, or where the operation semantics don't line up (which is frequently the case), WINE has to implement the functionality itself, sometimes at great cost.

    But given the ubiquity of standardized hardware like IDE drives, USB devices, and BIOS functions, maybe you don't need to go to all the trouble of mapping your own portable APIs onto whatever the OS has built in. Just write a little code to do file I/O to IDE devices, and do graphics output using VESA BIOS functions. If you abstract the code a little bit, you can support multiple kinds of hardware and pick the appropriate function pointer to use based on what hardware you find at runtime.
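
    A sketch of that last idea: pick the function pointers at runtime based on what hardware the probe finds (the drivers here are stubs and all names are invented for the example):

        #include <stdio.h>

        struct video_driver {
            const char *name;
            int  (*probe)(void);                      /* 1 if the hardware is present */
            void (*put_pixel)(int x, int y, unsigned color);
        };

        /* Stub "drivers"; real ones would talk to the VESA BIOS or VGA registers. */
        static int  vesa_probe(void) { return 0; }    /* pretend no VESA support found */
        static void vesa_put_pixel(int x, int y, unsigned c) { (void)x; (void)y; (void)c; }
        static int  vga_probe(void)  { return 1; }    /* plain VGA as the fallback */
        static void vga_put_pixel(int x, int y, unsigned c)  { (void)x; (void)y; (void)c; }

        static const struct video_driver drivers[] = {
            { "vesa", vesa_probe, vesa_put_pixel },
            { "vga",  vga_probe,  vga_put_pixel  },
        };

        static const struct video_driver *video;      /* the driver picked at runtime */

        int video_init(void)
        {
            for (unsigned i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
                if (drivers[i].probe()) {             /* first driver whose probe succeeds */
                    video = &drivers[i];
                    printf("using %s driver\n", video->name);
                    return 0;
                }
            }
            return -1;                                /* no supported hardware found */
        }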

    Then you could truly run your block of native code on any system (using one particular processor architecture) at native speed without recompiling.

    Oh wait - you just wrote your own OS. ;>