I'm learning Java and the following things are a bit confusing for me. What I understood is:
Java Compiler → The Java compiler just converts .java files into .class files, i.e. it translates our source code into bytecode (a list of opcodes for the virtual machine (JVM), which is what makes Java platform-independent).
Java Interpreter → merely "interprets" the code and does not transform it into native machine code. It executes each instruction of the bytecode one by one as a command and carries it out, regardless of how many times the same instruction occurs. That's why it's slow, and why Java introduced the JIT concept.
JIT Compiler → This also comes into play at execution time. The JIT compiler improves performance by caching the translated form of blocks of bytecode and reusing it, rather than re-interpreting every instruction in the bytecode each time it occurs.
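A rough analogy of that caching idea in Java (purely illustrative — a real JIT emits native machine code, not lambdas, and the block ids here are made up): translate a block once into a callable, cache it, and on later executions run the cached form directly instead of re-interpreting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntSupplier;

public class JitCacheSketch {
    // Cache of already-"translated" blocks, keyed by a block id.
    static final Map<Integer, IntSupplier> cache = new HashMap<>();

    // Stand-in for translation: build a closure once that does the block's
    // work directly, with no per-instruction dispatch when it runs.
    static IntSupplier translate(int[] constants) {
        return () -> {
            int sum = 0;
            for (int c : constants) sum += c;
            return sum;
        };
    }

    static int execute(int blockId, int[] block) {
        // First call translates and caches; later calls reuse the cached form.
        return cache.computeIfAbsent(blockId, id -> translate(block)).getAsInt();
    }

    public static void main(String[] args) {
        int[] block = {1, 2, 3};
        System.out.println(execute(0, block)); // translated, then run: prints 6
        System.out.println(execute(0, block)); // cache hit, just run: prints 6
    }
}
```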
Now I have several questions:
Since my physical processor understands only native machine code, how does a Java program get executed by the JVM's interpreter? The interpreter doesn't convert bytecode to native machine code, and unless someone places machine code into memory, the physical processor can't execute it.
Supposing that, somehow, the interpreter also converts bytecode to native machine code, is "block-level execution with caching (JIT) versus line-by-line execution (interpreter)" the only thing that differentiates the JIT and the interpreter?
If, at execution time, a JIT compiler translates bytecode to native machine code (in order to execute the program), why doesn't Java use ahead-of-time compilation? After generating the JVM-dependent bytecode (which is what makes Java platform-independent), we could bring it to the target machine and translate it to native machine code there, producing an .exe or .out file as with C compilation. This should be possible because every system has its own specific JVM, and it would be much faster than JIT compilation, which spends time compiling and loading while the program runs. The program would still be platform-independent, since what we distribute is the bytecode (generated before the final translation from bytecode to machine code).
Disclaimer: Take all of this with a grain of salt; it's pretty oversimplified.
1: You are correct that the computer itself doesn't understand the code, which is why the JVM itself is needed. Let's pretend XY
means "add the top two elements on the stack and push the result". The JVM would then be implemented something like this:
for (byte bytecode : codeToExecute) {
    if (bytecode == XX) {
        // ...do stuff...
    } else if (bytecode == XY) {
        int a = pop();   // take the top two elements off the stack...
        int b = pop();
        push(a + b);     // ...and push their sum
    } else if (bytecode == XZ) {
        // ...do stuff...
    } // ... and so on for each possible instruction ...
}
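The sketch above can be made fully runnable by making the stack and the opcode values concrete (the opcode numbers here are hypothetical, not real JVM opcodes):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {
    static final byte PUSH = 0x01; // push the next byte as an int
    static final byte ADD  = 0x02; // pop two ints, push their sum

    public static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.length; pc++) {
            byte bytecode = code[pc];
            if (bytecode == PUSH) {
                stack.push((int) code[++pc]); // operand follows the opcode
            } else if (bytecode == ADD) {
                int a = stack.pop();
                int b = stack.pop();
                stack.push(a + b);
            } // ... more opcodes would go here ...
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Equivalent to "2 + 3": PUSH 2, PUSH 3, ADD
        byte[] program = {PUSH, 2, PUSH, 3, ADD};
        System.out.println(run(program)); // prints 5
    }
}
```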
The JVM implements each individual instruction in the computer's native machine code, and essentially looks up, for each chunk of bytecode, how to execute it. By JITting the code, you can achieve large speedups by omitting this interpretation step (i.e. looking up how each and every instruction is supposed to be handled). That, and optimization.
2: The JIT doesn't really run the code; everything is still run inside the JVM. Basically, the JIT translates a chunk of bytecode into machine code when appropriate. When the JVM then comes across it, it thinks "Oh hey, this is already machine code! Sweet, now I won't have to carefully check each and every byte of this as the CPU understands it on its own! I'll just pump it through and everything will magically work on its own!".
3: Yes, it is possible to pre-compile code in that way to avoid the early overhead of interpretation and JITting. However, by doing so, you lose something very valuable. You see, when the JVM interprets the code, it also keeps statistics about everything. When it then JITs the code, it knows how often different parts are used, allowing it to optimize it where it matters, making the common stuff faster at the expense of the rare stuff, yielding an overall performance gain.
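A toy sketch of that profiling idea (the threshold and names are made up; a real JVM such as HotSpot counts invocations and loop back-branches, and compiles hot code to native machine code, not to a lambda):

```java
import java.util.function.IntUnaryOperator;

public class HotnessSketch {
    static final int THRESHOLD = 3;          // hypothetical compile threshold
    static int invocations = 0;
    static IntUnaryOperator compiled = null; // null = not JIT-compiled yet

    // Interpreted path: imagine a dispatch loop here; it just doubles x.
    static int interpret(int x) { return x * 2; }

    static int call(int x) {
        if (compiled != null) return compiled.applyAsInt(x); // fast path
        if (++invocations >= THRESHOLD) {
            // The profile says this code is hot, so "compile" it. A real
            // JVM would generate optimized native code at this point.
            compiled = y -> y * 2;
        }
        return interpret(x);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(call(5) + (compiled != null ? " (compiled)" : " (interpreted)"));
        }
    }
}
```

The point of the counter is exactly the statistics-gathering described above: only code that actually proves itself hot pays the cost of compilation, and the optimizer knows where that cost is worth it.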