Vulkan API calls to GPU drivers

Background:
I have been eyeing writing an application which needs very basic but fast graphics (just drawing lines and squares), and I'm probably going to use a library such as GLFW, or Vulkano if i'm going with Rust.

I want to understand a specific, and I guess quite practical, detail of the Vulkan API. I understand that GPUs can be quite a complicated topic, but I want to emphasize that I don't have any background in low-level graphics or Vulkan, so I understand if my question cannot be answered, or if my question does not even make sense. I'll try my best to use the correct terminology. I have to admit, I'm not the best at skimming through and looking at large amounts of source code I don't quite understand and still grasp the overall concept, which is why I hope I can find my answer here. I've tried looking at the source code for Vulkan and Mesa drivers, but it bore no fruit.

ORIGINAL Question:
I want to understand how an API call is propagated to the GPU driver.

I have searched around, but couldn't find the specifics I am searching for. The closest posts I've found are these two:
https://softwareengineering.stackexchange.com/questions/279069/how-does-a-program-talk-to-a-graphics-card
https://superuser.com/questions/461022/how-does-the-cpu-and-gpu-interact-in-displaying-computer-graphics

They both mention something similar to "In order to make the GPU do something, you have to make a call via a supported API". I know that, but neither of the two dig into the specifics of how that API call is made. Hopefully, the diagram below illustrates my question.

MyVulkanProgram.c with "#include <vulkan/vulkan.h>"
|
| (Makes call via Vulkan API)
v
This is the part I don't understand!
|
v
Driver (Mesa, for example) takes the request sent via the Vulkan API.
|
| (Driver asks GPU to perform task)
v
GPU does task

I don't care what or how the GPU does something. Just how it is invoked through the API call via Vulkan and how it propagates through the system. Ideally what I'm looking for is a code-snippet or link to where in the Vulkan source code the actual request is sent to the driver.

Or have I gotten it all wrong? Is Vulkan more part of the driver than I realize? Is it maybe the case that the driver includes the same Vulkan header as my "MyVulkanProgram.c" and the driver is linked together with library files such as libvulkan.so et al? Is it more like the diagram below?

MyVulkanProgram.c with "#include <vulkan/vulkan.h>"
|
| (Makes call via Vulkan API)
v
Driver (Mesa, for example, which includes the vulkan headers and is linked with the Vulkan shared object-files) takes the request sent via the Vulkan API.
|
| (Driver asks GPU to perform task)
v
GPU does task

Might be a basic question, might not be, but I'm confused nonetheless. Very thankful for any answers!

UPDATED Question:
After having read the answer from @krOoze (answer from krOoze), and given the "Vulkan loader" overview figure in the mentioned document, I can more precisely express my question.

How does an application, making a call via the Vulkan API, reach the ICD via the Vulkan loader?

Solution

You are looking for the Vulkan-Loader/LoaderAndLayerInterface.md documentation.

The app interfaces with The Loader (sometimes called Vulkan RT, or Vulkan Runtime). That is the vulkan-1.dll (or so).

The Loader also has vulkan-1.lib, which is classic dll shim. It is where the loading of core version and WSI commands happens, but you can skip the lib and do it all manually directly from the dll using vkGetInstanceProcAddr.

Then you have ICDs (Installable Client Drivers). Those are something like nvoglv64.dll, and you can have more of them on your PC (e.g. Intel iGPU + NV). The name is arbitrary and vendor specific. The Loader finds them via config files.

Now when you call something to a command obtained with vkGetInstanceProcAddress (which is everything if you use the *.lib only), you get onto a loader trampoline, which calls a chain of layers, after which the relevant ICD (or all of them) are called. Then the callstack is unwound, so it goes the other direction until the returned to the app. The loader mutexes and merges the the input and output to the ICD.

Commands obtained with vkGetDeviceProcAddress are little bit more streamlined, as they do not require to be mutexed or merged and are meant to be passed to the ICD without much intervention from the Loader.

The code is also at the same repo: trampoline.c, and loader.c. It's pretty straightforward; every layer just calls the layer below it. Starts at the trampoline, and ends with the terminator layer which in turn will call the ICD layer.