
How is the GPU "instructed" to render an image?


If this question is off, please let me know as I don't want to clutter the platform with off-topic questions!

Anyway, I'm having a hard time finding information about what's actually going on when an image is rendered, something I started wondering about because of some code I've written.

Say I wanted to add the numbers 5 and 3. The CPU would write 5 to one register and 3 to another; the ALU would take care of the calculation and output 8. That's fine: the CPU uses MOV and ADD instructions to produce the result.

What I can't find any information on, however, is what goes on when I want to draw a rectangle. There are importable frameworks for most programming languages that let you do this. In SpriteKit (Swift & Obj-C), for example, you would write something like

let node = SKSpriteNode(color: .white, size: CGSize(width: 200, height: 300))

and add node to an SKScene (just a scene containing child nodes), and a white rectangle would "magically" get rendered. What I would like to know is what goes on under the hood. Why does this particular framework let you draw a rectangle? What is the assembly code (say, for an Intel Core M machine) that makes the GPU calculate what this rectangle will look like? And how does SpriteKit build on the basics of Swift/Objective-C to actually do this (and could I do it myself)?

Maybe a weird question, but I feel like I have to know (yes, sometimes I'm too curious). Thank you.

P.S. I would love a really detailed answer, not "the CPU 'tells' the GPU to draw a rectangle" - CPUs can't talk!


Solution

  • There are many ways to render a convex polygon. The most common in the past was the scanline algorithm, where you simply rasterize all the edges of the outline into left/right buffers and then fill with horizontal lines, interpolating the other coordinates along the way (like z, r, g, b, tx, ty, nx, ny, nz, ...). This was well suited to single-threaded, CPU-based software rendering.
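    To make that concrete, here is a minimal CPU-side sketch of the scanline idea in Swift. The Point struct and the plot callback are made up for this example, and clipping, attribute interpolation and edge cases are left out:

    struct Point { var x: Double; var y: Double }

    // Fill a convex polygon by rasterizing its edges into per-scanline
    // left/right bounds, then drawing horizontal spans between them.
    func fillConvexPolygon(_ poly: [Point], height: Int, plot: (Int, Int) -> Void) {
        guard poly.count >= 3 else { return }

        var left  = [Double](repeating:  Double.infinity, count: height)
        var right = [Double](repeating: -Double.infinity, count: height)

        // Rasterize every edge of the outline into the left/right buffers.
        for i in 0..<poly.count {
            var a = poly[i]
            var b = poly[(i + 1) % poly.count]
            if a.y > b.y { swap(&a, &b) }              // walk edges top to bottom
            if a.y == b.y { continue }                 // horizontal edge: nothing to step
            let y0 = max(Int(a.y.rounded(.up)), 0)
            let y1 = min(Int(b.y.rounded(.down)), height - 1)
            if y1 < y0 { continue }                    // edge crosses no scanline
            let dxdy = (b.x - a.x) / (b.y - a.y)       // x step per scanline
            for y in y0...y1 {
                let x = a.x + (Double(y) - a.y) * dxdy
                left[y]  = min(left[y],  x)
                right[y] = max(right[y], x)
            }
        }

        // Fill horizontal spans between the stored bounds.
        for y in 0..<height where left[y] <= right[y] {
            for x in Int(left[y].rounded())...Int(right[y].rounded()) {
                plot(x, y)
            }
        }
    }

    // The white 200x300 rectangle from the question is just a 4-vertex polygon:
    fillConvexPolygon([Point(x: 0, y: 0), Point(x: 200, y: 0),
                       Point(x: 200, y: 300), Point(x: 0, y: 300)],
                      height: 300) { x, y in /* write a white pixel at (x, y) */ }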

    With parallelization (as on a GPU), a different approach became more popular. It renders only triangles (so you need to triangulate your polygons first) and works like this:

    1. compute the AABB

      This is simply the min/max of the x,y coordinates of the triangle's vertices.

    2. loop through the AABB

      This is done in parallel by the GPU's interpolators. Each interpolated (looped) "pixel" is called a fragment (as it usually carries more than just a color).

    3. for each fragment

      Compute its barycentric coordinates (s,t) and from the result decide whether the fragment is inside (s >= 0, t >= 0, s + t <= 1) or outside the triangle. If it is inside, invoke the fragment shader.

    All of this happens just before the fragment shader stage, and usually all of it (or most of it) is implemented directly in hardware, so there is no code for you to write for it.
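    Still, you can emulate what the hardware does with a small CPU-side sketch. The following Swift code walks through steps 1-3 above for a single triangle; shadeFragment is a made-up stand-in for the fragment shader:

    struct Vec2 { var x: Double; var y: Double }

    func rasterizeTriangle(_ a: Vec2, _ b: Vec2, _ c: Vec2,
                           shadeFragment: (Int, Int, Double, Double) -> Void) {
        // 1. AABB: min/max of the x,y coordinates of the triangle's vertices.
        let minX = Int(min(a.x, b.x, c.x).rounded(.down))
        let maxX = Int(max(a.x, b.x, c.x).rounded(.up))
        let minY = Int(min(a.y, b.y, c.y).rounded(.down))
        let maxY = Int(max(a.y, b.y, c.y).rounded(.up))

        // Determinant of the 2x2 system P - A = s*(B - A) + t*(C - A).
        let d = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y)
        if d == 0 { return }                           // degenerate triangle

        // 2. Loop through the AABB (the GPU does this part in parallel).
        for y in minY...maxY {
            for x in minX...maxX {
                let px = Double(x) - a.x
                let py = Double(y) - a.y
                // 3. Barycentric coordinates of this fragment.
                let s = (px * (c.y - a.y) - py * (c.x - a.x)) / d
                let t = (py * (b.x - a.x) - px * (b.y - a.y)) / d
                if s >= 0, t >= 0, s + t <= 1 {        // inside the triangle
                    shadeFragment(x, y, s, t)          // "invoke the fragment shader"
                }
            }
        }
    }

    The same (s,t) pair is what the interpolators use to blend the per-vertex attributes (depth, color, texture coordinates, ...) that the fragment shader receives.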

    Nowadays GPU rendering is done by passing the geometry to the graphics driver itself, through an API such as Metal, OpenGL, Vulkan or Direct3D. What the driver does under the hood is largely guesswork for us, but most likely it just moves the geometry and configuration settings to the right places on the GPU (memory, registers, ...).
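    For illustration, here is roughly what "passing geometry and configuration to the driver" looks like from the CPU side using Metal, which is what SpriteKit renders through on recent Apple systems (OpenGL on older ones). This is only a hedged sketch: the shader names, clip-space coordinates and texture size are made up, and error handling is reduced to force-unwraps. Everything below just fills buffers and descriptor objects; the driver turns them into GPU memory writes, register setup and a command stream, and the GPU then performs the rasterization described above.

    import Metal

    // Handle to the GPU, obtained through the driver.
    let device = MTLCreateSystemDefaultDevice()!
    let queue  = device.makeCommandQueue()!

    // Geometry: a rectangle as a triangle strip in clip space, copied into a GPU buffer.
    let quad: [Float] = [-0.5, -0.5,   0.5, -0.5,   -0.5, 0.5,   0.5, 0.5]
    let vertexBuffer = device.makeBuffer(bytes: quad,
                                         length: quad.count * MemoryLayout<Float>.stride,
                                         options: [])!

    // Shaders: tiny vertex/fragment programs that the driver compiles for the GPU.
    let source = """
    #include <metal_stdlib>
    using namespace metal;
    vertex float4 v_main(const device float2 *pos [[buffer(0)]], uint vid [[vertex_id]]) {
        return float4(pos[vid], 0.0, 1.0);
    }
    fragment float4 f_main() { return float4(1.0, 1.0, 1.0, 1.0); }   // plain white
    """
    let library  = try! device.makeLibrary(source: source, options: nil)
    let pipeDesc = MTLRenderPipelineDescriptor()
    pipeDesc.vertexFunction   = library.makeFunction(name: "v_main")
    pipeDesc.fragmentFunction = library.makeFunction(name: "f_main")
    pipeDesc.colorAttachments[0].pixelFormat = .bgra8Unorm
    let pipeline = try! device.makeRenderPipelineState(descriptor: pipeDesc)

    // An off-screen render target (a window's drawable would work the same way).
    let texDesc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                           width: 200, height: 300,
                                                           mipmapped: false)
    texDesc.usage = [.renderTarget]
    texDesc.storageMode = .private
    let target = device.makeTexture(descriptor: texDesc)!

    let passDesc = MTLRenderPassDescriptor()
    passDesc.colorAttachments[0].texture     = target
    passDesc.colorAttachments[0].loadAction  = .clear
    passDesc.colorAttachments[0].clearColor  = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    passDesc.colorAttachments[0].storeAction = .store

    // Record the commands and hand them to the driver; commit() is where the GPU takes over.
    let commands = queue.makeCommandBuffer()!
    let encoder  = commands.makeRenderCommandEncoder(descriptor: passDesc)!
    encoder.setRenderPipelineState(pipeline)
    encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
    encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
    encoder.endEncoding()
    commands.commit()
    commands.waitUntilCompleted()

    SpriteKit does essentially this on your behalf: each frame it batches its nodes into vertex buffers, picks the shaders and pipeline state, and submits command buffers like the one above, which is why a single SKSpriteNode line is enough to get a rectangle on screen.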