I'm currently working through the online course material for MIT 6.006 for fun. I'm on problem set #2 (found here) and have a question about the calculation of the asymptotic rendering time for the Koch snowflake problem (problem #1).
According to the solutions, when the CPU is responsible for both the rendering and the calculation of coordinates, the asymptotic rendering time is smaller than if the process is split between the CPU and GPU. The math makes sense to me, but does anyone have an intuition about why this is true?
In my mind, the CPU still has to calculate the coordinates of the snowflake (Theta(4^n) time) and then render the image, so I would expect the two costs to be additive, not multiplicative.
However, the solutions treat them as multiplicative, so because each triangle/line segment gets shorter (in the last two subproblems of problem 1), the runtime drops to Theta((4/3)^n) or even Theta(1)!
I'm not a computer scientist--this stuff is just a fun hobby for me. I'd really appreciate an answer from one of you geniuses out there :)
Also, some fun I had while playing with the Python turtle module. Here's some very imperfect code to draw a Koch snowflake in Python:
import turtle

def snowflake(n, size=200):
    # Clear any previous drawing; ignore the error if there is no canvas yet
    # or the window has been closed.
    try:
        turtle.clear()
    except Exception:
        pass
    turtle.hideturtle()
    turtle.tracer(0, 0)   # disable animation so the figure appears all at once
    snowflake_edge(n, size)
    turtle.right(120)
    snowflake_edge(n, size)
    turtle.right(120)
    snowflake_edge(n, size)
    turtle.update()       # flush the buffered drawing to the screen

def snowflake_edge(n, size=200):
    # Draw one Koch edge of the given length, recursing n levels deep.
    if n == 0:
        turtle.forward(size)
    else:
        # Replace the edge with four edges, each one third as long.
        snowflake_edge(n - 1, size / 3.0)
        turtle.left(60)
        snowflake_edge(n - 1, size / 3.0)
        turtle.right(120)
        snowflake_edge(n - 1, size / 3.0)
        turtle.left(60)
        snowflake_edge(n - 1, size / 3.0)
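To try it out, something along these lines should work (turtle.exitonclick() just keeps the window open until you click it):

snowflake(4)          # depth-4 snowflake with 200-pixel sides
turtle.exitonclick()  # close the window with a click when finished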
As indicated by the comments on p. 5 of the problem set, the approaches taken by the CPU and the GPU are different.
The CPU-only (without hardware acceleration) approach is to first compute what needs to be drawn, and only then send it off to be drawn. Since we are assuming that the cost to rasterize is greater than the cost to gather the line segments, the time to draw the image will be dominated by the rasterization step, whose cost is proportional to the number of pixels it actually has to draw.
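To put rough numbers on "proportional to the number of pixels it actually has to draw": here is my own back-of-the-envelope sketch, assuming the snowflake is kept at a fixed on-screen size (that assumption is mine for illustration, not necessarily the exact setup of every subproblem).

# Pixels in the FINAL image when the overall size is held fixed.
# At depth n there are 3 * 4**n segments, each size / 3**n pixels long,
# so the rasterizer only touches about 3 * size * (4/3)**n pixels --
# far fewer than the Theta(4**n) coordinates the CPU computed.
def final_outline_pixels(n, size=200):
    segments = 3 * 4 ** n           # segments in the depth-n snowflake
    segment_length = size / 3 ** n  # length of each segment in pixels
    return segments * segment_length

for n in range(8):
    print(n, 3 * 4 ** n, round(final_outline_pixels(n), 1))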
The CPU-GPU (with hardware acceleration) approach computes all the triangles, big and small, and then sends them to the GPU to draw the big triangle, delete the middle pixels, draw the smaller triangles, delete the unneeded pixels, and so on. Since drawing takes a long time, the time the GPU spends drawing and erasing will dominate the total; and since most (asymptotically, 100%) of the pixels that are drawn end up being erased, the total time is substantially more than what the number of pixels in the final image alone would suggest.
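One rough way to see the waste (again my own illustration, not the problem set's accounting): the GPU is handed every triangle generated along the way, and there are exactly 4^n of them after n iterations, so even before counting the pixels that get drawn and later erased, its workload scales with 4^n rather than with the much smaller final image.

# Count the triangles the GPU is asked to process at each level of the
# recursion: 1 big one, then 3, 12, 48, ... new ones per level.
# The running total works out to exactly 4**n, so if each triangle costs
# the GPU at least some constant amount of work (plus the pixels it draws
# and later erases), the hardware-accelerated path can't beat Theta(4**n).
def triangles_sent(n):
    total = 1                           # the initial big triangle
    for level in range(1, n + 1):
        total += 3 * 4 ** (level - 1)   # new triangles added at this level
    return total

for n in range(8):
    print(n, triangles_sent(n), 4 ** n)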
In other words, the hardware-accelerated scenario dumps most of the work onto the GPU, which is much slower than the CPU. That would be fine if the CPU had other things to work on while the GPU did its processing; however, that is not the case here, so the CPU just wastes cycles while the GPU does its drawing.