Tags: c++, linux, macos, performance, sdl

Why is SDL so much slower on Mac than Linux?


I am working on a single-threaded graphical program that renders using SDL2. See the end for a smaller example.

It runs on both an old Linux machine and a somewhat less old Mac. The Linux computer has 1.60 GHz processors, while the Mac's processors are 2.2 GHz. The SDL version on Linux is 2.0.8, while the SDL version on the Mac is 2.0.10. On both computers I compiled with clang++ using the optimization flags -O3 and -flto. I invoked the executable with ./intergrid -fullscreen -pixel-size 3 (essentially, I had the program draw a very large number of pixels).

For some reason, the slower Linux computer handled the program without breaking a sweat, while the Mac took several seconds to render the first frame. The Mac was faster than the Linux machine, as expected, when I used the -no-draw flag to disable graphics.

EDIT: The Linux computer has "Intel Haswell Mobile" for graphics and the Mac lists "Intel Iris Pro 1536 MB."

Here is a minimal reproducible example:

#include <SDL2/SDL.h>
#include <stdio.h>

int main(void)
{
    SDL_Init(SDL_INIT_VIDEO | SDL_INIT_TIMER);

    SDL_Window *window = SDL_CreateWindow(
        "Example",
        SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,
        0, 0,
        SDL_WINDOW_SHOWN);
    SDL_SetWindowFullscreen(window, SDL_WINDOW_FULLSCREEN_DESKTOP);

    SDL_Renderer *renderer = SDL_CreateRenderer(window, -1, 0);

    SDL_Rect viewport;
    SDL_RenderGetViewport(renderer, &viewport);

    // The screen is not rendered to unless this is done:
    SDL_Event event;
    while (SDL_PollEvent(&event))
        ;

    Uint32 ticks_before = SDL_GetTicks();
    for (int x = 0; x < viewport.w - 10; x += 10) {
        for (int y = 0; y < viewport.h - 10; y += 10) {
            // I just chose a random visual effect for this example.
            SDL_Rect square;
            square.x = x;
            square.y = y;
            square.w = 10;
            square.h = 10;
            SDL_SetRenderDrawColor(renderer, x % 256, y % 256, 255, 255);
            SDL_RenderFillRect(renderer, &square);
        }
    }
    Uint32 ticks_after = SDL_GetTicks();
    printf("Ticks taken to render: %u\n", ticks_after - ticks_before);

    SDL_RenderPresent(renderer);

    SDL_Delay(500);

    // I won't worry about cleaning things up here.
}

I compiled this on Mac and Linux with clang++ -O3 -flto <filename> -lSDL2. When I ran the program on the Mac, it printed:

Ticks taken to render: 849

The program on Linux printed:

Ticks taken to render: 4

That's a gigantic difference!


Solution

  • @keltar found a solution that is good enough for me, but they have not yet posted it as an answer, so I will. For some reason, SDL2's Metal back end is immensely slow, so the fix is to use the OpenGL back end instead. I accomplished this by calling SDL_SetHint(SDL_HINT_RENDER_DRIVER, "opengl") whenever I found that the default driver was Metal (using SDL_GetRendererInfo); a sketch of this is shown below.
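
Here is a minimal sketch of that workaround, assuming the standard SDL2 calls: create a renderer, query its name with SDL_GetRendererInfo, and if it reports the Metal driver, destroy it, set the SDL_HINT_RENDER_DRIVER hint to "opengl", and create the renderer again (the hint only takes effect at creation time). The helper name and the exact fallback flow are my own illustration, not something from @keltar's comment:

#include <SDL2/SDL.h>
#include <string.h>

// Hypothetical helper: prefer the OpenGL render driver when the default
// driver turns out to be Metal. The hint must be set before the renderer
// is created, so the Metal renderer is destroyed and a new one is made.
SDL_Renderer *create_renderer_preferring_opengl(SDL_Window *window)
{
    SDL_Renderer *renderer = SDL_CreateRenderer(window, -1, 0);
    if (renderer == NULL)
        return NULL;

    SDL_RendererInfo info;
    if (SDL_GetRendererInfo(renderer, &info) == 0 &&
        strcmp(info.name, "metal") == 0) {
        SDL_DestroyRenderer(renderer);
        SDL_SetHint(SDL_HINT_RENDER_DRIVER, "opengl");
        renderer = SDL_CreateRenderer(window, -1, 0);
    }
    return renderer;
}

In the minimal example above, the SDL_CreateRenderer(window, -1, 0) call would simply be replaced with a call to this helper.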