POSIX standards and semantics of semaphores

I'm trying to understand the semantics of semaphores.

Here is some code:

#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

sem_t sem;

void *helper(void *arg) {
        sem_wait(&sem);
        printf("helper woke up\n");
        return NULL;
}

int main() {
        sem_init(&sem, 0, 0);
        pthread_t tid;
        pthread_create(&tid, NULL, helper, NULL);
        sleep(1);
        sem_post(&sem);
        sem_wait(&sem);
        printf("main woke up\n");
        exit(0);
}

Here main creates a helper thread, sleeps for a second to be (almost) sure that the helper has run and waited on the semaphore, and then tries to post and wait on the semaphore in quick succession. According to POSIX (https://pubs.opengroup.org/onlinepubs/009695399/functions/sem_post.html):

If the value of the semaphore resulting from this operation is zero, then one of the threads blocked waiting for the semaphore shall be allowed to return successfully from its call to sem_wait().

So, I would expect "helper woke up" to be printed. However, if you actually run this code on Linux, "main woke up" is typically observed, meaning that by default semaphores on Linux are allowed to be not fair.

My question is: why is this behavior allowed? Yes, you can say "it's easier to implement" but that does not explain why semaphores have been designed with these particular semantics.

Elsewhere, (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Semaphore.html) it says that this behavior is preferable in some situations. In what situation would a non-fair semaphore be preferable? I'd appreciate any detailed source on this topic.

Edit: please don't make the sleep(1) the focus of the post. Assume that the helper thread waits on the semaphore (like it almost always does) in any way that you like (yield, barrier, no timer ticks, whatever). I'm interested in the semaphore semantics here.

Solution

One way that this can be preferable is that it reduces context switches. The thread that calls sem_post() is allowed to continue running without having to resume the thread blocked in sem_wait().

By the time the scheduler makes the decision of which thread to unblock, the main thread has called sem_wait(). Then it chooses which of the waiters to unblock. Then it can base this on scheduling priorities, as described in the rest of the paragraph that the quote was taken from, or choose arbitrarily. And it may choose the main thread to avoid a context switch.

Even though it's unlikely that it would take more than a second for the second thread to run, there are no requirements for it to run right away (unless scheduler priorities have been configured). So decisions like this can act as if there were an arbitrary delay in running the second thread. Optimizations like there's no observable difference in behavior due to timing windows like this.

An important thing to remember is that specifications like POSIX describe behavior in abstract terms, it's not trying to force a specific implementation.