Why do all threads always take the same time, why does the OS run all threads "in parallel"?

Why is the amount of time always equal for each thread, where all threads perform the same instruction?

I've set up a program to measure the time it takes for a thread to perform a certain operation (see below for more details). My MacBook has 4 physical or 8 logical cores (hyper threading available), so I'd assume I could run up to 8 threads in parallel.

I would expect the OS to run up to 8 threads in parallel, and, once finished, run the remaining threads. But, contrary to these expectations, it seems like all threads (no matter how many) always seems to take the same time. Why does the OS split them up like that, why does it start all of the threads instead of finishing say the first 8 threads first? Why does it try to run all of them "in parallel" (I know it actually can't run more than 8 in parallel)?

Also see the table below; each bar represents the average time for n threads, n going from 1 to 17. Surprisingly, the average time equals the time for each thread (and that's what I find odd/don't understand - why do they all take the same time?):

This is the piece of code I used:

#include <stdio.h>
#include <sys/time.h>
#include <pthread.h>

#define MAX 1000000000

void * calculate (void * val) {
  /* only time measurement ... */
  long a = 0, start, end;
  struct timeval timecheck;

  gettimeofday(&timecheck, NULL);
  start = (long)timecheck.tv_sec * 1000 + (long)timecheck.tv_usec / 1000;

  /* actual calculation (just for testing) */
  for (unsigned i = 1; i <= MAX; ++i)
    a += i;

  gettimeofday(&timecheck, NULL);
  end = (long)timecheck.tv_sec * 1000 + (long)timecheck.tv_usec / 1000;

  printf("%ld; time: %ldms\n", a, (end - start));

  return NULL;
}


void main (int argc, char ** argv) {
  pthread_t thread_1, thread_2, thread_3, thread_4; /* thread_5, ...*/
  long a = 0, start, end;
  struct timeval timecheck;

  gettimeofday(&timecheck, NULL);
  start = (long)timecheck.tv_sec * 1000 + (long)timecheck.tv_usec / 1000;

  if (pthread_create(&thread_1, NULL, &calculate, NULL) != 0)
    perror("Couldn't create thread 1");
  /* ... repeat the above two lines for each thread ... */

  pthread_join(thread_1, NULL);
  /* ... repeat the line above for each thread ... */

  gettimeofday(&timecheck, NULL);
  end = (long)timecheck.tv_sec * 1000 + (long)timecheck.tv_usec / 1000;

  printf("total time: %ldms\n", (end - start));
}

Solution

why does it start all of the threads instead of finishing say the first 8 threads first?

You asked it to create and start 17 threads. So it created and started 17 threads.

It has no way to know that it would be safe to run 8 to completion before scheduling the others. It has even less of an idea that this would be beneficial. (Imagine if the threads spend most of their time blocked.) But most importantly, it doesn't even know if the threads will ever finish at all!

Perhaps you'd like to use a thread pool model, where you have a limited number of threads being fed work from a shared queue.

Example of a thread pool (in C)

Example of a thread pool (in Perl):

use threads;

use Thread::Queue qw( );

use constant NUM_WORKERS => 10;

sub work {
   my $job = shift;
   ...
}

my $q = Thread::Queue->new();  # A thread-safe queue.

# Create worker threads.
my @threads;
for (1..NUM_WORKERS) {
   push @threads, async {
      while ( defined( my $job = $q->dequeue() ) ) {
         work($job);
      }
   };
}

# Feed them work
for my $job (...) {
    $q->enqueue($job);
}

# $q->dequeue normally blocks when the queue is empty.
# This causes it to return an undef value instead.
$q->end();  

# Wait for the workers to complete the work.
$_->join() for @threads;

There are thread pool libraries for C. (And for Perl too, for that matter.)