java multithreading matrix-multiplication java-threads

Java ThreadPool: limiting the maximum number of threads ever created

I am trying to write a multithreaded Java program that multiplies two matrices given as a file while using a limited total number of threads.

For example, if I set the number of threads to 16, I want my thread pool to reuse those 16 threads until all the tasks are done.

However, I end up with a longer execution time for a larger number of threads, and I am having a hard time understanding why.

Runnable:

class Task implements Runnable
{
    int _row = 0;
    int _col = 0;

    public Task(int row, int col)
    {
        _row = row;
        _col = col;
    }

    @Override
    public void run()
    {
        // Compute one cell of the result matrix
        Application.multMatrix(_row, _col);
    }
}

Application:

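import java.util.Scanner;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Application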
{
    private static Scanner sc = new Scanner(System.in);

    private static int _A[][];
    private static int _B[][];
    private static int _C[][];

    public static void main(final String [] args) throws InterruptedException
    {                
        ExecutorService executor = Executors.newFixedThreadPool(16);
        ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;

        _A = readMatrix();
        _B = readMatrix();
        _C = new int[_A.length][_B[0].length];

        long startTime = System.currentTimeMillis();
        for (int x = 0; x < _C.length; x++)
        {
            for (int y = 0; y < _C[0].length; y++)
            {
                executor.execute(new Task(x, y));
            }
        }
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.HOURS);
        // Stop the clock only after every submitted task has finished
        long endTime = System.currentTimeMillis();

        System.out.printf("Calculation Time: %d ms\n" , endTime - startTime);
    }

    public static void multMatrix(int row, int col)
    {
        int sum = 0;
        for (int i = 0; i < _B.length; i++)
        {
            sum += _A[row][i] * _B[i][col];
        }
        _C[row][col] = sum;
    }

    ...
}

The matrix calculations and workload sharing seem correct, so the problem might come from a misuse of the thread pool.

Solution

Context switching takes time. If you have 8 cores and run 8 threads, they can all work simultaneously, and as soon as one finishes a task it is reused for the next. If, on the other hand, you run 16 threads on 8 cores, the threads compete for processor time and the scheduler has to switch between them, so your total time grows to execution time + context switching time.

Once the thread count exceeds the number of cores, more threads mean more context switching, and hence the time increases.
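
As a rough illustration of the point above, here is a minimal sketch that sizes the pool to the number of available cores instead of a hard-coded 16. The class name PooledMatMul, the multiply helper, and the choice of one task per output row (rather than one per cell, as in the question) are my own assumptions for the example, not part of the original code. The timing is taken only after the pool has terminated.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, used only for this illustration
class PooledMatMul
{
    // Multiplies a by b using a fixed pool of `threads` worker threads.
    // Each task computes one full row of the result.
    static int[][] multiply(int[][] a, int[][] b, int threads)
    {
        int[][] c = new int[a.length][b[0].length];
        ExecutorService executor = Executors.newFixedThreadPool(threads);

        for (int row = 0; row < a.length; row++)
        {
            final int r = row;
            executor.execute(() -> {
                for (int col = 0; col < b[0].length; col++)
                {
                    int sum = 0;
                    for (int i = 0; i < b.length; i++)
                    {
                        sum += a[r][i] * b[i][col];
                    }
                    c[r][col] = sum;
                }
            });
        }

        executor.shutdown();
        try
        {
            // Block until all submitted tasks have finished
            executor.awaitTermination(1, TimeUnit.HOURS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        }
        return c;
    }

    public static void main(String[] args)
    {
        // Match the pool size to the number of cores the JVM reports
        int cores = Runtime.getRuntime().availableProcessors();
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};

        long start = System.nanoTime();
        int[][] c = multiply(a, b, cores);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(Arrays.deepToString(c));
        System.out.printf("Threads: %d, time: %d ms%n", cores, elapsedMs);
    }
}
```

Calling multiply with different values for threads on the same input is one way to check where the time stops improving on your machine. For CPU-bound work like this, that point is likely to be near the core count, though the exact figure depends on the hardware and the matrix size.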