Database benchmark: weird results when testing concurrency (ExecutorService)

I am currently developing a Java benchmark to evaluate some use cases (inserts, updates, deletes, etc.) with an Apache Derby database.

My implementation is the following:

After warming up the JVM, I execute a series (a for loop of 100k to 1M iterations) of, let's say, INSERT statements into a single table (for now) of a database. Since this is Apache Derby, for those who know it, I test every mode (In memory/Embedded, In memory/Network, Persistent/Embedded, Persistent/Network).

The process may run single-threaded or multi-threaded (using Executors.newFixedThreadPool(poolSize)).

Here is my problem:

When I execute the benchmark with only 1 thread, I get fairly realistic results:

In memory/embedded[Simple Integer Insert] : 35K inserts/second (1 thread)

Then I run the benchmark with 1 thread and then with 2 (concurrent) threads, one run after the other.

Now I get the following results:

In memory/embedded[Simple Integer Insert] : 21K inserts/second (1 thread)
In memory/embedded[Simple Integer Insert] : 20K inserts/second (2 threads)

Why do the results for 1 thread change so much?

Basically, I start the timer just before the loop and stop it just after:

// Processing
long start = System.nanoTime();

for (int i = 0; i < loopSize; i++) {
    process();
}
// end timer
long absTime = System.nanoTime() - start;
double absTimeMilli = absTime * 1e-6;

and the process() method :

private void process() throws SQLException {
    PreparedStatement ps = clientConn.prepareStatement(query);
    ps.setObject(1, val);
    ps.execute();
    clientConn.commit();
    ps.close();
}

Since the runs are processed sequentially, the rest of my code (data handling) should not affect the benchmark, should it?

The results get worse as the number of threads in the sequential runs grows (1, 2, 4, 8, for example).

I am sorry in advance if this is confusing. If needed, I'll provide more information or re-explain it!

Thank you for your help :)

EDIT:

Here is the method (from the Usecase class) that calls the aforementioned execution:

@Override
public ArrayList<ContextBean> bench(int loopSize, int poolSize) throws InterruptedException, ExecutionException {
    Future<ContextBean> t = null;
    ArrayList<ContextBean> cbl = new ArrayList<ContextBean>();

    try {

        ExecutorService es = Executors.newFixedThreadPool(poolSize);


        for (int i = 0; i < poolSize; i++) {
            BenchExecutor be = new BenchExecutor(eds, insertStatement, loopSize, poolSize, "test-varchar");
            t = es.submit(be); 
            cbl.add(t.get());
        }

        es.shutdown();
        es.awaitTermination(Long.MAX_VALUE,TimeUnit.MILLISECONDS);

    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return cbl;
}

Solution

On simple operations, most databases behave as you described.

The reason is that all the threads you are spawning try to operate on the same table (or set of tables), so the database must serialize their access.
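The effect of serialized access can be sketched without a database at all. The example below is only an illustration under stated assumptions: `TABLE_LOCK` and `insert()` are hypothetical stand-ins for a single table whose writes must take the same lock, not Derby's actual locking. Each worker does `loopSize` "inserts" and the wall-clock time of the whole batch gives the total rate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedTableContention {
    // Hypothetical stand-in for one table: every "insert" needs this same lock.
    private static final Object TABLE_LOCK = new Object();
    private static long rows = 0;

    static void insert() {
        synchronized (TABLE_LOCK) {
            rows++;
        }
    }

    static long rows() {
        synchronized (TABLE_LOCK) {
            return rows;
        }
    }

    // Runs poolSize workers, each doing loopSize inserts,
    // and returns the total number of inserts per second.
    static double run(int poolSize, int loopSize) throws Exception {
        ExecutorService es = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < poolSize; t++) {
                tasks.add(() -> {
                    for (int i = 0; i < loopSize; i++) {
                        insert();
                    }
                    return null;
                });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every worker has finished.
            for (Future<Void> f : es.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown by a worker
            }
            long elapsed = System.nanoTime() - start;
            return (double) poolSize * loopSize / (elapsed * 1e-9);
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 2, 4}) {
            System.out.printf("%d thread(s): %.0f inserts/second%n",
                    threads, run(threads, 1_000_000));
        }
    }
}
```

Because every worker queues on the same lock, adding threads to this sketch should not be expected to multiply the total rate; the exact numbers depend on the machine and the JVM.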

In this situation every thread works a little slower, but the overall result is a (small) gain: 21K + 20K = 41K inserts/second in total, against 35K for the single-threaded version.
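The arithmetic behind that comparison, spelled out as a small sketch. It assumes, as the paragraph above does, that the 21K and 20K figures are per-thread rates from the same concurrent run; the class and method names are made up for illustration.

```java
class ThroughputGain {
    // Aggregate throughput is the sum of the per-thread rates.
    static double aggregate(double... perThreadRates) {
        double total = 0;
        for (double rate : perThreadRates) {
            total += rate;
        }
        return total;
    }

    // Relative gain of the aggregate rate over the single-threaded baseline.
    static double relativeGain(double baseline, double aggregateRate) {
        return (aggregateRate - baseline) / baseline;
    }

    public static void main(String[] args) {
        double total = aggregate(21_000.0, 20_000.0);
        System.out.printf("aggregate: %.0f inserts/second%n", total);
        System.out.printf("gain over baseline: %.1f%%%n",
                100 * relativeGain(35_000.0, total));
    }
}
```

With the numbers from the question, the aggregate is 41,000 inserts/second, roughly a 17% gain over the 35,000 measured with one thread.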

The gain shrinks (often steeply) as the number of threads grows, and eventually you may see a net loss due to lock escalation (see https://dba.stackexchange.com/questions/12864/what-is-lock-escalation).

Generally, a multithreaded solution gains the most when its performance is not bound by a single resource but by multiple factors (e.g. calculations, selects on multiple tables, inserts into different tables).
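One more thing worth checking in the bench method from the question: as posted, it calls t.get() right after each es.submit(be). Future.get() blocks until that task has finished, so the next task is only submitted once the previous one is done and the workers never overlap. Below is a minimal sketch of submitting everything first and collecting the results afterwards. The `worker` Callable is a hypothetical stand-in for BenchExecutor and returns a plain count instead of a ContextBean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentSubmit {
    // Hypothetical stand-in for BenchExecutor: performs loopSize units of work
    // and returns how many it completed.
    static Callable<Long> worker(int loopSize) {
        return () -> {
            long done = 0;
            for (int i = 0; i < loopSize; i++) {
                done++;
            }
            return done;
        };
    }

    static List<Long> bench(int loopSize, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService es = Executors.newFixedThreadPool(poolSize);
        try {
            // First submit every task so that they can run at the same time...
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < poolSize; i++) {
                futures.add(es.submit(worker(loopSize)));
            }
            // ...and only then block on the results.
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bench(100_000, 4));
    }
}
```

With this shape the pool actually has poolSize tasks in flight, so the measured rates reflect concurrent access rather than back-to-back single-threaded runs.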