Tags: java, mysql, multithreading, executors

Help with Java threads or executors: executing several MySQL SELECTs, INSERTs and UPDATEs simultaneously


I'm writing an application to analyse a MySQL database, and I need to execute several DML statements simultaneously; for example:

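// rsA holds the result of: SELECT * FROM A
rsA.beforeFirst();
while (rsA.next()) {
   int id = rsA.getInt("id");
   // Retrieve data from table B; psB (hypothetical name) is a PreparedStatement
   // for "SELECT * FROM B WHERE B.Id = ?", prepared once before the loop
   psB.setInt(1, id);
   try (ResultSet rsB = psB.executeQuery()) {
      // Crunch some numbers using the data from B
   } // rsB is closed automatically here
}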

I'm declaring an array of data objects, each with its own Connection to the database, and each one calls several methods for the data analysis. The problem is that all threads use the same connection, so every task throws the exception "Lock wait timeout exceeded; try restarting transaction".

I believe there is a way to write the code so that each object has its own connection and executes its tasks independently of every other object. For example:

// The 'DataObject' class implements Runnable and opens its own connection
// to the database, so each instance should use its own connection.
// Its run() method contains all the required tasks.
DataObject[] dataObject = new DataObject[N + 1];
for (int i = 0; i <= N; i++) {
   dataObject[i] = new DataObject(id[i]);
}

ExecutorService ex = Executors.newFixedThreadPool(10);
for (int i = 0; i <= N; i++) {
   ex.execute(dataObject[i]);
}
ex.shutdown(); // accept no new tasks; already submitted ones still run
// Here is the problem: each instance creates a new connection,
// but every DML from any of the objects is funneled through just one connection
// (in the MySQL command line, "SHOW PROCESSLIST;" lists every connection, and all but
// one are idle).

Can you point me in the right direction?

Thanks


Solution

  • After a lot of head-scratching, I figured out my own mistakes. I want to share what I learned, so here it goes.

    I made a very big mistake by declaring the Connection object as static in my code. So, even though I created a new Connection for each new data object, every transaction went through a single, shared, static connection.
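A minimal sketch of why the static field caused this, assuming a simplified class layout. To keep it runnable without MySQL it swaps `java.sql.Connection` for a hypothetical `FakeConnection` stand-in; in real code each constructor would call `DriverManager.getConnection(url, user, password)` instead. The class names below are illustrative, not taken from the original program.

```java
// Hypothetical stand-in for java.sql.Connection so the sketch runs without MySQL.
final class FakeConnection { }

// The mistake: a static field has exactly one slot for the whole class, so each
// constructor call overwrites it and every instance sees the last connection opened.
class SharedConnectionObject {
    private static FakeConnection conn;

    SharedConnectionObject() {
        // Real code: conn = DriverManager.getConnection(url, user, password);
        conn = new FakeConnection();
    }

    FakeConnection connection() {
        return conn;
    }
}

// The fix: an instance field, so each object keeps the connection it opened.
class OwnConnectionObject {
    private final FakeConnection conn;

    OwnConnectionObject() {
        // Real code: conn = DriverManager.getConnection(url, user, password);
        conn = new FakeConnection();
    }

    FakeConnection connection() {
        return conn;
    }
}
```

With the static field, two objects report the same connection; with the instance field, each has its own.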

    With that first issue corrected, I went back to the drawing board and realized that my process was:

    1. Read an Id from an input table
    2. Take a block of data related to the Id read in step 1, stored in other input tables
    3. Crunch numbers: Read the related input tables and process the data stored in them
    4. Save the results in one or more output tables
    5. Repeat the process while I have pending Ids in the input table

    Just by using one dedicated connection for reading input and another for writing output, my program's performance improved... but I needed a lot more!
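The five steps with separate input and output connections could be organised roughly as follows. This is a sketch under stated assumptions: `InputSource`, `OutputSink` and `Pipeline` are hypothetical names, and in the real program `InputSource` would run its SELECTs on the dedicated input Connection while `OutputSink` would run its INSERTs/UPDATEs on the dedicated output Connection. The number crunching is a placeholder sum.

```java
import java.util.List;

// Hypothetical: would wrap the dedicated *input* Connection.
interface InputSource {
    Integer nextPendingId();          // steps 1 and 5: returns null when no Ids are left
    List<Double> relatedData(int id); // step 2: the block of data related to this Id
}

// Hypothetical: would wrap the dedicated *output* Connection.
interface OutputSink {
    void save(int id, double result); // step 4
}

final class Pipeline {
    private final InputSource input;
    private final OutputSink output;

    Pipeline(InputSource input, OutputSink output) {
        this.input = input;
        this.output = output;
    }

    // Step 3: placeholder number crunching (a simple sum).
    static double crunch(List<Double> data) {
        double total = 0;
        for (double d : data) {
            total += d;
        }
        return total;
    }

    void run() {
        Integer id;
        while ((id = input.nextPendingId()) != null) { // step 5: repeat while Ids remain
            List<Double> data = input.relatedData(id); // steps 1-2 on the input side
            output.save(id, crunch(data));             // steps 3-4 on the output side
        }
    }
}
```

Keeping reads and writes behind two separate objects is what lets each side own its own connection.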

    My original approach to steps 3 and 4 was to write each result to the output tables as soon as I had it... but I found a better approach:

    • Read the input data
    • Crunch the numbers, and put the results in a bunch of queues (one for each output table)
    • A separate thread checks every second whether there's data in any of the queues; if there is, it writes that data to the corresponding tables.
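A minimal sketch of the queue-plus-writer-thread idea, assuming rows can be simplified to `String`s. `QueuedWriter` is a hypothetical name, and the `tableWriter` callback stands in for the code that would run INSERTs on the dedicated output Connection. Worker threads only enqueue; a single scheduled thread drains the queues once per second, so only that thread touches the output connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

final class QueuedWriter {
    // One thread-safe queue per output table name.
    private final Map<String, ConcurrentLinkedQueue<String>> queues = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the output Connection: receives (table, batch of rows).
    private final BiConsumer<String, List<String>> tableWriter;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    QueuedWriter(BiConsumer<String, List<String>> tableWriter) {
        this.tableWriter = tableWriter;
    }

    // Called by the number-crunching threads.
    void enqueue(String table, String row) {
        queues.computeIfAbsent(table, t -> new ConcurrentLinkedQueue<>()).add(row);
    }

    // Drains every queue and hands each non-empty batch to the writer.
    // Note: an exception escaping here would cancel later scheduled runs.
    void flush() {
        for (Map.Entry<String, ConcurrentLinkedQueue<String>> e : queues.entrySet()) {
            List<String> batch = new ArrayList<>();
            String row;
            while ((row = e.getValue().poll()) != null) {
                batch.add(row);
            }
            if (!batch.isEmpty()) {
                tableWriter.accept(e.getKey(), batch);
            }
        }
    }

    // Check the queues once per second.
    void start() {
        scheduler.scheduleWithFixedDelay(this::flush, 1, 1, TimeUnit.SECONDS);
    }

    // Stops the scheduler (cancelling future runs), waits for a running flush,
    // then writes anything still queued. Assumes that flush finishes within 10 s.
    void stop() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush();
    }
}
```

Calling `stop()` after all crunching threads have finished ensures the last results written to the queues still reach the tables.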

    So, by splitting input and output tasks across different connections, redirecting the core process's output to queues, and using a dedicated thread for storing that output, I finally achieved what I wanted: multithreaded DML execution!


    I know there are better approaches to this particular problem, but this one works quite well.

    So... if anyone is stuck with a problem like this... I hope this helps.