Search code examples
javamysqlmultithreadingjdbcbatch-insert

java jdbc design pattern : handle many inserts


I would like to ask for some advices concerning my problem. I have a batch that does some computation (multi threading environement) and do some inserts in a table. I would like to do something like batch insert, meaning that once I got a query, wait to have 1000 queries for instance, and then execute the batch insert (not doing it one by one).

I was wondering if there is any design pattern on this. I have a solution in mind, but it's a bit complicated:

  • build a method that will receive the queries

  • add them to a list (the string and/or the statements)

  • do not execute until the list has 1000 items

The problem : how do I handle the end ? What I mean is, the last 999 queries, when do I execute them since I'll never get to 1000 ? What should I do ?

I'm thinking at a thread that wakes up every 5 minutes and check the number of items in a list. If he wakes up twice and the number is the same , execute the existing queries.

Does anyone has a better idea ?


Solution

  • Your database driver needs to support batch inserting. See this.

    Have you established your system is choking on network traffic because there is too much communication between the service and the database? If not, I wouldn't worry about batching, until you are sure you need it.

    You mention that in your plan you want to check every 5 minutes. That's an eternity. If you are going to get 1000 items in 5 minutes, you shouldn't need batching. That's ~ 3 a second.

    Assuming you do want to batch, have a process wake up every 2 seconds and commit whatever is queued up. Don't wait five minutes. It might commit 0 rows, it might commit 10...who cares...With this approach, you don't need to worry that your arbitrary threshold hasn't been met.

    I'm assuming that the inserts come in one at a time. If your incoming data comes in n at once, I would just commit every incoming request, no matter how many inserts happen. If your messages are coming in as some sort of messaging system, it's asynchronous anyway, so you shouldn't need to worry about batching. Under high load, the incoming messages just wait till there is capacity to handle them.