I'm reading the MySQL docs for hours but I still cannot answer to myself a couple of pretty simple questions... :(
Here is my (simplified) scenario: I have two tables in a database: tablea
and tableb
, both tables use the InnoDB storage engine. tablea
(which is my main table) has a PRIMARY index (id
) with autoincrement. Now here is what I want to achieve and please keep in mind that the following business logic can be and will be run concurrently:
I start a transaction:
START TRANSACTION
BEGIN
then I check if an id exists in tablea
if yes, I SELECT the row FOR UPDATE, let's call the id I am looking for myid :
SELECT `id` FROM `tablea` WHERE `id`='myid' FOR UPDATE;
if the above SELECT returns no rows, I simply ROLLBACK the transaction and exit from my function. In other words I'm done when myid is not present in tablea
.
On the other hand when myid exists then first I need to update some values in tablea
:
UPDATE `tablea` SET `somefield`='somevalue' WHERE `id`='myid';
then I need to check if myid also exists in tableb
:
SELECT * FROM `tableb` WHERE `id`='myid' FOR UPDATE;
my first question is about the above SELECT statement: Is it okay to do another SELECT FOR UPDATE here (on tableb
) ??? Or "FOR UPDATE" is not needed here when dealing with tableb
, because I already started a transaction and also acquired a lock based on a row in tablea
??? Can someone please answer this?
The last SELECT statement above either returns a row from tableb
(and locks that row for update) or it turns out that myid does not exist in tableb
.
When myid is present in tableb
then I just need to update some values in that row, it's simple:
UPDATE `tableb` SET `somefieldintableb`='somevaluefortableb' WHERE `id`='myid';
On the other hand when myid is not in tableb
I need to insert it, and here comes my 2nd question: Should I lock tableb
before I issue my INSERT INTO statement, like this:
LOCK TABLES `tableb` WRITE;
INSERT INTO `tableb` (`id`,`somefieldintableb`) VALUES ('myid','somevaluefortableb');
UNLOCK TABLES `tableb`;
and then finally, I do:
COMMIT
My goal is this: Since the above described function (with the MySQL transaction) will run in many instances in parallel, I want to prevent any of those instances updating the same row in either tablea
or tableb
at the same time. I also want to prevent double-insertion of myid into tableb
, hence I thought about using LOCK TABLES when myid was not found in tableb
.
So I have two questions: Should I do a SELECT ... FOR UPDATE within my already started transaction when I want to update tableb
or locking tableb
with SELECT ... FOR UPDATE is unnecessary, because holding the lock on tablea
already "protects" tableb
too from simultaneous UPDATEs in this case ??? Thanks to the way I started my transaction, I mean.
2nd question: When I need to INSERT a new row into tableb
should I lock the whole table for that insertion? Or is that something that is totally unnecessary in this case? (Do I need LOCK TABLES tableb
or not?)
I would appreciate if an expert can answer these two questions for me, because reading the various docs and examples online simply won't help me answering these questions. :(
I would do it this way:
BEGIN;
SELECT a.`id` AS a_id, b.`id` AS b_id
FROM `tablea` AS a LEFT OUTER JOIN `tableb` AS b ON a.id=b.id
WHERE a`id`='myid'
FOR UPDATE;
Now you have row locks on both tablea and tableb if rows exist. If the SELECT returns nothing, you know the id is not present in tablea. If the SELECT returns a row with a value for a_id, but a NULL for b_id, then you know it's present in tablea and not in tableb.
If the row is present in both tables, this locks rows in both tables simultaneously. If you do it in two steps, you might risk a race condition and deadlock.
Try the INSERT and use ON DUPLICATE KEY UPDATE:
INSERT INTO `tableb` (id, somefieldintableb) VALUES ('myid', 'somevaluefortableb')
ON DUPLICATE KEY UPDATE `somefieldintableb`='somevaluefortableb';
If the row with your desired id value is not present, this will insert it. If the row is present, this will update the row. And you're sure to have access to an existing row, because your SELECT FOR UPDATE locked it earlier.
Don't use table locks if you can avoid it. That's a sure way to create a bottleneck in your application.
Re your comments:
Yes, you can use extra join conditions for the date column.
You don't have to update all the columns when you use ON DUPLICATE KEY UPDATE. You can leave most of them alone if the row exists, and just update one, or a few, or whatever.
Also you can reference the value you tried to insert.
INSERT INTO `tableb` (id, date, col1, col2, col3, col4, col5, col6)
VALUES ('myid', $a_date, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE col4=VALUES(col4);
For more details, I recommend reading http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html