Search code examples
phpmysqlconcurrencymysql-insert-id

How bad is using SELECT MAX(id) in MYSQL instead of mysql_insert_id() in PHP?


Background: I'm working on a system where the developers seem to be using a function which executes a MYSQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).

I know this is a horrible practice (because concurrent requests will mess the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...

"Oh okay, we'll only face this problem when we have 
(a) a lot of users, or 
(b) it'll only happen when two people try doing something
    at _exactly_ the same time"

I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure a mechanism) to calculate how many users should be using the system before we start seeing messed up links.

Any mathematical insights into that? Again, I KNOW its a horrible practice, I just want to understand the variables in this situation...


Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!


Solution

  • The point is not if potential bad situations are likely. The point is if they are possible. As long as there's a non-trivial probability of the issue occurring, if it's known it should be avoided.

    It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.

    I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.

    And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.

    And right from the MySQL documentation on the subject:

    You can generate sequences without calling LAST_INSERT_ID(), but the utility of 
    using the function this way is that the ID value is maintained in the server as 
    the last automatically generated value. It is multi-user safe because multiple 
    clients can issue the UPDATE statement and get their own sequence value with the
    SELECT statement (or mysql_insert_id()), without affecting or being affected by 
    other clients that generate their own sequence values.