php mysql performance maintainability

Doing calculations in MySQL vs PHP


Context:

  • We have a PHP/MySQL application.
  • Some portions of the calculations are done in SQL directly, e.g. all users created in the last 24 hours would be returned via an SQL query (NOW() - 1 day).

There's a debate going on between a fellow developer and me. My opinion is that we should:

A. Keep all calculations / code / logic in PHP and treat MySQL as a 'dumb' repository of information

His opinion:

B. Do a mix and match depending on what's easier / faster. http://www.onextrapixel.com/2010/06/23/mysql-has-functions-part-5-php-vs-mysql-performance/

I'm looking at this from a maintainability point of view. He's looking at speed (and, as the article points out, some operations are faster in MySQL).


@bob-the-destroyer @tekretic @OMG Ponies @mu is too short @Tudor Constantin @tandu @Harley

I agree that, quite obviously, efficient WHERE clauses belong at the SQL level. However, what about examples like:

  1. Calculating a 24-hour period using NOW() - 1 day in SQL to select all users created in the last 24 hours?
  2. Return capitalized first name and last name of all users?
  3. Concatenating a string?
  4. (thoughts, folks?)

Clear examples belonging in the SQL domain:

  1. specific WHERE selections
  2. Nested SQL statements
  3. Ordering / Sorting
  4. Selecting DISTINCT items
  5. Counting rows / items

Solution

  • I'd play to the strengths of each system.

    Aggregating, joining and filtering logic obviously belongs on the data layer. It's faster, not only because most DB engines have 10+ years of optimisation for doing just that, but also because you minimise the data shifted between your DB and web server.
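
    As a rough illustration of the data-transfer point, here is a minimal sketch that filters the same rows DB-side and PHP-side. It assumes an in-memory SQLite database via PDO as a stand-in for MySQL, and the `users` table and its `name` / `created_at` columns are hypothetical. In MySQL the filter from the question could be written as `created_at >= NOW() - INTERVAL 1 DAY`; a bound cutoff is used here so the sketch runs without a MySQL server.

```php
<?php
// Sketch only: SQLite in memory stands in for MySQL.
// Table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)');

// A fixed "now" keeps the example deterministic.
$now = new DateTimeImmutable('2024-01-10 12:00:00', new DateTimeZone('UTC'));

$insert = $pdo->prepare('INSERT INTO users (name, created_at) VALUES (?, ?)');
foreach ([['alice', '-2 hours'], ['bob', '-30 hours'], ['carol', '-5 days']] as [$name, $offset]) {
    $insert->execute([$name, $now->modify($offset)->format('Y-m-d H:i:s')]);
}

$cutoff = $now->modify('-1 day')->format('Y-m-d H:i:s');

// DB-side filtering: only the matching rows are sent to PHP.
$stmt = $pdo->prepare('SELECT name FROM users WHERE created_at >= ?');
$stmt->execute([$cutoff]);
$recentFromDb = $stmt->fetchAll(PDO::FETCH_COLUMN);

// PHP-side filtering: every row is fetched, then most are thrown away.
$all = $pdo->query('SELECT name, created_at FROM users')->fetchAll(PDO::FETCH_ASSOC);
$recentFromPhp = [];
foreach ($all as $row) {
    // 'Y-m-d H:i:s' strings sort chronologically, so string comparison works here.
    if ($row['created_at'] >= $cutoff) {
        $recentFromPhp[] = $row['name'];
    }
}

printf("DB-side fetched %d row(s); PHP-side fetched %d row(s)\n", count($recentFromDb), count($all));
```

    Both approaches give the same answer, but the second one moves the whole table across the connection first, which is the cost that grows with the size of the data.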

    On the other hand, most DB platforms I've used have very poor functionality for working with individual values. Things like date formatting and string manipulation just suck in SQL; you're better off doing that work in PHP.
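
    A minimal sketch of what per-value work in PHP might look like, assuming a row has already been fetched. The `$row` array, its keys, and the helper names `displayName` and `displayDate` are all hypothetical, and the capitalisation is deliberately naive.

```php
<?php
// Hypothetical row as it might come back from a query; the keys are assumptions.
$row = [
    'first_name' => 'ada',
    'last_name'  => 'lovelace',
    'created_at' => '2024-01-09 18:30:00',
];

// Builds a display name from the separate columns.
// ucfirst(strtolower()) is naive: names like "McDonald" or non-ASCII names need more care.
function displayName(array $row): string
{
    $first = ucfirst(strtolower($row['first_name']));
    $last  = ucfirst(strtolower($row['last_name']));
    return $first . ' ' . $last;
}

// Reformats a 'Y-m-d H:i:s' value for display, assuming the stored value is in UTC.
function displayDate(string $sqlDateTime): string
{
    $dt = DateTimeImmutable::createFromFormat('Y-m-d H:i:s', $sqlDateTime, new DateTimeZone('UTC'));
    if ($dt === false) {
        throw new InvalidArgumentException("Unexpected date format: $sqlDateTime");
    }
    return $dt->format('j M Y, H:i');
}

echo displayName($row), ' joined on ', displayDate($row['created_at']), "\n";
```

    Because the raw columns are still available in `$row`, the same data can be presented in other ways without another query.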

    Basically, use each system for what it's built to do.

    In terms of maintainability, as long as the division between what happens where is clear, separating these two types of logic shouldn't cause much of a problem, and certainly not enough to outweigh the benefits. In my opinion, code clarity and maintainability are more about consistency than about putting all the logic in one place.


    Re: specific examples...

    1. I know this isn't what you're referring to, but dates are almost a special case. You want to make sure that all dates generated by the system are created either on the web server OR the database, not a mix of both. Doing otherwise will cause some insidious bugs if the DB server and web server are ever configured for different timezones (I've seen this happen). Imagine, for example, you've got a createdDate column with a default of getDate() that is applied on insert by the DB. If you were to insert a record and then, using a date generated in PHP (e.g. date("Y-m-d H:i:s", time() - 3600)), select records created in the last hour, you might not get what you expect. As for which layer you should do this on, I'd favour the DB because, as in the example, it lets you use column defaults.

    2. For most apps I'd do this in PHP. Combining first name and surname sounds simple until you realise you need salutations, titles and middle initials in there sometimes too. Plus you're almost certainly going to end up in a situation where you want a user's first name, surname AND a combined salutation + first name + surname. Concatenating them DB-side means you end up moving more data, although really, it's pretty minor.

    3. Depends. As above, if you ever want to use them separately you're better off performance-wise pulling them out separately and concatenating when needed. That said, unless the datasets you're dealing with are huge, there are probably other factors (like, as you mention, maintainability) that have more bearing.
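
    To make the timezone problem from point 1 concrete, here is a minimal sketch that simulates the two servers inside one PHP script. The timezones chosen (UTC for the DB, America/New_York for the web server) and the variable names are assumptions for illustration only.

```php
<?php
// One fixed instant, so the example is deterministic.
$instant = new DateTimeImmutable('2024-01-10 12:00:00', new DateTimeZone('UTC'));

$dbTz  = new DateTimeZone('UTC');              // assumed DB server timezone
$webTz = new DateTimeZone('America/New_York'); // assumed web server timezone

// A row created three hours ago, timestamped by the DB's column default.
$storedCreated = $instant->modify('-3 hours')->setTimezone($dbTz)->format('Y-m-d H:i:s');

// "One hour ago" as the web server would format it in its local time.
$webCutoff = $instant->modify('-1 hour')->setTimezone($webTz)->format('Y-m-d H:i:s');

// "One hour ago" expressed in the DB's timezone.
$dbCutoff = $instant->modify('-1 hour')->setTimezone($dbTz)->format('Y-m-d H:i:s');

// Equivalent of: WHERE createdDate >= :cutoff
$matchesWithWebCutoff = $storedCreated >= $webCutoff; // wrongly includes the old row
$matchesWithDbCutoff  = $storedCreated >= $dbCutoff;  // correctly excludes it

var_dump($storedCreated, $webCutoff, $dbCutoff, $matchesWithWebCutoff, $matchesWithDbCutoff);
```

    The row is three hours old, yet it passes the "last hour" filter when the cutoff is formatted in the web server's local time, because the two strings describe clock readings in different timezones.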

    A few rules of thumb:

    • Generating incremental ids should happen in the DB.
    • Personally, I like my defaults applied by the DB.
    • When selecting, anything that reduces the number of records should be done by the DB.
    • It's usually good to do things that reduce the size of the dataset DB-side (like with the strings example above).
    • And as you say; ordering, aggregation, sub-queries, joins, etc. should always be DB-side.
    • Also, we haven't talked about them but triggers are usually bad/necessary.

    There are a few core trade-offs you're facing here and the balance really depends on your application.

    Some things should definitely, every time, always be done in SQL. Beyond those (and exceptions like the dates thing), for a lot of tasks SQL can be very clunky and can leave you with logic in out-of-the-way places. When searching your codebase for references to a specific column, for example, it is easy to miss those contained in a view or stored procedure.

    Performance is always a consideration but, depending on your app and the specific example, maybe not a big one. Your concerns about maintainability are probably very valid, and some of the performance benefits I've mentioned are very slight, so beware of premature optimisation.

    Also, if other systems are accessing the DB directly (e.g. for reporting, or imports/exports) you'll benefit from having more logic in the DB. For example, if you want to import users from another data source directly, something like an email validation function would be reusable if implemented in SQL.

    Short answer: it depends. :)