I have around 500,000 records of personal profile in MySQL database containing a birthdate column (dob). Since I had to get the ages for each profile, I needed to calculate it dynamically which I can either do via PHP (date_diff(date_create($dob), date_create('today'))->y)
or through SQL ('SELECT TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age')
.
Which of the two is faster or more preferred especially if I have hundreds of thousands of rows?
Here is a test:
Create a table with 100K random dates
drop table if exists birthdays;
create table birthdays (
id int auto_increment primary key,
dob date
);
insert into birthdays (dob)
select '1950-01-01' + interval floor(rand(1)*68*365) day as dob
from information_schema.COLUMNS c1
, information_schema.COLUMNS c2
, information_schema.COLUMNS c3
limit 100000
;
Run this PHP script
<?php
header('Content-type: text/plain');
$db = new PDO("mysql:host=localhost;dbname=test", "test","");
### SQL
$starttime = microtime(true);
$stmt = $db->query("SELECT id, dob, TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
$runtime = microtime(true) - $starttime;
echo "SQL: $runtime \n";
### PHP
$starttime = microtime(true);
$stmt = $db->query("SELECT id, dob FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
foreach ($data as $row) {
$row->age = date_diff(date_create($row->dob), date_create('today'))->y;
}
$runtime = microtime(true) - $starttime;
echo "PHP: $runtime \n";
Result:
SQL: 0.19094109535217
PHP: 1.203684091568
It looks like the SQL solution is 6 times faster. But that is not quite true. If we remove the code which calculates the age from both solutions, we will get something like 0.1653790473938. That means the overhead for SQL is 0.025 sec, while for PHP it is 1.038 sec. So SQL is 40 times faster in this test.
Note: There are faster ways to calculate the age in PHP. For example
$d = date('Y-m-d');
$row->age = substr($d, 0, 4) - substr($row->dob, 0, 4) - (substr($row->dob, 5) > substr($d, 5) ? 1 : 0);
is like four times faster - while date('Y-m-d')
consumes more than 80% of the time. If you find a way to avoid any date function, you might get close to the performance of MySQL.