Search code examples
phpmysqlstrpos

PHP performance strpos filename or MySQL query


I'm storing some highres files on a serves (100K+ if that matters) and I organize them in different galleries. When somebody access the gallery I only show thumbnails and a lowres version of the images, which in some cases is watermarked in other case not. Now due to the fact that I'm speaking of a huge amount of pictures the low resolution version which is displayed on the gallery page is purged from the server after X days. If somebody does access the gallery and the lowres version of the file doesn't exist on the server it is generated on the fly, however when I generate the lowres I might need to watermark it or not.

Currently, the script which display the images doesn't do any SQL call it's all based on filesystem (if file exists, etc) and the decision to watermark an image or not is based on:

if (strpos($file_name,"FREE")===false){ //add watermark }else{ //just resize}

My logic says this is more performant than doing an SQL query against the filename or fileid and checking if it should be a non watermarked image. However I find it a bit of inconvenience to have the file names containing the word FREE.

How much of a performance difference I can expect if I use an SQL query instead of the strpos?

EDIT/UPDATE

To summarize up the answers and the comments:

  • The system is being designed to be working for a few years, with all the galleries which are added over time to be still accessible. This means the storage requirement is just HUGE, and older albums high res image will be moved off-site on slow and cheap dedicated storage, so suggestion to keep an additional overhead off all the thumbnails it's a severe no go option. Last year I needed to store over 3TB of images (this is only the highres size).

  • I'm on Lighttpd and I'm intending to use rewrite-if-not-file to get the best performance for the existing thumbnails.

  • I know about the I/O write penalty and I intend to keep it to the minimum, writing only when necessary, preferably reading. However the comment from @N.B. did actually got me thinking to store the lowres images on an SSD, so even when I need to create and write them to the disk have a much better I/O performance than a normal HDD.

  • It will be actually difficult to do some test (@Steve E.) I'm behind schedule and the system has to go live by the end of this month. (I just got the bomb today that they are pulling the plug on the old system). Yes the flexibility is the main reason I'm tempted to go with SQL, but I'm expecting the SQL database to grow significantly, beside the file information, there are loads of other informations that I need to store as well, tagging, purchases, downloads, etc, so I'm also trying to make sure that I'm not putting too much pressure on the SQL, when I can actually leverage some of that with a good structure and Filesystem access.


Solution

  • Without doing tests, it is hard to be certain which approach would be faster. Simple logic may suggest that PHP accessing disk is faster, but that is based on a lot of assumptions.

    In a well configured system, variables that are required frequently will be in the RAM cache rather than on disk. This applies to caching of the filesystem as well as MySQL caching indexes. The impact of caching and other mechanisms can give results different to what might be expected.

    In many scenarios, either solution will work and be adequate since the time taken for either request should be minimal in a well designed system and the extra performance of one approach may not be worth the inconvenience you find with using 'FREE' in the filename. It would not be too hard to trial both methods and measure performance.

    In the long term also consider that MySQL provides more flexibility for adding additional functionality that would get complicated if all the state was stored in file names.

    If performance really is a significant issue, then look at using the webserver to check for the file on disk (or in a cache like memcache) and return that if it exists before passing the request to PHP at all. Both Nginx and Apache can do this, it's a common acceleration approach for high traffic websites.