Search code examples
phppythonmysqlfull-text-searchquery-performance

faster way for Search in multiple databases


I am working on big eCommerce shopping website. I have around 40 databases. i want to create search page which show 18 result after searching by title in all databases.

(SELECT id_no,offers,image,title,mrp,store from db1.table1 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
 UNION ALL (SELECT id_no,offers,image,title,mrp,store from db3.table3 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%')
 UNION ALL (SELECT id_no,offers,image,title,mrp,store from db2.table2 WHERE MATCH(title) AGAINST('$searchkey') AND title like '%$searchkey%') 
LIMIT 18

currently i am using the above query its working fine for 4 or more character keyword search like laptop nokia etc but takes 10-15 sec for processes but for query with keyword less than 3 characters it takes 30-40sec or i end up with 500 internal server error. Is there any optimized way for searching in multiple databases. I generated two index primary and full text index with title

Currently my search page is in php i am ready to code in python or any other language if i gets good speed


Solution

  • FULLTEXT is not configured (by default) for searching for words less than three characters in length. You can configure that to handle shorter words by setting a ...min_token_size parameter. Read this. https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html You can only do this if you control the MySQL server. It won't be possible on shared hosting. Try this.

    FULLTEXT is designed to produce more false-positive matches than false-negative matches. It's generally most useful for populating dropdown picklists like the ones under the location field of a browser. That is, it requires some human interaction to choose the correct record. To expect FULLTEXT to be able to do absolutely correct searches is probably a bad idea.

    You simply cannot use AND column LIKE '%whatever%' if you want any reasonable performance at all. You must get rid of that. You might be able to rewrite your python program to do something different when the search term is one or two letters, and thereby avoid many, but not all, LIKE '%a%' and LIKE '%ab%' operations. If you go this route, create ordinary indexes on your title columns. Whatever you do, don't combine the FULLTEXT and LIKE searches in a single query.

    If this were my project I'd consider using a special table with columns like this to hold all the short words from the title column in every row of each table.

    id_pk INT autoincrement
    id_no INT
    word  VARCHAR(3) 
    

    Then you can use a query like this to look up short words

      SELECT a.id_no,offers,image,title,mrp,store 
        FROM db1.table1 a
        JOIN db1.table1_shortwords s ON a.id_no = s.id_no
       WHERE s.word = '$searchkey'
    

    To do this, you will have to preprocess the title columns of your other tables to populate the shortwords tables, and put an index on the word column. This will be fast, but it will require a special-purpose program to do the preprocessing.

    Having to search multiple tables with your UNION ALL operation is a performance problem. You will be able to improve performance dramatically by redesigning your schema so you need search only one table.

    Having to search databases on different server machines is a performance problem. You may be able to rig up your python program to search them in parallel: that is, to somehow use separate tasks to search each one, then aggregate the results. Each of those separate search tasks requires its own connection to the data base, so this is not a cheap or simple solution.

    If this system faces the public web, you will have to redesign it sooner or later, because it will never perform well enough as it is now. (Sorry to be the bearer of bad news.) Many system designers like to avoid redesigning systems after they become enormous. So, if I were you I would get the redesign done.