php regex performance memory-efficient

Code efficiency for text analysis


I need advice regarding text analysis. The program is written in PHP.

My code receives a URL and tries to match the words on that site against the words stored in the DB.

The tricky part is that the words aren't always written in the DB the way they appear in the text.

Example:

Let's say my DB has these values: Word = letters

And the site has: Wordy thing

I'm supposed to output: Letters thing

My code applies several regexes to each word and, after each one, tries to match the result against the DB.

For each word that isn't found I make 8 queries to the DB. Most of the words don't have a match, so for a whole website with hundreds of words the CPU usage spikes.

I thought about storing every word that isn't found in the DB globally, as it appears (disk space costs less than CPU), or maybe keeping all of that in an array or dictionary.
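One way to read the array/dictionary idea is a per-request cache that remembers both hits and misses, so each distinct word costs at most one round of DB lookups. This is a minimal sketch, assuming PHP 7 or later. `$lookupInDb` and `cachedLookup` are hypothetical names, and the closure only simulates the real queries:

```php
<?php
// Hypothetical stand-in for the real DB lookups; it counts how often it is called.
$queryCount = 0;
$lookupInDb = function (string $word) use (&$queryCount) {
    $queryCount++;
    $table = ['word' => 'letters'];   // pretend DB contents (assumption)
    return $table[$word] ?? null;     // null means "not found"
};

$cache = []; // word => replacement, or null for a known miss

function cachedLookup(string $word, array &$cache, callable $lookupInDb)
{
    // array_key_exists (not isset) so that cached misses (null) are also reused.
    if (!array_key_exists($word, $cache)) {
        $cache[$word] = $lookupInDb($word);
    }
    return $cache[$word];
}

foreach (['thing', 'word', 'thing', 'thing'] as $w) {
    cachedLookup($w, $cache, $lookupInDb);
}
echo $queryCount, PHP_EOL; // 2: one lookup per distinct word
```

A cache like this only lives for one request. Sharing it between requests would need some persistent store, which is not shown here.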

I'm really confused by this project. It's supposed to serve a lot of users, but with the current code the server will die after 10-20 user requests.

Any thoughts?

Edit: The searched words aren't English words, and the code runs on a Windows Server 2008 machine.


Solution

  • Thank you all for your answers. Unfortunately none of them helped me; maybe I wasn't clear enough.

    I ended up solving the issue by creating a hash table with all of the words in the DB (about 6000 words), and checking against the hash table instead of the DB.

    Execution time started at 4 seconds and is now 0.5 seconds! :-)

    Thanks again
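
For readers who want to see the shape of that approach, here is a rough sketch, not the actual code from the project. It assumes PHP 7 or later with the mbstring extension enabled. The `$rows` data, `candidateForms()` and `translateWord()` are made up for illustration, and the single regex stands in for whatever normalisation the real code performs:

```php
<?php
// Hypothetical rows, as they might come back from ONE query that loads the whole table.
$rows = [
    ['word' => 'Word', 'replacement' => 'letters'],
    ['word' => 'Site', 'replacement' => 'page'],
];

// Build the lookup table once: key = lowercased DB word, value = replacement.
$dictionary = [];
foreach ($rows as $row) {
    $dictionary[mb_strtolower($row['word'], 'UTF-8')] = $row['replacement'];
}

// Hypothetical stand-in for the regex steps: the candidate forms to try for one word.
function candidateForms(string $word): array
{
    $lower = mb_strtolower($word, 'UTF-8');
    return [$lower, preg_replace('/y$/u', '', $lower)];
}

function translateWord(string $word, array $dictionary): string
{
    foreach (candidateForms($word) as $form) {
        if (isset($dictionary[$form])) { // in-memory hash lookup, no DB round trip
            return $dictionary[$form];
        }
    }
    return $word; // no match: keep the original word
}

$output = [];
foreach (explode(' ', 'Wordy thing') as $w) {
    $output[] = translateWord($w, $dictionary);
}
echo implode(' ', $output), PHP_EOL; // letters thing
```

With roughly 6000 words the whole table fits comfortably in a PHP array, and each `isset()` check is a hash lookup rather than a query, which is consistent with the speed-up reported above.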