I need advice regarding text analysis. The program is written in PHP.
My code needs to receive a URL, take the words on that site, and look each one up in the DB.
The tricky part is that the words aren't always written in the DB the way they appear in the text.
Example:
Let's say my DB has these values: Word = letters
And the site has: Wordy thing
I'm supposed to output: Letters thing
My code applies several regex transformations to each word and, after each one, tries to match the result against the DB.
For each word that isn't found I make 8 queries to the DB. Most of the words don't have a match, so for a whole website with hundreds of words my CPU usage spikes.
I thought about storing every word that isn't found in the DB globally, as it appears (HD costs less than CPU), or maybe keeping an array or dictionary of them.
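The "remember the misses" idea could be sketched roughly like this. It is only an illustration: `$dbLookup`, `translate()` and the `$misses` array are hypothetical names, and the closure stands in for the real DB queries (it just counts how often it is called).

```php
<?php
// Hypothetical stand-in for the real DB lookup; it counts calls
// so the effect of the miss cache is visible.
$dbCalls = 0;
$dbLookup = function (string $word) use (&$dbCalls): ?string {
    $dbCalls++;
    $table = ['word' => 'letters']; // sample row from the example above
    return $table[$word] ?? null;
};

// Words already known to have no match (assumed to live for one request).
$misses = [];

function translate(string $word, callable $dbLookup, array &$misses): ?string
{
    if (isset($misses[$word])) {
        return null;            // known miss: skip the DB entirely
    }
    $found = $dbLookup($word);
    if ($found === null) {
        $misses[$word] = true;  // remember the miss for next time
    }
    return $found;
}

// The second lookup of the same unknown word never reaches the DB.
translate('thing', $dbLookup, $misses);
translate('thing', $dbLookup, $misses);
echo $dbCalls, PHP_EOL;
```

This only saves work for words that repeat; whether the miss list should be kept per request or persisted somewhere shared is a separate decision.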
I'm really confused by this project. It's supposed to serve a lot of users, but with the current code the server will die after 10-20 user requests.
Any thoughts?
Edit: The searched words aren't English words, and the code runs on a Windows Server 2008 machine.
Thank you all for your answers. Unfortunately none of the answers helped me, maybe I wasn't clear enough.
I ended up solving the issue by creating a hash table with all of the words in the DB (about 6000 words), and checking against the hash instead of the DB.
The code started out with a 4 sec execution time and now it's down to 0.5 sec! :-)
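A minimal sketch of what such a hash lookup might look like, using the example from the question. The details are assumptions, not the actual code: `$dictionary` is hard-coded here instead of being loaded from the DB, `candidates()` uses one made-up suffix rule in place of the real regexes, and `strtolower`/`ucfirst` only handle ASCII (for non-English words the `mb_*` functions would be needed).

```php
<?php
// Stand-in for the ~6000 DB rows, loaded once into a PHP array
// (PHP arrays are hash tables, so isset() is a constant-time check).
$dictionary = ['word' => 'letters'];

// Hypothetical normalisation: the word itself, then the word with a
// trailing "y" or "s" stripped. The real rules would go here.
function candidates(string $word): array
{
    $w = strtolower($word); // ASCII only; an assumption for this sketch
    return array_unique([$w, preg_replace('/(y|s)$/', '', $w)]);
}

function replaceWord(string $word, array $dictionary): string
{
    foreach (candidates($word) as $candidate) {
        if (isset($dictionary[$candidate])) {
            $out = $dictionary[$candidate];
            // Keep the capitalisation of the original first letter.
            return ctype_upper($word[0]) ? ucfirst($out) : $out;
        }
    }
    return $word; // no match: leave the word unchanged
}

function replaceText(string $text, array $dictionary): string
{
    // Run every run of letters through the in-memory lookup.
    return preg_replace_callback('/\p{L}+/u', function (array $m) use ($dictionary) {
        return replaceWord($m[0], $dictionary);
    }, $text);
}

echo replaceText('Wordy thing', $dictionary), PHP_EOL; // prints "Letters thing"
```

All variants of a word are checked in memory, so the number of DB queries per page no longer depends on how many words have no match.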
Thanks again