I have a table of forum posts, and want to improve the basic search functionality as we get a lot of users from all over the world who are not native English speakers and will have trouble finding results when they spell incorrectly. The current forum search is exact.
Which of these designs will perform best? Assume the database has 500,000 records and the search is used frequently. Ideally I would like it to search every record.
Design One
Along side each forum post, I store soundex_post, which contains all the soundex data. When a search is run, it soundexes all search terms, and does a LIKE operation on the soundex fields.
Design Two
I normalise it. Every soundex code is stored in a new table, tblSoundexCodes. Then there is a table tblForumPostSoundexCodes:
ID | Post_ID | SoundexCode_ID | Count
Then when a soundex is searched for, I simply pull out all the Post_IDs where SoundexCode_ID = n
Am I correct that method two will be considerably faster, but that it will be a lot harder to maintain (IE, when people edit posts).
Design Two is better.
Design two won't be faster. The data storage will be more compact, and you will have to update or insert a row into tblForumPostSoundexCodes, as well as insert a row into tblSoundexCodes, when someone writes or updates a post.
You'll have to verify that this soundex transaction processing takes place for every change to a post (create, update, delete).