We're considering adding rich text editing to our system. I understand we'll get a heavily tagged string back from our textareas, but I'm wondering how database searches for text within that data would work. There might also be non-HTML tags containing versioning comments or other content that we wouldn't want searches to match.
How does something like this get implemented? Do we store the data twice, once with tags and once without? Or are there SQL Server tools that skip tags during searches without killing us performance-wise?
(We're on SQL Server 2005 now, moving to 2008.)
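I would probably use full-text search and add all the HTML tag names as stopwords so they are ignored when indexing and querying. Note that SQL Server 2005 calls these noise words and keeps them in noise-word files, while SQL Server 2008 replaces that with stoplists you manage in the database. You can read more about these in the SQL Server full-text search documentation.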
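If stopwords turn out to be awkward (for example, when a tag name is also a word people search for), the other option raised in the question, storing a second tag-free copy to search against, could look roughly like the sketch below. This is only an illustration in Python using the standard library's `html.parser`; the names `TextExtractor` and `strip_markup` are made up for this example, and I'm assuming the stripping happens in application code before the row is saved.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes only; tags and comments are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags. Comments go to handle_comment,
        # which does nothing by default, so they never reach self.parts.
        self.parts.append(data)


def strip_markup(rich_text):
    """Return a plain-text copy of rich_text for a separate search column."""
    parser = TextExtractor()
    parser.feed(rich_text)
    parser.close()
    # Join chunks with a space so adjacent blocks do not run together,
    # then collapse any repeated whitespace.
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    # Hypothetical editor output, including a non-HTML <rev> tag and a comment.
    sample = '<p>Hello <b>world</b><!-- v2 by bob --></p><rev id="3">draft</rev>'
    print(strip_markup(sample))
```

You would save the result next to the original markup and point the search (or the full-text index) at the plain column. One caveat: joining with a space means a word with a tag in the middle of it, like `wor<b>ld</b>`, is split in two, so adjust the join if your editor produces that kind of markup.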
Good luck!