php mysql algorithm forum web bbs

Is it possible to auto-categorize posts in forums or BBS?


If I have a forum that uses tags to categorize posts, is it possible to add tags automatically, based on each post's title and content, after the post is created?

Thank you very much


Solution

  • The simplest way to do this would be to have a table of known tags. Iterate over each word in the post, and if the word is in the tag table, add that tag to the post's list. To make this slightly more effective, you could store each tag in both its display form and its stemmed form (e.g., algorithms and algorithm), then compare the stemmed words in the post with the stemmed tag names. See Porter's stemming algorithm for a simple way to do that (for English words).

    A more effective solution would be to use something like TF-IDF and associate a vector with each tag. Create a vector for the new post and compare it to each tag vector using cosine similarity. Any tag whose similarity is above a certain threshold would be added to the post. I've never used it for auto-tagging, but in my experience it is a very effective matching tool when dealing with non-spammy data (i.e., when people aren't trying to cheat or fool the system).

    Both of these methods assume that you already have some sort of tag dictionary built to start things off. If you don't, you could guess at tag names by looking for words that are uncommon in general (you need a frequency table for that) but are used frequently in the post.
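
The tag-table lookup described first in the answer could be sketched roughly as follows. This is only an illustration: `crude_stem` is a hypothetical, very rough stand-in for a real stemmer such as Porter's (it will miss many word forms), and the function names and the in-memory dictionary that plays the role of the tag table are assumptions, not part of the original answer.

```python
import re


def crude_stem(word):
    """Rough stand-in for a real stemmer (e.g. Porter's): strips a few
    common English suffixes. It is NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words like "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_tag_index(display_tags):
    """Map each tag's stemmed form to its display form."""
    return {crude_stem(tag.lower()): tag for tag in display_tags}


def suggest_tags_by_lookup(text, tag_index):
    """Return the known tags whose stem matches a stemmed word in the text,
    in order of first appearance and without duplicates."""
    found = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        tag = tag_index.get(crude_stem(word))
        if tag is not None and tag not in found:
            found.append(tag)
    return found


tag_index = build_tag_index(["algorithms", "php", "mysql"])
suggested = suggest_tags_by_lookup(
    "Which sorting algorithm is fastest in PHP?", tag_index
)
```

In a real forum the index would probably live in a database table with one column for the display name and one for the stem, so that the lookup becomes a single indexed query per post.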
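
The TF-IDF idea from the answer might look something like the sketch below. It makes several assumptions that the answer does not spell out: each tag vector is built by pooling the text of posts that already carry that tag, the IDF is a smoothed variant, and the `0.2` threshold is an arbitrary placeholder that would need tuning on real data. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector (dict) over the terms present in `idf`."""
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_tag_vectors(tagged_posts):
    """tagged_posts maps a tag to a list of post texts already carrying it.
    All posts for one tag are pooled into a single document (an assumption)."""
    docs = {
        tag: [tok for post in posts for tok in tokenize(post)]
        for tag, posts in tagged_posts.items()
    }
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    # Smoothed IDF so that terms found in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {tag: tfidf_vector(tokens, idf) for tag, tokens in docs.items()}
    return vectors, idf


def suggest_tags_by_similarity(post, tag_vectors, idf, threshold=0.2):
    """Return tags whose cosine similarity to the post meets the threshold,
    best match first."""
    vec = tfidf_vector(tokenize(post), idf)
    scores = {tag: cosine(vec, tv) for tag, tv in tag_vectors.items()}
    hits = [tag for tag, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda tag: -scores[tag])


tagged_posts = {
    "mysql": ["How do I join two tables in a query", "Slow query on an indexed table"],
    "php": ["Parse error in my php script", "How do I include a php file"],
}
tag_vectors, idf = build_tag_vectors(tagged_posts)
suggested = suggest_tags_by_similarity(
    "my query on two tables is slow", tag_vectors, idf
)
```

With only a handful of posts per tag the scores are noisy; the approach presumably becomes useful once each tag has a reasonable amount of text behind it.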
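
For the last suggestion, guessing tag names when no dictionary exists yet, one possible sketch is shown below. The `background_freq` table (word to relative frequency in general text) is hypothetical, as are the cutoff values; words missing from the table are treated as rare, which is itself an assumption.

```python
import re
from collections import Counter


def guess_new_tags(post, background_freq, max_background=0.001, min_count=2, limit=5):
    """Return words that occur at least `min_count` times in the post but are
    rare in general text according to `background_freq`."""
    counts = Counter(re.findall(r"[a-z0-9]+", post.lower()))
    candidates = [
        (word, count)
        for word, count in counts.items()
        if count >= min_count and background_freq.get(word, 0.0) <= max_background
    ]
    # Most frequent first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in candidates[:limit]]


# Tiny made-up frequency table, for illustration only.
background_freq = {"the": 0.05, "is": 0.01, "a": 0.02, "my": 0.003, "to": 0.02}
guessed = guess_new_tags(
    "My memcached server is slow. Is memcached the right cache? The cache is tiny.",
    background_freq,
)
```

Guessed names like these would probably need review by a moderator before being added to the tag table, since a frequent rare word is not always a sensible tag.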