I'm interested in doing text categorization using LibSVM. How do you recommend I convert the terms/words to numerical data, so LibSVM can understand it?
Thank you!
In text categorization, people usually build a histogram of the words used in the domain, i.e. a count of how often each word occurs in each document (often called a bag-of-words representation). Sometimes they also count pairs of adjacent words and add those to the histogram (these are called bigrams). But it really depends on your data and your objectives.
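To make that concrete, here is a rough sketch in plain Python of one way to turn documents into word and bigram counts and print them in LibSVM's sparse text format (`<label> <index>:<value> ...`, with 1-based indices in ascending order). The function names, the whitespace tokenizer, the `_` used to join bigrams, and the two toy documents are all my own assumptions for illustration, not anything LibSVM prescribes.

```python
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def features(tokens, use_bigrams=True):
    # Single words, optionally followed by bigrams of adjacent words.
    feats = list(tokens)
    if use_bigrams:
        feats += [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return feats

def build_vocab(docs, use_bigrams=True):
    # Map each distinct feature to a 1-based index, in order of first appearance.
    vocab = {}
    for text in docs:
        for feat in features(tokenize(text), use_bigrams):
            if feat not in vocab:
                vocab[feat] = len(vocab) + 1
    return vocab

def to_libsvm_line(label, text, vocab, use_bigrams=True):
    # Count features known to the vocabulary; unseen features are dropped.
    counts = Counter(f for f in features(tokenize(text), use_bigrams) if f in vocab)
    # LibSVM expects feature indices in ascending order.
    pairs = sorted((vocab[f], c) for f, c in counts.items())
    return " ".join([str(label)] + [f"{i}:{c}" for i, c in pairs])

# Hypothetical toy data: label 1 = spam, -1 = not spam.
docs = ["cheap pills cheap", "meeting at noon"]
labels = [1, -1]

vocab = build_vocab(docs)
lines = [to_libsvm_line(y, d, vocab) for y, d in zip(labels, docs)]
for line in lines:
    print(line)
```

Build the vocabulary from the training documents only and reuse it for test documents, so the indices mean the same thing in both files. Raw counts are just the simplest choice; whether to scale or reweight them (TF-IDF, for example) again depends on your data.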