I'm trying to create a data-extraction algorithm for group-buying sites so that I can build a deal aggregator. First I need an algorithm that extracts the title, price, discount, image, and coordinates.
I have a solution for the image, discount, and coordinates, but for title and category recognition I need to create a naive Bayes algorithm. What is the best language for this: PHP, Python, JS, or Node.js?
What do I need in order to create the algorithm?
A model with examples, and so on? If I give it 100 titles and then all of the web content from some sites, can the script recognize which sentence is a title?
So I don't need to recognize a single word. I need to recognize a whole sentence, and that sentence is sometimes in an <h1> or <h2>
and sometimes in something else.
I honestly cannot understand much of your post, but since naive Bayes is requested very often here on SO, I wrote a simple piece of Python code that works without any additional library (such as NLTK) and is also much faster than NLTK for training. You can find it here.
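To give an idea of what a library-free classifier can look like, here is a minimal sketch of a multinomial naive Bayes with add-one (Laplace) smoothing, using only the standard library. It is not the linked code, just an illustration. The names `tokenize` and `NaiveBayes`, the labels "title" and "other", and the six training strings are all made up for the example; in practice you would train on your own labeled titles and on non-title text taken from the pages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9%$]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()                       # every word seen in training

    def train(self, examples):
        """examples is an iterable of (text, label) pairs."""
        for text, label in examples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def scores(self, text):
        """Return an unnormalized log-probability for each label."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        result = {}
        for label in self.doc_counts:
            # Log prior: share of training examples carrying this label.
            log_prob = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_prob += math.log((count + 1) / (total_words + len(self.vocab)))
            result[label] = log_prob
        return result

    def classify(self, text):
        """Return the label with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data: deal titles versus other page text.
training = [
    ("50% off spa weekend for two", "title"),
    ("Dinner for two at half price", "title"),
    ("Save 70% on a city hotel break", "title"),
    ("Terms and conditions apply", "other"),
    ("Sign in to your account", "other"),
    ("Subscribe to our newsletter", "other"),
]

clf = NaiveBayes()
clf.train(training)
print(clf.classify("60% off hotel weekend"))
print(clf.classify("Sign in to subscribe"))
```

To pick the title on a page, you could split the page into candidate sentences, score each one with `scores`, and keep the candidate that scores best for "title". Since titles often sit in an <h1> or <h2>, one option is to add the enclosing tag name as an extra token so the classifier can learn from it too. Six examples are far too few for real use; they are only there to make the sketch runnable.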