Q: Can a Machine Learning model solve rule-based problems?

Can Machine Learning be used to validate statements or catch errors in text documents?

For example, if you teach a classifier that "You should eat apples twice per day", but in a document that you're testing on, the statement is "You should eat apples three times per day", can the statement be flagged?

Obviously you can build some rules-based software that catches these, but my question centers around training an ML model to catch these, as rules change.

I have looked at word2vec and NLTK and performed some tests with them, but can't connect the dots for teaching the classifier.

If it's possible, how would one go about it? Any direction would be appreciated.

Thanks, Doug

Solution

(This got too long for a comment.)

Yes, it can. However, it is extremely complicated. This kind of reasoning and analysis is done by Watson, for example; IBM calls this cognitive computing. As you wrote, rule-based (or logical reasoning) systems can solve such tasks. So the question you should ask yourself is how to extract the required facts from the text, which points to NLP techniques such as part-of-speech tagging and named-entity recognition. However, the task is extremely hard: a statement like "not more than 100 times a day" does not contradict your sentence, so the reasoning would require rich background knowledge.
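To make the "extract facts, then reason" idea concrete, here is a minimal sketch in plain Python. It is a hand-written toy, not a learned model and not how Watson works: the regular expression, the `NUMBER_WORDS` table, and the function names `extract_frequency` and `contradicts` are all my own assumptions for illustration. It only shows why an upper bound such as "not more than 100 times a day" should not be flagged, while "three times per day" should.

```python
import re

# Hypothetical lookup table; a real system would need far broader coverage.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Toy pattern for "<N> times per day", "once/twice a day", with an optional upper bound.
FREQ = re.compile(
    r"(?P<upper>not more than\s+)?"
    r"(?:(?P<adverb>once|twice)|(?P<num>\d+|one|two|three|four|five)\s*times?)"
    r"\s+(?:per|a)\s+day",
    re.IGNORECASE,
)


def extract_frequency(sentence):
    """Return (kind, value) with kind 'exact' or 'at_most', or None if no match."""
    m = FREQ.search(sentence)
    if m is None:
        return None
    if m.group("adverb"):
        value = {"once": 1, "twice": 2}[m.group("adverb").lower()]
    else:
        raw = m.group("num").lower()
        value = int(raw) if raw.isdigit() else NUMBER_WORDS[raw]
    kind = "at_most" if m.group("upper") else "exact"
    return kind, value


def contradicts(reference, candidate):
    """True only if the two extracted frequencies cannot both hold."""
    ref = extract_frequency(reference)
    cand = extract_frequency(candidate)
    if ref is None or cand is None:
        return False  # nothing extracted, so nothing to compare
    (ref_kind, ref_val), (cand_kind, cand_val) = ref, cand
    if ref_kind == "exact" and cand_kind == "exact":
        return ref_val != cand_val
    if ref_kind == "exact" and cand_kind == "at_most":
        return ref_val > cand_val  # the exact value breaks the upper bound
    if ref_kind == "at_most" and cand_kind == "exact":
        return cand_val > ref_val
    return False  # two upper bounds never contradict each other


rule = "You should eat apples twice per day"
print(contradicts(rule, "You should eat apples three times per day"))
print(contradicts(rule, "You should eat apples not more than 100 times a day"))
```

The hard part in practice is everything this sketch hard-codes: recognising which phrases express a quantity, and knowing how they relate to each other.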

As said, it is an extremely broad topic. You would have to sketch the solution and then pick a tiny piece, which would be called a PhD thesis ;). This is illustrated nicely here: http://matt.might.net/articles/phd-school-in-pictures/

Searching for PhD theses with the right keywords turned up http://nakashole.com/papers/2012-phd-thesis.pdf . This one might provide you with a few nights of reading.

If you want to try something hands-on with NLTK, I would generate parse trees for the sentences you want to analyse. Afterwards you could try to align them and check for overlaps and deviations. However, I'm not sure how to draw conclusions from that. A slightly simpler version would be to match word by word, something along the lines of a Levenshtein distance calculation.
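The word-by-word variant can be sketched with the standard library alone. This is only a rough illustration under my own assumptions: the naive `tokenize` helper, the function names, and the use of `difflib.SequenceMatcher` for the alignment are choices made for the example, and they stand in for the parse-tree alignment, which would need NLTK and a grammar or trained parser.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    """Very naive tokenizer (assumption): lowercase, drop a trailing period, split on spaces."""
    return sentence.lower().rstrip(".").split()


def word_levenshtein(a, b):
    """Levenshtein distance computed over word lists instead of characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        cur = [i]
        for j, word_b in enumerate(b, 1):
            cost = 0 if word_a == word_b else 1
            cur.append(min(prev[j] + 1,          # delete word_a
                           cur[j - 1] + 1,       # insert word_b
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def deviations(reference, candidate):
    """Align the two sentences and return only the spans that differ."""
    a, b = tokenize(reference), tokenize(candidate)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


reference = "You should eat apples twice per day"
candidate = "You should eat apples three times per day"
print(word_levenshtein(tokenize(reference), tokenize(candidate)))
print(deviations(reference, candidate))
```

A small distance combined with a deviation in a quantity word is a hint that the candidate restates the known rule with a changed value. Deciding whether that change is actually an error is still the reasoning problem described above.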