I have recently started learning about algorithms related to natural language processing, and I have come across various sites which state that Naive Bayes cannot capture the XOR concept. Firstly, I do not understand what exactly the XOR problem is. Can someone please explain what the XOR problem is, with a simple classification example if possible?
The XOR problem is the simplest classification problem that is not linearly separable. Imagine you have two Boolean variables X and Y, and the target value you want to "predict" is the result of XORing the two variables. That is, you want to predict 1 only when exactly one of the two variables is 1, and 0 otherwise. A bit more graphically:
Y ^
1 |  XOR(x=0,y=1)=1    XOR(x=1,y=1)=0
  |
0 |  XOR(x=0,y=0)=0    XOR(x=1,y=0)=1
  +------------------------------------->
        0                  1           X
As you can see, for the four "points" of my "plot" above (X horizontally, Y vertically; think of each XOR(...) entry as a point at its corner, if you like), there is no way to draw a straight line that separates the two outcomes: the two 1s sit in the upper-left and lower-right corners, and the two 0s in the other two corners. So linear classifiers, which model the class separation using straight lines, cannot solve problems of this nature.
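To make this concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the variable names are just for illustration) that trains a linear classifier, logistic regression, on the four XOR points and shows that it cannot get all of them right:

import numpy as np
from sklearn.linear_model import LogisticRegression

# The four possible (X, Y) inputs and their XOR labels.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])

clf = LogisticRegression()
clf.fit(inputs, labels)

print(clf.predict(inputs))        # never reproduces [0, 1, 1, 0]
print(clf.score(inputs, labels))  # stays below 1.0: at best 3 of the 4 points
                                  # can fall on the correct side of a line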
Now, as to Naive Bayes: it models the features as independent given the class. Given only X and Y, it can model the distribution of the Xs and the distribution of the Ys, but it does not model any relation between the two variables. That is, to model the XOR function, the classifier would have to observe both variables at the same time. Predicting from the state of X alone without taking Y's state into account (and vice versa) cannot lead to a proper solution for this problem. And because Naive Bayes combines the per-feature evidence by simple multiplication (addition in log-space), its decision boundary is a straight line in exactly the sense above; hence the Naive Bayes classifier is a linear classifier, too, and fails on XOR for the same reason.
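A corresponding sketch for Naive Bayes (again assuming scikit-learn; the extra "joint" features below are my own illustration, not something from the question): trained on X and Y alone it fails just like the linear classifier above, but if you hand it features that look at both variables at the same time, e.g. (X AND Y) and (NOT X AND NOT Y), it can separate the XOR outcomes.

import numpy as np
from sklearn.naive_bayes import BernoulliNB

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])

nb = BernoulliNB()
nb.fit(inputs, labels)
print(nb.score(inputs, labels))   # stuck below 1.0: each variable on its own
                                  # says nothing about the XOR outcome

# Add features that observe both variables at once: (X AND Y) and (NOT X AND NOT Y).
x, y = inputs[:, 0], inputs[:, 1]
joint = np.column_stack([inputs, x & y, (1 - x) & (1 - y)])

nb.fit(joint, labels)
print(nb.score(joint, labels))    # 1.0: with joint features the problem becomes separable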