Search code examples
javaalgorithmxoranagram

Comparing two string using XOR return true but the strings are different


I'm testing some ways to identify anagrams and I found a situation that got me off guard. I found out that it's possible to do using XOR so I was testing it using the XOR operator. Here's my code:

public static void main(String[] args) {
    // TODO code application logic here
    
    String s1 = "pe";
    String s2 = "ep";
    
    System.out.println(isAnagram(s1, s2));
}

private static boolean isAnagram(String firstString, String secondString) 
{
    int control = 0;
    
    System.out.println("Comparing: " + firstString + " and " + secondString);
    
    for (int i = 0; i < firstString.length(); i++) {
        control = control ^ firstString.charAt(i);
    }
    
    for (int i = 0; i < secondString.length(); i++) {
        control = control ^ secondString.charAt(i);
    }
    
    System.out.println("Control: " + control);
    return (control == 0);
}

When the 2 strings have the same sets of characters, even when they are not in the same order the control variable is 0 returning true to anagram. However, when the 2 strings are different the control has a value > 0 returning false to anagram. I tried using many words and most of them worked, but for some reason it often has some strange situations where, for example, "v" and "ils" return true to anagram or "tat" and "atata" returns true as well.

I would like to understand why it happens and what should I do so this situations doesn't show up anymore.


Solution

  • Simply, the algorithm you are using is not going to work. Since XOR is associative and commutative (like, for example, addition), XORing together all the characters in a string produces the same value regardless of the order in which you do the XORs. Similarly, you get the same sum of the values in an array regardless of the order in which you do the additions.

    But, also like addition, XOR throws away information. You cannot go backwards from the result to the original values: 1+3 = 2+2 = 0+4. And similarly with XOR: 1^3 = 6^4 = 0^2.

    One particular feature of XOR is that a ^ a = 0 for any a; also a ^ 0 = a. (These statements are related.) So you can always just remove pairs of identical characters; the XOR combination of atata is the same as the combination of tat and also the same as a.