
Why is &nbsp; different from " " when comparing?


Can you please explain why I'm getting false when comparing text === ' ' in this case?

var div = document.getElementById('d');

div.innerHTML = '&nbsp;';

// ' '
var text = div.innerText;

console.log(/\s/.test(' ')); // true
console.log(/\s/.test(text)); // true
console.log(text === ' '); // false

#d {
  border: 1px solid;
  position: absolute;
}

<div id="d"></div>

This doesn't seem logical: say \s is A, ' ' is B, and text is C.

A  = B
A  = C
B != C ???

Solution

  • The space " " and the non-breaking space (&nbsp;, U+00A0) are two different characters. The non-breaking space has a code unit value of 160, whereas the regular space has a code unit value of 32.
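As a quick check of the two values, here is a minimal sketch that can be run in Node or a browser console. It assumes the element's text is the single character U+00A0 (what &nbsp; decodes to); the variable names are only illustrative.

```javascript
// '\u00A0' is the character that the HTML entity &nbsp; decodes to.
const nbspChar = '\u00A0';
const spaceChar = ' ';

// charCodeAt(0) returns the UTF-16 code unit at index 0.
const nbspCode = nbspChar.charCodeAt(0);
const spaceCode = spaceChar.charCodeAt(0);

console.log(nbspCode, spaceCode);      // 160 32
console.log(nbspChar === spaceChar);   // false
```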

    With that in mind, the spec uses the following logic when strict equality is applied to two values of the same non-numeric type:

    7.2.13 SameValueNonNumeric ( x, y )

    The internal comparison abstract operation SameValueNonNumeric(x, y), where neither x nor y are numeric type values, produces true or false. Such a comparison is performed as follows:

    • Assert: Type(x) is not Number or BigInt.

    • Assert: Type(x) is the same as Type(y).

    • If Type(x) is Undefined, return true.

    • If Type(x) is Null, return true.

    • If Type(x) is String, then

        • If x and y are exactly the same sequence of code units (same length and same code units at corresponding indices), return true; otherwise, return false.

    • ...

    The condition in the last step above does not hold here: the two strings have the same length but different code units, so the comparison returns false. This shouldn't be too surprising, as we're comparing two different strings (as their code unit values show).
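The String step quoted above can be sketched in plain JavaScript. The function name `sameCodeUnits` is hypothetical and only mirrors the wording of the spec; it is not how an engine actually implements ===.

```javascript
// Hypothetical helper mirroring the String branch of SameValueNonNumeric:
// same length and the same code unit at every index.
function sameCodeUnits(x, y) {
  if (x.length !== y.length) return false;
  for (let i = 0; i < x.length; i++) {
    if (x.charCodeAt(i) !== y.charCodeAt(i)) return false;
  }
  return true;
}

console.log(sameCodeUnits(' ', ' '));       // true  (32 vs 32)
console.log(sameCodeUnits('\u00A0', ' '));  // false (160 vs 32)
```

For any two strings, this should give the same result as x === y.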

    When you use \s in a regular expression, however, you're matching any character from a whole class of whitespace characters:

    Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to

    [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
    

    - MDN

    The character set above includes both the space character (seen at the beginning of the character set) and the non-breaking space (Unicode code point U+00A0, written \u00a0 in the set), and so both of your tests using regular expressions return true.
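Putting the two observations together, here is a small sketch that runs without a DOM. It assumes div.innerText from the question is the single character U+00A0, and `normalizeSpaces` is a hypothetical helper, one possible way to make the comparison succeed if that is what you need.

```javascript
// Stand-in for div.innerText from the question (assumed to be a single U+00A0).
const innerTextValue = '\u00A0';

console.log(/\s/.test(' '));             // true
console.log(/\s/.test(innerTextValue));  // true
console.log(innerTextValue === ' ');     // false

// Hypothetical helper: replace every non-breaking space with a regular space.
function normalizeSpaces(s) {
  return s.replace(/\u00A0/g, ' ');
}

console.log(normalizeSpaces(innerTextValue) === ' '); // true
```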