I am comparing two strings and want to verify they are equal. Text-wise they look equal, but when I dig into the ASCII byte codes, the space characters used in each string are different. Is there a way to handle this with a regex or a byte-level change?
I am using Ruby/Watir.
More details:
79 99 101 97 110 32 79 108 101 111 #Employee
79 99 101 97 110 194 160 79 108 101 111 #Emp
The two strings are "Ocean Oleo" and "Ocean Oleo". They look equal, but according to the ASCII byte codes they use different spaces. The first uses 32 (space), and the second uses 194, 160 (which apparently also renders as a space).
assert((employee.include? emp), "Employee, #{employee}, from search result is NOT expected")
I want this code to evaluate to true, but it can't because of the space issue.
Thoughts?
You’ve got a non-breaking space in your string. The bytes 194, 160 (c2, a0 in hex) are the UTF-8 encoding of the Unicode character U+00A0 NO-BREAK SPACE.
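You can confirm this from Ruby by looking at the bytes and code points directly. The two strings below are rebuilt from the byte dumps in the question, so treat them as an assumption about the actual text Watir returns.

```ruby
# Strings rebuilt from the byte dumps in the question (assumed UTF-8).
employee = "Ocean Oleo"        # ordinary space, U+0020
emp      = "Ocean\u00A0Oleo"   # non-breaking space, U+00A0

p employee.bytes  # => [79, 99, 101, 97, 110, 32, 79, 108, 101, 111]
p emp.bytes       # => [79, 99, 101, 97, 110, 194, 160, 79, 108, 101, 111]

# The single character U+00A0 takes two bytes in UTF-8.
p "\u00A0".bytes          # => [194, 160]
p "\u00A0".ord.to_s(16)   # => "a0"

# So the strings render the same but are not equal.
p employee == emp         # => false
p employee.include?(emp)  # => false
```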
The simple way to fix this would be to replace all non-breaking spaces with normal ones using gsub!, something like:
my_string.gsub!(/\u00a0/, ' ')
# now my_string will just have "normal" spaces
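For the assertion in the question, it may be simpler to normalize both sides without mutating them. The helper name `normalize_spaces` is hypothetical, and this sketch uses `String#tr` (a character-for-character mapping) instead of `gsub!`; one reason to prefer a non-bang method here is that `gsub!` returns `nil` when nothing was replaced, so it cannot be safely chained.

```ruby
# Hypothetical helper: returns a copy of str with every U+00A0
# (non-breaking space) replaced by an ordinary U+0020 space.
# The original string is left unchanged.
def normalize_spaces(str)
  str.tr("\u00A0", " ")
end

employee = "Ocean Oleo"
emp      = "Ocean\u00A0Oleo"

# Normalize both sides, so it does not matter which string
# came from the page with the non-breaking space.
matches = normalize_spaces(employee).include?(normalize_spaces(emp))
p matches  # => true
```

In the original test this would become `assert(normalize_spaces(employee).include?(normalize_spaces(emp)), "Employee, #{employee}, from search result is NOT expected")`.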
This may be enough for you, but a more complete approach is to use a library to normalize your strings before comparing them, for example the unicode_utils gem:
# first install the gem, obviously
require 'unicode_utils'
# ...
my_string = UnicodeUtils.compatibility_decomposition(my_string)
This not only changes non-breaking spaces to normal spaces but also handles a range of other cases, such as making sure characters with diacritics (e.g. é) are represented the same way (they can be represented in two ways in Unicode), and splitting ligatures like ﬃ into separate characters (ffi).
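If you are on Ruby 2.2 or later, the built-in `String#unicode_normalize` gives a similar result without installing a gem. This is a swapped-in alternative to the unicode_utils approach above, sketched with NFKC (compatibility decomposition followed by canonical composition); `:nfkd` would be the closer match to `compatibility_decomposition`, and either form works as long as both strings use the same one.

```ruby
# Uses Ruby's built-in String#unicode_normalize (Ruby 2.2+),
# not the unicode_utils gem.
employee = "Ocean Oleo"
emp      = "Ocean\u00A0Oleo"

# NFKC maps U+00A0 to an ordinary space...
p emp.unicode_normalize(:nfkc) == employee  # => true
# ...and splits the ligature U+FB03 into plain letters.
p "\uFB03".unicode_normalize(:nfkc)         # => "ffi"

# Precomposed é (U+00E9) vs. e + combining acute accent (U+0301):
composed   = "\u00E9"
decomposed = "e\u0301"
p composed == decomposed  # => false
# Equal once both are normalized to the same form.
p composed.unicode_normalize(:nfkc) == decomposed.unicode_normalize(:nfkc)  # => true
```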