Tags: utf-8, comparison, diff, beyondcompare, beyondcompare4

Ignore non UTF-8 characters in Beyond Compare


My project includes some units of measurement that are expressed with non-UTF-8 characters, such as the squared symbol. Most editors display these as the following character: �.

I am comparing parts of the source code with Beyond Compare and I would like to ignore the cases where these symbols appear. I tried these two solutions:

Beyond Compare - ignore certain text strings?

How do I make Beyond Compare ignore certain differences while comparing versions of Delphi Form Files

but in both cases the differences are still marked in red (? vs � or ² vs �). How can I fix that?


Solution

  • If the characters are unprintable, you can define them as unimportant text in Beyond Compare 4's Text Compare by their hex value.

    As an example, assume the character is superscript 2, the squared symbol, with hex value 0x00B2.

    1. Load files in the Text Compare.
    2. Click the Rules toolbar button (referee icon).
    3. In the Importance tab, click Edit Grammar.
    4. In the Grammar tab, click +.
    5. Element name: Squared
    6. Text matching: \x{00B2}
    7. Check Regular Expression
    8. Click OK.
    9. Click OK.
    10. In the Grammar element list, uncheck Squared to make it unimportant.
    11. Click OK.

    If View | Ignore Unimportant Text is turned on, differences matching Squared will show as a match (black). If it is turned off, differences matching Squared will show in blue.
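To make the three outcomes concrete, here is a rough Python sketch of the idea behind unimportant text. It is not Beyond Compare's actual algorithm. The function name `classify` and the choice to treat both the squared symbol (U+00B2) and the replacement character (U+FFFD) as unimportant are assumptions made for illustration.

```python
import re

# Assumption: the squared symbol and the U+FFFD replacement character
# are both treated as unimportant for this illustration.
UNIMPORTANT = re.compile("[\u00b2\ufffd]")


def classify(left: str, right: str) -> str:
    """Classify a pair of lines roughly the way a diff with
    'unimportant text' rules would (hypothetical helper)."""
    if left == right:
        return "match"
    # Strip the unimportant characters; if the rest is equal,
    # the lines differ only in unimportant text.
    if UNIMPORTANT.sub("", left) == UNIMPORTANT.sub("", right):
        return "unimportant"
    return "important"


print(classify("area = 12 m\u00b2", "area = 12 m\u00b2"))
print(classify("area = 12 m\u00b2", "area = 12 m\ufffd"))
print(classify("area = 12 m\u00b2", "area = 13 m\u00b2"))
```

The second call is the case from the question (² on one side, � on the other); in this sketch it lands in the "unimportant" bucket, which is the kind of difference the View | Ignore Unimportant Text toggle hides or shows.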

    In the above instructions, the regular expression \x{nnnn} matches the character with hex value nnnn.
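If you are not sure which hex values to enter, a small script can list them. The sketch below assumes the files are really Latin-1 (ISO-8859-1) encoded, which would explain why a UTF-8 editor shows � for the squared symbol. The function name `beyond_compare_patterns` and the sample bytes are hypothetical; adjust the encoding to whatever your files actually use.

```python
def beyond_compare_patterns(data: bytes, encoding: str = "latin-1") -> list:
    """Return one \\x{nnnn} pattern per distinct non-ASCII character.

    Assumption: `data` is text in `encoding` (Latin-1 by default).
    """
    text = data.decode(encoding)
    code_points = sorted({ord(ch) for ch in text if ord(ch) > 0x7F})
    return ["\\x{%04X}" % cp for cp in code_points]


# Hypothetical sample line: byte 0xB2 is the squared symbol in Latin-1.
sample = b"area = 12 m\xb2\n"

# Decoding the same bytes as UTF-8 fails on 0xB2, so editors that
# substitute invalid bytes show the replacement character U+FFFD.
as_utf8 = sample.decode("utf-8", errors="replace")
print(as_utf8 == "area = 12 m\ufffd\n")

for pattern in beyond_compare_patterns(sample):
    print(pattern)
```

Each printed pattern can be pasted into the Text matching field of a new grammar element, following the same steps as for Squared.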

    References:

    Unicode Character Superscript 2

    Define Unimportant Text in Beyond Compare

    Beyond Compare Help - Regular Expression Reference