In EmEditor, how to filter for lines matching across columns?


To clean sentence pairs (source/target language) for training a machine translation (MT) model, I want to filter out rows where the source equals the target. Does this require a macro?

In Excel, e.g., you can match/compare strings across column A and B in row 3 with formula “=A3=B3”. Matching strings result in TRUE. Non-matching strings result in FALSE.


Solution

  • As I understand your question, suppose you have text like this:

    Source,Target,
    Database,Database,
    Database,Datenbank,
    

    Then you want to fill in the third column like this:

    Source,Target,FALSE
    Database,Database,TRUE
    Database,Datenbank,FALSE
    

    If this is correct, you can use JavaScript replacement expressions in the Replace dialog of EmEditor.

    1. Click the third empty column heading to select the whole third column.

    2. Press Ctrl+H to display the Replace dialog box.

    3. Click the Advanced... button to display the Advanced dialog box. Click Reset and then OK. This step ensures the advanced options are set to their defaults.

    4. In the Replace dialog box, enter:

    • Find: .*

    • Replace with: \J cell(-2)==cell(-1) ? "TRUE" : "FALSE"

    5. Make sure the Regular Expressions and In the Selection Only options are set.

    6. Click Replace All.
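
For comparison, the same logic can be sketched in plain JavaScript outside EmEditor. This is only an illustration, not EmEditor's API: the function names `splitRows`, `flagMatches`, and `dropIdenticalPairs` are made up here, and the sketch assumes simple comma-separated text with no quoted fields or embedded commas. It also assumes that `cell(-2)` and `cell(-1)` in the replacement expression refer to the two cells to the left of the selected column, as the example above implies.

```javascript
// Hypothetical helpers for illustration; none of these names come from EmEditor.

// Split text into non-empty rows, tolerating Windows line endings.
function splitRows(csvText) {
  return csvText.split(/\r?\n/).filter((line) => line.length > 0);
}

// Mirror the Replace step: write TRUE or FALSE into the third column,
// depending on whether the first two columns hold the same string.
function flagMatches(csvText) {
  return splitRows(csvText)
    .map((line) => {
      const [source, target] = line.split(",");
      const flag = source === target ? "TRUE" : "FALSE";
      return `${source},${target},${flag}`;
    })
    .join("\n");
}

// The original goal: keep only rows whose source and target differ.
function dropIdenticalPairs(csvText) {
  return splitRows(csvText)
    .filter((line) => {
      const [source, target] = line.split(",");
      return source !== target;
    })
    .join("\n");
}
```

Calling `flagMatches` on the three sample rows above should reproduce the TRUE/FALSE column shown in the answer, while `dropIdenticalPairs` keeps the header row and the Database,Datenbank row. Inside EmEditor, a similar result could presumably be reached by filtering the document on the new column once it is filled in.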