
Delete conditionally repeated string in multiple lines


Using EmEditor, I want to delete every instance of a repeated string that occupies a full line, together with the line above it. For example, in the text below the repeated string is Cyperus esculentus (it could be anything else), and I want all of its instances deleted, including the preceding line, i.e. the language code. So far, what I have figured out is something like this:

.{2,3} \nCyperus esculentus\n

But the problem is that, for each different text, I have to edit the regex and put in whichever string is repeated in that text.

ar 
سعد لذيذ
ast 
Cyperus esculentus
azb 
یئمه‌لی توپالاق
az 
Yeməli topalaq
bo 
ཆུ་འབྲུམ།
ca 
Xufa
ceb 
Cyperus esculentus
cs 
Šáchor jedlý
de 
Erdmandel
en 
Cyperus esculentus
eo 
Cyperus esculentus
es 
Cyperus esculentus
eu 
Bedaur
fa 
اویار سلام زرد
fr 
Souchet comestible
gl 
Xunca doce
ha 
Aya
he 
גומא נאכל
id 
Cyperus esculentus
it 
Cyperus esculentus
ja 
ショクヨウガヤツリ
la 
Cyperus esculentus
nl 
Knolcyperus
nv 
Tłʼohigaaí
pl 
Cibora jadalna
pt 
Cyperus esculentus
ru 
Чуфа
srn 
Affo
sv 
Jordmandel
th 
แห้วไทย
tr 
Yer bademi
uk 
Смикавець їстівний
uz 
Yerbodom
vi 
Củ gấu tàu
war 
Cyperus esculentus
zh 
油莎草

The expected result is what is left after applying the regex I mentioned above (to clarify, in these texts there is only one string that is repeated, so the regex does not have to look for multiple different repeated strings):

ar 
سعد لذيذ
azb 
یئمه‌لی توپالاق
az 
Yeməli topalaq
bo 
ཆུ་འབྲུམ།
ca 
Xufa
cs 
Šáchor jedlý
de 
Erdmandel
eu 
Bedaur
fa 
اویار سلام زرد
fr 
Souchet comestible
gl 
Xunca doce
ha 
Aya
he 
גומא נאכל
ja 
ショクヨウガヤツリ
nl 
Knolcyperus
nv 
Tłʼohigaaí
pl 
Cibora jadalna
ru 
Чуфа
srn 
Affo
sv 
Jordmandel
th 
แห้วไทย
tr 
Yer bademi
uk 
Смикавець їстівний
uz 
Yerbodom
vi 
Củ gấu tàu
zh 
油莎草
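
As an illustration outside EmEditor, the regex from the question (`.{2,3} \nCyperus esculentus\n`) can be built from whatever string is repeated in a given text, so the pattern itself does not have to be edited by hand. The sketch below is plain JavaScript rather than an EmEditor macro. `escapeRegExp` and `removePhraseWithCodeLine` are hypothetical helper names, and the code assumes every language-code line ends with a single trailing space followed by `\n`, as in the sample.

```javascript
// Escape regex metacharacters so the phrase is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove every "<code> \n<phrase>\n" pair from the text.
// Assumption: a language-code line is 2-3 characters plus one trailing space.
function removePhraseWithCodeLine(text, phrase) {
  const pattern = new RegExp("^.{2,3} \\n" + escapeRegExp(phrase) + "\\n", "gm");
  return text.replace(pattern, "");
}

const sample =
  "ar \nسعد لذيذ\nast \nCyperus esculentus\nca \nXufa\n" +
  "en \nCyperus esculentus\neo \nCyperus esculentus\neu \nBedaur\n";
console.log(removePhraseWithCodeLine(sample, "Cyperus esculentus"));
```

This still requires knowing the repeated string in advance, and a final pair without a trailing line break would not be matched.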

This is what worked for me:

document.selection.StartOfDocument(false);
// Delete every line that has a duplicate, including its first occurrence
document.DeleteDuplicates("",eeIncludeAll);
// Drop a language-code line that is directly followed by another one; repeated to collapse runs of orphaned codes
document.selection.Replace("([a-z]{2,3} \\n)([a-z]{2,3} \\n)","\\2",eeFindReplaceCase | eeReplaceAll | eeFindReplaceRegExp,0);
document.selection.Replace("([a-z]{2,3} \\n)([a-z]{2,3} \\n)","\\2",eeFindReplaceCase | eeReplaceAll | eeFindReplaceRegExp,0);
document.selection.Replace("([a-z]{2,3} \\n)([a-z]{2,3} \\n)","\\2",eeFindReplaceCase | eeReplaceAll | eeFindReplaceRegExp,0);
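
The macro above issues the same Replace three times, presumably because one pass consumes matches in non-overlapping pairs and so cannot collapse a longer run of orphaned language-code lines. As a rough illustration in plain JavaScript (not EmEditor macro code; `collapseOrphanCodes` is a hypothetical name), the same replacement can be repeated until the text stops changing, so the number of passes does not have to be guessed.

```javascript
// Drop a language-code line when it is directly followed by another one,
// repeating the replacement until the text no longer changes.
function collapseOrphanCodes(text) {
  const pair = /([a-z]{2,3} \n)([a-z]{2,3} \n)/g;
  let previous;
  do {
    previous = text;
    text = text.replace(pair, "$2");
  } while (text !== previous);
  return text;
}

// "ast", "ceb" and "en" have lost their names; "cs" and "eu" kept theirs.
const orphaned = "ast \nceb \ncs \nŠáchor jedlý\nen \neu \nBedaur\n";
console.log(collapseOrphanCodes(orphaned));
```

An orphaned code on the very last line has no following code line, so neither this sketch nor the original pattern would remove it.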

Solution

    1. In the Filter toolbar, select 1 from the Number of Additional Visible Lines Above Matched Lines, enter Cyperus esculentus, and press the Enter key.

    2. Make sure the Block Multiple Changes button is clear (NOT set) in the same toolbar.

    3. Select Select All and then Delete on the Edit menu (or press Ctrl + A, then Delete, while the keyboard focus is in the editor).

    4. Click the Abort button in the Filter toolbar.

    If you would like to use a macro, here is the macro for you:

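    fs = document.filters;
    fs.Clear();
    // Show only lines containing the phrase (case-sensitive) ...
    fs.AddFind( "Cyperus esculentus", eeFindReplaceCase, 0 );
    // ... plus one line above each match
    fs.VisibleLinesAbove  = 1;
    fs.VisibleLinesBelow  = 0;
    document.filters = fs;
    // Delete everything that is visible while the filter is active
    document.selection.SelectAll();
    document.selection.Delete();
    // Remove the filter again
    fs.Clear();
    document.filters = fs;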
    

    You can run this macro after you open your data file. To do this, save this code as, for instance, Filter.jsee, and then select this file from Select... in the Macros menu. Finally, open your data file, and select Run in the Macros menu while your data file is active. Make sure the Block Multiple Changes button is clear before you run the macro.

    References: EmEditor Macro Reference: Filters Collection
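
To make the effect of the filter approach concrete, here is a rough model of its outcome in plain JavaScript, outside EmEditor. It is only a sketch of the result, not of how EmEditor implements filtering: `deleteMatchesWithLinesAbove` is a hypothetical name, and the function assumes a line matches when it contains the phrase (case-sensitive).

```javascript
// Remove every line containing `phrase`, plus `linesAbove` lines before it.
function deleteMatchesWithLinesAbove(lines, phrase, linesAbove) {
  const doomed = new Set();
  lines.forEach((line, i) => {
    if (line.includes(phrase)) {
      // Mark the matching line and the requested number of lines above it.
      for (let j = Math.max(0, i - linesAbove); j <= i; j++) {
        doomed.add(j);
      }
    }
  });
  return lines.filter((_, i) => !doomed.has(i));
}

const filterSample = ["ar ", "سعد لذيذ", "ast ", "Cyperus esculentus", "ca ", "Xufa"];
console.log(deleteMatchesWithLinesAbove(filterSample, "Cyperus esculentus", 1));
```

Passing 1 for `linesAbove` mirrors the VisibleLinesAbove setting in the macro.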

    Updates

    I understand that "Cyperus esculentus" could be any other phrase. Assuming the duplicates always appear on even-numbered lines, here is the macro you can use instead. This macro selects all even-numbered lines, bookmarks the duplicates among the selected lines, and deletes all bookmarked lines (plus one line above each). Make sure the Block Multiple Changes button is clear before you run the macro.

    editor.ExecuteCommandByID(4323);  // clear all bookmarks
    document.selection.StartOfDocument(false);
    editor.ExecuteCommandByID(4208);  // No Wrap
    nLines = document.GetLines();
    document.selection.LineDown(false,1);
    for( i = 0; i < nLines; i += 2 ) {
        editor.ExecuteCommandByID(4153);  // select character
        document.selection.CharRight(false,1);
        editor.ExecuteCommandByID(4153);
        document.selection.StartOfLine(false,eeLineView | eeLineHomeText);
        document.selection.LineDown(false,2);
    }
    
    document.DeleteDuplicates("",eeSortSelectionOnly | eeBookmark | eeIncludeAll);  // bookmark all duplicates in selected lines
    document.selection.Collapse();
    
    // filter bookmarked lines only
    fs = document.filters;
    fs.Clear();
    fs.AddFind( "", 0, eeExFindBookmarkedOnly );
    fs.VisibleLinesAbove  = 1;
    fs.VisibleLinesBelow  = 0;
    document.filters = fs;
    
    document.selection.SelectAll();
    document.selection.Delete(1);    // delete all filtered lines
    fs.Clear();
    document.filters = fs;
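
The logic of this second macro can also be sketched in plain JavaScript, independent of EmEditor: look only at the even-numbered lines (the names), find the ones that occur more than once, and drop each of them together with the language-code line above it. The function name `deleteRepeatedNames` is hypothetical, and the sketch assumes the text strictly alternates between a code line and a name line.

```javascript
// lines: alternating [code, name, code, name, ...]
function deleteRepeatedNames(lines) {
  // Count how often each name (every second line) occurs.
  const counts = new Map();
  for (let i = 1; i < lines.length; i += 2) {
    counts.set(lines[i], (counts.get(lines[i]) || 0) + 1);
  }
  // Keep only the pairs whose name occurs exactly once.
  // A trailing code line without a name is dropped.
  const kept = [];
  for (let i = 0; i + 1 < lines.length; i += 2) {
    if (counts.get(lines[i + 1]) === 1) {
      kept.push(lines[i], lines[i + 1]);
    }
  }
  return kept;
}

const pairs = ["ar ", "سعد لذيذ", "ast ", "Cyperus esculentus", "ca ", "Xufa", "en ", "Cyperus esculentus"];
console.log(deleteRepeatedNames(pairs));
```

Unlike the fixed-phrase version, this does not need to be told which string is repeated.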