Search code examples
performanceemeditor

Optimised EmEditor macro to populate column based on another column for a large file


I’ve got a really large file, circa 10m rows, in which I’m trying to populate a column based on conditions on another column via a jsee macro. While it is quite quick for small files, it does take some time for the large file.

//pseudocode
//No sorting on Col1, which can have empty cells too
For all lines in file
     IF (cell in Col2 IS empty) AND (cell in Col1 IS NOT empty) AND (cell in Col1 = previous cell in Col1)
          THEN cell in Col2 = previous cell in Col2

//jsee code
document.CellMode = true;   // Must be cell selection mode
totalLines = document.GetLines();
    
for( i = 1; i < totalLines; i++ ) {

     nref = document.GetCell( i, 1, eeCellIncludeNone );
     gsize = document.GetCell( i, 2, eeCellIncludeNone );

     if (gsize == "" && nref != "" && nref == document.GetCell( i-1, 1, eeCellIncludeNone ) ) {
          document.SetCell( i, 2, document.GetCell( i-1, 2, eeCellIncludeNone ) , eeAutoQuote);
      }
 }

Input File:

Reference Group Size
14/12/01819 1
14/12/01820 1
15/01/00191 4
15/01/00191
15/01/00191
15/01/00198
15/01/00292 3
15/01/00292
15/01/00292
15/01/00401 5
15/01/00401
15/01/00402
1
15/01/00403 2
15/01/00403
15/01/00403
15/01/00403
15/01/00404
20/01/01400 1

Output File:

Reference Group Size
14/12/01819 1
14/12/01820 1
15/01/00191 4
15/01/00191 4
15/01/00191 4
15/01/00198
15/01/00292 3
15/01/00292 3
15/01/00292 3
15/01/00401 5
15/01/00401 5
15/01/00402
1
15/01/00403 2
15/01/00403 2
15/01/00403 2
15/01/00403 2
15/01/00404
20/01/01400 1

Any ideas on how to optimise this and make it run even faster?


Solution

  • I wrote a JavaScript for EmEditor macro for you. You might need to set the correct numbers in the first 2 lines for iColReference and iColGroupSize.

    iColReference = 1;   // the column index of "Reference"
    iColGroupSize = 2;   // the column index of "Group Size"
    document.CellMode = true;   // Must be cell selection mode
    sDelimiter = document.Csv.Delimiter;  // retrieve the delimiter
    nOldHeadingLines = document.HeadingLines;  // retrieve old headings
    document.HeadingLines = 0;   // set No Headings
    yBottom = document.GetLines();   // retrieve the number of lines
    if( document.GetLine( yBottom ).length == 0 ) {  // -1 if the last line is empty
        --yBottom;
    }
    str = document.GetColumn( iColReference, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get whole 1st column from top to bottom, separated by TAB
    sCol1 = str.split( sDelimiter );
    str = document.GetColumn( iColGroupSize, sDelimiter, eeCellIncludeQuotes, 1, yBottom );  // get whole 2nd column from top to bottom, separated by TAB
    sCol2 = str.split( sDelimiter );
    
    s1 = "";
    s2 = "";
    for( i = 0; i < yBottom; ++i ) {  // loop through all lines
        if( sCol2[i].length != 0 ) {
            s1 = sCol1[i];
            s2 = sCol2[i];
        }
        else {
            if( s1.length != 0 && sCol1[i] == s1 ) {  // same value as previous line, copy s2
                if( s2.length != 0 ) {
                    sCol2[i] = s2;
                }
            }
            else {  // different value, empty s1 and s2
                s1 = "";
                s2 = "";
            }
        }
    }
    
    str = sCol2.join( sDelimiter );
    document.SetColumn( iColGroupSize, str, sDelimiter, eeDontQuote );  // set whole 2nd column from top to bottom with the new values
    document.HeadingLines = nOldHeadingLines;   // restore the original number of headings
    

    To run this, save this code as, for instance, Macro.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Macro.jsee in the Macros menu.