
MarcEdit append 001 field


I have a .mrk file containing over 5,000 records, all with duplicate 001 fields.

My current thought is to use Notepad++, PowerShell, or VBS to append its line number to the end of every line starting with =001, replacing:

=001 20110708095140328

with

=001 2011070809514032800002

In Notepad++ I can find every line starting with =001 with the regex (\n=001 .*)\r

But I don't know whether Notepad++'s regex replace or TextFX can insert a line's number into the replacement.


Solution

  • As an alternative to the TextFX solution I mentioned but cannot test, you can do this without any plugin by using the Column Editor (Edit -> Column Editor, or Alt-C).

    First, select the column where the line numbers should go. If your lines are fixed-width, you can insert the number directly at the desired position; otherwise I suggest inserting it in the first column. To select the column, use column mode: place the caret at the desired column on the first line of the file, then hold Alt-Shift and click at the same column on the last line. You will see a zero-width selection spanning the selected lines, and anything you type is written on every line at that column.

    Here we're not going to type anything ourselves. Instead, open the Column Editor mentioned above and choose to insert a number starting from 1 and incrementing by 1. You will also want to check the "Leading zeros" checkbox so that the numbers are fixed-width.

    If you start with the following content:

    bla
    bla
    X bla
    bli
    bla
    X blu
    bli
    

    You'll end up with this:

    1bla
    2bla
    3X bla
    4bli
    5bla
    6X blu
    7bli
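
For readers who would rather script this step than use the Column Editor, the numbering can be approximated in Python. This is only a sketch, not what Notepad++ does internally: `number_lines` is a hypothetical helper name, and the padding width is assumed to be the number of digits in the total line count (which mirrors what "Leading zeros" is meant to achieve).

```python
def number_lines(text):
    """Prefix every line with its 1-based, zero-padded line number."""
    lines = text.splitlines()
    # Assumption: pad to the width of the largest line number,
    # so every prefix has the same length.
    width = len(str(len(lines)))
    return "\n".join(
        f"{i:0{width}d}{line}" for i, line in enumerate(lines, start=1)
    )


sample = "bla\nbla\nX bla\nbli\nbla\nX blu\nbli"
numbered = number_lines(sample)
print(numbered)
```

With only seven lines the width is one digit, so the output matches the listing above. On a file with thousands of lines the prefixes would be padded, for example `00002`.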
    

    At this point the desired result can be obtained with one or two regexes:

    • To remove the line number from lines that don't start with =001, match a line number that is not followed by =001, using a negative lookahead.

    • If your lines weren't fixed-width and you had to insert the line number at the start, move it to the end: use capturing groups to match the line number and the rest of the line separately, then rebuild the line with their order swapped.
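
The two regex steps can be sketched in Python's `re` module as follows. The patterns are my own guess at what the steps describe, not something given in the answer, and the sample records are made up; `keep_numbers_on_001` is a hypothetical helper name. The sketch assumes every line already starts with a zero-padded line number, and that `=001` is followed by a single space as in the question.

```python
import re


def keep_numbers_on_001(text):
    """Keep the line number only on =001 lines, moved to the end of the line."""
    # Step 1: remove a leading number unless it is directly followed by "=001".
    # The "\d" alternative in the lookahead prevents the engine from
    # backtracking and stripping only part of the number on =001 lines.
    text = re.sub(r"^\d+(?!\d|=001)", "", text, flags=re.MULTILINE)
    # Step 2: capture the number and the rest of the line separately,
    # then emit them in swapped order.
    return re.sub(r"^(\d+)(=001 .*)$", r"\2\1", text, flags=re.MULTILINE)


# Made-up sample: two short records, already numbered with five digits.
numbered = "\n".join([
    "00001=LDR 00000nam",
    "00002=001 20110708095140328",
    "00003=245 Some title",
    "00004",
    "00005=LDR 00000nam",
    "00006=001 20110708095140328",
])
result = keep_numbers_on_001(numbered)
print(result)
```

The second line of the output is `=001 2011070809514032800002`, which is the form asked for in the question. The same two patterns, with `\2\1` as the replacement, would be expected to work in Notepad++'s Replace dialog in regular-expression mode, but treat that as an assumption to verify on a copy of the file.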