Tags: regex, replace, notepad++, edi

How to identify container IDs (4 letters and 7 digits) in a single-line EDI file using Notepad++


The ID of each container consists of 4 letters followed by 7 digits (in EDI files there is no space between them).
The file also contains runs of 11 digits that match my expression. The expression has the form:

(\w{4}\d{7})

This does not fully solve the problem, because \w matches digits as well as letters.
Link to demo: https://regex101.com/r/vwH9nH/4
Another expression that comes closer is:

([A-Z]{4}\d{7})

This seems more specific, but I cannot get it to work in Notepad++ to extract the container IDs.
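The difference between the two expressions can be checked outside Notepad++. The following is a minimal sketch in Python, assuming the character-class version is written with `\d`; the `header` string is a short excerpt stitched together from the sample data in this question, and the variable names are only illustrative.

```python
import re

# Short excerpt from the EDI sample: one all-digit run and one real container ID.
header = "UNH+01533741570BAP+BAPLIE:D:95B:UN:SMDG22'EQD+CN+SUDU8505087+45G1+++5'"

# \w matches letters, digits and underscore, so 11 consecutive digits also qualify.
loose = re.findall(r"\w{4}\d{7}", header)

# [A-Z] restricts the first four characters to uppercase letters.
strict = re.findall(r"[A-Z]{4}\d{7}", header)

print(loose)   # ['01533741570', 'SUDU8505087']
print(strict)  # ['SUDU8505087']
```

Python's `re` module is not the same engine as the one in Notepad++, so this only illustrates the character classes, not Notepad++'s replace behaviour.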

In Notepad++ I tried:

  • Ctrl+H
  • Find what: (([A-Z]{4}\d{7})\h*|(?s:.) (meant to match the container IDs)
  • Replace with: (?1$1\n:)
  • check Wrap around
  • check Regular expression
  • Replace all

Here is part of the file to paste into Notepad++:

UNB+UNOA:2+RCW OPS CENTER+TERMINAL+180808:1519+1533741570C3ED+++++RCW OPS CENTER'UNH+01533741570BAP+BAPLIE:D:95B:UN:SMDG22'BGM++CAPSTAN4.20180808151930+9'DTM+137:1808081519UTC:301'TDT+20+081S+++HSD:172:166+++9V7575:103:ZZZ:MONTE VERDE'LOC+5+BRSSA:139:6'LOC+61+COCTG:139:6'DTM+178:1808090412:201'DTM+133:1808091512:201'DTM+132:1808180041:201'RFF+VON:081N'LOC+147+0380412::5'MEA+WT++KGM:29515'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOHAI:139:6'RFF+BM:1'EQD+CN+SUDU8505087+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0380312::5'MEA+WT++KGM:29586'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOCAU:139:6'RFF+BM:1'EQD+CN+UACU5363691+45G1+++5'NAD+CA+HLC:172:20'LOC+147+0380212::5'MEA+WT++KGM:29591'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+TGHU9702812+45G1+++5'NAD+CA+MSC:172:20'LOC+147+0380112::5'MEA+WT++KGM:29616'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOCAU:139:6'RFF+BM:1'EQD+CN+HLXU6240079+45G1+++5'NAD+CA+HLC:172:20'LOC+147+0380414::5'MEA+WT++KGM:29476'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+PRSJU:139:6'RFF+BM:1'EQD+CN+HASU4556735+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0380314::5'MEA+WT++KGM:29476'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOHAI:139:6'RFF+BM:1'EQD+CN+SUDU6787839+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0380214::5'MEA+WT++KGM:29481'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+TGHU9861619+45G1+++5'NAD+CA+MSC:172:20'LOC+147+0380114::5'MEA+WT++KGM:29492'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+HASU5014810+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0301582::5'MEA+WT++KGM:29123'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+CLHU4693498+42G1+++5'NAD+CA+MSC:172:20'LOC+147+0301482::5'MEA+WT++KGM:29160'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+PECLL:139:6'RFF+BM:1'EQD+CN+TCLU4424005+42G1+++5'NAD+CA+HLC:172:20'LOC+147+0301382::5'MEA+WT++KGM:29183'LOC+9+BRSSA:139:6+TECSV'LOC+1
1+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+...

After this find-and-replace I am left with only a single empty line,
but I want all the container IDs in one column.

My expected output is:

SUDU8505087
UACU5363691
TGHU9702812
HLXU6240079
HASU4556735
SUDU6787839
TGHU9861619
HASU5014810
CLHU4693498
TCLU4424005

Solution

  • Replace

    .*?([A-Z]{4}\d{7})((?![A-Z]{4}\d{7}).)*
    

    by

    $1\n
    

    and get

    SUDU8505087
    UACU5363691
    TGHU9702812
    HLXU6240079
    HASU4556735
    SUDU6787839
    TGHU9861619
    HASU5014810
    CLHU4693498
    TCLU4424005
    

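For anyone who wants to verify the replacement outside Notepad++, here is a rough Python sketch of the same idea. It assumes the whole file is read into one string; `SAMPLE` is an abbreviated excerpt of the data from the question, and the function names `extract_ids` and `replace_like_notepad` are made up for this illustration. `re.DOTALL` is passed so that `.` also crosses line breaks, which roughly corresponds to ticking ". matches newline" in Notepad++.

```python
import re

# Abbreviated excerpt of the EDI sample from the question (most segments omitted).
SAMPLE = (
    "UNH+01533741570BAP+BAPLIE:D:95B:UN:SMDG22'"
    "EQD+CN+SUDU8505087+45G1+++5'NAD+CA+HSD:172:20'"
    "LOC+147+0380312::5'EQD+CN+UACU5363691+45G1+++5'NAD+CA+HLC:172:20'"
    "EQD+CN+TGHU9702812+45G1+++5'NAD+CA+MSC:172:20'"
)

CONTAINER_ID = r"[A-Z]{4}\d{7}"  # 4 uppercase letters followed by 7 digits


def extract_ids(text):
    """Collect every container ID directly, without replacing anything."""
    return re.findall(CONTAINER_ID, text)


def replace_like_notepad(text):
    """Apply the find/replace from the answer: keep each ID, drop the rest."""
    pattern = (
        r".*?(" + CONTAINER_ID + r")"      # lazily skip ahead to the next ID and capture it
        r"((?!" + CONTAINER_ID + r").)*"   # then consume characters until another ID starts
    )
    return re.sub(pattern, r"\1\n", text, flags=re.DOTALL)


print("\n".join(extract_ids(SAMPLE)))
```

Both functions should give the same IDs for this excerpt; `re.findall` is the simpler route when a script is an option, while the replace pattern mirrors what is typed into the Notepad++ dialog.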