
Regex over EDI File


Hi guys, I have an EDI file that contains some lines with quantity, delivery date, and so on. I want to split it with regular expressions so that each line item comes with the information I need. The file content is below. I tried expressions like LIN+.* and LIN+.*?', but I get either all the LIN segments together in one match, or the LIN segments split apart but with too little information. I want each LIN element together with everything that follows it, up to the next LIN. Could someone help me?

UNB+UNOA:2+094200005561400986LA:ZZ+MTEL+200406:1436+34906++++1'UNH+112490+DELFOR:D:96A:UN'BGM+241+2004060008796+9'DTM+137:202004061436:203'DTM+157:20200406:102'DTM+36:20200206:102'NAD+BY+FRSFA0222838V::92'NAD+SE+000563X::92'UNS+D'NAD+CN+VP1::92++TEST+SK TEST:204 TEST:TEST 22:TEST ST TEST+++37540+FRA'LIN+1+3+441344:IN'PIA+1+7PK1150:VN'IMD+++:::VO-VKMV 7PK1150 VP'LOC+11+999'LOC+159+999'RFF+ON:P092303'QTY+113:100.00:PC'SCC+1'DTM+2:20200116:102'RFF+AAJ:P092303:100'QTY+113:100.00:PC'SCC+1'DTM+2:20200206:102'RFF+AAJ:P092304:100'LIN+2+3+502107:IN'PIA+1+3PK670:VN'IMD+++:::VO-VKMV 3PK670 EDC'LOC+11+999'LOC+159+999'RFF+ON:P088273'QTY+113:300.00:PC'SCC+1'DTM+2:20190503:102'RFF+AAJ:P088273:100'LIN+3+3+502109:IN'PIA+1+6PK970:VN'IMD+++:::VO-VKMV 6PK970 EDC'LOC+11+999'LOC+159+999'RFF+ON:P084470'QTY+113:200.00:PC'SCC+1'DTM+2:20190422:102'RFF+AAJ:P084470:100'LIN+4+3+6DK1215:IN'PIA+1+AVRRV50D1-VKMV 6DK1215:VN'IMD+++:::6DK1215'LOC+11+999'LOC+159+999'RFF+ON:P046369'QTY+48:533.00:PC'RFF+AAK:32299'DTM+171:20181109:102'QTY+113:533.00:PC'SCC+1'DTM+2:20190419:102'RFF+AAJ:P046369:100'LIN+5+3+6DK1320:IN'PIA+1+AVRRV50D1-VKMV 6DK1320?+282:VN'IMD+++:::6DK1320'LOC+11+999'LOC+159+999'RFF+ON:P061903'QTY+48:115.00:PC'RFF+AAK:43146'DTM+171:20181003:102'QTY+113:104.00:PC'SCC+1'DTM+2:20181005:102'RFF+AAJ:P061903:100'QTY+113:104.00:PC'SCC+1'DTM+2:20181102:102'RFF+AAJ:P062034:100'UNS+S'UNT+75+112490'UNZ+1+34906'

Solution

  • You may use

    LIN(?:(?!LIN).)*
    

    Or, a much more efficient version (following the unroll-the-loop principle):

    LIN[^L]*(?:L(?!IN)[^L]*)*
    

    The (?:(?!LIN).)* tempered greedy token matches any char (.), 0 or more times and as many as possible, as long as that char does not start a LIN character sequence. The match therefore stops right before the next LIN.

    The [^L]*(?:L(?!IN)[^L]*)* pattern matches 0 or more chars other than L, then 0 or more repetitions of an L that is not followed by IN plus 0 or more chars other than L.
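As a rough illustration, here is a minimal Python sketch that runs both patterns, plus the two attempts from the question, against a shortened sample. The sample string and the variable names are made up for this example (they are not the file from the question), and the choice of Python's `re` module is an assumption; the question does not say which regex engine is used.

```python
import re

# Shortened, made-up sample in the same shape as the EDI message in the question.
sample = (
    "UNH+1+DELFOR:D:96A:UN'"
    "LIN+1+3+441344:IN'LOC+11+999'QTY+113:100.00:PC'"
    "LIN+2+3+502107:IN'LOC+11+999'QTY+113:300.00:PC'"
    "UNS+S'UNT+7+1'"
)

# Tempered greedy token: consume one char at a time unless it starts "LIN".
TEMPERED = re.compile(r"LIN(?:(?!LIN).)*")
# Unrolled version: runs of non-L chars, plus any L that is not followed by "IN".
UNROLLED = re.compile(r"LIN[^L]*(?:L(?!IN)[^L]*)*")

tempered_groups = TEMPERED.findall(sample)
unrolled_groups = UNROLLED.findall(sample)

# The attempts from the question. Note that "+" is a quantifier here
# ("LI" followed by one or more "N"), not a literal plus sign.
greedy_attempt = re.findall(r"LIN+.*", sample)   # runs to the end of the line
lazy_attempt = re.findall(r"LIN+.*?'", sample)   # stops at the first apostrophe

for group in tempered_groups:
    print(group)
```

On this sample both patterns return the same two chunks, one per LIN. The last chunk also carries the trailing UNS+S and UNT segments, because nothing stops the match until the next LIN or the end of the line. One difference to keep in mind: without the DOTALL flag, `.` does not match a newline in Python, while `[^L]` does, so the two patterns can diverge on input that contains line breaks.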