python regex notepad++ pcre

REGEX - NP++ - removing padding 0s from subgroups of a code with one regex


This is my first question here. Thanks in advance for your help and for taking the time to read it.
I am using Notepad++ (NP++) to try out some regexes.

What I want

I would like to transform the lines on the left (from) into the formatted lines on the right (to), using a single regex that is more elegant than my own attempt below (see Unattractive solution).

(from) => (to)

H04B0001240000; => H04B 1/24;  
H04B0010300000; => H04B 10/30;  
H04B0011301000; => H04B 11/301;  
H04B0111300000; => H04B 111/30;  
H04B0101303400; => H04B 101/3034;  
H04B0100300010; => H04B 100/30001;  
H04B0110300000; => H04B 110/30;  

How to proceed?

- For a given code, the rules are as follows:
H04B0001240000;
- Cut it into three groups of 4, 4 and 6 characters
H04B 0001/240000;
- Remove all padding 0s at the beginning of the second group (the second group must keep at least one digit)
H04B 1/240000;
- Remove all padding 0s at the end of the third group (the third group must keep at least two digits)
H04B 1/24;

So the 0s to discard are those at the beginning of the second group and those at the end of the third group. The number of padding 0s varies from one code to the next.
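For reference, the three rules can also be written without any regex. This is a minimal Python sketch, assuming every code is exactly 4 + 4 + 6 characters followed by a semicolon; the function name `format_code` is made up for this illustration.

```python
def format_code(code: str) -> str:
    """Apply the 4/4/6 split and strip the padding zeros (hypothetical helper)."""
    body = code.rstrip(";")
    head, middle, tail = body[:4], body[4:8], body[8:14]
    # Second group: drop leading zeros, but keep at least one digit.
    middle = middle.lstrip("0") or "0"
    # Third group: always keep the first two digits, then drop trailing zeros.
    tail = tail[:2] + tail[2:].rstrip("0")
    return f"{head} {middle}/{tail};"


print(format_code("H04B0001240000;"))  # H04B 1/24;
print(format_code("H04B0100300010;"))  # H04B 100/30001;
```

This can serve as a reference when checking what a regex-based replacement produces.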

Unattractive solution

In NP++, I found a solution, but I find it unattractive.
In the 'Search' field:

([A-Z])((?:0{3}([1-9]))|(?:0{2}([1-9][0-9]))|(?:0([1-9][0-9]{2})))([0-9]{2})([0-9]*[1-9])?0{1,4}(;)

In the 'Replace' field:

\1 \3\4\5\/\6\7\8

Explanations with H04B 0001/240000;
==============================
([A-Z]) means one capital letter from A to Z; it matches the last letter of the first group (H04B)

((?:0{3}([1-9]))|(?:0{2}([1-9][0-9]))|(?:0([1-9][0-9]{2}))) should match 0002, 0020 or 0201, but not 2011. It detects the second group (0001)

([0-9]{2})([0-9]*[1-9])?0{1,4}(;) handles the third group of 6 digits (240000), with the intention of discarding all padding 0s at the end. The third group must keep at least two digits ([0-9]{2})
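To see how the eight capture groups line up with the replacement string, the same search pattern can be tried in Python. This is only a sketch: the names `ASKER_PATTERN` and `apply_asker_regex` are invented here, and the replacement uses a plain `/` because Python's `re.sub` would leave `\/` in the output as-is. Groups 3, 4 and 5 are mutually exclusive, so only one of them contributes text (Python 3.5+ substitutes an empty string for unmatched groups).

```python
import re

# The 'Search' pattern from above, split over three lines for readability.
ASKER_PATTERN = re.compile(
    r"([A-Z])"
    r"((?:0{3}([1-9]))|(?:0{2}([1-9][0-9]))|(?:0([1-9][0-9]{2})))"
    r"([0-9]{2})([0-9]*[1-9])?0{1,4}(;)"
)
REPLACEMENT = r"\1 \3\4\5/\6\7\8"


def apply_asker_regex(text: str) -> str:
    """Run the question's search/replace pair over text (hypothetical helper)."""
    return ASKER_PATTERN.sub(REPLACEMENT, text)


print(apply_asker_regex("H04B0011301000;"))  # H04B 11/301;
```

Note that `0{1,4}` requires at least one trailing 0, so a code whose third group ends in a non-zero digit (for example `H04B0001240001;`) is left unchanged by this pattern.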

Final question

Do you know a more elegant regex that achieves the same result?


Solution

  • You can do it like this

    (?m)^(\S{4})0*(\d\d*?)(?<=^.{8})(\d{2}\d*?)0*;

    https://regex101.com/r/7pTjkB/2

     (?m)
     ^ 
     ( \S{4} )                     # (1)
     0*
     ( \d  \d*? )                  # (2)
     (?<= ^ .{8} )
     (                             # (3 start)
          \d{2} 
          \d*? 
     )                             # (3 end)
     0*
     ;                             # Or, (?<= ^ .{14} )
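The replacement string is not given above. A plausible one is `\1 \2/\3;` (the semicolon is consumed by the match, so it has to be written back). The sketch below tries this in Python under that assumption; `PATTERN` and `reformat` are names chosen for the example. The lookbehind `(?<=^.{8})` has a fixed width of 8, so Python's `re` module accepts it.

```python
import re

# Same pattern as above; (?m) makes ^ match at the start of every line.
PATTERN = re.compile(r"(?m)^(\S{4})0*(\d\d*?)(?<=^.{8})(\d{2}\d*?)0*;")


def reformat(text: str) -> str:
    """Rewrite every code in text; the replacement string is an assumption."""
    return PATTERN.sub(r"\1 \2/\3;", text)


sample = "H04B0001240000;\nH04B0100300010;"
print(reformat(sample))
# H04B 1/24;
# H04B 100/30001;
```

The lookbehind is what stops the lazy `\d*?` in group 2 from ending before column 8: the engine keeps extending group 2 until exactly eight characters precede the current position.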
    

    Or, like this

    (?m)^(\S{4})0*(\d\d*?)(?<=^.{8})(\d{2}\d*?)0*(?<=^.{14})

    https://regex101.com/r/7pTjkB/3

     (?m)
     ^ 
     ( \S{4} )                     # (1)
     0*
     ( \d  \d*? )                  # (2)
     (?<= ^ .{8} )
     (                             # (3 start)
          \d{2} 
          \d*? 
     )                             # (3 end)
     0*
     (?<= ^ .{14} )
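The expanded layout of this second variant maps directly onto a free-spacing pattern. Here is a sketch using Python's `re.VERBOSE`, with the flags passed as arguments instead of the inline `(?m)`. Because this variant stops at column 14 and never consumes the semicolon, the assumed replacement is `\1 \2/\3` with no trailing `;`. The names `VERBOSE_PATTERN` and `reformat_verbose` are illustrative only.

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    ^
    ( \S{4} )          # (1) four-character prefix
    0*                 # leading padding zeros
    ( \d \d*? )        # (2) at least one digit
    (?<= ^ .{8} )      # group 2 must end at column 8
    ( \d{2} \d*? )     # (3) at least two digits
    0*                 # trailing padding zeros
    (?<= ^ .{14} )     # padding must end at column 14
    """,
    re.MULTILINE | re.VERBOSE,
)


def reformat_verbose(text: str) -> str:
    """Rewrite the 14-character code; whatever follows it is left untouched."""
    return VERBOSE_PATTERN.sub(r"\1 \2/\3", text)


print(reformat_verbose("H04B0101303400;"))  # H04B 101/3034;
```

Since the match ends at column 14 regardless of what follows, this form also works when the code is not terminated by a semicolon.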