regex, regex-group, regex-greedy

Regex that doesn't match the leading 0 of each group


  1. 01 Ded.PASIVIC 05-01-2016.xlsx
  2. 01 Ded.PASIVIC 15-01-2016.xlsx
  3. 01 Ded.PASIVIC 10-01-2016.xlsx
  4. 06 DED. PASIVIC 30-03-2016 (1).xlsx
  5. 19 DEDUCCION PASIVIC DEL 15-10-2016.xlsx (2)
  6. 23 DEDUCCION PASIVIC DEL 15-12-2016.xlsx (1)
  7. 18 APORTE PASIVIC DEL 30-09-2016.xlsx

I would like to get the date that appears in each of the file names above, but without leading zeros.

Instead of the whole date with its zeros, which is what I get now, I want 5-1-2016 for the first file, 15-1-2016 for the second, 10-1-2016 for the third, and so on (no leading zeros).

The expected output should be like this:

  1. 5-1-2016
  2. 15-1-2016
  3. 10-1-2016
  4. 30-3-2016
  5. 15-10-2016
  6. 15-12-2016
  7. 30-9-2016

I'm doing this in Python.


Solution

  • You could use 3 capturing groups. For each of the first 2, match an optional zero outside the group and then capture 1 or 2 digits together with the dash that follows: 0?([0-9]{1,2}-). The third group captures the 4-digit year.

    You might add a word boundary \b at the start and at the end so that the date is not matched inside a longer word or number.

    ^.*?\b0?([0-9]{1,2}-)0?([0-9]{1,2}-)([0-9]{4})\b.*$

    Because the pattern matches the whole line (^.*? before the date and .*$ after it), you could then use re.sub and put only the capturing groups in the replacement:

    \1\2\3

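    import re

    # The optional zero sits outside each group, so it is dropped in the replacement.
    regex = r"^.*?\b0?([0-9]{1,2}-)0?([0-9]{1,2}-)([0-9]{4})\b.*$"
    test_str = "01 Ded.PASIVIC 05-01-2016.xlsx"
    subst = r"\1\2\3"

    # count is passed by keyword for clarity; only the first match is replaced.
    result = re.sub(regex, subst, test_str, count=1)

    if result:
        print(result)  # 5-1-2016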
    

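    As a rough sketch, the same three groups can also be applied to all seven file names from the question. This variant is an assumption on my part, not part of the answer above: it uses re.search instead of re.sub, so the ^.*? and .*$ parts are dropped, and the helper name date_without_leading_zeros is hypothetical. It returns None when a name contains no date, whereas re.sub would hand back the unchanged file name in that case.

```python
import re

# Same three groups as in the answer; the dash is part of groups 1 and 2.
# The ^.*? and .*$ parts are omitted because search() scans the string itself.
DATE_RE = re.compile(r"\b0?([0-9]{1,2}-)0?([0-9]{1,2}-)([0-9]{4})\b")


def date_without_leading_zeros(filename):
    """Hypothetical helper: return the first d-m-yyyy date found, or None."""
    match = DATE_RE.search(filename)
    if match is None:
        return None
    # Joining the groups skips the optional zeros, which lie outside them.
    return "".join(match.groups())


filenames = [
    "01 Ded.PASIVIC 05-01-2016.xlsx",
    "01 Ded.PASIVIC 15-01-2016.xlsx",
    "01 Ded.PASIVIC 10-01-2016.xlsx",
    "06 DED. PASIVIC 30-03-2016 (1).xlsx",
    "19 DEDUCCION PASIVIC DEL 15-10-2016.xlsx (2)",
    "23 DEDUCCION PASIVIC DEL 15-12-2016.xlsx (1)",
    "18 APORTE PASIVIC DEL 30-09-2016.xlsx",
]

for name in filenames:
    print(date_without_leading_zeros(name))
```

    With the file names from the question, this should print the seven dates listed as the expected output, in the same order.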