Capture groups with a regex and reorder them with re.sub() only when the pattern is detected

import re

input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!

detection_regex_obligatory_preposition = r"\d{2}" + r"[\s|](?:del|de[\s|]el|de )[\s|]" + r"\d{2}" + r"[\s|](?:del|de[\s|]el|de )[\s|]" + r"\d*"

year, month, days_intervale_or_day = "", "", "" # = group()[2], group()[1], group()[0]
date_restructuring_structure = days_intervale_or_day + "-" + month + "-" + year

input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)

print(repr(input_text)) # --> output

Expected output for each of these cases:

"05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!

"04 del 05 del 07 del 2000" #example 1 - Not modify!

"04 05-06-200" #example 2 - Yes modify!

"04 05-06-20076 55" #example 3 - Yes modify!

Example 1 should not be replaced because more than one day is indicated before the month, which gives something like \d{2} del \d{2} del \d{2} del \d* rather than \d{2} del \d{2} del \d*

Something similar happens in example 0, where no replacement is needed because the text has the form \d{2} del \d{2} del \d* de \d{2} or \d{2} del \d{2} del \d* de \d*, rather than \d{2} del \d{2} del \d*

How should I set up the regex and its capture groups so that the replacement is performed in examples 2 and 3, but not in examples 0 and 1?


  • Demo:

    import re
    #input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
    #input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
    input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
    input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!
    detection_regex_obligatory_preposition = r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
    date_restructuring_structure = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>"  # raw string, so "\g" is not treated as an invalid escape
    input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)
    print(repr(input_text)) # --> output

    To test your pattern on Regex101, I combined its parts into a single expression:

    \d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d*

    It turns out to match exactly the inputs we do not want to change, and to miss the ones we do:

    05 del 07 del 2000 del 09 hhggh #example 0 - Captured
    04 del 05 del 07 del 2000 #example 1 - Captured
    04 05 del 06 de 200 #example 2 - Not Captured
    04 05 del 06 de 20076 55 #example 3 - Not Captured
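The same check can be reproduced without Regex101. The snippet below is a minimal sketch that runs the pattern from the question against the four example inputs; the helper name matched_text is hypothetical and only used for illustration.

```python
import re

# The pattern from the question, assembled into a single string.
original_pattern = (
    r"\d{2}[\s|](?:del|de[\s|]el|de )[\s|]"
    r"\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d*"
)

examples = [
    "05 del 07 del 2000 del 09 hhggh",  # example 0
    "04 del 05 del 07 del 2000",        # example 1
    "04 05 del 06 de 200",              # example 2
    "04 05 del 06 de 20076 55",         # example 3
]

def matched_text(pattern, text):
    """Return the first substring matched by pattern, or None if there is none."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

for text in examples:
    print(repr(text), "->", repr(matched_text(original_pattern, text)))
```

Examples 0 and 1 produce a match, while examples 2 and 3 produce None. In examples 2 and 3 the alternative "de " already consumes the only space before the year, so the following [\s|] has nothing left to match.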

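    To match the correct inputs, I modified your rule by adding a second two-digit number (\d{2}) at the beginning and by removing the trailing space from the last "de " alternative, so that "de" followed by a single space can match: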

    \d{2}[\s|]\d{2}[\s|](?:del|de[\s|]el|de )[\s|]\d{2}[\s|](?:del|de[\s|]el|de)[\s|]\d*

    Now it matches the correct inputs, and we can turn to the replacement. A replacement string can refer to groups in two ways. The first is by number (\1 \2-\3-\4 in our case): every pair of capturing parentheses is numbered automatically, from left to right. The second is by name (\g<startDay> \g<finishDay>-\g<month>-\g<year> in our case), which I prefer. To reference a group by name, the pattern must use named capturing groups, written (?P<name>...).
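As a small illustration of the two reference styles, here is a sketch using a deliberately simplified pattern (month and year only, which is an assumption made for brevity and not the full rule from this answer). Named groups still receive a number, so both styles work on the same pattern.

```python
import re

# Simplified pattern for illustration only: two named groups.
pattern = r"(?P<month>\d{2}) de (?P<year>\d{4})"
text = "06 de 2000"

# Reference groups by number.
by_number = re.sub(pattern, r"\1-\2", text)

# Reference groups by name.
by_name = re.sub(pattern, r"\g<month>-\g<year>", text)

# \g<N> is the unambiguous numbered form, useful when a digit follows the reference.
by_number_g = re.sub(pattern, r"\g<1>-\g<2>", text)

print(by_number, by_name, by_number_g)
```

All three calls return the same string. Writing the replacement as a raw string (r"...") keeps Python from interpreting the backslashes before re.sub sees them.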

    Let's add named capturing groups to our rule:

    (?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)

    The final code:

    import re
    #input_text = "05 del 07 del 2000 del 09 hhggh" #example 0 - Not modify!
    #input_text = "04 del 05 del 07 del 2000" #example 1 - Not modify!
    input_text = "04 05 del 06 de 200" #example 2 - Yes modify!
    input_text = "04 05 del 06 de 20076 55" #example 3 - Yes modify!
    detection_regex_obligatory_preposition = r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|](?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
    date_restructuring_structure = r"\g<startDay> \g<finishDay>-\g<month>-\g<year>"  # raw string, so "\g" is not treated as an invalid escape
    input_text = re.sub(detection_regex_obligatory_preposition, date_restructuring_structure, input_text)
    print(repr(input_text)) # --> output
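To check all four examples in one run instead of reassigning input_text, the final rule can be wrapped in a function. This is a minimal sketch; the names DATE_RANGE_PATTERN and restructure_dates are hypothetical, while the pattern and the replacement are the ones from the final code above.

```python
import re

# Final pattern from the answer, compiled once and split over two lines for readability.
DATE_RANGE_PATTERN = re.compile(
    r"(?P<startDay>\d{2})[\s|](?P<finishDay>\d{2})[\s|](?:del|de[\s|]el|de )[\s|]"
    r"(?P<month>\d{2})[\s|](?:del|de[\s|]el|de)[\s|](?P<year>\d*)"
)

def restructure_dates(text):
    """Rewrite 'DD DD del MM de YYYY' as 'DD DD-MM-YYYY'; leave other text unchanged."""
    return DATE_RANGE_PATTERN.sub(
        r"\g<startDay> \g<finishDay>-\g<month>-\g<year>", text
    )

# Input -> expected output, taken from the question.
cases = {
    "05 del 07 del 2000 del 09 hhggh": "05 del 07 del 2000 del 09 hhggh",  # example 0
    "04 del 05 del 07 del 2000": "04 del 05 del 07 del 2000",              # example 1
    "04 05 del 06 de 200": "04 05-06-200",                                 # example 2
    "04 05 del 06 de 20076 55": "04 05-06-20076 55",                       # example 3
}

for source, expected in cases.items():
    result = restructure_dates(source)
    print(repr(source), "->", repr(result), "OK" if result == expected else "MISMATCH")
```

Examples 0 and 1 come back unchanged because they never contain two two-digit numbers separated by a single space or pipe, which the pattern requires at its start.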