
Why does re.findall return a list of tuples containing empty strings but re.finditer works correctly?


I am using Python 3.9.13.
I am trying to use the findall function from the re library, but I am getting empty results.

The regex I am using is:

_regex = re.compile(r"(?:)0\d{1,4}(?:-?\d{2,4}-?\d{2,4}|\d{8}|\d(-)?\d{2,4}(-)?\d{3,4})")

I am testing that on the following string:

_text = "06-6206-567903-3668-067403-3668-400503-3668-429503-3668-432403-3668-039206-6206-572906-6206-630303-3668-481806-6206-564403-3668-053703-3668-070606-6206-5663"

From re.finditer, I am getting correct results:

_test = re.finditer(_regex, _text)
for item in _test:
    print(item)

<re.Match object; span=(0, 12), match='06-6206-5679'>
<re.Match object; span=(12, 24), match='03-3668-0674'>
<re.Match object; span=(24, 36), match='03-3668-4005'>
<re.Match object; span=(36, 48), match='03-3668-4295'>
<re.Match object; span=(48, 60), match='03-3668-4324'>
<re.Match object; span=(60, 72), match='03-3668-0392'>
<re.Match object; span=(72, 84), match='06-6206-5729'>
<re.Match object; span=(84, 96), match='06-6206-6303'>
<re.Match object; span=(96, 108), match='03-3668-4818'>
<re.Match object; span=(108, 120), match='06-6206-5644'>
<re.Match object; span=(120, 132), match='03-3668-0537'>
<re.Match object; span=(132, 144), match='03-3668-0706'>
<re.Match object; span=(144, 156), match='06-6206-5663'>

However, when using the re.findall function, every result is a tuple of empty strings:

_test = re.findall(_regex, _text)
print(_test)

[('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', ''), ('', '')]

I am wondering whether this problem comes from the regex I am using (maybe the first non-capturing group?). Please help.


Solution

  • After some testing, I believe that in the regular expression you wrote

    (?:)0\d{1,4}(?:-?\d{2,4}-?\d{2,4}|\d{8}|\d(-)?\d{2,4}(-)?\d{3,4})
                                              ^ ^        ^ ^
    

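    the two marked pairs of parentheses are capturing groups. When a pattern contains capturing groups, re.findall returns the groups instead of the whole match, so with two groups each result is a tuple of two strings. In your text the first alternative matches every time, so neither group takes part in the match and re.findall reports both as empty strings (on the Match objects from re.finditer, item.group(1) and item.group(2) are None). When the parentheses are removed to form a regular expression like this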

    (?:)0\d{1,4}(?:-?\d{2,4}-?\d{2,4}|\d{8}|\d-?\d{2,4}-?\d{3,4})
    

    re.findall produces the expected result:

    ['06-6206-5679', '03-3668-0674', '03-3668-4005', '03-3668-4295',
     '03-3668-4324', '03-3668-0392', '06-6206-5729', '06-6206-6303',
     '03-3668-4818', '06-6206-5644', '03-3668-0537', '03-3668-0706', 
     '06-6206-5663']
    

    and re.finditer gives the same matches as before:

    <re.Match object; span=(0, 12), match='06-6206-5679'>
    <re.Match object; span=(12, 24), match='03-3668-0674'>
    <re.Match object; span=(24, 36), match='03-3668-4005'>
    <re.Match object; span=(36, 48), match='03-3668-4295'>
    <re.Match object; span=(48, 60), match='03-3668-4324'>
    <re.Match object; span=(60, 72), match='03-3668-0392'>
    <re.Match object; span=(72, 84), match='06-6206-5729'>
    <re.Match object; span=(84, 96), match='06-6206-6303'>
    <re.Match object; span=(96, 108), match='03-3668-4818'>
    <re.Match object; span=(108, 120), match='06-6206-5644'>
    <re.Match object; span=(120, 132), match='03-3668-0537'>
    <re.Match object; span=(132, 144), match='03-3668-0706'>
    <re.Match object; span=(144, 156), match='06-6206-5663'>
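
The behaviour described above can be checked with a short sketch. It uses only the first two numbers from the question's string to keep the output small, and the variable names (with_groups, without_groups, whole_matches) are made up for illustration. The leading (?:) is kept as written; it matches the empty string and is not what causes the empty tuples.

```python
import re

# Shortened sample: the first two numbers from the question's string.
text = "06-6206-567903-3668-0674"

# Original pattern: the two (-) groups are capturing groups.
with_groups = re.compile(
    r"(?:)0\d{1,4}(?:-?\d{2,4}-?\d{2,4}|\d{8}|\d(-)?\d{2,4}(-)?\d{3,4})"
)

# With capturing groups present, findall returns the groups, not the match.
# Neither group takes part in the match here, so each one is reported as ''.
print(with_groups.findall(text))

# On a Match object the same unmatched groups are None,
# while group(0) still holds the whole match.
first = with_groups.search(text)
print(first.group(0), first.groups())

# Option 1: keep the pattern and collect group(0) from finditer.
whole_matches = [m.group(0) for m in with_groups.finditer(text)]
print(whole_matches)

# Option 2: drop the capturing parentheses so findall returns whole matches.
without_groups = re.compile(
    r"(?:)0\d{1,4}(?:-?\d{2,4}-?\d{2,4}|\d{8}|\d-?\d{2,4}-?\d{3,4})"
)
print(without_groups.findall(text))
```

Option 1 may be preferable if the groups are needed elsewhere; writing (?:-)? instead of (-)? would be another way to keep the grouping without capturing.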