I am trying to write a pattern that captures each CNPJ group inside this block of text, on the condition that the match starts with executados:
and ends with a CNPJ group. However, my pattern always returns only the last group, and I don't know how to make it work.
The answer in "getting specific groups of patterns inside a block text" does not work for me!
pattern: (?:executados\:)[\p{L}\s\D\d]+CNPJ\W+(?P<cnpj>\d+\.\d+\.\d+\/\d+-\d+)
string to test:
Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a
I would like to get the values {'cnpj': ['88.888.888/8888-88', '99.999.999/9999-99']}
, but this way I get only the last one.
You can use the PyPI regex module with a regex like
(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)
See the regex demo.
Here is the Python demo:
import regex  # third-party module from PyPI (pip install regex), not the stdlib re
text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 99.999.999/9999-99,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""
print(regex.findall(r'(?s)(?<=executados:.*?)CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)', text))
yielding
['99.999.999/9999-99', '99.999.999/9999-99']
The regex matches:
(?s) - regex.DOTALL, enables . to match line break chars
(?<=executados:.*?) - right before the current location, there must be executados: and then any zero or more chars
CNPJ - a fixed string
\W+ - one or more non-word chars
(\d+\.\d+\.\d+/\d+-\d+) - Group 1, the return value of regex.findall: one or more digits and a . twice, then one or more digits, /, one or more digits, - and one or more digits.
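If installing the third-party regex module is not an option, a two-step approach with the standard library re module may give the same result. This is only a sketch, not part of the solution above: the helper name extract_cnpjs and its marker parameter are hypothetical, and it assumes that everything after the first executados: belongs to the block of interest. The sample text is the one from the question.

```python
import re

text = """Dados dos executados:
1. FOO TEST STRING LTDA., CNPJ: 88.888.888/8888-88,
2. ANOTHER TEST STRING LTDA LTDA LTDA - ME, CNPJ: 99.999.999/9999-99,
3. FOO TEST STRING LTDA., CPF: 999.999.999-99,
4. FOO TEST STRING LTDA., CPF: 999.999.999-99.
Como medida de economia e celeridade processuais, atribuo a"""

# Same CNPJ sub-pattern as in the answer, without the lookbehind.
CNPJ_RE = re.compile(r'CNPJ\W+(\d+\.\d+\.\d+/\d+-\d+)')


def extract_cnpjs(text, marker="executados:"):
    """Hypothetical helper: return all CNPJ numbers found after `marker`."""
    # Step 1: keep only the text that follows the first occurrence of the marker.
    _, found, tail = text.partition(marker)
    if not found:
        # Marker absent: nothing qualifies, mirroring the lookbehind condition.
        return {'cnpj': []}
    # Step 2: collect every CNPJ number in the remaining text.
    return {'cnpj': CNPJ_RE.findall(tail)}


result = extract_cnpjs(text)
print(result)
```

Splitting on the marker first stands in for the variable-length lookbehind (?<=executados:.*?), which the stdlib re module does not support. After that, a plain findall returns every CNPJ rather than a single match.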