python regex string validation python-re

How do I extract with regex all the text (numbers, letters, symbols) after the second capital letter?


They won.             Elles gagnèrent.
They won.    Ils ont gagné.
They won.        Elles ont gagné.
Tom came.    Tom est venu.
Tom died.       Tom est mort.
Tom knew. Tom savait.
Tom left.    Tom est parti.
Tom left.       Tom partit.
Tom lied. Tom a menti.
Tom lies.    Tom ment.
Tom lost.            Tom a perdu.
Tom paid.    Tom a payé.

I'm having trouble putting together a regex pattern that extracts all the text from the second capital letter onward (including that letter).

For example:

They won.             Elles gagnèrent.

In this case, it should extract:

Elles gagnèrent.

This is my code, but it is not working well:

import re

line = "They won.             Elles gagnèrent." #for example this case

match = re.search(r"\s¿?(?:A|Á|B|C|D|E|É|F|G|H|I|Í|J|K|LL|L|M|N|Ñ|O|Ó|P|Q|R|S|T|U|Ú|V|W|X|Y|Z)\s((?:\w\s)+)?" , line)

n_sense = match.group()

print(repr(n_sense)) #should print "Elles gagnèrent."

Solution

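  • You can try the following code. It deletes everything before the second capital letter on each line, so what remains starts at that letter. Note that [A-Z] matches only unaccented ASCII capitals.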

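    import re

    # `file` is the path to your input file (one sentence pair per line).
    with open(file, "r", encoding="utf-8") as r:
        for line in r:
            # Remove everything before the second capital letter (ASCII A-Z only).
            line = re.sub(r'^[^A-Z]*[A-Z][^A-Z]*', '', line)
            print(line, end="")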
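If the second sentence can start with an accented capital (É, Á, Ñ and so on), the A-Z class will skip past it. Here is a minimal sketch of one way around that, which captures the wanted text with a group instead of deleting the prefix. The names UPPER, PATTERN and after_second_capital are made up for this example, and the character class is an assumption: it covers ASCII capitals plus the Latin-1 accented capitals, which may not be enough for every language.

```python
import re

# Assumption: "capital letter" means ASCII A-Z plus the Latin-1 accented
# capitals (À-Ö and Ø-Þ). Extend this class if your data needs more.
UPPER = "A-ZÀ-ÖØ-Þ"

# Skip up to the first capital, skip the non-capitals that follow it,
# then capture from the second capital to the end of the line.
PATTERN = re.compile(rf"^[^{UPPER}]*[{UPPER}][^{UPPER}]*([{UPPER}].*)$")


def after_second_capital(line):
    """Return the text starting at the second capital letter, or None."""
    match = PATTERN.search(line.rstrip("\n"))
    return match.group(1) if match else None


print(after_second_capital("They won.             Elles gagnèrent."))
print(after_second_capital("Tom paid.    Tom a payé."))
```

For the two sample lines this should print "Elles gagnèrent." and "Tom a payé."; a line with fewer than two capitals gives None rather than an empty string, which makes such lines easy to detect.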