
Use Python regex substitution with groups to get the replaced characters


I'm looking for a way to get the characters that re.sub() matched in a group and replaced in a string. For example:

#!/usr/bin/env python3

import re
sentence="This is whatever. Foo"

# remove punctuation mark
new_sentence = re.sub(r'([\.,:;])', '', sentence)

removed_punctuation_mark = ??????????????

print(removed_punctuation_mark)

... how do I get the removed dot? There's re.subn(), but it would only tell me that one character was removed, not which one.

Or, to put it another way: how do I do in Python what this Perl script does?

#!/usr/bin/perl

$sentence = "This is whatever. Foo";

# remove punctuation mark
$sentence =~ s/([\.,:;])//;

# first group of () in regex above
$removed_punctuation_mark = $1;

print "$removed_punctuation_mark\n";

Of course, I could first call re.search() and group() and then re.sub(), but I would have to repeat the regex, which is not very elegant.
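For reference, the two-step approach mentioned above might look like the sketch below. It assumes the pattern is compiled once (the name PUNCTUATION is made up for illustration) so the regex is only written in one place, and it passes count=1 to mimic Perl's s/// without the /g flag.

```python
import re

# Compile once so the pattern is written in a single place.
# PUNCTUATION is an illustrative name, not something from the question.
PUNCTUATION = re.compile(r'([.,:;])')

sentence = "This is whatever. Foo"

# Step 1: find the first match and remember what group 1 captured.
match = PUNCTUATION.search(sentence)
removed_punctuation_mark = match.group(1) if match else None

# Step 2: remove it. count=1 replaces only the first match,
# like Perl's s/// without /g.
new_sentence = PUNCTUATION.sub('', sentence, count=1)

print(removed_punctuation_mark)
print(new_sentence)
```

This still scans the string twice, which is the inelegance the question is about.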


Solution

  • As @jasonharper suggested in his comment, you can pass a function as the replacement:

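    import re

    # collects group 1 of every match that re.sub() replaces
    replacements = []


    def replacement(x):
        # x is the match object: record the matched character,
        # then return the empty string as the replacement text
        replacements.append(x.group(1))
        return ''


    sentence = 'This is whatever. Foo'
    new_sentence = re.sub(r'([\.,:;])', replacement, sentence)

    print(new_sentence, replacements)  # This is whatever Foo ['.']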

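    This is probably what you're looking for. When the replacement argument is a function, re.sub() calls it once per match and uses its return value as the replacement text. x is a match object, so it has all the groups and other information about the match, and you can get anything from it. The example grabs the first group, since that is the one that captures the punctuation mark in your regex.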
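If the module-level list is a concern, the same idea can be wrapped in a function. This is a minimal sketch, not part of the original answer: sub_and_collect is a hypothetical helper name, and it assumes the pattern has at least one capturing group. Passing count=1 reproduces the Perl behaviour of replacing only the first match.

```python
import re


def sub_and_collect(pattern, string, count=0):
    """Remove matches of `pattern`; return (new_string, removed_groups).

    Hypothetical helper: assumes `pattern` has at least one capturing group.
    """
    removed = []

    def replacement(match):
        # Record what group 1 captured, then replace the match with nothing.
        removed.append(match.group(1))
        return ''

    new_string = re.sub(pattern, replacement, string, count=count)
    return new_string, removed


new_sentence, removed = sub_and_collect(
    r'([.,:;])', 'This is whatever. Foo', count=1
)
print(new_sentence)  # This is whatever Foo
print(removed)       # ['.']
```

With the default count=0, every match is removed and collected in order of appearance.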