Replace some part of a word with regex

How do you delete the text inside <ref> *some text*</ref> together with the ref tags themselves?

For example, in '...and so on<ref>Oxford University Press</ref>.'

re.sub(r'<ref>.+</ref>', '', string) only removes the <ref> if the <ref> is followed by whitespace.

EDIT: I guess it has something to do with word boundaries... or does it?

EDIT2: What I need is for it to match the last (closing) </ref> even if it is on a new line.

Solution

I don't really see your problem, because the code you pasted will remove the <ref>...</ref> part of the string. But if what you mean is that an empty ref tag is not removed:

re.sub(r'<ref>.+</ref>', '', '...and so on<ref></ref>.')

Then what you need to do is replace the .+ with .*

A + means one or more, while * means zero or more.
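
To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question and against a string with an empty ref tag. The variable names are only illustrative and are not from the original post.

```python
import re

# Example strings: one ref with content, one empty ref.
text_with_content = '...and so on<ref>Oxford University Press</ref>.'
text_with_empty_ref = '...and so on<ref></ref>.'

plus_pattern = r'<ref>.+</ref>'   # needs at least one character between the tags
star_pattern = r'<ref>.*</ref>'   # also accepts nothing between the tags

# .+ removes a ref that has content...
removed_plus = re.sub(plus_pattern, '', text_with_content)
# ...but leaves an empty ref untouched, because .+ cannot match zero characters.
kept_plus = re.sub(plus_pattern, '', text_with_empty_ref)
# .* removes the empty ref as well.
removed_star = re.sub(star_pattern, '', text_with_empty_ref)

print(removed_plus)
print(kept_plus)
print(removed_star)
```

With .+ the empty tag survives; with .* both kinds of ref are removed.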

From http://docs.python.org/library/re.html:

'.' (Dot.) In the default mode, this matches any character except a newline.
    If the DOTALL flag has been specified, this matches any character including
    a newline.
'*' Causes the resulting RE to match 0 or more repetitions of the preceding
    RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’
    followed by any number of ‘b’s.
'+' Causes the resulting RE to match 1 or more repetitions of the preceding
    RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will
    not match just ‘a’.
'?' Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
    ab? will match either ‘a’ or ‘ab’.
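
For the case in EDIT2, where the closing </ref> sits on a later line, the DOTALL note quoted above is the relevant part. The following is a rough sketch, assuming the ref content may contain newlines; the sample strings are made up for illustration. The last line uses the non-greedy form .*?, which is not part of the answer above but may be useful if a string contains more than one ref.

```python
import re

# Hypothetical input where the closing tag is on a new line.
multiline_text = '...and so on<ref>Oxford\nUniversity Press\n</ref>.'

pattern = r'<ref>.*</ref>'

# Without DOTALL the dot stops at the newline, so nothing matches
# and the string comes back unchanged.
without_dotall = re.sub(pattern, '', multiline_text)

# With DOTALL the dot also matches newlines, so the whole ref is removed.
with_dotall = re.sub(pattern, '', multiline_text, flags=re.DOTALL)

# Assumption: several refs in one string. Greedy .* spans from the first
# <ref> to the last </ref>; non-greedy .*? stops at the nearest </ref>.
two_refs = 'a<ref>x</ref> b<ref>y\n</ref> c'
greedy = re.sub(r'<ref>.*</ref>', '', two_refs, flags=re.DOTALL)
lazy = re.sub(r'<ref>.*?</ref>', '', two_refs, flags=re.DOTALL)

print(repr(without_dotall))
print(repr(with_dotall))
print(repr(greedy))
print(repr(lazy))
```

Whether the greedy or the non-greedy form is the right one depends on whether everything up to the last </ref> should go, or each ref individually.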