Tags: python, regex, string, python-re

Python re.sub not replacing all occurrences of a string


I'm not getting the desired output: re.sub seems to replace only the last occurrence. Can someone explain what I'm doing wrong?

    >>> import re
    >>> srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123|  http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
    >>> re.sub("http://.*[#]", "", srr)
    'image-1CE005XG03'

Desired output, with the http://www.google.com/# prefix stripped from each entry:

image-1CCCC|image-1VVDD|image-123|image-1CE005XG03

Solution

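  • The problem is that .* in your pattern is greedy: it matches from the first http:// all the way to the last #, so re.sub finds one single match covering everything up to image-1CE005XG03. Rather than trying to do a replacement that removes the portions you don't want, I would use re.findall here to extract the portions you do want: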

    import re

    src = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123|  http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
    # Capture whatever follows "#", up to the next "|" or whitespace
    matches = re.findall(r'https?://www\.\S+#([^|\s]+)', src)
    output = '|'.join(matches)
    print(output)  # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
    

    Note that if you want to be more specific and match only Google URLs, you may use the following pattern instead:

    https?://www\.google\.\S+#([^|\s]+)
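If you would rather keep re.sub, as in the question, the fix is to stop the prefix pattern from running past the first #. The sketch below is a minimal illustration, assuming the input is exactly the string from the question; the variable names greedy, lazy and cleaned are made up for comparison. It contrasts the original greedy pattern, a lazy .*? version, and a negated-class version that also swallows the spaces before each URL.

```python
import re

# Sample input copied from the question.
srr = ("http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| "
       "http://www.google.com/#image-123|  http://www.google.com/#image-123| "
       "http://www.google.com/#image-1CE005XG03")

# Original pattern: greedy ".*" runs to the LAST "#" in the string,
# so the whole prefix is removed as one single match.
greedy = re.sub("http://.*[#]", "", srr)

# Lazy ".*?" stops at the FIRST "#" after each "http://", so every URL
# prefix is removed, but the spaces after each "|" are left behind.
lazy = re.sub(r"http://.*?#", "", srr)

# "[^#]*" cannot cross a "#", and the leading "\s*" also removes the
# space(s) in front of each URL, giving the "|"-joined output directly.
cleaned = re.sub(r"\s*https?://[^#]*#", "", srr)

print(greedy)   # image-1CE005XG03
print(lazy)     # image-1CCCC| image-1VVDD| image-123|  image-123| image-1CE005XG03
print(cleaned)  # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
```

Like the re.findall version, this keeps both image-123 entries; the desired output in the question lists it only once, so any deduplication would be a separate step.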