I'm not getting the desire output, re.sub is only replacing the last occurance using python regular expression, please explain me what i"m doing wrong
srr = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
re.sub("http://.*[#]", "", srr)
'image-1CE005XG03'
Desire output without http://www.google.com/#image from the above string.
image-1CCCC|image-1VVDD|image-123|image-1CE005XG03
I would use re.findall
here, rather than trying to do a replacement to remove the portions you don't want:
src = "http://www.google.com/#image-1CCCC| http://www.google.com/#image-1VVDD| http://www.google.com/#image-123| http://www.google.com/#image-123| http://www.google.com/#image-1CE005XG03"
matches = re.findall(r'https?://www\.\S+#([^|\s]+)', src)
output = '|'.join(matches)
print(output) # image-1CCCC|image-1VVDD|image-123|image-123|image-1CE005XG03
Note that if you want to be more specific and match only Google URLs, you may use the following pattern instead:
https?://www\.google\.\S+#([^|\s]+)