I am trying to extract a LinkedIn URL that is written in this format:
text = "patra 12 EXPERIENCE in / in/sambhu-patra-49b4759/ 2020 - Now O Skin Curate Research Pvt Ltd Embedded System Developer, WB 0 /bindasssambhul O SKILLS LANGUAGES Arduino English Raspberry Pi Movidius Hindi Bengali ICS Intel Compute Stick PCB Design Python UI Design using Tkinter HOBBIES HTML iti CSS G JavaScript JQuery IOT\n"
pattern = \/?in\/.+\/?\s+
I need to extract this: in/sambhu-patra-49b255129/
from any noisy text like the one above.
It's a LinkedIn URL written in short form.
My pattern is not working.
You can use
import re

m = re.search(r'\bin\s*/\s*(\S+)', text)
if m:
    print(m.group(1))  # the run of non-whitespace chars after "in /"
See the regex demo.
Details:
\b - word boundary
in - the literal text in
\s* - zero or more whitespaces
/ - a / char
\s* - zero or more whitespaces
(\S+) - Capturing group 1: one or more non-whitespace chars.
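To make the difference concrete, here is a minimal sketch that runs both patterns on a shortened copy of the sample text. The helper name extract_short_profile and the normalisation to the "in/<id>/" form are assumptions added for illustration; they are not part of the original answer. In the sample, the stray "in / " in front of the real link means group 1 already begins with "in/", so the helper strips that prefix before rebuilding the short form.

```python
import re

# Shortened copy of the sample text from the question.
text = ("patra 12 EXPERIENCE in / in/sambhu-patra-49b4759/ 2020 - Now O Skin "
        "Curate Research Pvt Ltd Embedded System Developer, WB 0 /bindasssambhul "
        "O SKILLS LANGUAGES Arduino English Raspberry Pi IOT\n")

# Pattern from the question: the greedy `.+` also matches spaces, so the match
# runs on to the last whitespace it can reach (here the trailing newline).
greedy = re.search(r'\/?in\/.+\/?\s+', text)

def extract_short_profile(s):
    """Hypothetical helper: return the short profile link as 'in/<id>/', or None."""
    m = re.search(r'\bin\s*/\s*(\S+)', s)
    if not m:
        return None
    slug = m.group(1)
    # A noisy "in / " before the real link leaves "in/" inside the capture.
    if slug.startswith("in/"):
        slug = slug[len("in/"):]
    return "in/" + slug.strip("/") + "/"

print(repr(greedy.group(0)))        # the over-long match from the original pattern
print(extract_short_profile(text))  # the normalised short link
```

The \S+ in the suggested pattern stops at the first whitespace, which is what keeps the match from swallowing the rest of the line.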