regex python-re

Pattern to extract a LinkedIn username from text


I am trying to extract a LinkedIn URL that is written in this format:

text = "patra 12 EXPERIENCE in / in/sambhu-patra-49b4759/ 2020 - Now O Skin Curate Research Pvt Ltd Embedded System Developer, WB 0 /bindasssambhul O SKILLS LANGUAGES Arduino English Raspberry Pi Movidius Hindi Bengali ICS Intel Compute Stick PCB Design Python UI Design using Tkinter HOBBIES HTML iti CSS G JavaScript JQuery IOT\n"


pattern = \/?in\/.+\/?\s+

I need to extract in/sambhu-patra-49b4759/ from any noisy text like the one above.

It's a LinkedIn URL written in short form.

My pattern is not working.


Solution

  • You can use

    import re

    m = re.search(r'\bin\s*/\s*(\S+)', text)
    if m:
        print(m.group(1))
    


    Details:

    • \b - word boundary
    • in - the literal string in
    • \s* - zero or more whitespaces
    • / - a / char
    • \s* - zero or more whitespaces
    • (\S+) - Capturing group 1: one or more non-whitespace characters.
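
To see how the pieces listed above behave on the question's input, here is a minimal sketch run against a shortened copy of the sample text. The names LOOSE, STRICT and first_group are hypothetical helpers introduced only for illustration. STRICT is not part of the answer: it is an assumed tighter variant that disallows spaces around the slash and limits the captured slug to word characters and hyphens, which may or may not suit other inputs.

```python
import re

# Shortened copy of the sample text from the question.
text = ("patra 12 EXPERIENCE in / in/sambhu-patra-49b4759/ 2020 - Now "
        "O Skin Curate Research Pvt Ltd")

# Pattern from the answer: "in", optional whitespace, "/", optional
# whitespace, then a run of non-whitespace characters (group 1).
LOOSE = re.compile(r'\bin\s*/\s*(\S+)')

# Assumed stricter variant (not from the answer): no whitespace around "/",
# slug restricted to word characters and hyphens, optional trailing "/".
STRICT = re.compile(r'\bin/([\w-]+)/?')


def first_group(pattern, s):
    """Return group 1 of the first match of pattern in s, or None."""
    m = pattern.search(s)
    return m.group(1) if m else None


loose = first_group(LOOSE, text)
strict = first_group(STRICT, text)
print(loose)   # in/sambhu-patra-49b4759/
print(strict)  # sambhu-patra-49b4759
```

On this input the answer's pattern starts matching at the stray "in /" and so captures the whole short-form link, including its "in/" prefix and trailing slash. The stricter variant skips the stray "in /" and returns only the slug.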