ruby regex url

Regex matching URLs that contain a string in the relative path but not in the domain


This is one of my interview questions. I didn't come up with a good enough solution and got rejected.

The question was:

What is the one regex that matches all URLs containing job (case-insensitive) in the relative
path (not the domain) in the following list:

    - http://www.glassdoor.com/job/ABC
    - https://glassdoor.com/job/
    - HTTPs://job.com/test
    - Www.glassdoor.com/foo/bar/joBs
    - http://192.168.1.1/ABC/job
    - http://bankers.jobs/ABC/job

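My solution used a negative lookbehind and a negative lookahead: /(?<!\.)job(?!\.)/i. This works for the list above. However, for a URL such as HTTPs://jobs.com/test it gives the wrong result: the job in the domain is preceded by / and followed by s, not by a dot, so the regex matches even though job is not in the path.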
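To make the failure concrete, here is a small Ruby sketch that runs the lookaround attempt against the list plus the extra URL. The expected true/false values are my own reading of the question, and the constant and method names are only illustrative.

```ruby
# The lookaround attempt from the question.
LOOKAROUND = /(?<!\.)job(?!\.)/i

# true  => "job" appears in the path, so the URL should match
# false => "job" appears only in the domain, so it should not
# (expected values are an assumption based on the question's wording)
EXPECTED = {
  "http://www.glassdoor.com/job/ABC" => true,
  "https://glassdoor.com/job/"       => true,
  "HTTPs://job.com/test"             => false,
  "Www.glassdoor.com/foo/bar/joBs"   => true,
  "http://192.168.1.1/ABC/job"       => true,
  "http://bankers.jobs/ABC/job"      => true,
  "HTTPs://jobs.com/test"            => false # the extra case that breaks it
}.freeze

# Return every URL where the pattern's verdict differs from the expected one.
def mismatches(pattern, expected)
  expected.reject { |url, want| pattern.match?(url) == want }.keys
end

puts mismatches(LOOKAROUND, EXPECTED)
# prints: HTTPs://jobs.com/test
```

The lookarounds only check the characters right next to job, so they cannot tell the domain from the path.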

I am wondering what the correct answer to this question is. Thanks in advance for any suggestions!


Solution

  • Try this regex (with the case-insensitive i flag, as the question requires):

    \b(?:https?:\/\/)?[^\/:\n]+\/.*?job
    

    RegEx Details:

    • \b: Word boundary
    • (?:https?:\/\/)?: Match optional http:// or https://
    • [^\/:\n]+: Match 1+ of any characters that are not /, : or a newline (the host part)
    • \/: Match a /
    • .*?job: Lazily match 0 or more characters, followed by the text job
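As a quick check, the pattern can be run in Ruby against the URLs from the question. This is only a sketch: the /i flag is my assumption based on the case-insensitive requirement, and the names PATH_JOB and URLS are illustrative.

```ruby
# The regex from the answer, with /i added (an assumption) for case-insensitivity.
PATH_JOB = /\b(?:https?:\/\/)?[^\/:\n]+\/.*?job/i

# The list from the question, plus the jobs.com case that broke the lookaround attempt.
URLS = [
  "http://www.glassdoor.com/job/ABC",
  "https://glassdoor.com/job/",
  "HTTPs://job.com/test",
  "Www.glassdoor.com/foo/bar/joBs",
  "http://192.168.1.1/ABC/job",
  "http://bankers.jobs/ABC/job",
  "HTTPs://jobs.com/test"
].freeze

# Keep only the URLs where "job" appears after the first slash that follows the host.
matched  = URLS.select { |url| PATH_JOB.match?(url) }
rejected = URLS - matched

puts "matched:",  matched
puts "rejected:", rejected
```

With these inputs, the two URLs that have job only in the domain (job.com and jobs.com) end up in rejected, because the pattern requires a / after the host before it starts looking for job.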