Tags: regex, postgresql, url-parsing

Regex to extract a URL path


I need help creating a regex for PostgreSQL that extracts specific URL paths and puts them into a separate column.

For example, given the following URLs:

https://example.com/drinks/
https://example.com/drinks/beverages/
https://example.com/drinks/beverages/pepsi/pepsi-can-16-oz/

I need to extract the path into a second column:

Column 1                                                   | Column 2
-----------------------------------------------------------|-------------------------
https://example.com/drinks                                 | /drinks/
https://example.com/drinks/beverages                       | /drinks/beverages/
https://example.com/drinks/beverages/pepsi/pepsi-can-16-oz | /drinks/beverages/pepsi/
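The table does not spell out a general rule, so the sketch below assumes one that fits all three rows: keep at most the first three path segments after the host and always end with a slash. It uses Python's `re` module only so the pattern is easy to test; the helper name `path_prefix` is hypothetical. In PostgreSQL the same idea should translate to `substring(url from '^https?://[^/]+((?:/[^/]+){1,3})') || '/'`, because `substring` with a regex returns the text matched by the first capturing group.

```python
import re
from typing import Optional

# Assumed rule (inferred from the example table, not stated in the question):
# keep one to three "/segment" pieces right after the host, then add a slash.
# Group 1 captures those segments; the scheme and host are consumed but not captured.
PATH_PREFIX = re.compile(r"^https?://[^/]+((?:/[^/]+){1,3})")


def path_prefix(url: str) -> Optional[str]:
    """Return the first one to three path segments of url with a trailing slash,
    or None when the URL has no path at all."""
    match = PATH_PREFIX.search(url)
    if match is None:
        return None
    return match.group(1) + "/"


if __name__ == "__main__":
    for url in (
        "https://example.com/drinks",
        "https://example.com/drinks/beverages",
        "https://example.com/drinks/beverages/pepsi/pepsi-can-16-oz",
    ):
        # Prints one "Column 1 | Column 2" row per URL.
        print(url, "|", path_prefix(url))
```

If the intended rule is different (for example, "drop a trailing product slug" rather than "keep three segments"), the `{1,3}` quantifier is the part to revisit.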

Lastly, I also need a regex to get everything after /drinks/beverages/.
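A minimal sketch for the last part, assuming the prefix is spelled /drinks/beverages/ as in the example URLs and must appear directly after the host. The helper name `after_beverages` is hypothetical, and Python's `re` stands in for PostgreSQL; the equivalent PostgreSQL call would be `substring(url from '^https?://[^/]+/drinks/beverages/(.*)$')`.

```python
import re
from typing import Optional

# Anchored after the host so "/drinks/beverages/" appearing later in the URL
# (e.g. in a query string) is not picked up. Group 1 is everything after the prefix.
AFTER_BEVERAGES = re.compile(r"^https?://[^/]+/drinks/beverages/(.*)$")


def after_beverages(url: str) -> Optional[str]:
    """Return the text after /drinks/beverages/, or None if the URL lacks that prefix."""
    match = AFTER_BEVERAGES.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(after_beverages("https://example.com/drinks/beverages/pepsi/pepsi-can-16-oz/"))
```

The bare category URL https://example.com/drinks/beverages/ yields an empty string rather than None, which keeps "under beverages but nothing deeper" distinguishable from "not under beverages".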

Regex isn't my strong suit, so I'm looking for some guidance on how to accomplish this.


Solution

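  • Try this:

    (?i)^https?://(?:\w(?:[\w-]{0,61}\w)?\.)+\w[\w-]{0,61}\w(/.*)$

    It matches the scheme and domain (e.g. https://example.com) and captures everything after it, i.e. the path, in group 1. PostgreSQL's substring(url from pattern) returns that first captured group. The domain has to be consumed by the pattern rather than only checked in a lookahead: a lookahead does not consume characters, so a following .* would still match the whole URL.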
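The answer's idea (match the domain, then return what follows it) can be sketched in Python to check the behaviour. The pattern consumes the host and captures the path in group 1; the names `url_path` and `lookahead_only` are hypothetical. The second function is included only for contrast: a zero-width lookahead matches without consuming anything, so the trailing `.*` still starts at the beginning of the URL.

```python
import re
from typing import Optional

# Host: one or more labels ending in ".", then a final label of at least two
# characters; labels are word characters or hyphens, not starting or ending
# with a hyphen, up to 63 characters. (?i) makes the scheme case-insensitive.
URL_PATH = re.compile(r"(?i)^https?://(?:\w(?:[\w-]{0,61}\w)?\.)+\w[\w-]{0,61}\w(/.*)$")

# For contrast only: the lookahead checks for a host but consumes nothing.
LOOKAHEAD_ONLY = re.compile(r"(?i)(?=https?://[^/]+).*")


def url_path(url: str) -> Optional[str]:
    """Return the path (group 1) after the host, or None if there is no path."""
    match = URL_PATH.search(url)
    return match.group(1) if match else None


def lookahead_only(url: str) -> Optional[str]:
    """Return the whole match of the lookahead-based pattern."""
    match = LOOKAHEAD_ONLY.search(url)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(url_path("https://example.com/drinks/beverages/"))
    print(lookahead_only("https://example.com/drinks/beverages/"))
```

Running it shows the first function returning only the path while the lookahead version returns the full URL, which is why the host needs to be part of the consumed match.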