
Extracting a parameter from a batch of URLs in R


I'm trying to extract a parameter from URLs in R. The exact position of the parameter changes from URL to URL, so I need to identify it some other way.

Here's an example of a URL:

https://www.example.se/-Hotell.d178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454

I want to extract the number that follows ".d", which is 178317 in this example.

Currently I'm using sub(".d","",url) and I can't figure out how to proceed. Can someone suggest how to use this function for this example? Cheers!


Solution

  • Use a couple of subs:

    > url
    [1] "https://www.example.se/-Hotell.d178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454"
    

    This chops off everything up to and including the first ".d":

    > sub(".*?\\.d","",url)
    [1] "178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454"
    

    And wrap that with a sub that chops everything from the first non-digit onwards:

    > sub("[^0-9].*","",sub(".*?\\.d","",url))
    [1] "178317"
    

    Wrap the result in as.numeric to convert the string to a number.
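
Since the question mentions a batch of URLs, here is a minimal sketch of the steps above wrapped in one function. sub is vectorised, so it works on a whole character vector at once. The function name extract_d_id and the second URL are made up for illustration and are not from the original post.

```r
# Hypothetical helper: pull out the digits that follow the first ".d" in each URL.
extract_d_id <- function(urls) {
  # Step 1: drop everything up to and including the first ".d" (lazy match).
  rest <- sub(".*?\\.d", "", urls)
  # Step 2: drop everything from the first non-digit onwards.
  digits <- sub("[^0-9].*", "", rest)
  # Step 3: convert the remaining digit string to a number.
  as.numeric(digits)
}

urls <- c(
  "https://www.example.se/-Hotell.d178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454",
  "https://www.example.se/-Hotell.d42.Reseguide"  # made-up URL for illustration
)

ids <- extract_d_id(urls)
print(ids)  # the values 178317 and 42
```

One caveat: the pattern keys on the first literal ".d" in the string, so a host name containing ".d" (a ".de" domain, say) would be matched first. If that can occur in your data, a stricter pattern that requires a digit straight after ".d" may be safer.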