I'm trying to extract a parameter from URLs in R. The exact position of the parameter will change, so I need to identify it some other way.
Here's an example of a URL:
https://www.example.se/-Hotell.d178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454
I want to extract the number after d (in this example, 178317).
Currently I'm using sub(".d","",url) and I can't figure out how to proceed. Can someone suggest how to use this function for this example? Cheers!
Use a couple of sub calls:
> url
[1] "https://www.example.se/-Hotell.d178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454"
This chops off everything up to and including the first ".d":
> sub(".*?\\.d","",url)
[1] "178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454"
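The reason for writing "\\.d" rather than ".d" is that an unescaped "." matches any character. A small sketch of the difference, using a made-up string x that is only an illustration and not taken from the question:

```r
# Illustrative input (an assumption, not from the question).
x <- "abd.d42"

# "." is a wildcard, so ".d" matches the first "any character + d", here "bd".
unescaped <- sub(".d", "", x)

# "\\." is a literal dot, so "\\.d" matches only the text ".d".
escaped <- sub("\\.d", "", x)

print(unescaped)
print(escaped)
```

On the URL in the question both patterns happen to hit the same place, because no "d" occurs earlier in the string, but the escaped form is the safer choice in general.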
And wrap that with a sub that chops everything from the first non-digit onwards:
> sub("[^0-9].*","",sub(".*?\\.d","",url))
[1] "178317"
Wrap the result in as.numeric to turn the string into a number.
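If you prefer a single call, the two steps can probably be folded into one pattern with a capture group. The following is only a sketch: extract_d_id is a hypothetical helper name, the pattern assumes the ID is always a run of digits directly after ".d", and perl = TRUE is used so the non-greedy ".*?" behaves as in Perl-style regular expressions.

```r
# Hypothetical helper (not part of the answer above): return the digits that
# follow ".d" in each URL as numbers, or NA where no such part exists.
extract_d_id <- function(url) {
  # Lazily skip to the first ".d" followed by digits and capture those digits.
  pattern <- ".*?\\.d([0-9]+).*"
  has_id <- grepl(pattern, url, perl = TRUE)
  # sub() returns its input unchanged when nothing matches, so mark those as NA.
  id <- ifelse(has_id, sub(pattern, "\\1", url, perl = TRUE), NA_character_)
  as.numeric(id)
}

url <- "https://www.example.se/-Hotell.d178317.Reseguide-Hotell-SMP?destinationId=178317&kword=ZzZz.4650002325454"
print(extract_d_id(url))
```

Because sub, grepl and ifelse are vectorised, the same function should also work on a whole column of URLs.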