
Extract the text between specific "/" separators in a URL


I am trying to collect the text that comes before a specific set of characters.

For example, I have URLs such as the following:

url = "https://www.somewebsiteLink.com/someDirectory/Directory/ascensor/163235494/d"

url2 = "https://www.somewebsiteLink.com/someDirectory/Directory/aire-acondicionado-calefaccion-ascensor/45837493/d"

I would like to extract two things from the links:

Link 1: ascensor and 163235494

Link 2: aire-acondicionado-calefaccion-ascensor and 45837493

That is, the number between the second-to-last and last /, and the text between the third-to-last and second-to-last /.


Solution

  • Split the string on / and pull the 3rd and 2nd to last elements:

    url = "https://www.somewebsiteLink.com/someDirectory/Directory/ascensor/163235494/d"
    url2 = "https://www.somewebsiteLink.com/someDirectory/Directory/aire-acondicionado-calefaccion-ascensor/45837493/d"
    urls = c(url, url2)
    
    pieces = strsplit(urls, split = "/")
    result = lapply(pieces, \(x) x[length(x) - 2:1])
    ## for older R versions (before 4.1, which lack the \(x) shorthand):
    # result = lapply(pieces, function(x) x[length(x) - 2:1])
    
    result                
    # [[1]]
    # [1] "ascensor"  "163235494"
    # 
    # [[2]]
    # [1] "aire-acondicionado-calefaccion-ascensor" "45837493"
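
If a table is easier to work with than a list, the same split-and-index idea can be wrapped in a small function. This is only a sketch: `extract_segments` and the column names `name` and `id` are hypothetical and not part of the original answer. It assumes every URL has at least three "/"-separated pieces; shorter inputs make `vapply` stop with an error.

```r
# Hypothetical helper (not from the original answer): return the two path
# segments that precede the final one, as a two-column data frame.
extract_segments <- function(urls) {
  # fixed = TRUE treats "/" as a literal string rather than a regex
  pieces <- strsplit(urls, split = "/", fixed = TRUE)
  # For each URL keep elements n-2 and n-1, where n is the number of pieces;
  # vapply() checks that exactly two strings come back per URL
  picked <- vapply(pieces, function(x) x[length(x) - 2:1], character(2))
  data.frame(
    name = picked[1, ],
    id   = picked[2, ],
    stringsAsFactors = FALSE
  )
}

urls <- c(
  "https://www.somewebsiteLink.com/someDirectory/Directory/ascensor/163235494/d",
  "https://www.somewebsiteLink.com/someDirectory/Directory/aire-acondicionado-calefaccion-ascensor/45837493/d"
)
res <- extract_segments(urls)
```

The `id` column stays a character vector here; wrap it in `as.numeric()` if numeric values are needed.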