R function to parse domain names fails in strsplit with "subscript out of bounds"

I'm using R to extract domain names from a column of page URLs, and I wrote a function "domain" to do so. It seems to work fine until it hits entries that came in as "mailto: person@example.com". These are obviously email links. I still want to keep them in my dataset, but the error I get is: "Error in strsplit(gsub("http://|https://|www\.", "", x), "/")[[c(1, 1)]] : subscript out of bounds"

How can I modify this code to get around the "mailto" pages?

This is my function:

domain <- function(x) strsplit(gsub("http://|https://|www\\.","", x),"/")[[c(1,1)]]

This is my command:

mainpagelevel3$url <- sapply(mainpagelevel3$url, domain)

I ran this code on a set of URLs that did not include a "mailto:" entry and it worked just fine, so I think this must be where it's getting stuck. I don't mind whether the result is "person@example.com" or the string stays as is.

Solution

One option is an if condition that checks for strings that start with "mailto:" and contain "@" (the pattern can be made stricter if needed) and returns them unchanged. The function might look like this:

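domain <- function(x) {
  # Keep mailto links as they are; otherwise strip the scheme and "www." and take the host
  if (grepl("^mailto:.*@.*", x)) {
    x
  } else {
    strsplit(gsub("http://|https://|www\\.", "", x), "/")[[c(1, 1)]]
  }
}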

and then use sapply as usual

mainpagelevel3$url <- sapply(mainpagelevel3$url, domain)
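If the error persists after adding the mailto check, empty strings might be worth a look. strsplit("", "/") returns a zero-length vector, so [[c(1, 1)]] has nothing to select and fails with "subscript out of bounds", whereas a string without any "/" (such as a mailto link) still splits into a single piece. Below is a rough sketch of a more defensive variant. The name domain_safe and the sample URLs are made up for illustration, and it assumes that returning the input unchanged is acceptable for missing or empty values.

```r
# Sketch only: domain_safe and the sample URLs are hypothetical, not from the question.
domain_safe <- function(x) {
  # Missing or empty input: nothing to split, so return it unchanged
  if (is.na(x) || !nzchar(x)) return(x)
  # mailto links are kept as they are (same pattern as in the function above)
  if (grepl("^mailto:.*@.*", x)) return(x)
  stripped <- gsub("http://|https://|www\\.", "", x)
  pieces <- strsplit(stripped, "/")[[1]]
  # An input such as "https://" strips down to "", which splits into zero pieces
  if (length(pieces) == 0) return(x)
  pieces[1]
}

urls <- c("http://www.example.com/about", "mailto: person@example.com", "", "https://")
result <- vapply(urls, domain_safe, character(1), USE.NAMES = FALSE)
```

vapply is used here instead of sapply so that the result is guaranteed to be a character vector with one element per input; on the real data the call would be vapply(mainpagelevel3$url, domain_safe, character(1), USE.NAMES = FALSE), assuming the column is a character vector.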