Using urltools::url_parse with UTF-8 domains

The function url_parse is very fast and works fine most of the time. However, domain names can now contain non-ASCII (UTF-8) characters, for example:

url <- "www.cordes-tiefkü"

When I apply url_parse to this URL, the domain column shows the escaped byte "<fc>" instead of "ü":

  scheme               domain port path parameter fragment
1   <NA> www.cordes-tiefk<fc> <NA> <NA>      <NA>     <NA>

My question is: how can I convert this entry back to UTF-8? I tried iconv and some functions from the stringi package, without success.

(I am aware of httr::parse_url, which does not have this problem. So one approach would be to detect the URLs that are not ASCII, use url_parse on the ASCII ones, and fall back to parse_url for the few special cases. However, that raises the problem of detecting the non-ASCII URLs efficiently.)
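One way the detection step could look is sketched below in base R. The helper name `is_ascii` and the example URLs are hypothetical and not part of the original question; the check simply tests whether every byte of a string is below 128. If stringi is already loaded, `stringi::stri_enc_isascii` should offer a vectorised alternative.

```r
# Hypothetical helper: TRUE for strings that consist only of ASCII bytes.
is_ascii <- function(x) {
  vapply(
    x,
    function(s) all(as.integer(charToRaw(s)) < 128L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Illustrative URLs (assumptions); "\u00fc" is "ü" written as an escape.
urls <- c("www.example.com", "www.b\u00fccher.example")
ascii <- is_ascii(urls)

urls[ascii]   # candidates for the fast url_parse path
urls[!ascii]  # candidates for the httr::parse_url fallback
```

The logical vector can then be used to route each subset to the corresponding parser and to recombine the results afterwards.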

EDIT: Unfortunately, url1 <- URLencode(enc2utf8(url)) does not help. When I do


I get the error "could not resolve host". However, when I plug in the original URL and the second-level domain by hand, paths_allowed works.

  I was able to reproduce the issue. I converted the domain column to UTF-8 by re-parsing it with readr::parse_character and a latin1 locale:

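    library(urltools)  # url_parse
    library(dplyr)     # %>% and mutate
    library(readr)     # parse_character, locale

    url <- "www.cordes-tiefkü"
    parts <- 
      url_parse(url) %>% 
      # re-read the domain bytes as latin1 so they come back as UTF-8
      mutate(domain = parse_character(domain, locale = locale(encoding = "latin1")))
    parts
    #   scheme            domain port path parameter fragment
    # 1   <NA> www.cordes-tiefkü <NA> <NA>      <NA>     <NA>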

  I suspect that the encoding you need to specify (latin1 here) depends only on your locale and not on the URL's special characters, but I'm not 100% sure about that.
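To illustrate what the latin1 re-parse does at the byte level, here is a minimal base-R sketch. It assumes that the "<fc>" in the output stands for the single byte 0xFC, which is "ü" in latin1. Whether url_parse produces exactly this byte on every platform is an assumption, and the question reports that iconv did not help there, so treat this as an illustration of the conversion rather than a confirmed fix.

```r
# Assumption: the garbled domain ends in the single byte 0xFC ("ü" in latin1).
garbled <- rawToChar(as.raw(c(0x6b, 0xfc)))  # "k" followed by byte 0xFC

# Reinterpret the bytes as latin1 and re-encode them as UTF-8.
fixed <- iconv(garbled, from = "latin1", to = "UTF-8")

# Inspect the resulting bytes; in UTF-8, "ü" is a two-byte sequence.
charToRaw(fixed)
```

If the bytes coming back from url_parse were in a different single-byte encoding, the `from` argument would have to change accordingly.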