
Jsoup is removing the href attribute with a placeholder value


I am using jsoup to clean some HTML with Whitelist.relaxed(). This works well for the most part, and I would like to keep using it.

The problem is that I have a placeholder href value that clean() removes.

For example, <a href="{placeholder}">text</a> is changed to <a>text</a>. Is there a way to preserve the href attribute with my placeholder value?

Thanks in advance


Solution

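  • I guess you are not passing a valid base URI to the clean method. Whitelist.relaxed() only keeps an a[href] whose URL uses one of its allowed protocols (ftp, http, https, mailto). Without a base URI, a relative value such as {placeholder} cannot be resolved to an absolute URL, so its protocol cannot be checked and the attribute is dropped. If you pass a base URI, the href resolves (here to http://base.uri/{placeholder}) and is kept. If you also call preserveRelativeLinks(true) on the Whitelist, the original relative value is kept instead of being rewritten to the absolute URL.

    So, when cleaning, do something like this: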

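    import org.jsoup.Jsoup;
    import org.jsoup.safety.Whitelist;

    String html = "<a href=\"{placeholder}\">text</a>";
    // The base URI lets the relative href resolve and pass the protocol check;
    // preserveRelativeLinks(true) keeps the original value, not the absolute URL.
    String cleaned = Jsoup.clean(html,
                                 "http://base.uri",
                                 Whitelist.relaxed().preserveRelativeLinks(true));
    System.out.println(cleaned);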
    

    This will result in the following output:

    <a href="{placeholder}">text</a>
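To see why the base URI matters, here is a simplified, stdlib-only model of the check described above. It is a sketch, not jsoup's actual source: the class HrefCheckSketch and the method keepHref are hypothetical names, and the allowed-protocol set is assumed to mirror what Whitelist.relaxed() permits for a[href]. It resolves the href against the base URI, drops it if it cannot be resolved or its protocol is not allowed, and otherwise returns either the original text (preserveRelativeLinks true) or the absolute URL.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Set;

// Hypothetical, simplified model of a protocol check on a[href].
// This is NOT jsoup's source code; it only illustrates the behavior described above.
public class HrefCheckSketch {
    // Assumption: mirrors the protocols Whitelist.relaxed() allows for a[href].
    static final Set<String> ALLOWED_PROTOCOLS = Set.of("ftp", "http", "https", "mailto");

    /** Returns the href value that would be kept, or null if the attribute is dropped. */
    public static String keepHref(String href, String baseUri, boolean preserveRelativeLinks) {
        URL resolved;
        try {
            // With an empty base URI the href must already be absolute to parse;
            // otherwise it is resolved relative to the base URI.
            resolved = baseUri.isEmpty() ? new URL(href) : new URL(new URL(baseUri), href);
        } catch (MalformedURLException e) {
            return null; // unresolvable (or unknown protocol): drop the attribute
        }
        if (!ALLOWED_PROTOCOLS.contains(resolved.getProtocol())) {
            return null; // protocol not allowed: drop the attribute
        }
        // Keep the original text, or rewrite it to the absolute URL.
        return preserveRelativeLinks ? href : resolved.toString();
    }

    public static void main(String[] args) {
        String href = "{placeholder}";
        System.out.println(keepHref(href, "", false));                // null
        System.out.println(keepHref(href, "http://base.uri", false)); // http://base.uri/{placeholder}
        System.out.println(keepHref(href, "http://base.uri", true));  // {placeholder}
    }
}
```

The three calls correspond to the question's situation (no base URI, href dropped), a base URI alone (href rewritten to an absolute URL), and the answer's fix (base URI plus preserveRelativeLinks(true), placeholder kept). Note that in jsoup 1.14.1 and later, Whitelist is deprecated in favor of Safelist, so the same call there would use Safelist.relaxed().preserveRelativeLinks(true).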