
HTML entity decoding in Java: apostrophe


I have to decode, using Java, HTML strings that contain the following entities: "&amp;#39;" and "&apos;". I'm using Apache Commons Lang, but it doesn't decode those two entities, so I'm currently doing the following. However, I'm looking for the fastest way to do this.

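import org.apache.commons.lang.StringEscapeUtils;

public class StringUtil {

    public static String decodeHTMLString(String s) {
        // unescapeHtml (Commons Lang 2.x) decodes HTML 4 named entities and numeric
        // references, but not these two apostrophe forms, so map them first.
        return StringEscapeUtils.unescapeHtml(
                s.replace("&amp;#39;", "'")   // double-encoded: one pass would only give &#39;
                 .replace("&apos;", "'"));    // not part of the HTML 4 entity set
    }
}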
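If speed is the main concern, the two chained String.replace calls could be folded into one precompiled regular expression that rewrites every apostrophe reference in a single scan. This is only a sketch under assumptions: the class name ApostropheNormalizer is hypothetical, and it assumes that any "&amp;"-prefixed apostrophe reference is accidental double encoding that should become a plain apostrophe. Whether it actually beats the chained replace calls depends on the input and would need to be measured.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: rewrites the apostrophe references that Commons Lang's
 * unescapeHtml does not turn into an apostrophe, in one regex pass.
 */
public class ApostropheNormalizer {

    // Matches &apos;, &#39; (optionally zero-padded, e.g. &#039;) and &#x27; / &#X27;.
    // The optional "amp;" also catches double-encoded variants such as &amp;#39;
    // and &amp;apos; (assumption: those are accidental double encoding).
    // Compiled once and reused for every call.
    private static final Pattern APOSTROPHE =
            Pattern.compile("&(?:amp;)?(?:apos|#0*39|#[xX]0*27);");

    public static String normalize(String s) {
        return APOSTROPHE.matcher(s).replaceAll("'");
    }

    public static void main(String[] args) {
        System.out.println(normalize("It&amp;#39;s &apos;quoted&apos;")); // It's 'quoted'
    }
}
```

The result can then be handed to StringEscapeUtils.unescapeHtml for the remaining entities. Note the trade-off of the double-encoding assumption: if the source legitimately meant the literal text "&#39;", this helper will not preserve it.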

I searched for older questions, but none seems to answer my question.


Solution

  • Well, I would imagine that part of the problem is that one of your entities is double-encoded: "&amp;#39;". No decoder will turn that into an apostrophe in a single pass; decoding it once only yields the literal text "&#39;".

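    As for "&apos;", apparently that one is not technically part of the HTML entity set: it is defined in XML and XHTML but not in HTML 4 (HTML5 later added it), which is why an HTML 4 decoder leaves it untouched.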
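To illustrate both points, here is a deliberately tiny single-pass decoder. The class SinglePassDecoder and its small entity table are hypothetical and written only for illustration; this is not how Commons Lang is implemented. Because decoded text is appended without being re-scanned, one call turns "&amp;#39;" into "&#39;" and only a second call produces the apostrophe. Its table also omits "apos", mirroring the HTML 4 set, so "&apos;" passes through unchanged.

```java
import java.util.Map;

/**
 * Hypothetical minimal single-pass HTML entity decoder (illustration only):
 * a few named entities plus decimal and hex numeric references.
 */
public class SinglePassDecoder {

    // Small, deliberately incomplete table; no "apos", as in HTML 4.
    private static final Map<String, String> NAMED =
            Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"");

    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i + 1) : -1;
            if (semi > i + 1) {
                String repl = resolve(s.substring(i + 1, semi));
                if (repl != null) {
                    out.append(repl); // decoded text is appended, never re-scanned
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c); // not a recognised reference: copy it literally
            i++;
        }
        return out.toString();
    }

    private static String resolve(String name) {
        if (name.startsWith("#x") || name.startsWith("#X")) {
            return codePoint(name.substring(2), 16);
        }
        if (name.startsWith("#")) {
            return codePoint(name.substring(1), 10);
        }
        return NAMED.get(name);
    }

    private static String codePoint(String digits, int radix) {
        try {
            return new String(Character.toChars(Integer.parseInt(digits, radix)));
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return null;
        }
    }

    public static void main(String[] args) {
        String once = decode("It&amp;#39;s");
        System.out.println(once);         // It&#39;s
        System.out.println(decode(once)); // It's
    }
}
```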