jsoup mistakes a token as an HTML tag

I've got an HTML fragment as follows:

<span class="article-title">About《About<SomeChineseChars>》Blabla</span>

Sorry, I use Latin characters here because the editor does not allow me to type Chinese characters.

When I try to extract the text of this element using

doc.select(".article-title").text();

I get the following result:

About《About》Blabla

After debugging the program, I found that

<SomeChineseChars>

was treated as an HTML tag, and jsoup closed the tag automatically, as follows:

<SomeChineseChars></SomeChineseChars>

So, is there any way to prevent this from happening, or is this a bug?

-=-=-= UPDATE =-=-=-

After the DOM was built, I checked the parsed HTML. I cannot post images, so the output is in a linked screenshot.

Thanks a lot, Ben

Solution

I came up with a workaround by hacking into jsoup, as follows:

Create a new package named org.jsoup.parser, so that the subclass can reach jsoup's package-private members.

Add a customized HtmlTreeBuilder:

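package org.jsoup.parser;

import java.io.Reader;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TroilaHtmlTreeBuilder extends HtmlTreeBuilder {

    // Matches tag names made up entirely of Chinese characters (U+4E00 to U+9FA5).
    private String zh = "[\\u4e00-\\u9fa5]+";

    public TroilaHtmlTreeBuilder() {
    }

    @Override
    Element insert(Token.StartTag startTag) {
        if (startTag.tagName.matches(zh)) {
            // Not a real tag: re-insert the token's string form as character data.
            Token.Character ch = new Token.Character();
            ch.data(startTag.toString());
            insert(ch);
            return null; // no Element is created for this token
        }
        return super.insert(startTag);
    }

    // Relies on jsoup-internal signatures, which may differ between jsoup versions.
    public Document parse(Reader input, String baseUri) {
        return super.parse(input, baseUri, ParseErrorList.noTracking(), this.defaultSettings());
    }

}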
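
The check in insert depends on String.matches, which only succeeds when the whole tag name falls inside the zh range. The standalone sketch below illustrates that behaviour without jsoup. The class name ZhPatternDemo and the helper isChineseTagName are made up for illustration and are not part of jsoup or of the class above.

```java
class ZhPatternDemo {
    // Same regular expression as the zh field above.
    static final String ZH = "[\\u4e00-\\u9fa5]+";

    // Hypothetical helper: true only if every character of the name is in the range.
    static boolean isChineseTagName(String tagName) {
        return tagName.matches(ZH);
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D, U+6587), written as escapes.
        System.out.println(isChineseTagName("\u4e2d\u6587"));    // true
        System.out.println(isChineseTagName("span"));            // false
        // Mixed Chinese and Latin name: not matched, so it would still become a tag.
        System.out.println(isChineseTagName("\u4e2d\u6587abc")); // false
    }
}
```

So a pseudo-tag whose name mixes Chinese and Latin characters is not caught by this check and would still be inserted as an element.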

I don't think this is a good way to solve the problem, so please let me know if you have a better idea.
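
One less invasive possibility, given only as a minimal sketch and not tested against jsoup here, is to escape the offending '<' before the markup reaches the parser, so that no jsoup internals need to be overridden. The class name CjkAngleBracketEscaper and the method escape are hypothetical. The character range is simply copied from the zh pattern above, and the sketch assumes that a '<' directly followed by a Chinese character is never a real tag in the input.

```java
import java.util.regex.Pattern;

class CjkAngleBracketEscaper {
    // A '<' that is immediately followed by a character in the zh range.
    // The lookahead leaves the Chinese character itself untouched.
    private static final Pattern CJK_TAG_OPEN =
            Pattern.compile("<(?=[\\u4e00-\\u9fa5])");

    // Replaces each such '<' with the entity "&lt;"; everything else is unchanged.
    static String escape(String html) {
        return CJK_TAG_OPEN.matcher(html).replaceAll("&lt;");
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" stands in for the Chinese characters of the fragment.
        String raw = "<span class=\"article-title\">About<\u4e2d\u6587>Blabla</span>";
        // The span tags stay as they are; only the pseudo-tag's '<' is escaped.
        System.out.println(escape(raw));
    }
}
```

The escaped string could then be passed to Jsoup.parse as usual, and the pseudo-tag should come back as plain text from text(). The trade-off is that this rewrites the input before parsing, so it only fits when the raw HTML is available as a string.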

BTW: many thanks to @Abhilash for your help!