html perl ascii special-characters

How can I decode HTML entities?


Here's a quick Perl question:

How can I convert HTML special characters like &uuml; or &#39; to normal ASCII text?

I started with something like this:

s/\&#(\d+);/chr($1)/eg;

and could extend it to cover all HTML entities, but a function for this probably already exists?

Note that I don't need a full HTML->Text converter. I already parse the HTML with HTML::Parser. I just need to convert the special characters in the text it gives me.


Solution

  • Take a look at HTML::Entities:

    use HTML::Entities;
    
    my $html = "Snoopy &amp; Charlie Brown";
    
    print decode_entities($html), "\n";
    

    The output is: Snoopy & Charlie Brown
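For the kinds of entities mentioned in the question, a minimal sketch along the same lines might look like this. The sample string is made up for illustration; it mixes a named entity, a decimal reference, and a hexadecimal reference. Note that the decoded result contains Unicode characters rather than plain ASCII, so the sketch sets an output encoding (UTF-8 is an assumption about your terminal).

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical sample input: named (&uuml;), decimal (&#39;)
# and hexadecimal (&#xE9;) entities in one string.
my $text = 'M&uuml;ller&#39;s caf&#xE9;';

# Called in scalar context, decode_entities returns the decoded string.
my $decoded = decode_entities($text);

# The decoded string holds Unicode characters (U+00FC, U+00E9),
# so encode them when printing. UTF-8 is assumed here.
binmode(STDOUT, ':encoding(UTF-8)');
print "$decoded\n";
```

This covers the hand-written s/\&#(\d+);/chr($1)/eg case as well as named and hex entities, so the regex should no longer be needed.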