Tags: perl, mojolicious

Perl scraping script not recognising certain characters


I have a script that works fine locally but fails on the server.

When printing to standard output, it displays the non-breaking space symbol (&nbsp;) as a question mark (?).

While parsing the page, if I try to get rid of the non-breaking space symbol with

s/\&nbsp\;//g

nothing happens, and neither does trying to remove the question mark with

s/\?//g

It seems to stick no matter what.

Bizarrely, this is not an issue when running the script locally.

However, one issue occurs on both the local machine and the server: apostrophes (represented in the HTML I am scraping as an acute accent, ´) are always printed as a question mark (?), even if I explicitly try

s/´/'/g

I'm confused, please help.


Solution

  • Try replacing the characters like this:

    No-break space

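    # Stand-in for the scraped text; apply the substitution to your own string.
    my $non_break_space = "\x{A0}";
    # Replace every no-break space (U+00A0) with an ordinary space.
    $non_break_space =~ s/\xA0/ /g;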
    

    Acute accent

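    # Stand-in for the scraped text; U+00B4 is the acute accent.
    my $acute = "\x{B4}";
    # Replace every acute accent with a plain apostrophe, as the question asks.
    $acute =~ s/\xB4/'/g;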
    

    You can use the site fileformat.info to get more information about Unicode characters and their different values.
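
The substitutions above match single characters, so they only behave as intended if the scraped text has been decoded from bytes into Perl characters first. The following is a minimal sketch of that idea, assuming the page is served as UTF-8 (the question does not say so). The variable $bytes and its contents are made up for illustration; Encode is a core Perl module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw bytes as they might arrive from a UTF-8 page:
# "a", no-break space (C2 A0), "b", acute accent (C2 B4), "s".
my $bytes = "a\xC2\xA0b\xC2\xB4s";

# Decode the bytes into characters, so that \xA0 and \xB4 each
# match one whole character rather than half of a two-byte sequence.
my $text = decode('UTF-8', $bytes);

$text =~ s/\xA0/ /g;   # no-break space -> ordinary space
$text =~ s/\xB4/'/g;   # acute accent   -> apostrophe

# Encode explicitly on output so the terminal receives UTF-8 bytes.
print encode('UTF-8', $text), "\n";
```

If the substitution is run on the undecoded bytes instead, only the trailing byte of each two-byte sequence is replaced and the leading \xC2 byte is left behind, which may be one reason a stray character survives. A question mark on one machine only can also come from that machine's terminal or locale not being set to UTF-8.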