
Perl not printing special characters in the command prompt


Hi, I want to print the result in the command prompt as well as to an HTML file. I use encoding(cp1252) when writing the HTML file, and that works, but in the command prompt the special characters come out as junk. For example, "£" is printed as "ú". Thanks in advance.

use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;
use LWP::UserAgent;

my $competitor_declare='7shop';
my $xpath_declare='//strong';
my @urls = ("http://www.7dayshop.com/delivery-and-returns"); 


open HTML1, '>:encoding(cp1252)',"C:/Users/jeyakuma/Desktop/$competitor_declare.html";
open HTML, '>:encoding(cp1252)',"C:/Users/jeyakuma/Desktop/shipping project/database/$competitor_declare.html";  


foreach my $url (@urls)
        {
        print "\n\nworking on $url\n\n";
        my $ua = LWP::UserAgent->new( agent => "Mozilla/5.0" );
        my $req = HTTP::Request->new( GET => "$url" );
        my $res = $ua->request($req);

        if ( $res->is_success ) 
        {
           print "Please wait while we create file \n\n";
            my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
           my $node = $xp->findnodes_as_string("$xpath_declare") or print "couldn't find the node\n"; #give xpath
            print HTML1 $node and print "Dump file is created please configure the same in xpathconfiguration.pl\n" and print HTML $node;
            print $node;
        }
        else{  
                print "file creation failed\n";

        }
}

Expected output in the command prompt

cost is - £1.99

Current result

cost is - ú1.99

Solution

  • 7dayshop serves its pages in the UTF-8 character set:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    

    There are two and a half things you need to do on a Windows machine to display UTF-8 on the console:

    1. Set the output layers of STDOUT in your script with the following:

      binmode STDOUT, ':utf8:raw';
      
    2. Switch your console to the UTF-8 code page by running the following command before you run your script:

      chcp 65001
      
    3. You might also need to change your console font to a TrueType font such as Lucida Console.
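
    To see why the pound sign can turn into "ú", it helps to compare how different code pages read the same bytes. The following is only an illustrative sketch: it assumes the console was on code page 850 (the question does not say which code page was active), and `code_points` is a hypothetical helper name, not part of the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte 0xA3 on its own, and the two-byte UTF-8 form of the pound sign.
my $single = "\xA3";
my $double = "\xC2\xA3";

# Hypothetical helper: decode the bytes with the given encoding and
# list the resulting code points in U+XXXX notation.
sub code_points {
    my ( $encoding, $bytes ) = @_;
    my $chars = decode( $encoding, $bytes );
    return join ' ', map { sprintf( 'U+%04X', ord $_ ) } split //, $chars;
}

# Only ASCII is printed, so this looks the same on any console.
print "cp1252 reads A3    as ", code_points( 'cp1252', $single ), "\n";    # U+00A3, the pound sign
print "cp850  reads A3    as ", code_points( 'cp850',  $single ), "\n";    # U+00FA, u with acute accent
print "UTF-8  reads C2 A3 as ", code_points( 'UTF-8',  $double ), "\n";    # U+00A3, the pound sign
```

    Byte 0xA3 is "£" in cp1252 and also the final byte of "£" in UTF-8, but a cp850 console shows it as "ú". Running chcp 65001 makes the console read its input as UTF-8 instead.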

    The following script demonstrates this on a Windows machine:

    use strict;
    use warnings;
    use autodie;
    
    use LWP::Simple;
    use HTML::TreeBuilder::XPath;
    use LWP::UserAgent;
    
    binmode STDOUT, ':utf8:raw';
    
    my $competitor_declare = '7shop';
    my $xpath_declare      = '//strong';
    my @urls               = ("http://www.7dayshop.com/delivery-and-returns");
    
    foreach my $url (@urls) {
        print "\n\nworking on $url\n\n";
    
        # Check that the page is reachable before parsing it.
        my $ua  = LWP::UserAgent->new( agent => "Mozilla/5.0" );
        my $req = HTTP::Request->new( GET => $url );
        my $res = $ua->request($req);
    
        if ( $res->is_success ) {
            print "Please wait while we create file \n\n";
    
            # new_from_url fetches the page itself and builds the tree.
            my $xp = HTML::TreeBuilder::XPath->new_from_url($url);
    
            # Serialise every node matched by the XPath expression.
            my $node = $xp->findnodes_as_string($xpath_declare)
                or print "couldn't find the node\n";
            print $node;
        }
        else {
            print "file creation failed\n";
        }
    }
    

    Outputs:

    working on http://www.7dayshop.com/delivery-and-returns
    
    Please wait while we create file
    
    <strong>JavaScript seem to be disabled in your browser.</strong>
    <strong>7DAYSHOP.COM</strong>
    <strong>Get weekly special offers and new product news</strong>
    <strong id="cartHeader"><span class="hide" id="basket-btn">My Basket</span> <span class="number">(<span>0</span>)</span></strong>
    <strong>UK Mainland, Highlands &amp; Islands,</strong>
    <strong>Ireland (ROI) &amp; select European destinations</strong>
    <strong>Deliveries to UK</strong>
    <strong>UK Mainland Standard - £1.99</strong>
    <strong>UK Mainland Standard Tracked - £2.99</strong>
    <strong>UK Mainland Express Tracked - £3.99</strong>
    <strong>UK Mainland DPD Express Courier - £5.99</strong>
    <strong>Deliveries to Highlands and Islands and Channel Islands</strong>
    <strong>Highlands and Islands Standard - £1.99</strong>
    <strong>Channel Islands Standard - £1.99</strong>
    <strong>Highlands and Islands Express Tracked - £3.99 (Not Channel Islands)<br /></strong>
    <strong>Highlands and Islands DPD Express Courier - £14.99</strong>
    <strong>Channel Islands DPD Express Courier - £14.99</strong>
    <strong>Deliveries to Ireland (ROI)</strong>
    <strong>Ireland Standard - £4.99</strong>
    <strong>Ireland DPD Express Courier - £14.99</strong>
    <strong>Deliveries to France (FR)</strong>
    <strong>France Standard - £1.99</strong>
    <strong>France DPD Express Courier - £8.49</strong>
    <strong>Deliveries to Germany (DE)</strong>
    <strong>Germany Standard - £1.99</strong>
    <strong>Germany DPD Express Courier - £6.49</strong>
    <strong style="color: #000080;">Shipping Restrictions</strong>
    <strong>All orders outside of the shipping restrictions will only be able to use our DPD Courier shipping service.</strong>
    <strong style="color: #000080;"><br />Standard Delivery<br /></strong>
    <strong>Oversized</strong>
    <strong>Lithium</strong>
    <strong><a href="http://www.7dayshop.com/lithium-batteries" target="_blank">Click here for further information about Lithium Battery deliveries.<br /><br /></a>Adhesives<br /></strong>
    <strong><br /></strong>
    <strong style="color: #000080;"><br />RETURNS / MISSING ITEMS</strong>
    <strong>Returns Address:</strong>
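
    Since the question asks for the same text to go to both a cp1252 HTML file and the console, here is a rough sketch of giving each handle its own encoding layer. It is a dry run, not the original script: in-memory buffers stand in for the file and the console, `$text` is a made-up sample, and `hex_of` is a hypothetical helper. It also uses ':encoding(UTF-8)' rather than the ':utf8:raw' shown above, and assumes `$text` is a decoded character string; a string that already holds raw UTF-8 bytes would be encoded twice.

```perl
use strict;
use warnings;

# Hypothetical sample text; in the script above this would be $node.
# It is assumed to be a decoded character string, not raw UTF-8 bytes.
my $text = "cost is - \x{A3}1.99\n";

# In-memory buffers stand in for the HTML file and the console.
open my $html,    '>:encoding(cp1252)', \my $html_bytes    or die "open failed: $!";
open my $console, '>:encoding(UTF-8)',  \my $console_bytes or die "open failed: $!";

# The same string goes to both handles; each layer encodes it its own way.
print {$html}    $text;
print {$console} $text;
close $html    or die "close failed: $!";
close $console or die "close failed: $!";

# Hypothetical helper: show bytes as hex so the result is readable anywhere.
sub hex_of {
    my ($bytes) = @_;
    return join ' ', map { sprintf( '%02X', ord $_ ) } split //, $bytes;
}

print "cp1252 handle: ", hex_of($html_bytes),    "\n";
print "UTF-8 handle:  ", hex_of($console_bytes), "\n";
```

    The pound sign appears as the single byte A3 in the cp1252 buffer and as C2 A3 in the UTF-8 buffer, so neither handle's encoding affects the other.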