Tags: perl, unicode, file-handling, string-parsing

Perl: read a file into a string, then check each character against a Unicode range


I am trying to read a file into a string and then check each character against the Unicode range 2816-2943 (U+0B00 to U+0B7F). All characters need to be skipped except those falling in that range and \n. I got the following code from the net, but it isn't working for me. I am sorry if I am making silly mistakes; I am new to Perl. Please help, I need to finish this today.

use utf8;
use encoding 'utf8';
use open qw/:std :utf8/;

binmode(STDOUT, ":utf8"); #makes STDOUT output in UTF-8 instead of ordinary ASCII.


$file="content.txt";
open FILE1, ">filtered.txt" or die $!;
    open(FILE, "<$file") or die "Can't read file 'filename' [$!]\n";  
    binmode(FILE);
    my $document = <FILE>; 
    close (FILE);  
    print $document;

Solution

  • The following reads the $input file line by line, deletes every character that is neither in the range U+0B00 to U+0B7F nor a newline, and writes each filtered line to the $output file.

    use strict;
    use warnings;
    
    my $input  = 'content.txt';
    my $output = 'filtered.txt';
    
    open(my $src_fh, '<:encoding(UTF-8)', $input)
      or die qq/Could not open file '$input' for reading: '$!'/;
    
    open(my $dst_fh, '>:encoding(UTF-8)', $output)
      or die qq/Could not open file '$output' for writing: '$!'/;
    
    while(<$src_fh>) {
        # Delete every character that is neither in U+0B00..U+0B7F nor a newline.
        s/[^\x{0B00}-\x{0B7F}\n]//g;
        print {$dst_fh} $_
          or die qq/Could not write to file '$output': '$!'/;
    }
    
    close $dst_fh
      or die qq/Could not close output filehandle: '$!'/;
    
    close $src_fh
      or die qq/Could not close input filehandle: '$!'/;
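
    If you really do want the whole file in one string first, as the question describes, something along these lines might work. Note that `my $document = <FILE>;` in the question reads only the first line, because the read happens in scalar context with the default record separator. This is only a sketch: `slurp_utf8` and `filter_range` are hypothetical helper names that are not part of the answer above, and it assumes the input file is UTF-8 encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole UTF-8 file into one decoded string.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
      or die qq/Could not open file '$path' for reading: '$!'/;
    local $/;    # undefine the record separator so <$fh> returns everything
    my $text = <$fh>;
    close $fh
      or die qq/Could not close input filehandle: '$!'/;
    return $text;
}

# Hypothetical helper: walk the string character by character and keep
# only newlines and code points 2816..2943 (U+0B00..U+0B7F).
sub filter_range {
    my ($text) = @_;
    my $kept = '';
    for my $char (split //, $text) {
        my $code = ord $char;
        $kept .= $char
          if $char eq "\n" or ($code >= 0x0B00 and $code <= 0x0B7F);
    }
    return $kept;
}
```

    You would then call `filter_range(slurp_utf8('content.txt'))` and print the result to a handle opened with '>:encoding(UTF-8)'. For large files the line-by-line version above is probably the better choice, since it never holds the whole file in memory.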