perl encoding gedit

Why doesn't gedit recognize the encoding of my output file created from perl program?


#!/usr/bin/perl -w
use strict;

open (EVENTLOGFILE, "<eventlog.txt") || die("Could not open file eventlog file");
open (EVENTLOGFILE_NODATETIME, ">eventlog_nodatetime.txt") || die("Could not open new event log file");


my($line) = "";

while ($line = <EVENTLOGFILE>) {
 my @fields = split /[ \t]/, $line;
 my($newline) = "";
 my($i) = 1;

 foreach( @fields )
 {
  my($field) = $_;
  if( $i ne 3 )
  {
   $newline = $newline . $field;
  }

  $i++;
 }

 print EVENTLOGFILE_NODATETIME "$newline";
}

close(EVENTLOGFILE);
close(EVENTLOGFILE_NODATETIME); 

If I print $line each time instead of $newline, gedit detects the encoding with no problem. It only gets messed up when I try to modify the lines.


Solution

  • My guess is that the problem isn't the character encoding (say, ISO 8859-1 vs UTF-8) but the line endings (CR LF vs LF).

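    If you stripped the line ending from each line and printed "\n" yourself, you'd probably get line endings converted to platform native. Note that on a system whose native ending is LF, chomp removes only the "\n", so a trailing "\r" from a CR LF file has to be stripped as well.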

    I think your script might be better written something like this (Untested):

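    #!/usr/bin/perl
    use strict;
    use warnings;

    # Lexical filehandles must be declared with "my" under "use strict".
    open(my $old, '<', 'eventlog.txt')
      or die "Could not open eventlog.txt: $!";
    open(my $new, '>', 'eventlog_nodatetime.txt')
      or die "Could not open eventlog_nodatetime.txt: $!";

    $\ = "\n";    # output record separator: print appends "\n" to each line

    while (<$old>) {
      s/\r?\n\z//;    # strip LF or CR LF (chomp alone would leave a trailing CR)
      s/^(\S+\s+\S+\s+)\S+\s+(.*)/$1$2/;    # drop the third field
      print {$new} $_;    # a bare "print $new;" would print the handle itself
    }

    close $old;
    close $new;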
    

    Or

    perl -pe 's/^(\S+\s+\S+\s+)\S+\s+(.*)/$1$2/' eventlog.txt >eventlog_nodatetime.txt
    

    Or use a splice on a split? Or ...
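
The splice-on-split idea could look roughly like the sketch below. It is untested against the real eventlog.txt, and the helper name drop_third_field and the sample lines are made up for illustration. It assumes that fields are separated by runs of spaces or tabs and that it is acceptable to rejoin them with single spaces, so the original spacing is not preserved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): remove the third
# whitespace-separated field from one line and return it without a line ending.
sub drop_third_field {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # strip LF or CR LF
    my @fields = split ' ', $line;         # split on runs of whitespace
    splice(@fields, 2, 1) if @fields > 2;  # delete one element at index 2
    return join ' ', @fields;              # rejoin with single spaces
}

# Made-up sample lines, one ending in CR LF and one in LF.
for my $line ("alpha beta gamma delta\r\n", "one two\n") {
    print drop_third_field($line), "\n";
}
```

Because the line ending is stripped on input and "\n" is written explicitly on output, every output line ends the same way regardless of what the input used. Lines with fewer than three fields pass through unchanged apart from whitespace normalization.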