Tags: windows, perl, utf-8, perl-io

Program dies on umlauts in filename


I can't find the reason why my code stops, and I hope you can help. I'd be grateful for any improvement: I have to use Perl, which I've never worked with before, and I have to work on a Windows file system.

Error:

Could not open file '' No such file or directory at C:\Users\schacherl\Documents\perl\tester.pl line 29, line 1.

FYI: FILElog.txt contains subfolder names like

"vps_bayern_justiz_15027148042584275712825768716427"

EDAlog.txt contains the fully qualified path to each EDA file, e.g.

"W:\EGVP\manuelle Nachrichten\heruntergeladene_DONE\EGVP_GP114503661816195610088017045919978\attachments\Staßfurt_AIA100.eda"

The program dies at exactly the file above. For all other files it seems to work so far; it just can't handle those "Staßfurt" files. If I open the other files with the UTF-8 encoding layer, like the first one, I get a lot of

UTF-8 "\x84" does not map to Unicode at C:\Users\zhengphor\Documents\perl\tester.pl line 32, line 4.

UTF-8 "\x81" does not map to Unicode at C:\Users\zhengpor\Documents\perl\tester.pl line 32, line 4.

If there is no Staßfurt file, it works fine. This is just the part where the error happens; I've left out all the handling of the $returner variable.

I'd be really grateful for any help! I can't figure out why the Staßfurt file causes this error.

#!/usr/local/bin/perl -w -l

use Switch;
use Data::Dumper;

`chcp 65001`;
sub getAusgabe{
`dir "W:\\EGVP\\manuelle Nachrichten\\heruntergeladene\\_DONE\\ /AD /B  1>FILElog.txt`;
print 'written file log';
my $filename = 'FILElog.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' $!";

while (my $row = <$fh>) {
chomp $row;
  if($row ne 'DONE'){
    `dir "W:\\EGVP\\manuelle Nachrichten\\heruntergeladene\\_DONE\\$row\\*.eda" /S /B  1>EDAlog.txt`;
    print 'written eda log';
    my $filename = 'EDAlog.txt';
    open(my $fh1, $filename)
      or die "Could not open file '$filename' $!";

      while(my $row2 = <$fh1>){
        chomp $row2;
        print 'Datei:'. $row2;
        open(my $fh2, $row2)
          or die "Could not open file '$$row2' $!";
            print 'ich bin drin';
            while (my $rowFile = <$fh2>) {
                $returner .= $rowFile;
                print 'hier könnte ihr text stehen';
            }

    }
  }

}
print 'ich habe fertig';
return $returner;
}


$ausgabe1 = getAusgabe;

Solution

  • On Windows, you need to either:

    • manually encode file names (that have non-ASCII characters in them) before use, or
    • use a package that does it for you while using its file functions, like Win32::LongPath.

    E.g.:

    use strict;
    use warnings;
    use utf8;
    
    use Win32::LongPath;
    
    my $filename;
    my $fh;
    
    $filename = "Unicode file with ä in name.txt";
    openL(\$fh, '>:encoding(UTF-8)', $filename)
        or die "Could not open file '$filename' ($^E)";
    print $fh "Unicode stuff written to file...\n";
    close $fh;
    
    $filename = "Another file with ö in it.txt";
    # only three-argument version is supported - MODE argument is always required:
    openL(\$fh, '<', $filename)
        or die "Could not open file '$filename' ($^E)";
    my @lines = <$fh>;
    close $fh;
    

    Note the use of a reference to the file handle variable (\$fh) instead of the variable itself.

    As a bonus, using Win32::LongPath allows you to manipulate files with extra long full names (full path beyond the usual limit of 260 characters including the terminating NUL character). (You shouldn't get into a habit of doing this, however, since many other applications can't access such files.)
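
    For the first option (encoding the file name manually), here is a minimal sketch using the core Encode module. It assumes that Perl's built-in open() passes the name to Windows as bytes in the system's ANSI code page, and that this code page is cp1252 (Western European); both are assumptions you should check on your machine. The helper name ansi_name is made up for this example.

```perl
use strict;
use warnings;
use utf8;                       # this source file contains a literal ß
use Encode qw(encode);

# Hypothetical helper: turn a decoded (character) file name into the bytes
# handed to the built-in file functions. The default 'cp1252' is an
# assumption; pass your system's actual ANSI code page if it differs.
sub ansi_name {
    my ($name, $codepage) = @_;
    $codepage //= 'cp1252';
    # FB_CROAK: die instead of silently substituting characters that the
    # code page cannot represent. LEAVE_SRC: do not modify $name.
    return encode($codepage, $name, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

my $path  = "Staßfurt_AIA100.eda";
my $bytes = ansi_name($path);   # ß becomes the single byte 0xDF

# Only try to read the file if it exists, so the sketch runs anywhere.
if (-e $bytes) {
    open(my $fh, '<', $bytes)
        or die "Could not open file '$path': $!";
    my @lines = <$fh>;
    close $fh;
}
```

    This only works for characters that exist in the chosen code page; ansi_name dies for anything else, and in that case a module such as Win32::LongPath is the safer route.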