Text file is moved incompletely to cifs mountpoint with Perl


Well, this one is the weirdest filesystem-related issue I have had in a long time.

I have a script that connects to a remote IMAP server, downloads emails, marks them as read, and discards everything except .txt and .xml files. For .txt files, it uses Text::Unaccent to remove accents.

Each remote IMAP folder maps one-to-one to a local CIFS-mounted folder on this server. The remote IMAP download and the accent handling work just fine.

My problem is: if I download the file, handle the accents, and move it to a CIFS-mounted directory, the file is truncated (the last 4 to 10 KB are missing). If I move it to another partition on the same machine, the file is moved intact (same md5sum, same file size, no differences reported by diff).

The chunk of code that removes the accents and moves the file:

      #If file extension = .txt
      if ("$temp_dir/$arquivo" =~ /txt$/i){

         #Put file line by line inside array
         open (LEITURA, "$temp_dir/$arquivo");
         @manipular = <LEITURA>;
         close LEITURA;

         #Open the same file to writing with other filehandler
         open (ESCRITA, ">", "$temp_dir/$arquivo");
         foreach $manipula_linha (@manipular){
           # Removes & and accents
           $manipula_linha =~ s/\&/e/g;
           $manipula_linha = unac_string("UTF-8", $manipula_linha);
           print ESCRITA $manipula_linha;
         };
      };

      # copy temp file to final destination. If cifs = crash
      # move also does not work...
      copy   "$temp_dir/$arquivo", "$dest_file";
      unlink "$temp_dir/$arquivo";

Cifs version:

[root@server mail_downloader]# modinfo cifs
filename:       /lib/modules/2.6.18-409.el5.centos.plus/kernel/fs/cifs/cifs.ko
version:        1.60RH
description:    VFS to access servers complying with the SNIA CIFS Specification e.g. Samba and Windows
license:        GPL

Perl version:

[root@server mail_downloader]# perl --version

This is perl, v5.8.8 built for i386-linux-thread-multi

Unaccent version:

[root@server mail_downloader]# rpm -qa | grep Unaccent
perl-Text-Unaccent-1.08-1.2.el5.rf

Question: Any clues on why Perl's move or copy behaves this way with a CIFS mountpoint, and how to solve it?

Obviously I can't post the file contents here, because they are EDI-related and contain some financial info.

Also, if I comment out the Perl copy and handle the file myself with cp or mv after the unaccent step is done, the file is moved correctly to the CIFS mountpoint.


Solution

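  • The problem is really obvious: you're not closing the file once you've finished writing to it. The last chunk is still sitting in Perl's output buffer and has not been written to the file yet, so when you copy/move the file to the other file system, that tail is lost. Flush and close the handle before moving the file: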

         open (ESCRITA, ">", "$temp_dir/$arquivo");
         foreach $manipula_linha (@manipular){
           # Removes & and accents
           $manipula_linha =~ s/\&/e/g;
           $manipula_linha = unac_string("UTF-8", $manipula_linha);
           print ESCRITA $manipula_linha;
         };
    
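         # Flush the file: setting $| on the selected handle
         # forces an immediate flush of whatever is buffered
         my $old_fh = select(ESCRITA);
         $| = 1;
         select($old_fh);

         # close also flushes, so check that it succeeded
         close(ESCRITA) or die "Cannot close $temp_dir/$arquivo: $!";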
      };
    
      move   "$temp_dir/$arquivo", "$dest_file";
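
A more defensive variant of the same fix, as a minimal sketch rather than a drop-in replacement: it uses lexical filehandles, checks every open, close and move, and only moves the file after close has succeeded. The helper name rewrite_and_move and its callback parameter are made up for illustration and are not part of the original script; only File::Copy's move and Perl built-ins are used.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper (not from the original script): rewrite $src in place,
# passing every line through $transform, then move the result to $dest.
sub rewrite_and_move {
    my ($src, $dest, $transform) = @_;

    # Read the whole file first, as the original script does.
    open(my $in, '<', $src) or die "Cannot read $src: $!";
    my @lines = <$in>;
    close($in) or die "Cannot close $src: $!";

    # Rewrite the file through a lexical handle.
    open(my $out, '>', $src) or die "Cannot write $src: $!";
    print {$out} $transform->($_) for @lines;

    # close() flushes Perl's output buffer; if it fails, data may be missing.
    close($out) or die "Cannot close $src: $!";

    # Only now is the temp file complete and safe to move.
    move($src, $dest) or die "Cannot move $src to $dest: $!";
    return;
}

# Assumed usage with the variables from the question (needs Text::Unaccent):
# rewrite_and_move("$temp_dir/$arquivo", $dest_file, sub {
#     my ($line) = @_;
#     $line =~ s/&/e/g;
#     return unac_string("UTF-8", $line);
# });
```

Because close is checked before move runs, a write that could not be flushed makes the script die instead of silently delivering a truncated file to the CIFS share.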