Tags: centos6, line-endings, dos2unix

Why does dos2unix modify binary files?


By default, dos2unix is not supposed to modify binary files.

I tested it on a folder of images, and although most of them were not affected, a few were. If dos2unix cannot reliably tell a binary file from a text file, must I resort to explicitly including and/or excluding certain file extensions for it to work properly?

NOTE: when I run file image.jpg on any of the JPEG files, whether it was modified or not, the output is:

JPEG image data, JFIF standard 1.01

Solution

  • This is the relevant part of the dos2unix source code:

    if ((ipFlag->Force == 0) &&
        (TempChar < 32) &&
        (TempChar != 0x0a) &&  /* Not an LF */
        (TempChar != 0x0d) &&  /* Not a CR */
        (TempChar != 0x09) &&  /* Not a TAB */
        (TempChar != 0x0c)) {  /* Not a form feed */
      RetVal = -1;
      ipFlag->status |= BINARY_FILE;
      if (ipFlag->verbose) {
        if ((ipFlag->stdio_mode) && (!ipFlag->error)) ipFlag->error = 1;
        d2u_fprintf(stderr, "%s: ", progname);
        d2u_fprintf(stderr, _("Binary symbol 0x00%02X found at line %u\n"), TempChar, line_nr);
      }
      break;
    }
    

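    It seems that if the file contains any other control character (a byte below 0x20 that is not LF, CR, TAB or form feed), it is considered a binary file and skipped; otherwise it is processed as a text file. So if a binary file (e.g. an image) contains none of those other control characters, dos2unix converts it as if it were text, which corrupts it whenever it contains byte sequences that look like DOS line breaks.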
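    To see the consequence concretely, here is a minimal C sketch (not dos2unix's actual code) that applies the same test as the quoted condition with Force == 0 and, if the buffer passes, performs a simplified DOS-to-Unix conversion. The helper names is_binary_symbol, looks_binary, crlf_to_lf and process are hypothetical, the conversion rule (drop only a CR immediately followed by an LF) is an assumption about dos2unix's default mode, and the two byte arrays are made-up binary-looking data, not real images.

    ```c
    #include <stdio.h>
    #include <stddef.h>

    /* Same test as the quoted condition with Force == 0: any byte below 0x20
     * other than LF (0x0a), CR (0x0d), TAB (0x09) or form feed (0x0c)
     * marks the input as binary. */
    static int is_binary_symbol(unsigned char c)
    {
        return c < 32 && c != 0x0a && c != 0x0d && c != 0x09 && c != 0x0c;
    }

    /* Returns 1 if any byte in buf trips the check above, else 0. */
    static int looks_binary(const unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (is_binary_symbol(buf[i]))
                return 1;
        }
        return 0;
    }

    /* Simplified DOS-to-Unix conversion (assumption: only a CR immediately
     * followed by an LF is dropped). Writes to out, returns the new length. */
    static size_t crlf_to_lf(const unsigned char *in, size_t len, unsigned char *out)
    {
        size_t n = 0;
        for (size_t i = 0; i < len; i++) {
            if (in[i] == 0x0d && i + 1 < len && in[i + 1] == 0x0a)
                continue; /* drop the CR of a CR-LF pair */
            out[n++] = in[i];
        }
        return n;
    }

    /* Returns -1 if the buffer would be skipped as binary, otherwise the
     * number of bytes the conversion removed from it. */
    static long process(const unsigned char *buf, size_t len, unsigned char *out)
    {
        if (looks_binary(buf, len))
            return -1; /* skipped, left untouched */
        return (long)(len - crlf_to_lf(buf, len, out));
    }

    int main(void)
    {
        /* Hypothetical binary-looking data, not real images. */
        const unsigned char with_nul[]    = {0xFF, 0xD8, 0x00, 0x0D, 0x0A, 0x80};
        const unsigned char without_nul[] = {0xFF, 0xD8, 0xFE, 0x0D, 0x0A, 0x80};
        unsigned char out[sizeof with_nul];

        printf("with NUL:    %ld\n", process(with_nul, sizeof with_nul, out));
        printf("without NUL: %ld\n", process(without_nul, sizeof without_nul, out));
        return 0;
    }
    ```

    Run as is, it prints "with NUL:    -1" and "without NUL: 1": the buffer containing a NUL byte is skipped, while the other one passes the check and silently loses a byte, which is how the binary file ends up corrupted.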