
Sequential Byte-By-Byte Comparison


How would I use XOR bitwise operations to perform a byte-by-byte comparison of two files in C?

#include<stdio.h>
int main()
{
    FILE *fp1, *fp2;
    int ch1, ch2;
    char fname1[40], fname2[40] ;

    printf("Enter name of first file :") ;
    gets(fname1);

    printf("Enter name of second file:");
    gets(fname2);

    fp1 = fopen( fname1,  "r" );
    fp2 = fopen( fname2,  "r" ) ;

    if ( fp1 == NULL )
       {
       printf("Cannot open %s for reading ", fname1 );
       exit(1);
       }
    else if (fp2 == NULL)
       {
       printf("Cannot open %s for reading ", fname2 );
       exit(1);
       }
    else
       {
       ch1  =  getc( fp1 ) ;
       ch2  =  getc( fp2 ) ;

       while( (ch1!=EOF) && (ch2!=EOF) && (ch1 == ch2))
        {
            ch1 = getc(fp1);
            ch2 = getc(fp2) ;
        }

        if (ch1 == ch2)
            printf("Files are identical \n");
        else if (ch1 !=  ch2)
            printf("Files are Not identical \n");

        fclose ( fp1 );
        fclose ( fp2 );
       }
return(0);
 }

I get the following warnings, and when I run the program it reports that test2.txt cannot be opened (fopen returns NULL), even though the file has data in it:

hb@hb:~/Desktop$ gcc -o check check.c
check.c: In function ‘main’:
check.c:21:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default]
check.c:26:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default]
hb@hb:~/Desktop$ 


hb@hb:~/Desktop$ ./check
Enter name of first file :test1.txt
Enter name of second file:test2.txt
Cannot open test2.txt for reading hb@hb:~/Desktop$

Any ideas?


Solution

  • There are many ways to do this. If you have both files on the same machine, the easiest is to read them in parallel and compare the buffers.

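    // Needs <stdbool.h>, <stdio.h>, <stdlib.h> and <string.h>.
    #define BUFFERSIZE 4096
    FILE *filp1, *filp2;
    char *buf1, *buf2;
    bool files_equal;
    size_t read1, read2;   // fread returns size_t, not int


    filp1 = fopen("file1", "rb");
    filp2 = fopen("file2", "rb");

    // Don't forget to check that they opened correctly.

    buf1 = malloc(sizeof(*buf1)*BUFFERSIZE);
    buf2 = malloc(sizeof(*buf2)*BUFFERSIZE);

    // Don't forget to check that both allocations succeeded.

    files_equal = true;

    while ( true ) {
        read1 = fread(buf1, sizeof(*buf1), BUFFERSIZE, filp1);
        read2 = fread(buf2, sizeof(*buf2), BUFFERSIZE, filp2);

        // Different lengths or different contents: the files differ.
        if (read1 != read2 || memcmp( buf1, buf2, read1)) {
             files_equal = false;
             break;
        }

        // A short read means both files reached EOF (or hit an error): stop.
        if (read1 < BUFFERSIZE)
             break;
    }

    free(buf1);
    free(buf2);
    fclose(filp1);
    fclose(filp2);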
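Since the question asks about XOR specifically, the buffer comparison could also be written by hand: XOR of two equal bytes is 0, so OR-ing all the XOR results together gives a nonzero value exactly when some byte differs. The following is a minimal sketch, not a recommendation over memcmp; the helper name buffers_differ_xor is hypothetical and not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first n bytes of a and b differ,
   0 if they are identical. XOR of two equal bytes is 0, so any nonzero
   accumulated value marks at least one mismatching byte. */
int buffers_differ_xor(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;

    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);   /* collect mismatching bits */

    return diff != 0;
}
```

Unlike memcmp, this always scans all n bytes instead of stopping at the first mismatch, so for plain file comparison memcmp is usually the simpler choice.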
    

    You might get a wrong result if an error occurs while reading the files, but you can guard against that by checking ferror() on both streams after each read.
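The extra checks mentioned above could look roughly like this. It is a sketch under assumptions: the function name compare_files, the three-way return convention and the CMP_BUFSIZE constant are all made up for illustration, and stack buffers replace malloc to keep it short.

```c
#include <stdio.h>
#include <string.h>

#define CMP_BUFSIZE 4096   /* illustrative buffer size */

/* Hypothetical helper: returns 1 if the files have identical contents,
   0 if they differ, and -1 if a file cannot be opened or a read fails. */
int compare_files(const char *path1, const char *path2)
{
    unsigned char buf1[CMP_BUFSIZE], buf2[CMP_BUFSIZE];
    int result = 1;
    FILE *f1 = fopen(path1, "rb");
    FILE *f2 = fopen(path2, "rb");

    if (f1 == NULL || f2 == NULL) {
        result = -1;
    } else {
        for (;;) {
            size_t n1 = fread(buf1, 1, sizeof buf1, f1);
            size_t n2 = fread(buf2, 1, sizeof buf2, f2);

            if (ferror(f1) || ferror(f2)) {      /* read error: no verdict */
                result = -1;
                break;
            }
            if (n1 != n2 || memcmp(buf1, buf2, n1) != 0) {
                result = 0;                      /* length or content differs */
                break;
            }
            if (n1 < sizeof buf1)                /* short read, no error: EOF */
                break;
        }
    }

    if (f1 != NULL) fclose(f1);
    if (f2 != NULL) fclose(f2);
    return result;
}
```

A caller would treat -1 as "could not compare" rather than "not identical", for example `if (compare_files("test1.txt", "test2.txt") == 1) puts("Files are identical");`.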

    If, on the other hand, your files are on two different computers, or you want to process a large number of files and find out whether any of them are equal, the best approach is to use a checksum.

    A good checksum comes from a good hash function. Depending on your security requirements, common implementations use:

    • SHA-1, SHA-2 or SHA-3
    • MD5

    Many others also exist.
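To illustrate the checksum idea without pulling in a crypto library, here is a sketch that hashes a file with 64-bit FNV-1a. Note that FNV-1a is a swapped-in, non-cryptographic hash chosen only because it fits in a few lines; it is not one of the algorithms listed above and offers no protection against deliberate collisions. The function names fnv1a_update and fnv1a_file are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FNV1A_OFFSET_BASIS 0xcbf29ce484222325ULL   /* 64-bit FNV offset basis */
#define FNV1A_PRIME        0x100000001b3ULL        /* 64-bit FNV prime */

/* Feeds len bytes into a running FNV-1a hash and returns the new value. */
uint64_t fnv1a_update(uint64_t hash, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];        /* mix in the next byte */
        hash *= FNV1A_PRIME;    /* multiply modulo 2^64 */
    }
    return hash;
}

/* Hashes a whole file. Returns 0 and stores the hash in *out on success,
   or -1 if the file cannot be opened or a read error occurs. */
int fnv1a_file(const char *path, uint64_t *out)
{
    unsigned char buf[4096];
    uint64_t hash = FNV1A_OFFSET_BASIS;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        hash = fnv1a_update(hash, buf, n);

    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;

    *out = hash;
    return 0;
}
```

Two files with different hashes are certainly different; two files with the same hash are only probably equal, so a byte-by-byte comparison is still needed to confirm a match. Where an attacker might craft collisions, a library implementation of SHA-2 or SHA-3 would be the safer choice.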