
Int vs Float: Counter


Code:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include <time.h>

int main()
{
    FILE *fp1, *fp2;
    int ch1, ch2;
    clock_t elapsed;
    char fname1[40], fname2[40];

    printf("Enter name of first file:");
    fgets(fname1, 40, stdin);
    while ( fname1[strlen(fname1) - 1] == '\n')
    {
        fname1[strlen(fname1) -1] = '\0';
    }

    printf("Enter name of second file:");
    fgets(fname2, 40, stdin);
    while ( fname2[strlen(fname2) - 1] == '\n')
    {
        fname2[strlen(fname2) -1] = '\0';
    }

    fp1 = fopen(fname1, "r");
    if ( fp1 == NULL )
    {
        printf("Cannot open %s for reading\n", fname1 );
        exit(1);
    }

    fp2 = fopen( fname2,  "r");
    if (fp2 == NULL)
    {
        printf("Cannot open %s for reading\n", fname2);
        exit(1);
    }

    elapsed = clock(); // get starting time

    ch1  =  getc(fp1); // read a value from each file
    ch2  =  getc(fp2);

    float counter = 0.0;
    float total = 0.0;

    while(1) // keep reading while values are equal or not equal; only end if it reaches the end of one of the files
    {
        ch1 = getc(fp1);
        ch2 = getc(fp2);

        //printf("%d, %d\n", ch1, ch2);// for debugging purposes

        if((ch1 ^ ch2) == 0)
        {
            counter++;
        }

        total++;

        if ( ( ch1 == EOF) || ( ch2 == EOF)) // if either file reaches the end, then its over!
        {
            break; // if either value is EOF
        }
    }

    fclose (fp1); // close files
    fclose (fp2);

    float percent = (counter / (total)) * 100.0;

    printf("Counter: %.2f Total: %.2f\n", counter, (total));
    printf("Percentage: %.2f%%\n", percent);

    elapsed = clock() - elapsed; // elapsed time
    printf("That took %.4f seconds.\n", (float)elapsed/CLOCKS_PER_SEC);
    return 0;
}

I am trying to compare two .nc files of about 1.4 GB each, and these are my results:

$ gcc check2.c -w
$ ./a.out
Enter name of first file:air.197901.nc
Enter name of second file:air.197902.nc
Counter: 16777216.00 Total: 16777216.00
Percentage: 100.00%
That took 15.6500 seconds.

No way they are 100% identical lol, any ideas on why it seems to stop at the 16777216th byte?

The counter should be 1,256,756,880 bytes

1.3 GB (1,256,756,880 bytes)

I downloaded this climate data set here:

ftp://ftp.cdc.noaa.gov/Datasets/NARR/pressure/

Thanks for your help in advance


Solution

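  • The float data type is only precise to about 6 or 7 significant decimal digits (a 24-bit significand on typical IEEE 754 platforms), so it is inappropriate for counter and total. Every integer up to 2^24 = 16,777,216 is exactly representable, but 16,777,217 is not: adding 1 to 16777216.0f rounds back to 16777216.0f, so counter++ and total++ stop having any effect at exactly the value you are seeing. The loop does not actually stop there; it keeps reading to the end of the files, but the counters no longer change. Since counter and total stall at the same value, the percentage comes out as 100%. Any floating-point type would be inappropriate for counting in any case.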

    I assume you chose such a type because it has a greater range than unsigned int, perhaps? I suggest that you use unsigned long long for these variables.

    unsigned long long counter = 0;
    unsigned long long total = 0;
    
    ...
    
    float percent = (float)counter / (float)total * 100.0f ;