
Reading bytes from binary file with 2 byte buffer


I am currently trying to read a file and count how often each 1-byte value (0 to 255) occurs. I want to do the same for 2-byte values (0 to 65535).

Simplified version of what I have:

int length = 256; // any value >= 256
long long values[length];
char buffer[length];
int i, nread;

FILE *fileptr = fopen("text.txt", "rb");

for (i = 0; i < length; i++) { values[i] = 0; }
while ((nread = fread(buffer, 1, length, fileptr)) > 0) {
   for (i = 0; i < nread; i++) {
      values[(unsigned char)buffer[i]]++;
   }
}

fclose(fileptr);

for (i = 0; i < length; i++) {
   printf("%d: %lld\n", i, values[i]);
}

What I am getting now:

0: 21

1: 27

...

255: 19

What I want:

0: 4

1: 2

...

65535: 3

Solution

  • First, a correction: your current code does not count 2-byte values at all. An unsigned char is 1 byte (8 bits on practically every platform), so indexing by it can only produce values from 0 to 2^8 - 1 = 255, which is exactly the range your output shows.

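    To count values in the 16-bit range, read the file two bytes at a time into a uint16_t (from <stdint.h>) and use that value as the index. The code goes something like this:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main () {
        /* Open the file in binary mode */
        FILE* fp = fopen("text.txt", "rb");
        if (fp == NULL) {
            perror("fopen");
            return 1;
        }

        /* One zero-initialised counter per possible 16-bit value */
        long long *freq = calloc(65536, sizeof *freq);
        if (freq == NULL) {
            fclose(fp);
            return 1;
        }

        uint16_t value;

        /* fread returns 1 only when a full 2-byte item was read, so the loop
           stops at EOF, on error, or when a single trailing byte is left.
           The two bytes are interpreted in the host's byte order. */
        while (fread(&value, sizeof(value), 1, fp) == 1) {
            freq[value]++;
        }

        for (int i = 0; i < 65536; i++) {
            printf("%d : %lld\n", i, freq[i]);
        }

        free(freq);
        fclose(fp);
        return 0;
    }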
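
Reading straight into a 16-bit integer makes the counts depend on the byte order of the machine running the program. If that matters, one option is to keep reading into a byte buffer, as in the question, and combine each pair of bytes by hand. The following is only a sketch under stated assumptions: the helper names count_pairs_le and count_pairs_in_stream are made up for illustration, the first byte of each pair is assumed to be the low 8 bits (little-endian), pairs do not overlap, and a single trailing byte is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count non-overlapping 2-byte values in a buffer.
 * Assumption: the first byte of each pair holds the low 8 bits (little-endian).
 * freq must point to 65536 counters; a trailing odd byte is ignored. */
void count_pairs_le(const unsigned char *data, size_t len,
                    unsigned long long *freq)
{
    assert(freq != NULL);
    for (size_t i = 0; i + 1 < len; i += 2) {
        unsigned value = data[i] | ((unsigned)data[i + 1] << 8);
        freq[value]++;
    }
}

/* Hypothetical helper: apply count_pairs_le to a whole stream.
 * The buffer size is even, and fread only returns a short count at EOF
 * or on error, so a pair is never split across two reads.
 * Returns 0 on success, -1 if a read error occurred. */
int count_pairs_in_stream(FILE *fp, unsigned long long *freq)
{
    unsigned char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        count_pairs_le(buf, n, freq);
    }
    return ferror(fp) ? -1 : 0;
}
```

To use it, open the file with fopen(..., "rb"), allocate 65536 zeroed counters (for example with calloc), and pass both to count_pairs_in_stream. For big-endian data, swap the roles of data[i] and data[i + 1] in the shift.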