Search code examples
cmmapqsort

sorting a mmaped file with random integers in C


I'm trying to create a C program that creates a txt file with random integers, and then mmap the said file and qsort it. Creating the txt and mapping goes smoothly, but I can't figure out why qsort just destroys it. My guess, it's something to do with data types, but even after playing around with them, I get more or less the same result.

compar:

int cmp(const void *p1, const void *p2)
{
    return (*(const int *)p1 - *(const int *)p2);
}

mmap and qsort:

    char *addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0); 
    for (int i = 0; i < size; i++)
         printf("%c", addr[i]);
    qsort(addr, 20, sizeof(char), cmp);
    printf("\n-------------\n");
    for (int i = 0; i < size; i++)
         printf("%c", addr[i]);

output example:

10
19
9
8
18
2
9
6
3
7
15
12
12
14
6
2
4
3
15
13

-------------

296

1
9
93818
1
0

7
15
12
12
14
6
2
4
3
15
13

So 20 random integers are created in txt and mapped with no problems. But I guess qsort doesn't like what's given to it. I tried having mmap as int, but that created other problems like incorrect numbers in the array and incorrect amount of numbers and some 0s (perhaps I handled the output incorrectly). I'm not exactly sure what I'm doing wrong. here is the full code in case if it's needed:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <time.h>
#include <stdlib.h>

int cmp(const void *p1, const void *p2);
void rand_txt(int size);

int main() {

    rand_txt(20);

    int myFile = open("rand.txt", O_RDWR);
    struct stat myStat = {};
    fstat(myFile, &myStat);
    off_t size = myStat.st_size;

    char *addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0);

    for (int i = 0; i < size; i++)
        printf("%c", addr[i]);

    qsort(addr, 20, sizeof(char), cmp);
    printf("\n-------------\n");
    for (int i = 0; i < size; i++)
        printf("%c", addr[i]);

    return 0;
}

int cmp(const void *p1, const void *p2) {
    return (*(const int*)p1 - *(const int*)p2);
}

//Function to create a txt with random integers in given size
void rand_txt(int size) {
    FILE *fp = fopen("rand.txt", "w");
    srand(time(0));
    for (int i = 0; i < size; i++)
        fprintf(fp, "%d\n", (rand() % size) + 1);
    fclose(fp);
}

Also a quickie: Do the arrays and pointers behave in the same way in practice(like strings)? say we declare: int *x = { 10, 20, 30, 40 }; can we just fprint("%d\t", x[i]); or such type of arrays must be declared as int[] = { 10, 20... }; and not with pointers? (I can and will test it myself and edit the "quickie" out if there is no answer until after I am able to test and study it)


Solution

  • You cannot sort a text file with numbers of different sizes by mapping the file in memory and calling qsort() with the comparison function for multiple reasons:

    • qsort needs an array where all entries have the same size, which is not the case of textual representations.
    • the comparison function assumes an array of int, not an array of textual representations
    • even for an array of int, the comparison function would be inappropriate for large absolute values as the subtraction might overflow.

    To sort the text file, the classic approach is to allocate an array of int and load the values from the file, converting from text representation to int values, sort the array with qsort with an appropriate comparison function, and write the sorted array converting back to string representations.

    Here is a comparison function you can use:

    int cmp(const void *p1, const void *p2) {
        const int *i1 = p1;
        const int *i2 = p2;
        return (*i1 > *i2) - (*i1 < *i2);
    }