Tags: c, arrays, scanf, sizeof, fgets

How to read a text file line by line (integers, unknown line length) in C?


I want to merge sort the data from a text file, line by line.

For my merge sort to work, I have to read the data line by line and put each line's values into an array so I can sort them. (We only know that there are at most 10000 integers per line.) I've done some research and tried these approaches:

1. Use fgets, and strtok/strtol.

Problem: I don't know the maximum length of a line, so I can't size the char array. Besides, declaring a huge local array may overflow the stack.

Source: How many chars can be in a char array?

2. Use fscanf to input into integer array.

Problem: The same. I don't know how many integers there are on a line, so I don't know how many "%d" conversions the format string should contain.

3. Use fscanf to input in the form of char array, and use strtok/strtol.

Problem: The same. Since I don't know the length, I can't do something like

char *data;
data = malloc(sizeof(char) * datacount);

since the "datacount" is unknown.

Is there any way out?

UPDATE

Sample Input:

-16342 2084 -10049 10117 2786
3335 3512 -10936 5343 -1612 -4845 -14514

Sample Output:

-16342 -10049 2084 2786 10117
-14514 -10936 -4845 -1612 3335 3512 5343

Solution

  • You can indeed use fscanf to read the individual integers. What you need besides that is to know not only about pointers and malloc but also about realloc.

    You simply do something like

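    int temporary_int;
    int *array = NULL;      /* realloc(NULL, n) behaves like malloc(n) */
    size_t array_size = 0;  /* number of integers stored so far */
    
    while (fscanf(your_file, "%d", &temporary_int) == 1)
    {
        /* Grow the array by one element; the old pointer stays valid on failure */
        int *temporary_array = realloc(array, (array_size + 1) * sizeof(int));
        if (temporary_array == NULL)
        {
            break;  /* out of memory: stop reading, earlier values are kept */
        }
        array = temporary_array;
        array[array_size++] = temporary_int;
    }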
    

    After that loop, if array is not a null pointer, it contains the integers read from the file, however many there were (assuming no allocation failed along the way). The number of elements is in the array_size variable. Call free(array) when you are done with it.
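The loop above calls realloc once per integer. A common variation, sketched here under assumptions, doubles the capacity instead so that far fewer reallocations happen. The function name read_all_ints and the starting capacity of 16 are illustrative choices, not part of the original answer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads every integer from `in` into a heap array
   whose capacity doubles whenever it is full.
   Returns the array (the caller frees it) and stores the element count
   in *count. Returns NULL with *count == 0 if nothing was read or if an
   allocation failed. */
int *read_all_ints(FILE *in, size_t *count)
{
    int *array = NULL;
    size_t size = 0, capacity = 0;
    int value;

    *count = 0;
    while (fscanf(in, "%d", &value) == 1) {
        if (size == capacity) {
            /* Assumed growth policy: start at 16 elements, then double */
            size_t new_capacity = capacity ? capacity * 2 : 16;
            int *tmp = realloc(array, new_capacity * sizeof *tmp);
            if (tmp == NULL) {
                free(array);
                return NULL;
            }
            array = tmp;
            capacity = new_capacity;
        }
        array[size++] = value;
    }
    *count = size;
    return array;
}
```

Like the original loop, this treats newlines as ordinary whitespace, so it collects the whole file into one array rather than one array per line.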


    After seeing the update it's much easier to understand what is wanted.

    In pseudo-code it's easy:

    while(getline(line))
    {
        array_of_ints = create_array_of_ints();
    
        for_each(token in line)
        {
            number = convert_to_integer(token);
            add_number_to_array(array_of_ints, number);
        }
    
        sort_array(array_of_ints);
        display_array(array_of_ints);
    }
    

    Actually implementing this is much harder, and depends somewhat on your environment (like if you have access to the POSIX getline function).

    If you have e.g. getline (or a similar function), then the outer loop in the pseudo-code is easy and will look much like it already does. Otherwise you basically have to read character by character into a buffer that you dynamically expand (using realloc) to fit the whole line.
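For the case without getline, the character-by-character approach could look roughly like the sketch below. It uses only standard C; the name read_line and the initial buffer size of 128 are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line of any length from `in`.
   Returns a malloc'd, NUL-terminated string without the trailing
   newline (the caller frees it), or NULL at end of file or if an
   allocation fails. */
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0, capacity = 0;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        /* Keep room for this character plus the terminator */
        if (length + 1 >= capacity) {
            size_t new_capacity = capacity ? capacity * 2 : 128;
            char *tmp = realloc(buffer, new_capacity);
            if (tmp == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = tmp;
            capacity = new_capacity;
        }
        if (ch == '\n') {
            break;  /* end of line: do not store the newline */
        }
        buffer[length++] = (char)ch;
    }

    if (buffer == NULL) {
        return NULL;  /* end of file before any character was read */
    }
    buffer[length] = '\0';
    return buffer;
}
```

A caller would loop with something like `while ((line = read_line(file)) != NULL) { ...; free(line); }`. An empty line comes back as an empty string, which keeps it distinguishable from end of file.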

    That brings us to the contents of the outer loop: splitting the input into a set of values. You already have the basic solution in the first code snippet in this answer, where I reallocate the array as needed in the loop. To split the values, strtok is probably the simplest function to use. Converting to an integer can be done with strtol (if you want validation) or atoi (if you don't care about validating your input).
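As a rough sketch of that inner step, the function below splits one line with strtok, converts each token with strtol, and grows the result array with realloc. The name parse_ints, the return convention, and the choice to reject the whole line on the first bad token are assumptions for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place on whitespace and converts
   each token to an int.
   On success returns 0, stores a malloc'd array in *out (NULL for an
   empty line) and the element count in *count.
   On an invalid token, an out-of-range value, or an allocation failure
   it returns -1 and leaves *out and *count untouched. */
int parse_ints(char *line, int **out, size_t *count)
{
    const char *separators = " \t\r\n";
    int *array = NULL;
    size_t size = 0, capacity = 0;

    for (char *token = strtok(line, separators);
         token != NULL;
         token = strtok(NULL, separators)) {
        char *end;
        errno = 0;
        long value = strtol(token, &end, 10);

        /* Validation: the whole token must be a number that fits in an int */
        if (*end != '\0' || errno == ERANGE ||
            value < INT_MIN || value > INT_MAX) {
            free(array);
            return -1;
        }
        if (size == capacity) {
            size_t new_capacity = capacity ? capacity * 2 : 16;
            int *tmp = realloc(array, new_capacity * sizeof *tmp);
            if (tmp == NULL) {
                free(array);
                return -1;
            }
            array = tmp;
            capacity = new_capacity;
        }
        array[size++] = (int)value;
    }

    *out = array;
    *count = size;
    return 0;
}
```

Note that strtok modifies the buffer it is given, so the line must be writable (not a string literal), and it keeps internal state, so it cannot be used on two strings at once.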

    Note that you don't really need to allocate the arrays dynamically. On current systems where sizeof(int) == 4, 10000 int values take "only" 40000 bytes. That's small enough to fit even on the stack of most non-embedded systems.
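A fixed-size version might look like the sketch below. It rests on several assumptions that are not in the original answer: int is 32 bits, so each number takes at most 11 characters, numbers are separated by a single character, and the standard qsort stands in for the asker's own merge sort. The names sort_lines, parse_fixed, and compare_ints are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_INTS 10000                /* limit stated in the question */
#define LINE_CAP (MAX_INTS * 12 + 2)  /* assumed: 11 chars + 1 separator each */

/* Comparison callback for qsort; avoids the overflow of `x - y` */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Parses up to MAX_INTS integers from `line` into `out`; returns the count.
   No range validation is done here, to keep the sketch short. */
static size_t parse_fixed(const char *line, int out[MAX_INTS])
{
    size_t n = 0;
    while (n < MAX_INTS) {
        char *end;
        long value = strtol(line, &end, 10);
        if (end == line) {
            break;  /* nothing converted: no more numbers on this line */
        }
        out[n++] = (int)value;
        line = end;
    }
    return n;
}

/* Reads `in` line by line and writes each line's integers, sorted, to `out` */
void sort_lines(FILE *in, FILE *out)
{
    static char line[LINE_CAP];  /* static storage for the larger buffer */
    int values[MAX_INTS];        /* about 40000 bytes on the stack */

    while (fgets(line, sizeof line, in) != NULL) {
        size_t n = parse_fixed(line, values);
        qsort(values, n, sizeof values[0], compare_ints);  /* stand-in for merge sort */
        for (size_t i = 0; i < n; i++) {
            fprintf(out, "%s%d", i ? " " : "", values[i]);
        }
        fputc('\n', out);
    }
}
```

Calling `sort_lines(input_file, stdout)` on the sample input from the question should print the sample output. If a line could exceed LINE_CAP, fgets would split it across iterations, so the dynamic approach is the safer choice when the input format is not guaranteed.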