
Read rows of numbers from txt file in C++


Good morning guys,

I have an assignment for school and I hoped you could help me with this one. The goal of the program is very simple: calculate the sum of the numbers on each line of the file, and display on the screen the N highest distinct results in decreasing order, with N the number of occurrences for each result, N being supplied as a parameter by the user (default value = 3). As the title says, I'm working in C++, and my program has to read rows of numbers (double) from a provided txt file. I already know the concept of ifstream types, and managed to open the file. I know that I can use the >> operator to read from the file, but the number of doubles per row is not fixed, so I can't use a simple loop. Here is how it looks so far on my side:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main(){

    int nbresult=3;
    string filename;
    double tmp;

    cin >> filename;
    cin >> nbresult;

    ifstream file;
    file.open(filename.c_str());

    if(file.is_open())
    {
        cout << "Opening file: " << filename << endl;

        while(file.eof() == false)
        {
            vector<double> nbres;
            file >> tmp;
            nbres.push_back(tmp);
        }

        file.close();
    }
    else 
    {
        cout << "Error opening file!" << endl;
    }

    return 0;
}

So my idea was putting the numbers in a vector and summing up, but I've realized that I would need to create an instance of vector for each row. Plus, my reading method wouldn't allow me to create multiple vectors because it reads without acknowledging the fact that the numbers are in different rows.

Can you guys guide me to an efficient solution? I'm really starting to lose it lol.

Thanks in advance! matt


Solution

  • If I understand your question, and if you are still stuck, the overview of the process is simply filling a vector with the top n sums (default: 3) computed from each line of values within a file. You want the values in descending order.

    Whenever you need to get an unknown number of values from a line within a file (regardless of whether the values are delimited by commas, spaces, etc.), your approach should be to read an entire line of data into a string, create a stringstream from the line, and then loop, reading values from the stringstream until EOF is encountered on the stringstream.

    Why a stringstream and not just read values directly from the file? (Answer: line-control). Since the >> operator discards leading whitespace, and '\n' (the newline) is whitespace, there is no simple way to tell when you have reached the end of a line while reading values directly from the file. By reading the line first and then creating a stringstream from it, you can simply read until you reach the end of that stringstream, at which point you have input all the values in a single line.
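
    As a minimal sketch of that line-control idea, the helper below returns one sum per row of an input stream. The name line_sums is hypothetical and not part of the original answer, and it assumes the values are whitespace-separated doubles. Note that a blank row produces a sum of 0.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): returns one sum per line of `in`.
// Assumes values on a line are separated by whitespace.
std::vector<double> line_sums(std::istream& in) {
    std::vector<double> sums;
    std::string line;
    while (std::getline(in, line)) {      // one iteration per row
        std::istringstream row(line);     // stream limited to this row
        double value = 0.0, sum = 0.0;
        while (row >> value)              // fails at the end of the row
            sum += value;
        sums.push_back(sum);              // an empty row contributes 0
    }
    return sums;
}
```

    Because the helper takes a std::istream&, it can be called with an open std::ifstream or, for a quick check, with a std::istringstream holding a few rows of text.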

    The only vector you need to maintain throughout your code is the vector of sums in decreasing order. When reading each of the values from the stringstream you create, you can simply use a temporary vector for the purpose of storing the values in a given line, and then call std::accumulate on the temporary vector to provide the sum.

    The challenge is maintaining the top X sums in your final results vector to output at the end of the program. The approach there is actually fairly straightforward. If the sum is the first sum, just use push_back() to store it. For all subsequent sums, use an iterator to traverse the vector, comparing what is already stored against the current sum until the sum is greater than the element referenced by the iterator, and then call the .insert() method to insert the current sum in your results vector before that element. If no stored element is smaller, the sum belongs at the end of the vector. Whenever the vector grows beyond X elements, remove the last (smallest) one.
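
    The insertion step can be sketched as a small helper. The name insert_top_n and its parameters are hypothetical. Unlike the description above, this variant also skips a sum that is already stored, on the assumption that the "distinct" requirement in the question means exact equality between doubles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): keeps `top` sorted in decreasing
// order, holding at most `limit` distinct values.
void insert_top_n(std::vector<double>& top, double sum, std::size_t limit) {
    auto it = top.begin();
    while (it != top.end() && *it > sum)   // skip elements greater than sum
        ++it;
    if (it != top.end() && *it == sum)     // assumption: exact match = duplicate
        return;
    top.insert(it, sum);                   // lands at the end if sum is smallest
    if (top.size() > limit)                // over the limit: drop the smallest
        top.pop_back();
}
```

    Calling it once per line sum leaves the vector ready to print. If sums come from fractional inputs, exact comparison may treat nearly equal values as different, so a tolerance may be needed there.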

    When you are done, simply output the results vector using a range-based for loop.

    There are many different ways to approach it, but sticking to what is above, you could do something like the following. The code is commented to help walk you through it:

    #include <cstdio>   /* for perror */
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <numeric>  /* for accumulate */
    
    int main (int argc, char **argv) {
    
        if (argc < 2) {     /* validate at least filename given */
            std::cerr << "error: insufficient arguments\n"
                    "usage: " << argv[0] << " filename (nresults: 3)\n";
            return 1;
        }
    
        std::string filename = argv[1],     /* string for filename */
                    line;                   /* string to hold line */
        std::vector<double> results;        /* vector of results */
        std::ifstream f;                    /* input file stream */
        std::size_t nresults = 3, n = 0;    /* num of results, counter */
    
        if (argc >= 3)  /* if additional arg given, set nresults */
            nresults = std::stoul (argv[2]);
    
        f.open (filename);      /* open filename */
        if (! f.is_open()) {    /* validate file open for reading */
            perror (("error file open failed " + filename).c_str());
            return 1;
        }
    
        while (std::getline (f, line)) {    /* read each row of values */
            double val, sum;                /* current value, line sum */
            std::vector<double> v;          /* vector to hold values */
            std::stringstream s (line);     /* create stringstream from line */
            while ((s >> val))              /* read each value */
                v.push_back (val);          /* add it to vector v */
            sum = std::accumulate (v.begin(), v.end(), 0.0);  /* sum values in v */
            if (results.empty())            /* if empty */
                results.push_back (sum);    /* just add */
            else {  /* otherwise insert in decreasing order */
                auto it = results.begin();
                while (it != results.end() && sum <= *it)
                    ++it;                   /* find first element smaller than sum */
                results.insert (it, sum);   /* insert before it (or at the end) */
            }
            if (results.size() > nresults)  /* trim excess elements */
                results.pop_back();
            n++;                            /* increment line count */
        }
        /* output results */
        std::cout << nresults << " greatest sums from " << n << " lines in " << 
                    filename << '\n';
        for (auto& p : results)
            std::cout << " " << p;
        std::cout << '\n';
    }
    

    (note: the code expects the filename as the 1st argument, and then takes an optional argument of the number of top sums to report -- using a default of 3)

    Example Input File

    The following input was produced by writing 50 lines, each containing 5 random values between 0 and 999:

    $ cat dat/50x5.txt
     106 114 604 482 340
     815 510 690 228 291
     250 341 774 224 545
     174 546 537 278 71
     706 139 767 320 948
     328 683 410 401 123
     140 507 238 744 990
     810 559 732 732 20
     24 982 361 30 439
     139 204 217 676 714
     288 615 853 287 935
     801 847 851 211 249
     206 583 756 676 328
     978 486 119 711 219
     139 967 433 733 997
     872 104 433 89 12
     147 609 627 0 897
     795 34 744 878 477
     225 84 61 982 761
     621 960 479 740 903
     930 112 870 364 77
     99 468 181 532 790
     193 911 399 53 912
     296 80 178 273 958
     887 498 274 180 712
     267 801 905 747 774
     40 677 118 911 273
     195 242 974 376 775
     764 801 686 163 854
     830 692 166 240 197
     124 128 927 399 540
     640 898 342 777 645
     348 817 555 466 960
     60 661 203 34 269
     978 798 302 896 194
     389 959 886 555 199
     83 680 559 10 311
     100 882 209 442 659
     87 22 709 874 488
     669 934 381 104 969
     650 314 999 952 211
     193 341 170 79 129
     601 394 809 161 637
     352 261 519 793 935
     411 112 957 352 986
     677 21 153 58 358
     122 708 672 353 892
     883 547 466 285 858
     595 887 253 636 48
     122 220 541 641 245
    

    If you want to validate the sums, you can use a short awk script[1].

    Example Use/Output

    $ ./bin/vector_n_greatest dat/50x5.txt
    3 greatest sums from 50 lines in dat/50x5.txt
     3703 3494 3302
    
    $ ./bin/vector_n_greatest dat/50x5.txt 4
    4 greatest sums from 50 lines in dat/50x5.txt
     3703 3494 3302 3269
    
    $ ./bin/vector_n_greatest dat/50x5.txt 10
    10 greatest sums from 50 lines in dat/50x5.txt
     3703 3494 3302 3269 3268 3168 3146 3126 3057 3039
    

    Look things over and let me know if you have further questions.

    footnotes:

    (1.) to output the sorted line sums for verification, you can use a short awk script and sort, e.g.

    awk '{
        sum = 0
        for (i=1; i<=NF; i++)
            sum += $i
        printf "%-20s (%4d)\n", $0, sum
    }' file | sort -r -b -k6.2
    

    The awk output for the example file would show:

    $ awk '{
    >     sum = 0
    >     for (i=1; i<=NF; i++)
    >         sum += $i
    >     printf "%-20s (%4d)\n", $0, sum
    > }' dat/50x5.txt | sort -r -b -k6.2
     621 960 479 740 903 (3703)
     267 801 905 747 774 (3494)
     640 898 342 777 645 (3302)
     139 967 433 733 997 (3269)
     764 801 686 163 854 (3268)
     978 798 302 896 194 (3168)
     348 817 555 466 960 (3146)
     650 314 999 952 211 (3126)
     669 934 381 104 969 (3057)
     883 547 466 285 858 (3039)
     389 959 886 555 199 (2988)
     ...