c++ file ifstream getline istringstream

Using getline() when my line is split into two lines, C++


I am reading from a file, specifically BLS data in text files. Some of the metro names are long and so the data for that city extends to the next line. A snippet from the file:

999 26380 Houma-Thibodaux, LA  42   42   0   0  0   0   94
288 26420 Houston-The Woodlands-Sugar Land,
  TX              4424    3046       4       0    1374      34     100
170 26580 Huntington-Ashland, WV-KY-OH  7   1   0   6   0    0     85

I need to save the ints after the state into vectors, so I first read them into the string variables total, one, ..., per. I omitted the vectors themselves for brevity. When the first two ints, the city, the state, and the next seven ints are all on the same line (as they are for Houma and Huntington), my program works fine.

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
using namespace std;
int main() {
    ifstream BLS;
    vector <string> cities = {"Houston-The Woodlands-Sugar Land"};
    string city, state, total, one, two, three, five, struc, per, temp;
    BLS.open("trial.txt");
    if (BLS.is_open()) {
        string line;
        while(getline(BLS, line)) {
            istringstream sin(line);                 
            getline(sin, temp, ' ');
            getline(sin, temp, ' ');
            getline(sin, city, ','); 
            for (int i = 0; i < (int) cities.size(); i++) {
                if (city == cities[i]) {     
                    if (getline(sin, line) ) {
                        istringstream in(line);
                        in >> state >> total >> one >> two >> three >> five >> struc >> per;
                    }
                    else {
                        sin.ignore();
                        getline(sin, line);
                        istringstream in(line);
                        in >> state >> total >> one >> two >> three >> five >> struc >> per;
                    }
                    cout << " " << city << " " << state << " " << total << " " << five << endl;
                }
            }
        }
    }
}

For a split record such as Houston, the getline in the "if" statement reads nothing, so the "else" branch executes. I would expect sin.ignore() followed by getline(sin, line) to then read the following line:

TX    4424    3046    4   0   1374   34   100

but it never does. getline(sin, line) seems to continue to read the empty line.

Any help would be massively appreciated.


Solution

  • Don't use getline

    If the data are not structured cleanly into lines, then don't use std::getline. At least, that is, not in this case.

    This solution uses a struct named BLS_Table_3u to contain the information from one record. It requires that the city field does not contain any commas, and also that each city is followed by a comma, and then the state. A scan of the source data file (see the URL in the struct's comment) confirms that these are valid assumptions.

    struct BLS_Table_3u
    {
        // Copy this struct into your own program.
        // Source: https://www.census.gov/construction/bps/txt/tb3u201501.txt
        int csa;
        int cbsa;
        std::string name;
        int total; 
        int units_1; 
        int units_2;
        int units_3_and_4;
        int units_5_or_more;
        int n_structures_with_5_units_or_more;
        int monthly_coverage_percent;
    
        // Stream operators follow ... copy them too!
    };
    

    The struct defines operator<< and operator>> to handle I/O. operator>> reads one record from the data file and stores the data in a BLS_Table_3u struct. Similarly, operator<< writes the record stored in a BLS_Table_3u struct.

    First, of course, you must open the file.

        std::string file_name{ "BLS_Table_3u.txt" };
        std::ifstream ist(file_name);
        if (!ist.is_open())
        {
            std::cout << "Error - Could not open data file.\n\n";
            return 1;
        }
    

    With the file successfully opened, you can read a record into a BLS_Table_3u struct like this. If you have reached the end-of-file, the read attempt will fail.

        BLS_Table_3u bls;  // Variable `bls` holds the data from one record.
        if (ist >> bls)
        { 
            // Extraction was successful. Process the data stored in `bls`.
        }
        else
        {
            // End-of-file: All the records have been read.
        }
    

    More often than not, you will want to read the records in a loop. The test in the while-loop below is analogous to the test in the foregoing if-statement. The loop ends when you hit end-of-file.

        BLS_Table_3u bls;
        while (ist >> bls)
        {
            // After reading a record, you can access its fields 
            // using things like `bls.name` and `bls.total`.
            // For example:
            // 
            //     if (bls.total > 1000) 
            //     {
            //          std::cout << bls.name << " has " 
            //              << bls.total << " units.\n";
            //     }
            // 
            // You can refer to the entire record using `bls` by itself, 
            // which is done below.
    
            std::cout << bls;  // Write a record to `std::cout`.
        }
        // Loop ends when you hit eof.
    

    When an error is detected by operator>>, it throws a std::runtime_error. This might happen, for instance, if a field is missing, or non-numeric characters are encountered when a number was expected.

    The test program below simply copies the file onto the screen. This is the data file I used for testing.

    999 26380 Houma-Thibodaux, LA  42   42   0   0  0   0   94
    288 26420 Houston-The Woodlands-Sugar Land,
      TX              4424    3046       4       0    1374      34     100
    170 26580 Huntington-Ashland, WV-KY-OH  7   1   0   6   0    0     85
    

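    And this is the complete test program. It contains the full definition of struct BLS_Table_3u, which is the one you should copy. The program is written as a C++ module (note `export module main;` and `import std;`), which needs a recent compiler. With a toolchain that lacks module support, replace those two lines with the usual #include directives (<fstream>, <iostream>, <limits>, <stdexcept>, <string>) and drop `export` from `main`.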

    // StackOverflow_77548359_BLS_data_Answer.ixx
    // https://stackoverflow.com/q/77548359/22193627
    
    export module main;
    import std;
    
    struct BLS_Table_3u
    {
        // Copy this struct into your own program.
        // Source: https://www.census.gov/construction/bps/txt/tb3u201501.txt
        int csa;
        int cbsa;
        std::string name;
        int total; 
        int units_1; 
        int units_2;
        int units_3_and_4;
        int units_5_or_more;
        int n_structures_with_5_units_or_more;
        int monthly_coverage_percent;
    
        friend std::ostream& operator<< (std::ostream& ost, BLS_Table_3u const& bls)
        {
            ost << bls.csa
                << ' ' << bls.cbsa
                << ' ' << bls.name
                << ' ' << bls.total
                << ' ' << bls.units_1
                << ' ' << bls.units_2
                << ' ' << bls.units_3_and_4
                << ' ' << bls.units_5_or_more
                << ' ' << bls.n_structures_with_5_units_or_more
                << ' ' << bls.monthly_coverage_percent
                << '\n';
            return ost;
        }
    
        friend std::istream& operator>> (std::istream& ist, BLS_Table_3u& bls)
        {
            if (!(ist >> bls.csa))
            {
                if (ist.eof())
                    return ist;
                throw std::runtime_error(
                    "BLS_Table_3u: Stream failed mid-record in `operator>>`");
            }
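            // `std::ws` skips the blanks between the second number and the city.
            ist >> bls.cbsa >> std::ws;
            std::string city, state;
            // Read the city up to the comma; the comma itself is discarded.
            std::getline(ist, city, ',');
            // Formatted extraction skips whitespace, including a line break,
            // so the state may sit on the same line or on the next one.
            ist >> state 
                >> bls.total 
                >> bls.units_1 
                >> bls.units_2 
                >> bls.units_3_and_4 
                >> bls.units_5_or_more 
                >> bls.n_structures_with_5_units_or_more 
                >> bls.monthly_coverage_percent;
            // Discard whatever remains on the record's last line.
            if (!ist.eof())
                ist.ignore(std::numeric_limits<std::streamsize>::max(), '\n');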
            if (ist.fail())
                throw std::runtime_error(
                    "BLS_Table_3u: Stream failed mid-record in `operator>>`");
            bls.name = city + ", " + state;
            return ist;
        }
    };
    
    export int main()
    {
        std::string file_name{ "BLS_Table_3u.txt" };
        std::ifstream ist(file_name);
        if (!ist.is_open())
        {
            std::cout << "Error - Could not open data file.\n\n";
            return 1;
        }
        BLS_Table_3u bls;
        while (ist >> bls)
        {
            // After reading a record, you can access its fields 
            // using things like `bls.name` and `bls.total`.
            // For example:
            // 
            //     if (bls.total > 1000) 
            //     {
            //          std::cout << bls.name << " has " 
            //              << bls.total << " units.\n";
            //     }
            // 
            // You can refer to the entire record using `bls` by itself, 
            // which is done below.
    
            std::cout << bls;  // Write a record to `std::cout`.
        }
        ist.close();
        return 0;
    }
    // end file: StackOverflow_77548359_BLS_data_Answer.ixx
    

    Output:

    999 26380 Houma-Thibodaux, LA 42 42 0 0 0 0 94
    288 26420 Houston-The Woodlands-Sugar Land, TX 4424 3046 4 0 1374 34 100
    170 26580 Huntington-Ashland, WV-KY-OH 7 1 0 6 0 0 85