C++: How to read from a custom text file based on position/length of fields (no delimiter)?


I am new to C++, and have some issues reading from a custom text file.

Suppose each line of the file includes the fields EXACTLY with the following format:

  • A float starting at position 0, with length 6
  • An integer starting at position 6, with length 2
  • A string starting at position 8, with length 4
  • An integer starting at position 12, with length 3

Note that there is no whitespace (or any other delimiter) between the fields, but their positions and maximum lengths are fixed. Here are two lines from a sample file:

13.33456Text-13
 3.123 3txt  23

Which should be extracted this way (i.e., stored in separate variables):

13.334  56  Text  -13
 3.123   3  txt    23

Here is how I am currently implementing it, using the std::string::substr() method:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    double f1;
    int d1, d2;
    string str;

    ifstream fin;
    fin.open("Input/TestText01.txt");

    if (fin.is_open())
    {
        string line;

        while (getline(fin, line))
        {
            f1  = stod(line.substr(0, 6));
            d1  = stoi(line.substr(6, 2));
            str = line.substr(8, 4);
            d2  = stoi(line.substr(12, 3));

            // Do whatever with the extracted data
        }
        cout << "Reached end of file" << endl;
        fin.close();
    }
    else
        cout << "Error! File did not open successfully" << endl;
}

But I suspect that using substr (with stod/stoi) is not efficient, and my input files might be very large. However, simply extracting from a stringstream is not possible here (due to the lack of a delimiter):

stringstream ss{ line };
ss >> f1 >> d1 >> str >> d2;  // does not work

So, what is the best way to accomplish this task? Giving an explanation alongside some sample code will be appreciated. Please also give me some hints for exception handling in case of a corrupted file (i.e., a file with wrong structure).


Solution

  • Your way of solving the problem is already good and I don't think that it can be significantly improved.

    One micro-optimization that you could do is the following: to avoid copying the substrings, you could call the functions std::strtol and std::strtod directly on the input string returned by std::getline. However, those two functions require a terminating null character. You could temporarily overwrite the first character of the next field with a null character and restore that character to its original value when you are finished. However, I doubt that this optimization would be worth the effort, because it would sacrifice code clarity for a very minor performance increase.
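
    As a rough sketch of the in-place idea described above: the helper names field_to_long and field_to_double are hypothetical, the caller is assumed to have checked that the field lies inside the line, and error detection (the end-pointer argument of std::strtol/std::strtod) is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts the field [pos, pos + len) of `line` to long
// without copying it. The first character of the next field is temporarily
// replaced by '\0' and restored before returning.
long field_to_long(std::string& line, std::size_t pos, std::size_t len)
{
    assert(pos + len <= line.size());  // assumption: the caller validated the line length
    const std::size_t end = pos + len;
    const bool has_next = end < line.size();  // the last field is already null-terminated
    const char saved = has_next ? line[end] : '\0';
    if (has_next) line[end] = '\0';
    const long value = std::strtol(line.c_str() + pos, nullptr, 10);
    if (has_next) line[end] = saved;
    return value;
}

// Same technique for a floating-point field, using std::strtod.
double field_to_double(std::string& line, std::size_t pos, std::size_t len)
{
    assert(pos + len <= line.size());
    const std::size_t end = pos + len;
    const bool has_next = end < line.size();
    const char saved = has_next ? line[end] : '\0';
    if (has_next) line[end] = '\0';
    const double value = std::strtod(line.c_str() + pos, nullptr);
    if (has_next) line[end] = saved;
    return value;
}
```

    Note that the line has to be passed as a non-const std::string, since it is modified for the duration of each call. Without the temporary null character, std::strtod would read past the first field of the sample line and parse "13.33456" instead of "13.334".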

    You will probably have more success in increasing performance by configuring the C++ standard library in such a way that it is optimized for performance instead of compatibility with the streams of the C standard library.
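
    The setting is not named here; presumably it is std::ios_base::sync_with_stdio(false). A minimal sketch under that assumption follows (the helper name disable_stdio_sync is hypothetical). Keep in mind that this call only affects the standard stream objects such as std::cin and std::cout, so for input read through a std::ifstream the gain may be small.

```cpp
#include <iostream>

// Hypothetical helper: call once at the start of main(), before any I/O.
// Returns the synchronization state that was in effect before the call.
bool disable_stdio_sync()
{
    // Stop synchronizing the C++ standard streams with the C stdio streams.
    const bool was_synced = std::ios_base::sync_with_stdio(false);
    // Optional: stop std::cin from flushing std::cout before every read.
    std::cin.tie(nullptr);
    return was_synced;
}
```

    After this call, mixing C++ stream output with printf-style output on the same stream is no longer guaranteed to appear in order.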

    Regarding handling exceptions in the case of a corrupted file:

    The functions std::stod and std::stoi throw std::invalid_argument if no conversion can be performed and std::out_of_range if the converted value does not fit in the result type; std::string::substr throws std::out_of_range if the start position lies beyond the end of the string. Both exception types are derived from the class std::logic_error. Therefore, if you want to print an error message, skip the current line, and continue with the next line when such an exception is thrown, I recommend that you enclose the four parsing lines of the loop in a try/catch block which catches exceptions of type std::logic_error, like this:

    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <stdexcept>
    #include <string>
    
    using namespace std;
    
    int main()
    {
        double f1;
        int d1, d2;
        string str;
    
        ifstream fin;
        fin.open("Input/TestText01.txt");
    
        if (fin.is_open())
        {
            string line;
    
            while (getline(fin, line))
            {
                try
                {
                    f1  = stod(line.substr(0, 6));
                    d1  = stoi(line.substr(6, 2));
                    str = line.substr(8, 4);
                    d2  = stoi(line.substr(12, 3));
                }
                catch ( const std::logic_error &e )
                {
                    cout << "Skipping line due to invalid input!\n";
                    continue;
                }
    
                // Do whatever with the extracted data
            }
            cout << "Reached end of file" << endl;
            fin.close();
        }
        else
            cout << "Error! File did not open successfully" << endl;
    }
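
    One caveat with the try/catch approach: std::stoi and std::stod stop at the first character they cannot use, so a corrupted field such as "5x" is silently read as 5. If stricter checking is wanted, a possible sketch is shown below. The names Record, to_int_strict, to_double_strict and parse_line are hypothetical, and the minimum line length of 15 is an assumption taken from the field layout in the question (6 + 2 + 4 + 3).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical record type for one parsed line; field names follow the question.
struct Record
{
    double f1;
    int d1;
    std::string str;
    int d2;
};

// Converts a whole field to int; rejects trailing garbage such as "5x".
int to_int_strict(const std::string& field)
{
    std::size_t used = 0;
    const int value = std::stoi(field, &used);  // may throw invalid_argument / out_of_range
    if (used != field.size())
        throw std::invalid_argument("trailing characters in integer field");
    return value;
}

// Converts a whole field to double; rejects trailing garbage.
double to_double_strict(const std::string& field)
{
    std::size_t used = 0;
    const double value = std::stod(field, &used);
    if (used != field.size())
        throw std::invalid_argument("trailing characters in floating-point field");
    return value;
}

// Parses one fixed-width line; throws a std::logic_error subclass on bad input.
Record parse_line(const std::string& line)
{
    // Assumption: a valid line has at least 15 characters (6 + 2 + 4 + 3).
    if (line.size() < 15)
        throw std::invalid_argument("line too short");

    Record r;
    r.f1  = to_double_strict(line.substr(0, 6));
    r.d1  = to_int_strict(line.substr(6, 2));
    r.str = line.substr(8, 4);
    r.d2  = to_int_strict(line.substr(12, 3));
    return r;
}
```

    Calling parse_line inside the try block keeps the existing catch clause unchanged, since every exception thrown here is still derived from std::logic_error. Leading spaces in right-aligned fields such as " 3" are accepted, because the conversion functions skip leading whitespace and count it as consumed; the string field keeps its padding (for the second sample line it is "txt " with a trailing space).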