How exactly does the extraction operator>> work in C++?


I am a computer science student, and so do not have much experience with the C++ language (considering it is my first semester using this language), or with coding for that matter.

I was given an assignment to read integers from a text file in the simple form of:

19 3 -2 9 14 4
5 -9 -10 3
.
.
.

This sent me off on a journey to understand I/O operators better, since I am required to do certain things with this stream (duh).

I looked everywhere and could not find a simple explanation of how the extraction operator>> works internally. Let me clarify my question:

I know that operator>> extracts one contiguous element until it hits a space, tab, or newline. What I am trying to figure out is where the pointer(?) or read location(?) would be AFTER it extracts an element. Will it be on the last char of the element just removed, or was that removed and therefore gone? Will it be on the space/tab/'\n' character itself? Or perhaps at the beginning of the next element to extract?

I hope I was clear enough. I lack the appropriate jargon to describe my problem more clearly.


Here is why I need to know this (in case anyone is wondering...): One of the requirements is to sum all integers in each line separately. I have created a loop to extract all integers one by one until it reaches the end of the file. However, I soon learned that operator>> skips over spaces, tabs, and newlines. What I want to try is to extract an element with >>, and then use inputFile.get() to get the space/tab/newline that follows it. Then, if it's a newline, do what I gotta do. This will only work if the stream pointer is in a good position to read that space/tab/newline right after the last extraction.


In my previous question, I tried to solve it using getline() and a stringstream.


SOLUTION:

For the sake of answering my specific question of how operator>> works, I had to accept Ben Voigt's answer as the best one. I have used the other solutions suggested here (using a stringstream for each line) and they did work! (You can see it in my previous question.) However, I implemented another solution using Ben's answer and it also worked:

        .
        .
        .

if(readFile.is_open()) {
        while (readFile >> newInput) {
                char isNewLine = readFile.get();    //get() the next char after extraction

                if(isNewLine == '\n')               //This is just a test!
                        cout << isNewLine;          //If it's a newline, feed a newline.
                else
                        cout << "X" << isNewLine;   //Else, show X & feed a space or tab

                lineSum += newInput;
                allSum += newInput;
                intCounter++;
                minInt = min(minInt, newInput);
                maxInt = max(maxInt, newInput);

                if(isNewLine == '\n') {
                        lineCounter++;
                        statFile << "The sum of line " << lineCounter
                                 << " is: " << lineSum << endl;
                        lineSum = 0;
                }
        }
        .
        .
        .

Regardless of my numerical values, the form is correct! Both spaces and '\n's were caught.

Thank you Ben Voigt :)

Nonetheless, this solution is very format dependent and very fragile. If any line has anything else before the '\n' (like a trailing space or tab), the code will miss the newline char. Therefore, the other solution, using getline() and a stringstream per line, is much more reliable.
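For comparison, here is a minimal sketch of that getline()-based approach. It is not the code from my previous question: the helper name sumEachLine is made up for illustration, and I am assuming each line holds only whitespace-separated integers.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions, not from the
// original code): returns one sum per input line.
std::vector<long long> sumEachLine(std::istream& in) {
    std::vector<long long> sums;
    std::string line;
    while (std::getline(in, line)) {       // reads one line and discards its '\n'
        std::istringstream fields(line);   // parse only this line's text
        long long lineSum = 0;
        int value = 0;
        while (fields >> value) {          // skips spaces/tabs; stops at end of line
            lineSum += value;
        }
        sums.push_back(lineSum);           // an empty line contributes a sum of 0
    }
    return sums;
}
```

Because getline() finds the line boundary before any number is parsed, a trailing space or tab before the '\n' no longer matters. The function takes any std::istream, so it should work with an std::ifstream as well as with an std::istringstream for quick experiments.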


Solution

  • After extraction, the stream pointer will be placed on the whitespace that caused extraction to terminate (or on whatever other character cannot be part of the value; if no value could be parsed at all, the failbit will also be set).

    This doesn't really matter though, since you aren't responsible for skipping over that whitespace. The next extraction will skip whitespace until it finds valid data.

    In summary:

    • leading whitespace is ignored
    • trailing whitespace is left in the stream

    There's also the noskipws manipulator, which can be used to change the default behavior.
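
    A small sketch to check this behavior yourself, using an in-memory std::istringstream instead of a file. The two helper names are hypothetical, chosen only for this illustration; peek() is used because it reports the next character without consuming it.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts one int, then reports the character the
// stream would read next. peek() does not advance the read position.
// (If nothing is left in the stream, peek() returns EOF instead.)
char nextCharAfterInt(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;                           // skips leading whitespace, reads the number
    return static_cast<char>(in.peek());
}

// Hypothetical helper: tries two extractions in a row, optionally with
// whitespace skipping turned off via std::noskipws.
bool secondExtractionSucceeds(const std::string& text, bool skipWhitespace) {
    std::istringstream in(text);
    if (!skipWhitespace) {
        in >> std::noskipws;               // leading whitespace is no longer skipped
    }
    int first = 0;
    int second = 0;
    in >> first;
    return static_cast<bool>(in >> second);
}
```

    With the input "19 3", nextCharAfterInt should return the space, and with "4\n5" it should return '\n', which is why a get() right after an extraction can see the newline. secondExtractionSucceeds("19 3", true) should succeed, while passing false should make the second extraction fail, because the leftover space is not skipped and is not a valid start of an integer.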