c++ windows carriage-return linefeed

line-counting, how to process CRLF?


I'm writing a simple wrapper-class for scanning a stream of characters character-by-character.

Scanner scanner("Hi\r\nYou!");
const char* current =  scanner.cchar();
while (*current != 0) {
    printf("Char: %d, Column: %d, Line: %d\n", *current, scanner.column(), scanner.line());
    current = scanner.read();
}

C:\Users\niklas\Desktop>g++ main.cpp -o main.exe
C:\Users\niklas\Desktop>main.exe
Char: 72, Column: 0, Line: 0
Char: 105, Column: 1, Line: 0
Char: 13, Column: 0, Line: 1
Char: 10, Column: 0, Line: 2
Char: 89, Column: 1, Line: 2
Char: 111, Column: 2, Line: 2
Char: 117, Column: 3, Line: 2
Char: 33, Column: 4, Line: 2

This example already shows the problem I'm stuck with. Either \r or \n on its own can be interpreted as a new-line, but together (\r\n) they should count as just a single new-line as well!

The function that processes line- and column-numbers is this:

void _processChar(int revue) {
    char chr = _source[_position];
    if (chr == '\r' or chr == '\n') {
        _line += revue;
        _column = 0;
    }
    else {
        _column += revue;
    }
}

Sure, I could just look at the character that follows the one at the current position, but I do not check for NUL-termination on the source, because I want to be able to process character streams that contain \0 characters without stopping at that point.

How can I handle CRLF this way?

Edit 1: DOH! This seems to be working fine. Is this safe in any case or do I have an issue somewhere?

void _processChar(int revue) {
    char chr = _source[_position];

    bool is_newline = (chr == '\r' or chr == '\n');
    if (chr == '\n' and _position > 0) {
        is_newline = (_source[_position - 1] != '\r');
    }

    if (is_newline) {
        _line += revue;
        _column = 0;
    }
    else {
        _column += revue;
    }
}

Thanks!


Solution

  • This seems legit to me:

    void _processChar() {
        char chr = _source[_position];
    
        // Treat CRLF as a single new-line: the '\n' that directly follows
        // a '\r' was already counted at the '\r', so it changes neither
        // the line nor the column.
        if (chr == '\n' and _position > 0 and _source[_position - 1] == '\r') {
            return;
        }
    
        if (chr == '\r' or chr == '\n') {
            _line += 1;
            _column = 0;
        }
        else {
            _column += 1;
        }
    }
    

    At the point where a \n is processed, it checks whether the previous character is a carriage return (\r). If so, the line-number is not increased.

    Also, before it checks the previous character, it tests whether there is actually a previous character (and _position > 0).
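For anyone who wants to try the previous-character check outside the class, here is a minimal sketch as a free function. The names `Position` and `positionAfter` are made up for this example and are not part of the Scanner class. It assumes the source is held in a `std::string`, which carries its own length, so embedded \0 characters are no problem. In this sketch the \n half of a CRLF pair is skipped entirely, so it advances neither the line nor the column.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type for this sketch only.
struct Position {
    int line = 0;
    int column = 0;
};

// Returns the line/column counters after processing the first `count`
// characters of `source`. Looks only at the current and previous
// character, never ahead, so no terminator is required.
Position positionAfter(const std::string& source, std::size_t count) {
    Position pos;
    for (std::size_t i = 0; i < count && i < source.size(); ++i) {
        char chr = source[i];

        // The '\n' of a CRLF pair was already counted at the '\r'.
        if (chr == '\n' && i > 0 && source[i - 1] == '\r') {
            continue;
        }

        if (chr == '\r' || chr == '\n') {
            pos.line += 1;
            pos.column = 0;
        } else {
            pos.column += 1;
        }
    }
    return pos;
}
```

Since the check only ever looks backwards, a lone \r, a lone \n and a \r\n pair each bump the line counter exactly once, while a sequence like \n\r counts as two new-lines.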

    I've removed the int revue argument, as I just noticed that what I wanted to achieve is not possible the way I tried to achieve it. I wanted to be able to go backwards in the source, but then I cannot retrieve the column-number from the previous line.
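One possible way around the backwards problem, sketched here as an assumption rather than something the Scanner above does: scan the source once, remember the offset at which each line starts, and derive line and column from the offset on demand. The names `lineStarts`, `locate` and `LineColumn` are hypothetical. Note the convention differs slightly from the counters above: the column is the 0-based distance from the start of the line, and the new-line characters belong to the line they terminate.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: offsets at which each line begins.
// The look-ahead for CRLF is bounded by source.size(), not by a '\0'.
std::vector<std::size_t> lineStarts(const std::string& source) {
    std::vector<std::size_t> starts{0};
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\r') {
            if (i + 1 < source.size() && source[i + 1] == '\n') {
                ++i;  // consume the '\n' of a CRLF pair
            }
            starts.push_back(i + 1);
        } else if (source[i] == '\n') {
            starts.push_back(i + 1);
        }
    }
    return starts;
}

struct LineColumn {
    std::size_t line;
    std::size_t column;
};

// Maps an offset to its line and column using binary search.
LineColumn locate(const std::vector<std::size_t>& starts, std::size_t offset) {
    // First line start that lies beyond `offset`; the line before it
    // is the one containing `offset`.
    auto it = std::upper_bound(starts.begin(), starts.end(), offset);
    std::size_t line = static_cast<std::size_t>(it - starts.begin()) - 1;
    return {line, offset - starts[line]};
}
```

With such a table the scanner can move in either direction, because the position is computed from the offset instead of being updated incrementally. The cost is one extra pass over the source and one stored offset per line.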