I'm writing a simple wrapper class for scanning a stream of characters one character at a time.
    Scanner scanner("Hi\r\nYou!");
    const char* current = scanner.cchar();
    while (*current != 0) {
        printf("Char: %d, Column: %d, Line: %d\n", *current, scanner.column(), scanner.line());
        current = scanner.read();
    }
    C:\Users\niklas\Desktop>g++ main.cpp -o main.exe
    C:\Users\niklas\Desktop>main.exe
    Char: 72, Column: 0, Line: 0
    Char: 105, Column: 1, Line: 0
    Char: 13, Column: 0, Line: 1
    Char: 10, Column: 0, Line: 2
    Char: 89, Column: 1, Line: 2
    Char: 111, Column: 2, Line: 2
    Char: 117, Column: 3, Line: 2
    Char: 33, Column: 4, Line: 2
This example already shows the problem I'm stuck with. One can interpret \r as a new-line, as well as \n. But together (\r\n) they are just a single new-line as well!
The function that processes line- and column-numbers is this:
    void _processChar(int revue) {
        char chr = _source[_position];
        if (chr == '\r' or chr == '\n') {
            _line += revue;
            _column = 0;
        }
        else {
            _column += revue;
        }
    }
Sure, I could just look at the character that appears after the character at the current position, but: I do not check for NULL-termination on the source because I want to be able to process character streams that may contain \0 characters without being terminated at that point.
How can I handle CRLF this way?
Edit 1: DOH! This seems to be working fine. Is this safe in any case or do I have an issue somewhere?
    void _processChar(int revue) {
        char chr = _source[_position];
        bool is_newline = (chr == '\r' or chr == '\n');
        if (chr == '\n' and _position > 0) {
            is_newline = (_source[_position - 1] != '\r');
        }
        if (is_newline) {
            _line += revue;
            _column = 0;
        }
        else {
            _column += revue;
        }
    }
Thanks!
This seems legit to me:
    void _processChar() {
        char chr = _source[_position];
        // Treat CRLF as a single new-line
        bool is_newline = (chr == '\r' or chr == '\n');
        if (chr == '\n' and _position > 0) {
            is_newline = (_source[_position - 1] != '\r');
        }
        if (is_newline) {
            _line += 1;
            _column = 0;
        }
        else {
            _column += 1;
        }
    }
At the point where a \n is processed, it checks whether the previous character is a carriage return (\r). If so, the line number is not increased.
Also, before it checks the previous character, it tests whether there is actually a previous character (and _position > 0).
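
For anyone who wants to try the idea outside of the Scanner class, here is a minimal, self-contained sketch. It is not the original class: LineTracker, advance, prev_was_cr and track are made-up names for illustration, and instead of reading _source[_position - 1] it remembers in a flag whether the last character was \r. One thing to be aware of: in _processChar above, the \n of a CRLF pair falls into the else branch and therefore bumps the column by one. The sketch skips that \n entirely, on the assumption that the column should still be 0 right after a CRLF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the original Scanner): tracks the line and
// column reached after consuming characters one at a time.
struct LineTracker {
    std::size_t line = 0;
    std::size_t column = 0;
    bool prev_was_cr = false;  // true if the last character fed in was '\r'

    // Consume one character and update line/column accordingly.
    void advance(char chr) {
        if (chr == '\n' && prev_was_cr) {
            // Second half of a CRLF pair: the '\r' already counted the
            // new line, so this character changes nothing.
            prev_was_cr = false;
            return;
        }
        prev_was_cr = (chr == '\r');
        if (chr == '\r' || chr == '\n') {
            line += 1;
            column = 0;
        } else {
            column += 1;
        }
    }
};

// Convenience wrapper: feed a whole buffer and return the final state.
// std::string carries its own length, so embedded '\0' characters are
// handled like any other character.
LineTracker track(const std::string& source) {
    LineTracker tracker;
    for (char chr : source) {
        tracker.advance(chr);
    }
    return tracker;
}
```

Because only the previous character is remembered, no look-ahead is needed and the buffer never has to be null-terminated.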
I've removed the int revue argument as I just noticed that what I wanted to achieve is not possible the way I tried to achieve it. I wanted to be able to go backwards in the source, but I cannot retrieve the column number from the previous line then.
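
If going backwards is still needed, one possible workaround (just a sketch, not something the Scanner above does) is to stop updating the counters incrementally and recompute them from the position whenever they are asked for. Location and locate are hypothetical names, and the source is assumed to be held in a std::string so its length is known. The scan is linear in the position, so this trades speed for simplicity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: zero-based line and column.
struct Location {
    std::size_t line;
    std::size_t column;
};

// Hypothetical helper: computes the line and column of the character at
// `position` by scanning every character before it. CRLF counts as a
// single new-line, using the same look-behind check as _processChar.
Location locate(const std::string& source, std::size_t position) {
    Location loc{0, 0};
    for (std::size_t i = 0; i < position && i < source.size(); ++i) {
        char chr = source[i];
        if (chr == '\n' && i > 0 && source[i - 1] == '\r') {
            continue;  // second half of CRLF, already counted at the '\r'
        }
        if (chr == '\r' || chr == '\n') {
            loc.line += 1;
            loc.column = 0;
        } else {
            loc.column += 1;
        }
    }
    return loc;
}
```

With this, moving backwards is just decrementing the position and calling locate again, since nothing depends on the previous counter values.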