I am currently using the ICU library to parse some Unicode text in C++. The parser may fail, and when it does I need to roll back. For example, we might want to match the sequence aaab, but after the aaa we get a c. The whole match then fails, and logically we should roll back to just before the first a and prepare for the next match.
I know that when we are using a FILE * from <stdio.h>, we can just seek the file pointer back to a position saved in advance.
FILE* file = fopen("...", "r");
long pos = ftell(file);
// ... read some characters from (FILE *) file
fseek(file, pos, SEEK_SET);
I tried this in ICU, using the u_fgetfile function to get a FILE * from the UFILE, and then seeking the file pointer of that FILE *.
UFILE* file = u_fopen("...", "r", nullptr, nullptr);
FILE* internal_file = u_fgetfile(file);
long pos = ftell(internal_file);
// ... read some characters from (UFILE *) file
fseek(internal_file, pos, SEEK_SET);
But in my test cases, it turns out that the file pointer (as returned by ftell) is always at the end of the file. Since the file I tested with is rather small (it contains only 16 characters), I guess that ICU read the whole file before I asked it to and cached the result, so the file pointer in the FILE is not synchronized with the position I am currently reading from.
Also, the ICU documentation says that "The FILE must not be modified or closed", so I guess I am not allowed to seek the file pointer of the FILE.
It is quite difficult to keep track of all the characters I read from the UFILE, because the reads are scattered across tens of functions. So I cannot see a way to use u_fungetc, since it requires me to know which characters I want to put back. Also, calling u_fungetc means rolling back takes linear time, so I wonder whether there is a faster method.
So is it possible to somehow save the state of a UFILE, and restore that state after reading some characters from it?
It seems that nobody is going to answer this question, so here is my temporary solution, in case anyone needs it.
I will just use a ring buffer as a cache, and read from the cache instead of directly from the UFILE.
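To make the caching idea a bit more concrete, here is a rough sketch of how it could look. It is not ICU API: BacktrackReader, try_match, and kEof are names I made up for illustration, and for brevity it uses a growable std::vector instead of a fixed-size ring buffer. The reader pulls code units from a callback; in real code that callback would presumably wrap something like u_fgetc on the UFILE and translate its end-of-file value into kEof, but that wiring is an assumption and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// End-of-input sentinel chosen for this sketch (not an ICU constant).
constexpr std::int32_t kEof = -1;

// Hypothetical helper: caches code units pulled from `source` so that a
// failed match can rewind by resetting an index instead of un-reading.
class BacktrackReader {
public:
    // `source` returns the next code unit, or kEof when input is exhausted.
    explicit BacktrackReader(std::function<std::int32_t()> source)
        : source_(std::move(source)) {}

    // Return the next code unit, served from the cache after a rewind.
    std::int32_t get() {
        if (pos_ == cache_.size()) {
            if (eof_) return kEof;
            const std::int32_t c = source_();
            if (c == kEof) { eof_ = true; return kEof; }
            cache_.push_back(c);
        }
        return cache_[pos_++];
    }

    // Save the current position; this is just an index into the cache.
    std::size_t mark() const { return pos_; }

    // Roll back to a position previously returned by mark(). Constant time.
    void rewind(std::size_t saved) {
        assert(saved <= pos_);
        pos_ = saved;
    }

    // Drop everything already consumed. Invalidates all earlier marks.
    void commit() {
        cache_.erase(cache_.begin(),
                     cache_.begin() + static_cast<std::ptrdiff_t>(pos_));
        pos_ = 0;
    }

private:
    std::function<std::int32_t()> source_;
    std::vector<std::int32_t> cache_;
    std::size_t pos_ = 0;
    bool eof_ = false;
};

// Try to match `pattern`; on failure the reader is restored to where it was.
bool try_match(BacktrackReader& in, const std::u16string& pattern) {
    const std::size_t start = in.mark();
    for (char16_t expected : pattern) {
        if (in.get() != static_cast<std::int32_t>(expected)) {
            in.rewind(start);
            return false;
        }
    }
    return true;
}
```

With this, the tens of parsing functions only need to call get() on the reader; whoever starts a match keeps the value from mark() and calls rewind() on failure, so nobody has to remember which characters were read. Calling commit() after a successful match keeps the cache from growing without bound.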