
Restore a UFILE to one of its previous states


I am currently using the ICU library to parse some Unicode text in C++. The parser may fail, and when it does I need to roll back. For example, we might want to match the sequence aaab, but after reading aaa we get a c. The whole match then fails, and logically we should roll back to just before the first a and prepare for the next match.

I know that when we use a FILE * from <stdio.h>, we can just seek the file pointer back to a position we saved in advance.

FILE* file = fopen("...", "r");
long pos = ftell(file);
// ... read some characters from (FILE *) file
fseek(file, pos, SEEK_SET);

I tried this in ICU, using the u_fgetfile function to get the FILE * underlying a UFILE and seeking the file pointer of that FILE *.

UFILE* file = u_fopen("...", "r", nullptr, nullptr);
FILE* internal_file = u_fgetfile(file);
long pos = ftell(internal_file);
// ... read some characters from (UFILE *) file
fseek(internal_file, pos, SEEK_SET);

But in my test cases, it turns out that the file position (returned from ftell) is always at the end of the file. Since the file I tested with is rather small (it contains only 16 characters), I guess that ICU read the whole file before I asked it to and cached the result, so the file position in the FILE is not synchronized with the position I am actually reading from.

Also, the ICU documentation says that

The FILE must not be modified or closed

So I guess I am not allowed to seek the file pointer of the FILE.

It is quite difficult to keep track of all the characters I read from the UFILE, because the reads are scattered across tens of functions. So I cannot think of a way to use u_fungetc, since it requires me to know which character I want to put back. Also, calling u_fungetc once per character means rolling back takes linear time, so I wonder whether there is a faster method.

So is it possible to somehow save the state of a UFILE and restore that state after reading some characters from it?


Solution

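  • It seems that nobody is going to answer this question, so here is my temporary solution, in case anyone needs it.

    I will use a ring buffer as a cache and read from the cache instead of reading from the UFILE directly. Rolling back then only means resetting the read position inside the cache, as long as the saved position has not been overwritten yet.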
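
    Here is a minimal sketch of that idea, not a definitive implementation. The names RewindableReader, Source, tell and seek are hypothetical and not part of ICU. The reader pulls code units from any callable that returns the next unit or -1 at end of input; with ICU, that callable could presumably be a lambda that calls u_fgetc on the UFILE and maps U_EOF to -1. The sketch assumes a rollback never needs to go back further than the chosen capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ICU): keeps the most recent `capacity`
// code units in a ring buffer so reading can be rewound to a saved position.
class RewindableReader {
public:
    // Returns the next code unit, or -1 at end of input.
    using Source = std::function<int()>;

    RewindableReader(Source source, std::size_t capacity)
        : source_(std::move(source)), ring_(capacity) {
        assert(capacity > 0 && "capacity must be positive");
    }

    // Absolute position of the next code unit that get() will return.
    std::size_t tell() const { return pos_; }

    // Returns the next code unit, or -1 at end of input.
    int get() {
        if (pos_ == end_) {                 // nothing cached ahead of us
            int c = source_();
            if (c < 0) return -1;           // end of input is not cached
            ring_[end_ % ring_.size()] = c; // overwrite the oldest slot
            ++end_;
        }
        return ring_[pos_++ % ring_.size()];
    }

    // Jumps to a position obtained earlier from tell(), in constant time.
    // Throws if that position has already been overwritten in the ring.
    void seek(std::size_t saved) {
        if (saved > end_ || end_ - saved > ring_.size())
            throw std::out_of_range("position is no longer cached");
        pos_ = saved;
    }

private:
    Source source_;
    std::vector<int> ring_;
    std::size_t pos_ = 0;  // next position to hand out
    std::size_t end_ = 0;  // number of code units pulled from the source
};
```

    The parser calls tell() before it starts a match and seek() with that value when the match fails, so none of the parsing functions has to remember which characters it consumed.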