Tags: c, utf-8, libxml2

libxml2 htmlSaveFileEnc saves UTF-8 characters such as Г as HTML character references


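I am trying to save UTF-8 encoded HTML with libxml2. Saving works, but non-ASCII characters are written as HTML character references rather than as the characters themselves. Г, for example, does not appear as a literal letter in the output file. Code used to save the file: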

htmlSaveFileEnc("modified.html", docPtr, "utf8");

How can I prevent this and save Г as an actual UTF-8 character?
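To make the symptom concrete: Г is the code point U+0413. Saved as a UTF-8 character it is the two bytes 0xD0 0x93, while a character reference spells it out in ASCII, for example in the decimal form `&#1043;` (a hexadecimal form, `&#x413;`, also exists; the question does not show which one libxml2 produced). The sketch below computes both representations with two hypothetical helpers, `utf8_encode` and `char_ref`. It does not use libxml2 at all; it only illustrates the difference between the two outputs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Returns the number of bytes written to out (0 for invalid input). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                      /* 1 byte: plain ASCII */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: e.g. Cyrillic */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) /* surrogates are not characters */
            return 0;
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: supplementary planes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: format the same code point as a decimal
 * character reference such as "&#1043;". Returns snprintf's result. */
static int char_ref(uint32_t cp, char *buf, size_t bufsize) {
    return snprintf(buf, bufsize, "&#%u;", (unsigned) cp);
}
```

For U+0413, `utf8_encode` yields the bytes 0xD0 0x93 and `char_ref` yields the string `&#1043;`. The first is what the asker wants in the file.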


Solution

  • As a workaround, use the htmlDocContentDumpOutput() function: dump the document content into an in-memory output buffer, then write that buffer to the file yourself.

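    #include <stdio.h>
    #include <libxml/HTMLtree.h>
    #include <libxml/xmlIO.h>

    //htmlSaveFileEnc("modified.html", docPtr, "utf8");
    // Buffer without an encoding handler (NULL), so no charset conversion is applied.
    xmlOutputBufferPtr out = xmlAllocOutputBuffer(NULL);
    if (out != NULL) {
      htmlDocContentDumpOutput(out, docPtr, "UTF-8");

      // libxml2 2.9+: read the buffer via the accessors instead of casting out->buffer.
      const xmlChar *content = xmlOutputBufferGetContent(out);
      size_t size = xmlOutputBufferGetSize(out);

      // Binary mode plus an explicit length: the UTF-8 bytes are written unchanged.
      FILE *file = fopen("modified.html", "wb");
      if (file != NULL) {
        fwrite(content, 1, size, file);
        fclose(file);
      }

      xmlOutputBufferClose(out);
    }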
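The answer writes the dump with fputs in text mode. A slightly more defensive way to do the "write the buffer to file" step is a small helper that opens the file in binary mode, writes an explicit byte count, and reports errors from both fwrite and fclose. `write_file_bytes` is a hypothetical name, and this sketch uses only the C standard library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write exactly len bytes of data to path.
 * Binary mode avoids newline translation on some platforms, and the
 * explicit length means an embedded NUL byte cannot silently truncate
 * the output (unlike fputs). Returns 0 on success, -1 on any I/O error. */
static int write_file_bytes(const char *path, const unsigned char *data, size_t len) {
    FILE *file = fopen(path, "wb");
    if (file == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, file);

    /* fclose is always called; it flushes, so its failure also means lost data. */
    if (fclose(file) != 0 || written != len)
        return -1;
    return 0;
}
```

Assuming libxml2 2.9 or later, it could be called as `write_file_bytes("modified.html", xmlOutputBufferGetContent(out), xmlOutputBufferGetSize(out))` before `xmlOutputBufferClose(out)`.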