I've been wondering about the encoding of text extracted using IFilter. IFilter::GetText() retrieves the text as WCHAR*, but what if the file is encoded in ASCII? What about Unicode encodings such as UTF-8 or UTF-16?
As I see it, either IFilter converts the extracted text to a single encoding (in which case, which encoding is it?), or it does not, in which case how do I know which encoding the text is in?
The output text is UTF-16 (everything in Windows that uses WCHAR is UTF-16). There is no way to query the encoding of the input data; you would have to analyze that data yourself if needed.
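If you do need to analyze the input data yourself, one common starting point is to look for a byte-order mark (BOM) at the start of the raw file bytes. The following is only a minimal sketch of that idea: GuessEncodingFromBom is a hypothetical helper, not part of the IFilter API, and it assumes you have already read the first few bytes of the file into memory. Files without a BOM (plain ASCII, most UTF-8) will come back as "unknown" and would need a heuristic or an external hint.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of IFilter): guesses the encoding of raw
// file bytes from a leading byte-order mark. Returns "unknown" if no BOM
// is present, which is the normal case for ASCII and many UTF-8 files.
std::string GuessEncodingFromBom(const unsigned char* data, std::size_t size) {
    // UTF-32 is checked before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
    // starts with the UTF-16 LE BOM (FF FE). This is a guess: a UTF-16 LE
    // file whose first character is U+0000 would look the same.
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00) {
        return "UTF-32LE";
    }
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF) {
        return "UTF-32BE";
    }
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return "UTF-8";
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return "UTF-16LE";
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return "UTF-16BE";
    }
    return "unknown";
}
```

This inspects only the source file; whatever it reports, the text you get back from GetText() is still UTF-16.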