What's the character encoding for string representations of URIs in Redland RDF?


Is it safe to assume that strings returned by librdf_uri_as_string () use UTF-8 encoding? Or is it perhaps ISO-Latin (with additional URL encoding)?

I am dealing with URIs in a librdf_model that was read with librdf_parser_parse_file_handle_into_model () from a FILE *. Would it help if I switched to a raptor_parser (and perhaps a raptor_iostream instead of a FILE * as well)? The Raptor documentation specifically mentions UTF-8.

Is librdf_parser just a wrapper for raptor_parser and the answer is UTF-8 because of that?


Solution

  • librdf is one abstraction level up from Raptor. Basically, librdf is an application-level library that wraps Raptor, Rasqal and RDF storages together. If you're working at the librdf level, you should basically be using librdf APIs only, though there are some leaky abstractions here and there.

    Generally in the API, when you see strings passed in as (const) unsigned char *, it's UTF-8 all the way down. Only some identifiers such as syntax names are passed as (const) char * and they are ASCII.

    Disclosure/caveat: I am a committer in Redland projects but haven't been actively working on them in recent years. I used to know the internals well but my memory isn't perfect.

    To answer the specific questions:

    Is it safe to assume that strings returned by librdf_uri_as_string () use UTF-8 encoding?

    Yes.

    Would it help if I switch to a raptor_parser (and perhaps raptor_iostream instead of FILE * as well)?

    No, there's practically no difference. You'd just get a negligible amount less wrapper code and a slightly different API.

    Is librdf_parser just a wrapper for raptor_parser and the answer is UTF-8 because of that?

    Yes.
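
If you'd rather verify the UTF-8 assumption defensively than take it on trust, a minimal sketch of a well-formedness check over a `const unsigned char *` string could look like the following. Note that `is_valid_utf8` is a hypothetical helper written for illustration; it is not part of librdf or Raptor, and it has no dependency on either library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT part of librdf or Raptor): returns 1 if the
 * NUL-terminated byte string s is well-formed UTF-8, 0 otherwise. */
static int is_valid_utf8(const unsigned char *s)
{
    assert(s != NULL); /* precondition: caller passes a real string */

    while (*s) {
        unsigned char c = *s;
        size_t extra;      /* continuation bytes expected after the lead byte */
        unsigned long cp;  /* code point being decoded */
        unsigned long min; /* smallest code point legal for this length */

        if (c < 0x80) { /* plain ASCII byte */
            s++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0; /* stray continuation byte or invalid lead byte */
        }

        s++;
        for (; extra > 0; extra--, s++) {
            /* A NUL here fails the test too, so truncation is caught. */
            if ((*s & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }

        if (cp < min)
            return 0; /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0; /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* UTF-16 surrogate, not allowed in UTF-8 */
    }
    return 1;
}
```

With librdf, this would be called as `is_valid_utf8(librdf_uri_as_string(uri))`, where `uri` is a `librdf_uri *` you already hold. A string that was really ISO-Latin with bytes above 0x7F would almost always fail this check, whereas a pure-ASCII URI passes under either encoding.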