java, semantic-web, owl, protege, owl-api

Generating unique IRI from a filename


I have an ontology, created using Protégé 4.3.0, and I would like to use the OWL API to add some OWLNamedIndividual objects to an OWL file. I use the following instruction to create a new OWLNamedIndividual:

OWLNamedIndividual objSample = df.getOWLNamedIndividual(IRI.create(iri + "#" + id));
  • the variable id is a String;
  • iri is the base IRI of the loaded ontology; in order to get the base IRI of the ontology, I used the following instruction: iri = ontology.getOntologyID().getOntologyIRI().

The new OWLNamedIndividual is then added to the loaded ontology, and the ontology is saved to an OWL file using the following instructions:

XMLWriterPreferences.getInstance().setUseNamespaceEntities(true);
OWLOntologyFormat format = manager.getOntologyFormat(ontology);
manager.saveOntology(ontology, format, IRI.create(file.toURI()));

The variable id is a String generated from the base name of a file (i.e. the file name without the extension). If the base name contains one or more spaces, the ontology is saved without any error, but when I open the newly saved OWL file, Protégé reports a parsing error at the first occurrence of an IRI containing spaces.

How can I create a valid IRI for an OWLNamedIndividual object from the base IRI of the loaded ontology and the base name of a file?


Solution

  • An IRI is supposed to be a single unbroken token that identifies your resource. If I understand you correctly, you have an id such as big boat and you are creating IRIs that look like <http://example.com#big boat>. This is not a valid IRI; you need to replace the space with an _ or a -, so that you get <http://example.com#big_boat>. Even if you enter a modelling element name containing a space in Protégé, it automatically replaces the space with a _.

    Take a look at this article for the characters that are invalid in an IRI.

    Systems accepting IRIs MAY also deal with the printable characters in US-ASCII that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`", in step 2 above. If these characters are found but are not converted, then the conversion SHOULD fail.
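
    As a minimal sketch of the replacement described above, the helper below turns a file base name into a fragment before it is appended to the ontology IRI. The class name IriFragments and its two methods are hypothetical, not part of the OWL API. It assumes that replacing each run of problematic characters with a single _ is acceptable; it also replaces # and %, which is an extra precaution rather than something the quoted text requires.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the OWL API) for building IRI fragments. */
public final class IriFragments {

    // Whitespace plus the characters listed in the quote above: < > " { } | \ ^ `
    // '#' and '%' are added as an assumption, to keep the fragment unambiguous.
    private static final Pattern UNSAFE = Pattern.compile("[\\s<>\"{}|\\\\^`#%]+");

    private IriFragments() {
        // static methods only
    }

    /** Trims the base name and replaces each run of unsafe characters with '_'. */
    public static String toFragment(String baseName) {
        return UNSAFE.matcher(baseName.trim()).replaceAll("_");
    }

    /** Joins the ontology IRI and the sanitized base name with '#'. */
    public static String individualIri(String ontologyIri, String baseName) {
        return ontologyIri + "#" + toFragment(baseName);
    }
}
```

    With the variables from the question, the call would then look roughly like df.getOWLNamedIndividual(IRI.create(IriFragments.individualIri(iri.toString(), id))). Note that this mapping is not guaranteed to be unique: the file names "big boat" and "big_boat" both yield big_boat, so if uniqueness matters you would need an extra check or a different encoding.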