I want to annotate a couple of XML files with the German STW Thesaurus for Economics. You can get the thesaurus here as ZIP archives in RDF/XML, N3 and Turtle (~14 MB each).
So I wrote a Python script that removes stopwords, lemmatizes, and does part-of-speech tagging. Now I want to check whether a noun in one of the XML files is in the STW ontology. If it is, I'd like to try several options in preparation for a later automated classification:
- Find each skos:altLabel word and replace it with the skos:prefLabel word.
- List the skos:prefLabels at the end of the file, with a count of the appearances of each skos:prefLabel and the associated skos:altLabels.
- Follow skos:broader to find e.g. the economic sectors or the commodities related to the skos:prefLabel.
I know GATE and Apolda, which are able to do this, but they're Java-based, and I'd like to do everything from a single Python script in the end.
Are there any suggestions?
I don't know if it's exactly what you are looking for, but for working with RDF in Python there is RDFLib.
You can find more guidance in the tools/libraries pointed to in this answer or here.
Hope this can help! :)