
How to load the Alignment format in Python?


Is there a way to load an alignment file into Python? For example, given a file like this:

<?xml version='1.0' encoding='utf-8' standalone='no'?>
<rdf:RDF xmlns='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:xsd='http://www.w3.org/2001/XMLSchema#'
xmlns:align='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'>
<Alignment>
<map>
      <Cell>
          <entity1 rdf:resource="http://linkeddata.uriburner.com/about/id/entity//www.last.fm/music/Catie+Curtis"></entity1>
          <entity2 rdf:resource="http://discogs.dataincubator.org/artist/catie-curtis"></entity2>
        <relation>=</relation>
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure>
      </Cell>
    </map>
<map>
      <Cell>
          <entity1 rdf:resource="http://linkeddata.uriburner.com/about/id/entity//www.last.fm/music/Bigelf"></entity1>
          <entity2 rdf:resource="http://discogs.dataincubator.org/artist/bigelf"></entity2>
        <relation>=</relation>
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.8</measure>
      </Cell>
    </map>
<map>
      <Cell>
          <entity1 rdf:resource="http://linkeddata.uriburner.com/about/id/entity//www.last.fm/music/%C3%81kos"></entity1>
          <entity2 rdf:resource="http://discogs.dataincubator.org/artist/%C3%81kos"></entity2>
        <relation>=</relation>
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.9</measure>
      </Cell>
    </map>
</Alignment>
</rdf:RDF>

I want to keep the confidence value as well as the triple, for example:

Subject: http://linkeddata.uriburner.com/about/id/entity//www.last.fm/music/Catie+Curtis
Predicate: owl:sameAs
Object: http://discogs.dataincubator.org/artist/catie-curtis
Confidence: 1.0

I tried to do it with RDFLib, but did not manage to. Any suggestions would help, thanks!


Solution

  • Try the Redland libraries (http://librdf.org/docs/python.html):

    import RDF
    parser = RDF.Parser(name="rdfxml")
    model = RDF.Model()
    parser.parse_into_model(model, "file:./align.rdf", None)
    

    You can then query the model with SPARQL. For example, the following query retrieves every Cell together with its measure:

    query = RDF.Query("SELECT ?a ?m WHERE {?a a <http://knowledgeweb.semanticweb.org/heterogeneity/alignment#Cell> ; <http://knowledgeweb.semanticweb.org/heterogeneity/alignment#measure> ?m. }", query_language="sparql")
    for statement in query.execute(model):
        print("cell: %s measure:%s" % (statement['a'], statement['m']))
    

    Executing the query returns an iterator of dictionary-like results keyed by variable name. For the file in the question, the loop prints:

    cell: (r1301329275r1126r2) measure:1.0^^<http://www.w3.org/2001/XMLSchema#float>
    cell: (r1301329275r1126r3) measure:0.8^^<http://www.w3.org/2001/XMLSchema#float>
    cell: (r1301329275r1126r4) measure:0.9^^<http://www.w3.org/2001/XMLSchema#float>
    

    The Python API for accessing node contents is documented at http://librdf.org/docs/python.html and an overview of the SPARQL query language is available at http://www.w3.org/TR/rdf-sparql-query/
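
If installing Redland is not an option, a rough alternative is to read the file as plain XML with the standard library and build the (subject, predicate, object, confidence) tuples the question asks for. This is only a sketch: it assumes every Cell has the exact layout shown in the question, it does not do real RDF parsing, and the function name `load_alignment` and the mapping of the `=` relation to owl:sameAs are choices made here for illustration, not part of any library.

```python
import io
import xml.etree.ElementTree as ET

# Namespaces taken from the file in the question.
ALIGN = "http://knowledgeweb.semanticweb.org/heterogeneity/alignment#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the "=" relation is reported as owl:sameAs, as the question requests.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def load_alignment(source):
    """Return a list of (subject, predicate, object, confidence) tuples.

    `source` is a file name or a binary file object holding the alignment XML.
    """
    resource = "{%s}resource" % RDF_NS
    rows = []
    for cell in ET.parse(source).getroot().iter("{%s}Cell" % ALIGN):
        entity1 = cell.find("{%s}entity1" % ALIGN)
        entity2 = cell.find("{%s}entity2" % ALIGN)
        relation = cell.findtext("{%s}relation" % ALIGN, default="").strip()
        measure = cell.findtext("{%s}measure" % ALIGN)
        # Skip incomplete cells and relations other than equivalence.
        if entity1 is None or entity2 is None or measure is None or relation != "=":
            continue
        rows.append((entity1.get(resource), OWL_SAME_AS,
                     entity2.get(resource), float(measure)))
    return rows


# One Cell copied from the question, used here as inline test data.
SAMPLE = b"""<rdf:RDF xmlns='http://knowledgeweb.semanticweb.org/heterogeneity/alignment#'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<Alignment>
<map>
  <Cell>
    <entity1 rdf:resource="http://linkeddata.uriburner.com/about/id/entity//www.last.fm/music/Bigelf"></entity1>
    <entity2 rdf:resource="http://discogs.dataincubator.org/artist/bigelf"></entity2>
    <relation>=</relation>
    <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.8</measure>
  </Cell>
</map>
</Alignment>
</rdf:RDF>"""

rows = load_alignment(io.BytesIO(SAMPLE))
for subject, predicate, obj, confidence in rows:
    print("%s %s %s %s" % (subject, predicate, obj, confidence))
```

For a file on disk, pass its name instead, for example `load_alignment("align.rdf")`. Because this relies on the XML layout rather than on the RDF graph, an RDF parser such as Redland remains the more robust choice if the files may be serialized differently.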