
How can I extract RDFa from HTML using PHP or Java?


I am new to RDF, RDFa, and related technologies, and have been learning about them for a few days.

My question: given the following HTML + RDFa code, is it possible to extract the RDF part separately? If so, could you please demonstrate with a simple code snippet (PHP or Java)?

I have heard that Jena could be used, but I couldn't find a tutorial that explains this. If it is possible with Jena, could anyone please post a code snippet?

<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="XHTML+RDFa 1.0" xml:lang="en">
  <head>
    <title>John's Home Page</title>
    <base href="http://example.org/john-d/" />
    <meta property="dc:creator" content="Jonathan Doe" />
    <link rel="foaf:primaryTopic" href="http://example.org/john-d/#me" />
  </head>
  <body about="http://example.org/john-d/#me">
    <h1>John's Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="http://www.neubauten.org/" rel="foaf:interest"
        xml:lang="de">Einstürzende Neubauten</a>.
    </p>
    <p>
      My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
      book is the inspiring <span about="urn:ISBN:0752820907"><cite
      property="dc:title">Weaving the Web</cite> by
      <span property="dc:creator">Tim Berners-Lee</span></span>
     </span>
    </p>
  </body>
</html>

Solution

  • Yes, you can extract the RDF from pages containing RDFa markup. Once extracted, you can load it into a local RDF triplestore if you want to work with that data alone, or insert it into a global triplestore and query it alongside existing RDF data.

    Here is a relevant discussion on Java RDFa parsers.
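
To give a feel for what the extraction step involves, here is a minimal sketch in plain Java using only the JDK's DOM parser. It is not Jena and not a conforming RDFa processor: it assumes well-formed XHTML, reads prefixes only from `xmlns:*` attributes on the root element, handles only `about`, `rel`, `href`, `resource`, `property` and `content`, and ignores language tags, datatypes and literal escaping. The class name `RdfaSketch`, the method `extract` and the fallback base URI are made up for illustration. For real pages, one of the dedicated RDFa parsers from the discussion above is the better choice.

```java
import java.io.StringReader;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: extracts a simplified subset of RDFa 1.0.
class RdfaSketch {

    // Returns N-Triples-like lines; fallbackBase is used when there is no <base href>.
    static List<String> extract(String xhtml, String fallbackBase) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
        Element root = doc.getDocumentElement();

        // Assumption: all prefixes are declared as xmlns:* on the root element.
        Map<String, String> prefixes = new HashMap<>();
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            if (a.getNodeName().startsWith("xmlns:")) {
                prefixes.put(a.getNodeName().substring(6), a.getNodeValue());
            }
        }

        // A <base href="..."> element, if present, overrides the fallback base.
        NodeList bases = root.getElementsByTagName("base");
        String base = fallbackBase;
        if (bases.getLength() > 0 && ((Element) bases.item(0)).hasAttribute("href")) {
            base = ((Element) bases.item(0)).getAttribute("href");
        }

        List<String> triples = new ArrayList<>();
        walk(root, base, URI.create(base), prefixes, triples);
        return triples;
    }

    private static void walk(Element el, String subject, URI base,
                             Map<String, String> prefixes, List<String> out) {
        // @about sets the subject for this element and its descendants.
        if (el.hasAttribute("about")) {
            subject = base.resolve(el.getAttribute("about")).toString();
        }
        String childSubject = subject;

        // @rel with @resource or @href yields a triple whose object is a URI;
        // that object becomes the subject for nested elements.
        if (el.hasAttribute("rel")) {
            String target = el.hasAttribute("resource")
                    ? el.getAttribute("resource") : el.getAttribute("href");
            if (!target.isEmpty()) {
                String object = base.resolve(target).toString();
                out.add("<" + subject + "> <" + expand(el.getAttribute("rel"), prefixes)
                        + "> <" + object + "> .");
                childSubject = object;
            }
        }

        // @property yields a literal taken from @content or the element's text.
        if (el.hasAttribute("property")) {
            String literal = el.hasAttribute("content")
                    ? el.getAttribute("content") : el.getTextContent();
            out.add("<" + subject + "> <" + expand(el.getAttribute("property"), prefixes)
                    + "> \"" + literal.trim() + "\" .");
        }

        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                walk((Element) n, childSubject, base, prefixes, out);
            }
        }
    }

    // Expands a CURIE such as foaf:nick; unknown prefixes are returned unchanged.
    private static String expand(String curie, Map<String, String> prefixes) {
        int colon = curie.indexOf(':');
        if (colon > 0 && prefixes.containsKey(curie.substring(0, colon))) {
            return prefixes.get(curie.substring(0, colon)) + curie.substring(colon + 1);
        }
        return curie;
    }

    public static void main(String[] args) throws Exception {
        String page = Files.readString(Path.of(args[0]));
        String fallbackBase = args.length > 1 ? args[1] : "http://localhost/";
        extract(page, fallbackBase).forEach(System.out::println);
    }
}
```

Saved as `RdfaSketch.java` and run with `java RdfaSketch.java page.html` against the page from the question, this should print seven triples, for example `<http://example.org/john-d/#me> <http://xmlns.com/foaf/0.1/nick> "John D" .`. The resulting lines can then be loaded into whatever triplestore you prefer.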