Tags: facebook-opengraph, semantic-web, semantic-markup, microformats, rdfs

Tool that would show all semantic data contained in a given web page


I'm looking for a web service, browser extension, or anything else that directly extracts any and all semantic data contained in a given web page, as long as that data follows one of the many modern standards for embedding semantic information in web pages (such as Open Graph, microformats, or RDFa). I haven't been able to find anything that works. I found many "semantic crawlers", but no tool that simply shows what semantic data a given web page contains.

I'd be grateful for pointers to any such tool, if one exists. I can't imagine how people debug or develop their semantic harvesters without one.

I've listed some of the relevant standards as this question's tags, but the list is not meant to be exhaustive.
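Even without a dedicated tool, a few lines of Python can show some of this data. The following is a minimal sketch using only the standard library's `html.parser`, not a complete extractor. It assumes Open Graph data appears as `<meta property="og:...">` tags and treats any class name starting with `h-` as a microformats2 root, which can give false positives on pages that use similarly named CSS classes. The names `SemanticSniffer` and `sniff` are hypothetical.

```python
from html.parser import HTMLParser


class SemanticSniffer(HTMLParser):
    """Hypothetical helper: collects Open Graph <meta> properties and
    microformats2 root class names from raw HTML.

    Rough sketch only: it looks at attributes, does no URL resolution and
    ignores nesting, so it is a debugging aid rather than a full parser.
    """

    def __init__(self):
        super().__init__()
        self.open_graph = []       # (property, content) pairs in document order
        self.microformats = set()  # root class names such as "h-card"

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value is None for bare attributes
        attrs = dict(attrs)

        # Open Graph, assumed form: <meta property="og:title" content="...">
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.open_graph.append((prop, attrs.get("content") or ""))

        # microformats2 root class names start with "h-" (e.g. h-card, h-entry);
        # a CSS class like "h-100" would also match, hence "sniffer", not parser
        for cls in (attrs.get("class") or "").split():
            if cls.startswith("h-"):
                self.microformats.add(cls)

    # Self-closing tags such as <meta ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so they are covered.


def sniff(html: str) -> SemanticSniffer:
    """Feed an HTML string to a fresh SemanticSniffer and return it."""
    parser = SemanticSniffer()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    page = """<html><head>
    <meta property="og:title" content="Example page">
    <meta property="og:type" content="article" />
    </head><body>
    <div class="h-card"><span class="p-name">Jane Doe</span></div>
    </body></html>"""
    result = sniff(page)
    print(result.open_graph)
    print(sorted(result.microformats))
```

Fetching the page (for example with `urllib.request.urlopen`) and passing the decoded body to `sniff` would make this usable against live URLs; it only sees the served HTML, not anything added later by JavaScript.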

Thanks!


Solution

  • For some good starting points, you might consider:

    Sindice is perhaps the most general of these; most of the others focus on RDFa (my own bias, sorry). Your choice may depend on what you consider semantic data (for example, do you want HTML5 semantics such as <title> to count?). For RDFa alone, I have found Apache Any23 best for my needs: it has a nice API, flexible output formats, and accurate extraction.

    Good question, though; I'd be curious to see which tools others recommend. The W3C has a longer list, which may be slightly dated.
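A conforming RDFa processor such as Any23 turns RDFa attributes into triples. When debugging, it can still help to see which elements carry RDFa attributes at all. The sketch below does only that, with Python's standard `html.parser`: it reports each element that has one of the RDFa 1.1 attributes `vocab`, `prefix`, `about`, `resource`, `typeof` or `property`, along with its line number. It computes no subjects, objects or triples. The names `RDFaAttributeDump` and `dump_rdfa` are hypothetical, and the schema.org vocabulary in the sample is just an illustration.

```python
from html.parser import HTMLParser

# Attributes defined by RDFa 1.1; all of them are reported for a matching element
RDFA_ATTRS = ("vocab", "prefix", "about", "resource", "typeof",
              "property", "content", "datatype", "rel", "rev")
# Only these mark an element as RDFa-bearing ("rel"/"rev" alone are common in
# plain HTML, e.g. rel="stylesheet", so they do not trigger a report)
MARKERS = {"vocab", "prefix", "about", "resource", "typeof", "property"}


class RDFaAttributeDump(HTMLParser):
    """Hypothetical helper: lists elements carrying RDFa attributes.

    Not an RDFa processor: no triples, no IRI resolution, no inheritance
    of vocab/prefix from ancestor elements.
    """

    def __init__(self):
        super().__init__()
        self.hits = []  # (line number, tag, {attribute: value}) in document order

    def handle_starttag(self, tag, attrs):
        found = {name: value for name, value in attrs if name in RDFA_ATTRS}
        if any(name in MARKERS for name in found):
            line, _offset = self.getpos()  # position of this start tag
            self.hits.append((line, tag, found))


def dump_rdfa(html: str):
    """Return the RDFa-bearing elements found in an HTML string."""
    parser = RDFaAttributeDump()
    parser.feed(html)
    parser.close()
    return parser.hits


if __name__ == "__main__":
    sample = ('<div vocab="https://schema.org/" typeof="Person">'
              '<span property="name">Jane Doe</span></div>')
    for line, tag, found in dump_rdfa(sample):
        print(line, tag, found)
```

Comparing this raw listing with Any23's output for the same page is one way to spot attributes a processor silently ignored.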