Tags: php, rdf, freebase, large-data

Parse an 88 GB RDF file with PHP


How can I parse an 88 GB RDF file with PHP?

The file contains entities and a set of facts about each entity.

I'm trying to iterate over the entities, check each one for certain facts, and write those facts to an XML document that I created earlier in the script.

As I navigate the RDF, I create a <card></card> element for each entity and give it a child called <facts>. I then run through all the facts on the entity, take the ones I need, and write each of them as a <fact></fact> child element inside <facts></facts>.

How can I parse the rdf, extract the data, and write it to XML?


Solution

  • First, use an RDF parser. Googling for a PHP RDF parser turned up lots of results; I don't use PHP personally, but I'm sure one of them will do the job of parsing RDF. Just make sure it's a streaming parser: you're not going to hold 88 GB of RDF in memory on your workstation.

    Second, you said you need to 'iterate through each entity'. That might be tricky if the triples are not sorted by subject in the original file, or if the parser does not report them in file order.

    Assuming that is not a problem, you can keep the triples for the current subject in a local data structure. When you get a triple with a subject different from the one you've queued locally, run whatever business logic you need on the queued triples, write out the XML, and start a new queue. Make sure you can't queue up so many statements locally that you run out of memory.

    Lastly, I'm going to assume you have a good reason to take RDF and turn it into an XML format that is not RDF/XML. But you might want to reconsider your design, just in case.

    Or you could put the data in an RDF database and write SPARQL queries against it, transforming the query results into XML or whatever other format you need.
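
For the streaming approach described above (buffer the triples for one subject, flush when the subject changes), a minimal PHP sketch might look like the following. It rests on several assumptions that are not stated in the question: the dump is in line-based N-Triples format, all triples for a subject are stored contiguously, and a simple regular expression is good enough in place of a real RDF parser. The function names (parseTripleLine, writeCard, convertTriples), the <cards> root element, and the subject and predicate attributes are hypothetical and only there for illustration. XMLWriter is used so that the output is streamed to disk instead of being built in memory.

```php
<?php
// Sketch only: assumes line-based N-Triples input in which all triples for a
// subject are contiguous. The regex is a simplification, not a full RDF parser.

/** Split one N-Triples line into [subject, predicate, object], or return null. */
function parseTripleLine(string $line): ?array
{
    $line = trim($line);
    if ($line === '' || $line[0] === '#') {
        return null; // skip blank lines and comments
    }
    if (!preg_match('/^(\S+)\s+(\S+)\s+(.+?)\s*\.$/', $line, $m)) {
        return null; // skip lines this simple pattern cannot handle
    }
    return [$m[1], $m[2], $m[3]];
}

/** Write one <card> holding the buffered facts of a single subject. */
function writeCard(XMLWriter $xml, string $subject, array $facts): void
{
    $xml->startElement('card');
    $xml->writeAttribute('subject', $subject);     // hypothetical attribute
    $xml->startElement('facts');
    foreach ($facts as [$predicate, $object]) {
        $xml->startElement('fact');
        $xml->writeAttribute('predicate', $predicate); // hypothetical attribute
        $xml->text($object);
        $xml->endElement(); // fact
    }
    $xml->endElement(); // facts
    $xml->endElement(); // card
    $xml->flush();      // push buffered XML to the file after every card
}

/** Stream triples from the open handle $in and write one card per subject. */
function convertTriples($in, string $outPath, array $wanted): void
{
    $xml = new XMLWriter();
    $xml->openUri($outPath);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('cards'); // hypothetical root element

    $current = null; // subject whose facts are currently buffered
    $facts = [];     // wanted [predicate, object] pairs for that subject
    while (($line = fgets($in)) !== false) {
        $triple = parseTripleLine($line);
        if ($triple === null) {
            continue;
        }
        [$s, $p, $o] = $triple;
        if ($current !== null && $s !== $current) {
            writeCard($xml, $current, $facts); // subject changed: flush buffer
            $facts = [];
        }
        $current = $s;
        if (in_array($p, $wanted, true)) {
            $facts[] = [$p, $o]; // buffer only the facts we care about
        }
    }
    if ($current !== null) {
        writeCard($xml, $current, $facts); // flush the last subject
    }
    $xml->endElement(); // cards
    $xml->endDocument();
    $xml->flush();
}
```

Only the wanted facts of the current subject are held in memory, which addresses the out-of-memory concern as long as no single entity has an enormous number of matching facts. If the dump is gzip-compressed, opening it with PHP's compress.zlib:// stream wrapper (for example fopen('compress.zlib://dump.gz', 'r'), where dump.gz is a placeholder name) should let fgets read it without decompressing the file to disk first. If the triples turn out not to be grouped by subject, this sketch will emit several cards for the same entity, so the file would need to be sorted first or loaded into a triple store.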