How to represent temporal relation like <time:before> in RDF?


I have a problem with RDF data representation. My table contains millions of rows and several thousand distinct subject_ids. Here is a sample of the table.

row_id      subject_id    DateTime
34951953    144           14/07/2016 22:00
34952051    145           14/07/2016 22:00
34951954    146           14/07/2016 22:00    
34951976    144           15/07/2016 3:00
34952105    146           15/07/2016 3:00
34952004    144           15/07/2016 20:00

I did a simple 1:1 RDF mapping with Jena, like this:

<foo/data/row_id=34951953>  <foo/data/subject_id>   "144"
<foo/data/row_id=34951954>  <foo/data/subject_id>   "146"
<foo/data/row_id=34952051>  <foo/data/subject_id>   "145"
<foo/data/row_id=34951976>  <foo/data/subject_id>   "144"
<foo/data/row_id=34952105>  <foo/data/subject_id>   "146"
<foo/data/row_id=34952004>  <foo/data/subject_id>   "144"
<foo/data/row_id=34951953>  <foo/data/DateTime> "14/07/2016 22:00:00"
<foo/data/row_id=34952051>  <foo/data/DateTime> "14/07/2016 22:00:00"
<foo/data/row_id=34951954>  <foo/data/DateTime> "14/07/2016 22:00:00"
<foo/data/row_id=34951976>  <foo/data/DateTime> "15/07/2016 3:00:00"
<foo/data/row_id=34952105>  <foo/data/DateTime> "15/07/2016 3:00:00"
<foo/data/row_id=34952004>  <foo/data/DateTime> "15/07/2016 20:00:00"

Now I want to add temporal relations such as <time:before> between the rows of each subject_id, i.e., to capture sequence information. Here are examples of what I want:

For subject_id = 144:

<foo/data/row_id=34951953> <time:before> <foo/data/row_id=34951976>
<foo/data/row_id=34951976> <time:before> <foo/data/row_id=34952004>

For subject_id = 146:

<foo/data/row_id=34951954> <time:before> <foo/data/row_id=34952105>

Can I explicitly add a temporal relation such as <time:before>? Is there a better way to solve this kind of problem?


Solution

    What

    Obviously, you can use rdf:Seq or rdf:List. However, querying these structures is painful.

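    I suggest you find an appropriate ontology or vocabulary for this kind of time series, or define your own lightweight vocabulary. Please note that the time: prefix is conventionally bound to the OWL-Time ontology, whose time:before property relates temporal entities (instants and intervals) rather than arbitrary resources such as your rows.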

    Let us assume that you use a property named foo:before.

    How

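    You can add triples with this property to your RDF data using SPARQL Update. The query links each row to the next later row of the same subject; foo:subject and foo:time stand for your subject_id and DateTime properties:

    # Placeholder namespace (an assumption); use the one from your own data.
    PREFIX foo: <http://example.org/foo/>

    INSERT
    {
        ?row_1 foo:before ?row_2 .
    }
    WHERE {
        ?row_1  foo:subject ?subject .
        ?row_2  foo:subject ?subject .
        ?row_1  foo:time ?time_1 .
        ?row_2  foo:time ?time_2 .
        # ?row_1 must be strictly earlier than ?row_2 ...
        FILTER (?time_1 < ?time_2)
        # ... and no third row of the same subject may fall between them.
        FILTER NOT EXISTS {
            ?row_3  foo:subject ?subject .
            ?row_3  foo:time ?time_3 .
            FILTER ((?time_1 < ?time_3) && (?time_3 < ?time_2))
        }
    }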
    

    Performance

    An analogous query takes about 1 minute on my endpoint with 3000+ "subjects" and 60000+ "rows".

    Your CSV table was probably exported from an RDBMS in which all this data is normalized. If so, you could create an SQL view of neighboring pairs of "rows" and either export it or generate the RDF triples from it with R2RML tools.
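
    As a rough sketch of the view idea, the snippet below builds such a view in an in-memory SQLite database through Python's sqlite3 module. The table name data, the column ts, and the view name neighbours are hypothetical, and the sample timestamps are assumed to be stored in ISO 8601 form so that text comparison is chronological. Your RDBMS and schema will differ.

```python
import sqlite3

# Sample rows from the question; table/column names are assumptions, and the
# timestamps are rewritten as ISO 8601 text so that "<" and ">" sort by time.
ROWS = [
    (34951953, 144, "2016-07-14T22:00:00"),
    (34952051, 145, "2016-07-14T22:00:00"),
    (34951954, 146, "2016-07-14T22:00:00"),
    (34951976, 144, "2016-07-15T03:00:00"),
    (34952105, 146, "2016-07-15T03:00:00"),
    (34952004, 144, "2016-07-15T20:00:00"),
]


def neighbouring_pairs(rows):
    """Return (earlier_row_id, later_row_id) pairs of consecutive rows per subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (row_id INTEGER PRIMARY KEY, subject_id INTEGER, ts TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    # Same logic as the SPARQL query: b is later than a for the same subject,
    # and no third row c of that subject lies strictly between them.
    conn.execute(
        """
        CREATE VIEW neighbours AS
        SELECT a.row_id AS earlier, b.row_id AS later
        FROM data AS a
        JOIN data AS b
          ON b.subject_id = a.subject_id AND b.ts > a.ts
        WHERE NOT EXISTS (
            SELECT 1 FROM data AS c
            WHERE c.subject_id = a.subject_id
              AND c.ts > a.ts AND c.ts < b.ts
        )
        """
    )
    pairs = conn.execute(
        "SELECT earlier, later FROM neighbours ORDER BY earlier"
    ).fetchall()
    conn.close()
    return pairs


if __name__ == "__main__":
    for earlier, later in neighbouring_pairs(ROWS):
        print(earlier, later)
```

    Each pair in the view corresponds to one foo:before triple. On a table with millions of rows, an index on (subject_id, ts) would likely matter.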

    Another option is to sort or transform the RDF file in some way and generate the triples you need with sed, Python, etc.
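
    A minimal Python sketch of that scripted route follows. It works from the table rows rather than from an RDF file, groups them by subject_id, sorts each group by time, and links consecutive rows. The function name before_triples, the foo:before property, and the foo/data/ base are illustrative assumptions; the output uses a prefixed name, so it presumes a Turtle file that declares the foo: prefix.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (row_id, subject_id, DateTime as day/month/year).
ROWS = [
    ("34951953", "144", "14/07/2016 22:00"),
    ("34952051", "145", "14/07/2016 22:00"),
    ("34951954", "146", "14/07/2016 22:00"),
    ("34951976", "144", "15/07/2016 3:00"),
    ("34952105", "146", "15/07/2016 3:00"),
    ("34952004", "144", "15/07/2016 20:00"),
]


def before_triples(rows, base="foo/data/", prop="foo:before"):
    """Emit one triple per pair of consecutive rows of the same subject."""
    by_subject = defaultdict(list)
    for row_id, subject_id, stamp in rows:
        # Parse the day-first timestamp so that sorting is chronological.
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M")
        by_subject[subject_id].append((when, row_id))

    triples = []
    for subject_id in sorted(by_subject):
        ordered = sorted(by_subject[subject_id])
        # Pair each row with its immediate successor in time.
        for (_, earlier), (_, later) in zip(ordered, ordered[1:]):
            triples.append(
                f"<{base}row_id={earlier}> {prop} <{base}row_id={later}> ."
            )
    return triples


if __name__ == "__main__":
    print("\n".join(before_triples(ROWS)))
```

    Sorting within each group avoids comparing every pair of rows, which may help with millions of rows.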

    Update

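    Of course, your dates should be typed as xsd:dateTime, or at least be comparable in a natural way. Plain strings such as "14/07/2016 22:00:00" are compared character by character, so the day-first format does not sort chronologically across months.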
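
    One possible way to do that conversion before loading the data is sketched below in Python. The helper names to_xsd_datetime and typed_literal are hypothetical, and the sketch assumes the day/month/year format shown in the question with no time zone information.

```python
from datetime import datetime

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def to_xsd_datetime(stamp):
    """Convert 'DD/MM/YYYY H:MM[:SS]' to an xsd:dateTime lexical form (no time zone)."""
    for fmt in ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M"):
        try:
            return datetime.strptime(stamp, fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {stamp!r}")


def typed_literal(stamp):
    """Render the timestamp as a typed literal in N-Triples/Turtle syntax."""
    return f'"{to_xsd_datetime(stamp)}"^^<{XSD_DATETIME}>'


if __name__ == "__main__":
    print(typed_literal("15/07/2016 3:00:00"))
```

    With literals in this form, the comparisons in the FILTER clauses order rows by time rather than by the characters of the original string.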