I have a problem with RDF data representation. The table contains millions of rows and several thousand subject_ids. Here is a sample of the table.
row_id subject_id DateTime
34951953 144 14/07/2016 22:00
34952051 145 14/07/2016 22:00
34951954 146 14/07/2016 22:00
34951976 144 15/07/2016 3:00
34952105 146 15/07/2016 3:00
34952004 144 15/07/2016 20:00
I have done a simple 1:1 RDF mapping conversion like this using Jena:
<foo/data/row_id=34951953> <foo/data/subject_id> "144"
<foo/data/row_id=34951954> <foo/data/subject_id> "146"
<foo/data/row_id=34952051> <foo/data/subject_id> "145"
<foo/data/row_id=34951976> <foo/data/subject_id> "144"
<foo/data/row_id=34952105> <foo/data/subject_id> "146"
<foo/data/row_id=34952004> <foo/data/subject_id> "144"
<foo/data/row_id=34951953> <foo/data/DateTime> "14/07/2016 22:00:00"
<foo/data/row_id=34952051> <foo/data/DateTime> "14/07/2016 22:00:00"
<foo/data/row_id=34951954> <foo/data/DateTime> "14/07/2016 22:00:00"
<foo/data/row_id=34951976> <foo/data/DateTime> "15/07/2016 3:00:00"
<foo/data/row_id=34952105> <foo/data/DateTime> "15/07/2016 3:00:00"
<foo/data/row_id=34952004> <foo/data/DateTime> "15/07/2016 20:00:00"
Now I want to add temporal attributes like <time:before> for every subject_id, i.e., to capture sequential information. Here are examples of what I want:
For subject_id = 144:
<foo/data/row_id=34951953> <time:before> <foo/data/row_id=34951976>
<foo/data/row_id=34951976> <time:before> <foo/data/row_id=34952004>
For subject_id = 146:
<foo/data/row_id=34951954> <time:before> <foo/data/row_id=34952105>
Can I explicitly add a temporal relation such as <time:before>? Is there a better way to solve this kind of problem?
What
Obviously, you can use rdf:Seq or rdf:List. However, querying these structures is painful.
I suggest that you find an appropriate ontology or vocabulary for this kind of time series, or use your own lightweight vocabulary. Please note that the time: prefix is conventionally used for the Time ontology, so do not reuse it for your own terms.
Let us assume that you use a property named foo:before.
How
You can add triples with this property to your RDF data using SPARQL:
INSERT {
  ?row_1 foo:before ?row_2 .
}
WHERE {
  ?row_1 foo:subject ?subject .
  ?row_2 foo:subject ?subject .
  ?row_1 foo:time ?time_1 .
  ?row_2 foo:time ?time_2 .
  # ?row_1 must be earlier than ?row_2
  FILTER (?time_1 < ?time_2)
  # and no row of the same subject may lie between them
  FILTER NOT EXISTS {
    ?row_3 foo:subject ?subject .
    ?row_3 foo:time ?time_3 .
    FILTER ((?time_1 < ?time_3) && (?time_3 < ?time_2))
  }
}
Performance
An analogous query takes about 1 minute on my endpoint with 3000+ "subjects" and 60000+ "rows".
Your CSV table was probably exported from an RDBMS, where all this data is normalized. If so, you could create an SQL view with neighboring pairs of "rows" and either export it or generate the RDF triples from it using R2RML tools.
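As a rough illustration of the SQL view idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real RDBMS. The table name data, the column dt, the view name neighbours and the function neighbouring_pairs are all hypothetical, and the timestamps are assumed to be stored in ISO 8601 form so that they compare chronologically.

```python
import sqlite3

# Sample rows from the question; dates rewritten in ISO 8601 form (assumption)
# so that plain text comparison matches chronological order.
ROWS = [
    (34951953, 144, "2016-07-14T22:00:00"),
    (34952051, 145, "2016-07-14T22:00:00"),
    (34951954, 146, "2016-07-14T22:00:00"),
    (34951976, 144, "2016-07-15T03:00:00"),
    (34952105, 146, "2016-07-15T03:00:00"),
    (34952004, 144, "2016-07-15T20:00:00"),
]


def neighbouring_pairs(rows):
    """Return (earlier_row_id, later_row_id) pairs of consecutive rows per subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (row_id INTEGER PRIMARY KEY, subject_id INTEGER, dt TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    # Hypothetical view: same logic as the SPARQL query, i.e. a is earlier
    # than b and no row c of the same subject lies strictly between them.
    conn.execute(
        """
        CREATE VIEW neighbours AS
        SELECT a.row_id AS earlier, b.row_id AS later
        FROM data AS a
        JOIN data AS b
          ON a.subject_id = b.subject_id AND a.dt < b.dt
        WHERE NOT EXISTS (
            SELECT 1 FROM data AS c
            WHERE c.subject_id = a.subject_id
              AND a.dt < c.dt AND c.dt < b.dt
        )
        """
    )
    pairs = conn.execute(
        "SELECT earlier, later FROM neighbours ORDER BY earlier"
    ).fetchall()
    conn.close()
    return pairs


pairs = neighbouring_pairs(ROWS)
for earlier, later in pairs:
    print(f"<foo/data/row_id={earlier}> foo:before <foo/data/row_id={later}> .")
```

On a real database you would probably want an index on (subject_id, dt), or a window function if your RDBMS supports one, rather than this self-join.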
Another option is to sort or transform the RDF file in some way and generate the triples you need with sed, python, etc.
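For the scripting route, a minimal Python sketch could group the rows by subject, sort each group by time and link every row to its successor. It works on the CSV-like rows from the question rather than on the RDF file; the function name before_triples and the date format string are assumptions, and foo:before is the placeholder property from above.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (row_id, subject_id, DateTime)
ROWS = [
    ("34951953", "144", "14/07/2016 22:00"),
    ("34952051", "145", "14/07/2016 22:00"),
    ("34951954", "146", "14/07/2016 22:00"),
    ("34951976", "144", "15/07/2016 3:00"),
    ("34952105", "146", "15/07/2016 3:00"),
    ("34952004", "144", "15/07/2016 20:00"),
]


def before_triples(rows, fmt="%d/%m/%Y %H:%M"):
    """Link each row to the next row (in time) of the same subject."""
    by_subject = defaultdict(list)
    for row_id, subject_id, stamp in rows:
        # Parse the timestamp so that sorting is chronological, not textual.
        by_subject[subject_id].append((datetime.strptime(stamp, fmt), row_id))

    triples = []
    for subject_id in sorted(by_subject):
        ordered = sorted(by_subject[subject_id])
        # Pair every row with its immediate successor.
        for (_, earlier), (_, later) in zip(ordered, ordered[1:]):
            triples.append(
                f"<foo/data/row_id={earlier}> foo:before <foo/data/row_id={later}> ."
            )
    return triples


triples = before_triples(ROWS)
print("\n".join(triples))
```

This is one sort per subject, so it should scale to millions of rows more comfortably than the pairwise comparison in the SPARQL query. Note that rows with identical timestamps are ordered by row_id here, which is a choice of this sketch.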
Update
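Of course, your dates should be of type xsd:dateTime, or at least be comparable in a natural way. Plain strings such as "14/07/2016 22:00:00" are compared character by character, so they do not sort chronologically.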
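To show what that conversion might look like, here is a small Python sketch that turns the day/month/year strings from the question into the xsd:dateTime lexical form. The helper name to_xsd_datetime and the format string are assumptions, and no time zone is attached.

```python
from datetime import datetime


def to_xsd_datetime(stamp, fmt="%d/%m/%Y %H:%M:%S"):
    """Convert e.g. '15/07/2016 3:00:00' to '2016-07-15T03:00:00'."""
    return datetime.strptime(stamp, fmt).isoformat()


iso = to_xsd_datetime("15/07/2016 3:00:00")
literal = '"' + iso + '"^^xsd:dateTime'
print(literal)

# Why the original strings are not safely comparable: as text,
# 1 August sorts before 14 July because '0' < '1'.
print("01/08/2016 10:00:00" < "14/07/2016 22:00:00")
```

Once the literals are in this form, a comparison like the one in the FILTER above orders them chronologically.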