Tags: sparql, rdf, knowledge-capture

What's the simplest way to represent a grid/matrix in RDF?


I'm not sure how to phrase this question, so if there are better terms or existing answers, please point me to them! This is my first time designing anything with RDF.

I'm making a small personal knowledge base to track items in the lab, and I'm unsure how best to encode 2D locations. The only idea I've come up with so far is to make everything a container. For example, if I have a 96-well plate, it would be one big container with 12 columns and 8 rows, each of those would be a container with wells in it, and each well is a container that holds something I'm interested in tracking.

Seems flexible enough to handle lots of real situations, but querying it is kind of cumbersome. To get the strain in well B7 of plate p0001, it would be something like: "describe strain s which is in well w, which is in row r and also in column c, where r and c are in plate p, and p is labeled p0001, and c is labeled 7 and r is labeled B" (Excuse the horrible pseudo-SPARQL)

Is there an easier way? I imagine this comes up in a lot of business contexts involving inventory, so people have probably figured it out.

The other thing I'm unsure about is encoding the indexes themselves. Should I just tag them on as literals?

EDIT: The plates look like this.


Solution

  • This may be too broad for a proper answer, but I think there are a few options. I'll start with the ones that are really about encoding grids and end with the one I think is most appropriate.

    Encode structures with all their array indices

    Apart from lists and list-like structures (for instance rdf:Seq, whose membership properties rdf:_1, rdf:_2, ... record positions), nothing in RDF is ordered: a graph is just a set of triples. That means that if you want to maintain any kind of index-based reference, you'll need to encode the indices directly. That's not too hard. Suppose we have an array like

    [[a, b, c],
     [d, e, f]]
    

    Then we can easily do something like:

    @prefix : <urn:ex:> .
    
    :array :hasElement [ :value :a ; :row 0 ; :column 0 ] ,
                       [ :value :b ; :row 0 ; :column 1 ] ,
                       [ :value :c ; :row 0 ; :column 2 ] ,
                       [ :value :d ; :row 1 ; :column 0 ] ,
                       [ :value :e ; :row 1 ; :column 1 ] ,
                       [ :value :f ; :row 1 ; :column 2 ] .
    

    Then you can easily use a SPARQL query like:

    prefix : <urn:ex:>
    
    select ?value where {
      :array :hasElement [ :value ?value ; :row 1 ; :column 2 ]
    }
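    To make the pattern concrete outside a triple store, here is a minimal Python sketch that assumes the graph is kept as a plain set of (subject, predicate, object) tuples rather than in a real RDF library. The `_:cellN` blank-node names and the `encode_array`/`lookup` helpers are hypothetical illustrations. It builds the same six cells as the Turtle above and mirrors the SPARQL pattern.

```python
# Minimal sketch: an RDF graph modelled as a plain set of (s, p, o) tuples.
# Blank nodes get hypothetical names like "_:cell0"; no RDF library is used.

def encode_array(name, rows):
    """Encode a 2-D list as index-carrying cell nodes, as in the Turtle above."""
    triples = set()
    n = 0
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            cell = f"_:cell{n}"  # stand-in for an anonymous [ ... ] node
            n += 1
            triples |= {
                (name, ":hasElement", cell),
                (cell, ":value", value),
                (cell, ":row", r),
                (cell, ":column", c),
            }
    return triples


def lookup(triples, name, row, col):
    """Mirror of: name :hasElement [ :value ?value ; :row row ; :column col ]."""
    cells = {o for s, p, o in triples if s == name and p == ":hasElement"}
    return [
        v
        for cell in cells
        if (cell, ":row", row) in triples and (cell, ":column", col) in triples
        for s, p, v in triples
        if s == cell and p == ":value"
    ]


graph = encode_array(":array", [[":a", ":b", ":c"], [":d", ":e", ":f"]])
print(lookup(graph, ":array", 1, 2))  # → [':f']
```

    Because the indices are ordinary literals, the lookup is a direct pattern match; no ordering has to be reconstructed from the graph.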
    

    Encode structure with implicit indices

    You can also use structures like RDF lists (which are singly linked lists) and find elements by position, computing an element's position by counting the rdf:rest links that precede it. I've described this in my answer to "Is it possible to get the position of an element in an RDF Collection in SPARQL?". However, that's probably going to be rather inefficient, since every lookup has to walk the list from its head, and I doubt that you want to do that.
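    The cost is easy to see in a small sketch. Assuming, again, that triples are held in a plain Python set, an RDF list is a chain of rdf:first/rdf:rest links ending in rdf:nil, and reaching position n means following n rdf:rest links. The `make_list`, `obj`, and `nth` helpers and the `_:lN` node names are hypothetical; a grid is represented as a list whose elements are row lists.

```python
# Minimal sketch: RDF collections as rdf:first/rdf:rest chains in a set of tuples.
import itertools

_ids = itertools.count()  # source of hypothetical blank-node names


def make_list(triples, items):
    """Add an RDF collection for items to triples and return its head node."""
    head = "rdf:nil"
    for item in reversed(items):  # build from the tail so each node can point at the next
        node = f"_:l{next(_ids)}"
        triples.add((node, "rdf:first", item))
        triples.add((node, "rdf:rest", head))
        head = node
    return head


def obj(triples, s, p):
    """Return the single object of (s, p, ?o)."""
    return next(o for s2, p2, o in triples if s2 == s and p2 == p)


def nth(triples, head, n):
    """Walk n rdf:rest links, then read rdf:first: O(n) hops per lookup."""
    node = head
    for _ in range(n):
        node = obj(triples, node, "rdf:rest")
    return obj(triples, node, "rdf:first")


g = set()
rows = [make_list(g, [":a", ":b", ":c"]), make_list(g, [":d", ":e", ":f"])]
grid = make_list(g, rows)  # a list whose elements are the row lists
print(nth(g, nth(g, grid, 1), 2))  # → :f
```

    Compared with the explicit-index encoding, every cell access here costs a walk proportional to its row and column position, which is the inefficiency mentioned above.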

    Encode structure with the underlying semantics

    However, if you have a table or grid of data, the rows and columns probably actually mean something; it's probably not just a grid of values. In that case, you can probably represent the data in a more semantically meaningful way. For instance, if you have a table like:

    Name    Age    Height
    ---------------------
    John     45        78
    Mary     30        60
    Susan    25        59
    

    Then a "conventional" way to represent this is with an individual for each row that has properties corresponding to each column:

    :row1 a :Row ; :name "John"  ; :age 45 ; :height 78 .
    :row2 a :Row ; :name "Mary"  ; :age 30 ; :height 60 .
    :row3 a :Row ; :name "Susan" ; :age 25 ; :height 59 .
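    As a rough illustration of why this shape is convenient, the Python sketch below (still a set of tuples, no RDF library; `table_to_triples` and `subjects_where` are hypothetical helpers) turns each row into an individual and then asks a question in terms of what the columns mean rather than where they sit.

```python
# Minimal sketch: one individual per table row, one property per column.

HEADER = ["name", "age", "height"]
ROWS = [("John", 45, 78), ("Mary", 30, 60), ("Susan", 25, 59)]


def table_to_triples(header, rows):
    """Hypothetical helper: :rowN a :Row ; :<column> value for each cell."""
    triples = set()
    for i, row in enumerate(rows, start=1):
        subject = f":row{i}"
        triples.add((subject, "rdf:type", ":Row"))
        for column, value in zip(header, row):
            triples.add((subject, f":{column}", value))
    return triples


def subjects_where(triples, predicate, test):
    """Subjects s with some (s, predicate, o) for which test(o) is true."""
    return {s for s, p, o in triples if p == predicate and test(o)}


g = table_to_triples(HEADER, ROWS)
young = subjects_where(g, ":age", lambda age: age < 40)
names = sorted(o for s, p, o in g if s in young and p == ":name")
print(names)  # → ['Mary', 'Susan']
```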
    

    That's more or less the approach given in the W3C note "Defining N-ary Relations on the Semantic Web", if you treat each row as an instance of a relation. The W3C recommendation "A Direct Mapping of Relational Data to RDF" is also very relevant.

    For your use case

    Given your use case (I had to look up what a "well plate" is), it seems that you may actually want those indices, so some mix of the first and third approaches may be what you want.
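    One way to combine the two, sketched in Python under the same set-of-tuples assumption: each well is its own individual (the third approach) that also carries its row letter and column number as literals (the first approach). The IRI scheme `:p0001_B7`, the predicates `:hasWell`, `:row`, `:col`, and the `plate_triples`/`parse_well` helpers are hypothetical; the 8 rows (A to H) by 12 columns (1 to 12) layout is the standard 96-well format.

```python
# Minimal sketch: a 96-well plate as well individuals carrying row/column literals.
import re

ROW_LETTERS = "ABCDEFGH"  # 8 rows
COLUMNS = range(1, 13)    # 12 columns


def plate_triples(plate_id):
    """One :Well individual per position, each linked from the plate."""
    plate = f":{plate_id}"
    triples = {(plate, "rdfs:label", plate_id)}
    for r in ROW_LETTERS:
        for c in COLUMNS:
            well = f":{plate_id}_{r}{c}"  # hypothetical IRI scheme
            triples |= {
                (plate, ":hasWell", well),
                (well, "rdf:type", ":Well"),
                (well, ":row", r),
                (well, ":col", c),
            }
    return triples


def parse_well(label):
    """Split a label such as 'B7' into ('B', 7), rejecting positions off the plate."""
    m = re.fullmatch(r"([A-H])(1[0-2]|[1-9])", label)
    if m is None:
        raise ValueError(f"not a 96-well position: {label!r}")
    return m.group(1), int(m.group(2))


g = plate_triples("p0001")
wells = {o for s, p, o in g if p == ":hasWell"}
print(len(wells), parse_well("B7"))  # → 96 ('B', 7)
```

    Minting one IRI per well keeps the human-readable label (B7) recoverable while still letting queries match on the row and column separately.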

    Seems flexible enough to handle lots of real situations, but querying it is kind of cumbersome. To get the strain in well B7 of plate p0001, it would be something like: "describe strain s which is in well w, which is in row r and also in column c, where r and c are in plate p, and p is labeled p0001, and c is labeled 7 and r is labeled B" (Excuse the horrible pseudo-SPARQL)

    I don't think that this is all that cumbersome. Depending on how you label your columns and rows, it can be something like:

    prefix :     <urn:ex:>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    select ?strain where {
      ?plate rdfs:label "p0001" ;
             :hasWell [ :row "B" ;             #-- or :row/rdfs:label "B", or ...
                        :col "7" ;             #-- or :col/rdfs:label "7", or ...
                        :contains ?strain ] .
    }
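    To check the logic of a query like this without a triple store, the following Python sketch evaluates the same join over a set of tuples. The plate, wells, and the strains `:s0042` and `:s0007` are made-up sample data, and `objects`/`find_strain` are hypothetical helpers; a real SPARQL engine would do this join for you.

```python
# Minimal sketch: evaluating the well-lookup pattern over a set of (s, p, o) tuples.

SAMPLE = {
    (":plate1", "rdfs:label", "p0001"),
    (":plate1", ":hasWell", "_:w1"),
    ("_:w1", ":row", "B"),
    ("_:w1", ":col", "7"),
    ("_:w1", ":contains", ":s0042"),  # made-up strain
    (":plate1", ":hasWell", "_:w2"),
    ("_:w2", ":row", "A"),
    ("_:w2", ":col", "1"),
    ("_:w2", ":contains", ":s0007"),  # made-up strain
}


def objects(triples, s, p):
    """All ?o with (s, p, ?o) in triples."""
    return {o for s2, p2, o in triples if s2 == s and p2 == p}


def find_strain(triples, plate_label, row, col):
    """Join: ?plate rdfs:label L ; :hasWell ?w . ?w :row R ; :col C ; :contains ?strain."""
    strains = set()
    for plate, p, label in triples:
        if p != "rdfs:label" or label != plate_label:
            continue
        for well in objects(triples, plate, ":hasWell"):
            if row in objects(triples, well, ":row") and col in objects(triples, well, ":col"):
                strains |= objects(triples, well, ":contains")
    return strains


print(find_strain(SAMPLE, "p0001", "B", "7"))  # → {':s0042'}
```

    One design point the sketch makes visible: in RDF the string literal "7" and the integer 7 are different terms, so a pattern written with "7" will not match a column stored as 7 (and vice versa). Whichever form you choose for the indices, use it consistently in both the data and the queries.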