Tags: graph, annotations, ontology, vaticle-typeql

How to include "citation" attributes/properties in graphs?


I am creating a domain-specific model which includes entities that have attributes whose original source or citation needs to be defined.

In Graql, for example:

define
"country" sub entity has population;
"evidence" sub attribute datatype string;
"population" sub attribute datatype string has evidence;

This seems to define an attribute of an attribute, and conceptually seems to make the meaning of the attribute dependent on a certain context, which is arguably better modelled as annotated "fact" entities with relationships to other entities.

What is the simplest way to model attributes like these without increasing complexity of the model?


Solution

    Attributes of attributes

    Attributes of attributes don't necessarily work as you might expect. It's important to remember that in Grakn there is only one node in the graph for an attribute of a given type with a given value.

    That is to say, an attribute of type population with value 'sixty million' will only occur once in the knowledge graph.

    If we change your schema slightly to add names for countries (also, there's no need for quotes around types):

    define
    country sub entity
        has population,
        has name;
    name sub attribute datatype string;
    evidence sub attribute datatype string;
    population sub attribute datatype string
        has evidence;
    

    Then add two countries to the knowledge graph:

    insert $uk isa country, has name 'UK', has population $p; $p 'sixty million' has evidence 'journal';
    insert $fr isa country, has name 'France', has population $p; $p 'sixty million' has evidence 'wikipedia';
    commit;
    

    If we visualise the result, we can see that it's impossible to tell the source of each country's population separately: both countries and both pieces of evidence are connected to the same population instance.

    (Visualised in Grakn Workbase Visualiser)
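
    The same problem shows up in the console. The query below is an untested sketch in the Graql syntax used throughout this answer, run against the two countries inserted above. Because both countries share the one population node, and that node owns both evidence values, it should pair each country with both sources (four answers) instead of only its own.

    ```graql
    # The evidence hangs off the shared 'sixty million' attribute,
    # not off the country, so every country/evidence combination matches.
    match $c isa country, has name $n, has population $p;
    $p 'sixty million';
    $p has evidence $e;
    get $n, $e;
    ```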

    Attributes of attributes do make sense in a case like an attribute phrase with value "Hi there!" owning an attribute language with value "English". There, the language attribute describes the value of the phrase attribute itself, so it holds wherever that phrase appears.
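
    A minimal sketch of that case follows. The type names phrase and language are illustrative only (they are not part of the question's schema), and the syntax mirrors the rest of this answer rather than being checked against a particular Grakn release.

    ```graql
    define
    phrase sub attribute datatype string,
        has language;
    language sub attribute datatype string;

    # Only one 'Hi there!' phrase node ever exists, and its language is a
    # property of that value itself, so sharing the node is harmless here.
    insert $ph isa phrase, has language 'English'; $ph 'Hi there!';
    commit;
    ```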

    This means that if you want to record the source of an attribute, you'll need to do things differently. I suggest three possible options. Note that for each of them, population shouldn't own evidence, for the reason given above: in the schema above, population sub attribute datatype string has evidence; should become population sub attribute datatype string;

    1. Implicit relationships

    Under the hood, Grakn implements attribute ownership with implicit relationships, which are always autogenerated and prefixed with @has-, for example @has-population. We can attach attributes to these implicit relationships!

    First delete the instances we inserted above (this will delete all entities and attributes in the graph, beware!):

    match $x isa entity; $y isa attribute; delete $x, $y;
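
    If the original schema (with population has evidence;) is already loaded, one way to drop that ownership once the data is gone is an undefine query. This is a sketch that assumes your Grakn version supports undefine for has; check the documentation for your release.

    ```graql
    # Remove the attribute-of-attribute ownership that caused the ambiguity.
    undefine population has evidence;
    commit;
    ```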
    

    Then define that the implicit @has-population relationship can own evidence, and insert the two countries again:

    define @has-population has evidence;
    
    insert $uk isa country, has name 'UK', has population $p via $r; $p 'sixty million'; $r has evidence 'journal';
    insert $fr isa country, has name 'France', has population $p via $r; $p 'sixty million'; $r has evidence 'wikipedia';
    

    Now we're able to disambiguate the evidence for the UK's population from the evidence for France's population. We can query for this:

    match $c isa country, has name $n, has population $p via $r; 
    $p 'sixty million'; $r has evidence $e; get $n, $e;
    

    Result:

    {$n val "France" isa name; $e val "wikipedia" isa evidence;}
    {$n val "UK" isa name; $e val "journal" isa evidence;}
    

    2. Relationships to implicit relationships

    If the evidence is more complex than a single attribute, then it may be better modelled as a relationship, in which @has-population plays a role.

    define 
    information-sourcing sub relationship, 
        relates sourced-information, 
        relates information-source;
    
    @has-population plays sourced-information;
    
    publication sub entity, 
        plays information-source;
    
    insert $uk isa country, has name 'UK', has population $p via $r; $p 'sixty million'; $pub isa publication; $i(sourced-information: $r, information-source: $pub) isa information-sourcing;
    insert $fr isa country, has name 'France', has population $p via $r; $p 'sixty million'; $pub isa publication; $i(sourced-information: $r, information-source: $pub) isa information-sourcing;
    

    (Link to implicit relationships via normal relationships)
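
    To read the source back, the query can go through the implicit relationship variable just as in option 1. An untested sketch reusing the types defined above:

    ```graql
    # $r is the @has-population instance linking this country to its population;
    # the information-sourcing relationship hangs off $r, not off the shared attribute.
    match $c isa country, has name 'UK', has population $p via $r;
    $i(sourced-information: $r, information-source: $pub) isa information-sourcing;
    get $pub;
    ```

    Since publication owns no attributes in this schema, the answer can only identify the publication by its concept ID; in practice you would give publication some attributes of its own.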

    3. A normal relationship

    Finally, if implicit relationships seem too complex, you could avoid them altogether by creating an ordinary relationship that links the country, its population and the evidence.
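
    A sketch of what that could look like. The relationship population-claim and its three roles are hypothetical names, not from the question. The sketch assumes population no longer owns evidence and relies on attributes being able to play roles. The population node is still shared, but each relationship instance carries its own evidence, so the query at the end should pair each country with only its own source.

    ```graql
    define
    population-claim sub relationship,
        relates claimed-country,
        relates claimed-population,
        relates claim-evidence;
    country plays claimed-country;
    population plays claimed-population;
    evidence plays claim-evidence;

    # One claim per country, each pointing at the same population value.
    insert $uk isa country, has name 'UK', has population $p; $p 'sixty million';
    $e 'journal' isa evidence;
    (claimed-country: $uk, claimed-population: $p, claim-evidence: $e) isa population-claim;

    insert $fr isa country, has name 'France', has population $p; $p 'sixty million';
    $e 'wikipedia' isa evidence;
    (claimed-country: $fr, claimed-population: $p, claim-evidence: $e) isa population-claim;
    commit;

    # Evidence is reached through the claim, so it is not shared between countries.
    match $c isa country, has name $n;
    (claimed-country: $c, claim-evidence: $e) isa population-claim;
    get $n, $e;
    ```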

    Conclusion

    Which method to use depends on the domain you're modelling. In answer to your question, the first method adds the fewest additional elements to the schema.