xsd rdf

How to represent a missing xsd:dateTime in RDF?


I have a dataset collected from a form that contains various date and value fields. Not all fields are mandatory, so blanks are possible and in many cases expected, such as a DeathDate field for a patient who is still alive.

How do I best represent these blanks in the data?

I represent DeathDate using xsd:dateTime, which does not allow empty or whitespace-only values. All of the following are flagged as invalid when validated with Jena RIOT:

foo:DeathDate_1
    a                   foo:Deathdate ;
    time:inXSDDatetime  " "^^xsd:dateTime .

foo:DeathDate_2
    a                   foo:Deathdate ;
    time:inXSDDatetime  ""^^xsd:dateTime .

foo:DeathDate_3
    a                   foo:Deathdate ;
    time:inXSDDatetime  "--"^^xsd:dateTime .

I prefer to not omit the triple because I need to know if it was blank on the source versus a conversion error during construction of my RDF.

What is the best way to encode these missing values?


Solution

  • You should represent this by just omitting the triple. That's the meaning of a triple that's "not present": it's information that is (currently) unknown.

    Alternatively, you can give the property the value "unknown"^^xsd:string when there is no death date. The point is to datatype the placeholder as a plain string rather than as an xsd:dateTime. It doesn't have to be a string, of course: you could use any kind of "special" value, e.g. the boolean false, as long as it is a valid literal that you can distinguish from actual death dates. This solves the parsing problem, but IMHO you are setting yourself up for headaches further down the line: every query over this data will have to take two different types of values into account, plus the possibility that the triple is missing altogether.
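To make that downstream cost concrete, here is a minimal Python sketch of what a consumer of the data ends up doing once a placeholder is allowed. It is an illustration only: the function name `read_death_date` and the modelling of a literal as a `(lexical form, datatype IRI)` pair are assumptions made for this example, not part of any RDF library.

```python
from datetime import datetime
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical model of a literal: (lexical form, datatype IRI).
# None stands for "the triple is not in the graph".
Literal = Tuple[str, str]


def read_death_date(literal: Optional[Literal]) -> Optional[datetime]:
    """Return the death date, or None when it is unknown."""
    if literal is None:
        # Case 1: the triple was omitted.
        return None
    lexical, datatype = literal
    if datatype == XSD + "string" and lexical == "unknown":
        # Case 2: the placeholder value.
        return None
    if datatype == XSD + "dateTime":
        # Case 3: an actual date, e.g. 2019-03-04T10:15:00.
        return datetime.fromisoformat(lexical)
    # Anything else is a value the consumer was never told about.
    raise ValueError(f"Unexpected death date literal: {literal!r}")
```

Every consumer has to repeat all three branches. If the triple is simply omitted, cases 1 and 2 collapse into one.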

    "I prefer to not omit the triple because I need to know if it was blank on the source versus a conversion error during construction of my RDF."

    This sounds like an XY problem. If there are conversion errors, your application should signal that in another way, e.g. by logging an error. You shouldn't try to solve this by "corrupting" your data.
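As a rough sketch of that separation, assuming the conversion is done by a Python script: a blank field produces no triple and no log entry, while a malformed value produces no triple but does leave an error in the log. The function name `death_date_triple` and the logger name `rdf_conversion` are hypothetical, and `datetime.fromisoformat` is used here only as a simple stand-in for a full xsd:dateTime lexical check.

```python
import logging
from datetime import datetime
from typing import Optional

logger = logging.getLogger("rdf_conversion")  # hypothetical logger name


def death_date_triple(subject: str, raw: Optional[str]) -> Optional[str]:
    """Return a Turtle statement for the death date, or None to omit it."""
    value = (raw or "").strip()
    if not value:
        # Blank in the source form: the date is unknown, so emit nothing.
        return None
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        # Conversion problem: report it out of band, not inside the data.
        logger.error("Invalid death date %r for %s; triple omitted", value, subject)
        return None
    # Normalise to second precision (fractional seconds are dropped).
    lexical = parsed.isoformat(timespec="seconds")
    return f'{subject} time:inXSDDatetime "{lexical}"^^xsd:dateTime .'
```

With this split, the absence of a triple plus a clean log means the field was blank in the source, and the absence of a triple plus a logged error means the conversion failed, so the RDF itself never has to carry an invalid or placeholder literal.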