Search code examples
rdfsparql

SPARQL query. OPTIONAL subquery and HAVING MIN MAX values


I have some trouble in making SPARQL query for my DB.

There are individuals (materials) in classes (material groups). Every material have some properties like density, tensile strength etc., but not all materials have all of them.

I need to make a query where I will enter range values for each property, and if material have that property, compare my range of values with material's range of values.

I did that with subqueries, and HAVING becouse each property can have more values linked by blank node, but it doesn't work.

I would like to have results like:

material | propertyLabel | minValue | maxValue

mat01 | tensile strength | 250 | 250

mat02 | tensile strength | 255 | 280

mat01 | density | 4800 | 4900

mat02 | density | 5000 | 5010


and my query looks like this:

 SELECT DISTINCT ?material ?propertyLabel ?minValue ?maxValue WHERE {

?material rdf:type/rdfs:subClassOf* res:Material . #Get all materials

{ OPTIONAL {
 SELECT ?material ?propertyLabel (MIN(?value) AS ?minValue) (MAX(?value) AS ?maxValue) WHERE {
 ?material pro:hasDensity [ unit:hasValue ?value ; unit:hasUnit ?unit ] . 
  pro:hasDensity rdfs:label ?propertyLabel .
 } GROUP BY ?material ?propertyLabel
   HAVING ((MAX(?value) > 4500) && (MIN(?value) < 10000))
} }

{ OPTIONAL {
 SELECT ?material ?propertyLabel (MIN(?value) AS ?minValue) (MAX(?value) AS ?maxValue) WHERE {
 ?material pro:hasUltimateTensileStrength [ unit:hasValue ?value ; unit:hasUnit ?unit ] . 
  pro:hasUltimateTensileStrength rdfs:label ?propertyLabel .
 } GROUP BY ?material ?propertyLabel
   HAVING ((MAX(?value) > 50) && (MIN(?value) < 300))
} }

# Other properties...

}

ADDITIONAL INFO

Ok, here comes cleaner code and clarification...

Yes, this answer with VALUES now works, but...

It outputs materials that have value for even one property and that property, but what i need is materials that satisfy all properties that they have IF they have it.

So let's say we have materials:

res:m1
res:m2
res:m3

and material properties:

pro:hasYieldStrenght
pro:hasMeltingPoint
Pro:hasDensity

that properties are linked trough blank node to their values and units (for a range).

material 1 has some property 1 and that property can have multiple values (1, 2, 3.. Thats why I was using MIN and MAX) like this:

res:m1 pro:hasYieldStrength 
[ unit:hasValue "100"^^xsd:double ; unit:hasUnit unit:MegaPascal ] ,
[ unit:hasValue "133"^^xsd:double ; unit:hasUnit unit:MegaPascal ] ;

Some materials don't have some properties (they are unknown in real world or they are not in db).

Now, how I want to search materials is let's say I want all materials with: pro:hasDensity 2000 - 3000 pro:hasMeltingPoint 250 - 400

So i need range intersection of mine search values and min and max in db for particular property.

That's why i have:

HAVING ((MAX(?value) > ?low) && (MIN(?value) < ?high))

And all of that is working if I set only one property on input.

But when I want to search for multiple intersections (properties), result contains property per material, So let's say that material only satisfy one of the properties, then that property and material would be in results.

What I need as result is materials that satisfy ALL properties IF they have it .

If material don't have value for some property (nobody measure it let's say) and satisfy all other, it should be considered in results with or without empty values on that property, but if material fail to satisfy even one property then it should not be considered in results, not only for that property, but in general, that material.

That's why i was thinking about OPTIONAL but i'm loosing it allready because of that bug with values that i spent two days :p

here is new code:

SELECT ?m ?p (MAX(?value) AS ?maxValue) (MIN(?value) AS ?minValue) WHERE {
VALUES (?p ?low ?high)
{
  ( pro:hasDensity 1 300 )
  ( pro:hasYieldStrength 30 300 )
  ( pro:hasMeltingPoint 100 350 )
}
?m ?p [ unit:hasValue ?value ; unit:hasUnit ?unit ] 
} GROUP BY ?m ?p ?low ?high 
  HAVING ((MAX(?value) > ?low) && (MIN(?value) < ?high))

It's like it doesn't even consider density as none of them have density that low, and this is result:

m   p   maxValue    minValue
#m74653522  #hasYieldStrength   110.0   100.0
#m36767231  #hasYieldStrength   262.0   220.0
#Phosphorus #hasMeltingPoint    317.0   317.0

On the other hand, there are materials with that low density but it doesn't consider them couse other properties are not satisfied. (like it's the problem in first row of properties I don't know..)


Solution

  • Some data, please…

    Without seeing your data, it's hard to say exactly what the problem is, but it sounds like you have data something like the following, where each material has several values for a couple of properties. In this case, two materials, m1 and m2, have several properties each for properties p1 and p2 (which also have labels):

    @prefix : <urn:ex:>
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    :p1 rdfs:label "P one" .
    :p2 rdfs:label "P two" .
    
    :m1 :p1 10, 12, 14, 16 ; 
        :p2 90, 100, 110, 120 .
    :m2 :p1 11, 13, 15, 17 ;
        :p2 95, 105, 115, 125 .
    

    A query

    It sounds like you want to specify lower and upper bounds for each property, and then for each property and pair of bounds, retrieve all values of the property within the bounds for each instance, and record the maximum and minimum value for the property for each material. The following query does that by specifying the properties and bounds in a values block. (See the appendix if you're not familiar with values blocks.) The results follow.

    prefix : <urn:ex:>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    select ?material
           ?pLabel
           (min(?value) as ?min)
           (max(?value) as ?max)
    where {
      values (?p ?lowerBound ?upperBound) {
        (:p1 12 15)
        (:p2 97 116)
      }
      ?material ?p ?value .
      ?p rdfs:label ?pLabel .
      filter ( ?lowerBound <= ?value && ?value <= ?upperBound )
    }
    group by ?material ?pLabel
    order by ?pLabel ?material
    
    ----------------------------------
    | material | pLabel  | min | max |
    ==================================
    | :m1      | "P one" | 12  | 14  |
    | :m2      | "P one" | 13  | 15  |
    | :m1      | "P two" | 100 | 110 |
    | :m2      | "P two" | 105 | 115 |
    ----------------------------------
    

    Appendix A: using values to simply unions

    In SPARQL 1.1, the values block lets you replace, e.g.,

    { ?s :p1 :o1 }
    union 
    { ?s :p2 :o2 }
    

    with

    values (?p ?o) {
      (:p1 :o1)
      (:p2 :o2)
    }
    ?s ?p ?o
    

    Appendix B: If you can't use values

    Here's the equivalent query using union rather than values:

    prefix : <urn:ex:>
    prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    select ?material
           ?pLabel
           (min(?value) as ?min)
           (max(?value) as ?max)
    where {
      {
        ?material :p1 ?value .
        :p1 rdfs:label ?pLabel .
        filter ( 12 <= ?value && ?value <= 15 )
      }
      union 
      {
        ?material :p2 ?value .
        :p2 rdfs:label ?pLabel .
        filter ( 97 <= ?value && ?value <= 116 )
      }
    }
    group by ?material ?pLabel
    order by ?pLabel ?material