I am running this query to get list of all compounds from the DBPedia public SPARQL endpoint.
SELECT * WHERE {
?y rdf:type dbpedia-owl:Drug.
?y rdfs:label ?Name .
OPTIONAL {?y dbpedia-owl:iupacName ?iupacname} .
OPTIONAL {?y dcterms:subject ?y1}
FILTER (langMatches(lang(?Name),"en"))
}
LIMIT 50000
I am downloading in batches of 50000 (2 files) using offset
parameter.
Somehow Isopropyl_alcohol is not getting covered in this even where page exists at
and it has the properties that I am searching for?
There are two issues here. The first is that DBpedia Live and DBpedia do not have exactly the same content. According to the DBpedia live webpage
Wikipedia users constantly revise Wikipedia articles with updates happening almost each second. Hence, data stored in the official DBpedia endpoint can quickly become outdated, and Wikipedia articles need to be re-extracted. DBpedia Live enables such a continuous synchronization between DBpedia and Wikipedia.
That page also lists two SPARQL endpoints for DBpedia Live:
However, you'll run into issues on both. Isopropyl_alcohol is in DBpedia, and its URI is
Looking there, we see that Isopropyl alcohol doesn't have rdf:type dbpedia-owl:Drug
, but only
so you won't be able to find it with your query on DBpedia, because it doesn't have the type `dbpedia-owl:Drug. Now, Isopropyl_alcohol also exists in DBpedia live, and its URL is
but it only has the folllowing rdf:type
s:
so it won't be found by your query on DBpedia Live, for the same reason.
The second issue is the one that AndyS pointed out. Even if the query would select Isopropyl_alcohol in DBpedia or DBpedia Live, unless you provide an ordering constraint, the limit/offset
combination won't be guaranteed to return it, since without an ordering constraint, the server could legitimately return the same set of 50000 results to you every time.