sparql rdf jena virtuoso fuseki

When the triple count is very large, why is a SPARQL federated query so slow while a local query is so fast?


I set up SPARQL endpoints on several Linux servers (RDF database: Fuseki 4.4.0; number of triples: 6,000,000) and then queried several of these endpoints together through a SPARQL federated query.

Result: the federated query is very slow, but the local query is very fast.

SPARQL federated query (very slow: several hours passed with no response):

SELECT * WHERE {
    {
        SERVICE SILENT <fuseki endpoint 1> {
            SELECT * WHERE {
                ?s ?p ?o .
            }
        }
    }
    UNION
    {
        SERVICE SILENT <fuseki endpoint 2> {
            SELECT * WHERE {
                ?s ?p ?o .
            }
        }
    }
} OFFSET 0 LIMIT 5

Local query (very fast: it took 0.02 s):

SELECT * WHERE {
    ?s ?p ?o .
} OFFSET 0 LIMIT 5

However, running the same SPARQL statement against Virtuoso is very fast. DBpedia, for example, responds quickly even though it holds hundreds of millions of triples.


Solution

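  • SERVICE returns all results for the SERVICE block in a single HTTP request. It does not know that there is an overall query limit, and a more complex query may be filtering or joining the SERVICE results locally, so more than 5 results may need to be returned. In the query above, each inner SELECT has no LIMIT of its own, so each endpoint is asked for all of its triples before the outer LIMIT 5 is applied.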

    Apache Jena 4.6.1 has new support for enhancing SERVICE: https://jena.apache.org/documentation/query/service_enhancer.html
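
Since the outer limit is not passed on to the SERVICE blocks, one possible workaround is to put a LIMIT inside each remote subquery. The following is a minimal sketch, not a tested fix for this particular setup. It keeps the placeholder endpoint names from the question and assumes that any 5 rows are acceptable, since there is no ORDER BY:

```sparql
SELECT * WHERE {
    {
        SERVICE SILENT <fuseki endpoint 1> {
            # The LIMIT here is part of the query sent to the remote endpoint,
            # so at most 5 rows come back over HTTP.
            SELECT * WHERE {
                ?s ?p ?o .
            } LIMIT 5
        }
    }
    UNION
    {
        SERVICE SILENT <fuseki endpoint 2> {
            SELECT * WHERE {
                ?s ?p ?o .
            } LIMIT 5
        }
    }
} OFFSET 0 LIMIT 5
```

Each endpoint then returns at most 5 rows, the UNION holds at most 10, and the outer LIMIT trims the result to 5. This only works when the limit can safely be applied before any local filtering or joining. If a nonzero outer OFFSET is used, each inner LIMIT would need to be at least OFFSET plus LIMIT.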