
Reverse wikipedia geotagging lookup


Wikipedia geotags many of its articles. (The coordinates appear in the top-right corner of the article page.)

Is there any API for querying all geotagged pages within a specified radius of a geographical position?

Update

Okay, so based on lost-theory's answer I tried this (on the DBpedia query explorer):

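PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# English-labelled resources within 0.05 degrees of (57.03185, 9.94513) on each axis
SELECT ?subject ?label ?lat ?long WHERE {
    ?subject geo:lat ?lat.
    ?subject geo:long ?long.
    ?subject rdfs:label ?label.
    FILTER(xsd:float(?lat) - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat) <= 0.05
        && xsd:float(?long) - 9.94513 <= 0.05 && 9.94513 - xsd:float(?long) <= 0.05
        && lang(?label) = "en"
    ).
} LIMIT 20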

This is very close to what I want, except that it returns results within a square (measured in degrees) around the point rather than a circle. I would also like the results to be sorted by distance from the point, if possible.
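One way to get from the square to a circle is to post-process the query results on the client. The following is a minimal Python sketch, not part of the original question: it assumes the rows have already been fetched as (label, lat, long) tuples, and the names haversine_km and within_radius are made up for illustration. It computes the great-circle distance with the haversine formula, drops everything outside the radius, and sorts nearest first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(rows, lat, lon, radius_km):
    """Keep the rows inside the circle and return them nearest first.

    rows is an iterable of (label, lat, long) tuples, for example the
    results of the bounding-box query (hypothetical input format).
    Returns a list of (distance_km, label) tuples.
    """
    hits = []
    for label, row_lat, row_lon in rows:
        d = haversine_km(lat, lon, row_lat, row_lon)
        if d <= radius_km:
            hits.append((d, label))
    hits.sort()  # tuples sort by distance first
    return hits
```

The bounding-box query still does the coarse filtering on the server, so only a small number of rows need the exact distance check.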

Update 2

I am trying to use the Euclidean distance as an approximation of the true distance, but I am having trouble squaring a number in SPARQL. (Question opened here.) When I get something useful I will update the question; in the meantime I would appreciate any suggestions for alternative approaches.
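For reference, here is what a Euclidean approximation could look like in Python. This is only a sketch under assumptions that are not in the question: one degree of latitude is taken as roughly 111.195 km, and the longitude difference is scaled by the cosine of the mean latitude (the equirectangular approximation), because a degree of longitude shrinks away from the equator. Squaring is written as plain multiplication, which needs no dedicated power function. The function names are hypothetical.

```python
import math

KM_PER_DEGREE = 111.195  # approximate length of one degree of latitude


def squared_distance_km2(lat1, lon1, lat2, lon2):
    """Approximate squared distance in km^2; enough for ranking results."""
    dy = (lat2 - lat1) * KM_PER_DEGREE
    # Scale longitude by cos(mean latitude) so both axes are in kilometres.
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * KM_PER_DEGREE * math.cos(mean_lat)
    return dx * dx + dy * dy  # squaring by multiplication


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Approximate distance in kilometres for points that are close together."""
    return math.sqrt(squared_distance_km2(lat1, lon1, lat2, lon2))
```

If the goal is only to sort by distance, the square root can be skipped, since ordering by the squared distance gives the same order.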

Update 3

A final update. I gave up on using SPARQL through DBpedia. I have written a simple parser which fetches the nightly Wikipedia article-text database dump and parses all articles for geocodes. It works rather nicely, and it lets me store information about geotagged articles however I wish.

This is probably the solution I will continue using, and if I get around to creating a nice interface for it I might consider allowing public API access and/or publishing the source of the parser.
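The parser itself is not shown, so the following Python sketch is only a guess at the core step. It assumes an article marks its location with a {{coord|...}} template in one of two common shapes: decimal degrees ({{coord|57.03185|9.94513|...}}) or degrees/minutes/seconds with hemisphere letters ({{coord|57|30|N|4|15|W|...}}). Real dumps contain other variants (infobox fields, other templates) that this does not handle, and parse_coord is a made-up name.

```python
import re

# First {{coord|...}} template without nested braces (an assumption).
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def _to_degrees(parts, hemisphere):
    """Convert [deg, min, sec] strings (min and sec optional) to signed degrees."""
    value = sum(float(p) / 60 ** i for i, p in enumerate(parts))
    return -value if hemisphere in ("S", "W") else value


def parse_coord(wikitext):
    """Return (lat, lon) in decimal degrees for the first coord template, or None."""
    match = COORD_RE.search(wikitext)
    if match is None:
        return None
    numbers, groups = [], []
    for field in (f.strip() for f in match.group(1).split("|")):
        if field.upper() in ("N", "S", "E", "W"):
            groups.append((numbers, field.upper()))
            numbers = []
            continue
        try:
            float(field)
        except ValueError:
            break  # first non-numeric field, e.g. display=title
        numbers.append(field)
    if len(groups) == 2 and all(nums for nums, _ in groups):
        # Degrees/minutes/seconds form with hemisphere letters.
        return _to_degrees(*groups[0]), _to_degrees(*groups[1])
    if not groups and len(numbers) == 2:
        # Plain decimal form: latitude, then longitude.
        return float(numbers[0]), float(numbers[1])
    return None
```

Running this over every article in the dump and storing the title with the returned pair would give a local table that can be queried by distance in whatever way is convenient.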


Solution

  • The OpenLink Virtuoso server used by the DBpedia endpoint offers several extra query features, including geospatial functions. I found the information on http://docs.openlinksw.com/virtuoso/rdfsparqlgeospat.html useful for a similar problem.

    I ended up with a query such as this:

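    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    # bif:st_point takes longitude first, then latitude
    SELECT ?page ?lat ?long (bif:st_distance(?geo, bif:st_point(15.560278, 58.394167)) AS ?distance)
    WHERE {
        ?m foaf:page ?page.
        ?m geo:geometry ?geo.
        ?m geo:lat ?lat.
        ?m geo:long ?long.
        FILTER (bif:st_intersects(?geo, bif:st_point(15.560278, 58.394167), 30))
    }
    ORDER BY ASC(?distance) LIMIT 15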
    

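    This example retrieves the 15 nearest geotagged locations within 30 km of the origin position, ordered by distance. Note that bif:st_point takes the longitude first and the latitude second.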
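To run a query like this for arbitrary positions, it can be generated from code. The Python sketch below only builds the query text and a request URL; it does not send anything. Several things here are assumptions rather than part of the answer: the endpoint URL https://dbpedia.org/sparql, the query and format request parameters, the function names, and the choice to name the distance column ?distance.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint URL

# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?page ?lat ?long (bif:st_distance(?geo, bif:st_point({lon}, {lat})) AS ?distance)
WHERE {{
    ?m foaf:page ?page.
    ?m geo:geometry ?geo.
    ?m geo:lat ?lat.
    ?m geo:long ?long.
    FILTER (bif:st_intersects(?geo, bif:st_point({lon}, {lat}), {radius_km}))
}}
ORDER BY ASC(?distance) LIMIT {limit}
"""


def build_query(lat, lon, radius_km=30, limit=15):
    """Return the SPARQL text; note st_point wants longitude before latitude."""
    return QUERY_TEMPLATE.format(
        lat=float(lat),
        lon=float(lon),
        radius_km=float(radius_km),
        limit=int(limit),
    )


def build_url(lat, lon, radius_km=30, limit=15):
    """Return a GET URL asking the endpoint for JSON results (assumed parameters)."""
    params = {
        "query": build_query(lat, lon, radius_km, limit),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)
```

Converting the inputs with float() and int() before formatting keeps arbitrary strings out of the query text. Calling build_url(58.394167, 15.560278) gives a URL for the same position as the example above.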