RDF File count distinct in python query


I have this RDF .ttl file, for example:

@prefix ns1: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/crime/100010117.0> ns1:beat "308" ;
    ns1:crime "AUTO_THEFT" ;
    ns1:date "1/1/2010" ;
    ns1:lat 3.369307e+01 ;
    ns1:location "960 CONSTITUTION RD SE" ;
    ns1:long -8.435805e+01 ;
    ns1:neighborhood "Norwood Manor" ;
    ns1:npu "Z" ;
    ns1:number 1.000101e+08 .

<http://example.org/crime/100010121.0> ns1:beat "309" ;
    ns1:crime "LARCENY-FROM_VEHICLE" ;
    ns1:date "1/1/2010" ;
    ns1:lat 3.368274e+01 ;
    ns1:location "2685 METROPOLITAN PKWY SW" ;
    ns1:long -8.440902e+01 ;
    ns1:neighborhood "Perkerson" ;
    ns1:npu "X" ;
    ns1:number 1.000101e+08 .

<http://example.org/crime/100010127.0> ns1:beat "208" ;
    ns1:crime "LARCENY-FROM_VEHICLE" ;
    ns1:date "1/1/2010" ;
    ns1:lat 3.385211e+01 ;
    ns1:location "3600 PIEDMONT RD NE" ;
    ns1:long -8.438044e+01 ;
    ns1:neighborhood "Buckhead Forest" ;
    ns1:npu "B" ;
    ns1:number 1.000101e+08 .

<http://example.org/crime/100010147.0> ns1:beat "512" ;
    ns1:crime "ROBBERY-PEDESTRIAN" ;
    ns1:date "1/1/2010" ;
    ns1:lat 3.375104e+01 ;
    ns1:location "FORSYTH ST SW / NELSON ST SW" ;
    ns1:long -8.439479e+01 ;
    ns1:neighborhood "Downtown" ;
    ns1:npu "M" ;
    ns1:number 1.000101e+08 .

<http://example.org/crime/100010149.0> ns1:beat "311" ;
    ns1:crime "BURGLARY-RESIDENCE" ;
    ns1:date "1/1/2010" ;
    ns1:lat 3.367399e+01 ;
    ns1:location "2950 SPRINGDALE RD SW" ;
    ns1:long -8.441557e+01 ;
    ns1:neighborhood "Hammond Park" ;
    ns1:npu "X" ;
    ns1:number 1.000101e+08 .

<http://example.org/crime/100010186.0> ns1:beat "501" ;
    ns1:crime "BURGLARY-RESIDENCE" ;
    ns1:date "1/1/2010" ;
    ns1:lat 3.378988e+01 ;
    ns1:location "288 16TH ST NW" ;
    ns1:long -8.439713e+01 ;
    ns1:neighborhood "Home Park" ;
    ns1:npu "E" ;
    ns1:number 1.000102e+08 .

I'm trying to count the different types of crimes (ns1:crime).

I want a result like this, for example:

[
    {
        "crime": "AUTO_THEFT",
        "count": 1
    },
    {
        "crime": "LARCENY-FROM_VEHICLE",
        "count": 2
    },
    {
        "crime": "ROBBERY-PEDESTRIAN",
        "count": 1
    },
    {
        "crime": "BURGLARY-RESIDENCE",
        "count": 2
    }
]

That is, the distinct crime types and the count of each.

I have tried this:

def countTypes(g):
    crimes = []
    q = g.query(
        """
        PREFIX ns1: <http://schema.org/>
            SELECT ?crime (count(distinct ?crime) as ?crimeCount) WHERE {
                ?s ns1:crime ?crime .
            }""")
    for row in q:
        crimes.append(row)
    return crimes

But it does not work properly. Any idea how to do it? Thank you.


Solution

  • COUNT(DISTINCT ?crime) counts how many different crime values there are, not how many records share each value. Use a GROUP BY clause to group the results by crime value, then COUNT (without DISTINCT) to count how many results fall in each group:

    PREFIX ns1: <http://schema.org/>
    SELECT ?crime (count(*) as ?crimeCount) WHERE {
        ?s ns1:crime ?crime .
    }
    GROUP BY ?crime