Search code examples
sparqlrdfdbpedia

get relYear or released in dbpedia singles data without record duplication


Here is the browse output of the following query to try to get singles that are in the billboard top 100:

PREFIX prop: <http://dbpedia.org/property/>
PREFIX ont: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?page, ?artist, ?relYear, ?released, ?runTime WHERE {
  ?page dct:subject dbc:Billboard_Hot_100_number-one_singles .
  OPTIONAL {?page prop:artist ?artist}.
  OPTIONAL {?page prop:relyear ?relYear}.
  OPTIONAL {?page prop:released ?released}.
  OPTIONAL {?page ont:runtime ?runTime}
}

I put relYear and released because some singles have one, some have the other, some have both, some have neither.

If you look at the output, it duplicates rows: one row with relYear, a second row with released. I want something like SQL COALESCE(released, relYear), i.e. give me the first element of (released, relYear) that exists, on a single row.

How do I do that?

P.S. I have the same problem with artist and musicalArtist, and so on, so the rows end up multiplying like crazy.

P.P.S. Looked at this, but it didn't help.


Solution

  • Basically, you already know the answer, use COALESCE:

    PREFIX prop: <http://dbpedia.org/property/>
    PREFIX ont: <http://dbpedia.org/ontology/>
    
    SELECT DISTINCT ?page ?artist (coalesce(?relYear, ?released) as ?releaseYear) ?runTime WHERE {
      ?page dct:subject dbc:Billboard_Hot_100_number-one_singles .
      OPTIONAL {?page prop:artist ?artist}.
      OPTIONAL {?page prop:relyear ?relYear}.
      OPTIONAL {?page prop:released ?released}.
      OPTIONAL {?page ont:runtime ?runTime}
    }
    

    But, and I guess you already figured out, that there can be multiple versions of the same single, done by different artists and done in different versions, thus, you might get multiple rows for the same single. As an extreme example, have a look at Total_Eclipse_of_the_Heart

    +------------------------------+--------------------------------+-------------+---------+
    |             page             |             artist             | releaseYear | runTime |
    +------------------------------+--------------------------------+-------------+---------+
    | :Total_Eclipse_of_the_Heart  | "Bonnie Tyler"^^rdf:langString |        1983 |   180.0 |
    | :Total_Eclipse_of_the_Heart  | "Nicki French"^^rdf:langString |        1983 |   180.0 |
    | :Total_Eclipse_of_the_Heart  | "Bonnie Tyler"^^rdf:langString |        1995 |   180.0 |
    | :Total_Eclipse_of_the_Heart  | "Nicki French"^^rdf:langString |        1995 |   180.0 |
    | :Total_Eclipse_of_the_Heart  | "Bonnie Tyler"^^rdf:langString |        2012 |   180.0 |
    | :Total_Eclipse_of_the_Heart  | "Nicki French"^^rdf:langString |        2012 |   180.0 |
    | :Total_Eclipse_of_the_Heart  | "Bonnie Tyler"^^rdf:langString |        1983 |   230.0 |
    | :Total_Eclipse_of_the_Heart  | "Nicki French"^^rdf:langString |        1983 |   230.0 |
    | :Total_Eclipse_of_the_Heart  | "Bonnie Tyler"^^rdf:langString |        1995 |   230.0 |
    | :Total_Eclipse_of_the_Heart  | "Nicki French"^^rdf:langString |        1995 |   230.0 |
    | :Total_Eclipse_of_the_Heart  | "Bonnie Tyler"^^rdf:langString |        2012 |   230.0 |
    | :Total_Eclipse_of_the_Heart  | "Nicki French"^^rdf:langString |        2012 |   230.0 |
    | ...                          | ...                            |         ... |     ... |
    +------------------------------+--------------------------------+-------------+---------+
    

    You could do some kin of group by here, to get at least for each artist only one row, e.g. in combination with group_concat or sample