
Number of Wikidata entries in a given language


I wonder how I can find out how many labels Wikidata has for each language, out of the total of about 50 million entries.

For example, at https://query.wikidata.org, for Catalan ("ca") I tried:

SELECT ?lang (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item schema:inLanguage "ca" .
} GROUP BY ?lang
ORDER BY DESC (?count)

and got a result of 703351, but I don't think that is correct: I downloaded the Wikidata dump (from https://dumps.wikimedia.org/wikidatawiki/entities/) and have already extracted more than two million Catalan labels, and the extraction process is still running.

So, any clue as to what I am doing wrong?


Solution

  • As suggested in the notes above, using Quarry:

    https://quarry.wmflabs.org/query/27976

    USE wikidatawiki_p;
    -- Show the columns of the terms table
    DESCRIBE wb_terms;

    -- Count the Catalan labels
    SELECT COUNT(*) FROM wb_terms
    WHERE term_type = 'label' AND term_language = 'ca';
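
The question asks for a count per language, while the Quarry query above counts a single one. A minimal sketch of the grouped variant follows, run with Python's built-in sqlite3 module against a toy in-memory table. Assumptions: the toy table copies only four column names from wb_terms (term_entity_id, term_type, term_language, term_text), and the sample rows are made up for illustration and are not real Wikidata counts. On Quarry, only the SELECT statement itself would be relevant.

```python
import sqlite3

# Toy stand-in for wb_terms: only the columns this query needs (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_terms ("
    " term_entity_id INTEGER,"
    " term_type TEXT,"
    " term_language TEXT,"
    " term_text TEXT)"
)

# Made-up sample rows, for illustration only (not real Wikidata data).
rows = [
    (1, "label", "ca", "univers"),
    (1, "label", "en", "universe"),
    (1, "description", "ca", "totalitat de l'espai i del temps"),
    (2, "label", "ca", "Terra"),
    (2, "label", "en", "Earth"),
    (2, "alias", "en", "Blue Planet"),
    (3, "label", "en", "life"),
]
conn.executemany("INSERT INTO wb_terms VALUES (?, ?, ?, ?)", rows)

# Same filter as the Quarry query, grouped by language instead of fixed to 'ca'.
query = (
    "SELECT term_language, COUNT(*) FROM wb_terms"
    " WHERE term_type = 'label'"
    " GROUP BY term_language"
)
label_counts = dict(conn.execute(query).fetchall())
print(label_counts)
```

Descriptions and aliases are excluded by the term_type filter, so each language is credited only with its labels.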