Get HarvestSource for a package in CKAN


Based on the id of a dataset package, how do I figure out if the package was harvested, by which harvester and what the base URL of that harvester is?

Something along the lines of:

guid = '65715c6e-bbaf-3def-982b-3b5156272da7'
harvest_source = getHarvestSource(guid)

if harvest_source:
  harvester_type = harvest_source.type() # whatever was set as the name attribute for this harvester class
  base_url = harvest_source.url() # whatever was set as the URL in the admin interface

Solution

  • I've not tried it, but from reading the model I expect something like this:

    from ckan import model

    package_id = u'65715c6e-bbaf-3def-982b-3b5156272da7'
    dataset = model.Package.get(package_id)  # None if no such package exists
    # harvest_objects is expected to come from the harvest extension's model
    dataset_was_harvested = dataset is not None and len(dataset.harvest_objects) > 0
    if dataset_was_harvested:
        ho = dataset.harvest_objects[0]  # there's not usually more than 1
        source = ho.source  # i.e. the harvest source is "the harvester"
        base_url = source.url  # i.e. the harvester's base url
        harvester_type = source.type  # is also useful
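
The lookup above could be wrapped in a helper shaped like the getHarvestSource the question asks for. This is only a sketch and, like the answer, has not been tried against a live CKAN instance. The function names get_harvest_source and describe_harvest_source are made up for illustration. It assumes the package object exposes harvest_objects, and that each harvest object has a source with type and url attributes, as the answer describes.

```python
def get_harvest_source(dataset):
    """Return the harvest source of a package-like object, or None.

    `dataset` is expected to be whatever model.Package.get() returned:
    either None or an object with a `harvest_objects` list (assumed to be
    provided by the harvest extension's model).
    """
    if dataset is None:
        return None
    # getattr guards against setups where the attribute is missing entirely
    harvest_objects = getattr(dataset, "harvest_objects", None) or []
    if not harvest_objects:
        return None
    # Assumption: every harvest object of a package points at the same
    # source, so the first one is as good as any other.
    return harvest_objects[0].source


def describe_harvest_source(dataset):
    """Return (type, url) of the harvest source, or None if not harvested."""
    source = get_harvest_source(dataset)
    if source is None:
        return None
    return source.type, source.url
```

Inside CKAN this would be called as describe_harvest_source(model.Package.get(...)) with the dataset id. Because the helpers only read attributes, they can also be exercised with plain stand-in objects without a database.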