Getting all records in a set using the Sickle package


How can I access all the records in each set using Sickle?

I can list the sets like this, but I don't know how to get from there to downloading each record in every set:

from sickle import Sickle

sickle = Sickle('http://www.duo.uio.no/oai/request')
sets = sickle.ListSets()
for s in sets:
    print(s)

Each set is printed like this:

<set xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><setSpec>com_10852_1</setSpec><setName>Det matematisk-naturvitenskapelige fakultet</setName></set>

I can also iterate through the sets to go deeper:

for s in sets:
    for rec in sets:
        print(rec)

This prints all the sub-sets, so this is probably where I can get at the individual records, but I find the API hard to understand and have not been able to access them.


Solution

  • Be sure to read the short and sweet Tutorial.

    For harvesting an entire OAI-PMH repository, you do not need to iterate over sets. Here is the complete code:

    from sickle import Sickle
    
    sickle = Sickle('http://www.duo.uio.no/oai/request')
    recs = sickle.ListRecords(metadataPrefix="oai_dc")
    for r in recs:
        print(r)
    

    If for some reason you really wish to harvest records set by set, you can certainly do so. Here is the complete code again:

    from sickle import Sickle
    
    sickle = Sickle('http://www.duo.uio.no/oai/request')
    sets = sickle.ListSets()
    for s in sets:
        recs = sickle.ListRecords(metadataPrefix="oai_dc", set=s.setSpec)
        for r in recs:
            print(r)
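
If the goal is to know which records belong to which set, the set-by-set loop above can be wrapped in a small helper. The following is only a sketch under a few assumptions: harvest_by_set and its empty_set_errors parameter are names made up for illustration, client is expected to behave like the sickle object above, and each record is assumed to expose its OAI identifier as record.header.identifier. Some repositories answer with an error instead of an empty list when a set holds no records, so the helper lets the caller say which exception types should count as an empty set.

```python
def harvest_by_set(client, metadata_prefix="oai_dc", empty_set_errors=()):
    """Map each setSpec to the identifiers of the records in that set.

    client: any object with ListSets() and ListRecords(metadataPrefix=..., set=...),
        assumed to behave like the Sickle object in the answer.
    empty_set_errors: tuple of exception types to treat as "this set is empty"
        (hypothetical parameter; by default no exception is swallowed).
    """
    identifiers_by_set = {}
    for s in client.ListSets():
        identifiers = []
        try:
            records = client.ListRecords(metadataPrefix=metadata_prefix, set=s.setSpec)
            for record in records:
                # Assumption: the record header carries the OAI identifier.
                identifiers.append(record.header.identifier)
        except empty_set_errors:
            # The repository reported that this set has nothing to harvest.
            pass
        identifiers_by_set[s.setSpec] = identifiers
    return identifiers_by_set
```

With Sickle this might be called as harvest_by_set(sickle, empty_set_errors=(NoRecordsMatch,)), with NoRecordsMatch imported from sickle.oaiexceptions; check that against the version you have installed. A record that belongs to several sets shows up once under each of them, which is one reason the plain ListRecords harvest is usually the simpler choice.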