I'm using the Cloudant library to fetch documents from a Cloudant database. Every time I run the Python script I get all the documents, but I would like to retrieve only the documents added since the last execution of the script; in other words, a get_changes function.
I have searched for an answer, but it doesn't seem easy to find.
Thanks for any help,
Filippo.
Use the changes() method. Keep track of the last sequence ID and restart from there to retrieve only the unseen changes.
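    # Iterate over a "normal" _changes feed (db is an open python-cloudant database)
    changes = db.changes()
    for change in changes:
        print(change)
    # ...time passes
    # Resume from the last sequence seen, so only unseen changes are returned
    new_changes = db.changes(since=changes.last_seq)
    for new_change in new_changes:
        print(new_change)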
If you also want the document bodies, pass include_docs=True as well.
See https://github.com/cloudant/python-cloudant/blob/master/src/cloudant/database.py#L458
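Because the question asks for changes since the previous run of the script, last_seq has to outlive the process. A minimal sketch, assuming a local JSON file is an acceptable place to keep the checkpoint (the file name last_seq.json and all helper names are hypothetical) and that db is a python-cloudant database whose changes() feed exposes last_seq as in the snippet above:

```python
import json
import os

# Hypothetical checkpoint file; any persistent storage would do.
STATE_FILE = "last_seq.json"


def load_last_seq(path=STATE_FILE):
    """Return the sequence saved by the previous run, or None on the first run."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["last_seq"]


def save_last_seq(seq, path=STATE_FILE):
    """Persist the sequence so the next run can resume from it."""
    with open(path, "w") as f:
        json.dump({"last_seq": seq}, f)


def process_new_changes(db, handle, path=STATE_FILE, include_docs=True):
    """Pass every change since the previous run to `handle`, then checkpoint.

    Assumes db.changes() behaves like python-cloudant's: it accepts `since`
    and `include_docs` and returns an iterable feed with a `last_seq` attribute.
    """
    kwargs = {"include_docs": include_docs}
    since = load_last_seq(path)
    if since is not None:
        kwargs["since"] = since  # first run: omit `since` to read from the start
    feed = db.changes(**kwargs)
    count = 0
    for change in feed:
        handle(change)
        count += 1
    # Save only after every change was handled, so a crash means re-reading, not skipping.
    save_last_seq(feed.last_seq, path)
    return count
```

Calling process_new_changes(db, print) prints every change on the first run and only newer ones afterwards. The checkpoint is written after the whole feed has been handled, so if the script dies midway those changes are delivered again next time rather than silently skipped.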
If you want to capture only new additions (as opposed to all changes), you can create a filter function in a design document in your database, along the lines of:
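    function(doc, req) {
        // Skip deleted docs
        if (doc._deleted) {
            return false;
        }
        // Skip design docs (indexOf rather than startsWith,
        // which some older server-side JavaScript engines lack)
        if (doc._id.indexOf('_design/') === 0) {
            return false;
        }
        // Skip updates: a document's first revision always starts with "1-"
        if (doc._rev.indexOf('1-') !== 0) {
            return false;
        }
        return true;
    }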
and apply that to the changes feed:
    new_changes = db.changes(since=changes.last_seq, filter='myddoc/myfilter')
    for change in new_changes:
        print(change)  # do stuff here
Alternatively, it is probably just as easy to fetch all the changes and filter them in the Python code.
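A minimal sketch of that Python-side filtering, assuming the standard _changes row layout (an id, the current revision under changes[0]["rev"], and an optional deleted flag); the function names are hypothetical and the checks mirror the filter function above:

```python
def is_new_addition(change):
    """Return True if a _changes row looks like a newly created document.

    Skips deletions, design documents, and any document whose current
    revision is not its first one ("1-...").
    """
    if change.get("deleted"):
        return False
    if change["id"].startswith("_design/"):
        return False
    # With the default style=main_only, "changes" holds exactly one revision.
    rev = change["changes"][0]["rev"]
    return rev.startswith("1-")


def new_additions(changes):
    """Keep only the rows that represent newly created documents."""
    return [c for c in changes if is_new_addition(c)]
```

For example, new_additions(db.changes(since=changes.last_seq, include_docs=True)) would give just the new documents. Note that both this and the server-side filter judge a document by its current revision only, so a document created and then edited between two runs arrives with a "2-" revision and is skipped.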
Filter functions: https://console.bluemix.net/docs/services/Cloudant/guides/replication_guide.html#filtered-replication
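For filter='myddoc/myfilter' to resolve, the function has to live under the filters field of a design document with the ID _design/myddoc. A sketch, assuming python-cloudant's create_document() and reusing the placeholder names myddoc and myfilter; the JavaScript string is a condensed copy of the filter above, and the helper names are hypothetical:

```python
# Condensed copy of the filter function above, stored as a string because
# design documents hold the function source code.
MYFILTER_JS = """function(doc, req) {
  if (doc._deleted) { return false; }
  if (doc._id.indexOf('_design/') === 0) { return false; }
  return doc._rev.indexOf('1-') === 0;
}"""


def filter_design_doc(ddoc_name="myddoc", filter_name="myfilter", source=MYFILTER_JS):
    """Build a design document whose filter is addressable as 'ddoc_name/filter_name'."""
    return {
        "_id": "_design/" + ddoc_name,
        "filters": {filter_name: source},
    }


def install_filter(db, ddoc_name="myddoc", filter_name="myfilter"):
    """Save the design document and return the value to pass as filter=.

    Assumes `db` is a python-cloudant database object; run this once, since
    creating the same _id again is rejected by the server as a conflict.
    """
    db.create_document(filter_design_doc(ddoc_name, filter_name))
    return ddoc_name + "/" + filter_name
```

After a one-off install_filter(db), the returned string can be passed as the filter argument of db.changes().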