Tags: postgresql, postgresql-9.5, dspace

How to delete generated bitstreams in DSpace 6.x?


I would like to delete all the bitstreams generated by filter-media, but only those with the specific description "IM Thumbnail".

I am aware that I can regenerate the thumbnails by running filter-media with the -f flag, which forces regeneration. I am testing some settings in my setup, however, and would like to delete the generated thumbnails with this specific description first.

I've tried working with the database directly via pgAdmin, but I only got as far as selecting the bitstreams. I don't know how to group or order the returned results, and I'm not sure I've selected the correct tables.

SELECT 
  * 
FROM 
  public.bitstream, 
  public.bundle, 
  public.bundle2bitstream, 
  public.metadatavalue, 
  public.item2bundle
WHERE 
  bitstream.uuid = metadatavalue.dspace_object_id AND
  bitstream.uuid = bundle2bitstream.bitstream_id AND
  bundle.uuid = item2bundle.bundle_id AND
  bundle2bitstream.bundle_id = bundle.uuid AND
  metadatavalue.text_value = 'IM Thumbnail';

Any advice on how to do this via database manipulation or any other means would be greatly appreciated. Applying the SQL deletion within a specific community or collection would be a really nice bonus too!

Thanks in advance!


Solution

  • Although the question was tagged with postgresql, I found the answer on the DSpace Community Mailing List, and it uses Jython rather than SQL. Thanks to Joan Caparros for the original code, which is in this message thread: Removing Thumbnails in DSpace 5. I also posted a similar query on the DSpace Technical Support Mailing List: Batch delete bitstreams in the Bundle: THUMBNAIL. There Joan posted a version of his code modified for my specific need, which is to delete a thumbnail only if its description is "IM Thumbnail". The full curation task is below:

    from org.dspace.content.factory import ContentServiceFactory
    from org.dspace.core import Constants
    from org.dspace.curate import Curator
    from org.dspace.curate import ScriptedTask

    class Main(ScriptedTask):
        def init(self, curator, taskName):
            print "initializing with Jython"

        def performDso(self, dso):
            # Only items carry a THUMBNAIL bundle; skip communities and collections.
            if dso.getType() != Constants.ITEM:
                return 0
            print "Item '" + dso.getName() + "' (" + dso.getHandle() + ")"
            context = Curator.curationContext()
            factory = ContentServiceFactory.getInstance()
            itemService = factory.getItemService()
            bitstreamService = factory.getBitstreamService()
            # Copy each list before looping so deletions cannot disturb the iteration.
            for bundle in list(itemService.getBundles(dso, "THUMBNAIL")):
                bitstreams = list(bundle.getBitstreams())
                matches = [b for b in bitstreams if b.getDescription() == 'IM Thumbnail']
                for bitstream in matches:
                    print "DELETE", bitstream.getName()
                    bitstreamService.delete(context, bitstream)
                # Remove the bundle only if it was empty or held nothing but matches.
                if len(matches) == len(bitstreams):
                    print "- DELETING BUNDLE"
                    itemService.removeBundle(context, dso, bundle)
            return 0

        def performId(self, context, id):
            print "perform on id %s" % (id)
            return 0
    

    Note that the code above is a curation task written in Jython. The documentation on how to set it up and run it can be found here: Curation tasks in Jython.
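
The selection rule can be tried out without a DSpace instance. Below is a minimal sketch in plain Python, not part of the original task: `FakeBitstream` and `plan_thumbnail_cleanup` are hypothetical stand-ins, and "Generated Thumbnail" is only an illustrative second description. It assumes the intended behaviour is to delete only the bitstreams described as "IM Thumbnail" and to treat a bundle as removable only once nothing else is left in it.

```python
class FakeBitstream(object):
    """Hypothetical stand-in for a DSpace bitstream; only two fields matter here."""

    def __init__(self, name, description):
        self.name = name
        self.description = description


def plan_thumbnail_cleanup(bitstreams, description="IM Thumbnail"):
    """Split one bundle's bitstreams into those to delete and those to keep.

    Returns (to_delete, to_keep, bundle_becomes_empty).
    """
    to_delete = [b for b in bitstreams if b.description == description]
    to_keep = [b for b in bitstreams if b.description != description]
    # An already empty bundle also counts as removable.
    return to_delete, to_keep, len(to_keep) == 0


# Illustrative bundle content: one matching thumbnail and one that must survive.
bundle = [
    FakeBitstream("a.pdf.jpg", "IM Thumbnail"),
    FakeBitstream("b.jpg.jpg", "Generated Thumbnail"),
]
to_delete, to_keep, bundle_becomes_empty = plan_thumbnail_cleanup(bundle)
```

With this sample bundle only `a.pdf.jpg` is selected for deletion, and the bundle is kept because `b.jpg.jpg` remains in it.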