Tags: amazon-s3, xquery, marklogic, marklogic-8

Schedule a job in AWS to delete files in an S3 bucket based on the availability of flag files


I have a MarkLogic task scheduled to run every day. It accesses an S3 bucket, processes a file (test.xml) in a directory, and then adds a flag file (test.done) to the same directory to signal that the file has been processed. I need to delete both files (test.xml and test.done) periodically, based on the presence of the flag file. Is there an option in Amazon to create a job that deletes these files periodically?

Is there an option to use xdmp:http-delete()? If so, can someone share a sample request, including headers, that does this?


Solution

  • In MarkLogic, there is no supported way to delete files or directories. However, you can zero out their content by writing an empty text node to them.

    I say no 'supported' way because two such functions do exist in MarkLogic: xdmp:filesystem-directory-delete and xdmp:filesystem-file-delete. They are undocumented, which I believe also indicates that they are unsupported and subject to change or removal, so I would caution against using them in production.

    To delete the files via HTTP, see the AWS documentation on deleting S3 objects: http://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingObjects.html

    Another option is to mount S3 on the local filesystem of the machine running MarkLogic and use the operating system to delete the files. In this case, you could also have MarkLogic write the test.done flag to a directory on the local filesystem, which then acts as a queue that is processed from the OS.
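
To make the "delete via the AWS API" route more concrete, here is a minimal Python sketch of the cleanup logic. It is not from the original answer and rests on several assumptions: the function names keys_to_delete and delete_processed are hypothetical, the flag and data files share a base name and differ only in suffix (test.done / test.xml, as in the question), and the client argument is a boto3 S3 client (boto3 is a third-party package). It uses only the list_objects_v2 paginator and the delete_objects call.

```python
from typing import Iterable, List


def keys_to_delete(keys: Iterable[str],
                   flag_suffix: str = ".done",
                   data_suffix: str = ".xml") -> List[str]:
    """Return each data key followed by its flag key, but only when both exist.

    A flag without a matching data file is left alone (an assumption; adjust
    if orphaned flags should be removed as well).
    """
    present = set(keys)
    doomed: List[str] = []
    for key in sorted(present):
        if key.endswith(flag_suffix):
            # Swap the flag suffix for the data suffix: a/test.done -> a/test.xml
            data_key = key[: -len(flag_suffix)] + data_suffix
            if data_key in present:
                doomed.extend([data_key, key])
    return doomed


def delete_processed(client, bucket: str, prefix: str = "") -> List[str]:
    """List keys under `prefix` and delete every processed (data, flag) pair.

    `client` is expected to behave like a boto3 S3 client.
    """
    keys: List[str] = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing carry no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    doomed = keys_to_delete(keys)
    # The S3 DeleteObjects operation accepts at most 1000 keys per request.
    for start in range(0, len(doomed), 1000):
        batch = doomed[start:start + 1000]
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
        )
    return doomed
```

With boto3 installed and credentials configured, a call might look like delete_processed(boto3.client("s3"), "my-bucket", "incoming/"), where the bucket name and prefix are placeholders. One way to run it periodically would be a scheduled AWS Lambda function or a cron job on any machine with access to the bucket.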