
Reading incoming emails saved to an S3 bucket by SES


I have configured AWS SES for sending and receiving emails. I have verified my domain and created a rule set that stores all incoming emails in an S3 bucket with the object key prefix "email". I found the following code for reading files from an S3 bucket: http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html

I am trying to read the emails. My SES rule stores all incoming emails in my specified S3 bucket, and I am trying to add code that reads the bucket and gets the emails. The next time I read the bucket, how can I tell which emails were already read and which still need to be read? Is there any way to read the emails in the bucket and then mark them as read, so that I don't have to process them again?


Solution

  • S3 is just storage. It has no sense of "read" vs "unread," and if you're discovering messages by listing objects in the bucket, your best solution would be something like this:

    After processing each message, move it somewhere else. This could be another bucket, or a different prefix in the same bucket.

    S3 doesn't have a "move" operation, but it does have copy and it does have delete... so, for each message you process, you effectively change the object key (the path+filename) by combining the two.

    If your emails are being stored with a prefix, like "incoming/" so that an individual message has a key that looks like (e.g.) "incoming/jozxyqkblahexample," change that string to "processed/jozxyqkblahexample." Then tell S3 to copy from the old to the new. When that succeeds, tell S3 to delete the original.

    This (mostly) solves your problem: since you only list objects with the prefix "incoming/", you won't see the processed messages the next time, because they're now out of the way.
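    Here is a minimal sketch of that poll-and-move loop in Java (the language of the linked AWS example). `ObjectStore`, `InMemoryStore`, and `MailPoller` are hypothetical names used only for illustration, and the in-memory store exists so the sketch runs without credentials. With the AWS SDK for Java (v1), the four operations would roughly correspond to `listObjectsV2`, `getObject`, `copyObject`, and `deleteObject` on an `AmazonS3` client, each also taking the bucket name; a real listing is paginated (up to 1,000 keys per response), which this sketch ignores. The "incoming/" and "processed/" prefixes follow the example above; your rule uses "email", so adjust accordingly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical minimal storage interface standing in for an S3 client.
interface ObjectStore {
    List<String> list(String prefix);           // keys that start with prefix
    byte[] get(String key);                      // object body, or null if absent
    void copy(String sourceKey, String destKey); // throws if the source is missing
    void delete(String key);
}

// Hypothetical in-memory stand-in so the sketch runs without AWS.
final class InMemoryStore implements ObjectStore {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    void put(String key, byte[] body) { objects.put(key, body); }

    public List<String> list(String prefix) {
        List<String> keys = new ArrayList<>(); // a copy, so callers may delete while iterating
        for (String k : objects.keySet()) {
            if (k.startsWith(prefix)) keys.add(k);
        }
        return keys;
    }

    public byte[] get(String key) { return objects.get(key); }

    public void copy(String sourceKey, String destKey) {
        byte[] body = objects.get(sourceKey);
        if (body == null) throw new IllegalStateException("no such key: " + sourceKey);
        objects.put(destKey, body);
    }

    public void delete(String key) { objects.remove(key); }
}

final class MailPoller {
    static final String INCOMING = "incoming/";
    static final String PROCESSED = "processed/";

    // "incoming/jozxyqkblahexample" -> "processed/jozxyqkblahexample"
    static String processedKey(String incomingKey) {
        if (!incomingKey.startsWith(INCOMING)) {
            throw new IllegalArgumentException("not an incoming key: " + incomingKey);
        }
        return PROCESSED + incomingKey.substring(INCOMING.length());
    }

    // Lists only "incoming/", hands each message to the handler, then "moves" it:
    // copy to the new key first, and delete the original only after the copy returns.
    static int pollOnce(ObjectStore store, BiConsumer<String, byte[]> handler) {
        int handled = 0;
        for (String key : store.list(INCOMING)) {
            byte[] body = store.get(key);
            if (body == null) continue; // already gone since the listing; skip it
            handler.accept(key, body);
            store.copy(key, processedKey(key));
            store.delete(key);
            handled++;
        }
        return handled;
    }
}
```

    Because the delete only runs after the copy returns without an exception, a failure partway through leaves the original under "incoming/", so the next poll picks it up again. The handler should therefore tolerate seeing the same message twice.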

    But there's one potential problem with this solution: you may run afoul of the S3 consistency model. Historically, S3 did not guarantee that fetching a list of objects would immediately give you a response reflecting all of your recently completed activity against the bucket. Objects could linger for a brief time in the object listing after being deleted, so it was possible to see a message in the listing after you had already moved it. The chances were reasonably low, but you needed to be aware of the possibility. Since December 2020, S3 provides strong read-after-write consistency, including for list operations, so on current S3 a deleted object should no longer show up in a subsequent listing.

    You can also configure SES to notify you each time it drops a message into your bucket.

    Typically, a better solution than polling the bucket for mail is for SES to send you an SNS notification that the message was received. The notification will include information about the message, including the key where it was stored in the bucket. You then fetch exactly that message from the bucket, and process it, so no bucket object listing is needed.
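    A minimal sketch of the notification side, again in Java. It assumes you have already unwrapped the SNS envelope (the SES notification JSON arrives as the string in the SNS message's `Message` field) and pulls the bucket and key out of the receipt's S3 action. The field names `bucketName` and `objectKey` reflect my understanding of the SES receipt notification format for the S3 action; check them against the SES documentation. `SesS3Location` is a hypothetical class, and the regex is only a dependency-free shortcut for the sketch; real code should use a JSON library such as Jackson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for where SES stored one message, parsed from the
// SES "Received" notification JSON (already extracted from the SNS envelope).
final class SesS3Location {
    final String bucketName;
    final String objectKey;

    private SesS3Location(String bucketName, String objectKey) {
        this.bucketName = bucketName;
        this.objectKey = objectKey;
    }

    // Finds the first "name" : "value" pair. The quote required after the name
    // keeps "objectKey" from matching "objectKeyPrefix".
    // Shortcut for this sketch only: it does not handle escaped quotes.
    private static String field(String json, String name) {
        Matcher m = Pattern
                .compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"")
                .matcher(json);
        if (!m.find()) throw new IllegalArgumentException("missing field: " + name);
        return m.group(1);
    }

    static SesS3Location fromNotification(String sesNotificationJson) {
        return new SesS3Location(
                field(sesNotificationJson, "bucketName"),
                field(sesNotificationJson, "objectKey"));
    }
}
```

    With the location in hand, you would fetch exactly that object (for example with `getObject(loc.bucketName, loc.objectKey)` on an SDK v1 `AmazonS3` client), process it, and then move or delete it if you like; no bucket listing is involved at any point.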

    Note that SES has two different notification types. For small emails, SES can actually include the mail itself in the SNS notification, but that's not the notification type I'm referring to above. What I'm suggesting is that you investigate using an alert notification, sent by SES through SNS, that tells you about each email as it is dropped into S3.