I have a collection of JSON messages in a file stored on S3 (one message per line). Each message has a unique key as part of the message. I also have a simple DynamoDB table where this key is used as the primary key. The table contains the name of the S3 file where the corresponding JSON message is located.
My goal is to extract a JSON message from the file given the key. Of course, the worst case scenario is when the message is the very last line in the file.
What is the fastest way of extracting the message from the file using the boto library? In particular, is it possible to somehow read the file line by line directly? Of course, I can read the entire contents to a local file using boto.s3.key.get_file(), then open that file, read it line by line, and check each line for a matching key. But is there a more efficient way?
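For reference, here is roughly what I'm doing now (boto 2.x; the bucket name and the 'id' field holding the key are placeholders):

    import json
    import boto

    def find_message(bucket_name, file_name, wanted_key):
        # Locate the object named in the DynamoDB item.
        conn = boto.connect_s3()
        key = conn.get_bucket(bucket_name).get_key(file_name)
        # Pull the whole object into memory; get_file() /
        # get_contents_to_filename() would spool it to disk instead.
        contents = key.get_contents_as_string()
        # Scan line by line; worst case (key on the last line)
        # still costs a full download.
        for line in contents.splitlines():
            message = json.loads(line)
            if message['id'] == wanted_key:
                return message
        return None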
Thanks much!
S3 cannot do this. That said, you have some other options:
1. Store the byte position and length of each message in your DynamoDB item alongside the file name, then fetch just that slice of the object with an HTTP Range: header (a boto sketch follows this list).

2. Maintain a cache of { S3 object key, line number } => { position, length } tuples. When you want to look up a record by { S3 object key, line number }, reference the cache. If you don't already have this data, you have to fetch the whole file like you do now -- but having fetched the file, you can calculate offsets for every line within it, and save yourself work down the line (see the index-building sketch below).

Which is most appropriate for you depends on your application architecture, the way in which this data is accessed, concurrency issues (probably not significant given your current solution), and your sensitivities for latency and cost.
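For the first option, a minimal sketch with boto 2.x might look like this; position and length are assumed to have been stored as extra attributes on your DynamoDB item:

    import json
    import boto

    def fetch_message(bucket_name, file_name, position, length):
        conn = boto.connect_s3()
        key = conn.get_bucket(bucket_name).get_key(file_name)
        # HTTP byte ranges are inclusive at both ends.
        byte_range = 'bytes=%d-%d' % (position, position + length - 1)
        # Passing a Range header makes S3 return only that slice.
        line = key.get_contents_as_string(headers={'Range': byte_range})
        return json.loads(line)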
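For the second option, here is one way to compute the per-line offsets once you've been forced to fetch a file whole; the 'id' field name is an assumption:

    import json

    def build_line_index(contents):
        # Map each message's key to its (position, length) within
        # the file so later lookups can use ranged GETs.
        index = {}
        position = 0
        for line in contents.splitlines(True):  # True: keep line endings
            message = json.loads(line)
            index[message['id']] = (position, len(line))
            position += len(line)
        return index

Built once per file, the index amortizes that one full download across every later lookup against the same object.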