Tags: python, amazon-web-services, amazon-s3, aws-lambda, aws-step-functions

AWS Map State keeps overwriting the file in S3


I have a Map state in an AWS Step Functions state machine that invokes a Lambda function, which creates a CSV file and saves it to S3.

I want each iteration of the Map to append its results to main.csv, but newer iterations keep overwriting the main.csv stored in S3.

I am attaching the code:

import csv
import os
import tempfile

# some code to connect to s3

# describing the files
key = 'some_path/main.csv'
tempdir = tempfile.mkdtemp()
local_file = 'main.csv'
path = os.path.join(tempdir, local_file)

lists = []
# some processing to populate the lists

# writing the file to S3
with open(path, 'a', newline='') as output:
    writer = csv.writer(output)
    for line in lists:
        writer.writerow(line)
bucket.upload_file(path, key)

It would be really helpful if someone could suggest how, whenever I execute the step function, main.csv is created from scratch and the Map iterations append to it. I don't want to append to the main.csv created by an older execution of the state machine.


Solution

  • This line creates a brand new file in the /tmp folder of the Lambda execution environment:

    with open(path, 'a', newline='') as output:
    

    You are opening it in append mode, but the file /tmp/main.csv doesn't exist yet, so it's just going to create a new file.

    Later, when you interact with S3 via bucket.upload_file(path, key) you are uploading that new file you just created, overwriting the existing file in S3.
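    This behaviour is easy to see locally; a sketch where a fresh temp directory stands in for a new execution environment's /tmp:

    ```python
    import csv
    import os
    import tempfile

    # A fresh temp directory, like the empty /tmp of a new Lambda sandbox
    tempdir = tempfile.mkdtemp()
    path = os.path.join(tempdir, "main.csv")

    print(os.path.exists(path))  # False: nothing has been downloaded yet

    # 'a' mode appends when the file exists, but silently creates it otherwise
    with open(path, "a", newline="") as output:
        csv.writer(output).writerow(["first", "row"])

    print(os.path.exists(path))  # True: append mode created the file
    ```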

    You would need to download the file from S3 first, append to it, then upload the new version back to S3, like this:

    # get the file from s3
    bucket.download_file(key, path)
    
    # append to the file
    with open(path, 'a', newline='') as output:
        writer = csv.writer(output)
        for line in lists:
            writer.writerow(line)
    
    # write the file to S3
    bucket.upload_file(path, key)
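If the object may not exist yet (the very first iteration, or a fresh execution), the download step needs a guard. A minimal sketch, assuming `bucket` is a boto3 Bucket resource and that a missing object surfaces as a `ClientError` with code `404`/`NoSuchKey` (the function names here are illustrative, not from the question):

```python
import csv
import os
import tempfile

def append_rows(path, rows):
    """Append rows to a local CSV, creating the file if it is absent."""
    with open(path, "a", newline="") as output:
        csv.writer(output).writerows(rows)

def merge_into_s3_csv(bucket, key, rows):
    """Download key if it exists, append rows locally, upload it back.

    `bucket` is assumed to be a boto3 Bucket resource.
    """
    from botocore.exceptions import ClientError  # deferred: boto3 runtime dep

    path = os.path.join(tempfile.mkdtemp(), os.path.basename(key))
    try:
        bucket.download_file(key, path)  # start from the current S3 copy
    except ClientError as err:
        # A missing object just means this is the first append; anything
        # else (permissions, throttling) should still be raised.
        if err.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise
    append_rows(path, rows)
    bucket.upload_file(path, key)
```

To get a fresh main.csv per execution, you could delete the object in a state that runs before the Map (e.g. `bucket.Object(key).delete()`). Note also that concurrent Map iterations can still race on this read-modify-write cycle; setting the Map state's `MaxConcurrency` to 1 serialises the appends.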