I am working on processing a dataset of large videos (~100 GB) for a collaborative project. To make it easier to share data and results, I keep all the videos in an Amazon S3 bucket and process them by mounting the bucket on an EC2 instance.
One of the processing steps involves cropping the videos and rewriting them as smaller segments. I am doing this with moviepy, splitting each video with the subclip method and calling:
subclip.write_videofile("PathtoS3Bucket" + VideoName.split('.')[0] + 'part' + str(segment) + '.mp4', codec='mpeg4', bitrate='1500k', threads=2)
I found that when the videos are too large (with the parameters set as above), calls to this function sometimes generate empty files in my S3 bucket (~10% of the time). Does anyone have insight into what in moviepy/ffmpeg/S3 would lead to this?
It is recommended not to use tools such as s3fs, because they merely simulate a file system, whereas Amazon S3 is an object storage system.
It is generally better to create files locally, then copy them to S3 using standard API calls.
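A minimal sketch of that approach, assuming boto3 is available on the instance; the bucket name, input file, and segment boundaries below are hypothetical placeholders:

    import os
    import boto3
    from moviepy.editor import VideoFileClip

    # Hypothetical names for illustration; substitute your own bucket,
    # video path, and segment boundaries.
    BUCKET = "my-video-bucket"
    VideoName = "input.mp4"

    s3 = boto3.client("s3")
    clip = VideoFileClip(VideoName)

    for segment, (start, end) in enumerate([(0, 60), (60, 120)]):
        subclip = clip.subclip(start, end)

        # Write the segment to local (instance) storage first...
        local_path = "/tmp/" + VideoName.split('.')[0] + "part" + str(segment) + ".mp4"
        subclip.write_videofile(local_path, codec='mpeg4', bitrate='1500k', threads=2)

        # ...then upload the finished file to S3 with a standard API call.
        s3.upload_file(local_path, BUCKET, os.path.basename(local_path))
        os.remove(local_path)  # optional: free local disk space

Writing locally first means ffmpeg's many small writes and seeks hit the local disk, and S3 only ever sees a single upload of the completed file, which avoids the partially written or empty objects you can get through a simulated file system.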