I have a large dataframe that I am trying to break into smaller pieces and write out as CSV files to S3. For testing purposes I have the groupby size set very low, but the concept is the same. Here is the code I have:
if not submittingdata.empty:
    for i, g in submittingdata.groupby(df.index // 200):
        data = BytesIO()
        g.to_csv(data)
        s3_client.upload_fileobj(
            data,
            Bucket='some-magic-bucket',
            Key=f'file_prep_{i}.csv'
        )
The chunks are working correctly and the files are all being created as intended, but they are all empty. Not sure what I'm missing. My understanding is that g.to_csv(data) should write the CSV body into the BytesIO object, which is what I then use to write the file. Maybe I'm misunderstanding that?
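For anyone hitting the same thing: to_csv does write the CSV into the BytesIO buffer, but it leaves the stream position at the end of the buffer, and upload_fileobj starts reading from the current position, so the most likely reason for the empty files is that it uploads zero bytes. If you want to keep the original upload_fileobj approach, rewinding the buffer before the upload should be enough. A minimal sketch of that variant, reusing the same names as in the snippet above (submittingdata, df and s3_client are assumed to exist as in the question):

from io import BytesIO

if not submittingdata.empty:
    for i, g in submittingdata.groupby(df.index // 200):
        data = BytesIO()
        g.to_csv(data)   # writes the chunk's CSV into the buffer (binary handles need a recent pandas)
        data.seek(0)     # rewind so upload_fileobj reads from the start instead of the end
        s3_client.upload_fileobj(
            data,
            Bucket='some-magic-bucket',
            Key=f'file_prep_{i}.csv'
        )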
After following Patryk's suggestion above I was able to find a piece of code that works: using a boto3 Resource rather than a client, and passing the contents of the BytesIO buffer as the Body of a put, the files come out populated with values. The working code is:
if not submittingdata.empty:
    for i, g in submittingdata.groupby(df.index // 200):
        data = BytesIO()
        g.to_csv(data)
        s3_resource.Object(
            'some-magic-bucket',
            f'file_prep_{i}.csv'
        ).put(
            Body=data.getvalue()
        )
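The reason this version works without rewinding the buffer is that getvalue() returns the entire contents of the BytesIO object regardless of the current stream position. One trade-off to be aware of: put sends the whole chunk in a single request, while upload_fileobj uses boto3's managed transfer (multipart uploads for large objects), which only matters if the chunks grow much larger than a few hundred rows. If you would rather stay with the client API instead of switching to a Resource, the equivalent call should be put_object; a minimal sketch, again assuming the same names as above:

if not submittingdata.empty:
    for i, g in submittingdata.groupby(df.index // 200):
        data = BytesIO()
        g.to_csv(data)
        # getvalue() returns the full buffer contents, so no seek(0) is needed here
        s3_client.put_object(
            Bucket='some-magic-bucket',
            Key=f'file_prep_{i}.csv',
            Body=data.getvalue()
        )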