Tags: python, csv, amazon-s3, boto3, amazon-sagemaker

Reading multiple CSV files from S3 using Python with Boto3


I am able to read multiple CSV files from an S3 bucket with boto3 in Python and combine them into a single pandas DataFrame. However, some of the folders contain empty files, which results in the error "No columns to parse from file". Can I skip those empty files in the code below?

import io

import boto3
import pandas as pd

s3 = boto3.resource('s3')
bucket = s3.Bucket('testbucket')

prefix_objs = bucket.objects.filter(Prefix="extracted/abc")

prefix_df = []

for obj in prefix_objs:
    key = obj.key
    body = obj.get()['Body'].read()
    temp = pd.read_csv(io.BytesIO(body), header=None, encoding='utf8', sep=',')
    prefix_df.append(temp)

I have used this answer: https://stackoverflow.com/questions/52855221/reading-multiple-csv-files-from-s3-bucket-with-boto3


Solution

import io

import boto3
import pandas as pd

s3 = boto3.resource('s3')
bucket = s3.Bucket('testbucket')

prefix_objs = bucket.objects.filter(Prefix="extracted/abc")

prefix_df = []

for obj in prefix_objs:
    key = obj.key
    body = obj.get()['Body'].read()
    try:
        temp = pd.read_csv(io.BytesIO(body), header=None, encoding='utf8', sep=',')
    except pd.errors.EmptyDataError:
        # "No columns to parse from file": the object is empty, so skip it.
        # Catching only this exception (rather than a bare except) lets
        # genuine failures such as permission errors still surface.
        continue
    prefix_df.append(temp)
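An alternative is to skip empty objects up front by checking the `size` attribute that boto3's `ObjectSummary` exposes, avoiding the download entirely, and then combine the frames with `pd.concat` (the combining step the question mentions). A minimal sketch, factored into a helper so the parsing logic is independent of S3 — the `read_csv_objects` name is my own, and `objects` stands in for the result of `bucket.objects.filter(...)`:

```python
import io

import pandas as pd


def read_csv_objects(objects):
    """Parse each non-empty CSV object into a DataFrame and combine them.

    `objects` is any iterable whose items have a `.size` attribute and a
    `.get()` method returning {'Body': file-like}, e.g. the result of
    bucket.objects.filter(Prefix=...). Assumed helper, not part of boto3.
    """
    frames = []
    for obj in objects:
        if obj.size == 0:
            # Empty object: skip without downloading the body.
            continue
        body = obj.get()['Body'].read()
        try:
            frames.append(pd.read_csv(io.BytesIO(body), header=None,
                                      encoding='utf8', sep=','))
        except pd.errors.EmptyDataError:
            # Non-zero size but still nothing parseable (e.g. whitespace only).
            continue
    # Combine into the single DataFrame the question asks for.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Checking `obj.size` saves a round trip per empty file; the `EmptyDataError` handler remains as a backstop for files that are non-empty on disk but contain no parseable columns.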