python, boto3, amazon-sagemaker

How to save uncompressed outputs from a training job using the AWS SageMaker Python SDK?


I'm trying to upload training job artifacts to S3 in a non-compressed manner.

I am familiar with the output_path one can provide to a SageMaker Estimator: everything saved under /opt/ml/output is then uploaded, compressed, to that S3 location.

I want to be able to access a specific artifact without having to decompress the output every time. Is there a clean way to go about it? If not, is there a workaround? The artifacts I'm interested in are small metadata files (.txt or .csv), while the rest of the artifacts can be ~1 GB in my case, so downloading and decompressing everything is excessive.

Any help would be appreciated.


Solution

  • Pass disable_output_compression=True when constructing your Estimator (see the SageMaker Python SDK documentation for details). The job's outputs are then uploaded uncompressed to the S3 location set by output_path, instead of as tar.gz archives.

    Example:

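    import sagemaker
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="your-own-image-uri",  # placeholder: your training image
        # get_execution_role() resolves the role when run inside SageMaker
        role=sagemaker.get_execution_role(),
        sagemaker_session=sagemaker.Session(),
        instance_count=1,
        instance_type="ml.c4.xlarge",
        # Upload outputs to S3 as individual files instead of a tar.gz archive
        disable_output_compression=True,
    )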
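
Once the outputs are stored uncompressed, the small metadata files can be fetched on their own. The following is a minimal sketch of one way to do that with a boto3 S3 client. The helper names (find_small_artifacts, download_artifacts) and the bucket and prefix values in the usage comment are hypothetical, and the sketch makes no assumption about the exact folder layout under the job's output prefix: it simply lists everything below the prefix and filters by file extension.

```python
import os


def find_small_artifacts(s3_client, bucket, prefix, suffixes=(".txt", ".csv")):
    """Return the keys under `prefix` whose names end with one of `suffixes`.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(tuple(suffixes)):
                keys.append(obj["Key"])
    return keys


def download_artifacts(s3_client, bucket, keys, dest_dir):
    """Download each key into `dest_dir` and return the local file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    local_paths = []
    for key in keys:
        local_path = os.path.join(dest_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_path)
        local_paths.append(local_path)
    return local_paths


# Hypothetical usage (bucket and prefix are placeholders, not real values):
#   import boto3
#   s3 = boto3.client("s3")
#   keys = find_small_artifacts(s3, "my-bucket", "my-prefix/my-training-job/")
#   download_artifacts(s3, "my-bucket", keys, "./metadata")
```

Only the matching .txt and .csv objects are transferred, so the ~1 GB artifacts stay in S3. Note that files with the same base name in different folders would overwrite each other locally in this sketch.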