pandas, machine-learning, amazon-sagemaker, data-preprocessing

SageMaker Processing Job permission denied to save csv file under /opt/ml/processing/<folder>


I am working on a project involving Step Functions with SageMaker. I have an existing Step Function that I need to integrate SageMaker into, and I tried adding steps such as processing, model training, registering the model, and batch transform job requests. I also added .sync at the end of each resource so it waits for one to complete before starting the next.

However, I ran into an issue with the SageMaker processing job in the Step Function. The processing job runs but does not finish, because permission is denied when it tries to save my processed pandas DataFrame as a CSV file.

import pandas as pd

df = pd.read_csv("/opt/ml/processing/input/data/data.csv")
print(df.head())

# some processing on df

df.to_csv("/opt/ml/processing/output/result.csv", index=False)

Here is my state machine configuration for the processing request. Please leave a comment if you need to see other parts of my config.

{
  "AppSpecification": {
    "ContainerEntryPoint": [
      "python3",
      "/opt/ml/processing/input/code/processing.py"
    ]
  },
  "ProcessingInputs": [
    {
      "InputName": "Input-1",
      "S3Input": {
        "S3Uri": "s3://my-dataset/data.csv",
        "LocalPath": "/opt/ml/processing/input/data"
      }
    },
    {
      "InputName": "Input-2",
      "S3Input": {
        "S3Uri": "s3://my-dataset/processing.py",
        "LocalPath": "/opt/ml/processing/input/code"
      }
    }
  ],
  "ProcessingOutputConfig": {
    "Outputs": [
      {
        "OutputName": "Output-1",
        "S3Output": {
          "S3Uri": "s3://my-dataset/data.csv",
          "LocalPath": "/opt/ml/processing/output/",
          "S3UploadMode": "EndOfJob"
        }
      }
    ]
  }
}

The ProcessingInputs configuration works as expected: the log shows the contents of data.csv printed by df.head(). However, when the script reaches the last line of code, I get the following error: PermissionError: Permission Denied '/opt/ml/processing/output/result.csv'. I also tried saving to other folders that I saw in examples online, such as training, result, and others, but with no luck so far; I get the same permission error. I also created a Lambda function just for this, used it to make the request to the SageMaker processing job, and got exactly the same permission denied error.

I also tried saving to a completely different location outside /opt/ml/processing/, such as /result.csv. That gave me a different error, since SageMaker only allows us to save CSV files under /opt/ml/processing/.... So I am not sure what to do.

Currently I am saving the result set manually using the boto3 API and waiting for the processing job to exceed the StoppingCondition.MaxRuntimeInSeconds time I set. Eventually it stops, and I use an additional step to pick up the result. But I dislike this workaround, and I really need to find a better way to resolve this.

Can someone tell me what I am missing?


Solution

  • The specific POSIX permissions depend on the user that the container image of your processing job runs as.

    In most cases this is root, but some images run as non-privileged users. The SageMaker Distribution image, for example, also runs as a non-root user (uid: 1000, gid: 100).

    The local paths for the ProcessingOutputs are created by the platform with root permissions, and they already exist when the entrypoint of the container image is invoked. The documentation for LocalPath states:

    LocalPath: The local path of a directory where you want Amazon SageMaker to upload its contents to Amazon S3. LocalPath is an absolute path to a directory containing output files. This directory will be created by the platform and exist when your container's entrypoint is invoked.
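To confirm that this is what is happening in your job, it may help to log which user the script runs as and who owns the output directory before writing anything. The following is a minimal diagnostic sketch using only the standard library; the helper name `describe_write_access` is made up for illustration, and the output path is the one from the question.

```python
import os
import stat


def describe_write_access(path):
    """Report who owns `path`, its mode bits, and whether this process can write to it."""
    info = os.stat(path)
    return {
        "path": path,
        "process_uid": os.getuid(),   # user the script is running as
        "process_gid": os.getgid(),
        "owner_uid": info.st_uid,     # owner of the directory
        "owner_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # LocalPath taken from the question's ProcessingOutputConfig.
    output_path = "/opt/ml/processing/output/"
    if os.path.isdir(output_path):
        print(describe_write_access(output_path))
```

If `writable` is False and `owner_uid` is 0 while `process_uid` is not, the directory is owned by root and the script runs as a non-root user, which matches the explanation above.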

    You can add the following to your processing code to update the permissions of the created output folder:

    import subprocess
    
    output_path = "/opt/ml/processing/output/"
    
    # Use only one of the two options below, depending on your container image.
    
    # Option A: if you're using the SageMaker Distribution
    subprocess.check_call(["sudo", "chown", "-R", "sagemaker-user", output_path])
    
    # Option B: for other container images (use instead of Option A)
    # subprocess.check_call(["sudo", "chmod", "-R", "777", output_path])
    

    Alternatively, you could also use the boto3 library within your processing job and write to S3 directly.
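
As a rough sketch of the direct-to-S3 alternative: the function below uploads CSV text with the boto3 S3 client's `put_object` call. The function name `upload_csv_text` and the object key `output/result.csv` are assumptions for illustration; the bucket name is the one from the question. It also assumes the job's execution role is allowed to write to that bucket.

```python
def upload_csv_text(csv_text, bucket, key, s3_client=None):
    """Upload CSV text to s3://<bucket>/<key> and return the resulting URI."""
    if s3_client is None:
        # Imported lazily so the function can also be called with a stub client.
        import boto3

        s3_client = boto3.client("s3")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return f"s3://{bucket}/{key}"


# Usage inside the processing script (hypothetical key):
#   upload_csv_text(df.to_csv(index=False), "my-dataset", "output/result.csv")
```

Calling `df.to_csv(index=False)` without a path returns the CSV as a string, so nothing has to be written under /opt/ml/processing/output/ at all.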