Unable to Load Data in Amazon Neptune from S3 Bucket


I am using Amazon Neptune as a graph database.

While trying to load data from an S3 bucket, I get the following exception: {"code":"InvalidParameterException","detailedMessage":"The source s3-URL does not exist/not reachable"}.

I have checked the S3 resource URL, and it is publicly accessible. I don't understand why this error occurs.

I followed the AWS documentation and used the POST call below against the Neptune DB instance to load the data.

curl -X POST -H 'Content-Type: application/json' neptune-endpoint:8182/loader -d '
{
      "source" : "s3-URL",
      "format" : "csv",
      "iamRoleArn" : "arn:aws:iam::",
      "region" : "us-east-2",
      "failOnError" : "FALSE",
      "parallelism" : "MEDIUM",
      "updateSingleCardinalityProperties" : "FALSE"
}'

Solution

  • Judging by the error message, it looks like you forgot to replace the placeholder s3-URL. You need to put your data in S3 and pass an S3 URL pointing to the folder as the source of the bulk load request.

    Your snippet also lacks a valid value for iamRoleArn. Please go through the docs in detail, as they explain several steps that must be completed for a successful S3 load. In short:

    1. Upload your CSV or RDF data to S3 and note the S3 URL.
    2. Create an IAM role that has access to the S3 data.
    3. Use the addRoleToDbCluster API (or the console) to add this role to the cluster. This lets the cluster assume that role when it needs to fetch data.
    4. Attach an S3 VPC endpoint to your VPC so that the cluster can talk to S3.
    5. Send the /loader request and track the status of your load.

    Docs: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
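
    As a rough illustration of step 5, here is a minimal Python sketch that builds the same request body as the curl call in the question and refuses to send it while source or iamRoleArn still look like unfilled placeholders. The function names, the bucket name, the account ID, and the role name are all hypothetical, and the ARN pattern is a loose sanity check rather than an official validation rule. It assumes the source is given as an s3:// or https:// S3 URL.

```python
import json
import re
import urllib.request

# Loose sanity check for a role ARN: arn:aws:iam::<12-digit account>:role/<name>.
# This pattern is an assumption for illustration, not an AWS-defined rule.
ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def build_loader_payload(source, iam_role_arn, region, fmt="csv"):
    """Return the /loader request body, rejecting unfilled placeholders."""
    if not source.startswith(("s3://", "https://")):
        raise ValueError(f"source does not look like an S3 URL: {source!r}")
    if not ROLE_ARN_RE.match(iam_role_arn):
        raise ValueError(f"iamRoleArn looks incomplete: {iam_role_arn!r}")
    # The remaining fields mirror the values used in the question.
    return {
        "source": source,
        "format": fmt,
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
        "updateSingleCardinalityProperties": "FALSE",
    }


def submit_load(endpoint, payload):
    """POST the payload to <endpoint>/loader and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{endpoint}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical values; replace them with your own bucket, account and role.
    payload = build_loader_payload(
        source="s3://my-bucket/neptune-data/",
        iam_role_arn="arn:aws:iam::123456789012:role/MyNeptuneLoadRole",
        region="us-east-2",
    )
    print(json.dumps(payload, indent=2))
```

    Calling build_loader_payload("s3-URL", "arn:aws:iam::", "us-east-2"), which corresponds to the values in the question, raises ValueError before any request is sent. submit_load has to run from a machine inside the same VPC as the cluster, with the endpoint passed in the form https://neptune-endpoint:8182 (or http://, depending on how the cluster is configured).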