Tags: amazon-web-services, amazon-s3, amazon-ec2, amazon-neptune

Neptune loader FROM_OR_TO_VERTEX_ARE_MISSING


I tried to follow this example, https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html, to load data into Neptune:

curl -X POST -H 'Content-Type: application/json' https://endpoint:port/loader -d '
{
  "source" : "s3://source.csv",
  "format" : "csv",
  "iamRoleArn" : "role",
  "region" : "region",
  "failOnError" : "FALSE",
  "parallelism" : "MEDIUM",
  "updateSingleCardinalityProperties" : "FALSE",
  "queueRequest" : "TRUE"
}'
{
"status" : "200 OK",
"payload" : {
    "loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb"
}
}

I get status 200, but when I check whether the data was loaded, I get this:

curl -G 'https://endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb'
{
"status" : "200 OK",
"payload" : {
    "feedCount" : [
        {
            "LOAD_FAILED" : 1
        }
    ],
    "overallStatus" : {
        "fullUri" : "s3://source.csv",
        "runNumber" : 1,
        "retryNumber" : 1,
        "status" : "LOAD_FAILED",
        "totalTimeSpent" : 4,
        "startTime" : 1617653964,
        "totalRecords" : 10500,
        "totalDuplicates" : 0,
        "parsingErrors" : 0,
        "datatypeMismatchErrors" : 0,
        "insertErrors" : 10500
    }
}
}

I had no idea why I got LOAD_FAILED, so I used the Get-Status API to see which errors caused the load failure, and got this:

curl -X GET 'https://endpoint:port/loader/411ee078-3c44-4620-85ac-e22ef5466bbb?details=true&errors=true'
{
"status" : "200 OK",
"payload" : {
    "feedCount" : [
        {
            "LOAD_FAILED" : 1
        }
    ],
    "overallStatus" : {
        "fullUri" : "s3://source.csv",
        "runNumber" : 1,
        "retryNumber" : 1,
        "status" : "LOAD_FAILED",
        "totalTimeSpent" : 4,
        "startTime" : 1617653964,
        "totalRecords" : 10500,
        "totalDuplicates" : 0,
        "parsingErrors" : 0,
        "datatypeMismatchErrors" : 0,
        "insertErrors" : 10500
    },
    "failedFeeds" : [
        {
            "fullUri" : "s3://source.csv",
            "runNumber" : 1,
            "retryNumber" : 1,
            "status" : "LOAD_FAILED",
            "totalTimeSpent" : 1,
            "startTime" : 1617653967,
            "totalRecords" : 10500,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 10500
        }
    ],
    "errors" : {
        "startIndex" : 1,
        "endIndex" : 10,
        "loadId" : "411ee078-3c44-4620-85ac-e22ef5466bbb",
        "errorLogs" : [
            {
                "errorCode" : "FROM_OR_TO_VERTEX_ARE_MISSING",
                "errorMessage" : "Either from vertex, '1414', or to vertex, '70', is not present.",
                "fileName" : "s3://source.csv",
                "recordNum" : 0
            },

What does this error even mean and what is the possible fix?


Solution

  • It looks like you were trying to load edges. When an edge is loaded, the two vertices it connects must already have been loaded or created. The message:

    "errorMessage" : "Either from vertex, '1414', or to vertex, '70', is not present.",

    is telling you that one or both of the vertices with IDs '1414' and '70' are missing. Every vertex referenced by a CSV file of edges must already exist (created or loaded) before the edges that reference it are loaded. If the vertex and edge CSV files are in the same S3 location, the bulk loader can work out the order in which to load them. If you ask the loader to load only a file of edges while the vertices are not yet loaded, you get an error like the one you shared.
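
One way to catch this before starting a load is to compare the edge file against the vertex file locally. The sketch below is only an illustration, not part of the loader: it assumes both files use the Gremlin CSV load format, with a `~id` column in the vertex file and `~from` / `~to` columns in the edge file. The function name `missing_vertex_ids` and the sample rows are made up for the example.

```python
import csv
import io


def missing_vertex_ids(vertex_csv, edge_csv):
    """Return the sorted vertex IDs that edges reference but the vertex file lacks."""
    # Collect every vertex ID from the ~id column of the vertex file.
    known = {row["~id"] for row in csv.DictReader(vertex_csv)}

    # Gather each ~from / ~to value that has no matching vertex.
    missing = set()
    for row in csv.DictReader(edge_csv):
        for column in ("~from", "~to"):
            if row[column] not in known:
                missing.add(row[column])
    return sorted(missing)


# Tiny in-memory stand-ins for the real files (hypothetical sample data).
vertices = io.StringIO("~id,~label\n70,person\n")
edges = io.StringIO("~id,~from,~to,~label\ne1,1414,70,knows\n")

print(missing_vertex_ids(vertices, edges))  # ['1414']
```

With real data you would pass open file handles instead of the `io.StringIO` objects. An empty result suggests every edge endpoint has a vertex in that file, although vertices already present in the graph would not be counted by this check.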