amazon-web-services elasticsearch aws-dms aws-documentdb

AWS DMS - DocumentDB > ElasticSearch - Error Getting Primary Key String

I am using DMS to periodically migrate all data from DocumentDB into an Elasticsearch cluster.

When I run the task, some tables migrate into Elasticsearch normally, whereas others fail with errors.

Every failing table produces the same two errors:

[TARGET_LOAD]E: Error Getting Primary Key String for hash key for table 'XXX' https://forums.aws.amazon.com/ (field_mapping_utils.c:644)
[TARGET_LOAD]E: Error extracting Field-type-info from data_record https://forums.aws.amazon.com/ (elasticsearch_utils.c:1088)

I am not sure what to do here. Here is a sample document from the "XXX" table above:

{"_id":"5fe07b894ae10f100cb3d623","orgId":"XXX","firstName":"XXX","middleName":"","lastName":"XXX","email":"XXX","previousUsedEmails":[{email: "XXX", timeOfCreation: 1234}],"phone":null,"passHash":"1234","oldPassHashes": ["1234"],"isEmailVerified":true,"isSuspended":false,"role": ["abc"],"isArchived":false,"lastSignedIn":0,"isNew":false,"recruiterAccount":null,"failedLoginAttempts":0,"lockedUntilTimestamp":null,"owner":"5fe0722ec2238046e8ade172","addedTimestamp":1608547190,"updatedTimestamp":1608547190}

_id is of type ObjectId in DocumentDB.

Any help would be appreciated.

EDIT:

I ticked the "_id as a separate column" checkbox (Extract document "_id" as a separate column) and added the following transformation rule:

{
  "rule-type": "transformation",
  "rule-id": "1",
  "rule-name": "1",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "_id"
  },
  "rule-action": "add-prefix",
  "value": "old",
  "old-value": null
}

This made the error go away, but the Elasticsearch document contained only the _id (renamed to old__id) from the DocumentDB database and no other fields.

I then changed the "metadata mode" on the DocumentDB endpoint from "document" to "table" and set the number of documents to scan for field names to 1000 (more than enough). After that, only the boolean and integer fields carried over to Elasticsearch; string, array, and other fields did not.

I am not sure what testing AWS is doing these days; no documentation seems to shed light on this, nor does any error message.

EDIT #2:

From this documentation, I see that strings and arrays in DocumentDB are treated as "CLOB" in AWS DMS: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.DocumentDB.html#CHAP_Source.DocumentDB.DataTypes

And from this documentation: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Elasticsearch.html I see that "LOB" data types are not supported by the Elasticsearch target. This seems to be the crux of the problem I am facing. This is stupid: in the end I can't seem to migrate fields like "lastName" and "firstName", which really aren't large objects at all.
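
To see which fields of a document are affected, here is a minimal Python sketch that sorts values into the buckets described above. It encodes only what was observed in this post (strings and arrays arrive as CLOB, booleans and integers pass through); the function name `dms_type_guess` and the trimmed sample document are hypothetical, and the real DMS type mapping has more cases than this.

```python
def dms_type_guess(value):
    """Rough guess at how DMS table mode types a DocumentDB value (assumption)."""
    if isinstance(value, bool):   # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, (str, list)):
        return "clob"             # strings and arrays, per the data-type table linked above
    return "other"                # nulls, nested documents, etc. are not modelled here


# Trimmed, hypothetical version of the sample document from the question.
sample = {
    "firstName": "XXX",
    "role": ["abc"],
    "isSuspended": False,
    "failedLoginAttempts": 0,
    "phone": None,
}

# Fields that would be typed as CLOB and therefore dropped by the Elasticsearch target.
clob_fields = sorted(k for k, v in sample.items() if dms_type_guess(v) == "clob")
print(clob_fields)
```

Running it against a real document gives a quick list of the fields that need a data-type transformation before they can reach Elasticsearch.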

Still digging, any help is appreciated.

Solution

Figured it out, however disappointing the answer may be.

You can migrate the fields by putting this transform in the task:

{
  "rule-type": "transformation",
  "rule-id": "3",
  "rule-name": "3",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "clob"
  },
  "rule-action": "change-data-type",
  "data-type": {
    "type": "string",
    "length": 50
  }
}

This works well for the string fields. However, since the DocumentDB reader also treats arrays as the "clob" data type, array contents are stringified too. While this may work for some use cases, I have abandoned DMS because of this issue and just wrote my own migrator (which took less time, but I had hoped for the auto-managed niceness of DMS, oh well).
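
For anyone going the custom-migrator route, the core step can be sketched roughly as below. This is only an illustration under stated assumptions, not the migrator mentioned above (which is not shown): the function name `to_bulk_lines` and the index name `users` are hypothetical, and the output follows the newline-delimited action/source format of the Elasticsearch `_bulk` API. The `_id` is moved out of the document body into the action metadata, since Elasticsearch treats `_id` as a metadata field, and arrays stay real arrays.

```python
import json


def to_bulk_lines(doc, index_name):
    """Turn one source document into the two NDJSON lines of a _bulk "index" action."""
    body = dict(doc)                    # shallow copy so the caller's dict is untouched
    doc_id = str(body.pop("_id"))       # str() also covers a driver-level ObjectId
    action = {"index": {"_index": index_name, "_id": doc_id}}
    # One action line, one source line, each terminated by a newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


# Trimmed, hypothetical version of the sample document from the question.
doc = {
    "_id": "5fe07b894ae10f100cb3d623",
    "firstName": "XXX",
    "role": ["abc"],
    "isArchived": False,
}
payload = to_bulk_lines(doc, "users")
print(payload, end="")
```

The resulting string could be sent as the body of a `_bulk` request. Values that are not JSON-serializable elsewhere in the document (for example ObjectId references or dates) would need their own conversion, which this sketch does not handle.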

To recap, you have to use the table metadata mode in the DocumentDB endpoint config and also add a transformation rule for the _id field as follows:

{
  "rule-type": "transformation",
  "rule-id": "1",
  "rule-name": "1",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "_id"
  },
  "rule-action": "add-prefix",
  "value": "old",
  "old-value": null
}
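
Putting the pieces together, the task's table mappings would look something like the output of the Python sketch below. The two transformation rules are the ones from this post; the selection rule (rule-id "2", include everything) is an assumption added here because DMS table mappings need at least one selection rule, and the function name `build_table_mappings` is hypothetical.

```python
import json


def build_table_mappings():
    """Assemble the two transformation rules plus an assumed catch-all selection rule."""
    prefix_id = {
        "rule-type": "transformation",
        "rule-id": "1",
        "rule-name": "1",
        "rule-target": "column",
        "object-locator": {"schema-name": "%", "table-name": "%", "column-name": "_id"},
        "rule-action": "add-prefix",
        "value": "old",
        "old-value": None,
    }
    # Assumption: include every schema and table; narrow this down as needed.
    select_all = {
        "rule-type": "selection",
        "rule-id": "2",
        "rule-name": "2",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }
    clob_to_string = {
        "rule-type": "transformation",
        "rule-id": "3",
        "rule-name": "3",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "%",
            "table-name": "%",
            "column-name": "%",
            "data-type": "clob",
        },
        "rule-action": "change-data-type",
        "data-type": {"type": "string", "length": 50},
    }
    return {"rules": [prefix_id, select_all, clob_to_string]}


mappings = build_table_mappings()
print(json.dumps(mappings, indent=2))
```

The printed JSON can be pasted into the task's table mappings editor. The string length of 50 is taken as-is from the rule above and may need raising for longer values.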