
Data type for a list of URLs in AWS DynamoDB and Elasticsearch


I have enabled AWS DynamoDB Streams and created a Lambda function to index the data into Elasticsearch.

My DynamoDB table has a column named URL, in which I am going to store a list of URLs for a single row.

The URLs will most likely be object URLs of AWS S3 objects.

After streaming, I index the data into Elasticsearch. My question is: which data type should I use to store multiple URLs in both DynamoDB (a single row) and Elasticsearch (a single document)?

Could someone help me achieve this in the most efficient way? Thanks in advance.

JSON structure

 {
      "id":"234561",
      "policyholdername":"xxxxxx",
      "age":"24",
      "claimnumber":"234561",
      "policynumber":"456784",
      "url":"https://dgs-dms.s3.amazonaws.com/G-3114_Textract.pdf",
      "claimtype":"Accident",
      "modified_date":"2020-02-05T17:36:49.053Z",
      "dob":"2020-02-05T17:36:49.053Z",
      "client_address":"no,7 royal avenue thirumullaivoyal chennai"
    }

In the future, a single claim number will have multiple URLs. How should I handle this?


Solution

  • Not sure about DynamoDB types. But in Elasticsearch there is no dedicated type for lists: any field can hold zero or more values of the same type. To store a list of strings (URLs in your case), you can use the keyword field type.

    For example, your data could look like this:

     {
          "id":"234561",
          "policyholdername":"xxxxxx",
          "age":"24",
          "claimnumber":"234561",
          "policynumber":"456784",
          "url":["https://dgs-dms.s3.amazonaws.com/G-3114_Textract.pdf","https://foo/bar/foo.pdf"],
          "claimtype":"Accident",
          "modified_date":"2020-02-05T17:36:49.053Z",
          "dob":"2020-02-05T17:36:49.053Z",
          "client_address":"no,7 royal avenue thirumullaivoyal chennai"
        }
    

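    and the equivalent Elasticsearch mapping could be as follows (the `_doc` type level applies to Elasticsearch 6.x; from 7.0 onward mapping types are deprecated, so `properties` goes directly under `mappings`)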

    {
      "mappings": {
        "_doc": {
          "properties": {
            "url": {
              "type": "keyword"
            }
          }
        }
      }
    }
    

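    and the search query can be the following; a `term` query matches the document if any one of the URLs in the array equals the given value exactly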

    POST index/_search
    {
        "query": {
            "term": {
                "url": "https://foo/bar/foo.pdf"
            }
        }
    }
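
On the DynamoDB side, which the answer above leaves open, the attribute could be stored as a List (`L`) of strings or as a String Set (`SS`); in the stream record both arrive in DynamoDB's typed JSON form. The following is a minimal sketch, not a definitive implementation, of how the Lambda might flatten either shape (or the existing single string) into the plain array shown in the example document. The function names `extract_urls` and `to_es_document` are hypothetical, and the sketch assumes every attribute other than `url` is a plain string, as in the question's sample.

```python
from typing import Any, Dict, List


def extract_urls(attribute: Dict[str, Any]) -> List[str]:
    """Normalize a DynamoDB-typed attribute into a plain list of URL strings.

    Handles three shapes the `url` attribute might take:
      {"S": "..."}                a single string (the current data)
      {"SS": ["...", "..."]}      a string set
      {"L": [{"S": "..."}, ...]}  a list of strings
    """
    if "S" in attribute:
        return [attribute["S"]]
    if "SS" in attribute:
        # Sets carry no order; sort so the indexed document is stable.
        return sorted(attribute["SS"])
    if "L" in attribute:
        # Keep only string elements; other element types are skipped.
        return [item["S"] for item in attribute["L"] if "S" in item]
    return []


def to_es_document(new_image: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build an Elasticsearch document body from a stream record's NewImage.

    Assumption: all attributes except `url` are plain strings ("S").
    """
    doc: Dict[str, Any] = {}
    for name, typed_value in new_image.items():
        if name == "url":
            doc[name] = extract_urls(typed_value)
        elif "S" in typed_value:
            doc[name] = typed_value["S"]
    return doc


if __name__ == "__main__":
    # Shape of record["dynamodb"]["NewImage"] for a row with two URLs.
    new_image = {
        "id": {"S": "234561"},
        "claimnumber": {"S": "234561"},
        "url": {
            "L": [
                {"S": "https://dgs-dms.s3.amazonaws.com/G-3114_Textract.pdf"},
                {"S": "https://foo/bar/foo.pdf"},
            ]
        },
    }
    print(to_es_document(new_image))
```

Whether to choose a List or a String Set depends on the data: a String Set rejects duplicates and cannot be empty, while a List keeps insertion order and allows repeats. Either way, the document sent to Elasticsearch carries `url` as a JSON array, which the `keyword` mapping above accepts unchanged.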