Tags: database, amazon-dynamodb, leveldb, rocksdb, tikv

Key-Value Database Modeling for Searchability


Let's say, for example, that I am building a marketplace like eBay, with data that looks like this (pseudo-code):

import java.util.Set;

public class Item {
    Double price;
    String geoHash;
    Long startAvailability; // timestamp (epoch milliseconds)
    Long endAvailability;   // timestamp (epoch milliseconds)
    Set<String> keywords;   // preprocessed before storing
    String category;
    String dateCreated;     // ISO-8601 date
    String dateUpdated;     // ISO-8601 date
    Integer likes;
    Boolean isActive;
}

Suppose I want to build a "query" that filters items. Items are stored with field data (title, price, timestamp range) and some text (description), and I need to filter on the following:

  • Price range (e.g. 100-200)
  • Location (e.g. starts with a given GeoHash prefix)
  • Availability between given millisecond timestamps (each record has a start and an end timestamp, e.g. the period during which the item is valid)
  • Has a given keyword (each record has an array of keywords, preprocessed before storing)
  • Has a given category
  • Date created and date updated (common to every record)
  • Contains a given keyword in its text (I think this one is not really possible, as it requires full-text search)
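To make the filter semantics concrete, here is a minimal in-memory sketch of the predicate an item has to satisfy. It assumes a trimmed-down `Item` record (the `ItemFilter`, `Item` and `Query` names are hypothetical), that a `null` query field means "don't filter on this", and that "between given timestamps" means the item's availability window overlaps the requested window. The full-text criterion is left out.

```java
import java.util.Set;

final class ItemFilter {
    // Hypothetical, trimmed-down version of the Item class above.
    record Item(String id, double price, String geoHash, long startAvailability,
                long endAvailability, Set<String> keywords, String category) {}

    // Requested filter values; a null field means "don't filter on this".
    record Query(Double minPrice, Double maxPrice, String geoHashPrefix,
                 Long windowStart, Long windowEnd, String keyword, String category) {}

    static boolean matches(Item item, Query q) {
        if (q.minPrice() != null && item.price() < q.minPrice()) return false;
        if (q.maxPrice() != null && item.price() > q.maxPrice()) return false;
        // Every point inside a geohash cell has that cell's hash as a prefix,
        // so a location filter becomes a string prefix test.
        if (q.geoHashPrefix() != null && !item.geoHash().startsWith(q.geoHashPrefix())) {
            return false;
        }
        // Assumption: the availability window must overlap [windowStart, windowEnd].
        if (q.windowStart() != null && item.endAvailability() < q.windowStart()) return false;
        if (q.windowEnd() != null && item.startAvailability() > q.windowEnd()) return false;
        if (q.keyword() != null && !item.keywords().contains(q.keyword())) return false;
        if (q.category() != null && !q.category().equals(item.category())) return false;
        return true;
    }
}
```

Evaluating this predicate against every stored item is a full scan, which is exactly what the key-value modeling question is trying to avoid.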

And I want to order the result based on the following:

  • Number of likes first (each record is stored with a number of likes)
  • Most recently created first
  • Still-active items first (each record has a boolean flag indicating whether it is active)
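The ordering above maps onto a chained `Comparator`. This sketch (Java 16+, hypothetical `ItemOrdering` names) assumes the criteria apply in the listed order and that `dateCreated` has already been parsed into an `Instant`.

```java
import java.time.Instant;
import java.util.Comparator;

final class ItemOrdering {
    // Hypothetical record holding only the fields used for sorting.
    record Item(String id, int likes, Instant dateCreated, boolean isActive) {}

    // Most likes first, then newest first, then active items before inactive ones.
    static final Comparator<Item> RANKING =
            Comparator.comparingInt(Item::likes).reversed()
                    .thenComparing(Item::dateCreated, Comparator.reverseOrder())
                    // Boolean orders false < true, so reversing puts active items first.
                    .thenComparing(Item::isActive, Comparator.reverseOrder());
}
```

Because likes are compared first, the "active" criterion only breaks ties between items with equal likes and creation time; if inactive items should never appear, treating `isActive` as a filter is likely the better fit.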

How should this be modeled and stored in a key-value database so that it can be retrieved with the query above, without using any schema (i.e. schemaless)?
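One common layout for this in an ordered key-value store (LevelDB, RocksDB and TiKV all keep keys sorted) is to write one extra key per access pattern, encoded so that lexicographic key order matches the order you want to scan in. The sketch below uses a `TreeMap` as a stand-in for the store; the `price#`/`geo#` key prefixes, the zero-padded cents encoding and the `SortedKeyIndex` name are illustrative assumptions, not a prescribed schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SortedKeyIndex {
    // Stand-in for a sorted KV store; for ASCII keys, String order matches bytewise order.
    private final NavigableMap<String, String> store = new TreeMap<>();

    // Zero-pad cents so lexicographic order equals numeric order
    // (assumes non-negative prices below 10^12 cents).
    static String priceKey(long cents) {
        return String.format(Locale.ROOT, "%012d", cents);
    }

    void put(String itemId, long priceCents, String geoHash) {
        // Primary record, serialized as a plain string (schemaless).
        store.put("item#" + itemId, "price=" + priceCents + ";geo=" + geoHash);
        // One index key per access pattern; the item id suffix keeps keys unique.
        store.put("price#" + priceKey(priceCents) + "#" + itemId, itemId);
        store.put("geo#" + geoHash + "#" + itemId, itemId);
    }

    // Range scan: ids of items priced in [minCents, maxCents], cheapest first.
    List<String> byPrice(long minCents, long maxCents) {
        String from = "price#" + priceKey(minCents) + "#";
        String to = "price#" + priceKey(maxCents) + "#\uffff";
        return new ArrayList<>(store.subMap(from, true, to, true).values());
    }

    // Prefix scan: ids of items whose geohash starts with the given prefix.
    List<String> byGeoPrefix(String prefix) {
        String from = "geo#" + prefix;
        return new ArrayList<>(store.subMap(from, true, from + "\uffff", false).values());
    }
}
```

Each index answers one filter with a single range or prefix scan. Combining filters (say, a price range within a geohash prefix) means either intersecting the id lists or scanning one index and checking the rest, and every new filter or sort order needs another set of keys kept in sync on every write.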


Solution

  • When you have access patterns that require fetching the same information in many different ways (e.g. fetch by tag, fetch by date, fetch by category), you will be fighting an uphill battle with DynamoDB. With clever data modeling, you can support a surprising number of access patterns. However, as a key/value store, search simply isn't the sweet spot for DynamoDB.

    A common approach to your problem is to use a tool that specializes in search, like Elasticsearch. You can still store your data in DynamoDB, but use Elasticsearch to support your search needs. AWS even has an article on this topic, which describes how you can use DynamoDB Streams to keep an Elasticsearch index up to date.

    While it may be possible to support this list of access patterns in DynamoDB alone, it's going to be painful (and likely expensive). I'd suggest finding a solution that is purpose-built for search.
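The DynamoDB-plus-search-engine setup boils down to replaying every table change onto the search index. This sketch deliberately avoids the real AWS and Elasticsearch client libraries: `SearchIndex`, `ChangeRecord` and `StreamToSearchSync` are hypothetical stand-ins. The only detail taken from DynamoDB Streams is that each record's event name is `INSERT`, `MODIFY` or `REMOVE`; it also assumes the stream includes the item's new image (the `NEW_IMAGE` or `NEW_AND_OLD_IMAGES` view type).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StreamToSearchSync {
    // Hypothetical abstraction over a search engine client (e.g. an Elasticsearch index).
    interface SearchIndex {
        void upsert(String id, Map<String, Object> document);
        void delete(String id);
    }

    // Simplified change record; newImage is the item after the change (null on REMOVE).
    record ChangeRecord(String eventName, String itemId, Map<String, Object> newImage) {}

    // Replays table changes, in order, so the search index mirrors the table.
    static void apply(List<ChangeRecord> records, SearchIndex index) {
        for (ChangeRecord r : records) {
            switch (r.eventName()) {
                case "INSERT", "MODIFY" -> index.upsert(r.itemId(), r.newImage());
                case "REMOVE" -> index.delete(r.itemId());
                default -> throw new IllegalArgumentException("Unknown event: " + r.eventName());
            }
        }
    }

    // In-memory index for exercising the sync logic without a real cluster.
    static final class InMemoryIndex implements SearchIndex {
        final Map<String, Map<String, Object>> docs = new HashMap<>();

        public void upsert(String id, Map<String, Object> document) { docs.put(id, document); }

        public void delete(String id) { docs.remove(id); }
    }
}
```

In a real deployment this logic would typically run in a stream consumer such as a Lambda function, and the search queries (price ranges, geohash prefixes, keywords, full text, sorting by likes) would then go to the search engine instead of the table.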