node.js, sorting, amazon-dynamodb, inverted-index

DynamoDB with sort?


I'm very new to DynamoDB, so forgive me if my question is a bit stupid. I have a file that looks like this:

Appel,www.appel.com,www.cnn.com,www.bla.com....
Blabla,www.test.com,www.fox.com,www.bla.com.....
test,www.test.com,www.fox.com,www.bla.com...
www.appel.com,300
www.cnn.com,400

and so on. In short, each line is either (1) a word followed by all the URLs it appears in, or (2) a URL followed by its number of appearances.

What I need to do is query DynamoDB for a given word; the output needs to be the list of URLs sorted by their number of appearances.

For example, for this file and the word Appel, the output is:

www.cnn.com,www.appel.com,www.bla.com....
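To make the expected output concrete, here is a minimal client-side sketch of that lookup, which is essentially the "sort it myself" approach the question wants to avoid. It assumes a line is a `URL,count` line when its second field is an integer and that every other line is `word,url,url,...`; the helper names are hypothetical, and the count of 100 for www.bla.com is made up because the sample file is truncated.

```javascript
// Hypothetical helper: parse the file into word -> [url] and url -> count.
// Assumption: a line whose second field is an integer is a "URL,count" line;
// any other non-blank line is "word,url,url,...".
function parseIndexFile(text) {
  const urlsByWord = new Map();
  const countByUrl = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    const fields = line.split(",").map((f) => f.trim());
    if (fields.length === 2 && /^\d+$/.test(fields[1])) {
      countByUrl.set(fields[0], Number(fields[1]));
    } else {
      urlsByWord.set(fields[0], fields.slice(1));
    }
  }
  return { urlsByWord, countByUrl };
}

// Client-side sort: the URLs for `word`, most appearances first.
// URLs without a known count are treated as 0 and end up last.
function sortedUrlsFor(word, { urlsByWord, countByUrl }) {
  const urls = urlsByWord.get(word) ?? [];
  return [...urls].sort(
    (a, b) => (countByUrl.get(b) ?? 0) - (countByUrl.get(a) ?? 0)
  );
}

// Sample data; the www.bla.com count is made up because the file is truncated.
const sample = [
  "Appel,www.appel.com,www.cnn.com,www.bla.com",
  "www.appel.com,300",
  "www.cnn.com,400",
  "www.bla.com,100",
].join("\n");

console.log(sortedUrlsFor("Appel", parseIndexFile(sample)).join(","));
// prints www.cnn.com,www.appel.com,www.bla.com
```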

I have tried creating two tables, 'Invert-index' and 'rank': the first maps each word to its list of URLs, and the second maps each URL to its rank. But I can't find a way to make the query without sorting the results myself.

So, first: is this DynamoDB structure (the two tables) correct? And is there a way to query the database and have it sort the results?


Solution

  • To rely on DynamoDB to sort your data, you have to use a Range Key (called the sort key in newer documentation). That means that, to meet your requirements, the number of appearances has to be part of the Range Key.

    The Hash Key could then be the word (e.g. Appel or Blabla), and you can store the URLs as a string array in each record.

    From the documentation:

    Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order use the ScanIndexForward parameter set to false. Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
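As a minimal sketch of what the quoted documentation means in practice, the function below only builds the parameter object for a Query on one hash key with `ScanIndexForward` set to `false`; it does not call AWS. The table name `InvertedIndex`, the attribute name `word`, and the `buildQueryParams` helper are hypothetical.

```javascript
// Hypothetical helper: parameters for a Query that returns all items for one
// word, ordered by range key. "InvertedIndex" and "word" are assumed names.
function buildQueryParams(word, { descending = true } = {}) {
  return {
    TableName: "InvertedIndex",
    KeyConditionExpression: "#w = :word",
    ExpressionAttributeNames: { "#w": "word" }, // placeholder for the hash key name
    ExpressionAttributeValues: { ":word": word }, // native JS value (document client style)
    // false = descending range-key order (highest count first);
    // DynamoDB's default is true (ascending).
    ScanIndexForward: !descending,
  };
}

console.log(JSON.stringify(buildQueryParams("Appel"), null, 2));
```

The resulting object has the shape the document client's query operation expects (for example `QueryCommand` from `@aws-sdk/lib-dynamodb` in the v3 SDK). A single Query response is capped at 1 MB, so a word with many URLs may require following `LastEvaluatedKey` to read further pages.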

    You can find more information about the key types available in DynamoDB at the links below:

    When to use what primary key type

    What is the use of a hash range in a dynamodb table

    Q: If I use the number of appearances as the range key, how can I store the string array? Each URL in it has a different number, so if each record has a hash key (the word), a range key (a number) and a value (the string array), what is the number in this case?

    In that case I would recommend composing the Range Key from two fields (number and URL) joined by a separator character (e.g. '#'), so that each word/URL pair becomes its own item instead of an element of an array. Your final table structure would be:

    Hash Key : <Word>
    Range Key : <AppearanceNumber>#<Url>
    

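    Your Range Key would be of the String type, which still sorts your data by appearance count because <AppearanceNumber> is the prefix, provided every number has the same number of digits. Strings are compared character by character, so 1000#<Url> would sort before 900#<Url>; left-pad the count with zeros to a fixed width (e.g. 0000000900) to avoid that.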
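Because the range key is compared as a string, the numeric prefix only sorts numerically when every count has the same number of digits. The sketch below checks this locally, assuming a fixed width of 10 digits and hypothetical `makeRangeKey`/`parseRangeKey` helpers. JavaScript's default sort compares UTF-16 code units, which gives the same order as DynamoDB's byte-wise comparison for ASCII keys like these.

```javascript
// Assumed fixed width for the count prefix (covers counts up to 9,999,999,999).
const COUNT_WIDTH = 10;

// Hypothetical helper: build a String range key such as "0000000900#www.appel.com".
function makeRangeKey(count, url) {
  if (!Number.isSafeInteger(count) || count < 0) {
    throw new RangeError(`count must be a non-negative integer, got ${count}`);
  }
  const padded = String(count).padStart(COUNT_WIDTH, "0");
  if (padded.length > COUNT_WIDTH) {
    throw new RangeError(`count ${count} does not fit in ${COUNT_WIDTH} digits`);
  }
  return `${padded}#${url}`;
}

// Hypothetical helper: split a key back into count and URL. Only the first '#'
// is treated as the separator, so URLs containing '#' survive the round trip.
function parseRangeKey(key) {
  const i = key.indexOf("#");
  return { count: Number(key.slice(0, i)), url: key.slice(i + 1) };
}

const unpadded = ["900#www.a.com", "1000#www.b.com"].sort();
const padded = [makeRangeKey(900, "www.a.com"), makeRangeKey(1000, "www.b.com")].sort();
console.log(unpadded); // "1000#..." comes first: string order, not numeric order
console.log(padded); // "0000000900#..." comes first: matches numeric order
```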

    For example, querying for the <Word> 'Appel' with ScanIndexForward set to false (so the highest count comes first) would return the following results:

    Appel,900#www.appel.com
    Appel,800#www.cnn.com
    Appel,700#www.bla.com
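To illustrate that example without an AWS account, the sketch below stands in for the Query with a local function that filters on the hash key and orders items by their String range key, then strips the `<AppearanceNumber>#` prefix to produce the comma-separated URL list the question asks for. The attribute names `word` and `rankUrl` and both helpers are hypothetical; in a real table the ordering is done by DynamoDB, not by this code.

```javascript
// Hypothetical items; "word" is the hash key and "rankUrl" the String range key.
const table = [
  { word: "Appel", rankUrl: "700#www.bla.com" },
  { word: "Blabla", rankUrl: "500#www.test.com" },
  { word: "Appel", rankUrl: "900#www.appel.com" },
  { word: "Appel", rankUrl: "800#www.cnn.com" },
];

// Local stand-in for Query, for illustration only: keep one hash key's items
// and order them by range key (ascending, or descending when
// scanIndexForward is false). JS string comparison uses UTF-16 code units,
// which matches DynamoDB's byte order for ASCII keys like these.
function simulateQuery(items, word, scanIndexForward = true) {
  const matches = items.filter((item) => item.word === word);
  matches.sort((a, b) => (a.rankUrl < b.rankUrl ? -1 : a.rankUrl > b.rankUrl ? 1 : 0));
  return scanIndexForward ? matches : matches.reverse();
}

// Strip the "<count>#" prefix, splitting on the first '#' only.
function urlsFromItems(items) {
  return items.map((item) => item.rankUrl.slice(item.rankUrl.indexOf("#") + 1));
}

console.log(urlsFromItems(simulateQuery(table, "Appel", false)).join(","));
// prints www.appel.com,www.cnn.com,www.bla.com
```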
    

    Note that you can still store the URL and the appearance number as separate attributes in your table if you want to minimize parsing on the application side.
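A minimal sketch of such an item layout, assuming hypothetical attribute names (`word`, `rankUrl`, `url`, `appearances`) and a 10-digit zero-padded count: each (word, URL) pair becomes one item, with the URL and count duplicated as plain attributes so the client never has to parse the range key.

```javascript
// Hypothetical helper: one item per (word, URL) pair. Attribute names and the
// 10-digit padding width are assumptions, not part of any DynamoDB API.
function buildItems(word, urls, countByUrl, width = 10) {
  return urls.map((url) => {
    const count = countByUrl.get(url) ?? 0;
    return {
      word, // hash key
      rankUrl: `${String(count).padStart(width, "0")}#${url}`, // range key
      url, // plain copy of the URL, no parsing needed on read
      appearances: count, // plain copy of the count, stored as a Number
    };
  });
}

const items = buildItems(
  "Appel",
  ["www.appel.com", "www.cnn.com"],
  new Map([["www.appel.com", 300], ["www.cnn.com", 400]])
);
console.log(items);
```

Items shaped like this could be written with the document client's put or batchWrite operations; note that a single BatchWriteItem request accepts at most 25 put or delete requests.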