I'm very new to DynamoDB, so forgive me if my question is a bit stupid. I have a file that looks like this:
Appel,www.appel.com,www.cnn.com,www.bla.com....
Blabla,www.test.com,www.fox.com,www.bla.com.....
test,www.test.com,www.fox.com,www.bla.com...
www.appel.com,300
www.cnn.com,400
and so on. In short, each line is either (1) a word followed by all the URLs it appears in, or (2) a URL and its number of appearances.
What I need to do is query DynamoDB so that, given a word, the output is the list of URLs sorted by number of appearances.
For example, for this file and the word appel, the output is:
www.cnn.com,www.appel.com,www.bla.com....
I have tried creating two tables, 'Invert-index' and 'rank': the first for the word and its list of URLs, and the second for the URL and its rank. But I can't find a way to make the query without sorting the results myself.
So, first: is the DynamoDB structure (the two tables) correct? And is there a way to query the DB and get the results sorted?
In order to rely on DynamoDB to sort your data, you have to use a Range Key. That being said, to meet your requirements, the number of appearances has to be part of the Range Key.
The Hash Key could then be the word (e.g. Appel or Blabla), and lastly you can store the URLs as a string array in each record.
From the documentation:
Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order use the ScanIndexForward parameter set to false.
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
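To make the ScanIndexForward part concrete, here is a minimal sketch of the request parameters for a descending query. It only builds the keyword arguments for the low-level Query operation; the attribute name 'word' for the Hash Key and the helper name build_query_params are assumptions of mine, and the table name is taken from the question.

```python
def build_query_params(word, table_name="Invert-index"):
    """Build keyword arguments for the low-level DynamoDB Query operation."""
    return {
        "TableName": table_name,
        # '#w' stands in for the Hash Key attribute, assumed to be named 'word'.
        "KeyConditionExpression": "#w = :w",
        "ExpressionAttributeNames": {"#w": "word"},
        "ExpressionAttributeValues": {":w": {"S": word}},
        # False reverses the default ascending Range Key order.
        "ScanIndexForward": False,
    }


params = build_query_params("Appel")
```

With boto3 installed and credentials configured, these could be passed along as boto3.client("dynamodb").query(**params); the sorted records would then come back under the "Items" key of the response.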
You can find more information about the available key types on DynamoDB on the links below:
When to use what primary key type
What is the use of a hash range in a dynamodb table
Q: If I use the number of appearances as the range key, how can I store the string array? Each value there has a different number, so if each record has a primary key (word), a range key (number) and a value (string array), what is the number in this case?
In that case I would recommend composing the Range Key from two fields (number and URL) using a separator character (e.g. '#'). Your final table structure would be:
Hash Key : <Word>
Range Key : <AppearanceNumber>#<Url>
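Your Range Key would be of the String type, which still sorts your data correctly because the <AppearanceNumber> is the prefix, provided the number is zero-padded to a fixed width (String keys are compared character by character, so an unpadded 1000 would sort before 900).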
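One thing to watch with a String Range Key is that ordering is by character code, not by numeric value. Below is a rough sketch of building and splitting the composite key with a zero-padded number. The helper names and the 10-digit width are assumptions; pick a width large enough for the biggest count you expect.

```python
from typing import Tuple

# Assumed fixed width for the appearance count.
WIDTH = 10


def make_range_key(appearances: int, url: str) -> str:
    """Build '<AppearanceNumber>#<Url>' with a zero-padded number."""
    return f"{appearances:0{WIDTH}d}#{url}"


def split_range_key(range_key: str) -> Tuple[int, str]:
    """Recover (appearances, url) from a composite Range Key."""
    number, url = range_key.split("#", 1)
    return int(number), url


# Unpadded keys sort as text, which puts 1000 before 900.
unpadded = sorted(["900#www.appel.com", "1000#www.cnn.com"])

# Padded keys sort in numeric order.
padded = sorted([
    make_range_key(900, "www.appel.com"),
    make_range_key(1000, "www.cnn.com"),
])
```

Here unpadded starts with the 1000 entry while padded starts with the 900 entry, which is the order you actually want DynamoDB to apply.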
As an example, querying by the <Word> 'Appel' (with ScanIndexForward set to false, so that the highest count comes first) would return the following results:
Appel,900#www.appel.com
Appel,800#www.cnn.com
Appel,700#www.bla.com
Notice that you can still keep the url and the appearanceNumber as separate fields in your table in case you want to minimize processing on your application side.
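As a sketch of what such a record could look like, here is one item per (word, URL) pair in the low-level attribute-value format, with the URL and count duplicated as plain attributes so the application never has to parse the Range Key. The attribute names ('word', 'rank_url', 'url', 'appearances') and both helper names are hypothetical, and the 10-digit zero padding is an assumption.

```python
def to_item(word, url, appearances):
    """Build one DynamoDB item for a (word, url) pair."""
    return {
        "word": {"S": word},                              # Hash Key
        "rank_url": {"S": f"{appearances:010d}#{url}"},   # Range Key
        "url": {"S": url},                                # separate copy of the URL
        "appearances": {"N": str(appearances)},           # numbers are sent as strings
    }


def urls_in_order(items):
    """Read the URLs from query results, keeping the order they arrived in."""
    return [item["url"]["S"] for item in items]


# Stand-in for the "Items" list of a descending query on 'Appel'.
items = [
    to_item("Appel", "www.appel.com", 900),
    to_item("Appel", "www.cnn.com", 800),
    to_item("Appel", "www.bla.com", 700),
]
urls = urls_in_order(items)
```

Because the query already returns items sorted by the Range Key, urls_in_order only has to pull out the 'url' attribute, with no sorting on the application side.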