Search code examples
databaseamazon-web-servicesamazon-dynamodbnosql

NoSQL data structure design for many to many relations + "NOT in" problem


the RDBMS logic is still in my head and I try to move to nosql. I know there are already million posts on this topic but I am searching for a specific scenario where I could not find any answer yet:

Scenario: User List

User

id username
1 A
2 B
3 D
4 K
5 B
6 C
7 A

Then I have a many to many relation list Relation

idFirst idSecond
1 2
1 6
1 3
3 7
7 2
6 5
4 1

Now I want to get ~1000 entries of the userId in the user list for userId=1 where there is no entry in the relation list.

Since the user list is big >1.000.000 and there are many entries in the relation list > 5.000.000 I cannot find any solution for this scenario.

Based on the high amount of data I think do two fetches and solving is locally is also no solution.

Does anybody has an idea for an noSQL solution for this?


Solution

  • Your question inherently requires a very compute-intensive calculation.

    If you grow your data size by 10x and grow your query rate by 10x, what's going to happen? It might work, or it might utterly fail with resource exhaustion.

    DynamoDB doesn't want to have a hard cliff like this as the data and load grows. That's one reason why it doesn't let you express compute-intensive queries like this. Each database action has to be very targeted.

    The NoSQL and DynamoDB approach to a problem like this is to pre-calculate the answer (however you want), store the answer in the database, and retrieve that answer quickly when you need it.