Couchbase modeling techniques

I'm doing some research for my team, trying to understand Couchbase. Right now, I'm looking at data modeling practices in Couchbase.

I found this article, written in August 2016, that talks about Couchbase modeling.

It suggests that instead of having one document

key : hernandez94
{
        "username" : "hernandez94",
        "firstName" : "Jennifer",
        "middleName" : "Maria",
        "lastName" : "Hernandez",
        "addresses" : [
                 { "type" : "home", "addr1" : "1929 Crisanto Ave", "addr2" : "Apt 123", "addr3" : "c/o J. Hernandez", "city" : "Mountain View", "state" : "CA", "country" : "USA", "pcode" : "94040" },
                 { "type" : "work", "addr1" : "2700 W El Camino Real", "addr2" : "Suite #123", "city" : "Mountain View", "state" : "CA", "country" : "USA", "pcode" : "94040" }
        ],
        "createdate" : "2016-08-01 15:03:40",
        "lastlogin": "2016-08-01 17:03:40",
        "pword": "app-hashed-password",
        "loc": "IP or fqdn",
        "enabled" : true,
        "sec-questions" : [
                 { "question1" : "Security question 1 goes here", "answer" : "Answer to security question 1 goes here" },
                 { "question2" : "Security question 2 goes here", "answer" : "Answer to security question 2 goes here" },
                 { "question3" : "Security question 3 goes here", "answer" : "Answer to security question 3 goes here" }
        ],
        "doc-type" : "user"
}

you split it up into multiple documents:

user doc

key : hernandez94

{
    "firstName" : "Jennifer",
    "middleName" : "Maria",
    "lastName" : "Hernandez",
    "addresses" : [
        { "type" : "home", "addr1" : "1929 Crisanto Ave", "addr2" : "Apt 123", "addr3" : "c/o J. Hernandez", "city" : "Mountain View", "state" : "CA", "country" : "USA", "pcode" : "94040" },
        { "type" : "work", "addr1" : "2700 W El Camino Real", "addr2" : "Suite #123", "city" : "Mountain View", "state" : "CA", "country" : "USA", "pcode" : "94040" }
    ],
    "createdate" : "2016-08-01 15:03:40",
    "doc-type" : "user"
}

login-info doc

key : login-info::hernandez94

{
        "lastlogin": "2016-08-01 15:03:40",
        "pword": "app-hashed-password",
        "loc": "IP or fqdn",
        "enabled" : true,
        "doc-type" : "login-info",
        "username" : "hernandez94"
}

sec-questions doc

key : sec-questions::hernandez94

{
    "question1" : { "question" : "Security question 1 goes here", "answer" : "Answer to security question 1 goes here" },
    "question2" : { "question" : "Security question 2 goes here", "answer" : "Answer to security question 2 goes here" },
    "question3" : { "question" : "Security question 3 goes here", "answer" : "Answer to security question 3 goes here" },
    "doc-type" : "sec-questions",
    "username" : "hernandez94"
}
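For reference, the split described in the article could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `split_user_doc` is hypothetical, the sample data is abbreviated, and plain dicts stand in for the documents, so no Couchbase SDK calls appear here. It shows how the three keys are derived from the username.

```python
import copy


def split_user_doc(doc):
    """Split one combined user document into a {key: document} mapping.

    Hypothetical helper: it mirrors the key pattern shown above
    (username, login-info::<username>, sec-questions::<username>).
    """
    username = doc["username"]

    # Central profile document, keyed by the username itself.
    profile_fields = ("firstName", "middleName", "lastName", "addresses", "createdate")
    user = {k: copy.deepcopy(doc[k]) for k in profile_fields if k in doc}
    user["doc-type"] = "user"

    # Login details move to their own document.
    login_fields = ("lastlogin", "pword", "loc", "enabled")
    login = {k: doc[k] for k in login_fields if k in doc}
    login["doc-type"] = "login-info"
    login["username"] = username

    # The combined document stores each question under "question<i>";
    # the split document nests question and answer under that same name.
    questions = {}
    for i, entry in enumerate(doc.get("sec-questions", []), start=1):
        questions[f"question{i}"] = {
            "question": entry[f"question{i}"],
            "answer": entry["answer"],
        }
    questions["doc-type"] = "sec-questions"
    questions["username"] = username

    return {
        username: user,
        f"login-info::{username}": login,
        f"sec-questions::{username}": questions,
    }


# Abbreviated sample of the combined document.
combined = {
    "username": "hernandez94",
    "firstName": "Jennifer",
    "lastName": "Hernandez",
    "addresses": [{"type": "home", "city": "Mountain View"}],
    "pword": "app-hashed-password",
    "enabled": True,
    "sec-questions": [
        {"question1": "Security question 1 goes here", "answer": "Answer 1"},
        {"question2": "Security question 2 goes here", "answer": "Answer 2"},
    ],
    "doc-type": "user",
}

docs = split_user_doc(combined)
print(sorted(docs))
```

Because every related key can be computed from the username, any of the three documents can be fetched directly by key without first looking anything up.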

Since this is a newer technology, best practices change more frequently. Is this strategy still viable? Or is N1QL performance in Couchbase 5.0 so much better that this modeling technique is outdated? Should I put all of my data (per user) in one document, or split it out into 10 million x (number of subdocuments) documents? I'll have around 10 million users.

Thanks

Solution

Without doing measurements, or knowing your exact usage pattern, I can only give general advice.

I suggest you consider how you will be accessing this user document. Will you often be fetching just the central document, or will you typically be joining it with the subsidiary documents and fetching everything? If the former dominates, by all means split up the document into pieces and fetch only what you need. But if the latter dominates, keep all the data in a single document, avoiding the cost of multiple fetches and joins every time you need to get the data for a user.
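To make the trade-off concrete, here is a minimal Python sketch that counts fetches for each access pattern. Everything in it is an assumption for illustration: `CountingStore` is a hypothetical in-memory stand-in for a bucket, `related_keys` is a hypothetical helper, and the documents are abbreviated. It does not use or imitate any Couchbase SDK call, and a fetch count is only a rough proxy for real cost, which you would still need to measure.

```python
class CountingStore:
    """In-memory stand-in for a key-value bucket that counts fetches."""

    def __init__(self, docs):
        self._docs = dict(docs)
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self._docs[key]


def related_keys(username):
    """Keys of the split documents, following the pattern in the question."""
    return [username, f"login-info::{username}", f"sec-questions::{username}"]


# Split model: three small documents per user.
split_store = CountingStore({
    "hernandez94": {"firstName": "Jennifer", "doc-type": "user"},
    "login-info::hernandez94": {"enabled": True, "doc-type": "login-info"},
    "sec-questions::hernandez94": {"doc-type": "sec-questions"},
})

# Pattern A: only the central document is needed (one small fetch).
profile = split_store.get("hernandez94")
gets_profile_only = split_store.gets

# Pattern B: everything about the user is needed (one fetch per piece).
split_store.gets = 0
everything = [split_store.get(key) for key in related_keys("hernandez94")]
gets_everything_split = split_store.gets

# Single-document model: one larger document holds everything.
single_store = CountingStore({
    "hernandez94": {
        "firstName": "Jennifer",
        "enabled": True,
        "sec-questions": [],
        "doc-type": "user",
    },
})
whole = single_store.get("hernandez94")
gets_everything_single = single_store.gets

print(gets_profile_only, gets_everything_split, gets_everything_single)
```

If pattern A dominates, the split model reads less data per request; if pattern B dominates, the single document avoids the extra round trips.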