Currently I have a hundreds of thousands of files like so:
{
"_id": "1234567890",
"type": "file",
"name": "Demo File",
"file_type": "application/pdf",
"size": "1400",
"timestamp": "1491421149",
"folder_id": "root"
}
Currently, I index all the names, and a client can search for files based on the name of the file. These files also have tags
that need to be associated with the file but they also have specific labels.
An example would be:
{
"tags": [
{ "client": "john doe" },
{ "office": "virginia" },
{ "ssn": "1234" }
]
}
Is adding the tags
array to my above file object the ideal solution if I want to be able to search thousands of files with a client of John Doe?
The only other solution I can think of is having something an object per tag and having an array of file ID's associated with each tag like so:
{
"_id": "11111111",
"type": "tag",
"label": "client",
"items": [
"1234567890",
"1222222222",
"1333333333"
]
}
With this being a LOT of objects I need to add tags to, I'd rather do it the most efficient way possible FIRST so I don't have to backtrack in the near future when I start running into issues.
Any guidance would be greatly appreciated.
Your original design, with a tags array, works well with Cloudant Search: https://console.ng.bluemix.net/docs/services/Cloudant/api/search.html#search.
With this approach you would define a single design document that will index any tag in the tags array. You do not have to create different views for different tags and you can use the Lucene syntax for queries: http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview.
So, using your example, if you have a document that looks like this with tags:
{
"_id": "1234567890",
"type": "file",
"name": "Demo File",
"file_type": "application/pdf",
"size": "1400",
"timestamp": "1491421149",
"folder_id": "root",
"tags": [
{ "client": "john doe" },
{ "office": "virginia" },
{ "ssn": "1234" }
]
}
You can create a design document that indexes each tag like so:
{
"_id": "_design/searchFiles",
"views": {},
"language": "javascript",
"indexes": {
"byTag": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.type === \"file\" && doc.tags) {\n for (var i=0; i<doc.tags.length; i++) {\n for (var name in doc.tags[i]) {\n index(name, doc.tags[i][name]);\n }\n }\n }\n}"
}
}
}
The function looks like this:
function (doc) {
if (doc.type === "file" && doc.tags) {
for (var i=0; i<doc.tags.length; i++) {
for (var name in doc.tags[i]) {
index(name, doc.tags[i][name]);
}
}
}
}
Then you would search like this:
https://your_cloudant_account.cloudant.com/your_db/_design/searchFiles/_search/byTag
?q=client:jack+OR+office:virginia
&include_docs=true