I imported five large JSON files from Yelp's dataset challenge into MongoDB on Ubuntu. Each file contains many records. I want to query them the way I would in MySQL or another SQL database: look for the keyword "UFC" under "text" and "Alcohol: full_bar" under "attributes", and at a bare minimum return a count of the matches. I also want to see whether bars that mention UFC and MMA get more reviews, check-ins, and tips than bars that do not mention them. I suspect this will need to use the business_id field. Compounding the problem, "tips.json" uses the field name "text" just as "reviews.json" does.
I've built this index successfully in my mongodb database:
> db.collection.createIndex({"text":"text", "attributes": "text"})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
My command to search for the keyword UFC works:
> db.collection.find({"$text": {"$search": "UFC"}})
{ "_id" : ObjectId("58fd4601051d56ff58e471f2"), "review_id" : "ogdaaLlAhmcyW1ZpGsiEGA", "user_id" : "rNbOmPzfWD1D4V8WOo7lBQ", "business_id" : "AVqjAx6j4HAvUb8t3_lv8Q", "stars" : 4, "date" : "2015-03-29", "text" : "We came here to watch the UFC. We had fries and wings, and they did not disappoint.\nWe opted to sit in the upstairs area where it was less crowded, and less noisy.\nThe waitress was a total dummy, but her niceness kind of made up for it....\nIf she had an attitude, she would have received zero tip.", "useful" : 0, "funny" : 0, "cool" : 0, "type" : "review" }
......
But when I tried to find "Alcohol: full_bar" under attributes, I got the following error:
> db.collection.find({"$attributes": {"$search": "Alcohol: full_bar"}})
error: {
"$err" : "Can't canonicalize query: BadValue unknown top level operator: $attributes",
"code" : 17287
}
>
Your query syntax is wrong. In a text search you don't specify a field name to search in; the special operator $text says that the search should go through the text index, which means it will search both your "text" and "attributes" fields. "$attributes" is not an operator MongoDB knows, which is why you get the "unknown top level operator" error.
So when you run this query:
db.collection.find({"$text": {"$search": "UFC"}})
That is not limited to just the "text" field; it is searching through the entire text index, which covers both the "text" and "attributes" fields.
So if you want to search for some text in the "attributes" field, you construct the query the same way:
db.collection.find({"$text": {"$search": "Alcohol: full_bar"}})
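One thing to watch for: a $search string with several words is treated as separate terms, and a document matches if it contains any of them. To match the exact phrase, wrap it in escaped double quotes inside the search string. Below is a minimal sketch of that, assuming your attributes are stored as strings of the form "Alcohol: full_bar". The helper name phraseTextQuery is hypothetical and is not part of MongoDB; it only builds the filter document.

```javascript
// Hypothetical helper (not a MongoDB function): builds a $text filter
// that matches an exact phrase rather than any one of its words.
function phraseTextQuery(phrase) {
  // $text treats text inside double quotes as a phrase. Drop any double
  // quotes already present in the input so they cannot end the phrase early.
  var cleaned = String(phrase).replace(/"/g, "");
  return { $text: { $search: '"' + cleaned + '"' } };
}

// Equivalent to writing {"$text": {"$search": "\"Alcohol: full_bar\""}} by hand.
var query = phraseTextQuery("Alcohol: full_bar");

// In the mongo shell, with the text index from the question in place:
//   db.collection.find(query)          // the matching documents
//   db.collection.find(query).count()  // just the number of matches
```

The .count() call at the end gives you the count you asked for; the same pattern works for the "UFC" search.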