I would like to ask about findAndModify in MongoDB. As far as I know, the query is "isolated by document".
This means that if I run two findAndModify operations like this:
{a:1},{$set:{status:"processing", engine:1}}
{a:1},{$set:{status:"processing", engine:2}}
and the query can potentially affect 2,000 documents, then, because there are two queries (two engines), some documents may end up with "engine:1" and others with "engine:2".
I don't think findAndModify will isolate the first query. In order to isolate the first query I need to use $isolated.
Is everything I have written correct?
The idea is to write a proximity engine. The collection User has 1,000, 2,000 or 3,000 users, or even millions.
1 - Order the users by nearest from the point "lng,lat"
2 - In NodeJS I do some computation that I CAN'T do in MongoDB
3 - Now I group the users in "UserGroup" and write a bulk update
When I have 2,000-3,000 users, this process (from 1 to 3) takes time. So I want to run multiple threads in parallel.
Parallel threads mean parallel queries. This can be a problem, since Query3 can take some of the users of Query1. If this happens, then at step (2) I don't have the nearest users overall but only the nearest "for this query", because another query may have taken the rest of the users. As a result, some users in New York might be grouped with users in Los Angeles.
I have a collection like this:
{location:[lng,lat], name:"1",gender:"m", status:'undone'}
{location:[lng,lat], name:"2",gender:"m", status:'undone'}
{location:[lng,lat], name:"3",gender:"f", status:'undone'}
{location:[lng,lat], name:"4",gender:"f", status:'done'}
What I should be able to do is create a 'Group' of users by grouping the nearest ones together. Each group has 1 male + 1 female. In the example above, I'm expecting to have only 1 group (user-1 + user-3), since they are male + female and are near each other (user-2 is also male, but is far away from user-3, and user-4 is also female but has status 'done', so she is already processed).
Now the group is created (only 1 group), so the 2 users are marked as 'done' and the other one, user-2, stays 'undone' for a future operation.
I want to be able to handle 1,000-2,000-3,000 users very fast.
UPDATE 3 (from the community): Okay now. Can I please try to summarise your case? Given your data, you want to "pair" male and female entries together based on their proximity to each other. Presumably you don't want to do every possible match but just set up a list of general "recommendations", let's say 10 for each user by the nearest location. Now I'd have to be stupid to not see the full direction of where this is going, but does this sum up the basic initial problem statement? Process each user, find their "pairs", mark them as "done" once paired and exclude them from other pairings by combination where complete?
This is a non-trivial problem and can not be solved easily.
First of all, an iterative approach (which admittedly was my first one) may lead to wrong results.
Given we have the following documents:
{
    _id: "A",
    gender: "m",
    location: { longitude: 0, latitude: 1 }
}
{
    _id: "B",
    gender: "f",
    location: { longitude: 0, latitude: 3 }
}
{
    _id: "C",
    gender: "m",
    location: { longitude: 0, latitude: 4 }
}
{
    _id: "D",
    gender: "f",
    location: { longitude: 0, latitude: 9 }
}
With an iterative approach, we now would start with "A" and calculate the closest female, which, of course would be "B" with a distance of 2. However, in fact, the closest distance between a male and a female would be 1 (distance from "B" to "C"). But even when we found this, that would leave the other match, "A" and "D", at a distance of 8, where, with our previous solution, "A" would have had a distance of only 2 to "B".
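To make these numbers concrete, here is a minimal sketch in plain JavaScript (no MongoDB involved) that pairs the four example documents greedily in two different orders. The helper names `dist` and `pairInOrder` are made up for illustration, and since all longitudes are 0, distance is assumed to be simply the difference in latitude.

```javascript
// The four example documents, reduced to what matters here.
const people = [
  { _id: "A", gender: "m", lat: 1 },
  { _id: "B", gender: "f", lat: 3 },
  { _id: "C", gender: "m", lat: 4 },
  { _id: "D", gender: "f", lat: 9 },
];

// Assumption: all longitudes are 0, so distance is the latitude difference.
const dist = (p, q) => Math.abs(p.lat - q.lat);

// Hypothetical helper: walk over the people in the given order and let
// each one take the nearest still-unpaired person of the other gender.
function pairInOrder(order) {
  const taken = new Set();
  const pairs = [];
  for (const p of order) {
    if (taken.has(p._id)) continue;
    const candidates = people
      .filter((q) => q.gender !== p.gender && !taken.has(q._id))
      .sort((x, y) => dist(p, x) - dist(p, y));
    if (candidates.length === 0) continue;
    const partner = candidates[0];
    taken.add(p._id).add(partner._id);
    pairs.push({ a: p._id, b: partner._id, distance: dist(p, partner) });
  }
  return pairs;
}

// Starting with "A": A-B (distance 2), then C-D (distance 5).
const startWithA = pairInOrder(people);
// Starting with "B": B-C (distance 1), then A-D (distance 8).
const startWithB = pairInOrder([people[1], people[0], people[2], people[3]]);

console.log(startWithA);
console.log(startWithB);
```

Neither order is obviously "right": the first gives the smaller total distance, the second contains the single closest pair.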
So we need to decide which way to go. The iterative approach, which simply finds any match, would look like this:
// Note: $near needs a geospatial index on "location", for example
// db.collection.createIndex({ location: "2dsphere" })
var users = db.collection.find(yourQueryToFindThe1000users);

// We can safely use an unordered op here,
// which has greater performance.
// Since we use the "done" array to keep track of
// the processed members, there is no drawback.
var pairs = db.pairs.initializeUnorderedBulkOp();
var done = new Array();

users.forEach(
    function(currentUser){

        // Skip users that already were paired in an earlier iteration
        if( done.indexOf(currentUser._id) != -1 ) { return; }

        var genderToLookFor = ( currentUser.gender === "m" ) ? "f" : "m";

        // using the $near operator,
        // the returned documents automatically are sorted from nearest
        // to farthest, and since findAndModify returns only one document
        // we get the closest matching partner.
        var nearPartner = db.collection.findAndModify({
            query: {
                // exclude users paired in this run whose status
                // has not been written back yet
                _id: { $nin: done },
                status: "undone",
                gender: genderToLookFor,
                // assumes "location" is stored as [ lng, lat ]
                location: {
                    $near: {
                        $geometry: {
                            type: "Point" ,
                            coordinates: currentUser.location
                        }
                    }
                }
            },
            update: { $set: { "status":"done" } },
            fields: { _id: 1 }
        });

        // findAndModify returns null when no partner is left
        if( nearPartner === null ) { return; }

        // Obviously, the partner already is marked as done.
        // However, we store both ids for simplifying the process of
        // setting the processed users to done.
        done.push(currentUser._id, nearPartner._id);

        // We have a pair, so we store it in a bulk operation
        pairs.insert({
            _id:{
                a: currentUser._id,
                b: nearPartner._id
            }
        });
    }
)

// Write the found pairs (executing an empty bulk op raises an error)
if( done.length > 0 ) { pairs.execute(); }

// Mark all that are unmarked by now as done
db.collection.update(
    {
        _id: { $in: done },
        status: "undone"
    },
    {
        $set: { status: "done" }
    },
    { multi: true }
)
Finding the optimal set of matches over all users would be the ideal solution, but it is extremely complex to solve. We need to take all members of one gender, calculate all distances to all members of the other gender and iterate over all possible sets of matches. In our example it is quite simple, since there are only 4 combinations for any given gender. Thinking about it twice, this might be at least a variant of the traveling salesman problem (MTSP?). If I am right with that, the number of combinations should be
n! / 2
for all n>2, where n is the number of possible pairs,
and hence
10! / 2 = 1,814,400
for n=10
and an astonishing
25! / 2 ≈ 7.755 × 10^24
for n=25.
That's 7.755 quadrillion (long scale) or 7.755 septillion (short scale). While there are approaches to solving this kind of problem, the world record is somewhere in the range of 25,000 nodes using massive amounts of hardware and quite tricky algorithms. I think for all practical purposes, this "solution" can be ruled out.
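For a feeling of how fast this grows, here is a small sketch that evaluates the count with BigInt. It assumes the number of combinations is n!/2, which is the formula consistent with the 7.755 septillion figure for n=25; the function name `combinations` is just an illustrative choice.

```javascript
// Assumption: the number of combinations is n! / 2.
// BigInt is used because 25! does not fit into a regular Number exactly.
function combinations(n) {
  let factorial = 1n;
  for (let i = 2n; i <= BigInt(n); i++) {
    factorial *= i;
  }
  return factorial / 2n;
}

console.log(combinations(10).toString()); // 1814400
console.log(combinations(25).toString()); // 7755605021665492992000000
```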
In order to prevent the problem that people might be matched with unacceptable distances between them and depending on your use case, you might want to match people depending on their distance to a common landmark (where they are going to meet, for example the next bigger city).
For our example, assume we have cities at [0,2] and [0,7]. The distance (5) between the cities hence has to be our acceptable range for matches. So we do a query for each city, here for the first one:
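db.collection.find({
    location: {
        $near: {
            $geometry: {
                type: "Point" ,
                // GeoJSON order is [ longitude, latitude ]
                coordinates: [ 0 , 2 ]
            },
            // note: for GeoJSON points, $maxDistance is given in meters
            $maxDistance: 5
        }
    },
    // only users that have not been matched yet
    status: "undone"
})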
and iterate over the results naively. Since "A" and "B" would be the first in the result set, they would be matched and done. Bad luck for "C" here, as no girl is left for him. But when we do the same query for the second city he gets his second chance. Ok, his travel gets a bit longer, but hey, he got a date with "D"!
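The walk-through above can be replayed without a database. The following is only a sketch under simplifying assumptions: the four example users live in an in-memory array, `undoneNear` is a made-up stand-in for the `$near` query with `$maxDistance`, and distance is again the plain latitude difference.

```javascript
const exampleUsers = [
  { _id: "A", gender: "m", lat: 1, status: "undone" },
  { _id: "B", gender: "f", lat: 3, status: "undone" },
  { _id: "C", gender: "m", lat: 4, status: "undone" },
  { _id: "D", gender: "f", lat: 9, status: "undone" },
];
const exampleCities = [{ name: "first", lat: 2 }, { name: "second", lat: 7 }];
const maxDistance = 5;

// Hypothetical stand-in for the query: all "undone" users within
// maxDistance of the city, sorted from nearest to farthest.
function undoneNear(city) {
  return exampleUsers
    .filter((u) => u.status === "undone" && Math.abs(u.lat - city.lat) <= maxDistance)
    .sort((x, y) => Math.abs(x.lat - city.lat) - Math.abs(y.lat - city.lat));
}

const cityPairs = [];
for (const city of exampleCities) {
  for (const user of undoneNear(city)) {
    // The user may have been matched earlier in this very loop.
    if (user.status === "done") continue;
    // Naive choice: the undone user of the other gender closest to the city.
    const partner = undoneNear(city).find((u) => u.gender !== user.gender);
    if (!partner) continue;
    user.status = "done";
    partner.status = "done";
    cityPairs.push([user._id, partner._id]);
  }
}

console.log(cityPairs); // [ [ 'A', 'B' ], [ 'D', 'C' ] ]
```

In a real setup each `undoneNear` call would be a query against the collection, and marking a pair as done would have to be written back before the next city is processed.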
To find the respective distances, take a fixed set of cities (towns, metropolitan areas, whatever your scale is), order them by location and set each city's radius to the bigger of the two distances to its immediate neighbors. This way, you get overlapping areas. So even when a match cannot be found in one place, it may be found in another.
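As a rough illustration of the radius rule, here is a one-dimensional sketch. It assumes the cities can be ordered along a single axis (`position`), which is a simplification of real geography; the function name `withRadius` and the third city at position 15 are invented for the example.

```javascript
// Hypothetical helper: sort the cities by position and give each one the
// larger of the distances to its immediate neighbors as radius.
function withRadius(cities) {
  const sorted = [...cities].sort((a, b) => a.position - b.position);
  return sorted.map((city, i) => {
    const gaps = [];
    if (i > 0) gaps.push(city.position - sorted[i - 1].position);
    if (i < sorted.length - 1) gaps.push(sorted[i + 1].position - city.position);
    // A single city without neighbors gets radius 0 in this sketch.
    return { ...city, radius: gaps.length > 0 ? Math.max(...gaps) : 0 };
  });
}

// The two cities from the example plus one invented city at 15.
const landmarks = withRadius([
  { name: "second", position: 7 },
  { name: "first", position: 2 },
  { name: "third", position: 15 },
]);

console.log(landmarks);
// first: radius 5, second: radius 8, third: radius 8
```

Because every radius reaches at least to the nearest neighbor, adjacent areas always overlap.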
IIRC, Google Maps allows you to retrieve the cities of a nation based on their size. An easier way would be to let people choose their respective city.