python-3.x, mongodb, pymongo-3.x

PyMongo: how to query a series and find the closest match


This is a simplified example of how the data for a single athlete is stored in MongoDB:

{   "_id" : ObjectId('5bd6eab25f74b70e5abb3326'),
    "Result" : 12,
    "Race" : [0.170, 4.234, 9.170],
    "Painscore" : 68
}

Now, when this athlete has completed a race, I want to find the earlier race that was most similar to the current one, so that I can compare the two pain scores.

To get the best match, I tried this:

query = [0.165, 4.031, 9.234]

closestBelow = db[athlete].find({'Race' : {"$lte": query}}, {"_id": 1, "Race": 1}).sort("Race", -1).limit(2)
for i in closestBelow:
    print(i)

closestAbove = db[athlete].find({'Race' : {"$gte": query}}, {"_id": 1, "Race": 1}).sort("Race", 1).limit(2)
for i in closestAbove:
    print(i)

This does not seem to work.

Question 1: How should I write the query in order to find the race in MongoDB that matches best (is closest), given that two races are almost never exactly the same?

Question 2: How can I see a match percentage per document, so that an athlete knows how seriously to interpret the pain score?

Thank you.


Solution

  • Thanks to this website I found a solution: http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/

    Step 1: determine your query (here, the athlete's most recent race);

    Step 2: make a first selection based on the query (for example, on the average) and append the results to a list;

    Step 3: use a for loop to compare every item in the list with your query, using the Euclidean distance;

    Step 4: once every candidate has been scored, store the best match (the one with the smallest distance) in a variable.

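    import numpy
    from pymongo import MongoClient

    client = MongoClient('mongodb://localhost:27017/')
    Database = 'Firstclass'

    def euclidean_distance(a, b):
        # Straight-line distance between two races; both must have the same length
        return float(numpy.linalg.norm(numpy.array(a) - numpy.array(b)))

    def newSearch(Athlete):
        # STEP 1: the most recent race of this athlete is the query
        db = client[Database]
        lastDoc = [i for i in db[Athlete].find({}, {'_id': 1, 'Race': 1, 'Average': 1}).sort('_id', -1).limit(1)]
        average = lastDoc[0].get('Average')

        # STEP 2: first selection: up to 15 of the newest races whose Average
        # lies within 10% of the query's Average
        query = {'Average': {'$gte': average * 0.9, '$lte': average * 1.1}}
        funnel = [x for x in db[Athlete].find(query, {'_id': 1, 'Race': 1}).sort('_id', -1).limit(15)]
        compareListID = []
        compareListRace = []
        for x in funnel:
            # Skip the query document itself
            if lastDoc[0].get('_id') != x.get('_id'):
                compareListID.append(x.get('_id'))
                compareListRace.append(x.get('Race'))

        # STEP 3: Euclidean distance between the query and every candidate
        ESlist = []
        for y in compareListRace:
            ED = euclidean_distance(lastDoc[0].get('Race'), y)
            ESlist.append(ED)
        if not ESlist:
            return None  # no comparable race found

        # STEP 4: the smallest distance is the closest match, hence argmin
        best = int(numpy.argmin(ESlist))
        matchObjID = compareListID[best]
        matchRace = compareListRace[best]
        return matchObjID, matchRace

    print(newSearch('Jim'))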
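
Question 2 (a match percentage per document) is not covered by the steps above. The following is only a minimal sketch of one possible scoring, not an established measure: it assumes that a distance of zero should count as 100% and that a distance as large as the magnitude of the query itself should count as 0%. The names `percent_match` and `rank_matches`, the example `_id` values, and the second race are hypothetical; `math.dist` and multi-argument `math.hypot` require Python 3.8 or newer.

```python
import math

def percent_match(query, race):
    """Hypothetical score: 100 for identical races, 0 once the Euclidean
    distance reaches the magnitude of the query itself."""
    if len(query) != len(race):
        raise ValueError("both races must have the same number of values")
    distance = math.dist(query, race)   # Euclidean distance
    scale = math.hypot(*query)          # magnitude of the query vector
    if scale == 0:
        return 100.0 if distance == 0 else 0.0
    return max(0.0, 1.0 - distance / scale) * 100.0

def rank_matches(query, candidates):
    """Return (percent, _id) pairs, best match first."""
    scored = [(percent_match(query, doc["Race"]), doc["_id"]) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Example data: the first race is the one from the question,
# the second race and both _id values are made up for illustration.
query = [0.165, 4.031, 9.234]
candidates = [
    {"_id": "race-a", "Race": [0.170, 4.234, 9.170]},
    {"_id": "race-b", "Race": [0.300, 5.100, 11.000]},
]

for percent, doc_id in rank_matches(query, candidates):
    print(f"{doc_id}: {percent:.1f}% match")
```

Because the score is relative to the size of the query, it only makes sense when all races have the same number of values and those values are on comparable scales; a different normalisation may suit other data better.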