Search code examples
pythonmongodbindexingpymongocherrypy

What is the best way to query a mongodb collection using Python 3


First of, let me explain the project issue here:

I'm developing a web application, using the libraries CherryPy, PyMongo and the database backend is a MongoDB database, i am using Python 3 as development language.

My database collection contains 260.640 documents, which is simplified in the format here:

{"_id":1,"time":"2014-01-01 00:00:00","value":"1.37468"}

All documents in the collection have an id (going from 0 to 260640) and a time which is each increasing by one minute (so i have a total of 6 months of data).

I am running the Mongod.exe server in a windows console and my python web server in another windows console, and uses Google Chrome to view the webpages.

My Goal:

I want to query the database collection, so i can get an HTML table with rows between two dates, for example: 2014-01-01 00:00:00 and 2014-01-10 00:00:00, and that table should then be viewed on the webpage which CherryPy is producing.

My Problem:

Using the code which is provided here in the question, i can query the database and show the table on a webpage, but, it takes about 30-50 seconds to show about 7200 rows, which is only about 5 days of data, when i need to show 10 days of data or even a month of data, we talk about longer waiting time, the problem is first that the user has to wait, but also that the browser could time out if the user select a longer timespan, which kills the application.

My Slow Code:

Here is the code that, currently works, but only as a "standard car", i need a "super car".

def onem(self):
    # Get the MongoClient from the PyMongo lib.
    client = MongoClient()
    # our database name is raw_data
    db = client.raw_data
    # Get the starting date from the database (returns a valid date string).
    date_from = self.getFromDateToDatePickerFromDB();
    # Get the end date from the database (returns a valid date string).
    date_to = self.getToDateToDatePicker();
    # Query the database for the collection of documents.
    collection = db.data.find({'time' : {'$gte' : date_from, '$lt' : date_to}})
    # Define a variable to hold the temp rows.
    html = ""
    # for each document in our collection.
    for document in collection:
        # build the table.
        html = html + '''
                        <tr>
                            <td>''' + str(int(document['_id'])) + '''</td>
                            <td>''' + str(document['time']) + '''</td>
                            <td>''' + str(document['value']) + '''</td>
                        </tr>
                    '''
    table = '''
            <table border="1" celspacing="0" cellpadding="0" width="100%">
                <thead>
                    <tr>
                        <td>ID</td>
                        <td>TIME</td>
                        <td>VALUE</td>
                    </tr>
                </thead>
                <tbody>
                    ''' + html + '''
                </tbody>
            </table>
            '''
    # return the valid html with the table to a function which
    # outputs an html template to the browser.
    return self.GetStaticHTML(table)
# Tell CherryPy that we are done working on the webpage and it lets us show it to the browser.
onem.exposed = True

If any of you knows a better way to query the mongodb database than the provided code:

collection = db.data.find({'time' : {'$gte' : date_from, '$lt' : date_to}})

Or if you know a way to speed up the database, code or anything, then i really want to hear it.

Thank You,


Solution

  • There are potentially two weaknesses which makes your code slow and not scalable:

    1. Do you have an index on your time attribute in the mongo collection? If not, create that index, this is a one time operation.
    2. You cannot return all items matching the search regardless of the number of items you need to return. You have to use pagination, i.e. only return a fixed number of items, e.g. 200 and provide links to the previous and next 200 items.