Search code examples
pythonsqlitecursorfetchall

Pythons fetchall eats all my swap memory


I'm executing a query on a table where I get a column with some hundred million rows back, this is because I want to plot them in a histogram. The problem is that this eats up nearly all my memory (7.8 gb) and all my swap memory (8gb) before the program exits with exit code -9 before cur.fetchall() is done.

How can I prevent this from happening? Should I sort my column first, then do several queries on chuncks of it - or is there maybe a better way to fetch the data in my query? The cur.execute itself takes almost no time.

#!/usr/bin/python

import sqlite3 as lite
import numpy as np
import sys
import os
import matplotlib.pyplot as plt


def getQuantity(databasepath):
    con = lite.connect(databasepath)
    binwidth = 1
    start = time.time()
    with con:
        cur = con.cursor()
        cur.execute('SELECT latitude FROM MessageType1')
        con.commit()
        latitudes = cur.fetchall() #Breakdown here
        latitudes = [x[0] for x in latitudes]
        plt.hist(latitudes, bins=range(int(min(latitudes)), int(max(latitudes)) + binwidth, binwidth))
        plt.title("Bucket size: " + str(binwidth))
        plt.ylabel("Number of message")
        plt.savefig('latlongstats'+'t'+str(time.strftime("%H:%M:%S")), format='png')

if __name__ == "__main__":

    getQuantity('database/database.db')

Solution

  • I found that if I replaced the following:

        latitudes = cur.fetchall()
        print "fetched"
        latitudes = [x[0] for x in latitudes]
    

    with:

        while True:
            tmp = cur.fetchone()
            if tmp != None:
                latitudes.append(tmp[0])
            else:
                break
    

    I got the same result, allthough it takes forever, and only nearly eats up my ram (but not my swap).