Tags: python, for-loop, large-data

What is Python's behavior when loading data that is 'too large', or when it does not have enough resources (either CPU or memory)?


I am working on a program that at some point runs a for loop over lists, some of them small, some of them huge (500k+ elements). After the application has been running for some time, it systematically reaches a point where it fails in this loop, even though the previous iterations, with smaller lists, did not fail.

To debug, I used the pickle module to dump the list just before entering the function. When the application fails, the list that has just been dumped is systematically a large one (500k+ elements); the resulting file is 70 MB or more (binary mode).

I know I have spare RAM (with htop I see I am using approximately 400 MB out of 2 GB available), but the CPU is frequently very busy (more than 60%, sometimes reaching 100%).

When I load the list back from the file (with the application stopped), I can run the for loop over it without trouble. The list does not appear to be corrupted.
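The dump-and-reload round-trip described above can be reproduced in isolation. A minimal sketch, using a made-up payload shaped like the real data and a temp path (both are assumptions, not the question's actual files):

```python
import os
import pickle
import tempfile

# Made-up payload shaped like the real data: (id, {'data': json_str}) tuples.
payload = [('1599324732926-0', {'data': '{"timestamp": 1599324732.767}'})] * 3

path = os.path.join(tempfile.gettempdir(), 'dump_list.data')
with open(path, 'wb') as fh:
    pickle.dump(payload, fh)        # binary dump, as in the question
with open(path, 'rb') as fh:
    restored = pickle.load(fh)

print(restored == payload)  # True: the round-trip is lossless
```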

So I have absolutely no clue what can be wrong, except that maybe the CPU activity is too high. If Python really does not have enough resources to work, is there any way for it to tell me so? How can I check this? What is its behavior? Is it possible that the program stops without any error message?
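For context: when an allocation fails inside the interpreter, Python raises MemoryError; but when the kernel's OOM killer SIGKILLs the process, Python never gets a chance to report anything. A Linux-specific sketch that turns the second case into the first by capping the process's address space with `resource.setrlimit` (the 1 GiB cap is an arbitrary example value):

```python
import resource

# Cap this process's virtual address space (Linux) so an oversized
# allocation fails *inside* Python with MemoryError, instead of the
# process being silently SIGKILLed by the kernel OOM killer.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1024**3, hard))  # ~1 GiB cap

caught = False
try:
    blob = bytearray(2 * 1024**3)  # try to allocate ~2 GiB
except MemoryError:
    caught = True

resource.setrlimit(resource.RLIMIT_AS, (soft, hard))  # restore the limit
print(caught)  # True: the failure is now visible to Python
```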

Edit - Code sample

Please find below the code where the app freezes. The code in itself executes normally; the context of execution seems to be what makes it freeze.

I could pinpoint that the app freezes at the start of the for loop nicknamed 'Bermuda Triangle'. I can read the data dumped with the pickle module, but I never get the print made within the for loop. What is really weird is that the file supposed to receive that print (dump_loop.txt) does exist, but is empty. I find this weird because either there should be no file at all (notice the os.remove at the beginning of the loop, which removes it systematically at each iteration), or, if there is one, something should be written in it.

I confirm that when the app runs normally (i.e. is not frozen), these two files are correctly updated.
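Incidentally, an existing-but-empty dump_loop.txt is exactly what a hard kill would leave behind: `open(..., 'w')` truncates the file immediately, while the buffered write is only flushed when the `with` block closes the file, so a SIGKILL in between discards the buffer. Flushing and fsyncing each write makes such a log survive a hard kill. A sketch (the path and message are illustrative):

```python
import os

def log_line(path, text):
    # Append one line and force it to disk immediately, so the record
    # survives even if the process is killed right afterwards.
    with open(path, 'a') as fh:
        fh.write(text + '\n')
        fh.flush()                 # flush Python's userspace buffer
        os.fsync(fh.fileno())      # ask the kernel to persist it to disk

log_line('dump_loop.txt', 'Starting new loop for item 1 over 2.')
```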

Thanks for any help! Best,

import os
import json
import pickle
from collections import defaultdict

data = [('1599324732926-0',
         {'data': '{"timestamp":1599324732.767, \
                "receipt_timestamp":1599324732.9256856,\
                "delta":true, \
                "bid":{"338.9":0.06482,"338.67":3.95535}, \
                "ask":{"339.12":2.47578,"339.13":6.43172} \
               }'
         }),
        ('1599324732926-1',
         {'data': '{"timestamp":1599324732.767, \
                "receipt_timestamp":1599324732.9256856,\
                "delta":true, \
                "bid":{"338.9":0.06482,"338.67":3.95535}, \
                "ask":{"339.12":2.47578,"339.13":6.43172} \
               }'
         })]

         
def book_flatten(book: dict, timestamp: float, receipt_timestamp: float, delta: bool) -> list:
    """
    Takes book and returns a list of dict, where each element in the list
    is a dictionary with a single row of book data.

    """
    ret = []
    for side in ('bid', 'ask'):
        for price, data in book[side].items():
            ret.append({'side': side, 'price': price, 'size': data,
                        'timestamp': timestamp,
                        'receipt_timestamp': receipt_timestamp,
                        'delta': delta})
    return ret
         
def read(data, dtype='l2_book', pair='AAPL-USD'):       
    key = f'{dtype}-{pair}'
    if len(data) == 0:
        return []
    print("{!s}: Read {!s} messages from Redis".format(key, len(data)))
    ret = []
    ids=dict()
    last_id=defaultdict(list)

    # 1/Start - Lines added for debug
    # Retrieve the last data before the app freezes
    dump_list='./dump_list.data'
    try:
        os.remove(dump_list)
    except OSError:
        pass
    with open(dump_list, 'wb') as filehandle:
        # store the data as binary data stream
        pickle.dump(data, filehandle)
    dump_loop = './dump_loop.txt'
    total_number = len(data)
    counter=1
    # 1/End

    # The mysterious loop aka Bermuda Triangle
    for update_id, update in data:
        #2/Start - Lines added for debug
        try:
            os.remove(dump_loop)
        except OSError:
            pass
        with open(dump_loop, 'w') as filehandle:
            filehandle.write('Starting new loop for item {!s} over {!s}.\n'.format(counter, total_number))
        #2/End
        if dtype in {'l2_book'}:
            update = json.loads(update['data'])
            update = book_flatten(update, update['timestamp'], update['receipt_timestamp'], update['delta'])
            for u in update:
                for k in ('size', 'amount', 'price', 'timestamp', 'receipt_timestamp'):
                    if k in u:
                        u[k] = float(u[k])
            ret.extend(update)
        elif dtype in {'trades'}:
            for k in ('size', 'amount', 'price', 'timestamp', 'receipt_timestamp', 'bid', 'ask'):
                if k in update:
                    update[k] = float(update[k])
            ret.append(update)
        ids[key] = update_id
        #3/Start - Lines added for debug
        with open(dump_loop, 'a') as filehandle:
            filehandle.write('Loop finished.\n')  
        counter+=1
        #3/End

    last_id[key] = ids[key][-1]
    return ids, last_id, ret

Solution

  • Ok, so it was suggested that I have a look at syslog (being quite a newbie, I did not know about this) and 'voilà'! After a new test, the app my_app crashed at 17:45. Here is what syslog says at 17:45:

    Sep  6 17:45:12 cs1 kernel: [67093.681124] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/session-32.scope,task=cryp>
    Sep  6 17:45:12 cs1 kernel: [67093.681180] Out of memory: Killed process 15754 (my_app) total-vm:1752588kB, anon-rss:1332040kB, file-rss:32kB, shmem-rss:0kB, UID:0 pgtables:3144kB oom_sco>
    Sep  6 17:45:12 cs1 kernel: [67093.764851] oom_reaper: reaped process 15754 (my_app), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    

    So no wonder Python is not telling me anything: its process was simply killed by the kernel's OOM killer...
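For future debugging of this kind, memory growth can also be watched from inside the process with `resource.getrusage`. A sketch (note the unit caveat: on Linux `ru_maxrss` is reported in KiB, on macOS in bytes; the 5-million-element list is just a sizeable test allocation):

```python
import resource

def peak_rss_kib():
    # Peak resident set size of this process so far (KiB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kib()
big = list(range(5_000_000))   # allocate well over 100 MB of int objects
after = peak_rss_kib()
print(f'peak RSS grew by ~{(after - before) / 1024:.0f} MiB')
```

Logging this value once per loop iteration (with a flushed write) would have shown the memory climbing toward the OOM kill.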