Working with HUGE lists in Python


How can I manage a huge list of 100+ million strings? How do I even begin to work with a list that large?

Example of a large list:

cards = [
            "2s","3s","4s","5s","6s","7s","8s","9s","10s","Js","Qs","Ks","As",
            "2h","3h","4h","5h","6h","7h","8h","9h","10h","Jh","Qh","Kh","Ah",
            "2d","3d","4d","5d","6d","7d","8d","9d","10d","Jd","Qd","Kd","Ad",
            "2c","3c","4c","5c","6c","7c","8c","9c","10c","Jc","Qc","Kc","Ac",
        ]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print(str(len(hands)) + " hand combinations in texas holdem poker")

Solution

  • With lots and lots of memory. Python lists and strings are actually reasonably efficient, so provided you've got the memory, it shouldn't be an issue.

    That said, if what you're storing are specifically poker hands, you can definitely come up with more compact representations. For example, you can use one byte to encode each card, so the seven cards of a hand take seven bytes, which is less than a single 64-bit int. You could then store these in a NumPy array, which would be significantly more efficient than a Python list.

    For example:

    >>> import itertools
    >>> import numpy as np
    >>> # Map each card string to a small integer (0..51).
    >>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
    >>> # One row of seven int8 values per hand.
    >>> hands = np.zeros((133784560, 7), dtype=np.int8) # 133784560 == 52C7
    >>> for num, hand in enumerate(itertools.combinations(cards, 7)):
    ...     hands[num] = [cards_to_bytes[card] for card in hand]
    

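    And to speed up that last line a bit: hands[num] = map(cards_to_bytes.__getitem__, hand) (on Python 3, where map returns an iterator, wrap it as list(map(cards_to_bytes.__getitem__, hand)))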

    This will only require 7 * 133784560 bytes = ~1 GB of memory… And that could be cut down by packing the cards more tightly: a card needs only 6 bits, so four cards fit into three bytes (I don't know the syntax for doing that off the top of my head…)
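
One way the tighter packing could look, as a rough sketch rather than a tuned implementation. It assumes cards are already encoded as integers 0..51 (so each needs 6 bits), and the helper names `build_hands`, `pack_hands` and `unpack_hands` are made up for illustration. It also swaps the per-row loop for `np.fromiter`, which fills the array straight from the `itertools.combinations` iterator. Seven cards at 6 bits each come to 42 bits, so each hand fits in 6 bytes instead of 7.

```python
import itertools
from math import comb

import numpy as np

BITS_PER_CARD = 6  # 52 distinct cards fit in 6 bits because 2**6 = 64 >= 52


def build_hands(deck_size, hand_size):
    """Return an (n_hands, hand_size) uint8 array of card indices."""
    n_hands = comb(deck_size, hand_size)
    flat = np.fromiter(
        itertools.chain.from_iterable(
            itertools.combinations(range(deck_size), hand_size)
        ),
        dtype=np.uint8,
        count=n_hands * hand_size,
    )
    return flat.reshape(n_hands, hand_size)


def pack_hands(hands):
    """Pack each row of card indices into the fewest whole bytes possible."""
    hand_size = hands.shape[1]
    n_bytes = (hand_size * BITS_PER_CARD + 7) // 8  # 6 bytes for 7 cards
    if n_bytes > 8:
        raise ValueError("hand does not fit in a 64-bit intermediate value")
    packed = np.zeros(len(hands), dtype=np.uint64)
    for column in hands.T:
        # Shift what we have so far and append the next card in the low bits.
        packed = (packed << np.uint64(BITS_PER_CARD)) | column.astype(np.uint64)
    # Force little-endian layout so the low-order bytes come first, then
    # keep only the bytes that can ever be non-zero.
    as_bytes = packed.astype("<u8").view(np.uint8).reshape(-1, 8)
    return as_bytes[:, :n_bytes].copy()


def unpack_hands(packed_bytes, hand_size):
    """Invert pack_hands: recover the (n_hands, hand_size) card indices."""
    n_hands, n_bytes = packed_bytes.shape
    padded = np.zeros((n_hands, 8), dtype=np.uint8)
    padded[:, :n_bytes] = packed_bytes
    values = padded.view("<u8").ravel().astype(np.uint64)
    mask = np.uint64((1 << BITS_PER_CARD) - 1)
    hands = np.empty((n_hands, hand_size), dtype=np.uint8)
    for i in range(hand_size):
        # The first card was shifted the most, so it sits in the highest bits.
        shift = np.uint64(BITS_PER_CARD * (hand_size - 1 - i))
        hands[:, i] = ((values >> shift) & mask).astype(np.uint8)
    return hands


# Small demonstration on a 10-card deck so it runs instantly.
demo = build_hands(10, 7)
demo_packed = pack_hands(demo)
print(demo.shape, demo_packed.shape)
```

For the full deck this would bring storage down from 7 * 133784560 bytes (about 0.94 GB) to 6 * 133784560 bytes (about 0.8 GB), at the cost of an unpacking step whenever the individual cards are needed, so whether it is worth it depends on how the hands are used afterwards.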