python pandas random sample

Python Randomly Select Rows Until Criteria is Met


I have a dataframe with a few ID columns and a Money column, like this:

Id1     Id2     Id3     Money
1       10      13      10000
2       15      12      12500
3       20      11      60000

I need a script to randomly select rows until the total of Money reaches $80M. I'm assuming a while loop such as...

while sum(money) < 80000000:
    df.sample()

Solution

  • To rephrase your question a bit, it seems you're looking for a random sample of rows whose total Money stays below 80000000. One way to do that is to shuffle the rows with .sample() and then keep only the rows where the running total from .cumsum() is still under the limit:

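    >>> # Shuffle every row (a full-size sample, drawn without replacement)
    >>> reordered = df.sample(n=df.shape[0])
    >>> # Keep rows while the running total of Money is below the limit
    >>> lim = reordered[reordered.Money.cumsum() < 80000000]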
    

    Because .sample() draws without replacement by default, no row is picked twice.
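For something you can paste and run, here is a minimal sketch of the same shuffle-then-cumsum idea, using the three rows from the question. The helper name `sample_below`, the `seed` argument, and the scaled-down limit of 30000 are my own assumptions for illustration. It also assumes Money is never negative, so the running total only grows and the kept rows form one unbroken run from the top of the shuffle.

```python
import pandas as pd

# Sample data from the question; the limit used below is scaled down to fit it.
df = pd.DataFrame({
    "Id1": [1, 2, 3],
    "Id2": [10, 15, 20],
    "Id3": [13, 12, 11],
    "Money": [10000, 12500, 60000],
})

def sample_below(frame, limit, column="Money", seed=None):
    """Shuffle all rows, then keep the leading rows whose running total is below limit.

    Hypothetical helper; assumes the values in `column` are non-negative.
    """
    # frac=1 returns every row in random order (no replacement by default).
    shuffled = frame.sample(frac=1, random_state=seed)
    # The boolean mask is True until the cumulative sum reaches the limit.
    return shuffled[shuffled[column].cumsum() < limit]

picked = sample_below(df, limit=30000, seed=0)
print(picked)
print("Total:", picked["Money"].sum())
```

Note that the result can be empty if the first shuffled row already meets or exceeds the limit on its own.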

    Shuffling the whole frame is perhaps not as memory-efficient as taking rows one by one, but it should do the trick for data of a reasonable size.
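If you do want to take rows one at a time, closer to the while loop in the question, something like the following might work. This is only a sketch: `sample_until` and its parameters are names I made up, and it behaves slightly differently from the cumsum version, since it keeps the row that pushes the total to or past the target rather than stopping just short of it. It shuffles integer positions with NumPy instead of copying the frame, and only builds the result at the end.

```python
import numpy as np
import pandas as pd

def sample_until(frame, target, column="Money", seed=None):
    """Draw rows one by one, without replacement, until the total reaches target.

    Hypothetical helper. Returns every row if the target is never reached.
    """
    rng = np.random.default_rng(seed)
    values = frame[column].to_numpy()
    chosen = []  # positional indices of the rows drawn so far
    total = 0
    # A random permutation of positions guarantees each row is drawn at most once.
    for pos in rng.permutation(len(frame)):
        if total >= target:
            break
        chosen.append(pos)
        total += values[pos]
    return frame.iloc[chosen]

# Usage with the question's sample rows and a scaled-down target.
df = pd.DataFrame({
    "Id1": [1, 2, 3],
    "Id2": [10, 15, 20],
    "Id3": [13, 12, 11],
    "Money": [10000, 12500, 60000],
})
result = sample_until(df, target=30000, seed=0)
print(result)
print("Total:", result["Money"].sum())
```

Which variant you want depends on whether "until I hit $80M" means the total should end at or above the target (this loop) or stay strictly below it (the cumsum filter).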