
Pandas Index example with repeated entries using hypothesis


I want to generate a pandas.Index with repeated entries, like this:

>>> pd.Index(np.random.choice(range(5), 10))
Int64Index([3, 0, 4, 1, 1, 3, 4, 3, 2, 0], dtype='int64')

So I wrote the following strategy:

from hypothesis.extra.pandas import indexes
from hypothesis.strategies import sampled_from

st_idx = indexes(
    elements=sampled_from(range(5)),
    min_size=10,
    max_size=10
)

However, when I try to draw from this strategy, I get the following error:

>>> st_idx.example()
[...]
Unsatisfiable: Unable to satisfy assumptions of condition.

During handling of the above exception, another exception occurred:
[...]
NoExamples: Could not find any valid examples in 100 tries

After some experimentation, I realised it only works if min_size is less than or equal to the number of choices (<= 5 in this case). However, that means I'll never get repeated entries!

What am I doing wrong?

EDIT: Apparently the indexes strategy has unique set to True by default, which is why a fixed size of 10 cannot be satisfied with only 5 distinct values. Setting it to False, as mentioned in the answer below, also works with my approach.
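
For reference, here is a minimal sketch of the strategy from the question with unique=False, wrapped in a small property check. The function name check_index and the max_examples value are my own choices for illustration, not anything Hypothesis requires. Since ten entries are drawn from five possible values, every generated index should contain repeats.

```python
import pandas as pd
from hypothesis import given, settings
from hypothesis.extra.pandas import indexes
from hypothesis.strategies import sampled_from

# Same strategy as in the question, but with uniqueness switched off.
st_idx = indexes(
    elements=sampled_from(range(5)),
    min_size=10,
    max_size=10,
    unique=False,
)


@given(st_idx)
@settings(max_examples=25)  # illustrative value, kept small for speed
def check_index(idx):
    assert isinstance(idx, pd.Index)
    assert len(idx) == 10
    assert set(idx) <= set(range(5))
    # Ten entries from five possible values must repeat (pigeonhole).
    assert idx.has_duplicates


# Calling the decorated function runs the property over generated examples.
check_index()
```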


Solution

  • If the resulting index does not need to follow any particular distribution, one way to get what you need is to draw the elements with the integers strategy and pass unique=False to the indexes strategy so that duplicates are allowed:

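    import hypothesis.strategies as st
    from hypothesis.extra.pandas import indexes

    st_idx = indexes(
        # integers() bounds are inclusive, so this draws from 0..5
        elements=st.integers(min_value=0, max_value=5),
        min_size=10, max_size=10,
        unique=False,  # allow repeated entries (the default is True)
    )

    st_idx.example()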

    Producing:

    Int64Index([4, 1, 3, 4, 2, 5, 0, 5, 0, 0], dtype='int64')
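
For comparison, here is a sketch of an alternative that is not part of the answer above: the lists strategy defaults to unique=False, so mapping a list strategy through pd.Index should also give indexes with repeated entries. The names st_idx_from_list and check_list_based_index are hypothetical, chosen only for this illustration.

```python
import pandas as pd
from hypothesis import given, settings
from hypothesis import strategies as st

# lists() allows duplicates by default; .map() converts each list to an Index.
st_idx_from_list = st.lists(
    st.sampled_from(range(5)), min_size=10, max_size=10
).map(pd.Index)


@given(st_idx_from_list)
@settings(max_examples=25)  # illustrative value, kept small for speed
def check_list_based_index(idx):
    assert isinstance(idx, pd.Index)
    assert len(idx) == 10
    # Ten entries from five possible values must repeat (pigeonhole).
    assert idx.has_duplicates


check_list_based_index()
```

The indexes strategy is probably still the better fit when you also need control over dtype or the index name. The list-based version only shows that the uniqueness default is the relevant difference.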