
Hypothesis and empty-ish DataFrames


I'm using Hypothesis to test dataframes, and when they're "empty-ish" I'm getting some unexpected behavior.

In the example below, I have a DataFrame of all NaNs, and it's being treated as a NoneType object rather than a DataFrame (so it has no notnull() attribute):

Falsifying example: test_merge_csvs_properties(input_df_dict={'googletrend.csv':    file  week  trend
 0   NaN   NaN    NaN
 1   NaN   NaN    NaN
 2   NaN   NaN    NaN
 3   NaN   NaN    NaN
 4   NaN   NaN    NaN
 5   NaN   NaN    NaN}
<snip>
Traceback (most recent call last):
  File "/home/chachi/Capstone-SalesForecasting/tests/test_make_dataset_with_composite.py", line 285, in test_merge_csvs_properties
    input_dataframe, df_dict = make_dataset.merge_csvs(input_df_dict)
  File "/home/chachi/Capstone-SalesForecasting/tests/../src/data/make_dataset.py", line 238, in merge_csvs
    if dfs_dict['googletrend.csv'].notnull().any().any():
AttributeError: 'NoneType' object has no attribute 'notnull'

Compare this to an IPython session, where a DataFrame of all NaNs is still a DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> tester = pd.DataFrame({'test': [np.nan]})
>>> tester
   test
0   NaN
>>> tester.notnull().any().any()
False

I'm explicitly testing for notnull() to allow for all sorts of pathological examples. Any suggestions?


Solution

  • It looks like that value in input_df_dict has somehow ended up as None instead of a dataframe. Can you post the full test you're using, or at least the function definition and strategy? The traceback alone doesn't have enough information to tell what's happening. Quick things to check:

    • Can the strategy generate None instead of a dataframe here? If so, there's no mystery because Hypothesis is reporting that it can trigger an AttributeError.
    • If not, can you write a simpler test with only the logic and strategy for this dataframe?
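
For the second check, a stripped-down test might look roughly like the sketch below. The strategy, the has_any_value helper and the test name are all hypothetical stand-ins, since the real test and merge_csvs weren't posted; only the column names and the notnull().any().any() check come from the question. It assumes Hypothesis is installed with its pandas extra.

```python
import pandas as pd
from hypothesis import given, settings, strategies as st
from hypothesis.extra.pandas import column, data_frames

# Hypothetical strategy: three float columns named as in the falsifying example.
googletrend_frames = data_frames(
    columns=[
        column("file", dtype=float),
        column("week", dtype=float),
        column("trend", dtype=float),
    ]
)

# Hypothetical dict strategy. If the real one is closer to
# st.none() | googletrend_frames, then None is a legitimate generated value
# and the AttributeError is a genuine finding.
input_df_dicts = st.fixed_dictionaries({"googletrend.csv": googletrend_frames})


def has_any_value(dfs_dict):
    """Stand-in for the check that fails inside merge_csvs in the traceback."""
    return bool(dfs_dict["googletrend.csv"].notnull().any().any())


@settings(max_examples=25, deadline=None)
@given(input_df_dicts)
def test_googletrend_is_always_a_dataframe(input_df_dict):
    df = input_df_dict["googletrend.csv"]
    # If this assertion ever fails, the strategy itself can produce a non-DataFrame.
    assert isinstance(df, pd.DataFrame)
    # The check must work for empty and all-NaN frames too.
    assert isinstance(has_any_value(input_df_dict), bool)
```

If a test like this passes but the full test still fails, the None is probably being introduced somewhere between the strategy and the notnull() call (for example, by a step that reassigns the dict entry), rather than by Hypothesis itself.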