python pandas python-unittest

How to use a pandas data frame in a unit test


I am developing a set of Python scripts that pre-process a dataset and then produce a series of machine learning models using scikit-learn. I would like to write unit tests for the data pre-processing functions, using a small test pandas DataFrame for which I can work out the expected answers and use them in assert statements.

I cannot get the tests to load the DataFrame and pass it to the test methods via self. My code looks something like this:

def setUp(self):
    TEST_INPUT_DIR = 'data/'
    test_file_name =  'testdata.csv'
    try:
        data = pd.read_csv(INPUT_DIR + test_file_name,
            sep = ',',
            header = 0)
    except IOError:
        print 'cannot open file'
    self.fixture = data

def tearDown(self):
    del self.fixture

def test1(self):    
    self.assertEqual(somefunction(self.fixture), somevalue)

if __name__ == '__main__':
    unittest.main()

Thanks for the help.


Solution

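  • pandas ships with testing utilities such as assert_frame_equal, which compares two DataFrames and raises an AssertionError if they differ.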

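    import unittest

    import pandas as pd
    # pandas.testing is the public location of this helper;
    # pandas.util.testing is deprecated and was removed in pandas 2.0.
    from pandas.testing import assert_frame_equal

    class DFTests(unittest.TestCase):
        """Unit tests that run against a small fixture DataFrame."""

        def setUp(self):
            """Load the fixture CSV before each test."""
            TEST_INPUT_DIR = 'data/'
            test_file_name = 'testdata.csv'
            # Use the directory name defined above (the original referenced an
            # undefined INPUT_DIR). A missing file is left to raise, so the test
            # errors out clearly instead of continuing with no data.
            self.fixture = pd.read_csv(TEST_INPUT_DIR + test_file_name,
                                       sep=',',
                                       header=0)

        def test_dataFrame_constructedAsExpected(self):
            """Test that the dataframe read in equals what you expect."""
            foo = pd.DataFrame()  # placeholder: build the frame you expect here
            assert_frame_equal(self.fixture, foo)

    if __name__ == '__main__':
        unittest.main()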
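
If the goal is a small frame whose answers you can work out by hand, one option is to build the fixture in memory instead of reading a CSV. The following is only a minimal sketch: `fill_missing_ages`, the column names, and the values are hypothetical stand-ins for your own pre-processing function and data, not anything taken from the question.

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal


def fill_missing_ages(df):
    """Hypothetical pre-processing step: replace missing ages with the column mean."""
    out = df.copy()  # work on a copy so the caller's frame is left untouched
    out["age"] = out["age"].fillna(out["age"].mean())
    return out


class PreprocessingTests(unittest.TestCase):
    def setUp(self):
        # Small hand-written fixture, so the expected result can be computed by hand.
        self.fixture = pd.DataFrame({"name": ["a", "b", "c"],
                                     "age": [20.0, None, 40.0]})

    def test_fill_missing_ages(self):
        # Mean of 20 and 40 is 30, so the missing value should become 30.
        expected = pd.DataFrame({"name": ["a", "b", "c"],
                                 "age": [20.0, 30.0, 40.0]})
        assert_frame_equal(fill_missing_ages(self.fixture), expected)

    def test_input_is_not_modified(self):
        fill_missing_ages(self.fixture)
        # The fixture should still contain its missing value afterwards.
        self.assertTrue(self.fixture["age"].isna().any())
```

Saved in a test module, this can be run with `python -m unittest`. Because `setUp` runs before every test method, each test gets a fresh copy of the fixture, so no `tearDown` is needed here.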