python, pandas, doctest

How does one use doctest with a pandas dataframe?


I have a simple function (log_return) in a file called test.py (see below) that I'm trying to test with a doctest.

import pandas as pd

def log_return(df):
    '''Return the log return based on closing prices
    
    >>> df = pd.DataFrame({'Close': [100, 101, 102, 99]}, index = pd.date_range('2022-01-01', periods=4, freq='D'))
    >>> log_return(df)
            Close         r
2022-01-01    100       NaN
2022-01-02    101  0.009950
2022-01-03    102  0.009852
2022-01-04     99 -0.029853
    '''
    df['r'] = np.log(df['Close']).diff()

However, I get the whitespace-related error below when I run the doctest from the command line (e.g. $ python test.py). How can I fix this error?

ValueError: line 5 of the docstring for __main__.log_return has inconsistent leading whitespace: '2022-01-01    100       NaN'

Solution

  • Indent the expected output to the same level as the >>> prompt, like this:

        '''
        ...
    
        >>> log_return(df)
                    Close         r
        2022-01-01    100       NaN
        2022-01-02    101  0.009950
        2022-01-03    102  0.009852
        2022-01-04     99 -0.029853
        '''
    

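    Keep in mind that doctests are meant to look like interactive sessions, so every line of expected output must start at the same indentation as the >>> prompt that produced it. Any extra spaces after that indentation, such as the padding in front of the Close header, are part of the output itself.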

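    Once you fix the indentation, the test will still fail, but that's a separate issue: as posted, the function uses np without importing numpy, and it returns None instead of the DataFrame.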
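
The remaining failure can be addressed with a sketch along these lines. It assumes the failure comes only from what the posted code shows: numpy is never imported as np, and log_return has no return statement. Working on a copy via df.copy() and the doctest.testmod() runner at the bottom are choices made here for completeness, not something taken from the original post.

```python
import doctest

import numpy as np
import pandas as pd


def log_return(df):
    '''Return a copy of df with the log return of the closing prices in column r.

    >>> df = pd.DataFrame({'Close': [100, 101, 102, 99]},
    ...                   index=pd.date_range('2022-01-01', periods=4, freq='D'))
    >>> log_return(df)
                Close         r
    2022-01-01    100       NaN
    2022-01-02    101  0.009950
    2022-01-03    102  0.009852
    2022-01-04     99 -0.029853
    '''
    # Assumption: copy so the caller's DataFrame is left unchanged.
    out = df.copy()
    # Log return = difference of consecutive log prices; the first row has no
    # predecessor, so it is NaN.
    out['r'] = np.log(out['Close']).diff()
    return out


if __name__ == '__main__':
    # Runs every doctest in this module; prints nothing when they all pass.
    doctest.testmod()
```

Saved as test.py, this can be run with python test.py, or with python -m doctest -v test.py to see each example as it is checked.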