python pandas deep-copy shallow-copy

Why should I make a *shallow* copy of a dataframe?


Related to "why should I make a copy of a data frame in pandas".

I noticed this in the popular backtesting library, on line 631:

def __init__(self, data: pd.DataFrame):
    data = data.copy(False)

What's the purpose of such a copy?


Solution

  • A shallow copy lets you

    1. access the frame's data without copying it (saving memory and time)
    2. modify the frame's structure without the change showing up in the original dataframe

    In backtesting, the developer converts the index to datetime format (line 640) and adds a new 'Volume' column filled with np.nan values if the dataframe doesn't already have one. Because the constructor works on a shallow copy, those changes are not reflected in the original dataframe.

    Example

    >>> a = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['i', 's'])
    >>> b = a.copy(False)
    >>> a
        i  s
     0  1  a
     1  2  b
    >>> b
        i  s
     0  1  a
     1  2  b
    >>> b.index = pd.to_datetime(b.index)
    >>> b['volume'] = 0
    >>> b
                                   i  s  volume
    1970-01-01 00:00:00.000000000  1  a       0
    1970-01-01 00:00:00.000000001  2  b       0
    >>> a
        i  s
     0  1  a
     1  2  b
    

    Of course, if you don't make a copy at all and work on the original object directly, those changes to the dataframe's structure will show up in the original.
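
To make the contrast concrete, here is a minimal sketch of the pattern described above, followed by the no-copy case. The helper name `prepare` and the sample data are hypothetical and only illustrate the idea; this is not the actual code from backtesting.

```python
import numpy as np
import pandas as pd


def prepare(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: normalize a frame without touching the caller's object."""
    data = data.copy(deep=False)  # new DataFrame object; the values are not duplicated
    if not isinstance(data.index, pd.DatetimeIndex):
        data.index = pd.to_datetime(data.index)  # replaces the index on the copy only
    if 'Volume' not in data:
        data['Volume'] = np.nan  # adds the column to the copy only
    return data


raw = pd.DataFrame({'Close': [10.0, 11.0]}, index=['2021-01-04', '2021-01-05'])
prepared = prepare(raw)
print(list(prepared.columns))  # ['Close', 'Volume']
print(list(raw.columns))       # ['Close']

# Plain assignment makes no copy at all: both names refer to the same object.
alias = raw
alias['Volume'] = np.nan
print(list(raw.columns))       # ['Close', 'Volume']
```

The shallow copy gives the function its own DataFrame object to restructure, so the caller's frame keeps its original index and columns, while the plain assignment changes the caller's frame directly.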