List comprehension several frames


I am asking this question in terms of pd.DataFrame because that is where the problem came up, but it generalises to any mutable object in Python.

I want to create a list of DataFrames with one value differing. At the moment I do:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.full((2, 2), 0), columns=['A', 'B'])
list_of_frames = []
for i in range(3):
    tmp = data.copy()
    tmp.loc[0, 'A'] = i
    list_of_frames.append(tmp)

I really would like to write this as a list comprehension. For example like this:

[data.loc_set_copy([0, 'A'], i) for i in range(3)]

Since I am currently developing my own module with classes on top of pd.DataFrame, I thought about implementing this method in my own class. My class wraps a pd.DataFrame by composition and does not inherit from it.

It provides wrappers for many DataFrame methods, in particular loc and iloc, which behave the same way as they do on a pd.DataFrame.

Now I have two solutions:

Normal Method

def loc_set_copy(self, key, value):
    new = self.copy()
    new.loc[key[0], key[1]] = value
    return new

This allows:

[instance_of_my_class.loc_set_copy([0, 'A'], i) for i in range(3)]

The problem is that slices are not supported. So if I want to change a whole column with:

[instance_of_my_class.loc_set_copy([:, 'A'], i) for i in range(3)]

I get a SyntaxError.

Crazy workaround

I define the following helper class:

class _Loc_Set_Copy():
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        new = self.data.copy()
        new.loc[key[0], key[1]] = key[2]
        return new

In my class definition I have:

class my_class():
    def __init__(self):
        self.loc_set_copy = _Loc_Set_Copy(self)

Now I can use:

[instance_of_my_class.loc_set_copy[:, 'A', i] for i in range(3)]

I know that this is an abuse of syntax. Is there any other way to do this or should I just rely on the for loop in the beginning?


Solution

  • Sure, you can pass a slice: just use a slice object (here loc_set_copy is called as a plain function, with data as its first argument):

    >>> [loc_set_copy(data, [slice(None), 'A'], i) for i in range(3)]
    [   A    B
    0  0  0.0
    1  0  0.0,    A    B
    0  1  0.0
    1  1  0.0,    A    B
    0  2  0.0
    1  2  0.0]
    

    More prettily:

    >>> from pprint import pprint
    >>> pprint([loc_set_copy(data, [slice(None), 'A'], i) for i in range(3)])
    [   A    B
    0  0  0.0
    1  0  0.0,
        A    B
    0  1  0.0
    1  1  0.0,
        A    B
    0  2  0.0
    1  2  0.0]
    >>>
    

    Note that both spellings give the same result:

    >>> data.loc[:, 'A']
    0    0.0
    1    0.0
    Name: A, dtype: float64
    >>> data.loc[slice(None), 'A']
    0    0.0
    1    0.0
    Name: A, dtype: float64
    

    Essentially, the slice notation is syntactic sugar for passing slice objects to __getitem__:

    >>> x = list(range(22))
    >>> x
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
    >>> x[0:10:2]
    [0, 2, 4, 6, 8]
    >>> x[slice(0,10,2)]
    [0, 2, 4, 6, 8]
    >>> x.__getitem__(slice(0,10,2))
    [0, 2, 4, 6, 8]
    >>>
    

    Given the above, you can simplify your method to:

    >>> def loc_set_copy(self, key, value):
    ...     new = self.copy()
    ...     new.loc[key] = value
    ...     return new
    ...
    

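    This works as long as you are careful to pass tuples for the key parameter (a list such as [0, 'A'] would be read as a list of row labels, not as a row/column pair):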

    >>> pprint([loc_set_copy(data, (0, 'A'), i) for i in range(3)])
    [     A    B
    0  0.0  0.0
    1  0.0  0.0,
          A    B
    0  1.0  0.0
    1  0.0  0.0,
          A    B
    0  2.0  0.0
    1  0.0  0.0]
    >>> pprint([loc_set_copy(data, (slice(None), 'A'), i) for i in range(3)])
    [   A    B
    0  0  0.0
    1  0  0.0,
        A    B
    0  1  0.0
    1  1  0.0,
        A    B
    0  2  0.0
    1  2  0.0]
    >>>
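Since the class in the question is composed around a DataFrame rather than inheriting from it, the simplified method only needs copy and loc to exist on the wrapper. Here is a minimal sketch, assuming a hypothetical wrapper called Frame that keeps the DataFrame in a _frame attribute (both names are made up for illustration, and your real class will look different):

```python
import numpy as np
import pandas as pd


class Frame:
    """Hypothetical stand-in for a class composed around a DataFrame."""

    def __init__(self, frame):
        self._frame = frame

    def copy(self):
        # Copy the wrapped DataFrame so the original stays untouched.
        return Frame(self._frame.copy())

    @property
    def loc(self):
        # Delegate label-based indexing to the wrapped DataFrame.
        return self._frame.loc

    def loc_set_copy(self, key, value):
        # key is passed straight through, so it should be a tuple such as
        # (0, 'A') or (slice(None), 'A').
        new = self.copy()
        new.loc[key] = value
        return new


data = Frame(pd.DataFrame(np.full((2, 2), 0), columns=['A', 'B']))
frames = [data.loc_set_copy((slice(None), 'A'), i) for i in range(3)]
```

Each element of frames is an independent copy with column A overwritten, and data itself is left unchanged.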
    

    The following should make perfect sense now:

    >>> class A:
    ...   def __getitem__(self, key):
    ...     print(type(key))
    ...     print(key)
    ...
    >>> a = A()
    >>> a[1]
    <class 'int'>
    1
    >>> a[[1]]
    <class 'list'>
    [1]
    >>> a[object()]
    <class 'object'>
    <object object at 0x1003932e0>
    >>>
    >>> a[:1]
    <class 'slice'>
    slice(None, 1, None)
    >>> a[:]
    <class 'slice'>
    slice(None, None, None)
    >>> a[:,:,1:4]
    <class 'tuple'>
    (slice(None, None, None), slice(None, None, None), slice(1, 4, None))
    >>> a[:,:,[1,2]]
    <class 'tuple'>
    (slice(None, None, None), slice(None, None, None), [1, 2])
    >>> a[:,object():,[1,2]]
    <class 'tuple'>
    (slice(None, None, None), slice(<object object at 0x1003932e0>, None, None), [1, 2])
    >>>
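If you would rather not spell out slice(None) by hand, the same mechanism can build the key for you: an object whose __getitem__ simply returns whatever key it receives turns slice syntax into a tuple you can pass around. pandas ships pd.IndexSlice, which behaves this way; the _KeyMaker class and the name K below are a hypothetical pure-Python stand-in, just to show the idea:

```python
class _KeyMaker:
    """Hypothetical helper: returns the key that subscription hands it."""

    def __getitem__(self, key):
        return key


K = _KeyMaker()

# Slice syntax is legal inside the brackets, and the result is an
# ordinary object that can be passed as a normal function argument.
column_key = K[:, 'A']
cell_key = K[0, 'A']
```

With that in place, something like [loc_set_copy(data, K[:, 'A'], i) for i in range(3)] keeps the familiar colon notation while loc_set_copy stays an ordinary method with a separate value argument.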