I will ask this question by using pd.DataFrames because the problem emerged from working with them. But it may be generalised to mutables in Python.
I want to create a list of DataFrames with one value differing. At the moment I do:
import numpy as np
import pandas as pd

data = pd.DataFrame(np.full((2, 2), 0), columns=['A', 'B'])
list_of_frames = []
for i in range(3):
    tmp = data.copy()
    tmp.loc[0, 'A'] = i
    list_of_frames.append(tmp)
I really would like to write this as a list comprehension. For example like this:
[data.loc_set_copy([0, 'A'], i) for i in range(3)]
Since I am currently developing my own module with classes on top of pd.DataFrame, I thought about implementing this method in my own class. My class is composed around pd.DataFrame and does not inherit from pd.DataFrame.
It provides wrappers for a lot of DataFrame methods and especially for loc and iloc, which behave in the same way as you know from pd.DataFrames.
Now I have two solutions. The first is a plain method:
def loc_set_copy(self, key, value):
    new = self.copy()
    new.loc[key[0], key[1]] = value
    return new
This allows:
[instance_of_my_class.loc_set_copy([0, 'A'], i) for i in range(3)]
The problem is that slices are not supported. So if I want to change a whole column with:
[instance_of_my_class.loc_set_copy([:, 'A'], i) for i in range(3)]
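I get a SyntaxError.
The second solution is to define the following helper class:
class _Loc_Set_Copy():
    def __init__(self, molecule):
        # Keep a reference to the wrapped instance of my_class.
        self.data = molecule
    def __getitem__(self, key):
        # key is the tuple (row_key, column_key, value).
        new = self.data.copy()
        new.loc[key[0], key[1]] = key[2]
        return new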
In my class definition I have:
class my_class():
    def __init__(self):
        self.loc_set_copy = _Loc_Set_Copy(self)
Now I can use:
[instance_of_my_class.loc_set_copy[:, 'A', i] for i in range(3)]
I know that this is an abuse of syntax. Is there any other way to do this or should I just rely on the for loop in the beginning?
Sure, you can pass a slice; just use a slice object:
>>> [loc_set_copy(data, [slice(None), 'A'], i) for i in range(3)]
[ A B
0 0 0.0
1 0 0.0, A B
0 1 0.0
1 1 0.0, A B
0 2 0.0
1 2 0.0]
More prettily:
>>> from pprint import pprint
>>> pprint([loc_set_copy(data, [slice(None), 'A'], i) for i in range(3)])
[ A B
0 0 0.0
1 0 0.0,
A B
0 1 0.0
1 1 0.0,
A B
0 2 0.0
1 2 0.0]
>>>
Note:
>>> data.loc[:, 'A']
0 0.0
1 0.0
Name: A, dtype: float64
>>> data.loc[slice(None), 'A']
0 0.0
1 0.0
Name: A, dtype: float64
Essentially, the slice notation is syntactic sugar for passing slice objects to __getitem__:
>>> x = list(range(22))
>>> x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
>>> x[0:10:2]
[0, 2, 4, 6, 8]
>>> x[slice(0,10,2)]
[0, 2, 4, 6, 8]
>>> x.__getitem__(slice(0,10,2))
[0, 2, 4, 6, 8]
>>>
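This also explains the SyntaxError from the question: the colon shorthand is only parsed inside subscript brackets, so a list literal such as [:, 'A'] is rejected, while slice(None) is an ordinary expression that is valid anywhere. A minimal sketch using the standard-library ast module (the helper name is_valid_expression is made up for illustration):

```python
import ast


def is_valid_expression(source):
    """Return True if `source` parses as a Python expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


# Colon shorthand inside a subscript: accepted.
subscript_ok = is_valid_expression("x[:, 'A']")
# Colon shorthand inside a list literal: rejected by the parser.
list_literal_ok = is_valid_expression("[:, 'A']")
# An explicit slice object is a normal expression, so it works in a list.
explicit_slice_ok = is_valid_expression("[slice(None), 'A']")

print(subscript_ok, list_literal_ok, explicit_slice_ok)
```

Only the middle case fails, which is exactly the call that raised the error in the question.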
Note, given the above, you can simplify your method to:
>>> def loc_set_copy(self, key, value):
...     new = self.copy()
...     new.loc[key] = value
...     return new
...
This works as long as you are careful to pass tuples for the key parameter:
>>> pprint([loc_set_copy(data, (0, 'A'), i) for i in range(3)])
[ A B
0 0.0 0.0
1 0.0 0.0,
A B
0 1.0 0.0
1 0.0 0.0,
A B
0 2.0 0.0
1 0.0 0.0]
>>> pprint([loc_set_copy(data, (slice(None), 'A'), i) for i in range(3)])
[ A B
0 0 0.0
1 0 0.0,
A B
0 1 0.0
1 1 0.0,
A B
0 2 0.0
1 2 0.0]
>>>
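If spelling out slice(None) feels noisy, pandas provides pd.IndexSlice, which hands back whatever key it is subscripted with, so the colon syntax can be used to build the tuple. A minimal sketch, assuming a float-valued frame like the one in the output above (the variable names key and frames are only illustrative):

```python
import numpy as np
import pandas as pd


def loc_set_copy(frame, key, value):
    """Return a copy of `frame` with `frame.loc[key]` set to `value`."""
    new = frame.copy()
    new.loc[key] = value
    return new


data = pd.DataFrame(np.zeros((2, 2)), columns=["A", "B"])

# pd.IndexSlice[...] returns the key it received, so this is the
# tuple (slice(None, None, None), 'A').
key = pd.IndexSlice[:, "A"]

# One modified copy per value; `data` itself is left untouched.
frames = [loc_set_copy(data, key, i) for i in range(3)]
```

This keeps loc_set_copy a plain method with a separate value argument while still letting the call site use slice notation.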
The following should make perfect sense now:
>>> class A:
...     def __getitem__(self, key):
...         print(type(key))
...         print(key)
...
>>> a = A()
>>> a[1]
<class 'int'>
1
>>> a[[1]]
<class 'list'>
[1]
>>> a[object()]
<class 'object'>
<object object at 0x1003932e0>
>>>
>>> a[:1]
<class 'slice'>
slice(None, 1, None)
>>> a[:]
<class 'slice'>
slice(None, None, None)
>>> a[:,:,1:4]
<class 'tuple'>
(slice(None, None, None), slice(None, None, None), slice(1, 4, None))
>>> a[:,:,[1,2]]
<class 'tuple'>
(slice(None, None, None), slice(None, None, None), [1, 2])
>>> a[:,object():,[1,2]]
<class 'tuple'>
(slice(None, None, None), slice(<object object at 0x1003932e0>, None, None), [1, 2])
>>>
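Since a comma-separated subscript arrives as one tuple, the helper class from the question does not need to index key[0], key[1] and key[2] by hand: it can split off the last element as the value and pass the rest to loc unchanged. The sketch below is one possible way to write that, not a recommendation over the plain method; the names LocSetCopy and MyClass are hypothetical stand-ins for the question's classes, and the wrapper is assumed to hold its DataFrame in an attribute.

```python
import numpy as np
import pandas as pd


class LocSetCopy:
    """Hypothetical helper used as obj.loc_set_copy[row_key, col_key, value]."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, key):
        # The whole subscript is a single tuple; the last item is the value,
        # everything before it is the key forwarded to .loc.
        *loc_key, value = key
        new = self._frame.copy()
        new.loc[tuple(loc_key)] = value
        return new


class MyClass:
    """Hypothetical wrapper composed around a DataFrame (no inheritance)."""

    def __init__(self, frame):
        self._frame = frame
        self.loc_set_copy = LocSetCopy(frame)


data = pd.DataFrame(np.zeros((2, 2)), columns=["A", "B"])
wrapped = MyClass(data)

# Slices and scalar labels both work, because .loc receives the same
# tuple it would get from ordinary subscript syntax.
whole_column = [wrapped.loc_set_copy[:, "A", i] for i in range(3)]
single_cell = [wrapped.loc_set_copy[0, "A", i] for i in range(3)]
```

Note that this sketch returns plain DataFrames; a real wrapper would probably re-wrap the result in its own class.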