I have a dataframe with the following structure:
date position quantity Unit
0 2015-01-01 00:00:00 1 720 MAW
Based on the conditions I set I wish to reorder dates during the daylight savings time in October (1 hour gets added here) based on the "position" column. So I subseted the dataframe and got the index values back:
indexData = data[(data.date.dt.hour >= 2) & (data.date.dt.hour < 3) & (data.date.dt.day == 25) & (data.date.dt.month == 10) & (data.date.dt.year == 2015)].index
So when I do this I can see that the range is ordered correctly (but not yet replaced in the dataframe):
data.iloc[indexData[0]:(indexData[-1])+1].sort_values("position")
date position quantity Unit
28521 2015-10-25 02:00:00 25305 420 MAW
28523 2015-10-25 02:15:00 25306 418 MAW
28524 2015-10-25 02:30:00 25307 415 MAW
28526 2015-10-25 02:45:00 25308 415 MAW
28520 2015-10-25 02:00:00 25309 410 MAW
28522 2015-10-25 02:15:00 25310 407 MAW
28525 2015-10-25 02:30:00 25311 399 MAW
28527 2015-10-25 02:45:00 25312 378 MAW
BUT when I try to assign this range to the original data drame it does not change:
data.iloc[indexData[0]:(indexData[-1])+1] = data.iloc[indexData[0]:(indexData[-1])+1].sort_values("position")
# here I check the range if updated
data[(data.date.dt.hour >= 2) & (data.date.dt.hour < 3) & (data.date.dt.day == 25) & (data.date.dt.month == 10) & (data.date.dt.year == 2015)]
date position quantity Unit
28520 2015-10-25 02:00:00 25309 410 MAW
28521 2015-10-25 02:00:00 25305 420 MAW
28522 2015-10-25 02:15:00 25310 407 MAW
28523 2015-10-25 02:15:00 25306 418 MAW
28524 2015-10-25 02:30:00 25307 415 MAW
28525 2015-10-25 02:30:00 25311 399 MAW
28526 2015-10-25 02:45:00 25308 415 MAW
28527 2015-10-25 02:45:00 25312 378 MAW
Any ideas where I am going wrong or should try?
Let's assume we have data like the following and we want to sort by position
rows only for October (basically we want to swap lines 4 and 5):
date position quantity Unit
0 2015-09-25 02:15:00 25310 407 MAW
1 2015-09-25 02:00:00 25309 410 MAW
2 2015-09-25 02:30:00 25311 399 MAW
3 2015-09-25 02:45:00 25312 378 MAW
4 2015-10-25 02:15:00 25306 418 MAW
5 2015-10-25 02:00:00 25305 420 MAW
6 2015-10-25 02:30:00 25307 415 MAW
7 2015-10-25 02:45:00 25308 415 MAW
Choosing October with some other filters you applied (basically using your logic here):
subset_index = data[(data.date.dt.hour >= 2) & (data.date.dt.hour < 3) & (data.date.dt.day == 25) & (data.date.dt.month == 10) & (data.date.dt.year == 2015)].index.values
subset_index
is array([4, 5, 6, 7])
- these are rows we want to sort.
Having extracted index values to sort, you have two options:
Option 1: keep the index and reassign values in the right order(right index).
Just assign the values(numpy array) in the correct order to a dataframe subset:
data.loc[subset_index] = data.loc[subset_index].sort_values("position").values
The result is
date position quantity Unit
0 2015-09-25 02:15:00 25310 407 MAW
1 2015-09-25 02:00:00 25309 410 MAW
2 2015-09-25 02:30:00 25311 399 MAW
3 2015-09-25 02:45:00 25312 378 MAW
4 2015-10-25 02:00:00 25305 420 MAW
5 2015-10-25 02:15:00 25306 418 MAW
6 2015-10-25 02:30:00 25307 415 MAW
7 2015-10-25 02:45:00 25308 415 MAW
As you can see only October rows have been sorted by position
, September rows 0 and 1 were left unsorted.
Option 2: sort the index in the right order and reindex the dataframe.
Create a new sorted index order for this subset
reordered_subset_index = data.loc[subset_index].sort_values("position").index.values
reordered_subset_index
is array([5, 4, 6, 7])
. Now we need to create a correct index order for the whole dataframe:
new_index = data.index.values
new_index[subset_index] = new_index[reordered_subset_index]
new_index
is array([0, 1, 2, 3, 5, 4, 6, 7])
. This index defines the correct order of rows in our initial dataframe.
So now we can reindex our dataframe and just sort it by our new index:
data.reindex(new_index).sort_index()
The result is the same as in Option 1.