We are currently working on a college project. We have been tasked with optimizing the maintenance schedule for repairs on bikes from a bike-sharing service. The bikes can only be rented from and returned to bike docking stations. We need to calculate the idle time, which is defined as follows:
the time period from when the bike with bike id = x was dropped off at station y until any other bike at station y is booked
We implemented our solution as follows:
import pandas as pd

csv_file = '../Data_Cleanup/outCSV/Clean_Metro_Set.csv'
metro = pd.read_csv(csv_file)
metro['start_time'] = pd.to_datetime(metro['start_time'])
metro['end_time'] = pd.to_datetime(metro['end_time'])
metro = metro.sort_values(by='start_time')
metro['idle_time'] = None

BigDict = {
    # station_id: {
    #     bike_id: (transaction_id, end_time)
    # }
}

for i, row in metro.iterrows():
    current_start_time = row["start_time"]
    current_end_time = row["end_time"]
    current_end_station_id = row["end_station_id"]
    current_start_station_id = row["start_station_id"]
    current_bike_id = row["bike_id"]
    current_index = i
    if current_start_station_id in BigDict:
        for bike in list(BigDict[current_start_station_id]):  # Create a copy of the keys
            idle_time = current_start_time - BigDict[current_start_station_id][bike][1]
            metro.at[BigDict[current_start_station_id][bike][0], "idle_time"] = idle_time
            if idle_time.total_seconds() >= 0:
                del BigDict[current_start_station_id][bike]
    if current_end_station_id not in BigDict:
        BigDict[current_end_station_id] = {current_bike_id: (current_index, current_end_time)}
    BigDict[current_end_station_id][current_bike_id] = (current_index, current_end_time)

metro.to_csv('../Data_Cleanup/outCSV/Metro_Set_with_IdleTime.csv')
The input data looks like this:
Expected output:
However, some of the values are not calculated correctly, e.g. error 1.
As you can see, in the first picture there is a row with a negative idle time. Because we sorted the dataframe by start time, we sometimes run into the issue that a transaction in a later row has a start time earlier than the end time of a previous transaction (cf. error 1). In this case the idle time should be updated whenever a transaction meets the following two conditions:
In the error above (cf. screen snippets) this does not happen, and we cannot figure out the reason. Any help would be appreciated.
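To make the overlap easier to see, here is a minimal sketch with two made-up trips (the station ids, bike ids and timestamps are purely illustrative and not taken from the real data set). Bike 2 is rented at station 20 before bike 1 has been dropped off there, so subtracting the drop-off time from the rental time gives a negative value:

```python
import pandas as pd

# Two hypothetical trips, sorted by start_time as in the script above.
trips = pd.DataFrame(
    {
        "bike_id": [1, 2],
        "start_station_id": [10, 20],
        "end_station_id": [20, 10],
        "start_time": pd.to_datetime(["2023-01-01 08:00", "2023-01-01 08:10"]),
        "end_time": pd.to_datetime(["2023-01-01 08:30", "2023-01-01 08:40"]),
    }
).sort_values(by="start_time")

drop_off = trips.loc[0, "end_time"]       # bike 1 reaches station 20 at 08:30
next_rental = trips.loc[1, "start_time"]  # bike 2 leaves station 20 at 08:10

# The rental happened 20 minutes before the drop-off, so the difference is negative.
idle_time = next_rental - drop_off
print(idle_time.total_seconds())  # -1200.0
```

Such a rental should simply not count for that drop-off; only a rental that starts at or after the drop-off time can end the idle period.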
We managed to fix it ourselves with the following tweaks:
BigDict = {
    # station_id: {
    #     transaction_id: end_time
    # }
}

for i, row in metro.iterrows():
    current_start_time = row["start_time"]
    current_end_time = row["end_time"]
    current_end_station_id = row["end_station_id"]
    current_start_station_id = row["start_station_id"]
    current_bike_id = row["bike_id"]
    current_transaction_id = i
    if current_start_station_id in BigDict:
        for transaction in list(BigDict[current_start_station_id]):  # Create a copy of the keys
            # Skip drop-offs that happen after this rental starts (idle time would be negative)
            if current_start_time < BigDict[current_start_station_id][transaction]:
                continue
            # Skip drop-offs whose idle time has already been set
            if metro.at[transaction, "idle_time"] is not None:
                continue
            idle_time = current_start_time - BigDict[current_start_station_id][transaction]
            metro.at[transaction, "idle_time"] = idle_time
            #if idle_time.total_seconds() >= 0:
            del BigDict[current_start_station_id][transaction]
    if current_end_station_id not in BigDict:
        BigDict[current_end_station_id] = {current_transaction_id: current_end_time}
    # Remember this drop-off until a later rental at the same station closes its idle period
    BigDict[current_end_station_id][current_transaction_id] = current_end_time
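For larger data sets, the same matching can probably be done without a Python loop. The following is only a sketch of an alternative, not the approach used above: it relies on `pandas.merge_asof` with `direction="forward"` to pair each drop-off with the next rental at the same station. The helper name `add_idle_time`, the toy data, and the intermediate column names (`trip`, `station`, `next_start`) are made up for illustration. It assumes a unique index, no missing timestamps, and that, as in the loop above, a rental of any bike at the station ends the idle period.

```python
import pandas as pd


def add_idle_time(trips: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `trips` with an `idle_time` column (hypothetical helper)."""
    # One row per drop-off: where and when a bike was returned.
    drops = (
        trips[["end_station_id", "end_time"]]
        .rename(columns={"end_station_id": "station"})
        .rename_axis("trip")
        .reset_index()
        .sort_values("end_time")
    )
    # One row per rental: where and when a bike was taken out.
    rentals = (
        trips[["start_station_id", "start_time"]]
        .rename(columns={"start_station_id": "station", "start_time": "next_start"})
        .sort_values("next_start")
    )
    # For each drop-off, find the first rental at the same station that
    # starts at or after the drop-off time.
    matched = pd.merge_asof(
        drops,
        rentals,
        left_on="end_time",
        right_on="next_start",
        by="station",
        direction="forward",
    )
    matched["idle_time"] = matched["next_start"] - matched["end_time"]

    out = trips.copy()
    # Assignment aligns on the original index; unmatched drop-offs stay NaT.
    out["idle_time"] = matched.set_index("trip")["idle_time"]
    return out


# Made-up example: trip 1 leaves station 20 before trip 0 arrives there,
# so only trip 2 can end the idle period of trip 0's bike.
trips = pd.DataFrame(
    {
        "bike_id": [1, 2, 3],
        "start_station_id": [10, 20, 20],
        "end_station_id": [20, 10, 30],
        "start_time": pd.to_datetime(
            ["2023-01-01 08:00", "2023-01-01 08:10", "2023-01-01 09:00"]
        ),
        "end_time": pd.to_datetime(
            ["2023-01-01 08:30", "2023-01-01 08:40", "2023-01-01 09:20"]
        ),
    }
)
result = add_idle_time(trips)
print(result[["bike_id", "end_station_id", "end_time", "idle_time"]])
```

In this toy example only the first trip gets an idle time (30 minutes); the other two drop-offs have no later rental at their station, so their idle time stays missing instead of becoming negative.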