python pandas dataframe algorithm data-science

Why does the algorithm sometimes not behave as intended?

We are currently working on a college project. We have been tasked with optimizing the maintenance schedule for repairs on bikes from a bike-sharing service. The bikes can only be rented from and returned to bike docking stations. We need to calculate the idle time, which is defined as follows:

The time period from when the bike with bike_id = x was dropped off at station y until any other bike at station y is booked.
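As a quick illustration of this definition, here is a minimal sketch with made-up timestamps (none of these values come from our data set). It assumes that a rental starting at exactly the drop-off time counts and gives an idle time of zero.

```python
import pandas as pd

# Made-up values: one drop-off at a station and the start times of
# rentals from that same station.
drop_off = pd.Timestamp("2023-01-01 08:20")
rentals = pd.to_datetime([
    "2023-01-01 08:05",  # started before the drop-off, so it does not count
    "2023-01-01 08:30",
    "2023-01-01 09:00",
])

# Keep only rentals that start at or after the drop-off
# (assumption: an equal timestamp counts as a match).
later_rentals = rentals[rentals >= drop_off]

# The idle time is the gap to the earliest of those rentals.
idle_time = later_rentals.min() - drop_off
print(idle_time)
```

Here the 08:05 rental is ignored and the idle time is the gap between 08:20 and 08:30.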

We implemented our solution as follows:

import pandas as pd
csv_file = '../Data_Cleanup/outCSV/Clean_Metro_Set.csv'
metro = pd.read_csv(csv_file)
metro['start_time'] = pd.to_datetime(metro['start_time'])
metro['end_time'] = pd.to_datetime(metro['end_time'])
metro = metro.sort_values(by='start_time')
metro['idle_time'] = None

BigDict = {
    # station_id: {
    #     bike_id: (transaction_id ,end_time)
    # }
}

for i, row in metro.iterrows():
    current_start_time = row["start_time"]
    current_end_time = row["end_time"]
    current_end_station_id = row["end_station_id"]
    current_start_station_id = row["start_station_id"]
    current_bike_id = row["bike_id"]
    current_index = i

    if current_start_station_id in BigDict:
        for bike in list(BigDict[current_start_station_id]):  # Create a copy of the keys
            idle_time = current_start_time - BigDict[current_start_station_id][bike][1]
            metro.at[BigDict[current_start_station_id][bike][0], "idle_time"] = idle_time
            if idle_time.total_seconds() >= 0:
                del BigDict[current_start_station_id][bike]

    if current_end_station_id not in BigDict:
        BigDict[current_end_station_id] = {current_bike_id: (current_index, current_end_time)}

    BigDict[current_end_station_id][current_bike_id] = (current_index, current_end_time)

metro.to_csv('../Data_Cleanup/outCSV/Metro_Set_with_IdleTime.csv')

The input data looks like this:

[image: input data]

Expected output:

[image: expected output]

However, some of the values are not calculated correctly, for example:

[image: error 1]

[image: error 2]

As you can see, the first picture contains a row with a negative idle time. Because we sorted the dataframe by end time, a transaction in a later row can have a start time that is earlier than the end time of a previous transaction (cf. error 1). In this case the idle time should be updated whenever a transaction meets the following two conditions:

1. The end_station_id of the transaction whose idle time is being calculated is the same as the start_station_id of the transaction that the for loop is currently iterating over.
2. The transaction that the for loop is currently iterating over has a later start_time than the end_time of the transaction whose idle time we calculate.
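Written out as code, the two conditions could look like the sketch below. The function name `should_set_idle_time` and its parameter names are hypothetical, and treating an equal timestamp as valid is an assumption on our side.

```python
import pandas as pd

def should_set_idle_time(pending_end_station_id, pending_end_time,
                         current_start_station_id, current_start_time):
    """Return True if the current rental ends the idle period of a pending drop-off."""
    # Condition 1: the rental starts at the station where the bike was dropped off.
    same_station = pending_end_station_id == current_start_station_id
    # Condition 2: the rental does not start before the drop-off
    # (assumption: an equal timestamp counts and gives an idle time of zero).
    starts_after_drop_off = current_start_time >= pending_end_time
    return same_station and starts_after_drop_off

drop_off = pd.Timestamp("2023-01-01 08:20")
print(should_set_idle_time(20, drop_off, 20, pd.Timestamp("2023-01-01 08:30")))
print(should_set_idle_time(20, drop_off, 20, pd.Timestamp("2023-01-01 08:05")))
```

The second call is the case that produces a negative idle time if the check is skipped.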

In the errors above (cf. the screenshots), this update does not happen and we cannot figure out why. Any help would be appreciated.
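For reference, here is a minimal reproduction of the negative value. It wraps our first loop in a hypothetical helper, `idle_times_first_version`, and runs it on two made-up trips; the values are toy data and not taken from our data set.

```python
import pandas as pd

def idle_times_first_version(trips):
    """Run our first loop on a copy of `trips` (hypothetical helper name)."""
    trips = trips.sort_values(by="start_time").copy()
    trips["idle_time"] = None
    pending = {}  # station_id -> {bike_id: (row label, end_time)}
    for i, row in trips.iterrows():
        start_station = row["start_station_id"]
        end_station = row["end_station_id"]
        if start_station in pending:
            for bike in list(pending[start_station]):
                row_label, end_time = pending[start_station][bike]
                idle_time = row["start_time"] - end_time
                # Written unconditionally, even when the value is negative.
                trips.at[row_label, "idle_time"] = idle_time
                if idle_time.total_seconds() >= 0:
                    del pending[start_station][bike]
        pending.setdefault(end_station, {})[row["bike_id"]] = (i, row["end_time"])
    return trips

# Toy data: bike 1 arrives at station 20 at 08:20, but the only rental
# from station 20 already started at 08:05.
toy = pd.DataFrame({
    "bike_id": [1, 2],
    "start_station_id": [10, 20],
    "end_station_id": [20, 30],
    "start_time": pd.to_datetime(["2023-01-01 08:00", "2023-01-01 08:05"]),
    "end_time": pd.to_datetime(["2023-01-01 08:20", "2023-01-01 08:40"]),
})
result = idle_times_first_version(toy)
print(result["idle_time"])
```

In this toy case the first trip ends up with an idle time of minus 15 minutes, because the value is stored before the sign is checked and no later rental from station 20 overwrites it.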

Solution

We managed to fix it ourselves with the following tweaks:

BigDict = {
    # station_id: {
    #     transaction_id: end_time
    # }
}

for i, row in metro.iterrows():
    current_start_time = row["start_time"]
    current_end_time = row["end_time"]
    current_end_station_id = row["end_station_id"]
    current_start_station_id = row["start_station_id"]
    current_bike_id = row["bike_id"]
    current_transaction_id = i

    if current_start_station_id in BigDict:
        for transaction in list(BigDict[current_start_station_id]):  # Create a copy of the keys
            # Skip drop-offs that happen after the current rental starts
            if current_start_time < BigDict[current_start_station_id][transaction]:
                continue
            # Skip transactions whose idle time has already been set
            if metro.at[transaction, "idle_time"] is not None:
                continue
            idle_time = current_start_time - BigDict[current_start_station_id][transaction]
            metro.at[transaction, "idle_time"] = idle_time
            # The idle time can no longer be negative here, so the entry is always removed
            del BigDict[current_start_station_id][transaction]

    # Register the current drop-off under its end station
    if current_end_station_id not in BigDict:
        BigDict[current_end_station_id] = {}

    BigDict[current_end_station_id][current_transaction_id] = current_end_time
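For comparison, the same rule can probably also be written without an explicit loop. The following is only a sketch under assumptions, not what we ran on the full data: it uses `pd.merge_asof` with `direction="forward"` to match each drop-off with the next rental from the same station, it counts a rental at exactly the drop-off time as a match, and the helper name `idle_time_via_merge_asof` and the toy values are made up. `merge_asof` needs both frames sorted by their time key and no missing timestamps.

```python
import pandas as pd

def idle_time_via_merge_asof(trips):
    """Idle time per trip: gap between drop-off and next rental at that station."""
    # Left side: one row per drop-off, remembering the original row label.
    drops = trips[["end_station_id", "end_time"]].copy()
    drops["row"] = trips.index
    drops = drops.sort_values("end_time")
    # Right side: one row per rental start.
    starts = trips[["start_station_id", "start_time"]].sort_values("start_time")
    # For each drop-off, find the first rental from the same station
    # whose start_time is at or after the drop-off's end_time.
    matched = pd.merge_asof(
        drops, starts,
        left_on="end_time", right_on="start_time",
        left_by="end_station_id", right_by="start_station_id",
        direction="forward",
    )
    idle = matched["start_time"] - matched["end_time"]
    # Drop-offs without a later rental stay NaT.
    idle = pd.Series(idle.to_numpy(), index=matched["row"].to_numpy())
    return idle.reindex(trips.index)

# Toy data (made up): the 08:05 rental from station 20 starts before
# bike 1 arrives there at 08:20 and must not be matched to it.
toy_trips = pd.DataFrame({
    "bike_id": [1, 2, 3, 4],
    "start_station_id": [10, 20, 20, 30],
    "end_station_id": [20, 30, 10, 20],
    "start_time": pd.to_datetime([
        "2023-01-01 08:00", "2023-01-01 08:05",
        "2023-01-01 08:30", "2023-01-01 09:00",
    ]),
    "end_time": pd.to_datetime([
        "2023-01-01 08:20", "2023-01-01 08:40",
        "2023-01-01 08:50", "2023-01-01 09:30",
    ]),
})
idle = idle_time_via_merge_asof(toy_trips)
print(idle)
```

One difference from the loop to keep in mind: a trip that starts and ends at the same station with identical start and end timestamps would be matched with itself here, so such rows would need to be filtered out first.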