Search code examples
pandasdataframemulti-index

Pandas df, get row number using second index


I have a multiindexed pandas df and need to find the row number of a specific index. the Index is epoch time (Timestamp column) so several rows with the same index exist, I only need to find one of them.

df looks like this:

                  Temp1  Temp2 Uptime  CPULoadAvg1  CPULoadAvg5  CPULoadAvg15 GPUV1 GPUV2 GPUV3 GPUV4                  ts
Board Timestamp                                                                                                          
src   1689884603   30.6   34.0      0         0.08         0.02          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884604   30.8   30.0      0         0.08         0.02          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884605   30.7   31.0      0         0.15         0.03          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884606   30.7   30.0      0         0.15         0.03          0.01     0     0     0   100 2023-07-20 20:22:24
      1689884607   30.5   30.0      0         0.15         0.03          0.01     0     0     0   100 2023-07-20 20:22:24
...                 ...    ...    ...          ...          ...           ...   ...   ...   ...   ...                 ...
coms  1690214970   47.1   53.0   2:59         2.12         2.06          2.06   689   641   876    64 2023-07-24 16:08:32
lr2   1690214970   34.3   49.0   2:59         0.31         0.13          0.04     0     0     0   100 2023-07-24 16:08:32
pp    1690214970   38.5   40.0    NaN          NaN          NaN           NaN     0     0     0   100 2023-07-24 16:08:32
srs   1690214970   43.0   49.0    NaN          NaN          NaN           NaN   NaN   NaN   NaN   NaN 2023-07-24 16:08:32
vel   1690214970   37.4   34.0    NaN          NaN          NaN           NaN   NaN   NaN   NaN   NaN 2023-07-24 16:08:32

EDIT: So my task is to grab the last x minutes of the syslogs to graph the temperatures and other things. I filtered the syslog and made a df of the data. Ask the user for the number of minutes, and then have this code:

how_far_back_to_go_sec = (int(how_far_back_to_go_mins) * 60)
first_row_timestamp = int(df.index.levels[2][0])
last_row_timestamp = int(df.index.levels[2][-1])

timestamp_to_go_to = last_row_timestamp - how_far_back_to_go_sec      # Get desired Timestamp to start graph at

if timestamp_to_go_to < first_row_timestamp:
    print("Error: The logs to not go that far back. (",timestamp_to_go_to," seconds)")
    exit(1)  # Exit the code with an error code of 1

# try 1
df_xs = df.xs(timestamp_to_go_to, level='Timestamp')
print(df_xs)

# try 2
df_loc = df.loc[(slice(None), timestamp_to_go_to),:]
print(df_loc)

This returns a KeyError though. I have checked the df and timestamp_to_go_to is a valid index


added print(df.to_dict()) output:

'39', 'src', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216116'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216117'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216118'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216119'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216120'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr1', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'lr2', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'pp', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'src', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'srs', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'vel', '1690216121'): Timestamp('2023-07-24 16:27:44'), ('39', 'coms', '1690216122'): Timestamp('2023-07-24 16:27:44'), ('39', 'loc', '1690216122'): Timestamp('2023-07-24 16:27:44'), 

for that run the key was '1690216119' and still gave the keyError.


graphing code:

for idn, adfs in adfs.groupby(['Board'], level=0):
    fig.add_trace(go.Scatter(x=adfs.index.get_level_values('Timestamp').values,
                             y=adfs['Temp1'],
                             mode='lines+markers',  # Display both lines and markers
                             name=adfs.index.get_level_values('Board').values[0],
                             line=dict(width=1)))

fig.update_layout(xaxis=dict(title="Time", tickformat='%H:%M:%S', type='date'),
                  yaxis=dict(title='Temperature (deg C)'),
                  title=figure_title)
# Show the plot
fig.show()

Solution

  • So I ended up doing a df.reset_index() to get rid of the multi indexing. After that I looped through each row and searched for the epoch time value I wanted (none of the build in searches seemed to work). Once found I appended the rest of the rows to a new df

    # single index way to cut down the df
    rows_to_append = []
    time_found = False
    for row in adfs.itertuples(index=False):
        if int(row[1]) == timestamp_to_go_to:                   
            time_found = True
        if time_found:
            rows_to_append.append(row)
    if not time_found:
        print("Log time not found, try going back to another point")
        exit(1) # Exit cause value not found
    
    # After the loop, reset the index of the new DataFrame
    adfs = pd.DataFrame(rows_to_append)
    adfs.reset_index(drop=True, inplace=True)
    print(adfs)
    

    Its not pretty, but it works.

    Graphed it with:

    for idn, adfs in adfs.groupby(['Board']):                                                       # single index
        fig.add_trace(go.Scatter(
                                x=adfs['Timestamp'],                                                # single index
                                 y=adfs['Temp1'],
                                 mode='lines+markers',  # Display both lines and markers
                                 name=adfs['Board'].values[0],                                      # single index
                                 line=dict(width=1)))
    
    fig.update_layout(xaxis=dict(title="Time", tickformat='%H:%M:%S', tickmode='auto', nticks=20),
                      yaxis=dict(title='Temperature (deg C)'),
                      title=figure_title)
    # Show the plot
    fig.show()