Search code examples
pandasappendgroupinguniqueidentifier

Appending unique timestamps to a row of columns based on another column value


Good afternoon Community,

I currently have a large file with several headers which include notably fix_timestamps and id. The data is formatted like this.

fix_timestamps id
2023-08-01 00:02:52.527 WPA54
2023-08-01 00:02:52.527 WPA54
2023-08-01 00:02:52.527 WPA54
2023-08-01 00:10:10.640 WPA54
2023-08-01 00:10:10.640 WPA54
2023-08-01 00:10:26.937 WPA54
2023-08-01 00:10:26.937 WPA54
2023-08-01 00:10:10.640 IBT675
2023-08-01 00:10:10.640 IBT675
2023-08-01 00:10:10.640 IBT675
2023-08-01 00:10:26.937 IBT675
2023-08-01 00:10:26.937 IBT675
2023-08-01 00:02:52.527 IBT675
2023-08-01 00:02:52.527 IBT675
etc...etc.

I would like to be able to read through my file and for each value for the length id column, look for unique times in the fix_timestamps column and append these times in a new fix_timestamp, delimited by ' ; '. Essentially generating a dataframe output that looks like this:

fix_timestamps id
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:02:52.527;2023-08-01 00:10:10.640;2023-08-01 00:10:26.937 WPA54
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527 IBT675

My current script is able to follow some of the logic but not completely and I have been trying to figure out why

import pandas as pd

# Read the CSV file
file_path = 'input.csv'
df = pd.read_csv(file_path)

# Function to append unique timestamps based on ID length
def append_timestamp(row):
    id_length = len(row['id'])
    timestamps = []

    # for i in range(id_length):
    #     timestamps.add(row['fix_timestamps'])
    # return ';'.join(timestamps)

    for i in range(id_length):
        timestamps.append(row['fix_timestamps'])
    return ';'.join(timestamps)

# Apply the function to the DataFrame rows
df['fix_timestamps'] = df.apply(append_timestamp, axis=1)

print(df)

# Save the DataFrame to a CSV file
output_file_path = 'output'
df.to_csv(output_file_path, index=False)

Using the above input example and running it through the current script, my output essentially generates this:

fix_timestamps id
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;.... WPA54
2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;2023-08-01 00:02:52.527;..... WPA54
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... IBT675
2023-08-01 00:10:10.640;2023-08-01 00:10:26.937;2023-08-01 00:02:52.527.... IBT675

So it seems to just repeat the timestamp at each unique time and append the same value when I would like to do it for all unique times for the id.

Thank you all


Solution

  • You can convert to string (with astype), then use a custom groupby.transform and unique:

    df['fix_timestamps'] = (df['fix_timestamps'].astype(str).groupby(df['id'])
                            .transform(lambda x: ';'.join(x.unique()))
                           )
    

    Output:

                                           fix_timestamps      id
    0   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    1   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    2   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    3   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    4   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    5   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    6   2023-08-01 00:02:52.527;2023-08-01 00:10:10.64...   WPA54
    7   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
    8   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
    9   2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
    10  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
    11  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
    12  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675
    13  2023-08-01 00:10:10.640;2023-08-01 00:10:26.93...  IBT675